GCP Data Governance: Column Level Security Best Practices — Taxonomies, Data Class, Policies, and IAM Roles

Mahendran
Jan 26, 2024


Data governance and access control cover how data is managed and who is authorized to use it: keeping data accurate, protecting sensitive information, and complying with regulations. Doing this well requires a clear understanding of the data, governance controls over it, and enforced security measures.

Automated policy tag application for sensitive data classification can be a complex process, dependent on the specific cloud provider’s services and tools.

Policy tags in Google Data Catalog can be used to enforce access control on data resources, providing an automated way to handle sensitive data classification. Implementing this involves three steps:

  1. Creating a structured taxonomy
  2. Grouping policy tags to reflect specific data governance needs
  3. Enforcing access control

Fine-Grained Access Control and Column Level Restriction

BigQuery’s fine-grained access control and dynamic data masking capabilities are instrumental in securing sensitive table columns through policy tags. Policy tags, which support type-based data classification, provide a flexible framework for implementing data governance policies.

Column-level access control workflow

To restrict data access at the column level:

  1. Define a taxonomy and policy tags. Create and manage a taxonomy and policy tags for your data.
  2. Assign policy tags to your BigQuery columns. In BigQuery, use schema annotations to assign a policy tag to each column where you want to restrict access.
  3. Enforce access control on the taxonomy. Enforcing access control causes the access restrictions defined for all of the policy tags in the taxonomy to be applied.
  4. Manage access on the policy tags. Use Identity and Access Management (IAM) policies to restrict access to each policy tag. The policy is in effect for each column that belongs to the policy tag.
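
The four steps above map roughly onto a short sequence of REST calls. The sketch below only builds that sequence as data and sends nothing. The helper name `column_security_plan` and the `Step` type are hypothetical, and the IDs passed in are placeholders; the endpoints are the Data Catalog v1 and BigQuery v2 ones used later in this post.

```python
from typing import NamedTuple

DATACATALOG = "https://datacatalog.googleapis.com/v1"
BIGQUERY = "https://bigquery.googleapis.com/bigquery/v2"


class Step(NamedTuple):
    purpose: str
    method: str
    url: str


def column_security_plan(project, location, taxonomy_id, tag_id, dataset, table):
    """Return the REST calls behind the column-level workflow, in order.

    Hypothetical helper: it describes the calls, it does not send them.
    """
    taxonomies = f"{DATACATALOG}/projects/{project}/locations/{location}/taxonomies"
    taxonomy = f"{taxonomies}/{taxonomy_id}"
    tag = f"{taxonomy}/policyTags/{tag_id}"
    table_url = f"{BIGQUERY}/projects/{project}/datasets/{dataset}/tables/{table}"
    return [
        Step("create taxonomy", "POST", taxonomies),
        Step("create policy tag", "POST", f"{taxonomy}/policyTags"),
        Step("assign policy tag to columns", "PATCH", table_url),
        Step("enforce access control on taxonomy", "PATCH", taxonomy),
        Step("manage access on policy tag", "POST", f"{tag}:setIamPolicy"),
    ]


for step in column_security_plan(
    "data-proc-poc", "us", "1234567890", "112233445566778899", "employeedb", "employee"
):
    print(f"{step.method:5} {step.url}  # {step.purpose}")
```

Keeping the plan as plain data makes it easy to review the order of operations before any of them touches a real project.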

Key Best Practices for Policy Tags in BigQuery

To maximize the benefits of policy tags in BigQuery, consider the following best practices:

1. Build a hierarchy of data classes: Construct a meaningful hierarchy of data classes aligned with your business needs. For example, parent classes such as Sensitive PII, Restricted, and Confidential, each with child tags for specific fields.

2. Control access via a service account, Workload Identity, or a Google Workspace account.

3. Define data stewards and periodically review the accounts that have access.
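
The periodic access review in the last practice can be partly automated. As a minimal sketch, assuming you have already fetched a policy tag's IAM policy as a dictionary (for example from the `getIamPolicy` method), the hypothetical helper below lists members who hold the fine-grained reader role but are not on an approved list. The member names in the example are made up.

```python
FINE_GRAINED_READER = "roles/datacatalog.categoryFineGrainedReader"


def unexpected_readers(iam_policy, approved_members):
    """Return members with the fine-grained reader role who are not approved.

    `iam_policy` is a dict shaped like an IAM policy: {"bindings": [...]}.
    """
    found = set()
    for binding in iam_policy.get("bindings", []):
        if binding.get("role") == FINE_GRAINED_READER:
            found.update(binding.get("members", []))
    return sorted(found - set(approved_members))


# Illustrative policy; the second member is a made-up account.
policy = {
    "bindings": [
        {
            "role": FINE_GRAINED_READER,
            "members": [
                "serviceAccount:some_name_sa@data-proc-poc.iam.gserviceaccount.com",
                "user:former.employee@example.com",
            ],
        }
    ]
}
approved = ["serviceAccount:some_name_sa@data-proc-poc.iam.gserviceaccount.com"]
print(unexpected_readers(policy, approved))
```

A data steward could run a check like this on a schedule and review whatever it returns.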

Prerequisite

  1. Enable the Data Catalog and BigQuery Data Policy APIs.
  2. A service account, Workspace account, or other principal to grant access to
  3. A BigQuery table [required if you want to apply a policy tag]

Taxonomy is a collection of policy tags that classify data along a common axis.

For instance,

A data classification taxonomy could contain policy tags denoting Sensitive PII, Restricted, Confidential, etc. [This blog uses this example.]

A data origin taxonomy could contain policy tags to distinguish user data, employee data, partner data, public data.
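
One way to picture the data classification taxonomy used in this post is as a small tree: parent data classes with child tags underneath. The sketch below is only a planning aid in plain Python, not an API call; the `DATA_CLASSES` mapping and the `tag_paths` helper are hypothetical names, and the child tags are the ones this post adds later.

```python
# Parent data classes mapped to their child policy tags.
DATA_CLASSES = {
    "Sensitive PII": ["name", "date of birth"],
    "Confidential": ["financial information"],
    "Restricted": [],
}


def tag_paths(classes):
    """Flatten the hierarchy into 'parent/child' paths, parents first."""
    paths = []
    for parent, children in classes.items():
        paths.append(parent)
        paths.extend(f"{parent}/{child}" for child in children)
    return paths


for path in tag_paths(DATA_CLASSES):
    print(path)
```

Writing the hierarchy down like this before creating anything makes it easier to agree on the classes with the business owners.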

1. Create Taxonomy

1.1 Request body

Let’s create a data-classification-taxonomy. Save the request body as data-classification-taxonomy.json:
{
"activatedPolicyTypes": [
"FINE_GRAINED_ACCESS_CONTROL"
],
"displayName": "Data Classification Taxonomy",
"name": "data-classification-taxonomy",
"description": "A Data Classification for Public, Internal, Restricted, and Confidential data",
"service": {
"name": "MANAGING_SYSTEM_DATAPLEX"
}
}
curl -X POST \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-d @data-classification-taxonomy.json \
https://datacatalog.googleapis.com/v1/projects/data-proc-poc/locations/us/taxonomie
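
If you prefer scripting over curl, the same call can be prepared with Python's standard library. This is a sketch under a few assumptions: the helper name `create_taxonomy_request` is hypothetical, the access token is passed in by the caller (for example from `gcloud auth print-access-token`), and the body leaves out `name` and `service` on the assumption that the API assigns those itself. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def create_taxonomy_request(project, location, access_token):
    """Build (but do not send) a taxonomies.create request."""
    url = (
        f"https://datacatalog.googleapis.com/v1/projects/{project}"
        f"/locations/{location}/taxonomies"
    )
    body = {
        "displayName": "Data Classification Taxonomy",
        "description": (
            "A Data Classification for Public, Internal, Restricted, "
            "and Confidential data"
        ),
        "activatedPolicyTypes": ["FINE_GRAINED_ACCESS_CONTROL"],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = create_taxonomy_request("data-proc-poc", "us", "ACCESS_TOKEN_PLACEHOLDER")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```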

1.2 Go To BigQuery → Policy tags

1.3 Initially, there will be no policy tags attached to it

1.4 List Taxonomy

 gcloud beta data-catalog taxonomies list --location=us

1.5 REST API to List Taxonomies

curl -X GET \
-H "Content-Type: application/json; charset=utf-8" \
-H "X-goog-api-key: <API_KEY>" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datacatalog.googleapis.com/v1beta1/projects/data-proc-poc/locations/us/taxonomies

Policy tags

Policy tags are tags with access control policies that can be applied to sub-resources, for example, BigQuery columns.

List Policy Tags

Initially, there will be no policy tags attached

curl -X GET \
-H "Content-Type: application/json; charset=utf-8" \
-H "X-goog-api-key: <API_KEY>" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datacatalog.googleapis.com/v1beta1/projects/data-proc-poc/locations/us/taxonomies/<TAXONOMY_ID>/policyTags

1. Create Policy Tags

Assume we are creating three parent policy tags:

  1. Restricted
  2. Sensitive PII
  3. Confidential

1.1 Create the Sensitive PII policy tag.

{
"description": "Data intended for limited use by authorized persons. Data in this class requires the most protection.",
"displayName": "Sensitive PII",
"parentPolicyTag": ""
}

Repeat: create the policy tags for the other data classes.

1.2 List all the Policy Tags added to the taxonomy

curl -X GET \
-H "Content-Type: application/json; charset=utf-8" \
-H "X-goog-api-key: <API_KEY>" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datacatalog.googleapis.com/v1beta1/projects/data-proc-poc/locations/us/taxonomies/112233445566778899/policyTags
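
Later steps need the full resource name of a parent tag, which comes out of this list response. As a small sketch, assuming the response has a `policyTags` array whose entries carry `name` and `displayName`, the hypothetical helper below indexes the tags by display name. The sample response is illustrative and reuses the placeholder IDs from this post.

```python
def tags_by_display_name(list_response):
    """Map each policy tag's displayName to its full resource name."""
    return {
        tag["displayName"]: tag["name"]
        for tag in list_response.get("policyTags", [])
    }


# Illustrative response body, not captured from a real project.
sample_response = {
    "policyTags": [
        {
            "name": "projects/data-proc-poc/locations/us/taxonomies/1234567890/policyTags/112233445566778899",
            "displayName": "Sensitive PII",
        },
        {
            "name": "projects/data-proc-poc/locations/us/taxonomies/1234567890/policyTags/1122334455667788990",
            "displayName": "Confidential",
        },
    ]
}

print(tags_by_display_name(sample_response)["Sensitive PII"])
```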

1.3 Check in the UI

2. Add the sub-tags for each data class

Let’s say we add the following child policy tags to the parent policy tags:

  1. name, date of birth to Sensitive PII
  2. financial information to Confidential

First, get the resource name of the Sensitive PII parent policy tag from the list above:

projects/data-proc-poc/locations/us/taxonomies/1234567890/policyTags/112233445566778899

Then use it as the parentPolicyTag. For example, the request body for the `age` policy tag under Sensitive PII:

{
"displayName":"age",
"description":"age of the employee",
"parentPolicyTag":"projects/data-proc-poc/locations/us/taxonomies/1234567890/policyTags/112233445566778899"
}
curl -X POST \
-H "Content-Type: application/json; charset=utf-8" \
-H "X-goog-api-key: <API_KEY>" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-d @age-policy-tag.json \
https://datacatalog.googleapis.com/v1beta1/projects/data-proc-poc/locations/us/taxonomies/1234567890/policyTags
Response

Repeat this process for the other policyTags
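
Repeating the request by hand for every child tag gets tedious, so the bodies can be generated instead. This is a sketch with hypothetical helper names (`policy_tag_body`, `child_tag_bodies`); only the `age` description comes from this post, the other descriptions are made up for illustration. Each resulting dictionary would be posted to the same policyTags endpoint as above.

```python
import json


def policy_tag_body(display_name, description, parent_policy_tag=""):
    """Request body for one policy tag; an empty parent means a top-level tag."""
    return {
        "displayName": display_name,
        "description": description,
        "parentPolicyTag": parent_policy_tag,
    }


def child_tag_bodies(parent_policy_tag, children):
    """Build one request body per child tag under the given parent."""
    return [
        policy_tag_body(name, description, parent_policy_tag)
        for name, description in children.items()
    ]


SENSITIVE_PII = (
    "projects/data-proc-poc/locations/us/taxonomies/1234567890"
    "/policyTags/112233445566778899"
)

bodies = child_tag_bodies(
    SENSITIVE_PII,
    {
        "age": "age of the employee",
        "name": "name of the employee",  # illustrative description
        "date of birth": "date of birth of the employee",  # illustrative description
    },
)
print(json.dumps(bodies[0], indent=2))
```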

2.1 List all the Policy Tags under the Employee taxonomy `1234567890`

curl -X GET \
-H "Content-Type: application/json; charset=utf-8" \
-H "X-goog-api-key: <API_KEY>" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://datacatalog.googleapis.com/v1beta1/projects/data-proc-poc/locations/us/taxonomies/1234567890/policyTags

Refresh the UI

2.2 Set an IAM policy on the policy tags in the taxonomy

projects.locations.taxonomies.policyTags.setIamPolicy

Setting Fine-Grained Reader on the Confidential policy tag

Assign the `roles/datacatalog.categoryFineGrainedReader` role to a service account, serviceAccount:some_name_sa@data-proc-poc.iam.gserviceaccount.com

{
"policy": {
"bindings": [
{
"role": "roles/datacatalog.categoryFineGrainedReader",
"members": [
"serviceAccount:some_name_sa@data-proc-poc.iam.gserviceaccount.com"
]
}
]
}
}
curl --request POST \
-H "X-goog-api-key: <API_KEY>" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
--data @iamPolicy.json \
https://datacatalog.googleapis.com/v1/projects/data-proc-poc/locations/us/taxonomies/1234567890/policyTags/1122334455667788990:setIamPolicy
Response

Check the policy tag and see the role/principal
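
One thing to keep in mind is that setIamPolicy replaces the whole policy on the policy tag, so posting a body with a single binding drops any bindings that were there before. A safer pattern is read, modify, write: fetch the current policy, add the member, and send the result back. The sketch below covers only the "modify" part; the helper names `with_member` and `set_iam_policy_body` are hypothetical.

```python
import copy

FINE_GRAINED_READER = "roles/datacatalog.categoryFineGrainedReader"


def with_member(policy, role, member):
    """Return a copy of `policy` in which `member` holds `role`.

    Existing bindings are kept and the member is not duplicated.
    """
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            if member not in binding.setdefault("members", []):
                binding["members"].append(member)
            break
    else:
        # No binding for this role yet, so add one.
        bindings.append({"role": role, "members": [member]})
    return updated


def set_iam_policy_body(policy):
    """Wrap a policy in the envelope that setIamPolicy expects."""
    return {"policy": policy}


SERVICE_ACCOUNT = "serviceAccount:some_name_sa@data-proc-poc.iam.gserviceaccount.com"
current_policy = {}  # e.g. the result of getIamPolicy on a fresh policy tag
body = set_iam_policy_body(
    with_member(current_policy, FINE_GRAINED_READER, SERVICE_ACCOUNT)
)
print(body)
```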

Create/Update BigQuery table with policy tags

1. Create employee-schema.json with policy tags

{
"tableConstraints": {
"primaryKey": {
"columns": [
"employee_id"
]
}
},
"schema": {
"fields": [
{
"type": "STRING",
"mode": "NULLABLE",
"name": "employee_name",
"description": "Name of the employee",
"policyTags": {
"names": [
"projects/data-proc-poc/locations/us/taxonomies/1234567890/policyTags/11223344556677889900"
]
}
},
{
"type": "STRING",
"name": "employee_id",
"description": "A unique ID of the employee. A Primary Key"
}
]
},
"description": "An employee table",
"kind": "bigquery#table",
"type": "TABLE",
"id": "employee",
"tableReference": {
"tableId": "employee",
"datasetId": "employeedb",
"projectId": "example-project-id"
}
}
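
Hand-editing policy tags into a schema is error-prone (a missing comma is enough to break the file), so the `policyTags` entries can be added programmatically. This sketch assumes you have a plain list of field definitions and a mapping from column name to policy tag resource name; the helper name `tag_fields` is hypothetical.

```python
import json


def tag_fields(fields, tags_by_column):
    """Return a copy of `fields` with policyTags added to the mapped columns."""
    tagged = []
    for field in fields:
        field = dict(field)  # shallow copy so the input list is left untouched
        tag = tags_by_column.get(field["name"])
        if tag:
            field["policyTags"] = {"names": [tag]}
        tagged.append(field)
    return tagged


fields = [
    {"name": "employee_name", "type": "STRING", "mode": "NULLABLE"},
    {"name": "employee_id", "type": "STRING"},
]
tags = {
    "employee_name": (
        "projects/data-proc-poc/locations/us/taxonomies/1234567890"
        "/policyTags/11223344556677889900"
    ),
}

schema = {"fields": tag_fields(fields, tags)}
print(json.dumps(schema, indent=2))
```

Generating the file with `json.dumps` also guarantees that what you send is valid JSON.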

2. Use the respective Tables API methods: tables.insert to create the table, or tables.patch to update it

curl --request POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
--data @employee-schema.json \
"https://bigquery.googleapis.com/bigquery/v2/projects/data-proc-poc/datasets/employeedb/tables?projectId=data-proc-poc&datasetId=employeedb"

3. Go to the taxonomy and enable access control enforcement

4. Access to the tagged columns is now restricted to the service account

BigQuery policy tags can’t be applied to plain external tables; the API rejects the request with the following error:

{
"error": {
"code": 400,
"message": "Policy tags are not supported on plain external tables. Use a BigLake table instead.",
"errors": [
{
"message": "Policy tags are not supported on plain external tables. Use a BigLake table instead.",
"domain": "global",
"reason": "invalid"
}
],
"status": "INVALID_ARGUMENT"
}
}
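
If tables are created from a script, it can help to recognize this specific failure and point the user toward BigLake rather than surface a generic 400. A minimal sketch, assuming the error body has the shape shown above; the helper name `is_external_table_tag_error` is hypothetical, and matching on the message text is a heuristic that may need adjusting if the wording changes.

```python
import json


def is_external_table_tag_error(response_text):
    """Return True if the body is the 'policy tags on external table' error."""
    try:
        error = json.loads(response_text)["error"]
        return (
            error["status"] == "INVALID_ARGUMENT"
            and "plain external tables" in error["message"]
        )
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not shaped like a Google API error body.
        return False


sample = json.dumps(
    {
        "error": {
            "code": 400,
            "message": (
                "Policy tags are not supported on plain external tables. "
                "Use a BigLake table instead."
            ),
            "status": "INVALID_ARGUMENT",
        }
    }
)
if is_external_table_tag_error(sample):
    print("Policy tags need a BigLake table, not a plain external table.")
```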

Conclusion

In conclusion, implementing policy tags in BigQuery is a strategic move toward robust data governance. By following best practices and leveraging taxonomies and policy tags effectively, organizations can enhance data security, ensure compliance, and unlock the full potential of their data.

Mahendran

A Software/Data Engineer, Photographer, Mentor, and Traveler