Data Discovery in GCP: Dataplex Tags, Tag Templates, Entry Groups and Entries

5 min read · Jan 16, 2024

Efficient data discovery is crucial to the success of a data-driven organization. As data stewards and owners, our primary responsibility is defining business metadata that meets organizational needs.

A well-structured tagging system combined with organized entry groups in Google Cloud creates a robust environment for exploring, understanding, and using data, helping organizations realize the full potential of their data assets.

Tag Template

Tag templates are the cornerstone of a comprehensive data governance strategy. They let you create and manage metadata for data assets centrally, acting as a database schema for your metadata.

By utilizing tag templates, data stewards can create a standardized framework for tagging that keeps metadata consistent across the entire data landscape. Templates can be public or private, and that choice determines how their tags can be searched, so data discovery can be tailored to the specific needs of the organization.

Public tags are useful for a broad set of scenarios and are intuitive to use. They support both simple search and search with predicates, while private tags support only search with predicates.

1. Create a Tag Template

Let's start by creating a data governance tag template.

1.1 Create a Tag Template employeeDataGovernanceTagTemplate.json

{
   "name":"EMPLOYEE_DATA_GOVERNANCE_TAG_TEMPLATE",
   "displayName":"Employee Data Governance Tag Template",
   "fields":{
      "num_rows":{
         "displayName":"Number of rows in data asset",
         "isRequired":false,
         "type":{
            "primitiveType":"DOUBLE"
         }
      },
      "source":{
         "displayName":"Source",
         "isRequired":true,
         "type":{
            "primitiveType":"STRING"
         }
      },
      "tier":{
         "displayName":"Tier that this data asset belongs to",
         "isRequired":true,
         "type":{
            "primitiveType":"STRING"
         }
      },
      "data_rentention_policy":{
         "displayName":"Data Retention Policy",
         "isRequired":true,
         "type":{
            "primitiveType":"STRING"
         }
      },
      "data_classification":{
         "displayName":"Data Classification",
         "isRequired":true,
         "type":{
            "enumType":{
               "allowedValues":[
                  {
                     "displayName":"INTERNAL"
                  },
                  {
                     "displayName":"PUBLIC"
                  },
                  {
                     "displayName":"SENSITIVE"
                  },
                  {
                     "displayName":"CONFIDENTIAL"
                  },
                  {
                     "displayName":"NONE"
                  }
               ]
            }
         }
      },
      "has_pii":{
         "displayName":"Has PII",
         "isRequired":false,
         "type":{
            "primitiveType":"BOOL"
         }
      },
      "pii_type":{
         "displayName":"PII type",
         "isRequired":false,
         "type":{
            "enumType":{
               "allowedValues":[
                  {
                     "displayName":"CUSTOMER_ID"
                  },
                  {
                     "displayName":"EMAIL_ADDRESS"
                  },
                  {
                     "displayName":"PHONE_NUMBER"
                  },
                  {
                     "displayName":"NONE"
                  }
               ]
            }
         }
      }
   }
}
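Before posting a template, it can help to sanity-check its shape locally. The sketch below is one way to do that in Python. The `check_template` helper is hypothetical (it is not part of any Google library), and treating a quoted boolean as a mistake is an assumption of this sketch; the API itself may be more lenient.

```python
import json

# Hypothetical helper (not part of any Google library): checks the shape of a
# tag template body before it is sent to the Data Catalog API.
def check_template(template: dict) -> list:
    problems = []
    for field_id, field in template.get("fields", {}).items():
        field_type = field.get("type", {})
        # Each field should declare exactly one kind of type.
        kinds = [k for k in ("primitiveType", "enumType") if k in field_type]
        if len(kinds) != 1:
            problems.append(f"{field_id}: needs exactly one of primitiveType or enumType")
        # Assumption of this sketch: isRequired is written as a JSON boolean.
        if not isinstance(field.get("isRequired", False), bool):
            problems.append(f"{field_id}: isRequired should be a JSON boolean")
        # An enum field without allowed values cannot be used in a tag.
        if "enumType" in field_type and not field_type["enumType"].get("allowedValues"):
            problems.append(f"{field_id}: enumType needs at least one allowed value")
    return problems

# A trimmed-down template with two fields, one of them deliberately sloppy.
sample = json.loads("""
{
  "displayName": "Employee Data Governance Tag Template",
  "fields": {
    "source": {"displayName": "Source", "isRequired": true,
               "type": {"primitiveType": "STRING"}},
    "has_pii": {"displayName": "Has PII", "isRequired": "false",
                "type": {"primitiveType": "BOOL"}}
  }
}
""")

problems = check_template(sample)
print(problems)
```

Running the same helper over the full template file is a matter of loading it with `json.load` and passing the result in.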

1.2 Create the Tag Template `employee_data_governance_tag_template`

API:

https://datacatalog.googleapis.com/v1beta1/projects/data-proc-poc/locations/us-west1/tagTemplates?tagTemplateId=employee_data_governance_tag_template

curl -X POST \
    -H "Content-Type: application/json; charset=utf-8" \
    -H "X-goog-api-key: <API_KEY>" \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -d @employeeDataGovernanceTagTemplate.json \
    "https://datacatalog.googleapis.com/v1beta1/projects/data-proc-poc/locations/us-west1/tagTemplates?tagTemplateId=employee_data_governance_tag_template"
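The request URL above has two parts: the parent path (project and location) and the `tagTemplateId` query parameter. As a rough illustration, the Python sketch below assembles the same URL from those pieces. The function name `create_template_url` is made up for this example, and the sketch only builds the string; it does not call the API.

```python
from urllib.parse import urlencode

API_ROOT = "https://datacatalog.googleapis.com/v1beta1"

# Hypothetical helper: builds the create-template URL from its components.
def create_template_url(project: str, location: str, template_id: str) -> str:
    parent = f"projects/{project}/locations/{location}"
    # urlencode takes care of escaping, so no shell-style backslashes are needed.
    query = urlencode({"tagTemplateId": template_id})
    return f"{API_ROOT}/{parent}/tagTemplates?{query}"

url = create_template_url(
    "data-proc-poc", "us-west1", "employee_data_governance_tag_template"
)
print(url)
```

Keeping the project, location, and template ID as separate arguments makes it harder to post a template to a different location than the one referenced later in the tag body.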

2. Look up the Data Catalog `entry-id` for your BigQuery table

curl -X GET \
    -H "Content-Type: application/json; charset=utf-8" \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
     https://datacatalog.googleapis.com/v1beta1/entries:lookup\?linkedResource\=//bigquery.googleapis.com/projects/data-proc-poc/datasets/employeedb/tables/employee
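The lookup takes the table's linked resource name as a query parameter, and the `entry-id` is the last segment of the `name` in the returned entry. The Python sketch below shows both steps under stated assumptions: the helper names are made up for illustration, and the sample `name` is a placeholder rather than real API output.

```python
from urllib.parse import urlencode

LOOKUP_ENDPOINT = "https://datacatalog.googleapis.com/v1beta1/entries:lookup"

# Hypothetical helper: builds the lookup URL for a BigQuery table.
def lookup_url(project: str, dataset: str, table: str) -> str:
    linked = (
        f"//bigquery.googleapis.com/projects/{project}"
        f"/datasets/{dataset}/tables/{table}"
    )
    # The slashes in the resource name are percent-encoded by urlencode.
    return LOOKUP_ENDPOINT + "?" + urlencode({"linkedResource": linked})

# Hypothetical helper: the entry-id is the final path segment of the entry name.
def entry_id_from_name(entry_name: str) -> str:
    return entry_name.rsplit("/", 1)[-1]

url = lookup_url("data-proc-poc", "employeedb", "employee")
print(url)

# Placeholder entry name; a real lookup returns an opaque ID here.
sample_name = "projects/data-proc-poc/locations/US/entryGroups/@bigquery/entries/abc123"
entry_id = entry_id_from_name(sample_name)
print(entry_id)
```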

3. Create a tag from the template and attach it to your BigQuery table

3.1 Request body in `employee_tag.json`

{
  "name": "projects/data-proc-poc/locations/us/entryGroups/@bigquery/entries/<entry-id>/tags/employee_data_governance",
  "template": "projects/data-proc-poc/locations/us-west1/tagTemplates/employee_data_governance_tag_template",
  "fields": {
    "pii_type": {
      "displayName": "PII type",
      "enumValue": {
        "displayName": "CUSTOMER_ID"
      }
    },
    "has_pii": {
      "displayName": "Has PII",
      "boolValue": true
    },
    "source": {
      "displayName": "Source",
      "stringValue": "External"
    },
    "data_rentention_policy": {
      "displayName": "Data Retention",
      "stringValue": "4 Years"
    },
    "tier": {
      "displayName": "Data tier",
      "stringValue": "2"
    },
    "data_classification": {
      "displayName": "Data Classification",
      "enumValue": {
        "displayName": "INTERNAL"
      }
    }
  },
  "templateDisplayName": "employee data governance"
}
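A tag only makes sense relative to its template: required fields must be present, and enum fields must use one of the template's allowed values. The Python sketch below checks a tag body against a template locally before posting. `check_tag` is a hypothetical helper written for this article, and the sample data is deliberately trimmed down and partly wrong to show what gets flagged.

```python
# Hypothetical helper: compares the "fields" of a tag body with the "fields"
# of its template and reports mismatches.
def check_tag(tag_fields: dict, template_fields: dict) -> list:
    problems = []
    # Every required template field must appear in the tag.
    for field_id, spec in template_fields.items():
        if spec.get("isRequired") is True and field_id not in tag_fields:
            problems.append(f"{field_id}: required field is missing")
    for field_id, value in tag_fields.items():
        spec = template_fields.get(field_id)
        if spec is None:
            problems.append(f"{field_id}: not defined in the template")
            continue
        # Enum values must be one of the template's allowedValues.
        enum_type = spec["type"].get("enumType")
        if enum_type and "enumValue" in value:
            allowed = {v["displayName"] for v in enum_type["allowedValues"]}
            chosen = value["enumValue"]["displayName"]
            if chosen not in allowed:
                problems.append(f"{field_id}: {chosen!r} is not an allowed value")
    return problems

# Trimmed-down template fields for illustration.
template_fields = {
    "source": {"isRequired": True, "type": {"primitiveType": "STRING"}},
    "pii_type": {
        "isRequired": False,
        "type": {"enumType": {"allowedValues": [
            {"displayName": "EMAIL_ADDRESS"},
            {"displayName": "NONE"},
        ]}},
    },
}

# A tag that omits "source" and uses an enum value the template does not list.
tag_fields = {"pii_type": {"enumValue": {"displayName": "SALARY"}}}

problems = check_tag(tag_fields, template_fields)
print(problems)
```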

3.2 Post the Tag

curl -X POST \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -d @employee_tag.json \
  https://datacatalog.googleapis.com/v1beta1/projects/data-proc-poc/locations/us/entryGroups/@bigquery/entries/<entry-id>/tags

3.3 Check the UI to confirm the tag is published and attached

Entry Groups

Entry Groups: Organizing Data for Intuitive Discovery

Custom entry groups let you manage entries for Cloud Storage filesets or custom data resource types. Grouping entries this way keeps data organized logically, which makes data discovery more efficient.

1. Creating a Custom Entry Group

1.1 Create an Entry Group `cloud_storage_fileset`

Request body in cloud-storage-entry-group.json

{
  "description": "Cloud Storage fileset entries",
  "displayName": "Cloud Storage Fileset",
  "name": "cloud-storage-fileset"
}

1.2 Post the `cloud-storage-entry-group.json`

curl -X POST \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -d @cloud-storage-entry-group.json \
  "https://datacatalog.googleapis.com/v1/projects/data-proc-poc/locations/us-west1/entryGroups?entryGroupId=cloud_storage_fileset"
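Note that the `entryGroupId` in the URL uses underscores. The Python sketch below checks an ID before it is sent, assuming the rule as I understand it from the Data Catalog documentation: the ID starts with a letter or underscore, contains only letters, digits, and underscores, and is at most 64 characters long. The pattern and the helper name are my own and should be checked against the current API reference.

```python
import re

# Assumed rule: letter or underscore first, then letters, digits, or
# underscores, 64 characters at most. Verify against the API reference.
ENTRY_GROUP_ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")

# Hypothetical helper: True when the ID satisfies the assumed rule.
def is_valid_entry_group_id(entry_group_id: str) -> bool:
    return ENTRY_GROUP_ID.fullmatch(entry_group_id) is not None

underscore_ok = is_valid_entry_group_id("cloud_storage_fileset")
hyphen_ok = is_valid_entry_group_id("cloud-storage-fileset")
print(underscore_ok, hyphen_ok)
```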

1.3 Open the Dataplex UI and go to Entry Groups

2.1 Create an entry under the custom entry group

2.1.1 Create a JSON body for a `FILESET` entry in `custom_entry.json`

{
  "fullyQualifiedName": "gcs:<bucket_name>",
  "type": "FILESET",
  "dataSource": {
    "resource": "",
    "service": "CLOUD_STORAGE",
    "storageProperties": {
      "fileType": "CSV",
      "filePattern": [
        "data/input/employee.csv"
      ]
    }
  },
  "linkedResource": "storage.googleapis.com/storage/v1/b/<bucket_name>",
  "labels": {
    "appid": "app1234"
  },
  "gcsFilesetSpec": {
    "filePatterns": [
      "gs://<bucket_name>/data/input/employee.csv"
    ]
  },
  "schema": {
    "columns": [
      {
        "column": "employee_name",
        "type": "STRING",
        "ordinalPosition": 0
      },
      {
        "column": "department",
        "type": "STRING",
        "ordinalPosition": 1
      },
      {
        "column": "state",
        "type": "STRING",
        "ordinalPosition": 2
      },
      {
        "column": "salary",
        "type": "INTEGER",
        "ordinalPosition": 3
      },
      {
        "column": "age",
        "type": "INTEGER",
        "ordinalPosition": 4
      },
      {
        "column": "bonus",
        "type": "INTEGER",
        "ordinalPosition": 5
      },
      {
        "column": "employee_id",
        "type": "STRING",
        "ordinalPosition": 6
      }
    ]
  }
}
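Small slips in an entry body, such as a file pattern without the `gs://` prefix or a column name with stray whitespace, are easy to miss by eye. The Python sketch below runs two such checks locally. `check_fileset_entry` is a hypothetical helper, the bucket name is a placeholder, and the checks are a minimal illustration rather than a full validation of the entry.

```python
# Hypothetical helper: runs two basic checks on a fileset entry body.
def check_fileset_entry(entry: dict) -> list:
    problems = []
    # Cloud Storage fileset patterns are expected to start with gs://.
    for pattern in entry.get("gcsFilesetSpec", {}).get("filePatterns", []):
        if not pattern.startswith("gs://"):
            problems.append(f"pattern {pattern!r}: should start with gs://")
    # Column names should not carry leading or trailing whitespace.
    for column in entry.get("schema", {}).get("columns", []):
        name = column.get("column", "")
        if name != name.strip():
            problems.append(f"column {name!r}: leading or trailing whitespace")
    return problems

# Trimmed-down entry; "my-bucket" is a placeholder bucket name and the second
# column carries a deliberate stray space.
sample_entry = {
    "type": "FILESET",
    "gcsFilesetSpec": {"filePatterns": ["gs://my-bucket/data/input/employee.csv"]},
    "schema": {"columns": [
        {"column": "employee_name", "type": "STRING"},
        {"column": " department", "type": "STRING"},
    ]},
}

problems = check_fileset_entry(sample_entry)
print(problems)
```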

2.1.2 Create an Entry `employee_entry`.

curl -X POST \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -d @custom_entry.json \
  https://datacatalog.googleapis.com/v1/projects/data-proc-poc/locations/us-west1/entryGroups/cloud_storage_fileset/entries\?entryId\=employee_entry

2.1.3 Open the Dataplex UI and check Entry Groups and Entries

Conclusion

As data stewards and data owners, it is our responsibility to define the business metadata for the data that serves our business needs. By following the outlined steps for Tags, Tag Templates, and Entry Groups in Google’s Data Catalog, we can effectively enhance data governance, ensuring the proper management and contextualization of our valuable data assets.

Written by Mahendran
