Big Data
Oct 10, 2024

Data Governance Best Practices for Snowflake

Imagine your data team finally decides to use Snowflake as its data warehouse of choice. After weeks of weighing the pros and cons of each potential solution, excitement gets the best of you, and you spring into action setting up your data environment. You ingest all your data sources and give everyone access to query the raw data. 

A few weeks later you realize you have a major problem. You don’t have any way of knowing who queries what, how resources are being used, and what drives the largest costs. Now that many different teams use Snowflake, it’s difficult to restrict access and tighten control of the data. 

This is why, when setting up a data warehouse like Snowflake from scratch, it’s imperative to first focus on a data governance strategy. Documenting and closely following data governance rules ensures your team is following best practices before any data lands in Snowflake and anyone can access it. 

However, not everyone is lucky enough to choose their data warehouse vendor and architect it as they see fit. If you’ve been using Snowflake without much thought to governance, it’s never too late to implement some of the best practices I’m about to introduce to you. The best time for data governance is now! 

Why is Snowflake data governance important?

Data governance allows you to maintain tight control over who has access to your data, ensuring the wrong users or tools cannot modify data. In other words, it preserves the integrity of your data. You can see the different objects users access, the processes taking up the most resources, and the transformations being applied to raw data. These are all data points that then allow you to improve your data governance strategy! 

Data governance not only helps you manage your data warehouse better, but it keeps your data secure. It helps you comply with regulations like HIPAA and GDPR, ensuring PII data is hidden according to these standards. Restricting access to your data helps you prevent security breaches rather than reacting to them after the fact. 

Best Practice #1: Creating roles in Snowflake specific to the tools and users who interact with it.

Snowflake allows you to create different roles to which you can assign certain permissions within the warehouse. Roles make it easier to grant user groups access permissions depending on their role within the company.

I recommend creating a role for each of the different types of users and tools that touch your warehouse. 

Here are a few examples:

  • Analyzer (for data analysts reading data and creating new reporting tables)
  • Reader (for business stakeholders who know how to query basic data models)
  • Loader (for data ingestion tools loading in raw data) 
  • Transformer (for transformation tools reading from your raw data and creating new tables)

Now, when you create a user within Snowflake, you can assign them a role with a specific set of permissions instead of granting individual permissions to each user. This also allows you to keep track of the resources being used by certain types of roles. For example, if you see the loader role spending a large amount of credits, you can then investigate a potential bottleneck in the ingestion part of your data pipeline.
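
For example, one way to attribute usage to roles is Snowflake’s ACCOUNT_USAGE.QUERY_HISTORY view. Here’s a minimal sketch that sums cloud services credits per role over the last 30 days (warehouse compute credits are metered per warehouse rather than per query, so treat this as a directional view of role activity rather than a full cost breakdown):

-- Approximate activity and cloud services credit usage by role, last 30 days.
SELECT
  role_name,
  COUNT(*) AS query_count,
  SUM(credits_used_cloud_services) AS cloud_services_credits
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY role_name
ORDER BY cloud_services_credits DESC;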

How to create users and roles in Snowflake

Let’s create a user and role within Snowflake for a data exploration tool like Gigasheet. For the user, we will simply call it Gigasheet. Since this is primarily a tool for analyzing data, we will create the analyzer role and then assign it to the Gigasheet user. 

To create a role in Snowflake, run the following command:

CREATE ROLE analyzer; 

Grant the role permissions with the following commands:

GRANT USAGE ON DATABASE prod TO ROLE analyzer;
GRANT USAGE ON DATABASE rda TO ROLE analyzer;
GRANT MODIFY ON DATABASE rda TO ROLE analyzer;
GRANT MODIFY ON FUTURE SCHEMAS IN DATABASE rda TO ROLE analyzer;

Keep in mind that these permissions depend on the purpose of the role and what resources it should and shouldn’t have access to. A USAGE grant lets a role access a database, while MODIFY lets it make changes to it. In this case, we are allowing the analyzer role to read from prod and to read from and write to the reporting and data analysis (rda) database. 
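
Note that a database-level USAGE grant by itself doesn’t expose table data; reading also requires grants on the schemas and tables inside the database. A sketch, assuming prod contains a schema named mart (a hypothetical name here):

-- Allow the analyzer role to read existing and future tables in prod.mart.
GRANT USAGE ON SCHEMA prod.mart TO ROLE analyzer;
GRANT SELECT ON ALL TABLES IN SCHEMA prod.mart TO ROLE analyzer;
GRANT SELECT ON FUTURE TABLES IN SCHEMA prod.mart TO ROLE analyzer;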

To create a user and assign it a role within Snowflake, run the following commands (DEFAULT_ROLE alone only sets the default; the role still needs to be granted to the user):

CREATE USER gigasheet DEFAULT_ROLE = analyzer;
GRANT ROLE analyzer TO USER gigasheet;

Now, when this user logs in, it defaults to the analyzer role and inherits the permissions granted to that role! 
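
To verify the setup, you can inspect the grants on both the role and the user:

SHOW GRANTS TO ROLE analyzer;
SHOW GRANTS TO USER gigasheet;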

With users and roles in Snowflake, you always control who and what has access to each piece of data. With tight governance of permissions, you can keep your data secure and its usage easy to measure. 

Best Practice #2: Masking PII data such as social security and credit card numbers.

Data masking is a data warehouse feature that allows you to hide values in certain fields depending on the role of the user viewing your data. It’s a great way to hide PII data such as social security numbers and credit card numbers from people who do not need to be seeing that information. 

Snowflake offers Dynamic Data Masking which allows you to write a masking policy once and apply it to as many fields as needed. This helps to keep data governance centralized, preventing data masking from falling into the hands of object owners. 

Let’s say we have a field in one of our tables that contains credit card numbers. Data analysts and analytics engineers will never need to know a customer’s credit card number. For this reason, you can create a masking policy to apply to this field for all analytics roles within Snowflake. 

However, it may be important for someone from your support team to see a customer’s credit card number in case of returning funds or validating their identity. In this case, you would want to leave the credit card number field visible for the SUPPORTER role in your data warehouse. 

Keep in mind that whoever creates masking policies must have specific permissions to do so. I recommend granting this to ACCOUNTADMIN, or to a role dedicated to setting data governance standards (like one called GOVERNOR) within your data warehouse. 

How to create a Dynamic Data Masking policy in Snowflake

To grant a role the ability to create masking policies in a specific schema, use the following command:

GRANT CREATE MASKING POLICY ON SCHEMA prod.mart TO ROLE governor;

And, if you want to give the role the ability to apply one to any object in the account, use this command:

GRANT APPLY MASKING POLICY ON ACCOUNT TO ROLE governor;

Masking policies can be relatively simple or complex. In my opinion, the simpler the better: if someone doesn’t need to see any of the information, it doesn’t make sense to display even a piece of it. In some scenarios, displaying the last four digits of a credit card number may be necessary, but in most cases (especially for analytics) it won’t be; a sketch of that variant follows the main example below. 

Creating a masking policy works similarly to creating any new object within Snowflake. 

Use the following command but replace <name> with the name of your policy. Choose a name that represents the type of information the policy should be masking. 

CREATE OR REPLACE MASKING POLICY <name> AS (val string) RETURNS string ->

Next, you want to include the SQL logic to apply to the value, or column, you set the masking policy on. 

CREATE OR REPLACE MASKING POLICY pii_masker AS (val string) RETURNS string ->
  CASE
    WHEN CURRENT_ROLE() IN ('SUPPORTER') THEN val
    ELSE '*********'
  END;

With this pii_masker policy, only users assigned the SUPPORTER role can see the value of the field. All other users will simply see the starred version of the value. 
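
And if you do land in one of those scenarios where the last four digits need to stay visible, a variant of the same pattern might look like the sketch below (the star padding and role check are illustrative choices, not a fixed convention):

-- Supporters see the full value; everyone else sees only the last four digits.
CREATE OR REPLACE MASKING POLICY last_four_masker AS (val string) RETURNS string ->
  CASE
    WHEN CURRENT_ROLE() IN ('SUPPORTER') THEN val
    ELSE CONCAT('************', RIGHT(val, 4))
  END;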

After creating this policy, you still need to apply it to the fields you want to be masked. To apply the policy to a field called credit_card_number in the customers table, run the following command:

ALTER TABLE IF EXISTS customers MODIFY COLUMN credit_card_number SET MASKING POLICY pii_masker;

Now, whenever a data analyst or analytics engineer accesses the customers table, they will be able to see the values in all fields except for the credit_card_number field. This also acts as a layer of protection, preventing this information from getting into the hands of an external tool that accesses your data warehouse. 
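
And if a policy ever needs to be removed from a column, say during a schema migration, the reverse operation is available:

ALTER TABLE IF EXISTS customers MODIFY COLUMN credit_card_number UNSET MASKING POLICY;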

Best Practice #3: Tagging objects based on how they are used.

A tag is a schema-level object within Snowflake that can be assigned to another Snowflake object. Tags allow you to closely monitor resources within your warehouse. They are ideal for tracking costs, maintaining compliance, and managing policies. In fact, you can use tags to more easily apply dynamic data masking policies!

How to create object tags in Snowflake

To create a tag within Snowflake, run the following command:

CREATE TAG <name> COMMENT='<comment>';

If you wanted to create a tag to help identify schemas with PII data, you could do so like this:

CREATE TAG pii COMMENT='tag type of PII data';

To alter an already-existing object and set a tag on it, run the following command:

ALTER SCHEMA mart SET TAG pii='social security';

This now tags the mart schema as containing social security PII data. Over time, you can track how many tables contain different types of PII data and how frequently they are accessed.
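
Tags also tie back to Best Practice #2: Snowflake supports attaching a masking policy directly to a tag, so tagged columns get masked automatically. Here’s a sketch, assuming the pii tag and pii_masker policy created earlier, along with an audit query against the ACCOUNT_USAGE.TAG_REFERENCES view:

-- Attach the masking policy to the tag; string columns on tagged objects
-- are then masked according to the policy (tag-based masking).
ALTER TAG pii SET MASKING POLICY pii_masker;

-- Audit where the tag is applied across the account.
SELECT object_name, domain, tag_value
FROM snowflake.account_usage.tag_references
WHERE tag_name = 'PII';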

Choosing a tool that maintains this governance

When you spend so much time upholding data governance within your Snowflake data warehouse, you want to make sure that governance applies all across your data stack. The data governance practices in your data warehouse are only as good as the data governance practices of the tools that sit on top of it. 

This is why it’s so important to look for tools that work with the security measures and tight access controls that you have worked so hard to maintain. Tools that access your warehouse should inherit the users and roles you create, respect the data masking you add to PII data, and work with your object tags. 

Modern data stack tools used by stakeholders especially need to follow these guidelines, as you don’t want them to accidentally overwrite or delete important data. Luckily, tools like Gigasheet respect the work done by analytics engineers in the data warehouse. 

Gigasheet is a data exploration tool with a spreadsheet interface that sits on top of Snowflake, allowing stakeholders to gather insights on company data without risking governance. The tool inherits the data governance rules set on the underlying data warehouse. It is a SOC 2-accredited environment, meaning it has been audited for security, availability, processing integrity, confidentiality, and privacy. In other words, it only strengthens your data governance strategy. 

Even with advanced capabilities like Live Query Mode and write-backs, where stakeholders can perform transformations on fresh data and write their analyses directly back to tables in Snowflake, Gigasheet respects the permissions set in place. Data can be analyzed without any sensitive information ever leaving the data warehouse environment. In addition, stakeholders have free rein to write and save their own transformations, while doing so within the databases and schemas they have access to. 

Conclusion

When setting up Snowflake, be sure to do so with data governance top of mind. Following best practices like creating users and roles, writing dynamic data masking policies, and tagging objects will allow you to maintain a transparent, secure, and safe Snowflake environment for not only analytics engineers but the stakeholders exploring the data sitting on top of it. 

Carrying these practices through all other tools in your data stack, especially your data exploration tools, allows you to create a safe yet friendly environment that stakeholders can use and that analytics engineers can feel confident letting them use. With tools like Gigasheet, which only make data governance easier, you can put data governance first without sacrificing the data exploration experience.
