Fostering a culture of self-service analytics has always been one of the greatest challenges for data teams. Data teams are busy, and building a scalable, accurate data environment is hard work that requires time they often don’t have. They want to keep data quality high, ensuring metrics are calculated correctly and the right data sources are used. That focus, however, leads to slow processes that leave stakeholders feeling bottlenecked by the data team.
This is why self-service analytics is so important. It lets data teams build deliberately while empowering stakeholders to answer their own business questions, all without compromising data quality.
Self-service analytics puts business stakeholders in the driver’s seat. They no longer need to wait on the data team to deliver what they need to move their projects forward. Instead of waiting for an analytics engineer or data analyst to write a query or build a dashboard for them, they can find the answer themselves.
Self-service analytics lets stakeholders bypass the data team and get their hands dirty with the data available to them. This is done through easy-to-use tools stakeholders feel comfortable with, clean datasets and data models, and a governed data warehouse environment. It’s important to note that this is NOT a free-for-all: making the right tools and resources available is vital to self-service analytics.
When you try to create a self-service analytics culture without first thinking about the tools and resources available, you are setting yourself up for failure. Remember: insights are only as valuable as the quality of the data they’re built on!
It’s important to think through the tooling stakeholders need to be successful, and to teach them how to use it if necessary. You also need to make sure the data available for them to explore is accurate and clean.
With self-service analytics, implementing the right guardrails and data governance in tools like Gigasheet and Snowflake is critical. If this isn’t managed from the start, your data environment can grow messy and unreliable from too many cooks in the kitchen!
Data quality becomes even more important when stakeholders are going to be the ones directly accessing your data. Data teams need to provide them with datasets to help answer all of their questions. Not to mention, these datasets need to be easy to understand and well-documented so that they can figure out what different fields mean and how to use them.
When thinking about the datasets to build for self-service analytics, think about the different dimensions that stakeholders will be interested in. What timestamp fields do they need to analyze how data changes over time? What categorical fields will they want to slice and dice the data by? Anticipating these before making data available to them will save you work in the long run, allowing you to properly build your datasets from the start.
For example, let’s say we were building a dataset that contained all the information on orders placed on a website. Marketers would care about the referrer that brought the customer to the website or how many previous purchases the customer has made. The finance team would care about order value, discounts, tax, and customer acquisition cost. Product teams would want to know what the customer bought, how much, and in what colors. It’s important to anticipate the data that each of these stakeholders cares most about.
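To make that concrete, here is a minimal sketch of what such an orders dataset could look like as a single wide view in Snowflake. The source tables and column names (raw.shop.orders, raw.shop.customers, raw.web.sessions, and so on) are hypothetical placeholders, not a prescribed schema:

```sql
-- Hypothetical wide "orders" mart combining the fields each team cares about.
-- Source object names are placeholders for your own models.
CREATE OR REPLACE VIEW analytics.marts.fct_orders AS
SELECT
    o.order_id,
    o.ordered_at,                 -- timestamp for analyzing change over time
    -- Marketing dimensions
    s.referrer,                   -- channel that brought the customer to the site
    c.lifetime_order_count,       -- previous purchases by this customer
    -- Finance measures
    o.order_total,
    o.discount_amount,
    o.tax_amount,
    c.acquisition_cost,
    -- Product dimensions
    o.product_name,
    o.quantity,
    o.color
FROM raw.shop.orders o
LEFT JOIN raw.shop.customers c ON o.customer_id = c.customer_id
LEFT JOIN raw.web.sessions   s ON o.session_id  = s.session_id;
```

A single governed view like this lets each team slice by the dimensions they care about without writing their own joins against raw tables.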
As you build out this data, documentation and testing become even more important as you want to ensure stakeholders can have their questions answered accurately. The data team needs to have tests in place to detect any data quality issues as soon as they happen, ensuring stakeholders are alerted. Using a data transformation tool that combines docs and tests, like dbt, can make it easy to stay on top of data quality and ensure stakeholders have a first-hand look into the quality status.
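As one small example of what those tests can look like, a dbt “singular” test is simply a SQL file saved in the project’s tests/ directory that selects rows violating an assumption; if any rows come back, the test fails and the team is alerted. The model and column names below are hypothetical and assume the orders mart above is managed as a dbt model (column descriptions would live alongside generic tests like not_null and unique in the model’s YAML):

```sql
-- tests/assert_no_negative_order_totals.sql
-- Fails if any order in the mart has a negative total,
-- so stakeholders never build reports on obviously broken figures.
SELECT
    order_id,
    order_total
FROM {{ ref('fct_orders') }}
WHERE order_total < 0
```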
With stakeholders working closely with data, more things can go wrong. You no longer have only experts handling the data; you’re opening your pipeline up to people with less technical experience. As the data person, your job is to make sure things can’t go wrong, even when less technical folks are poking around in the data. The best way to ensure this is to anticipate it.
First, start by giving stakeholders the minimum permissions they need to do their job. Give them only READ access to pre-approved datasets in a data warehouse like Snowflake. Don’t give them WRITE access to tables and views in your warehouse unless you’ve created a specific schema for them to save their experimental queries. Make sure you restrict all access to raw data; it hasn’t been cleaned, so it’s easy for stakeholders to misuse.
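A minimal Snowflake sketch of that least-privilege setup might look like the following. The role, database, and schema names are placeholders for your own environment:

```sql
-- Hypothetical read-only role for business stakeholders.
CREATE ROLE IF NOT EXISTS stakeholder_read;

-- Read access to the pre-approved, production-ready schema only.
GRANT USAGE  ON DATABASE analytics                      TO ROLE stakeholder_read;
GRANT USAGE  ON SCHEMA   analytics.marts                TO ROLE stakeholder_read;
GRANT SELECT ON ALL TABLES    IN SCHEMA analytics.marts TO ROLE stakeholder_read;
GRANT SELECT ON ALL VIEWS     IN SCHEMA analytics.marts TO ROLE stakeholder_read;
GRANT SELECT ON FUTURE TABLES IN SCHEMA analytics.marts TO ROLE stakeholder_read;

-- Optional sandbox schema where stakeholders can save experimental work.
CREATE SCHEMA IF NOT EXISTS analytics.sandbox;
GRANT USAGE, CREATE TABLE, CREATE VIEW ON SCHEMA analytics.sandbox TO ROLE stakeholder_read;

-- Note: no grants at all on the raw database, so uncleaned data stays off-limits.
```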
If you’re using a data warehouse like Snowflake, you may want to tag your tables and views to show what is ready to be used by the business and what is still in development. For example, raw data could be tagged development to indicate that it is not to be used by stakeholders. Datasets created by the data team but not yet validated could be tagged stg_prod. Datasets ready to be used by the business could carry a production label.
Snowflake makes it easy to tag your tables and views and to pair those tags with the roles and permissions you grant. Combined with role-based access, you can make sure the role you give your stakeholders only reaches objects tagged production. This makes it much easier to manage in case something slips off your radar. Read more about data governance best practices with Snowflake.
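Here is a hedged sketch of that tagging approach in Snowflake. The tag and object names are hypothetical, and in practice the tags work alongside the role-based grants shown earlier; tags label and help you audit objects rather than granting access by themselves:

```sql
-- Hypothetical readiness tag with the three labels described above.
CREATE TAG IF NOT EXISTS governance.tags.readiness
    ALLOWED_VALUES 'development', 'stg_prod', 'production';

-- Label objects according to how ready they are for the business.
ALTER TABLE raw.shop.orders            SET TAG governance.tags.readiness = 'development';
ALTER TABLE analytics.staging.orders   SET TAG governance.tags.readiness = 'stg_prod';
ALTER VIEW  analytics.marts.fct_orders SET TAG governance.tags.readiness = 'production';

-- Audit anything stakeholders could reach that isn't production-ready
-- (ACCOUNT_USAGE views lag by up to a couple of hours).
SELECT object_database, object_schema, object_name, tag_value
FROM snowflake.account_usage.tag_references
WHERE tag_name = 'READINESS'
  AND tag_value <> 'production';
```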
If a stakeholder doesn’t know how to use the self-service analytics tool you give them, problems are guaranteed to arise. They are more likely to perform incorrect joins, aggregate the wrong fields, and create long-running queries. You need to meet stakeholders where they are.
If a stakeholder doesn’t know SQL, don’t expect them to learn SQL. Provide a tool that has an interface they are comfortable with. Because most stakeholders are trained in Excel, and this is where they most likely perform their current data analysis, it makes sense to use a tool that mimics the functionality of a spreadsheet.
Gigasheet is a tool with a spreadsheet-like interface that connects directly to data warehouses like Snowflake and handles large volumes of data. Unlike Excel, stakeholders can quickly process large amounts of data without any impact on performance. This allows them to get the data they need, in the way they know how to get it, but much faster and more efficiently.
Because of its spreadsheet-like interface, Gigasheet is a tool that stakeholders quickly feel comfortable with, while also giving them access to advanced data features they otherwise wouldn’t have. Any data team member will also be happy to learn that Gigasheet is secure and inherits the data governance strategy of your data warehouse. This means the data team doesn’t need to worry about access control or data getting into the wrong hands.
Another beneficial feature of Gigasheet is the ability for stakeholders to write back data to Snowflake if they need to save a particular dataset to a sandbox schema. This allows them to store the reports that they frequently access within Snowflake for convenient use.
Self-service analytics can make or break how data is used within an organization. There are instances where I’ve seen it implemented very poorly, ending in misguided business decisions. However, there are also times when it frees up work from the data team, allowing them to focus on more important initiatives for the business, while empowering stakeholders to move fast in their strategic choices.
The difference between these two scenarios is whether the best practices I’ve outlined are followed. To successfully implement self-service analytics, you need to create usable datasets, properly govern your data, and choose a tool that stakeholders understand without the data team needing to train them. Tools like Gigasheet and Snowflake have made data governance and stakeholder workflows easier and more secure than ever. Together, these practices and tools reduce the likelihood of poor-quality data driving decisions.