Big Data
Sep 10, 2024

Best Practices for Fostering Data Collaboration

As data people, we want to spend time working through problems, coding solutions, and providing the highest quality data possible. However, constant back-and-forth communication with stakeholders takes up all our time, leaving none for the work that matters. To do our best work, we need to find ways to collaborate with our business stakeholders in a way that empowers them while giving us the clarity we need to do our jobs well. 

Too many times, I’ve received requests from stakeholders asking me to provide a dataset of active accounts or an account’s total number of sessions. While these sound like simple requests, they quickly snowball. Because we never started with the problem and instead jumped right to the ask, neither of us got the information needed to come to a solution.

This created a constant cycle of clarifying questions. I kept pulling new data and calculating different versions of the same metric while the stakeholders kept changing their request based on what they received. Ultimately, they would have been better off investigating the data themselves, starting from the problem they were trying to solve. 

In this article, I will share the best practices you need to implement into your workflow so that you don’t face the same problems I did. With some work upfront, you can save yourself and your stakeholders many hours of frustration, leaving time to solve the highly technical problems that only you can solve.  

Best Practice #1: Create a system for submitting and prioritizing data requests.

The data team sits at the center of the organization, serving many different teams, which means requests come at us from everyone. How many times have you been asked, “Can you quickly add this field?” or “Do you mind pulling this data for me real quick?” We know all too well that these asks are never truly quick. 

While it sounds counterintuitive, it’s important to say no and force stakeholders to take a step back. If we don’t say no to “small asks” like this, we will never have time to focus on the work that moves the needle, like building complex data models or adding tests to our data sources. 

Before taking on any ad-hoc task, you need to:

  • Ask what problem the stakeholder is trying to solve 
  • Collect as many details as possible upfront, before attempting a solution

Asking a stakeholder to define the problem and how it affects the business is sometimes enough to allow them to come to a solution without the help of the data team. 

Gauging details like the request’s due date, its urgency, and the platforms involved also provides a lot of clarity on its importance, giving you the space to allocate time to it properly. 

I recommend creating an intake system that guides stakeholders through their requests, helping groom them before they land directly in your lap. A template with pre-set questions that they must answer will allow you to collect most of the information you need without facilitating a back-and-forth conversation over Slack. 

Some examples of questions to add to your intake form include:

  • What problem are you trying to solve?
  • Are there any projects that are dependent on this?
  • What outcome are you looking for? (dashboard, analysis, data pull, new field, bug report, etc.)
  • How do you plan on using this?
  • Does this affect any other teams or stakeholders?
  • Is there anything else we should know?

Data request intake system

Every time a stakeholder asks you for something, direct them to a form with these questions, letting them know that this will be reviewed before the upcoming sprint. 
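
The exact tooling matters less than having a single front door for requests. As one illustration, if your team happens to track work in GitHub, the questions above could be encoded as an issue form; the field names and options below are hypothetical, and the same idea translates to a Notion, Jira, or Asana template.

```yaml
# Hypothetical data-request intake form (GitHub issue form syntax).
# Adapt the questions, labels, and options to your own team and tooling.
name: Data request
description: Submit a request for the data team to review before the next sprint
labels: ["data-request"]
body:
  - type: textarea
    id: problem
    attributes:
      label: What problem are you trying to solve?
    validations:
      required: true
  - type: dropdown
    id: outcome
    attributes:
      label: What outcome are you looking for?
      options:
        - Dashboard
        - Analysis
        - Data pull
        - New field
        - Bug report
    validations:
      required: true
  - type: input
    id: due-date
    attributes:
      label: When do you need this by?
  - type: textarea
    id: other
    attributes:
      label: Is there anything else we should know?
```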

That said, many data teams work in month-long sprints. If this is you, I highly recommend challenging the idea. As an analytics engineer, I found month-long sprints overwhelming: the period was too long to work only on planned tasks, so I ended up squeezing in “urgent” requests so that stakeholders weren’t waiting an entire month. That left no space for proper planning. 

Switching to two-week sprints has allowed me to say no more often and to wait until the end of the sprint to take on new requests. This cadence gives problems time to work themselves out when they can, while still letting you address them promptly when they don’t. 

Since switching our team’s sprint length, I’ve had “urgent” requests land in my lap that turned out to be not so urgent by the time I could scope them. This has saved me a lot of mental strain over things that aren’t as big a deal as they seem in the moment. 

Best Practice #2: Write thorough documentation of data models and teach stakeholders how to use it. 

I’m a huge believer in giving business stakeholders the information they need to be successful in data collaboration. This means you need to take time to document data in your data models, dashboards, and data platforms. You can’t expect stakeholders to understand your code or assume the definition behind a certain metric. Stakeholders should be able to look up exactly what a metric or field means, in the context that matters to them, without needing to message someone from the data team.

When stakeholders can find data definitions on their own, you have well-written documentation. 

During my first few months as an analytics engineer working closely with reverse ETL tools, stakeholders kept asking me, “Is this field being synced?” and “What does this field mean?” Once I realized how much time I spent answering these questions, I created a Notion database that lists each property, its business definition, its calculation, how often it updates, and when it was last synced.

This documentation empowers stakeholders to seek the answer themselves. They can get the information they need in a way that's predictable and not reliant on me being available to answer their question. 

It’s easy to assume that stakeholders have enough information available to them about a field since they were the ones who requested it. However, it’s really important to document all aspects of a data point to minimize their dependency on you. 

When doing this, make sure you are using a platform that stakeholders are familiar with and enjoy using. Preferably, it’s something already embedded in their daily workflows. 

While I always document data in dbt, stakeholders don’t know how to use dbt, and they definitely don’t use it every day. So rather than pointing them to the documentation there, I also document fields in Notion, the company’s documentation tool of choice. 
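
For reference, here’s a rough sketch of what that dbt-side documentation can look like in a schema.yml file; the model and column names are made up, but the structure is standard dbt.

```yaml
# Hypothetical schema.yml entry; model and column names are illustrative.
version: 2

models:
  - name: fct_account_sessions
    description: One row per account per day, with session activity rolled up.
    columns:
      - name: account_id
        description: Unique identifier for the account, as it appears in the CRM.
      - name: total_sessions
        description: Count of distinct sessions for the account on that day.
      - name: is_active
        description: True when the account had at least one session in the last 30 days.
```

From there, the stakeholder-facing Notion page can mirror these definitions, adding the business context, update frequency, and sync status that stakeholders actually look for.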

Documentation in itself is a great tool to have, but it also enables the next best practice. Without it, adding the right tools to your tech stack would be a whole lot harder. 

Best Practice #3: Find a self-service tool that allows business stakeholders to explore data in the way they need, while still governing your data. 

While stakeholders would love access to all of our data, any data person knows this is a bad idea. Not only can data be wrongly deleted or updated, but the wrong data can then be used to make strategic decisions. Business stakeholders often lack the context behind the data and an understanding of data quality.

For this reason, giving stakeholders free rein over your data can be daunting; it’s a fine balance between empowering them and ensuring your data stays safe and secure. 

Rather than giving stakeholders access to raw data, it’s important to give them access to data models written by analytics engineers. This way, edge cases have been handled and quirks in the data have already been worked out. In an ideal world, data models produce data clean enough to be queried by any end user. 

Because data models live within the data warehouse, it’s important to find a tool for business stakeholders that integrates directly with your warehouse. It’s also essential that the tool doesn’t require full warehouse access but instead lets you tightly control what each user can see. That way, stakeholders can reach the data they need and nothing you aren’t comfortable exposing. 
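
How you scope that access depends on your warehouse, but the idea is the same everywhere: expose the modeled schemas to a read-only role and nothing else. As a sketch, dbt’s grants config (dbt 1.2+) can handle this for the models themselves; the role and model names here are hypothetical.

```yaml
# Hypothetical grants config in a dbt model's schema.yml (dbt 1.2+).
# Only the read-only reporting role can select from the modeled table;
# raw and source schemas are never granted to it.
models:
  - name: fct_account_sessions
    config:
      grants:
        select: ['reporting_read_only']
```

The same principle applies to whatever tool sits on top of the warehouse: connect it with that limited role rather than an admin account.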

As an analytics engineer, I work with all types of stakeholders, from sales to marketing to product, most of whom have no experience using SQL. They need a tool that lets them explore the data in ways they’re comfortable with. In situations like these, you need to meet stakeholders where they are and find a tool they don’t have to relearn. Ultimately, that is what will make them comfortable seeking out answers themselves rather than relying on you. 

Gigasheet fosters data collaboration by empowering stakeholders to answer their own data questions in a way they are comfortable with, without sacrificing data governance or performance on high volumes of data.

Gigasheet has a spreadsheet interface similar to Excel and Google Sheets, allowing stakeholders to use the functions and features they know and love. It lives on top of your data warehouse but follows data governance best practices, allowing you to control the schemas and tables stakeholders have access to. Rather than working with a subset of data, stakeholders can access ALL the data (yes, even millions of rows) they need without sacrificing performance. 

Collaborating on big data in Gigasheet: 3M rows of invoice data

It’s rare to find a data collaboration tool that checks all of the boxes, so when you do, it’s important to leverage it! Try it for free today.

Conclusion 

Data collaboration requires the right systems and tools in place to be successful. While it will never be perfect, it is something that you can constantly iterate on to learn what is working and what is not. 

At the heart of data collaboration is self-empowerment. Data teams need to give stakeholders the tools to feel confident seeking out answers themselves: documentation that helps them find exactly what they are looking for, and tools like Gigasheet that let them do it in a way they already know. 

When you document things the right way and choose the right tools, you don’t need to sacrifice analytics best practices. You can have your cake and eat it too! Analytics engineers can be proud to introduce a tool like Gigasheet into the tech stack, knowing their data is secure and tightly managed. No accidental deletions around here!

When you create a data intake form, document your data models, and use the best self-service tools, you can expect:

  • Fewer ad-hoc data questions and requests for data pulls 
  • More thoughtful data questions from stakeholders 
  • Increased use of high-quality data 
  • A better understanding of the data 

And who doesn’t want that?
