Analyzing a spreadsheet file with 10-20 rows is very straightforward.
But what if you were asked to analyze one with over 1,000 rows? Or 100,000? Maybe a million? Or a billion?
To make sense of large spreadsheet files, it’s a good practice to structure them into smaller units. By breaking down these spreadsheet files into groups and sub-groups, you can identify patterns and generate insights otherwise not apparent to the naked eye.
The best way to structure these spreadsheet files is using the Grouping feature in Gigasheet, the big data spreadsheet.
In this guide, we’ll show you how to group spreadsheet files in Gigasheet to manage your data more efficiently.
Let's dive in!
Creating groups and sub-groups in Gigasheet is a simple process.
Here’s how to get started:
Log in to your Gigasheet account and click ‘New’ to start working on your data set.
Or, if you’re working with multiple files, create a folder, and then upload the file.
You can use a locally stored file, directly import one from cloud storage, or use a link.
Ours is a .xlsx file, but .csv, .json, .zip, and other commonly used file formats are supported too.
You can see the file in your library once it has been uploaded.
The spreadsheet we’re going to use has over 4 million rows.
This would have maxed out constraints on programs like Excel (1 million rows) and Google Sheets (10 million cells).
But that’s not a problem with Gigasheet, which supports files containing up to 1 billion rows.
After importing your data set to your Gigasheet account, run a few basic hygiene checks to prepare it for grouping.
This involves:
For example, in our spreadsheet containing eCommerce sales data, unit prices were being reflected as plain text instead of numeric data.
To fix an error like this, click on the column heading and select the data type.
And change the data type to an appropriate format.
Gigasheet also provides a Data Cleanup feature that fixes errors such as:
in one click.
Once you’re certain your data is clean, you can start the grouping process.
First, decide which attribute you want to base your groups on.
In our eCommerce data set, we have several options.
We can sort the sales data based on country, customer ID, or customer cohort.
In our case, we want to get an idea of country-wise sales, so we’ll be selecting the ‘Country’ column.
Just click the “Group” option in the header:
Grouping in extensive spreadsheets like this one gives you a bird’s eye view of the data.
Where earlier we had rows upon rows to scroll through, the grouping action has facilitated first-glance understanding by condensing everything into 6 rows.
You can further sort this data in Gigasheet by creating sub-groups, i.e., groups within groups.
To process our data further, we decided to nest a sub-group according to the customer cohort within the main country groups.
The process to be followed is the same as for creating groups.
Just click the “Group” header in the header and select “Add Column.” Next, select the attribute for the sub-group from the drop-down menu.
You’ll see the sub-group attribute appear next to the main group attribute. In our case, these are the customer cohort and the country respectively.
When you click on each row in the main group, u’ll see the data sorted according to the sub-group attribute.
If you want this data to be sorted differently, say, the customer cohorts first, and then countries as a sub-group, just toggle between the two column names, and Gigasheet will reverse the group hierarchy.
Just don’t forget to make a copy of the sheet before hitting toggle if you want to preserve both views.
You’ll now be able to see customer cohorts as the main group, and on clicking on the drop-down button in each row, the country data for each.
You can then perform operations like SUM, Min, Max, and more to extract insights out of your arranged data.
For example, after structuring our data here, we wanted to gain insights into the total quantity and unit price for each customer cohort for different countries.
So, we just applied the “SUM” function under “Quantity” and “Unit Price” columns.
If you're looking for a good shortcut (who isn't?), Gigasheet has an AI Sheet Assistant feature to which you can delegate data engineering tasks like grouping.
You can access it by clicking on this button on the panel to your right.
Type in your grouping command. "Show me the count of rows grouped by country." Or something like this.
And let the Sheet Assistant work its magic.
You’ll see the results are the same as the ones we got from our manual grouping task.
We took things a step further and saw if the Sheet Assistant could also make the customer cohort sub-group we created earlier.
Here are the results:
The Sheet Assistant explains the steps it has followed to pivot the data, so you can easily identify and fix discrepancies.
With a clean data set and clear commands, you’ll easily be able to harness the power of AI to streamline data processing.
When it comes to big data processing and analysis, Gigasheet offers unmatched versatility and efficiency.
You get to enjoy large-scale data support beyond anything offered by traditional software, at the speed of cloud solutions, and backed by the intuitiveness of AI.
We've shown you just how easy Gigasheet makes it to perform data analysis tasks as integral as grouping.
You can process large files, clean up data, create groups and sub-groups, and assign tasks to a nifty little AI assistant without juggling the risks of system crashes and memory shortages.
Want to try it for yourself? Sign up for a free account here.