How To
Jan 12, 2024

How to Group Large Datasets in Four Easy Steps

Analyzing a spreadsheet file with 10-20 rows is very straightforward.

But what if you were asked to analyze one with over 1,000 rows? Or 100,000? Maybe a million? Or a billion?

To make sense of large spreadsheet files, it’s a good practice to structure them into smaller units. By breaking down these spreadsheet files into groups and sub-groups, you can identify patterns and generate insights otherwise not apparent to the naked eye. 

The best way to structure these spreadsheet files is using the Grouping feature in Gigasheet, the big data spreadsheet.

In this guide, we’ll show you how to group spreadsheet files in Gigasheet to manage your data more efficiently. 

Let's dive in!

How to group large datasets in 4 easy steps  

Creating groups and sub-groups in Gigasheet is a simple process. 

Here’s how to get started:

Step 1: Upload or import your spreadsheet 

Log in to your Gigasheet account and click ‘New’ to start working on your data set. 

Or, if you’re working with multiple files, create a folder, and then upload the file.

Upload a spreadsheet file to Gigasheet to group large datasets

You can use a locally stored file, directly import one from cloud storage, or use a link. 

Ours is a .xlsx file, but .csv, .json, .zip, and other commonly used file formats are supported too.

You can see the file in your library once it has been uploaded. 

The spreadsheet we’re going to use has over 4 million rows. 

eCommerce large spreadsheet file

This would have maxed out constraints on programs like Excel (1 million rows) and Google Sheets (10 million cells).

But that’s not a problem with Gigasheet, which supports files containing up to 1 billion rows.

Step 2: Clean your data

After importing your data set to your Gigasheet account, run a few basic hygiene checks to prepare it for grouping. 

This involves:

  • checking for inconsistencies (e.g. numeric data being shown as non-numeric and vice versa),
  • locating and filling missing values in the data, and 
  • ensuring rows and columns are aptly titled.

For example, in our spreadsheet containing eCommerce sales data, unit prices were being reflected as plain text instead of numeric data.

To fix an error like this, click on the column heading and select the data type. 

Gigasheet change data type

And change the data type to an appropriate format.

Gigasheet also provides a Data Cleanup feature that fixes errors such as:

  • highlighting duplicate data, 
  • standardizing organization names, 
  • converting UNIX time, and 
  • decoding Base64 data to plain text

in one click.

Gigasheet data cleanup

Step 3: Group data using a particular attribute

Once you’re certain your data is clean, you can start the grouping process. 

First, decide which attribute you want to base your groups on. 

In our eCommerce data set, we have several options. 

We can sort the sales data based on country, customer ID, or customer cohort.

In our case, we want to get an idea of country-wise sales, so we’ll be selecting the ‘Country’ column.

Just click the “Group” option in the header: 

Grouping in extensive spreadsheets like this one gives you a bird’s eye view of the data.

Where earlier we had rows upon rows to scroll through, the grouping action has facilitated first-glance understanding by condensing everything into 6 rows.

Data grouped by country using Gigasheet's grouping feature

Step 4: Create groups within groups (nesting) 

You can further sort this data in Gigasheet by creating sub-groups, i.e., groups within groups. 

To process our data further, we decided to nest a sub-group according to the customer cohort within the main country groups.

The process to be followed is the same as for creating groups.

Just click the “Group” header in the header and select “Add Column.” Next, select the attribute for the sub-group from the drop-down menu.

You’ll see the sub-group attribute appear next to the main group attribute. In our case, these are the customer cohort and the country respectively.

Create groups within groups

When you click on each row in the main group, u’ll see the data sorted according to the sub-group attribute. 

If you want this data to be sorted differently, say, the customer cohorts first, and then countries as a sub-group, just toggle between the two column names, and Gigasheet will reverse the group hierarchy.

Interchanging the arrangement of grouping

Just don’t forget to make a copy of the sheet before hitting toggle if you want to preserve both views.

You’ll now be able to see customer cohorts as the main group, and on clicking on the drop-down button in each row, the country data for each. 

You can then perform operations like SUM, Min, Max, and more to extract insights out of your arranged data.

For example, after structuring our data here, we wanted to gain insights into the total quantity and unit price for each customer cohort for different countries.

So, we just applied the “SUM” function under “Quantity” and “Unit Price” columns.

You can also use the Sheet Assistant for grouping large datasets

If you're looking for a good shortcut (who isn't?), Gigasheet has an AI Sheet Assistant feature to which you can delegate data engineering tasks like grouping. 

You can access it by clicking on this button on the panel to your right.

Gigasheet Sheet Assistant to group large datasets

Type in your grouping command. "Show me the count of rows grouped by country." Or something like this.

Gigasheet Sheet Assistant Grouping Prompt

And let the Sheet Assistant work its magic.

You’ll see the results are the same as the ones we got from our manual grouping task.

We took things a step further and saw if the Sheet Assistant could also make the customer cohort sub-group we created earlier.

Gigasheet Sheet Assistant Creating Sub-Group

Here are the results:

The Sheet Assistant explains the steps it has followed to pivot the data, so you can easily identify and fix discrepancies. 

With a clean data set and clear commands, you’ll easily be able to harness the power of AI to streamline data processing.

Group your data with Gigasheet!

When it comes to big data processing and analysis, Gigasheet offers unmatched versatility and efficiency. 

You get to enjoy large-scale data support beyond anything offered by traditional software, at the speed of cloud solutions, and backed by the intuitiveness of AI.

We've shown you just how easy Gigasheet makes it to perform data analysis tasks as integral as grouping.

You can process large files, clean up data, create groups and sub-groups, and assign tasks to a nifty little AI assistant without juggling the risks of system crashes and memory shortages.

Want to try it for yourself? Sign up for a free account here.

The ease of a spreadsheet with the power of a database, at cloud scale.

No Code
No Database
No Training
Sign Up, Free

Similar posts

By using this website, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.