Look around: we’re living in the big data realm. There’s so much data to analyze and derive insights from that small-scale companies can rarely excel at it. This is where Gigasheet comes in.
We aim to help democratize big data so that everyone has a fair shot at making business decisions, identifying trends, and uncovering insights from the vast amounts of data their digital systems produce. So, how exactly do we do that? Keep reading!
If you’re just getting started with Data Warehousing, you might be wondering: what’s ETL?
Short for "Extract, Transform, & Load", ETL is a process by which we extract data from a system, transform it based on our needs, and then load it into another system (possibly a Data Warehouse). It is, by far, the most commonly used process for ingesting data into large data warehouses for big data processing.
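To make the three phases concrete, here’s a minimal sketch in Python with pandas. The file names, column names, and cleanup steps are illustrative assumptions, not anything specific to Gigasheet:

```python
import pandas as pd

# Extract: pull raw records out of a source system
# (here, a hypothetical JSON export from a point-of-sale system)
raw = pd.read_json("pos_export.json")

# Transform: clean and reshape the data to fit the warehouse schema
clean = (
    raw.rename(columns=str.upper)   # normalize column names
       .drop_duplicates()           # remove duplicate records
       .dropna(how="all")           # drop completely empty rows
)

# Load: hand the cleaned table to the destination system
# (here, simply a CSV file a warehouse could ingest)
clean.to_csv("pos_clean.csv", index=False)
```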
Gigasheet excels at transform operations which may include:
You might be thinking: why can’t we just use good ol’ Excel for this?
Excel was meant to be a simple spreadsheet processor, assisting people with traditional data operations. That’s a thing of the past.
It can’t process large files and mostly fails when you try. Not to mention, it doesn’t work with JSON data out of the box and requires considerable effort to make that conversion happen.
Gigasheet has an automated parser to convert your JSON data into a tidy spreadsheet for quick transform (ETL) operations. I’ll demonstrate that in a bit.
To briefly touch on what a Data Warehouse is: it’s a repository or system that holds data from all of your digital systems, acting as a central source of cleansed and processed information that can later be used to make business decisions, generate insights, and more.
For example, as a retail store, a few source systems for the data would be:
ETL operations are used to process data into information, making it ready for consumption by a warehouse. Next, we’ll discuss how you can utilize Gigasheet for these operations.
Let’s break down ETL into its three phases:
Extraction exports data from a source system or location to another system so that transformation operations can be performed on it. The source could be any internal system, and depending on its exporting capabilities, you can dump that information out in a variety of formats.
Gigasheet supports 100+ data connectors and readily integrates data from these solutions into our data processor. Some examples include:
Don’t want to use integrations and would rather import flat files into Gigasheet for processing? You can use these file formats to do so:
For a quick test run, you can use a dataset from Gigasheet’s newly launched Data Community. Just open a dataset, press Open in Gigasheet, and copy the sheet over to your account to get started with transforming the data.
I’ve got a sample JSON file containing data related to Dunkin’ stores in the United States which I’ll be uploading to Gigasheet.
All I’ve done is use the File Upload functionality to upload that JSON file, and that’s it: the parser did the rest, converting the data automatically. JSON data can be quite variable, and it can quickly get out of hand if it’s not parsed and managed properly. If you were to manually convert the data from JSON into a spreadsheet-like format, it would require a lot of effort!
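For a sense of what that manual effort involves, here’s a rough pandas sketch of flattening nested JSON yourself; the file name and structure are assumptions for illustration, and Gigasheet’s parser does the equivalent for you on upload:

```python
import json
import pandas as pd

# Read the raw JSON export (hypothetical file name)
with open("dunkin_stores.json") as f:
    records = json.load(f)

# Flatten nested objects into dotted column names,
# e.g. {"address": {"city": ...}} becomes an "address.city" column
flat = pd.json_normalize(records)

print(flat.shape)             # rows x columns after flattening
print(flat.columns.tolist())  # the columns the parser would have to produce
```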
Using Gigasheet, we went from this:
To this spreadsheet with nicely categorized data in a matter of minutes:
Now that we’ve got the extraction part covered, let’s take a look at Transformation operations in Gigasheet.
First up, I don’t want to export all of this data to my warehouse. I’d rather select a few columns to be loaded. For that, I can choose one of two options:
If you choose the hide column feature, the column won’t be included in the final export (which we’ll cover later). Removing a column, however, deletes it from the sheet entirely, and it won’t be recoverable.
To hide a column, open the Columns menu on the right side of the screen and press the blue tick-mark to hide it from your exportable data.
To remove a column, select the column using the hamburger icon and press Delete to remove it from your dataset permanently.
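For comparison, if you were scripting this step, keeping or dropping columns in pandas looks roughly like the following; the column names (other than FRI_HOURS, which is in our sample data) are hypothetical stand-ins:

```python
import pandas as pd

flat = pd.read_csv("dunkin_stores_flat.csv")  # hypothetical flattened export

# Option 1: keep only the columns you want in the final export
subset = flat[["ADDRESS", "CITY", "STATE", "FRI_HOURS"]]

# Option 2: drop the columns you don't need (ignore any that are missing)
subset = flat.drop(columns=["PHONE_NUMBER", "WEBSITE"], errors="ignore")
```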
Perhaps you’d like to split a few columns to better categorize your data? Take a look at the FRI_HOURS column. How about we split it on the ‘-’ separator into two columns: opening and closing times on Friday? That would make it easier to compare against the hours on other days.
Select the column, press Apply Function, and select Split column. Using the hyphen as the separator, this is how our data looks:
We can now hide the original column from our dataset since it’s been split into two.
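In pandas, the equivalent split-and-hide step might look like the sketch below, assuming the FRI_HOURS values follow an "open - close" pattern and using a hypothetical file name for the flattened sheet:

```python
import pandas as pd

stores = pd.read_csv("dunkin_stores_flat.csv")  # hypothetical flattened export

# Split FRI_HOURS on the hyphen into Friday opening and closing times
stores[["FRI_OPEN", "FRI_CLOSE"]] = (
    stores["FRI_HOURS"].str.split("-", n=1, expand=True)
)

# Drop the original column now that it has been split into two
stores = stores.drop(columns=["FRI_HOURS"])
```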
Well, this is good, but I see the closing time for some stores is empty since they’re open 24 hours. How about we apply a filter to remove such stores from our dataset? Simply select the row and choose the Filter to exclude option from the menu.
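Scripted, that filter is essentially a one-liner: keep only the rows where a Friday closing time is present. The file and column names below continue the hypothetical example from the split sketch above:

```python
import pandas as pd

stores = pd.read_csv("dunkin_stores_split.csv")  # hypothetical sheet after the split

# Exclude stores whose Friday closing time is blank (they're open 24 hours)
has_close = stores["FRI_CLOSE"].fillna("").str.strip() != ""
stores = stores[has_close]
```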
There’s a lot more that we can do depending on the data we have. For example, you might want to:
Done transforming your data? It’s time to export it out of Gigasheet and load it into a tool or system of your choice. To export the data, simply press the Export button in the navigation bar at the top.
Once the export finishes, a prompt will show up saying Find your export in the Library. Head over to the Library or My Files page and you should see your data exported as a ZIP archive (containing a CSV of your data), ready to be downloaded. This might take some time depending on the size of your data.
Click on it and you should be able to download it. That’s it: it’s this easy to convert a JSON dataset into CSV using Gigasheet while having the freedom to apply transformations of your choice!
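If you wanted to reproduce that last packaging step yourself, it amounts to zipping up a CSV; the file names below are hypothetical:

```python
import zipfile

# Package the exported CSV as a ZIP archive (hypothetical file names),
# mirroring the "ZIP containing a CSV" you download from the Library
with zipfile.ZipFile("dunkin_stores_export.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("dunkin_stores_export.csv")
```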
We’re simplifying big data processing, data warehousing, and data science operations by removing complexities from the entire process. No-code data science is the future and we’re happy to be working on this front.
Want to be a step closer to the team behind all this greatness? Join our Slack server to get in touch with the team and with people from different markets using Gigasheet for so many cool things!