Gigasheet is constantly improving our product with the goal of making analyzing Big Data easy and accessible to everyone. Recently, you may have noticed an update to the Menu Bar, which makes it easier to find file operations, functions, and enrichments. This article focuses on the first option and the power of "Save As."
Big Data means big files, but sometimes only a few rows and columns are relevant to the question you are trying to answer. As the researcher, your goal is to trim your data set down to what is important, and then present "just the facts" to others as a simple narrative. Gigasheet makes this easy to do using filters and groups to eliminate rows, and icons to hide, delete, or rearrange columns. Once you have the file just how you want it, simply hit "Save As" to create a new file that only contains the visible rows and columns. As the unnecessary data is removed, the file size drops, and you are left with a file that is perfect for sharing!
To illustrate the power of "Save As" we are going to work with the Housing Market Data from Redfin that is available in our Data Community. This is a huge file - 1.9 GB - with 2 million rows and 99 columns. To put that into perspective, the limit for email attachments in Gmail is 25 MB, and this file is 1,900 MB (1 GB = 1,000 MB). So, we have our work cut out for us!
Let's say we want to focus only on Housing Data for Austin, TX, in the year 2020. To do that, we create a filter with multiple conditions: one on region name containing "Austin, TX", and two on the date, using after 1/1/2020 as our start date and before 1/1/2021 as our end date. Here is what that filter would look like:
As you can see at the bottom of the screen, our file has been filtered down from 2 million rows to just 139 rows specific to Austin, TX, in 2020.
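For readers who think in code, the logic of that multi-condition filter can be sketched in a few lines of Python. This is only an illustration of the idea, not how Gigasheet works internally: the rows are toy data invented for the example, and the column names `region_name`, `period_begin`, and `homes_sold` are assumptions rather than the actual headers in the Redfin file.

```python
from datetime import date

# Toy rows invented for illustration; the column names are assumptions,
# not the actual headers of the Redfin file.
rows = [
    {"region_name": "Austin, TX", "period_begin": date(2020, 3, 2), "homes_sold": 10},
    {"region_name": "Austin, TX", "period_begin": date(2019, 12, 30), "homes_sold": 8},
    {"region_name": "Dallas, TX", "period_begin": date(2020, 3, 2), "homes_sold": 12},
    {"region_name": "Austin, TX", "period_begin": date(2021, 1, 4), "homes_sold": 9},
]


def matches(row):
    """Return True only when all three filter conditions hold."""
    return (
        "Austin, TX" in row["region_name"]           # region name contains
        and row["period_begin"] > date(2020, 1, 1)   # after the start date
        and row["period_begin"] < date(2021, 1, 1)   # before the end date
    )


filtered = [row for row in rows if matches(row)]
print(len(filtered), "of", len(rows), "rows kept")
```

Because the conditions are combined with "and", a row is kept only if it passes every one of them, which is why the row count drops so sharply.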
As a reminder, you can also Group data to summarize it by the values in a column. For instance, I am curious about how long these houses remained on the market. I hit Group, select the duration column, and get these results:
Interestingly, the houses on the market the longest only have a slightly higher median price. These are the types of insights that Groups can provide, so if you haven't tried it yet, start grouping!
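Conceptually, grouping collects the rows that share a value in one column and then summarizes each bucket. A minimal Python sketch of that idea follows; the rows and prices are invented for illustration, and the column names `duration` and `median_sale_price` are assumptions, not confirmed headers from the file.

```python
from collections import defaultdict
from statistics import median

# Toy rows invented for illustration; column names and values are assumptions.
rows = [
    {"duration": "4 weeks", "median_sale_price": 300000},
    {"duration": "4 weeks", "median_sale_price": 320000},
    {"duration": "12 weeks", "median_sale_price": 310000},
    {"duration": "12 weeks", "median_sale_price": 330000},
]


def group_median(rows, key, value):
    """Bucket rows by `key`, then take the median of `value` in each bucket."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {name: median(values) for name, values in groups.items()}


summary = group_median(rows, "duration", "median_sale_price")
for name, price in summary.items():
    print(name, price)
```

Note that the summary is a view over the rows: the underlying rows are unchanged, which matches how a Group does not remove anything from the file.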
Now that we have our data on just Austin in 2020, and we have poked around it a bit, let's save it as a new file to reduce the file size. Hit "Save As" and create a new file with just these 139 rows. You are given the opportunity to name your file, in this case "'21 Austin Sales - Filtered/Grouped", and you will then receive a notification that a new file has been created.
Note: Filters control which rows are included in the new file during a "Save As." Groups have no impact on which rows are included or excluded, and they will have to be recreated in the new file.
Click "Open it now" to view your new file. This will open in a new tab, leaving the original file open. There is now a significant difference in the amount of data in each file. If we view the new file in our Library, we see that we have reduced it from 2 million rows to 139 rows, and the size has gone from 1.9 GB to 122 kB (1 GB = 1 million kB)!
The file is tiny now, but we can go even further by removing unnecessary columns in Step 2.
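To make the size difference concrete, here is the arithmetic as a short Python sketch. It uses only the figures quoted in this article (1,900 MB original, 122 kB filtered, 25 MB Gmail attachment limit) and the same decimal unit conversions; the variable names are just for illustration.

```python
KB_PER_MB = 1_000  # decimal units, as used in the article

original_kb = 1_900 * KB_PER_MB   # 1.9 GB expressed as 1,900 MB
filtered_kb = 122                 # size after the first "Save As"
gmail_limit_kb = 25 * KB_PER_MB   # 25 MB attachment limit

reduction = original_kb / filtered_kb
print(f"about {reduction:,.0f}x smaller")
print("original fits in an email:", original_kb <= gmail_limit_kb)
print("filtered fits in an email:", filtered_kb <= gmail_limit_kb)
```

The filtered file comes out roughly 15,574 times smaller than the original, and it sits comfortably under the attachment limit that the original exceeded many times over.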
This data set has 99 columns, but we are only interested in two critical data points: homes sold and median sales price for each period. Removing unnecessary columns is as easy as unchecking a box. Locate the list of columns and uncheck the ones we do not need, keeping the 6 below.
Fun Fact: You can also reorder the columns by dragging them up and down in the columns list.
Once complete, let's use "Save As" again to create another new file called "'21 Austin Sales - Removed Columns."
The resulting file will only have the 6 columns we selected above in Step 2, and the 139 rows we filtered to in Step 1.
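The same "keep only the visible columns" idea can be sketched in Python with the standard `csv` module. This is an illustrative analogy, not Gigasheet's implementation: the row is toy data, `save_as` is a hypothetical helper, and the column names are assumptions rather than the real headers.

```python
import csv
import io

# Toy row invented for illustration; column names are assumptions.
rows = [
    {"region_name": "Austin, TX", "period_begin": "2020-03-02",
     "homes_sold": 10, "median_sale_price": 300000, "inventory": 55},
]

# Columns to keep, in the order they should appear in the new file.
keep = ["period_begin", "region_name", "homes_sold", "median_sale_price"]


def save_as(rows, columns):
    """Hypothetical helper: write only the chosen columns, in the given order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",   # silently drop columns not listed in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


output = save_as(rows, keep)
print(output)
```

The order of the `keep` list also determines the column order in the output, which mirrors dragging columns up and down in the columns list before saving.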
Looking at our Library, we can now see all 3 files and their sizes. Going from 99 columns to 6 reduced the file size from 122.87 kB to 8.78 kB. Tiny!
Only the critical data remains, and it can easily be shared and opened by others. At this point, the file is so small that any software running on any device could open it. Start exploring your data set with Gigasheet and use "Save As" to create the perfect output.