The JSON data format has taken over the world for a reason. Human-readable and system-independent, it's an ideal way for applications to store and exchange nested data. Those who can code in languages like Python love JSON. But far more people need to work with JSON files than can script in Python, and therein lies the rub. If you're searching for an easy way to view and open large JSON files, look no further!
A non-programmer trying to open a large JSON file, or many JSON files at once, will quickly grow frustrated. The first tactic is almost always Microsoft Excel. Whether on Windows or Mac, opening JSON in Excel is a frustrating experience: even if Excel manages to flatten the JSON, the nested nature of the data doesn't fit a rows-and-columns interface. In this article we'll show you how to open large JSON files on your own, without coding or installing anything.
So what can a user do to view and open large JSON files? Until now, there was no great option. A few web-based tools will let you view JSON, but searching for "how to work with JSON" or "how to open large JSON files" returns page after page of Stack Overflow threads and GitHub links that essentially boil down to: write a Python script.
As an example, let’s look at this JSON file using the command line (aka, Terminal).
This is a dataset of 8,145,323 IoT records, each in its own JSON array:
Looking at the data, you'll see each record has its own combination of sub-nesting: sometimes it has a sender, sometimes a recipient. Now let's say you want to answer a simple question: which domain shows up the most across all email addresses, whether as sender or recipient?
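To make that varying structure concrete, here are two made-up records in the same spirit. The article doesn't show the full schema, so the field names below are illustrative, not the dataset's real ones:

```python
# Two hypothetical records mimicking the dataset's shape (field names
# are made up). Note the nesting differs: one record has a "sender"
# sub-object, the other a "recipient".
records = [
    {"timestamp": "2022-01-01T00:00:00Z",
     "sender": {"address": "alice@example.com"}},
    {"timestamp": "2022-01-01T00:00:05Z",
     "recipient": {"address": "bob@widgets.io"}},
]

for rec in records:
    print(sorted(rec.keys()))
# -> ['sender', 'timestamp']
# -> ['recipient', 'timestamp']
```

Because the top-level keys change from record to record, no single fixed column layout fits the raw data.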
Opening this JSON in Excel won’t work. Because of the non-standard nesting, any sort of text-to-columns will fail.
Cutting and pasting this into a web-based tool won’t work. The dataset is nearly 2GB in size, and will most likely crash a browser.
So, how do we open this JSON file so we can analyze it? Let's dig in!
At this point, it has likely become clear that — for our long-suffering analyst, who just wants to answer a simple question — workarounds don’t get the job done. If you want to open big JSON files (potentially running to hundreds of millions of rows) you need to take a completely different approach.
One such option is to use Python, or another similarly powerful coding or scripting language.
Python is a general-purpose programming language that, among other uses, has historically seen a lot of uptake in the scientific and mathematical communities. Its extensive standard library and mature ecosystem of data-analysis modules make Python an extremely powerful tool for interrogating and visualizing huge datasets.
And Python is far from the only option. Just about every programming language has ways to interrogate even the largest JSON datasets. However, they all run into the same problems: time and complexity.
As an analyst, do you have the capacity (or inclination) to learn one or more programming languages just to analyze large JSON files? Even if you do, will you ever be completely confident that your custom-written scripts and tools are watertight? Ultimately, while Python and other scripting languages are undoubtedly an option, they don't solve the underlying problem: analysts need a simple way to open and explore large JSON files.
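For comparison, here's roughly what that Python script looks like. It's only a sketch, under two big assumptions that may not hold for a given file: that the export is newline-delimited JSON, and that the field names match the hypothetical ones used here. It also skips the error handling (malformed lines, missing fields, encoding quirks) that a real 2GB file would demand:

```python
import json
import tempfile
from collections import Counter

# Stand-in for the real ~2GB export: a tiny newline-delimited JSON
# file written to a temp location (field names are hypothetical).
sample = [
    {"sender": {"address": "alice@example.com"}},
    {"recipient": {"address": "bob@widgets.io"}},
    {"sender": {"address": "dan@widgets.io"}},
]
path = tempfile.mkstemp(suffix=".json")[1]
with open(path, "w") as f:
    for rec in sample:
        f.write(json.dumps(rec) + "\n")

# Stream line by line so memory use stays flat regardless of file size.
domains = Counter()
with open(path) as f:
    for line in f:
        rec = json.loads(line)
        party = rec.get("sender") or rec.get("recipient")
        address = (party or {}).get("address", "")
        if "@" in address:
            domains[address.split("@", 1)[1]] += 1

print(domains.most_common())
# -> [('widgets.io', 2), ('example.com', 1)]
```

Even this "simple" version has to coalesce the optional sender/recipient nesting by hand, and every new question means another script.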
Analysts don't need to be full-on data scientists or programming experts; they need to be analysts! So many large datasets are stored in and output to JSON, and analysts need simple, powerful solutions that let them work with these huge JSON datasets just as easily as they would with a smaller file, using a universal spreadsheet-like application.
If you’ve been waiting for the pitch, here it is.
Gigasheet is a no-code analyst workbench that allows analysts to work efficiently with even the largest datasets.
No longer will you be pressured to be a ‘unicorn analyst’ who can code, manage databases, and perform data science tasks. With Gigasheet, you can open large JSON files with millions of rows or billions of cells, and work with them just as easily as you’d work with a much smaller file in Excel or Google Sheets.
So in our IoT data example, it's easy enough to upload our large JSON file into Gigasheet. You can even zip your file before uploading to save time. Once it's loaded, we can open it and see the flattened JSON structure in a rows-and-columns format:
In one place, the JSON data is loaded, flattened, and ready for analysis. You can see at a glance how the sender and recipient addresses are kept distinct.
Now that we’re in a spreadsheet, analysis is straightforward. Apply a function to split the email columns at the @ symbol:
Do that on both sender and recipient, and it looks like this:
We can then use the combine columns function to merge the results:
Which results in one column with either the sender’s or recipient’s domain.
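Under the hood, those two steps amount to one small rule per row: take whichever address is present, and keep the part after the @. A sketch of that rule in Python, for readers who like to see the logic spelled out (column names are hypothetical; in Gigasheet you do this without writing any code):

```python
def email_domain(sender, recipient):
    """Equivalent of the split-then-combine steps for a single row."""
    address = sender or recipient  # combine: first non-empty value wins
    if address and "@" in address:
        return address.split("@", 1)[1]  # split: keep the text after the @
    return None

print(email_domain("alice@example.com", None))  # -> example.com
print(email_domain(None, "bob@widgets.io"))     # -> widgets.io
```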
A simple Group by that new column produces the answer to the original question, showing how many of each domain appear in the dataset:
There you have it, an easy way to view and open large JSON files! We can further break this down by how many are sender and how many are recipient by using an aggregation:
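That group-and-aggregate is, in spirit, a two-level count: by domain, then by role. A standard-library sketch of the same idea, again with made-up records since the real schema isn't shown:

```python
from collections import Counter, defaultdict

# Hypothetical mini-dataset; the real field names may differ.
records = [
    {"sender": {"address": "alice@example.com"}},
    {"recipient": {"address": "bob@example.com"}},
    {"sender": {"address": "carol@widgets.io"}},
]

# Count each domain, broken down by whether the address
# appeared as sender or recipient.
by_domain = defaultdict(Counter)
for rec in records:
    for role in ("sender", "recipient"):
        address = rec.get(role, {}).get("address", "")
        if "@" in address:
            by_domain[address.split("@", 1)[1]][role] += 1

print({domain: dict(counts) for domain, counts in by_domain.items()})
# -> {'example.com': {'sender': 1, 'recipient': 1}, 'widgets.io': {'sender': 1}}
```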
We think this is the best way to open large JSON files and work with nested JSON datasets. We believe we've finally created an easy answer to the age-old question of how to open a large JSON file.
Your first 3GB are free. Sign up here.
We want to know exactly what problems you face so we can make sure Gigasheet is equipped to solve as many of them as possible. If you’d like to help us make Gigasheet the ideal solution for your large JSON problems, create a free account.