In 1824, Thomas Jefferson declared, "Coffee is the favorite drink of the civilized world!" Nearly two centuries later, Americans continue to prove him right. The National Coffee Association reports that Americans consume a whopping 517 million cups of coffee every day!
But what makes coffee so special that practically every adult runs on it? Is it the sweetness, flavor, or aroma that gives us the kick we need to get through the day? And which coffee is the best?
For this blog, we have taken data about Arabica and Robusta coffee from the Coffee Quality Institute and merged it into one dataset using Gigasheet, a free online CSV viewer that allows for easy data analysis and visualization.
Let us look at some stimulating data about the elixir of modern life. So grab your cup of coffee, and let us get started.
Often, the data you need to answer a question is spread across several files, and you need to combine them before proceeding. If you have multiple datasets with the same columns, you can upload them to Gigasheet and merge the CSV files into one file for easy data analysis.
Learn more about merging data files in Gigasheet.
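If you are curious what a merge like this amounts to under the hood, here is a minimal Python sketch using only the standard library. It is not how Gigasheet does it. The two tiny CSV snippets, their values, and the `merge_csv` helper are made up for illustration, and the column names are assumptions.

```python
import csv
import io

# Two tiny made-up CSV extracts standing in for the Arabica and Robusta files.
arabica_csv = """Species,Country of origin,Total Cup Points
Arabica,Ethiopia,90.5
Arabica,Guatemala,84.5
"""
robusta_csv = """Species,Country of origin,Total Cup Points
Robusta,Uganda,83.5
"""

def merge_csv(*sources):
    """Stack CSV texts that share the same header into one list of row dicts."""
    merged, header = [], None
    for text in sources:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("files must have the same columns to be merged")
        merged.extend(reader)
    return merged

coffee = merge_csv(arabica_csv, robusta_csv)
print(len(coffee), "rows with columns", list(coffee[0]))
```

The header check mirrors the "same columns" requirement: files with different columns are rejected rather than silently stacked.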
The merged file containing both Arabica and Robusta coffee looks something like this:
This comprehensive dataset has 44 columns, out of which we require only a few at a time.
Viewing selected columns in Gigasheet is easy - just check off the ones you need in the side panel.
(P.S. If you are a coffee maniac, you can jump ahead and start exploring the data. Or, continue reading for a crash course in coffee quality scores.)
Here is a brief explanation of the selected data fields. They are the parameters used to assess coffee quality, and each one lies in the range of 1-10.
These values are then added to evaluate the Total Cup Points (out of 100). Then the defects are subtracted to get the final score.
(So the next time you buy that fancy gourmet coffee, rest assured that it has undergone a variety of tests that justify its price.)
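To make the arithmetic concrete, here is a small Python sketch of the scoring described above: add up the parameters, then subtract the defects. The ten attribute names, the sample scores, and the defect penalty are assumptions chosen for illustration, not values from the dataset.

```python
# Hypothetical scores for one coffee; each parameter lies in the range 1-10.
scores = {
    "Aroma": 8.0, "Flavor": 8.0, "Aftertaste": 7.5, "Acidity": 8.0,
    "Body": 7.5, "Balance": 8.0, "Uniformity": 10.0, "Clean Cup": 10.0,
    "Sweetness": 10.0, "Cupper Points": 8.0,
}
defect_penalty = 2.0  # assumed number of points lost to defects

def final_score(scores, defect_penalty):
    """Sum the quality parameters, then subtract the defect penalty."""
    for name, value in scores.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return sum(scores.values()) - defect_penalty

total_cup_points = sum(scores.values())
final = final_score(scores, defect_penalty)
print(total_cup_points, final)  # 85.0 83.0
```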
The coffee is classified based on the quality score as follows:
Okay, enough about coffee quality. Time to explore the data.
Filter the dataset by the CQI's criterion for 'Outstanding,' i.e., the Total Cup Points should be greater than 90. (Pro Tip: With the save filter option, you can also save and reuse this filter later.)
Here are the results. Only one row is in the top category! Clearly, the Arabica coffee from Ethiopia is the winner. ☕
What if you broaden your search to include the top two categories? Add another filter where the Total Cup Points are greater than or equal to 84.99.
This gives us 106 rows, and a lot more data to work with.
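Outside Gigasheet, the same two filters boil down to simple comparisons. Here is a rough Python sketch. The four rows are invented for illustration; only the thresholds (greater than 90, and greater than or equal to 84.99) come from the steps above.

```python
# Hypothetical rows; the column names are assumptions based on the text.
coffee = [
    {"Country of origin": "Ethiopia", "Total Cup Points": 90.5},
    {"Country of origin": "Ethiopia", "Total Cup Points": 88.0},
    {"Country of origin": "Guatemala", "Total Cup Points": 85.0},
    {"Country of origin": "Uganda", "Total Cup Points": 83.5},
]

# Top category only: Total Cup Points greater than 90.
outstanding = [row for row in coffee if row["Total Cup Points"] > 90]

# Top two categories: Total Cup Points greater than or equal to 84.99.
top_two = [row for row in coffee if row["Total Cup Points"] >= 84.99]

print(len(outstanding), len(top_two))  # 1 3
```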
Group this data by 'Country of origin.' Once again, Ethiopia comes out on top with 25 of the 106 varieties.
Visualizing the data (top 10 rows):
From this chart, it's clear that Ethiopia has more top-scoring coffees than any other country in this dataset!
You can dive in further to figure out which region in Ethiopia produces the best coffee.
Right-click on the group name, and select 'Filter to this.'
Now, you are left with one group. Next, add another group by condition and further group the rows by regions in Ethiopia:
Add a bar graph to visualize this data, and you will see that most of the 25 best coffees produced in Ethiopia come from the Oromia region.
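The group, filter, and group-again steps above can be sketched in a few lines of Python. This is only an illustration: the rows below are invented, and the 'Region' column name is an assumption.

```python
from collections import Counter

# Hypothetical top-scoring coffees; values are made up for illustration.
top_coffees = [
    {"Country of origin": "Ethiopia", "Region": "Oromia"},
    {"Country of origin": "Ethiopia", "Region": "Oromia"},
    {"Country of origin": "Ethiopia", "Region": "Sidama"},
    {"Country of origin": "Kenya", "Region": "Nyeri"},
    {"Country of origin": "Guatemala", "Region": "Huehuetenango"},
]

# Step 1: group by country and count the rows in each group.
by_country = Counter(row["Country of origin"] for row in top_coffees)
best_country, _ = by_country.most_common(1)[0]

# Step 2: keep only the leading country ("Filter to this"),
# then group the remaining rows by region.
by_region = Counter(
    row["Region"]
    for row in top_coffees
    if row["Country of origin"] == best_country
)

print(best_country, by_region.most_common())
```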
Specialty coffee brands mention the "Region of Origin" on their packaging. So, the next time you go to the store, you know what to look for!
But, maybe you don't really care about the coffee's origin. You like to wake up to the smell of fresh coffee or enjoy the flavor. But what do coffee testers think? Does good-smelling coffee always taste good?
Are aroma and flavor linked? Let us see what the data says.
First, filter the dataset for coffees that score more than 7 on the aroma scale. (Of course, you can also take lower scores when you analyze the data.)
After filtering, you'll see that several values are repeated. To simplify, group these rows by the value of the column Aroma. Then, find the average of the flavor score.
The resulting view contains 23 groups.
And now, add a chart plotting the average flavor score for a group of coffees that have a particular aroma score.
The average flavor score rises steadily with the aroma score. So, the conclusion is: coffee that smells good, tastes good!
Traditionally, coffee beans are sun-dried and fermented (the natural/dry method). The modern process involves machines that de-pulp coffee cherries to extract the coffee bean (the washed/wet method).
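For readers who like to check the logic by hand, here is a small Python sketch of the same filter, group, and average steps. The (aroma, flavor) pairs are invented for illustration and are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (Aroma, Flavor) score pairs.
coffees = [
    (7.0, 7.0),
    (7.5, 7.25),
    (7.5, 7.75),
    (7.75, 7.75),
    (7.75, 8.0),
    (8.0, 8.25),
]

# Keep coffees that score more than 7 on aroma, then group flavor by aroma.
flavor_by_aroma = defaultdict(list)
for aroma, flavor in coffees:
    if aroma > 7:
        flavor_by_aroma[aroma].append(flavor)

# Average flavor score for each aroma score, in ascending aroma order.
avg_flavor = {aroma: mean(flavors) for aroma, flavors in sorted(flavor_by_aroma.items())}

for aroma, flavor in avg_flavor.items():
    print(aroma, flavor)
```

If the averages climb as the aroma score climbs, the two scores move together in the sample, which is the pattern the chart is meant to reveal.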
Does the processing method make a difference in the coffee quality?
Select only those rows where the processing method is 'Natural/Dry' or 'Washed/Wet.' Then, group by 'Processing Method' and calculate the average value of each quality score metric.
(Note: The 'Uniformity' column in this dataset is stored as plain text, and plain-text values can't be used in aggregate functions that work on numbers. Before calculating the average, convert it to Number->Decimal. Learn more about converting data types here.)
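The same idea, sketched in Python: text values have to be converted to numbers before they can be averaged. The rows below are invented, only two metrics are shown to keep it short, and the 'Other' method is a placeholder for any row that should be filtered out.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows; note that Uniformity is stored as text, as in the dataset.
coffee = [
    {"Processing Method": "Washed/Wet", "Aroma": 7.5, "Uniformity": "10.0"},
    {"Processing Method": "Washed/Wet", "Aroma": 8.0, "Uniformity": "9.0"},
    {"Processing Method": "Natural/Dry", "Aroma": 7.75, "Uniformity": "9.5"},
    {"Processing Method": "Natural/Dry", "Aroma": 7.75, "Uniformity": "10.0"},
    {"Processing Method": "Other", "Aroma": 7.0, "Uniformity": "10.0"},
]
metrics = ["Aroma", "Uniformity"]
methods = {"Natural/Dry", "Washed/Wet"}

grouped = defaultdict(list)
for row in coffee:
    if row["Processing Method"] in methods:
        # float() turns text such as "9.5" into a number that can be averaged.
        grouped[row["Processing Method"]].append({m: float(row[m]) for m in metrics})

# Average of each metric per processing method.
averages = {
    method: {m: mean(r[m] for r in rows) for m in metrics}
    for method, rows in grouped.items()
}
print(averages)
```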
You will get something like this:
Creating a chart with these two groups:
According to this dataset, there is hardly any difference between the average scores of the two methods. Coffee snobs are wrong about this one, and that's the tea. 🍵
Crunching numbers all day? Coffee will solve most of your problems. For the rest, there is Gigasheet.
Gigasheet is a free-to-use data platform that lets you view, explore, merge, and analyze large datasets. What's more, you can upload and share them with your friends and colleagues - even if they don't have a Gigasheet account.
Whether your data is stored in a JSON file or a CSV file, whether it is located on your cloud drive or local drive, Gigasheet can unroll even the largest of files within minutes.
Create your free account today.
For more analytical fun and inspiration, stick around and read more blogs or check out our data community.