Analyzing Quality Coffee: The World's Best Coffee (According to The Data)

In 1824, Thomas Jefferson declared, "Coffee is the favorite drink of the civilized world!" Nearly two centuries later, Americans continue to prove him right. The National Coffee Association reports that Americans consume a whopping 517 million cups of coffee every day!

But what makes coffee so special that practically every adult runs on it? Is it the sweetness, flavor, or aroma that gives us the kick we need to get through the day? And which coffee is the best?

For this blog, we took data on Arabica and Robusta coffee from the Coffee Quality Institute and merged it into one dataset using Gigasheet, a free online CSV viewer that allows for easy data analysis and visualization.

Let us look at some stimulating data about the elixir of modern life. So grab your cup of coffee, and let us get started.

Quality Coffee Dataset

Often, the data you need to answer a question is not all in one file, and you first need to combine the files before proceeding. If you have multiple datasets with the same columns, you can upload them to Gigasheet and merge the CSV files into one file for easy data analysis.

Merging CSV files in Gigasheet containing data about quality coffee

Learn more about merging data files in Gigasheet.
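If you want to see what this kind of merge amounts to outside Gigasheet, here is a minimal Python sketch. It is an illustration only, not how Gigasheet works internally. The two in-memory files, their three columns, and every value in them are made up for the example; the real files have 44 columns.

```python
import csv
import io

# Made-up stand-ins for the Arabica and Robusta files (hypothetical values).
arabica_csv = (
    "Species,Country of Origin,Total Cup Points\n"
    "Arabica,Ethiopia,88.5\n"
    "Arabica,Brazil,82.0\n"
)
robusta_csv = (
    "Species,Country of Origin,Total Cup Points\n"
    "Robusta,India,83.0\n"
)


def merge_csv(*sources):
    """Stack CSV sources that share the same header into one list of rows."""
    header = None
    merged = []
    for source in sources:
        reader = csv.DictReader(source)
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("all files must have the same columns")
        merged.extend(reader)
    return header, merged


header, rows = merge_csv(io.StringIO(arabica_csv), io.StringIO(robusta_csv))
print(len(rows), "rows with columns", header)
```

The header check mirrors the "same columns" requirement mentioned above: files with different columns are rejected rather than silently misaligned.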

The merged file containing both Arabica and Robusta coffee looks something like this:

The merged CSV with 1339 rows

This comprehensive dataset has 44 columns, out of which we require only a few at a time.

Viewing selected columns in Gigasheet is easy - just check off the ones you need in the side panel.

How to view selected columns in Gigasheet

(P.S. If you are a coffee maniac, you can jump ahead and start exploring the data. Or, continue reading for a crash course in coffee quality scores.)

Here is a brief explanation of the selected data fields. They are the parameters used to assess coffee quality, and each one lies in the range of 1-10.

  1. Aroma
  2. Flavor
  3. Aftertaste
  4. Acidity
  5. Body
  6. Balance
  7. Uniformity
  8. Clean cup
  9. Sweetness
  10. Cupper points

These ten values are added up to give the Total Cup Points (out of 100). Any defects are then subtracted to get the final score.
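As a quick worked example of that arithmetic, here is a small Python sketch. The sample scores are made up, the dictionary keys simply follow the list above, and the `defect_penalty` parameter is a hypothetical placeholder, since this post does not spell out how defects translate into points.

```python
# The ten criteria listed above; each is scored on a scale of up to 10.
CRITERIA = [
    "Aroma", "Flavor", "Aftertaste", "Acidity", "Body",
    "Balance", "Uniformity", "Clean Cup", "Sweetness", "Cupper Points",
]


def total_cup_points(scores, defect_penalty=0.0):
    """Sum the ten scores, then subtract a penalty for defects.

    defect_penalty is a hypothetical, already-computed deduction.
    """
    return sum(scores[name] for name in CRITERIA) - defect_penalty


# Made-up scores for one coffee (not taken from the dataset).
sample = {
    "Aroma": 8.0, "Flavor": 8.0, "Aftertaste": 7.5, "Acidity": 8.0,
    "Body": 7.5, "Balance": 8.0, "Uniformity": 10.0, "Clean Cup": 10.0,
    "Sweetness": 10.0, "Cupper Points": 8.0,
}

print(total_cup_points(sample))       # 85.0
print(total_cup_points(sample, 2.0))  # 83.0
```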

(So the next time you buy that fancy gourmet coffee, rest assured that it has undergone a variety of tests that justify its price.)

The coffee is classified based on the quality score as follows:

A table describing how coffee quality is rated as per the Coffee Quality Institute (CQI)

Okay, enough about coffee quality. Time to explore the data.

Which Country Produces The Best Coffee?

Filter the dataset by the CQI's criteria for 'Outstanding,' i.e., the Total Cup Points should be greater than 90. (Pro Tip: With the save filter option, you can also save and reuse this filter later.)

Using a filter in Gigasheet

Here are the results. Only 1 row is in the top category! Clearly, the Arabica coffee from Ethiopia is the winner. ☕

What if you broaden your search to include the top two categories? Replace the filter with one where the Total Cup Points are greater than or equal to 84.99.

This gives us 106 rows, and a lot more data to work with.

The resulting dataset after applying one filter
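The two filters can also be expressed in a few lines of Python. This is only a sketch of the logic: the rows and their values are made up, and the column names 'Country of Origin' and 'Total Cup Points' are assumed to match the names used in this post.

```python
# Made-up rows; values are kept as text, as they would be when read from a CSV.
rows = [
    {"Country of Origin": "Ethiopia", "Total Cup Points": "91.0"},
    {"Country of Origin": "Kenya", "Total Cup Points": "86.5"},
    {"Country of Origin": "Brazil", "Total Cup Points": "84.99"},
    {"Country of Origin": "Mexico", "Total Cup Points": "82.0"},
]


def filter_by_points(rows, keep):
    """Return the rows whose Total Cup Points satisfy the `keep` condition."""
    return [row for row in rows if keep(float(row["Total Cup Points"]))]


# 'Outstanding' only: strictly greater than 90.
outstanding = filter_by_points(rows, lambda points: points > 90)

# Top two categories: greater than or equal to 84.99.
top_two = filter_by_points(rows, lambda points: points >= 84.99)

print(len(outstanding), len(top_two))
```

Note that the second condition replaces the first rather than being combined with it; requiring both at once would still leave only the rows above 90.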

Group this data by 'Country of origin.' Once again, Ethiopia comes out on top with 25 of the 106 varieties.

Grouping the data by country of origin
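Grouping and counting like this can be sketched with Python's `collections.Counter`. The rows below are made up and much smaller than the 106 filtered rows, and the column name 'Country of Origin' is an assumption.

```python
from collections import Counter

# Made-up rows standing in for the filtered dataset.
rows = [
    {"Country of Origin": "Ethiopia"},
    {"Country of Origin": "Kenya"},
    {"Country of Origin": "Ethiopia"},
    {"Country of Origin": "Colombia"},
    {"Country of Origin": "Ethiopia"},
    {"Country of Origin": "Kenya"},
]

# Count how many rows fall into each country group.
counts = Counter(row["Country of Origin"] for row in rows)

# Keep the ten largest groups, largest first (this is what the chart plots).
top_10 = counts.most_common(10)

for country, n in top_10:
    print(f"{country}: {n}")
```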

Visualizing the data (top 10 rows):

A graph generated through Gigasheet. It shows the top 10 countries that produce excellent or outstanding coffee.

From this chart, it's clear that Ethiopia produces the best coffee in the world!

You can dive in further to figure out which region in Ethiopia produces the best coffee.

Right-click on the group name, and select 'Filter to this.'

Using "filter to this option" on Grouped Data in Gigasheet

Now, you are left with one group. Next, add another group-by condition to further group the rows by region within Ethiopia:

Nested groups in Gigasheet
The resulting dataset after two group by conditions
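The "filter to this, then group again" step can be sketched in Python as a filter followed by a second count. The rows are made up for illustration, and the 'Region' column name is an assumption about how the dataset labels it.

```python
from collections import Counter

# Made-up rows; the real data has 25 Ethiopian coffees in the top two categories.
rows = [
    {"Country of Origin": "Ethiopia", "Region": "Oromia"},
    {"Country of Origin": "Ethiopia", "Region": "Sidamo"},
    {"Country of Origin": "Ethiopia", "Region": "Oromia"},
    {"Country of Origin": "Kenya", "Region": "Nyeri"},
]

# Step 1: keep only the Ethiopia group ("Filter to this").
ethiopia = [row for row in rows if row["Country of Origin"] == "Ethiopia"]

# Step 2: group the remaining rows by region and count each group.
by_region = Counter(row["Region"] for row in ethiopia)

for region, n in by_region.most_common():
    print(f"{region}: {n}")
```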

Add a bar graph to visualize this data, and you will see that most of the 25 best coffees produced in Ethiopia come from the Oromia region.

A bar chart generated by Gigasheet. It shows the top coffee producing regions in Ethiopia.

Specialty coffee brands mention the "Region of Origin" on their packaging. So, the next time you go to the store, you know what to look for!

But, maybe you don't really care about the coffee's origin. You like to wake up to the smell of fresh coffee or enjoy the flavor. But what do coffee testers think? Does good-smelling coffee always taste good?


Aroma and Flavor: The Big Question

Are aroma and flavor linked?  Let us see what the data says.

First, filter the dataset for coffees that score more than 7 on the aroma scale. (Of course, you can also include lower scores when you analyze the data.)

After filtering, you'll see that several values are repeated. To simplify, group these rows by the value of the Aroma column. Then, calculate the average flavor score for each group.

The resulting view contains 23 groups.

Filtered dataset grouped by Aroma
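Here is the same filter, group, and average sequence as a minimal Python sketch. The rows are made up, and the 'Aroma' and 'Flavor' column names are assumed to match the field names listed earlier.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows; values are text, as they would be when read from a CSV.
rows = [
    {"Aroma": "6.5", "Flavor": "6.75"},
    {"Aroma": "7.25", "Flavor": "7.0"},
    {"Aroma": "7.25", "Flavor": "7.5"},
    {"Aroma": "7.5", "Flavor": "7.5"},
    {"Aroma": "7.5", "Flavor": "7.75"},
    {"Aroma": "8.0", "Flavor": "8.0"},
]

# Filter to aroma > 7 and collect the flavor scores for each aroma value.
flavors_by_aroma = defaultdict(list)
for row in rows:
    aroma = float(row["Aroma"])
    if aroma > 7:
        flavors_by_aroma[aroma].append(float(row["Flavor"]))

# One average flavor score per aroma group, ordered by aroma score.
avg_flavor = {
    aroma: mean(flavors) for aroma, flavors in sorted(flavors_by_aroma.items())
}

for aroma, flavor in avg_flavor.items():
    print(aroma, flavor)
```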

Now, add a chart plotting the average flavor score for each aroma score.

A line graph generated by Gigasheet.

The average flavor score rises along with the aroma score. So, the conclusion is: coffee that smells good tastes good!
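If you would like a number to go with the chart, one option (not something done in Gigasheet for this post) is a Pearson correlation between the aroma scores and the average flavor scores. The sketch below uses made-up values, so the result says nothing about the real dataset; a value close to 1 would indicate that the two rise together.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up aroma scores and average flavor scores (hypothetical values).
aroma_scores = [7.25, 7.5, 7.75, 8.0]
avg_flavor_scores = [7.2, 7.5, 7.7, 8.1]

r = pearson(aroma_scores, avg_flavor_scores)
print(round(r, 3))
```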

Natural vs Washed: Which Processing Method is The Best?

Traditionally, coffee beans are sun-dried and fermented (the natural/dry method). The modern process involves machines that de-pulp coffee cherries to extract the coffee bean (the washed/wet method).

Does the processing method make a difference in the coffee quality?

Select only those rows where the processing method is 'Natural/Dry' or 'Washed/Wet.' Then, group by 'Processing Method' and calculate the average value of each quality score metric.

(Note: The 'Uniformity' score in this dataset is stored as plain text, and plain-text values won't work with aggregate functions that expect numbers. Before calculating the average, convert the column to Number->Decimal. Learn more about converting data types here.)

You will get something like this:

Filtering and grouping by Processing method.
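The text-to-number conversion and the grouped average can be sketched in Python as follows. Only the 'Uniformity' metric is shown; the rows are made up, and the `to_number` helper is a hypothetical name introduced for this example.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows; 'Uniformity' is stored as text, and one value is blank.
rows = [
    {"Processing Method": "Washed/Wet", "Uniformity": "10.0"},
    {"Processing Method": "Washed/Wet", "Uniformity": "9.0"},
    {"Processing Method": "Natural/Dry", "Uniformity": "10.0"},
    {"Processing Method": "Natural/Dry", "Uniformity": "9.5"},
    {"Processing Method": "Natural/Dry", "Uniformity": ""},
    {"Processing Method": "Other", "Uniformity": "8.0"},
]


def to_number(text):
    """Convert plain text to a decimal number; return None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


# Keep only the two methods being compared, converting text to numbers on the way.
scores = defaultdict(list)
for row in rows:
    method = row["Processing Method"]
    value = to_number(row["Uniformity"])
    if method in ("Natural/Dry", "Washed/Wet") and value is not None:
        scores[method].append(value)

avg_uniformity = {method: mean(values) for method, values in scores.items()}
print(avg_uniformity)
```

Averaging the raw text values directly would fail, which is the same reason the column needs converting before you aggregate it.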

Creating a chart with these two groups:

A graph generated through Gigasheet. It shows the quality score distribution across processing methods.

There is hardly any difference in numbers according to this dataset. Coffee snobs are wrong about this one, and that's the tea. 🍵

Brew Data Insights With Gigasheet

Crunching numbers all day? Coffee will solve most of your problems. For the rest, there is Gigasheet.

Gigasheet is a free-to-use data platform that lets you view, explore, merge, and analyze large datasets. What's more, you can upload and share them with your friends and colleagues - even if they don't have a Gigasheet account.

Whether your data is stored in a JSON file or a CSV file, whether it is located on your cloud drive or local drive, Gigasheet can unroll even the largest of files within minutes.

Create your free account today.

For more analytical fun and inspiration, stick around and read more blogs or check out our data community.
