Amazon Book Reviews: Exploration and Sentiment Analysis

How do people select a book as their next read? They ask their peers, search the internet, and look at book reviews before making up their minds. Reviews may help a reader pick up their next book, but for publishers and booksellers, they are a gold mine of insights into readers' behavior.

If readers like a book, they talk about it, recommend it to others, and leave reviews on Amazon and other e-commerce platforms. And if they don't, they make sure others do not have to go through the misery they did. For publishers, these reviews are an opportunity to learn which kinds of books readers prefer, which authors they like, and more.

So, let us use Gigasheet to analyze a dataset of Amazon book reviews, explore trends, and perform sentiment analysis.

Analyzing Amazon Book Reviews with Gigasheet

Gigasheet is a free-to-use, no-code data analytics tool, which makes it well suited to quick analysis, especially if coding and databases are not your forte. You also work with a familiar spreadsheet interface that lets you filter data, group columns, and perform calculations without hassle.

Being a cloud-based solution, Gigasheet can process enormous datasets without requiring heavy processing power on the users' end. You are good to go as long as you have an Internet connection.

So, we will head straight to the Gigasheet web app, log in to our account, and upload the book reviews dataset. (Don't have a Gigasheet account? Sign up for free here!)

We use a dataset containing book reviews on Amazon for our book review analysis. It is a CSV file with 3 million rows of data with the following columns of information:

  1. The Id column contains the Id of books.
  2. The Title column contains the name of the book.
  3. The User_id column contains the Id of the user who rates the book.
  4. The profileName column contains the name of the user who reviews the book.
  5. The review/helpfulness column contains a helpfulness rating of the review, e.g., 2/3.
  6. The review/score column contains ratings from 0 to 5 for the book.
  7. The review/time column contains the time of the given review.
  8. The review/summary column contains the summary of a text review.
  9. The review/text column contains the full text of a review.
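Most of these columns can be used as they are, but review/helpfulness packs two numbers into one cell. As a minimal sketch, the Python snippet below splits a value such as 2/3 into its parts. It assumes the first number is the count of helpful votes and the second is the total number of votes, which the dataset description does not state explicitly; `parse_helpfulness` is a hypothetical helper name.

```python
def parse_helpfulness(value):
    """Split a 'helpful/total' string into two ints and a ratio.

    Assumption: the first number is helpful votes, the second is total votes.
    The ratio is None when there are no votes, to avoid dividing by zero.
    """
    helpful_str, total_str = value.strip().split("/")
    helpful, total = int(helpful_str), int(total_str)
    ratio = helpful / total if total else None
    return helpful, total, ratio


print(parse_helpfulness("2/3"))
print(parse_helpfulness("0/0"))
```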

Cleaning Up the Amazon Book Reviews Dataset

Before beginning our analysis, let us do a hygiene check on our Amazon book reviews dataset. First, we will look for rows with empty cells so that our analysis runs on complete data. For that, we will use Gigasheet's Filters to exclude every row that has a blank in even one column.

Gigasheet provides a pre-built condition to filter empty or filled rows, and we just have to select the columns we need to check for. Here is what our Filter to exclude rows with blank cells looks like:

Filters in Gigasheet on amazon book reviews

If you want to know how many empty cells your dataset has while keeping every row, Gigasheet has a handy quick calculation feature. Head to the bottom of a column, bring up the drop-down menu, and select from the list of calculations, such as Average, Sum, Empty, and Filled.

Quick calculations in Gigasheet of amazon book review data
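For readers who do want to see the same hygiene check in code, here is a rough equivalent using only the Python standard library. It is a sketch and not how Gigasheet works internally: the sample rows are made up, only a few of the dataset's columns are included, and `count_empty` and `drop_incomplete` are hypothetical helper names.

```python
import csv
import io

# Tiny made-up sample using some of the dataset's column names (illustrative only).
SAMPLE_CSV = """Id,Title,User_id,profileName,review/score,review/text
B001,Book A,U1,Alice,5.0,Loved it
B002,Book B,,Bob,2.0,Terrible pacing
B003,,U3,Cara,4.0,
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, one per review row."""
    return list(csv.DictReader(io.StringIO(text)))


def count_empty(rows):
    """Count blank cells per column, similar to the Empty quick calculation."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            is_blank = not (value or "").strip()
            counts[column] = counts.get(column, 0) + (1 if is_blank else 0)
    return counts


def drop_incomplete(rows, columns):
    """Keep only rows where every listed column has a non-blank value."""
    return [r for r in rows if all((r.get(c) or "").strip() for c in columns)]


rows = load_rows(SAMPLE_CSV)
empty_counts = count_empty(rows)
complete = drop_incomplete(rows, list(rows[0].keys()))
print(empty_counts)
print(len(complete), "complete row(s)")
```

Counting first and filtering second mirrors the two options above: the counts tell you how much data is missing, and the filter decides what to do about it.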

Popular Books on Amazon

Now that we have filtered out rows with empty cells, we can move on to analyzing our dataset. First, we want to know which books are popular on Amazon. In our case, we will treat the books with the most reviews as the popular ones.

For this analysis, we need to group our data by book name. So, we will head over to Gigasheet's menu bar and group by the Title column. Make sure to show the row count for any one column so that the grouped rows can be sorted in ascending or descending order. And here are the top five books in our Amazon book review dataset and their total review count:

Group by function in Gigasheet.
Context menu in Gigasheet allowing us to sort amazon book reviews
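The group-and-sort step is essentially a frequency count. A minimal Python sketch of the same idea, with made-up titles and a hypothetical `top_titles` helper, might look like this:

```python
from collections import Counter

# Made-up rows standing in for the Title column of the dataset (illustrative only).
sample_rows = [
    {"Title": "Book A"}, {"Title": "Book B"}, {"Title": "Book A"},
    {"Title": "Book C"}, {"Title": "Book A"}, {"Title": "Book B"},
]


def top_titles(rows, n=5):
    """Return the n most-reviewed titles as (title, review_count) pairs."""
    counts = Counter(row["Title"] for row in rows)
    return counts.most_common(n)  # sorted by review count, descending


top = top_titles(sample_rows, n=2)
print(top)
```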

Our Amazon book review dataset also has a review/score column that contains the star rating given in each review. So while we are at it, we will also look at a breakdown of all reviews (1-star to 5-star) for the top five books. For that, we will create a group for ratings within the group for books, as shown below.

Group by function in Gigasheet on multiple columns
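Nesting one group inside another corresponds to counting by a pair of keys. The sketch below shows that in Python with made-up rows. It assumes scores are stored as text such as "5.0", which the dataset description does not confirm, and `score_breakdown` is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Made-up rows; scores are assumed to be stored as text like "5.0".
sample_rows = [
    {"Title": "Book A", "review/score": "5.0"},
    {"Title": "Book A", "review/score": "5.0"},
    {"Title": "Book A", "review/score": "2.0"},
    {"Title": "Book B", "review/score": "4.0"},
]


def score_breakdown(rows):
    """Map each title to a Counter of {star rating: number of reviews}."""
    breakdown = defaultdict(Counter)
    for row in rows:
        stars = int(float(row["review/score"]))
        breakdown[row["Title"]][stars] += 1
    return breakdown


breakdown = score_breakdown(sample_rows)
for title, scores in breakdown.items():
    print(title, dict(sorted(scores.items())))
```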

Now, let us visualize our findings. We have the top five books and a breakdown of their ratings. We simply select the row counts of each review/score group for the top five books, right-click, and choose Chart Range > Column > Grouped.

Data visualization of amazon book reviews using Gigasheet

Finding Book Recommendations for Frequent Reviewers

Let's say we are booksellers or publishers who want to recommend books to our frequent readers. Analyzing their reviews can tell us a lot about their favorite genres, authors, or what aspect of a book they like most. Then, we can recommend books from our catalog that fit their taste based on our findings.

So, let us analyze our Amazon book reviews dataset and see if we can find our frequent readers and recommend books to them. We will group the data by the profileName column and sort the rows in descending order of review count. These are our most frequent reviewers:

Grouped rows in Gigasheet showing amazon book reviews

Let us now see which books we can recommend to Shalom Freedman. First, we will copy the name to our clipboard. Then, we will create a filter to show books reviewed by Shalom Freedman that also have a 5-star rating.

Filters in Gigasheet.
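The same two steps, finding the most frequent reviewers and then filtering one reviewer's 5-star reviews, can be sketched in Python as follows. The reviewer names and titles here are placeholders, not values from the dataset, and both function names are hypothetical.

```python
from collections import Counter

# Placeholder reviewers and titles (not taken from the real dataset).
sample_rows = [
    {"profileName": "Reader One", "Title": "Book A", "review/score": "5.0"},
    {"profileName": "Reader One", "Title": "Book B", "review/score": "3.0"},
    {"profileName": "Reader One", "Title": "Book C", "review/score": "5.0"},
    {"profileName": "Reader Two", "Title": "Book A", "review/score": "5.0"},
]


def top_reviewers(rows, n=5):
    """Return the n most active reviewers as (name, review_count) pairs."""
    return Counter(row["profileName"] for row in rows).most_common(n)


def five_star_titles(rows, reviewer):
    """List the titles a given reviewer rated 5 stars, sorted alphabetically."""
    return sorted({
        row["Title"]
        for row in rows
        if row["profileName"] == reviewer and float(row["review/score"]) == 5.0
    })


print(top_reviewers(sample_rows, n=1))
print(five_star_titles(sample_rows, "Reader One"))
```

Matching on profileName follows the walkthrough above. If two users share a display name, filtering on User_id instead would keep their reviews apart.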

Researching just a handful of books reviewed by reader Shalom Freedman tells us a great deal about their reading taste. Here is what the dataset tells us:

  1. Most books read and rated 5-star by Shalom fall into fiction and short story categories.
  2. Shalom has read multiple books by novelist Franz Kafka, who may be one of their favorite authors.
  3. Shalom Freedman has read multiple books on Jewish culture.

So, after analyzing our Amazon book review dataset, we can recommend short stories, fiction, Jewish literature, or the work of Franz Kafka to Shalom Freedman.

Sentiment Analysis of Amazon Book Reviews

Sentiment analysis is typically a Natural Language Processing (NLP) technique that uses language models and machine learning to analyze and classify sentiment in pieces of text. But we can also perform basic sentiment analysis to validate simple hypotheses without requiring any AI model or writing any code.

First, we will need a set of rules to help us identify sentiment in reviews. For instance, words like hate, awful, terrible, and disappointing strongly indicate that a reader did not have a good time with a book, so we can classify reviews containing them as negative. Words like love, wonderful, amazing, and great, on the other hand, point toward a positive sentiment.

So, let us create a filter to show only reviews that have words and phrases with negative connotations, as shown below.

Filters in Gigasheet.

Then we will group the rows by the review/score column. We can see that reviews with a rating of 3 or below contain more words with negative connotations.

Similarly, we can create a filter to show reviews that contain words and phrases with positive connotations. We can see that 5-star reviews contain more positive words than low-rated reviews.

Grouped rows in Gigasheet.
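The rule-based approach above can also be written down in a few lines of Python. This is a sketch of the idea rather than a replica of the Gigasheet filter: the word lists come from the examples above, the sample reviews are made up, and the "mixed" and "neutral" labels are an added assumption for reviews that match both lists or neither.

```python
import re
from collections import Counter, defaultdict

# Word lists taken from the examples in the text above.
NEGATIVE = {"hate", "awful", "terrible", "disappointing"}
POSITIVE = {"love", "wonderful", "amazing", "great"}


def classify(text):
    """Label a review by which keyword lists its words appear in."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    has_negative = bool(words & NEGATIVE)
    has_positive = bool(words & POSITIVE)
    if has_positive and has_negative:
        return "mixed"
    if has_positive:
        return "positive"
    if has_negative:
        return "negative"
    return "neutral"


def sentiment_by_score(rows):
    """Map each star rating to a Counter of sentiment labels."""
    table = defaultdict(Counter)
    for row in rows:
        stars = int(float(row["review/score"]))
        table[stars][classify(row["review/text"])] += 1
    return table


# Made-up reviews (illustrative only).
sample_rows = [
    {"review/score": "5.0", "review/text": "A wonderful, amazing read."},
    {"review/score": "5.0", "review/text": "I love this book."},
    {"review/score": "1.0", "review/text": "Awful plot and a terrible ending."},
    {"review/score": "3.0", "review/text": "Great premise, disappointing finish."},
]

by_score = sentiment_by_score(sample_rows)
for stars in sorted(by_score):
    print(stars, dict(by_score[stars]))
```

The sketch matches whole words rather than substrings, so "whatever" is not counted as containing "hate". The trade-off is that inflected forms such as "loved" or "hated" are missed unless they are added to the lists.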

Now, let us visualize this trend using a pie chart. Make sure the row count is visible for any one column, select all the rows, right-click, and choose Chart Range > Pie > Pie.

Data visualization using Gigasheet.

Get Started with Gigasheet

Analyzing customer reviews on e-commerce platforms helps businesses understand the needs and wants of their customers. Large companies have used big data analytics for years to gain an edge over the competition. But the availability of free-to-use tools now allows even small businesses and startups to understand their customers, provide a better customer experience, and grow.

That concludes our analysis of 3 million book reviews on Amazon. Wasn't it quick and easy? Gigasheet makes it easier for non-tech marketers, beginner-level analysts, and casual users to analyze big data. You can work with datasets with billions of rows of data and visualize your findings with a click of a button.

Sign up to get your free Gigasheet account today, and explore more big data analyses like this one on our blog.
