Shopify Fraud Analysis

Shopify’s a name you can’t miss if you’re in the e-Commerce business. After all, the B2B e-Commerce platform has a whopping 1.7 million customers and is used to automate much of the electronic shopping experience for customers and businesses alike. If you haven't already be sure to check out our other post on Analyzing Shopify Data Exports.

If you’re looking to upscale your shop, learn trends and patterns from your customers, and analyze your Shopify big data, Gigasheet can greatly help you out! In this article, I’ll explore a sample sales dataset that I’ve acquired from Shopify using standard data analysis techniques. Let’s see what we find! 🚀

If you're not already familiar with Gigasheet, it's a free big data analysis spreadsheet. You can use it as an online CSV viewer of course, or to do things like merge tons of CSV files. But perhaps most relevant for this post, we'll be using it for big data analysis. The great thing about Gigasheet is that it works much like a spreadsheet, so no training is required.

Analyzing Shopify’s Data

The dataset we’ll be analyzing was part of Shopify’s recruitment challenges from 2021. If you’d like to follow along or get a sampler, you can acquire the dataset from Kaggle here. Once done, head over to Gigasheet, upload the dataset, and you’re good to follow along – starting with Data Exploration.

Upload file to analyze fraud data

Shopify Data Exploration

Let’s first take a quick look at the data and see what we’ve got here. From the UI itself, we can see it has 7 columns and 5,000 rows. Out of the 7 columns, three of them are ID fields (shop, user, and order) and the remaining 4 show more information about the orders themselves.

As part of exploration, we can also look for missing values which can throw off our analysis moving forward.

There are several ways to look for missing values in Gigasheet:

  • You can apply filters to your columns to see if the value is empty
  • You can group values for the column and see if any grouping has an empty label

We’ll look at missing values in the Data Cleanup section in a little more detail.

Shopify Data Cleanup

If you’ve had any hits against the filter or grouping, you can now delete all the matching rows or use a substitute value. Say you’d like to use the average value – you can group the values against that column and calculate the average. Here’s an example of the data grouped by “shop_id” and the average of “order_amount” (click the down arrow for more options).

Grouped Shopify data in Gigasheet

Luckily, this dataset doesn’t have any missing value. Neither does the rest of the dataset need cleanup. For a bonus or if you’d like to map the dataset against days, months, or year, you can explode the date. Here’s an example of how the “exploded” date looks:

Explode data breaks Shopify dates into its components

No more Python functions, memorizing datetime functions, or using the tz library to modify the time zones – it’s super simple to work on dates using Gigasheet!

Shopify Data Analysis

Grouping the shop_id column, we see the dataset contains sales data from 100 stores (the “Rows” count shows the number of grouped values). Using the “Row Count” aggregation on the order_id column, we can retrieve the number of orders against each of these shops. Here are the top five stores:

Shopify Fraud Data grouped by the stores

However, there’s no anomaly against the number of orders from a store. They’re all averaging the same figures. We can pivot from this field for now.

Next, let’s sort the data against the average of the order_amount column. Two shops stick out of the dataset and are clear outliers – shop 42 and 78 with an order amount of 235,101 and 49,213. How about we check which users account for these insanely large orders?

Average order amount of the Shopify Data: Detecting Fraud

We can group the data against the user ID’s and then use the average aggregation on the order_amount column. It’s likely we’re observing some form of fraud from the user against the ID, 607 and 878 (most likely a normal user spending more than the average figure).

User ID Groups for Shopify Data Analysis

We can also use a sweet bar graph to view this data visually using charts:

Creating Charts of Shopify Data in Gigasheet

Did you know? You can create all kinds of visual diagrams and graphs in Gigasheet, customize them, and save them as PNGs for your reporting!

Let’s first pivot our analysis on the user 607 because the user clearly indicates fraud and patterns of abnormal usage. I’ll use a filter to select all data from the user.

Once filtered, we can see an even clearer pattern of some form of automated fraud activity where the user purchases 2000 items every day at 04:00 UTC from the shop, 42, using a credit card. This is millions of dollars in potentially bogus sales!

A clear pattern of fraud activity is apparent in the Shopify Data

Now that the analysis on 607 is concluded, let’s pivot to the second most suspicious user – 878. Filtering results in a few rows with one outlier – the total order_amount is 154350 from the shop with the ID, 78. Well, this isn’t substantial evidence of anything at the moment.

Getting closer to the fraudulent Shopify data

Finally, I’ll pivot into the third user with the ID – 766. Again, another order looks suspicious with the total order_amount being 77175 from the shop with the ID, 78. Well, the store seems to be behind several suspicious orders. Maybe this should be our next pivot.

The fraud is found!

Filtering on the shop_id of 78, we get 46 results with almost all order_amount values pointing to an anomalous figure. Some of these order amounts are the same against multiple user IDs . The shop seems to be the hub for fraud as well.

Fraudulent transactions in shop id 78

Reporting

At the start of our analysis, the average order value from the dataset of 100 stores selling shoes was around $3145.13 which was clearly anomalous for something like shoes. As part of our data analysis using Gigasheet, we uncovered two scenarios of fraud from the shop ID, 78, and user ID, 607.

After removing the two outliers from our dataset, we get an average order value of 302.58 which is how it should’ve been before.

What’s Next?

That’s all for this Shopify fraud data analysis, folks! Hopefully techniques like this will help you prevent fraud in your shop. Beyond fraud prevention, Gigasheet offers a wide range of features for Shopify owners and online sales analysts to explore large datasets, create graphs, and report results in beautiful visuals.

Want to take Gigasheet out on a test ride? Well, you can… for free! Sign up for an account today, upload your data, and go crazy – billions of rows and we don’t even break a sweat!

The ease of a spreadsheet with the power of a data warehouse.

No Code
No Training
No Installation
Sign Up, Free

Similar posts

By using this website, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.