Remove Duplicate Rows in Excel Based on Two Columns, and more!

The power of deduplication of rows in a spreadsheet and how it helps with big data sets cannot be overstated! Whether it’s a spreadsheet of customer data or a large dataset with millions of records, data deduplication can help you save time and reduce errors while ensuring that you have accurate, reliable data.

Data Deduplication is the process of removing duplicate entries from a dataset based on a single column, on two columns, or even multiple columns. It can help to reduce the size of a dataset, improve data integrity and accuracy, and improve the performance of analytics. It can also reduce the cost of storage and help you gain insights more quickly by eliminating redundant data.

Removing duplicate rows can be easily accomplished in most spreadsheets, and has been a top requested feature from Gigasheet users, who rely on our product to manage data that has too many rows for Microsoft Excel or over the cell limit in Google Sheets. Gigasheet is a free big data spreadsheet that makes analyzing big data easy, no code required.

In this blog, we will explore how deduplication can help you get the most out of your data by removing duplicates! We will cover:

How Does Removing Duplicates Work?
How Sorting Your Data affects Deduplication
How to Deduplicate based on Multiple Columns
How to Remove Duplicates in Excel based on One Column
Remove Duplicate Rows in Excel based on Two Columns
Google Sheets Remove Duplicate Rows based on One Column
Remove Duplicate Rows in Google Sheets based on Two Columns
Remove Duplicates in Gigasheet

How Does Removing Duplicates Work?

In its simplest form, removing duplicates works by checking the values of a single column for duplicates, and then deletes the rows containing duplicate instances of the value.

To illustrate this, we are going to look at a very simple file with columns containing Letters, Numbers, and Colors. For a simple deduplication, let's use the Letter column to remove duplicates.

How to Remove Duplicates in excel, 1 column example

We can immediately see that there are 3 unique values in the letter column:

A - 3 instances
B - 2 Instances
C - 5 Instances

When we run a deduplication of rows, the first instance of each value will be kept, along with the rest of the data in the row. The repeated instances of the value (highlighted in red) will be deleted, along with the rest of the data in the row.

Here are the results of running deduplication on the Letter column. As you can see, only the rows containing first instance of each letter remain:

Excel Remove Duplicate Rows based on one column

Pro Tip: Sort to Desired Order Before Removing Duplicates

In the example above, we demonstrated that the first row containing a duplicate value is retained, while the remaining rows with duplicate values are deleted. What if you want a particular value to remain?

You can control the first row by applying a sort to your data prior to removing duplicates from the file!

In our file above, A is paired with the colors Red and Yellow. Let's say Yellow is more important than Red. We can first sort the file on the column Color, with Yellow appearing above Red, which would look like this:

Sorting before remove duplicates google sheets

And removing duplicates on the Letter column would yield the following results:

Different Data Duplication Results based on Sorting

As you can see, as result of sorting, the letters A and C both now have the color Yellow in the Color column, whereas before the value Red was retained.

Real world examples would include sorting on a Date column to retain the most recent transaction by customers, or sorting on a Purchase Amount column to retain the highest price paid by customers.

Removing duplicates keeps the row associated with the first instance of a value, so we can use this to our advantage and sort to produce desired results.

Removing Duplicate Rows Based on Two or More Columns

A more advanced version of removing duplicate rows uses a combination of multiple columns as the value to deduplicate. Instead of looking for a single column, the Remove Duplicates function considers multiple columns when looking for duplicate values to remove.

Sound complicated? It's probably easier to grasp in a picture, so let's use the same sample file as above and remove duplicates based on Letter and Number.

Remove Duplicate Rows in Excel based on two columns

When factoring in both the Letter and Number columns, we have more unique values to consider because we treat the combined value as a instance:

A1 - 2 Instances
A2 - 1 Instance
B1 - 1 Instance
B2 - 1 Instance
C1 - 2 Instances
C1 - 2 Instances

As a result, there are less duplicates even though the file is exactly the same as before.

When we run a deduplication using both Letter and Number, the first instance of each value will be kept, along with the rest of the data in the row. The repeated instances of the value (highlighted in red) will be deleted.

Here is the output:

Results of Removing duplicate rows in excel based on two columns

Removing duplicates can be performed on any number of column selections. Here is an example of using our same file, but factoring in all 3 columns, Letter, Number, and Color, as part of the deduplication process.

As you can see there are 2 instances that are repeated, when considering all 3 columns: A1Red and C1Red:

Here is the output after deduplication has been run using all 3 columns:

Results of Data Deduplication using all columns

Behind the Scenes: Combining Columns before Removing Duplicate Rows

Find deduplicating multiple columns confusing? Let's provide some additional clarity by demonstrating what is happening behind the scenes. Basically, the deduplication function is combining the columns Letter and Number prior to looking for duplicates.

In this file I have combined the Letter and Number columns into a single column called Sample Combined Column. This combination simply takes the Letter value and appends the Number value. For Example, A and 1 become A1. And it continues for each value.

Combined columns are the same as duplication based on two columns

Now let's remove duplicates from our new column! Here we can clearly see that we are sorting through the same values that were considered in the example above, when we were removing duplicates on Letter and Number:

A1 - 2 Instances
A2 - 1 Instance
B1 - 1 Instance
B2 - 1 Instance
C1 - 2 Instances
C1 - 2 Instances

The output file will match the output of our example of removing duplicate rows based on two or more columns, the only difference being we have a new column called Sample Combined Data :

Results of data deduplication based on two columns

Hopefully it's clear now how selecting multiple columns works when removing duplicates. Behind the scenes, it combines columns and then searches those results for duplicates. We can do this manually, but it's always nice to skip a step!

How to Remove Duplicates in Excel Based on One Column

Removing duplicates is a quick process using Excel once you locate the Remove Duplicates button, which can be found on the right side of the Data tab.

Data Toolbar in Excel Containing Remove Duplicates in Excel

Remove Duplicates Button in Microsoft Excel

In this example, we are going to use the exact same example file from above. I loaded it into Microsoft Excel:

Remove duplicates excel based on one column

Clicking Remove Duplicates will bring up a new menu where you can select the columns to be included in the data deduplication calculation. Here, we are using the same file from our example above, removing duplicates on Letter:

Excel does have a checkbox for My data has headers that needs to remain checked if your data does include headers. This tells Excel to keep the first row and not try to find duplicates of the header values.

Click OK to proceed with the deduplication, and Excel will notify you how many rows include duplicates, and how many will remain once the process completes.

Hit OK to remove duplicates in Microsoft Excel

Hit OK again and the deduplicated output file will be ready for analysis!

How to Remove Duplicate Rows in Excel Based on Two Columns

Removing duplicates based on two columns in Excel is easy. You can either use the Remove Duplicates Feature and select two columns, or manually combine columns and then use Remove Duplicates on that single column. Let's walk through both examples!

Use Remove Duplicates Feature and Select Two or More Columns

Removing duplicate rows in Excel based on two columns is executed the same as removing duplicates from a single column. Begin by clicking Remove Duplicates on the Data tab.

Microsoft Excel Remove Duplicates Button on the Data Tab

This time, instead of selecting just one column, we are going to select both the Letter and Number columns and hit OK.

Behind the scenes, Excel will combine the two columns into a single value and then search the list of the combined values for duplicates.

Removing Duplicates based on two columns in Excel

A pop up will appear indicating that 3 duplicate values have been found, and 7 unique values remain.

In the examples above where we deduplicated data based on a single column, you will recall that 7 duplicate values were found. There are less duplicates because the deduplication is considering Letter + Number, so values like A1.

Pop-up window when Removing Duplicates Excel based on multiple columns

Hit OK to remove duplicates from your file. The output in Excel will look like this, which no longer contains any duplicated Letter + Number combinations.

Data Deduplication Excel based on Two Columns

Use formulas to first Combine Columns, and Then use Remove Duplicates

Removing duplicate rows in Excel based on two columns can also be done manually by combining the columns prior to running Remove Duplicates. All it takes is a simple formula.

Not comfortable with formulas? This can be your first!

In our sample file, let's create a new column called Combined Letter and Number. In the box to the right of our first value, type:

=A2&B2

Combining columns in order to remove duplicates based on two columns in excel

Once you hit Enter, this will create the combined Letter + Number Value. The "&" basically tells Excel to stick two values together. In this case, it's A2 and B2. The resulting value is A1.

Either drag or copy and paste the formula down to the other rows, and the results will look like this:

Deduplication of Values based on 2 columns

Here we can clearly see that there are 3 duplicates in our new Combined Letter and Number column. Let's hit Remove Duplicates and select this column to clean up our file!

Select the combined column for deduplication

Hit OK, and the output is the same as just selecting two values in the Remove Duplicates tool, but with our new column on the right side.

The output of deduplication of Values based on 2 columns

Now that we have removed duplicates based on one and many columns from Excel, let's take a tour at removing duplicates from other spreadsheets.

Google Sheets Remove Duplicate Rows based on One Column

Removing duplicates is also easy in Google sheets. We will take the same example file through the deduplication process. To begin, go to Data and then click on Data Cleanup and select Remove Duplicates.

How to remove duplicates in google sheets

The remove duplicates tool will appear and ask you to select which columns contain duplicates.

If Sheets does not automatically recognize the data headers, check the Data has header row box and the data headers will be added to the column letters in the drop-down box. To repeat our example, we are going to select Letter and then hit Remove Duplicates.

Sheets will alert you with how many duplicates are found, and how many rows will remain. Hit OK.

Hit OK to remove duplicates in google sheet

And there you have it! A clean file, free of duplicates.

Results of remove duplicate google sheets

Remove Duplicate Rows in Google Sheets based on Two Columns

Similar to Excel, there are two ways to remove duplicate values from Google sheets based on two or more columns.

Use Remove Duplicates Feature and Select Two or More Columns

Removing Duplicates based on two columns is as simple as selecting two columns when running the Remove Duplicates function. Here is the same sample file loaded into Google Sheets, with the duplicates highlighted based on two columns.

Duplicate Rows in Google Sheets based on two columns

Let's check both Letter and Number:

Remove Duplicate Rows in Google Sheets based on Two Columns

And the pop-up notification indicates that the same 3 duplicates are found.

Results Pop Up indicating duplicates removed based on two columns

Here is our clean file:

Output file of data deduplication based on multiple columns

Use formulas to first Combine Columns, and Then use Remove Duplicates

If we want to remove duplicates based on multiple rows manually, we can first combine the Letter and Number columns and then deduplicate our combined column.

We are going to use the same formula with the "&" that we used in Excel to create a new column called Combined Letter and Number. In the box to the right of our first value, type:

=A2&B2

Creating a formula in Google sheets to deduplicate based on multiple columns

Drag or copy and paste the values down into the remaining rows and the file will look like this:

Combining columns for data deduplication in Google Sheets

Now run Remove Duplicates on the Combined Letter and Number column, and we should get the same 3 duplicates removed with 7 rows remaining.

Removing Duplicates Google Sheets based on two columns combined by a formula

There they are! Just like in Microsoft Excel, Google Sheets has found and the 3 duplicates in our new column.

Data Deduplication notification in Google Sheets

And here is the final file, which matches the one above, with the exception of the Combined Letter and Number column that we added.

Rows removed based data deduplication of a combined column in Google Sheets

Removing Duplicates in Gigasheet based on One Column

Lastly, we want to show you how this works in Gigasheet. By the way, you can open Excel files directly in Gigasheet, if you want to try it for free!

Gigasheet makes it easy to remove duplicates from any file, and we support up to a billion rows of data! Throw your biggest data sets at Gigasheet, clean up the file, and use the simple tools for data analysis.

Removing duplicates is as easy as selecting the Data Cleanup menu and clicking Delete Duplicates.

Data Deduplication in Gigasheet using Delete Duplicates function

A pop-up box appears and you can select the columns to include in the remove duplicates process. Here we are using the same file as in our examples above and selecting the Letter column to search for duplicates. If we wanted to include multiple columns, I would simply select more.

Remove Duplicates based on one column in Gigasheet

Click Remove and a warning will appear notifying you how many values will be deleted. Click Proceed to remove duplicates.

Preview of Removing Duplicates in Gigasheet

And just like that, you will be taken back to your file, which will be sparkling clean and free of duplicate values. This is the same output we created in our first example above! Now you know how to do it in Gigasheet.

Results of Removing Duplicates from Gigasheet

Removing Duplicates in Gigasheet based on Two Columns

Removing Duplicates based on multiple values is easy in Gigasheet as well. Simply select multiple columns when running the Delete Duplicates function. For Example, here I am selecting Letter and Number.

Remove duplicates in Gigasheet based on two columns

Use Combine Columns and then Remove Duplicates

If after this entire blog, you still want to combine columns before running Delete Duplicates, Gigasheet has a very simple combine column feature, and no formulas are required. Let's combine columns Letter and Number into a new column called Sample Combined Column.

Combining Columns prior to Data Deduplication

The new column is added to our file and the values from Letter and Number are combined. Now we can remove duplicates from Sample Combined Column.

Remove duplicates on a combined column in Gigasheet

And remember, Gigasheet can easily handle data that has too many rows for Microsoft Excel or over the cell limit in Google Sheets.

Final Thoughts About Using a Spreadsheet to Deduplicate Data

Spreadsheets are a wonderful way to manipulate data, and removing duplicates is a function that makes data cleanup easy. Duplicates can be identified in one column, or across multiple columns, and the entire row is removed.

Therefore, it's not a surprise that Microsoft Excel, Google Sheets, and Gigasheet make it quick and easy to remove duplicates from a file. However, only one allows you to remove duplicates from millions of rows of data. Try Gigasheet now for free!

‍

How To

Remove Duplicate Rows in Excel Based on Two Columns, and more!

How Does Removing Duplicates Work?

Pro Tip: Sort to Desired Order Before Removing Duplicates

Removing Duplicate Rows Based on Two or More Columns

Behind the Scenes: Combining Columns before Removing Duplicate Rows

How to Remove Duplicates in Excel Based on One Column

How to Remove Duplicate Rows in Excel Based on Two Columns

Use Remove Duplicates Feature and Select Two or More Columns

Use formulas to first Combine Columns, and Then use Remove Duplicates

Google Sheets Remove Duplicate Rows based on One Column

Remove Duplicate Rows in Google Sheets based on Two Columns

Use Remove Duplicates Feature and Select Two or More Columns

Use formulas to first Combine Columns, and Then use Remove Duplicates

Removing Duplicates in Gigasheet based on One Column

Removing Duplicates in Gigasheet based on Two Columns

Use Combine Columns and then Remove Duplicates

Final Thoughts About Using a Spreadsheet to Deduplicate Data

The ease of a spreadsheet with the power of a data warehouse.

Similar posts

How To

How to Automate HubSpot Imports with Gigasheet

How To

Easy Fuzzy Matching: Calculate Levenshtein Distance Without Code

How To

How to Connect Excel to Databricks

Webinar: AI for Big Data Analysis
Aug 7 @ 1PM ET

How To

Remove Duplicate Rows in Excel Based on Two Columns, and more!

Will Andrews

How Does Removing Duplicates Work?

Pro Tip: Sort to Desired Order Before Removing Duplicates

Removing Duplicate Rows Based on Two or More Columns

Behind the Scenes: Combining Columns before Removing Duplicate Rows

How to Remove Duplicates in Excel Based on One Column

How to Remove Duplicate Rows in Excel Based on Two Columns

Use Remove Duplicates Feature and Select Two or More Columns

Use formulas to first Combine Columns, and Then use Remove Duplicates

Google Sheets Remove Duplicate Rows based on One Column

Remove Duplicate Rows in Google Sheets based on Two Columns

Use Remove Duplicates Feature and Select Two or More Columns

Use formulas to first Combine Columns, and Then use Remove Duplicates

Removing Duplicates in Gigasheet based on One Column

Removing Duplicates in Gigasheet based on Two Columns

Use Combine Columns and then Remove Duplicates

Final Thoughts About Using a Spreadsheet to Deduplicate Data

The ease of a spreadsheet with the power of a data warehouse.

Similar posts

How To

How to Automate HubSpot Imports with Gigasheet

How To

Easy Fuzzy Matching: Calculate Levenshtein Distance Without Code

How To

How to Connect Excel to Databricks

Webinar: AI for Big Data AnalysisAug 7 @ 1PM ET

Webinar: AI for Big Data Analysis
Aug 7 @ 1PM ET