How To
Sep 26, 2023

No Code Web Scraping Using Gigasheet

Imagine you’re performing competitive analysis and want to verify data published on the internet. For instance, let’s say we want to find original publish dates of several blog posts. Now, even if you try to do it for 100 articles, it may take you more than a day to analyze every article’s source code and find their respective publish dates manually.

Wouldn’t it be great if you could just put a list of URL’s into Microsoft Excel and it would scrape the HTML from each location into a cell? Then you could use Excel formulas to extract the data from the HTML that you need, such as the publish date in our example above.

While it is technically possible in Excel, it requires coding using Visual Basic and the steps go something like this:

  • Open Visual Basic Editor (VBE) by pressing Alt + F11
  • Create a new module
  • Copy and paste some sample scraping code that you found on the internet
  • Try to modify the code so that it runs on your particular sheet
  • Run the macro
  • Debug the macro
  • Repeat until you get the data

Great news! There’s a better, faster way to to perform no code web scraping.

We ran an experiment on over 500 articles published on Neil Patel’s website and found the original publish date for all these articles—in under 10 minutes. And it is all done in a familiar spreadsheet format, that any Microsoft Excel user will love.

Here’s a breakdown of the steps we followed to accomplish this with Gigasheet, the big data spreadsheet and the custom enrichment feature.

No Code Web Scraping Using the Custom Enrichment Feature

Whether you want to make a list of price points for all your competitors’ products or create a database of topics you’ve covered on your site so far, you can collect any data in minutes with Gigasheet. 

Here’s how we ran an experiment to find the publish date for over 500 articles on a website:

Step 1: Curate all the links in a new Gigasheet file

To start the experiment, we created a new file on Gigasheet and made a list of links for all the blogs we wanted to analyze. 

You can log into your Gigasheet account (or create a new one) and click on the Blank Sheet option to get started. 

Create a blank sheet in Gigasheet

Then give your file a name and keep adding all the links in any column. Be sure to give a relevant name to your column as well. 

Add blog posts URL to Gigasheet for no code web scraping

Step 2: Create a new custom enrichment request 

Once you’ve added all the links, go to Enrichments and scroll down to select the Custom Enrichments option. 

You can also run enrichment requests for other apps, like, Agify, ChatGPT, etc. 

Gigasheet create a new custom enrichment request

Step 3: Add ScrapingBee API key to the Custom Enrichment

Next, you’ll be prompted to add an API key link for your enrichment request. We are going to use the ScrapingBee API to perform the no code web scraping for us, extracting the HTML source code for all the articles. ScrapingBee comes with a free version, so you can try this out yourself before ever having to pay. The nice thing about using a service like ScrapingBee is that it handles headless browsers and rotates proxies for you so you don't have to worry about getting blocked while scraping. We add the ScrapingBee API key link in this step. Then hit next. 

Gigasheet custom enrichment request API key

Step 4: Customize the curl Request with our URL from the Sheet

In this step, you can customize the API key URL to scrape data for the links you added to your list in step 1. To do that, select the part of the link starting after the = sign

Customize the API key based on our data

Then click on the drop-down menu on the right and select the column with all the links. Hit Insert Column Reference and your API key link will change to this. Now, press Test to run the request. 

Customize the API key based on our data  for no code web scraping

Step 5: Run the enrichment process

The custom enrichment process will start immediately. It’ll take a few minutes to complete, depending on the number of data points you’re covering. 

Custom enrichment complete

Before displaying the data, Gigasheet will show you a preview of what the results will look like. Click on Next to continue the enrichment process and get the entire source code for the selected blogs. 

Here’s how the results will appear:

Custom enrichment results with Scraping Bee data after no code web scraping

We successfully used a spreadsheet to perform no code web scraping. Now that it’s in our sheet, let’s pull out the relevant data!

Extracting Information from the Web Scraping Results

Use the split column feature to separate relevant data

Once all the data is available in your Gigasheet file, go to the Data Cleanup menu and select the Split Column feature. This will help you segregate your columns and get whatever data you need.

Split your custom enrichment results to find original publish dates of content pieces

Here, you have to select the column containing the source code for each article. In separator, we added the text just before the published date. Then hit Apply

Gigasheet split column feature

This step divided the HTML code into two columns. Part 1 had the code till the publish date, and part 2 had the code starting from the publish date. Check the results below:

Split column results

Use a formula to identify the data you need

Now, click on the plus icon on the top-right corner of your file and add a formula. 

We used a formula to get the exact publish date from the part 2 column. With this formula, you can create a new column including the first n number of characters in the defined column. 

So, in our case, we gave the new column the name “Published Date”. Then we added the formula LEFT and added the part 2 column as reference. We also added a criteria to select the first 10 characters from this column. 

Use a formula to identify the data you need

Here are the results: a complete list of only the published dates for all the blogs. 

a complete list of only the published dates for all the blogs.

We repeated these steps to find the meta descriptions for all the blogs. Here are the results:

Finding meta-descriptions in bulk

Find data points from anywhere on the internet with Gigasheet! 

While our experiment presented a great use case for content managers, you can replicate these steps for extracting data for virtually any task, especially from web pages. No code web scraping is a powerful tool! 

Once you have the data in spreadsheet form, you can organize it database by filtering, sorting, or grouping results. Besides these basic features, you can try the IF/THEN function or data cleanup functions to better represent your dataset. Gigasheet’s AI Sheet Assistant is another efficient way to navigate a large dataset and find what you need.  

Sign up for free to extract data from anywhere with Gigasheet.

The ease of a spreadsheet with the power of a database, at cloud scale.

No Code
No Database
No Training
Sign Up, Free

Similar posts

By using this website, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.