Saturday, May 23, 2026

How to Use Google Sheets for Web Scraping and Campaign Building


Editor’s note: As 2021 winds down, we are running a 12 Days of Christmas countdown of this year’s most popular and helpful expert articles on Search Engine Journal.

The collection was curated by our editorial team based on each article’s performance, usefulness, quality, and the value it created for you, our readers.

Each day until December 24, we will republish one of the best columns of the year, starting at No. 12 and counting down to No. 1. Today’s column is No. 5, originally published on August 4, 2021.

This how-to guide from Andrea Atzori teaches readers how to use Google Sheets for web scraping and campaign building without any coding experience.

Enjoy!


We have all encountered situations where we had to extract data from a website at some point.

For example, when working on a new account or campaign, you may not have the data or information you need to create the ads.

Ideally, we would be given all the content, landing pages, and related information we need in an easy-to-import format such as a CSV, an Excel spreadsheet, or a Google Sheet. (Or, at the very least, we would be given the tabbed data we need, which can be imported into one of those formats.)

But this is not always the case.

Those who lack web scraping tools—or lack the coding knowledge to use things like Python to help with tasks—may have to resort to the tedious work of manually copying and pasting potentially hundreds of entries.

In a recent job, my team was asked to:

  • Go to the client’s website.
  • Download more than 150 new products spread across 15 different pages.
  • Copy and paste the product name and landing page URL of each product into a spreadsheet.

Now, you can imagine how long this would have taken if we had done exactly that and performed the task manually.

It is not only time-consuming. When someone manually goes through that many items and pages, copying and pasting the data product by product, the chances of making a mistake or two are very high.

It would then take even more time to review the document and make sure it was error-free.

There must be a better way.

Good news: there is! Let me show you how we did it.

What is IMPORTXML?

Enter Google Sheets. I want you to get to know the IMPORTXML function.

According to Google’s support page, IMPORTXML “imports data from any of various structured data types including XML, HTML, CSV, TSV, and RSS and ATOM XML feeds.”

Essentially, IMPORTXML is a function that lets you scrape structured data from web pages, with no coding knowledge required.

For example, you can quickly and easily extract data such as page titles, descriptions, or links, as well as more complex information.

How does IMPORTXML help scrape web page elements?

The function itself is very simple and only requires two values:

  • The URL of the web page from which we intend to extract or scrape information.
  • The XPath of the element that contains the data.

XPath stands for XML Path Language and can be used to navigate through the elements and attributes in an XML document.

For example, to extract the page title from https://en.wikipedia.org/wiki/Moon_landing, we would use:

=IMPORTXML("https://en.wikipedia.org/wiki/Moon_landing", "//title")

This will return the value: Moon landing - Wikipedia.

Or, if we are looking for page descriptions, try this:

=IMPORTXML("https://www.searchenginejournal.com/","//meta[@name='description']/@content")
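
For readers curious about what these two formulas do conceptually, here is a minimal Python sketch using only the standard library. It is not how Google Sheets implements IMPORTXML. The function name import_xml_sketch and the sample page are assumptions made for illustration, and the sample is deliberately well-formed so that the XML parser accepts it.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page standing in for a fetched URL.
SAMPLE_PAGE = """<html>
  <head>
    <title>Moon landing - Wikipedia</title>
    <meta name="description" content="A sample description." />
  </head>
  <body><h1>Moon landing</h1></body>
</html>"""


def import_xml_sketch(markup, xpath):
    """Return the text (or attribute values) of every node matching xpath."""
    root = ET.fromstring(markup)
    attribute = None
    # ElementTree cannot select attributes, so peel off a trailing "/@name".
    if "/@" in xpath:
        xpath, attribute = xpath.rsplit("/@", 1)
    # ElementTree paths are relative, so "//title" becomes ".//title".
    matches = root.findall("." + xpath)
    if attribute is not None:
        return [m.get(attribute) for m in matches if m.get(attribute) is not None]
    return [(m.text or "").strip() for m in matches]


titles = import_xml_sketch(SAMPLE_PAGE, "//title")
descriptions = import_xml_sketch(
    SAMPLE_PAGE, "//meta[@name='description']/@content"
)
print(titles)
print(descriptions)
```

The sketch only handles the simple query shapes used in these two examples. The standard library supports a small subset of XPath, which is one reason the spreadsheet function is the more convenient option here.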

Here is a shortlist of some of the most common and useful XPath queries:

  • Page title: //title
  • Page meta description: //meta[@name='description']/@content
  • Page H1: //h1
  • Page links: //@href
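
Real pages are rarely well-formed XML, so a strict XML parser would often reject them. As a rough illustration of what the four queries above collect, the sketch below gathers the same four pieces of information with Python’s built-in HTMLParser. The class name CommonFieldsParser and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page with unclosed tags, as often found on real sites.
SAMPLE_HTML = """<html><head><title>PPC Archives</title>
<meta name="description" content="Hypothetical description.">
<link rel="stylesheet" href="/style.css">
</head><body><h1>Pay-per-click</h1>
<p><a href="/article-one/">Article one</a><br>
<a href="/article-two/">Article two</a></p></body></html>"""


class CommonFieldsParser(HTMLParser):
    """Collect the title, meta description, H1 text, and every href."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self.h1 = []
        self.links = []
        self._current = None  # tag whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1"):
            self._current = tag
            if tag == "h1":
                self.h1.append("")
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content", "")
        # Like //@href, keep every href attribute regardless of the tag.
        if "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current == "h1":
            self.h1[-1] += data


parser = CommonFieldsParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.title, parser.description, parser.h1, parser.links)
```

Note that, like //@href, the sketch also picks up the stylesheet link, which is why a narrower query is usually needed when only article links are wanted.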

See IMPORTXML in action

Since we discovered IMPORTXML in Google Sheets, it has truly become one of our secret weapons for automating many of our daily tasks, from campaign and ad creation to content research and more.

In addition, the function can be combined with other formulas and add-ons for more advanced tasks that would otherwise require sophisticated solutions and development work, such as tools built in Python.

But in this example, we will look at IMPORTXML in its most basic form: scraping data from a web page.

Let us look at a practical example.

Imagine that we have been asked to create a campaign for Search Engine Journal.

They want us to advertise the last 30 articles published under the PPC section of the website.

You might say that this is a very simple task.

Unfortunately, the editors are unable to send us the data and have kindly asked us to refer to the website for the information needed to set up the campaign.

As mentioned at the beginning of this article, one way to do this would be to open two browser windows, one with the website and the other with Google Sheets or Excel. We would then start copying and pasting the information over, article by article and link by link.

But using IMPORTXML in Google Sheets, we can achieve the same output in a fraction of the time, with almost no risk of making mistakes.

Here’s how.

Step 1: Start with a new Google Sheet

First, we open a new blank Google Sheets document:

Start with a blank Google Sheets document.

Step 2: Add the content you need to scrape

Add the URL of the page (or pages) from which we want to grab information.

In our example, we start from https://www.searchenginejournal.com/category/pay-per-click/:

Add the URL of the page to be scraped. Screenshot taken from Google Sheets, July 2021

Step 3: Find XPath

We find the XPath of the element whose content we want to import into our spreadsheet.

In our example, let’s start with the titles of the most recent 30 articles.

Head to Chrome. Hover over the title of one of the articles, then right-click and select Inspect.

Open the Chrome WebDev tool. Screenshot from SearchEngineJournal.com, July 2021

This will open the Chrome Developer Tools window:

Find and copy the XPath of the element you want to extract. Screenshot from SearchEngineJournal.com, July 2021

Make sure the article title is still selected and highlighted, then right-click again and select Copy > Copy XPath.

Step 4: Extract the data into Google Sheets

Go back to your Google Sheets document and enter the IMPORTXML function as follows:

=IMPORTXML(B1,"//*[starts-with(@id, 'title')]")

Points to note:

First, in our formula, we have replaced the URL of the page with a reference to the cell (B1) where the URL is stored.

Second, when you copy an XPath from Chrome, the attribute value inside it is always enclosed in double quotes.

(//*[@id="title_1"])

However, to make sure it does not break the formula, the double quotes need to be changed to single quotes.

(//*[@id='title_1'])
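
The quote swap described above is easy to get wrong by hand, so here is a small Python sketch of the same step. The helper name build_importxml_formula is hypothetical. It simply turns an XPath copied from Chrome into a formula string that can be pasted into a cell.

```python
def build_importxml_formula(url_cell, copied_xpath):
    """Build an IMPORTXML formula string from an XPath copied out of Chrome."""
    # Double quotes inside the XPath would end the formula's string argument
    # early, so swap them for single quotes first.
    safe_xpath = copied_xpath.replace('"', "'")
    return f'=IMPORTXML({url_cell},"{safe_xpath}")'


formula = build_importxml_formula("B1", '//*[@id="title_1"]')
print(formula)  # =IMPORTXML(B1,"//*[@id='title_1']")
```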

Note that in this example, because the ID of each article’s title changes (title_1, title_2, etc.), we have to modify the query slightly and use “starts-with” to capture every element on the page whose ID begins with ‘title’.
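
To make the effect of “starts-with” concrete, the following Python sketch applies the same prefix test to a small, hypothetical listing. Python’s built-in ElementTree does not support the starts-with() XPath function, so the filter is written as plain Python instead. Only the ID pattern (title_1, title_2, and so on) comes from the example above, and everything else in the sample is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical listing markup; only the id pattern comes from the article.
SAMPLE_LISTING = """<div>
  <a id="title_1" href="/first-article/">First article</a>
  <a id="title_2" href="/second-article/">Second article</a>
  <a id="author_1" href="/author/jane/">Jane Doe</a>
</div>"""


def elements_with_id_prefix(markup, prefix):
    """Return every element whose id attribute begins with prefix."""
    root = ET.fromstring(markup)
    # root.iter() walks every element in the tree, much like "//*" in XPath.
    return [el for el in root.iter() if el.get("id", "").startswith(prefix)]


matches = elements_with_id_prefix(SAMPLE_LISTING, "title")
titles = [el.text for el in matches]
print(titles)
```

An exact match on title_1 would return only the first article, whereas the prefix test returns both titles and still skips the author link.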

Here is how this looks in the Google Sheets document:

An example of IMPORTXML. Screenshot taken from Google Sheets, July 2021

A few moments later, once the query has loaded the data into the spreadsheet, the results look like this:

The titles imported in Google Sheets. Screenshot taken from Google Sheets, July 2021

As you can see, the list returns all the articles on the page we just scraped (including my previous piece on automation and how to use ad customizers to improve the performance of Google Ads campaigns).

You can apply the same approach to scrape any other information needed to set up your campaign.

Let’s add the landing page URL, the featured snippet of each article, and the name of the author to our Sheets document.

For the landing page URL, we need to adjust the query to specify that we are after the href attribute attached to the article title.

Therefore, our query will look like this:

=IMPORTXML(B1,"//*[starts-with(@id, 'title')]/@href")

Note that we have appended “/@href” to the end of the XPath.

Import the article links. Screenshot taken from Google Sheets, July 2021

Presto! Straight away, we have the landing page URLs:

Articles and URLs imported in Google Sheets. Screenshot taken from Google Sheets, July 2021

You can do the same for the featured snippets and author names:

All the data is scraped and imported into Google Sheets. Screenshot taken from Google Sheets, July 2021
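
Once the four columns are in place, they still need to be combined into one row per article before they can feed a campaign. The Python sketch below shows one way to do that outside the spreadsheet. The sample values and the column names (title, landing_page, snippet, author) are hypothetical and do not correspond to any particular ad platform’s import format.

```python
import csv
import io

# Hypothetical values standing in for the four scraped columns.
titles = ["First article", "Second article"]
urls = ["/first-article/", "/second-article/"]
snippets = ["Summary of the first.", "Summary of the second."]
authors = ["Jane Doe", "John Roe"]

FIELDS = ["title", "landing_page", "snippet", "author"]


def build_rows(titles, urls, snippets, authors):
    """Zip the scraped columns into one dictionary per article."""
    columns = [titles, urls, snippets, authors]
    # Each column comes from its own query, so check that they line up.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("scraped columns have different lengths")
    return [dict(zip(FIELDS, values)) for values in zip(*columns)]


rows = build_rows(titles, urls, snippets, authors)

# Write the rows as CSV text, ready to save or import elsewhere.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

The length check is there because a query that matches one element too many or too few would silently shift every row below it.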

Troubleshooting

One thing to note is that, for the results of the query to fully expand and fill the spreadsheet, the column being populated must have enough free cells below the formula and must not contain any other data.

This is similar to how we work when using ARRAYFORMULA. For the formula to be expanded, there must be no other data in the same column.
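
The rule can be modeled in a few lines of Python. This is only a simplified sketch of the behavior described above, not of how Google Sheets works internally. The function can_expand is hypothetical, and the column is represented as a plain list in which an empty string stands for an empty cell.

```python
def can_expand(column, formula_row, result_count):
    """Return True if result_count values fit at and below formula_row
    without running past the column or overwriting existing data."""
    # The formula cell holds the first result; the rest fill the cells below.
    needed_rows = range(formula_row + 1, formula_row + result_count)
    return all(
        row < len(column) and column[row] == "" for row in needed_rows
    )


# Hypothetical column: formula in the first cell, a stray note in the fourth.
column = ["=IMPORTXML(...)", "", "", "note", ""]

print(can_expand(column, 0, 3))  # three results fit above the stray note
print(can_expand(column, 0, 4))  # a fourth result would hit the note
```

Clearing the stray cell, or moving the formula to an empty column, is the fix in both the sketch and the spreadsheet.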

In conclusion

Whether you need content and product descriptions, or e-commerce data such as product prices or shipping costs, you can scrape data from (potentially) any web page in an automated way, with far less risk of copy-and-paste errors.

In an age where information and data can be the advantage needed to deliver above-average results, the ability to scrape web pages and structured content quickly and easily can be invaluable. As we have seen above, IMPORTXML can also help cut execution time and reduce the chance of making mistakes.

Moreover, this function is not just a great tool for PPC tasks. It can be very useful across many different projects that require web scraping, including SEO and content tasks.

Featured image: Aleutie/Shutterstock




