Editor's note: As 2021 draws to a close, we are celebrating with a 12 Days of Christmas countdown of the most popular and helpful expert articles on Search Engine Journal this year.
Our editorial team curated this collection based on each article's performance, utility, quality, and the value created for you, our readers.
Each day until December 24, we will republish one of the best columns of the year, starting at No. 12 and counting down to No. 1. Today is No. 3, originally published on March 18, 2021.
Ruth Everett’s article on using Python libraries to automate and complete SEO tasks makes the job of marketers easier. It is very easy to read, and is ideal for beginners and more experienced SEO professionals who want to use Python more.
Great work on this, Ruth. We really appreciate your contributions to Search Engine Journal.
Enjoy!
Python libraries are a fun and accessible way to get started with learning and using Python for SEO.
A Python library is a collection of useful functions and code that lets you complete a number of tasks without writing the code from scratch.
There are more than 100,000 libraries available for Python, which can be used for everything from data analysis to creating video games.
In this article, you will find several different libraries that I use to complete SEO projects and tasks. All of these are beginner-friendly, and you will find plenty of documentation and resources to help you get started.
Why are Python libraries useful for SEO?
Each Python library contains functions and variables of all types (arrays, dictionaries, objects, etc.) that can be used to perform different tasks.
For SEO, for example, they can be used to automate certain things, predict results, and provide intelligent insights.
It is possible to work with plain Python, but libraries make tasks much easier and quicker to write and complete.
Python libraries for SEO tasks
There are a number of useful Python libraries for SEO tasks, including data analysis, web scraping, and visualizing insights.
This is not an exhaustive list, but these are the libraries I find most commonly used for SEO purposes.
Pandas
Pandas is a Python library for working with tabular data. It allows high-level data manipulation, where the key data structure is a DataFrame.
DataFrames are similar to Excel spreadsheets. However, they are not restricted by row and byte limits, and they are also much faster and more efficient.
The best way to get started with Pandas is to take a simple CSV of data (a crawl of your website, for example) and save it in Python as a DataFrame.
Once it is stored in Python, you can perform many different analysis tasks, including aggregating, pivoting, and cleaning data.
For example, if I have a complete crawl of my website and only want to extract indexable pages, I will use the built-in Pandas function to include only these URLs in my DataFrame.
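import pandas as pd
# Load the crawl export into a DataFrame
df = pd.read_csv('/Users/rutheverett/Documents/Folder/file_name.csv')
df.head()  # preview the first five rows
# Keep only the rows where the indexable column is True
indexable = df[df['indexable'] == True]
indexable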
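The cleaning and aggregating steps mentioned above might look something like the sketch below. The crawl data is a small made-up sample, and the column names (url, status_code, indexable, word_count) are assumptions rather than the export format of any particular crawler.

```python
import pandas as pd

# Hypothetical crawl export; real column names depend on your crawler.
crawl = pd.DataFrame({
    "url": ["/", "/blog/", "/blog/post-1", "/old-page", "/blog/post-1"],
    "status_code": [200, 200, 200, 404, 200],
    "indexable": [True, True, True, False, True],
    "word_count": [850, 300, 1200, 0, 1200],
})

# Cleaning: drop exact duplicate rows left over from the crawl.
crawl = crawl.drop_duplicates().reset_index(drop=True)

# Filtering: keep only the indexable pages.
indexable = crawl[crawl["indexable"]]

# Aggregating: count pages and average word count per status code.
summary = crawl.groupby("status_code").agg(
    pages=("url", "count"),
    avg_words=("word_count", "mean"),
)

print(indexable)
print(summary)
```

With a real crawl, the DataFrame would come from pd.read_csv instead of being typed in by hand.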
Requests
The next library is called Requests and is used to make HTTP requests in Python.
It uses different request methods, such as GET and POST, to make a request, with the results being stored in Python.
An example is a simple URL GET request, which will print out the status code of the page:
import requests
response = requests.get('https://www.deepcrawl.com')
print(response)  # shows the status code, e.g. <Response [200]>
You can then use this result to create a decision function, where a 200 status code indicates that the page is available, and a 404 indicates that the page was not found.
if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')
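The decision function described above could be wrapped up as in the sketch below, so it can be reused across a list of URLs. The helper names status_label and check_url are hypothetical, and the extra status groupings (redirects, server errors) are an assumption about what is useful to flag, not part of the original example.

```python
import requests


def status_label(status_code):
    """Map an HTTP status code to a short human-readable label."""
    if status_code == 200:
        return "available"
    if status_code in (301, 302, 307, 308):
        return "redirect"
    if status_code == 404:
        return "not found"
    if 500 <= status_code < 600:
        return "server error"
    return "other"


def check_url(url):
    """Fetch a URL and label the result; network errors get their own label."""
    try:
        # allow_redirects=False so a redirect is reported rather than followed.
        response = requests.get(url, timeout=10, allow_redirects=False)
    except requests.RequestException:
        return "request failed"
    return status_label(response.status_code)


# Example usage (makes a live request, so it is left commented out):
# print(check_url("https://www.deepcrawl.com"))
```

Keeping the labelling separate from the request makes it easy to check the logic without hitting a live site.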
You can also access other parts of the response, such as the headers, which display useful information about the page, such as the content type or how long the response may be cached.
headers = response.headers
print(headers)
response.headers['Content-Type']
It is also possible to simulate a specific user agent, such as Googlebot, to extract the response that this specific bot will see when crawling the page.
headers = {'User-Agent': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'}
ua_response = requests.get('https://www.deepcrawl.com/', headers=headers)
print(ua_response)
Beautiful Soup
Beautiful Soup is a library for extracting data from HTML and XML files.
Fun fact: The BeautifulSoup library is actually named after a poem from Lewis Carroll’s “Alice in Wonderland.”
As a library, BeautifulSoup is used to make sense of web files and is most often used for web scraping, as it can transform an HTML document into different Python objects.
For example, you can get a URL and use Beautiful Soup and Requests libraries to extract the title of the page.
from bs4 import BeautifulSoup
import requests
url = "https://www.deepcrawl.com"
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
title = soup.title
print(title)

In addition, using the find_all method, BeautifulSoup enables you to extract certain elements from the page, such as all href links on the page:
url="https://www.deepcrawl.com/knowledge/technical-seo-library/"
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))

Put them together
These three libraries can also be used together: Requests makes the HTTP request to the page, and BeautifulSoup extracts the information we want from it.
We can then transform that raw data into a Pandas DataFrame for further analysis.
url = 'https://www.deepcrawl.com/blog/'
req = requests.get(url)
soup = BeautifulSoup(req.text, "html.parser")
# Store the href of each link rather than the raw tag objects
links = [link.get('href') for link in soup.find_all('a')]
df = pd.DataFrame({'links': links})
df
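To try the same workflow without making a live request, a small offline sketch is shown below. The HTML string is invented sample markup standing in for req.text, and the column names (href, anchor_text, is_relative) are assumptions chosen for illustration.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Inline HTML standing in for req.text from a live request.
html = """
<html><head><title>Example blog</title></head>
<body>
  <a href="/blog/post-1">First post</a>
  <a href="https://example.com/about">About</a>
  <a>No link here</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only anchors that actually carry an href attribute.
rows = [
    {"href": a["href"], "anchor_text": a.get_text(strip=True)}
    for a in soup.find_all("a", href=True)
]

df = pd.DataFrame(rows)
# Flag relative links, which point to pages on the same site.
df["is_relative"] = df["href"].str.startswith("/")
print(df)
```

Storing the anchor text next to each href makes later analysis, such as counting internal links per anchor, more straightforward.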
Matplotlib and Seaborn
Matplotlib and Seaborn are two Python libraries for creating visualizations.
Matplotlib allows you to create many different data visualizations, such as bar graphs, line graphs, histograms, and even heat maps.
For example, if I want to use some Google Trends data to show the most popular queries in 30 days, I can create a bar chart in Matplotlib to visualize all of them.
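A bar chart along those lines might be sketched as follows. The queries and interest scores are invented placeholders, not real Google Trends data, and the output file name is an assumption.

```python
import matplotlib

matplotlib.use("Agg")  # Non-interactive backend so the script runs without a display.
import matplotlib.pyplot as plt

# Hypothetical query interest scores; not real Google Trends data.
queries = ["python seo", "seo automation", "log file analysis", "web scraping"]
interest = [100, 64, 38, 81]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.bar(queries, interest)
ax.set_xlabel("Query")
ax.set_ylabel("Relative interest")
ax.set_title("Most popular queries over 30 days (sample data)")
fig.tight_layout()

# Write the chart to an image file in the current directory.
fig.savefig("popular_queries.png")
```

In a notebook, the Agg line can be dropped and plt.show() used instead of saving to a file.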

Seaborn is built on Matplotlib. In addition to line graphs and bar graphs, it also provides more visualization modes, such as scatter plots, box plots, and violin plots.
It differs slightly from Matplotlib in that it uses less syntax and has built-in default themes.
One way I have used Seaborn is to create line charts that visualize log file hits to certain segments of a website over time.

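import seaborn as sns
import matplotlib.pyplot as plt
# pivot_status is a DataFrame built with Pandas from the log file data
sns.lineplot(x="month", y="log_requests_total", hue="category", data=pivot_status)
plt.show()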
This particular example takes data from a pivot table, which I created in Python using the Pandas library. It is another way these libraries work together to create an easy-to-understand picture from the data.
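The pivot table itself is not shown in the article, so the sketch below is only one way a table with the same column names (month, category, log_requests_total) might be built. The log rows are invented sample data.

```python
import pandas as pd

# Hypothetical log file hits: one row per request.
logs = pd.DataFrame({
    "month": ["2021-01", "2021-01", "2021-01", "2021-02", "2021-02", "2021-02"],
    "category": ["blog", "blog", "product", "blog", "product", "product"],
    "url": ["/blog/a", "/blog/b", "/p/1", "/blog/a", "/p/1", "/p/2"],
})

# Count requests per month and site category, then flatten the result
# into long form so each row is one (month, category) pair.
pivot_status = (
    logs.pivot_table(index=["month", "category"], values="url", aggfunc="count")
    .rename(columns={"url": "log_requests_total"})
    .reset_index()
)
print(pivot_status)
```

Long-form data like this is what the hue argument of sns.lineplot expects, with one line drawn per category.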
Advertools
Advertools is a library created by Elias Dabbas that can be used to help manage, understand, and make decisions based on the data we have as SEO professionals and digital marketers.
Sitemap analysis
The library allows you to perform a number of different tasks, such as downloading, parsing, and analyzing XML sitemaps to extract patterns or analyze how often content is added or changed.
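To make the idea concrete, here is a minimal sketch of parsing a sitemap and counting updates per month. It uses Python's standard library rather than Advertools itself, so it illustrates the kind of analysis described, not the library's own API. The sitemap content is an invented sample.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Inline sample standing in for a downloaded sitemap.xml file.
sitemap_xml = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2021-03-01</lastmod></url>
  <url><loc>https://example.com/blog/a</loc><lastmod>2021-03-15</lastmod></url>
  <url><loc>https://example.com/blog/b</loc><lastmod>2021-02-10</lastmod></url>
</urlset>"""

# Sitemap elements live in this XML namespace.
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(sitemap_xml.encode("utf-8"))

# Collect (loc, lastmod) pairs for every <url> entry.
entries = [
    (url.findtext("sm:loc", namespaces=ns), url.findtext("sm:lastmod", namespaces=ns))
    for url in root.findall("sm:url", ns)
]

# Count how many URLs were last modified in each month (YYYY-MM).
updates_per_month = Counter(lastmod[:7] for _, lastmod in entries if lastmod)
print(updates_per_month)
```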
Robots.txt analysis
Another interesting thing you can do with this library is use a function to extract a website's robots.txt into a DataFrame, in order to easily understand and analyze the rules set.
You can also run tests in the library to check whether a particular user agent can obtain certain URLs or folder paths.
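The kind of user agent test described above can be illustrated with Python's built-in urllib.robotparser module. This is a standard library stand-in for the idea, not the Advertools function, and the robots.txt rules and URLs are invented samples.

```python
from urllib.robotparser import RobotFileParser

# Inline sample rules standing in for a downloaded robots.txt file.
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /no-google/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Each pair is (user agent, URL to test against the rules).
tests = [
    ("Googlebot", "https://example.com/private/page"),
    ("Googlebot", "https://example.com/no-google/page"),
    ("Bingbot", "https://example.com/private/page"),
]
results = {(ua, url): parser.can_fetch(ua, url) for ua, url in tests}
for (ua, url), allowed in results.items():
    print(ua, url, "allowed" if allowed else "blocked")
```

Note that a user agent with its own group follows only that group's rules, which is why Googlebot is not affected by the wildcard Disallow in this sample.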
URL analysis
Advertools also enables you to parse and analyze URLs in order to extract information and better understand analytics, SERP, and crawl data for certain sets of URLs.
You can also use the library to split the URL to determine things such as the HTTP scheme being used, the main path, additional parameters, and query strings.
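As a rough illustration of what splitting a URL involves, the sketch below uses the standard library's urllib.parse module rather than Advertools. The URL is an invented example.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.example.com/blog/python-seo/?utm_source=newsletter&page=2#comments"

parts = urlsplit(url)

# Break the path into its directories, e.g. ["blog", "python-seo"].
path_segments = [segment for segment in parts.path.split("/") if segment]

# Turn the query string into a dict of parameter name -> list of values.
query_params = parse_qs(parts.query)

print(parts.scheme)
print(parts.netloc)
print(path_segments)
print(query_params)
print(parts.fragment)
```

Applied to a whole column of URLs, the same split makes it easy to group pages by their first directory or to find which parameters appear most often.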
Selenium
Selenium is a Python library, usually used for automation purposes. The most common use case is to test web applications.
A popular example of Selenium’s automated process is a script that opens a browser and executes many different steps in a defined order, such as filling out forms or clicking certain buttons.
Selenium uses the same principle as the Requests library we introduced earlier.
However, it will not only send a request and wait for a response, but it will also render the web page that is being requested.
To start using Selenium, you need a WebDriver to interact with the browser.
Each browser has its own WebDriver; for example, Chrome has ChromeDriver and Firefox has GeckoDriver.
These are easy to download and set up with your Python code. Here is a useful article explaining the setup process, with an example project.
Scrapy
The last library I want to introduce in this article is Scrapy.
While we can use the Requests module to crawl and extract internal data from a webpage, in order to parse that data and extract useful insights we also need to combine it with BeautifulSoup.
Scrapy essentially allows you to accomplish both of these tasks in one library.
Scrapy is also considerably faster and more powerful. It completes crawl requests, then extracts and parses data in a set sequence, and allows you to shield the data.
In Scrapy, you can define many instructions, such as the name of the domain you want to crawl, the starting URL, and certain page folders that crawlers are allowed or not allowed to crawl.
For example, Scrapy can be used to extract all the links on a specific page and store them in the output file.
from scrapy.spiders import CrawlSpider

class SuperSpider(CrawlSpider):
    name = "extractor"
    allowed_domains = ['www.deepcrawl.com']
    start_urls = ['https://www.deepcrawl.com/knowledge/technical-seo-library/']
    base_url = "https://www.deepcrawl.com"

    def parse(self, response):
        # Yield one item per link found in a paragraph inside a div
        for link in response.xpath('//div/p/a'):
            yield {
                "link": self.base_url + link.xpath('.//@href').get()
            }
You can take this one step further and follow the links found on a webpage to extract information from all the pages linked to from the start URL, a bit like a small-scale replication of Google finding and following links on a page.
from scrapy.spiders import CrawlSpider, Rule

class SuperSpider(CrawlSpider):
    name = "follower"
    allowed_domains = ['en.wikipedia.org']
    start_urls = ['https://en.wikipedia.org/wiki/Web_scraping']
    base_url = "https://en.wikipedia.org"
    custom_settings = {
        'DEPTH_LIMIT': 1
    }

    def parse(self, response):
        # Follow each link found in a paragraph and parse the linked page
        for next_page in response.xpath('.//div/p/a'):
            yield response.follow(next_page, self.parse)
        # Extract the main heading from every page visited
        for quote in response.xpath('.//h1/text()'):
            yield {'quote': quote.extract()}
Learn more about these projects, as well as other example projects, here.
Final thoughts
As Hamlet Batista always said, "The best way to learn is by doing."
I hope that discovering some available libraries can inspire you to start learning Python, or deepen your knowledge.
Python contributions from the SEO industry
Hamlet also likes to share resources and projects with people in the Python SEO community. To commemorate his passion for encouraging others, I want to share some amazing things I have seen in the community.
As a wonderful tribute to Hamlet and the SEO Python community he helped to cultivate, Charly Wargnier has created SEO Pythonistas to collect contributions of the amazing Python projects those in the SEO community have created.
Hamlet's invaluable contributions to the SEO community are featured.
Moshe Ma-yafit created a super cool script for log file analysis and explains how the script works in this post. The visualizations it can display include Google Bot Hits By Device, Daily Hits by Response Code, Response Code % Total, and more.
Koray Tuğberk GÜBÜR is currently working on a Sitemap Health Checker. He also hosted a RankSense webinar with Elias Dabbas, where he shared a script that records SERPs and analyzes algorithms.
It essentially records SERPs at regular time intervals. You can then crawl all the landing pages, blend the data, and create some correlations.
John McAlpine wrote an article detailing how you can use Python and Data Studio to monitor your competitors.
JC Chouinard wrote a complete guide to using the Reddit API. With this, you can perform actions such as extracting data from Reddit and posting to a subreddit.
Rob May is working on a new GSC analysis tool and building a few new domains/real sites in Wix to measure against its higher-end WordPress competitor while documenting it.
Masaki Okazawa also shared a script that analyzes Google Search Console data with Python.
🎉 happy #RSTwittorial Thursday and @saksters 🥳
Analyze Google Search Console data #Python 🐍🔥
This is the output 👇 pic.twitter.com/9l5Xc6UsmT
-RankSense (@RankSense) February 25, 2021
Countdown to 2021 SEJ Christmas:
Featured image: jakkaje879/Shutterstock



