Saturday, May 30, 2026

How to use Python to automate SEO keyword clustering through search intent


Editor’s note: As 2021 draws to a close, we are publishing a 12-day countdown to Christmas featuring the most popular and helpful expert articles on Search Engine Journal this year.

Our editorial team curated this collection based on each article’s performance, usefulness, quality, and the value it created for you, our readers.

Every day until December 24, we will republish one of the best columns of the year, counting down from No. 12 to No. 1. Today is No. 11, originally published on July 28, 2021.

Andreas Voniatis did an excellent job explaining how to use Python to create keyword clusters by search intent. The images and screenshots make it easy to follow step by step, so even a novice Python user can do it. Well done, Andreas!


Thank you for contributing to Search Engine Journal and sharing your wisdom with readers.

Enjoy, everyone!


There is a lot to know about search intent, from using deep learning to infer search intent by classifying text, to breaking down SERP titles with natural language processing (NLP) techniques, to clustering based on semantic relevance, with the benefits explained.

Not only do we know the benefits of deciphering search intent; we also have a number of techniques at our disposal for scale and automation.

But usually, these involve building your own AI. What if you don’t have the time or knowledge?


In this column, you will learn the step-by-step process of using Python to automate keyword clustering based on search intent.

SERPs contain insights into search intent

Some methods require you to take all of the copy from the titles of the ranking content for a given keyword and feed it into a neural network model (which you then have to build and test), or to use NLP to cluster the keywords.

There is another method that allows you to use Google’s own AI to do the work for you without having to crawl all the SERP content and build an AI model.

Let us assume that Google ranks URLs in descending order of how likely their content is to satisfy the user’s query. It follows that if two keywords share the same intent, their SERPs are likely to be similar.

For years, SEO professionals have compared the SERP results of keywords to infer whether they share search intent and to stay on top of core updates, so this is nothing new.

The added value here is automating and scaling this comparison, which offers both speed and greater precision.

How to use Python (with code) to perform large-scale clustering of keywords based on search intent

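Start with your SERP results exported as a CSV file. The code below expects one row per ranking URL, with keyword, rank, url and search_volume columns.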


1. Import the list into your Python notebook.

import pandas as pd
import numpy as np

serps_input = pd.read_csv('data/sej_serps_input.csv')
serps_input

The SERP file is now imported into a Pandas DataFrame.
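The later steps read four columns from this file: keyword, rank, url and search_volume. If you want to try the steps before exporting real data, a minimal stand-in input might look like the sketch below. The URLs, keywords and volumes are made up for illustration, and the column names are inferred from how the later code uses them.

```python
import pandas as pd

# Hypothetical stand-in for data/sej_serps_input.csv: two keywords with
# three ranking URLs each. All values are made up for illustration.
serps_input = pd.DataFrame({
    "keyword": ["keyword one"] * 3 + ["keyword two"] * 3,
    "rank": [1, 2, 3, 1, 2, 3],
    "url": [
        "https://example.com/a", "https://example.com/b", "https://example.com/c",
        "https://example.com/b", "https://example.com/a", "https://example.com/d",
    ],
    "search_volume": [1000] * 3 + [800] * 3,
})

# Check that the columns used by the later steps are present
required = {"keyword", "rank", "url", "search_volume"}
missing = required - set(serps_input.columns)
if missing:
    raise ValueError(f"SERP file is missing columns: {sorted(missing)}")

# Number of ranking URLs per keyword
print(serps_input.groupby("keyword")["url"].count())
```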

2. Filter the data for page 1

We want to compare the page 1 results of each keyword’s SERP with those of the other keywords.


Because we want to filter at the keyword level, we split the DataFrame into mini keyword DataFrames, run the filtering function on each one, and then recombine them into a single DataFrame:

# Split: group the SERP rows by keyword
serps_grpby_keyword = serps_input.groupby("keyword")
k_urls = 15

# Apply: keep rows that have a URL and rank within the top k_urls
def filter_k_urls(group_df):
    filtered_df = group_df.loc[group_df['url'].notnull()]
    filtered_df = filtered_df.loc[filtered_df['rank'] <= k_urls]
    return filtered_df
filtered_serps = serps_grpby_keyword.apply(filter_k_urls)

# Combine: the keyword is now the first index level, so drop any duplicate
# keyword column, move the index back into columns and drop the leftover
# original row number (level_1)
filtered_serps_df = filtered_serps.drop(columns='keyword', errors='ignore')
filtered_serps_df = filtered_serps_df.reset_index()
del filtered_serps_df['level_1']
filtered_serps_df
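Both filter conditions only look at a single row (does it have a URL, and is its rank within k_urls?), so filtering keyword by keyword returns the same rows as filtering the whole table at once. A sketch of that shortcut, using a hypothetical helper named filter_top_k and made-up rows:

```python
import pandas as pd

k_urls = 15

def filter_top_k(serps: pd.DataFrame, k: int = k_urls) -> pd.DataFrame:
    """Keep rows that have a URL and rank within the top k.

    Both conditions are row-level, so no groupby is needed.
    """
    mask = serps["url"].notnull() & (serps["rank"] <= k)
    return serps.loc[mask].reset_index(drop=True)

# Made-up rows: one missing URL, one rank beyond k_urls
demo = pd.DataFrame({
    "keyword": ["a", "a", "a", "b"],
    "rank": [1, 2, 16, 1],
    "url": ["u1", None, "u3", "u4"],
})
result = filter_top_k(demo)
print(result)  # keeps ("a", 1, "u1") and ("b", 1, "u4")
```

The groupby version remains useful if you later need a filter that depends on the whole keyword group, such as "top k URLs after de-duplication".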

3. Convert the ranking URLs to a string

Because there are more SERP result URLs than keywords, we need to compress each keyword’s URLs into a single line that represents its SERP.

Here’s how:

# convert results to strings using Split Apply Combine:
# join each keyword's URLs with a space (not an empty string) so the
# similarity step can split them back into a list on whitespace
filtered_serps_df['serp_string'] = (
    filtered_serps_df.groupby('keyword')['url'].transform(' '.join)
)

# Keep one row per keyword with its SERP string
strung_serps = filtered_serps_df[['keyword', 'serp_string']].drop_duplicates()
strung_serps

Each keyword’s SERP is now compressed into a single line.
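The similarity step compares URL positions, so the order of the URLs inside each string matters. The code above relies on the export already being sorted by rank. A minimal sketch that sorts explicitly before joining, using a hypothetical helper named serp_strings and assuming the URLs contain no spaces:

```python
import pandas as pd

def serp_strings(serps: pd.DataFrame) -> pd.DataFrame:
    """One row per keyword, with its URLs joined by spaces in rank order."""
    ordered = serps.sort_values(["keyword", "rank"])
    return (
        ordered.groupby("keyword")["url"]
        .agg(" ".join)
        .reset_index(name="serp_string")
    )

# Made-up rows, deliberately out of rank order
demo = pd.DataFrame({
    "keyword": ["b", "a", "a"],
    "rank": [1, 2, 1],
    "url": ["u9", "u2", "u1"],
})
strung = serp_strings(demo)
print(strung)
# Splitting on whitespace recovers the ranked list of URLs
print(strung.loc[strung["keyword"] == "a", "serp_string"].iloc[0].split())
```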

4. Compare SERP similarity

To run the comparison, we now need to pair each keyword’s SERP with the SERP of every other keyword:


# align serps: pair keyword k's SERP with the SERP of every other keyword
def serps_align(k, df):
    # The row for keyword k, renamed so it doesn't clash with the other keywords
    prime_df = df.loc[df.keyword == k]
    prime_df = prime_df.rename(columns = {"serp_string" : "serp_string_a", 'keyword': 'keyword_a'})
    # All other keywords
    comp_df = df.loc[df.keyword != k].reset_index(drop=True)
    # Repeat keyword k's row once per comparison keyword, then place them side by side
    prime_df = prime_df.loc[prime_df.index.repeat(len(comp_df.index))].reset_index(drop=True)
    prime_df = pd.concat([prime_df, comp_df], axis=1)
    prime_df = prime_df.rename(columns = {"serp_string" : "serp_string_b", 'keyword': 'keyword_b', "serp_string_a" : "serp_string", 'keyword_a': 'keyword'})
    return prime_df

columns = ['keyword', 'serp_string', 'keyword_b', 'serp_string_b']
queries = strung_serps.keyword.to_list()

# Pair every keyword with every other keyword and stack the results.
# DataFrame.append was removed in pandas 2.0, so collect the frames
# in a list and concatenate them once.
matched_serps = pd.concat(
    [serps_align(q, strung_serps) for q in queries], ignore_index=True
)[columns]

matched_serps
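The pairing step produces every ordered pair of distinct keywords, so n keywords yield n(n-1) rows. The same table can be sketched with itertools.permutations, assuming there is exactly one row per keyword (as after drop_duplicates); pair_serps is a hypothetical helper name and the example data is made up.

```python
from itertools import permutations

import pandas as pd

def pair_serps(strung: pd.DataFrame) -> pd.DataFrame:
    """Every ordered pair of distinct keywords, with both SERP strings."""
    # Assumes one row per keyword
    lookup = dict(zip(strung["keyword"], strung["serp_string"]))
    rows = [(a, lookup[a], b, lookup[b]) for a, b in permutations(lookup, 2)]
    return pd.DataFrame(
        rows, columns=["keyword", "serp_string", "keyword_b", "serp_string_b"]
    )

# Made-up example: 3 keywords -> 3 * 2 = 6 ordered pairs
demo = pd.DataFrame({
    "keyword": ["a", "b", "c"],
    "serp_string": ["u1 u2", "u2 u1", "u3 u4"],
})
pairs = pair_serps(demo)
print(len(pairs))  # 6
```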


We now have every keyword SERP pair combination, ready for SERP string comparison.

There is no open source library that compares list objects by order, so the function has been written for you below.


The function serps_similarity compares the overlap of sites between SERPs and the order of those sites.

import py_stringmatching as sm

# Tokenizer that splits a SERP string back into its list of URLs
ws_tok = sm.WhitespaceTokenizer()

# Only compare the top k_urls results
def serps_similarity(serps_str1, serps_str2, k=15):
    denom = k+1
    # Maximum possible distance: two top-k lists with no URLs in common
    norm = sum([2*(1/i - 1.0/(denom)) for i in range(1, denom)])

    serps_1 = ws_tok.tokenize(serps_str1)[:k]
    serps_2 = ws_tok.tokenize(serps_str2)[:k]

    # For each URL in a, its 1-based position in b (None if absent)
    match = lambda a, b: [b.index(x)+1 if x in b else None for x in a]

    pos_intersections = [(i+1,j) for i,j in enumerate(match(serps_1, serps_2)) if j is not None]
    pos_in1_not_in2 = [i+1 for i,j in enumerate(match(serps_1, serps_2)) if j is None]
    pos_in2_not_in1 = [i+1 for i,j in enumerate(match(serps_2, serps_1)) if j is None]
    # Shared URLs: penalize rank differences, weighted towards the top positions
    a_sum = sum([abs(1/i -1/j) for i,j in pos_intersections])
    # URLs found in only one SERP: treated as ranking just beyond position k
    b_sum = sum([abs(1/i -1/denom) for i in pos_in1_not_in2])
    c_sum = sum([abs(1/i -1/denom) for i in pos_in2_not_in1])

    intent_prime = a_sum + b_sum + c_sum
    # Turn the normalized distance into a similarity (1 = identical SERPs)
    intent_dist = 1 - (intent_prime/norm)
    return intent_dist

# Apply the function
matched_serps['si_simi'] = matched_serps.apply(lambda x: serps_similarity(x.serp_string, x.serp_string_b), axis=1)
serps_compared = matched_serps[['keyword', 'keyword_b', 'si_simi']]
serps_compared
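To see how overlap and order each affect the score, here is a worked sketch of the same scoring on tiny lists, written in plain Python without py_stringmatching. serp_similarity is a hypothetical list-based variant of serps_similarity, and k=3 is used only to keep the arithmetic small.

```python
def serp_similarity(urls_a, urls_b, k=15):
    """Rank-weighted similarity of two ranked URL lists (1.0 = identical).

    Same scoring as serps_similarity, but taking lists instead of
    whitespace-separated strings.
    """
    denom = k + 1
    # Largest possible distance: two full top-k lists with no URL in common
    norm = sum(2 * (1 / i - 1 / denom) for i in range(1, denom))
    a, b = urls_a[:k], urls_b[:k]
    dist = 0.0
    for i, url in enumerate(a, start=1):
        # Position in the other SERP, or "just below the cut-off" if absent
        j = b.index(url) + 1 if url in b else denom
        dist += abs(1 / i - 1 / j)
    for j, url in enumerate(b, start=1):
        if url not in a:
            dist += abs(1 / j - 1 / denom)
    return 1 - dist / norm

top3 = ["u1", "u2", "u3"]
print(serp_similarity(top3, top3, k=3))                          # 1.0
print(round(serp_similarity(top3, ["u2", "u1", "u3"], k=3), 4))  # 0.5385 (= 7/13)
print(serp_similarity(top3, ["v1", "v2", "v3"], k=3))            # approximately 0
```

Swapping only the top two results already drops the score to about 0.54, because differences near position 1 are weighted far more heavily than differences further down the page.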


Now that the comparison has been performed, we can start clustering the keywords.


We will treat any keyword pairs with a weighted similarity of 40% (0.4) or higher as sharing search intent.

# group keywords by search intent
simi_lim = 0.4

# join search volume
keysv_df = serps_input[['keyword', 'search_volume']].drop_duplicates()

# append topic vols
keywords_crossed_vols = serps_compared.merge(keysv_df, on = 'keyword', how = 'left')
keywords_crossed_vols = keywords_crossed_vols.rename(columns = {'keyword': 'topic', 'keyword_b': 'keyword',
                                                                'search_volume': 'topic_volume'})

# sort by topic volume (highest first); the grouping step processes rows in this order
keywords_crossed_vols = keywords_crossed_vols.sort_values('topic_volume', ascending = False)

# strip NANs
keywords_filtered_nonnan = keywords_crossed_vols.dropna()
keywords_filtered_nonnan
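The grouping step ignores pairs below simi_lim, so dropping them early does not change the resulting groups but makes the table easier to inspect. A sketch of the merge, threshold, and sort in one helper; similar_pairs is a hypothetical name and the example rows are made up.

```python
import pandas as pd

simi_lim = 0.4

def similar_pairs(pairs, volumes, limit=simi_lim):
    """Attach topic volume, keep pairs with si_simi >= limit, sort by volume.

    pairs:   keyword, keyword_b, si_simi columns (like serps_compared)
    volumes: keyword, search_volume columns (like keysv_df)
    """
    merged = pairs.merge(volumes, on="keyword", how="left").rename(
        columns={"keyword": "topic", "keyword_b": "keyword",
                 "search_volume": "topic_volume"}
    )
    kept = merged.loc[merged["si_simi"] >= limit].dropna()
    return kept.sort_values("topic_volume", ascending=False).reset_index(drop=True)

# Made-up similarities and volumes
pairs = pd.DataFrame({
    "keyword":   ["a", "a", "b", "c"],
    "keyword_b": ["b", "c", "a", "a"],
    "si_simi":   [0.8, 0.3, 0.8, 0.5],
})
volumes = pd.DataFrame({"keyword": ["a", "b", "c"], "search_volume": [100, 500, 50]})
out = similar_pairs(pairs, volumes)
print(out)
```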

We now have potential topic names, keyword SERP similarities, and search volumes for each.

You will notice that keyword and keyword_b have been renamed to topic and keyword, respectively.


Now we will iterate over the rows of the DataFrame using a lambda.

Calling DataFrame.apply() with a lambda and axis=1 passes each row to our function as a Pandas Series, which is a more concise way to iterate over rows than looping with the .iterrows() function.
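A small sketch of the two styles side by side, on a made-up table that uses the same column names as keywords_filtered_nonnan:

```python
import pandas as pd

# Made-up pairs table
df = pd.DataFrame({
    "topic": ["t1", "t2"],
    "keyword": ["k1", "k2"],
    "si_simi": [0.7, 0.2],
})

# apply with axis=1 calls the lambda once per row; each row is a Series,
# so columns can be read as attributes such as row.si_simi
flags = df.apply(lambda row: bool(row.si_simi >= 0.4), axis=1)

# The same check written as an explicit .iterrows() loop,
# which yields (index, row Series) pairs
flags_loop = [bool(row.si_simi >= 0.4) for _, row in df.iterrows()]

print(flags.tolist(), flags_loop)  # [True, False] [True, False]
```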

Here’s the clustering code:

topic_groups_numbered = {}
topics_added = []

def find_topics(si, keyw, topc):
    # Skip pairs whose SERPs are not similar enough
    if si < simi_lim:
        return
    if (keyw not in topics_added) and (topc not in topics_added):
        # Neither keyword is grouped yet: start a new group, numbered from the
        # count of existing groups so the numbering persists between calls
        i = len(topic_groups_numbered) + 1
        topics_added.append(keyw)
        topics_added.append(topc)
        topic_groups_numbered[i] = [keyw, topc]
    elif (keyw in topics_added) and (topc not in topics_added):
        # keyw is already grouped: add topc to the same group
        j = [key for key, value in topic_groups_numbered.items() if keyw in value]
        topics_added.append(topc)
        topic_groups_numbered[j[0]].append(topc)
    elif (keyw not in topics_added) and (topc in topics_added):
        # topc is already grouped: add keyw to the same group
        j = [key for key, value in topic_groups_numbered.items() if topc in value]
        topics_added.append(keyw)
        topic_groups_numbered[j[0]].append(keyw)

def apply_impl_ft(df):
    # Call find_topics once per row of the pairs DataFrame
    return df.apply(
        lambda row: find_topics(row.si_simi, row.keyword, row.topic), axis=1)

apply_impl_ft(keywords_filtered_nonnan)

# Remove any duplicate keywords within each group
topic_groups_numbered = {k: list(set(v)) for k, v in topic_groups_numbered.items()}

topic_groups_numbered

The following shows a dictionary that contains all the keywords clustered into numbered groups by search intent:

{1: ['fixed rate isa',
  'isa rates',
  'isa interest rates',
  'best isa rates',
  'cash isa',
  'cash isa rates'],
 2: ['child savings account', 'kids savings account'],
 3: ['savings account',
  'savings account interest rate',
  'savings rates',
  'fixed rate savings',
  'easy access savings',
  'fixed rate bonds',
  'online savings account',
  'easy access savings account',
  'savings accounts uk'],
 4: ['isa account', 'isa', 'isa savings']}
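The grouping logic is a single greedy pass over the keyword pairs. Here is a sketch of it in plain Python, assuming the pairs arrive as (topic, keyword, similarity) tuples already sorted by topic volume; group_by_intent is a hypothetical name, and a dictionary from keyword to group number stands in for the list scans in find_topics. Like find_topics, it never merges two existing groups, so a pair whose keywords are both already grouped is skipped.

```python
def group_by_intent(pairs, limit=0.4):
    """Greedy single-pass grouping of keywords by SERP similarity.

    pairs: iterable of (topic, keyword, similarity) tuples, ideally sorted
    by topic search volume, highest first.
    Returns {group number: [keywords]}.
    """
    groups = {}    # group number -> keywords in that group
    assigned = {}  # keyword -> its group number
    for topic, keyword, sim in pairs:
        if sim < limit:
            continue  # SERPs not similar enough
        if topic not in assigned and keyword not in assigned:
            # Neither keyword grouped yet: open a new, sequentially numbered group
            group_no = len(groups) + 1
            groups[group_no] = [topic, keyword]
            assigned[topic] = assigned[keyword] = group_no
        elif topic in assigned and keyword not in assigned:
            # Add the new keyword to the topic's existing group
            groups[assigned[topic]].append(keyword)
            assigned[keyword] = assigned[topic]
        elif keyword in assigned and topic not in assigned:
            # Add the new topic to the keyword's existing group
            groups[assigned[keyword]].append(topic)
            assigned[topic] = assigned[keyword]
        # If both are already grouped, the pair is skipped
    return groups

# Made-up pairs of (topic, keyword, similarity)
pairs = [
    ("a", "b", 0.8),
    ("b", "a", 0.8),
    ("a", "c", 0.5),
    ("d", "e", 0.45),
    ("c", "d", 0.1),
]
print(group_by_intent(pairs))  # {1: ['a', 'b', 'c'], 2: ['d', 'e']}
```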

Let’s put topic_groups_numbered into a DataFrame:

topic_groups_lst = []

for k, l in topic_groups_numbered.items():
    for v in l:
        topic_groups_lst.append([k, v])

topic_groups_dictdf = pd.DataFrame(topic_groups_lst, columns=['topic_group_no', 'keyword'])
                                
topic_groups_dictdf
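The nested loop above flattens the dictionary into one row per keyword. pandas can do the same with Series.explode; a sketch using a made-up dictionary in the same shape as topic_groups_numbered:

```python
import pandas as pd

# Hypothetical grouping output in the same shape as topic_groups_numbered
groups = {1: ["cash isa", "isa rates"], 2: ["kids savings account"]}

# Series.explode gives each list element its own row and repeats the
# index (the group number) for every keyword in that group
topic_groups_df = (
    pd.Series(groups, name="keyword")
    .explode()
    .rename_axis("topic_group_no")
    .reset_index()
)
print(topic_groups_df)
```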


The search intent groups above are a good approximation of how an SEO expert would likely group these keywords.


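Although we only used a small set of keywords, the method can be scaled to thousands of keywords (if not more). Bear in mind that the number of SERP pairs to compare grows with the square of the keyword count: n keywords produce n(n-1) pairs.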

Putting the output to work to improve your search

Of course, this could be taken further by using neural networks to process the ranking content for more accurate clusters and better cluster group naming, as some commercial products already do.

With this output, you can:

  • Incorporate it into your own SEO dashboard systems to make your trend analysis and SEO reporting more meaningful.
  • Build better paid search campaigns by structuring your Google Ads accounts by search intent for a higher Quality Score.
  • Merge redundant faceted e-commerce search URLs.
  • Structure a shopping site’s taxonomy according to search intent instead of a typical product catalog.


I’m sure there are more applications that I haven’t mentioned; please feel free to comment with any important ones.

In any case, your SEO keyword research is now more scalable, accurate, and fast!



Featured image: Astibuag/Shutterstock.com
