Monday, May 25, 2026

“Crawling the entire web is impossible”


In response to a question about why SEO tools don’t show all backlinks, Google’s search advocate John Mueller said it’s impossible to crawl the entire web.

He made the point in a comment on a Reddit thread started by a frustrated SEO professional.

They asked why the SEO tools they use don’t find all of the links pointing to a site.

It doesn’t matter which tool the person uses. As we learn from Mueller, it’s impossible for any tool to discover 100% of a website’s inbound links.

Here’s why.

There is no way to crawl the web “properly”

Mueller said there is no objectively correct way to crawl the web because the number of URLs is practically infinite.

No one has the resources to keep an endless number of URLs in a database, so web crawlers try to determine what’s worth crawling.

As Mueller explains, this inevitably results in URLs being crawled infrequently or not at all.

“There is no objective way to crawl the web properly.

It is theoretically impossible to crawl it all, as the number of actual URLs is practically infinite. Since no one can afford to keep an infinite number of URLs in a database, all web crawlers make assumptions, simplifications, and guesses about what’s really worth crawling.

Even then, for practical purposes, you can’t crawl all of that all the time; the internet doesn’t have enough connectivity and bandwidth for it, and visiting a lot of pages on a regular basis costs a lot of money (for the crawler and for website owners).

On top of that, some pages change quickly, and some haven’t changed in 10 years – so crawlers try to save work by focusing more on pages they expect to change rather than pages they expect not to change.”
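The idea of focusing on pages that are expected to change can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Google or any SEO tool actually schedules crawls: the function name `schedule_recrawls`, the example URLs, and the change intervals are all hypothetical.

```python
import heapq

def schedule_recrawls(pages, now, budget):
    """Pick up to `budget` URLs that are due for a revisit, most overdue first.

    `pages` maps URL -> (last_crawled_day, expected_change_interval_days).
    Every name and number here is a hypothetical illustration.
    """
    # Order pages by the day on which we expect them to have changed.
    heap = [(last + interval, url) for url, (last, interval) in pages.items()]
    heapq.heapify(heap)

    selected = []
    while heap and len(selected) < budget:
        due_at, url = heapq.heappop(heap)
        if due_at > now:
            break  # everything left is not expected to have changed yet
        selected.append(url)
    return selected

pages = {
    "https://example.com/news": (99, 1),      # assumed to change about daily
    "https://example.com/blog": (95, 7),      # assumed to change about weekly
    "https://example.com/about": (0, 3650),   # assumed static for about 10 years
}

print(schedule_recrawls(pages, now=103, budget=2))
# ['https://example.com/news', 'https://example.com/blog']
```

Even with a larger budget, the page that is not expected to change is skipped, which is the effort-saving behavior Mueller describes.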

How web crawlers determine what’s worth crawling

Mueller goes on to explain how web crawlers (including search engines and SEO tools) determine which URLs are worth crawling.

“Then we hit the part where crawlers try to figure out which pages are really useful.

The web is full of junk that no one cares about, pages that have been spammed into uselessness. These pages may still change regularly, they may have reasonable URLs, but they’re just destined for the landfill, and any search engine that cares about its users will ignore them.

Sometimes it’s not just obvious garbage either. More and more, sites are technically okay, but just not up to the “standard” from a quality standpoint to deserve being crawled more.”

Web crawlers use a limited set of URLs

Mueller concluded his response by saying that all web crawlers work on a “simplified” set of URLs.

Since there is no right way to crawl the web, as mentioned earlier, every SEO tool has its own way of deciding which URLs are worth crawling.

That’s why one tool might find backlinks that another tool doesn’t.

“So all crawlers (including SEO tools) work on a very simplified set of URLs, and they have to figure out how often to crawl, which URLs to crawl more often, and which parts of the web to ignore. There are no set rules, so each tool has to make its own decisions along the way. That’s why search engines index content differently, why SEO tools list links differently, and why any metrics built on top of those are so different.”
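A toy example makes the consequence concrete. The sketch below assumes a tiny, made-up link graph (`LINKS`) and a hypothetical `find_backlinks` function; two "tools" crawl the same graph from different starting points with the same small page budget, and each reports a different backlink for the same target.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "seed-a": ["blog-1", "blog-2"],
    "seed-b": ["forum-1"],
    "blog-1": ["target"],
    "blog-2": [],
    "forum-1": ["target"],
}

def find_backlinks(seeds, target, budget):
    """Breadth-first crawl from `seeds`, fetching at most `budget` pages.

    Returns the sorted list of fetched pages that link to `target`.
    """
    queue = deque(seeds)
    seen = set(seeds)
    backlinks = []
    fetched = 0
    while queue and fetched < budget:
        page = queue.popleft()
        fetched += 1
        for linked in LINKS.get(page, []):
            if linked == target:
                backlinks.append(page)
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
    return sorted(backlinks)

tool_a = find_backlinks(["seed-a"], "target", budget=3)
tool_b = find_backlinks(["seed-b"], "target", budget=3)
print(tool_a)  # ['blog-1']
print(tool_b)  # ['forum-1']
```

Neither result is wrong; each reflects a different simplified view of the same web. Only a crawl that starts from both seeds with a large enough budget would report both links.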


Source: Reddit

Featured image: rangizzz/Shutterstock




