In response to a question about why SEO tools don’t show all backlinks, Google’s search advocate John Mueller said it’s impossible to crawl the entire web.
Mueller made the remark in a Reddit comment, replying to a thread started by a frustrated SEO professional.
They asked why the SEO tools they used didn’t find all the links pointing to their site.
It doesn’t matter which tool the person uses. As Mueller explains, no tool can discover 100% of a website’s inbound links.
Here’s why.
There is no way to crawl the web “properly”
Mueller said there is no objectively correct way to crawl the web because it contains a practically infinite number of URLs.
No one has the resources to keep an endless number of URLs in a database, so web crawlers try to determine what’s worth crawling.
As Mueller explains, this inevitably means some URLs are crawled infrequently or not at all.
“There’s no objective way to crawl the web properly.
It’s theoretically impossible to crawl it all, since the number of actual URLs is practically infinite. Since no one can afford to keep an infinite number of URLs in a database, all web crawlers make assumptions, simplifications, and guesswork about what’s really worth crawling.
Even then, for practical purposes, you can’t crawl all of it all the time: the internet doesn’t have enough connectivity and bandwidth for that, and accessing a lot of pages regularly costs a lot of money (for the crawler and for the site owner).
On top of that, some pages change quickly, and others haven’t changed in 10 years, so crawlers try to save effort by focusing more on pages they expect to change rather than pages they expect not to change.”
How web crawlers determine what’s worth crawling
Mueller goes on to explain how web crawlers (including search engines and SEO tools) determine which URLs are worth crawling.
“Then we hit the part where crawlers try to figure out which pages are really useful.
The web is full of junk that no one cares about: pages that have been spammed into uselessness. These pages may still change regularly, and they may have reasonable URLs, but they’re destined for the landfill, and any search engine that cares about its users will ignore them.
Sometimes it’s not just obvious garbage, either. More and more, sites are technically okay but just don’t reach ‘the bar’ from a quality standpoint to merit being crawled more.”
Web crawlers use a limited set of URLs
Mueller concluded his response by saying that all web crawlers work on a “simplified” set of URLs.
Since there is no right way to crawl the web, as mentioned earlier, every SEO tool has its own way of deciding which URLs are worth crawling.
That’s why one tool might find backlinks that another tool doesn’t.
“So all crawlers (including SEO tools) work on a very simplified set of URLs, and they have to figure out how often to crawl, which URLs to crawl more often, and which parts of the web to ignore. There are no fixed rules, so each tool has to make its own decisions along the way. That’s why search engines index content differently, why SEO tools list links differently, and why any metrics built on top of those are so different.”
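To make Mueller’s point concrete, here is a toy Python sketch of two tools that share a crawl budget but use different rules to decide what’s worth fetching. This is not how Google or any real SEO tool works. The pages, the `popularity` and `change_rate` scores, the `crawl` function, and the budget of three fetches are all hypothetical, chosen only to show why limited resources plus different heuristics produce different backlink reports.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Page:
    url: str
    links_to_target: bool  # hypothetical: does this page link to the site being audited?
    popularity: float      # hypothetical importance score between 0 and 1
    change_rate: float     # hypothetical expected number of changes per month


def crawl(pages: Iterable[Page], budget: int,
          priority: Callable[[Page], float]) -> set[str]:
    """Fetch only the `budget` highest-priority pages and return the backlinks seen.

    Pages that fall outside the budget are never fetched, so any links on them
    are invisible to this (hypothetical) tool.
    """
    ranked = sorted(pages, key=priority, reverse=True)
    fetched = ranked[:budget]
    return {page.url for page in fetched if page.links_to_target}


# A tiny, made-up slice of the web: three of these pages actually link to the target site.
pages = [
    Page("https://news.example/a", True, 0.9, 0.1),
    Page("https://blog.example/b", True, 0.2, 8.0),
    Page("https://forum.example/c", False, 0.8, 5.0),
    Page("https://old.example/d", True, 0.6, 0.0),
    Page("https://shop.example/e", False, 0.4, 3.0),
]

# Tool A favors "important" pages; Tool B favors pages it expects to change.
tool_a = crawl(pages, budget=3, priority=lambda p: p.popularity)
tool_b = crawl(pages, budget=3, priority=lambda p: p.change_rate)

print(sorted(tool_a))  # ['https://news.example/a', 'https://old.example/d']
print(sorted(tool_b))  # ['https://blog.example/b']
```

With the same five pages and the same budget, the popularity-first tool reports two backlinks and the freshness-first tool reports a different one, and neither sees all three links that actually exist.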
Source: Reddit
Featured image: rangizzz/Shutterstock