Google addressed the topic of robots.txt files and whether keeping them to a reasonable size is a good SEO practice.
Google’s search advocate, John Mueller, discussed this topic in a Google Search Central SEO office hour hangout recorded on January 14th.
Joining the livestream was David Zieger, an SEO manager at a major German news publisher, who was concerned about his site’s “huge” and “complex” robots.txt file.
How big are we talking?
Zieger said the file contains more than 1,500 lines, with a “large number” of disallows that have grown over the years.
The disallow rules keep Google from crawling HTML fragments and URLs called via AJAX.
Zieger said a noindex tag, another way to keep snippets and URLs out of Google’s index, couldn’t be set on these resources, so he took the approach of adding disallow rules to the site’s robots.txt.
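As a rough sketch, disallow rules of this kind might look like the following excerpt. The paths are hypothetical, chosen only to illustrate blocking fragment and AJAX URLs, and are not from Zieger’s actual file:

```
User-agent: *
Disallow: /fragments/
Disallow: /ajax/
```

With rules like these, any URL whose path begins with one of the disallowed prefixes is off-limits to compliant crawlers.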
Does a huge robots.txt file negatively impact SEO?
This is what Mueller said.
SEO considerations for large Robots.txt files
A large robots.txt file will not directly have any negative impact on a website’s SEO.
However, large files are more difficult to maintain, which can cause unexpected problems.
Mueller explained:
“It has no immediate negative SEO issues, but it makes it harder to maintain. And it makes it a lot easier to accidentally push things that do cause problems.
So just because it’s a large file doesn’t mean it’s a problem, but it makes it easier for you to create problems.”
Zieger followed up to ask if there were any issues with not including the sitemap in the robots.txt file.
Mueller said it wasn’t a problem:
“No. Those different ways of submitting a sitemap are all equivalent to us.”
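For reference, one of those equivalent ways is listing the sitemap directly in robots.txt with a Sitemap directive. The URL below is a placeholder:

```
Sitemap: https://www.example.com/sitemap.xml
```

The alternative is submitting the sitemap through Google Search Console; per Mueller, the two are treated the same.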
Zieger then asked several follow-up questions, which are covered in the next section.
Does Google recognize HTML fragments?
Zieger asked Mueller how radically shortening the robots.txt file would affect SEO, for example by removing all the disallow rules.
He asked the following questions:
- Can Google recognize HTML fragments that are irrelevant to site visitors?
- If HTML snippets are not disallowed in robots.txt, will they end up in Google’s search index?
- How does Google handle pages that use AJAX calls? (such as header or footer elements)
He summarized his situation by noting that most of what is disallowed in his robots.txt file consists of header and footer elements that aren’t of interest to users.
Mueller said it’s hard to know exactly what would happen if those fragments were suddenly allowed to be crawled and indexed.
Mueller explained that trial and error may be the best way to approach this problem:
“It’s hard to say what would happen with those fragments there.
My thinking would be to try to figure out how those fragment URLs are used. If you’re unsure, take one of those fragment URLs, allow it to be crawled, look at the content of that fragment URL, and then check what happens in search.
Does it affect anything with regards to the indexed content on your site?
Is some of that content suddenly findable within your website?
Is that a problem? And kind of work based on that, because it’s easy to have things blocked by robots.txt that are actually not used for indexing anyway, and then you spend a lot of time maintaining this big robots.txt file that doesn’t actually change that much for your site.”
Additional considerations for building the Robots.txt file
Zieger made a final follow-up to the robots.txt file, asking if there were any specific guidelines to follow when building the file.
Mueller said there is no specific format to follow:
“No, it’s basically up to you. It’s like some sites have large files, some sites have small files, they should all just work.
We have open-sourced the robots.txt parser that we use. So you can also get your developers to run that parser for you, or set it up so that you can test it, and then use that parser to check the URLs on your site and see which ones would actually get blocked and what that would change. That way you can test things before you make them live.”
The robots.txt parser mentioned by Mueller is available on GitHub.
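For a quick local check without building Google’s open-source C++ parser, Python’s standard-library `urllib.robotparser` implements the same basic rule matching and can show which URLs a given robots.txt would block. The rules and URLs below are hypothetical examples, not taken from the publisher’s actual file:

```python
import urllib.robotparser

# Hypothetical robots.txt rules (assumed for illustration only).
robots_txt = """\
User-agent: *
Disallow: /fragments/
Disallow: /ajax/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(robots_txt.splitlines())

# Check which URLs a crawler may fetch under these rules.
for url in ("https://www.example.com/fragments/header.html",
            "https://www.example.com/news/story.html"):
    status = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(url, "->", status)
```

Running a check like this against a list of important URLs before shortening a large robots.txt is one way to apply Mueller’s “test before making it live” advice.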
Listen to the full discussion in the video below:
Featured image: Screenshot from YouTube.com/GoogleSearchCentral, January 2022.