Is Your Blog’s Crawl Budget Too Low? Here’s How to Optimize It
A crawl budget problem doesn’t necessarily mean a technical problem.
There’s one budget every SEO would prefer to have unlimited: the crawl budget.
It’s a term you see regularly on SEO blogs. The crawl budget is the amount of time and resources Google spends crawling a blog. Should you worry about this and optimize your blog according to your crawl budget? In this article, you’ll find out!
Let me start by reassuring you. If your blog doesn’t have many (more than 10,000) unique pages that change daily, or more than a million pages that change regularly, you generally don’t need to worry about your crawl budget.
However, it’s a hot topic. SEOs often think they have problems with their crawl budget because their pages aren’t being crawled or indexed.
Crawl rate, crawl demand, and crawl budget
Crawl budget is a term coined by the SEO community. Several definitions were floating around the internet, which led people to ask Google questions about it. For that reason, Google later adopted the term in its own documentation. Google doesn’t treat it as a single metric, but as a budget made up of two parts: crawl rate and crawl demand.
Crawl rate/capacity limit
A blog is hosted on a server. When Google crawls a page, it sends a request to that server. The server then returns a response within a certain time, which we call the response time. When Google starts crawling a website, it gradually increases the number of requests until it notices that the response time is going up. Google wants to prevent your server from becoming overloaded, so once that point is reached it stops increasing the number of requests. Google calls this the crawl rate limit or crawl capacity limit.
More capacity on your server and lower response times can increase this limit. Did you increase your capacity? Then Google will gradually raise the number of requests: a few more for a while, then back to the previous level, then a few more again. Google continues like this until it notices that a new limit has been reached. It’s also possible that nothing happens at all, simply because the crawl demand is lower than what Google can already handle.
Crawl demand
Google doesn’t always use all its resources to continuously crawl your blog. This is where crawl demand plays a role. The fact that Google can crawl URLs on your blog doesn’t necessarily mean that Google wants to.
Many factors determine whether Google wants to crawl your site:
Quality of the site
Your authority
The number of changes to existing pages
The amount of new content you publish
The web is huge and it costs Google time and money to crawl and store all the data. Based on many factors, they determine whether it is necessary to crawl your site frequently or not.
Are you creating low-quality content? Then Google doesn’t want to waste time on it.
Do you publish little new content? Then Google won’t need to return to your site as often to find out if there is new content.
Does your existing content rarely change? Then Google won’t have to come back as often to see if anything has changed.
A higher crawl frequency doesn’t mean that your content is of better quality. Google doesn’t look at how often content is updated, but rather at the quality of the content.
Crawl budget
The crawl rate and crawl demand together make up the final crawl budget: the amount of time and resources Google spends on crawling your site.
How can I test if my crawl budget is low?
You can test whether your crawl budget is too low by inspecting new or changed pages in Google Search Console.
Has Google still not crawled the page a week after you requested indexing? Then one possible conclusion is that your crawl budget is too low.
Another way to check this is via the indexing report in Google Search Console. In that report, look at the excluded URLs and then at “Discovered - currently not indexed”. Has a large proportion of your pages been discovered but not indexed? Then you can conclude that your crawl budget is too low: Google has already found the URL but has postponed crawling it to avoid overloading your server.
It’s good to check whether this concerns a large part of the site. Is it only a small section, and are those pages not that important? Then the reason could simply be that Google doesn’t want to crawl those pages.
Another way to see whether the problem lies with the crawl rate or the crawl demand is to look at the crawl statistics in Google Search Console. You can find these by going to the settings and clicking on ‘Crawl stats’.
In this report, you will find the number of requests, the total download size, and the average response time. Not only can you see this in a clear graph, but it can also be broken down by:
Response
File type
Purpose (discovery or refresh)
Googlebot type
Analyze the data and ask yourself whether you recognize any of the following situations:
The number of crawl requests is very low compared to the number of pages on your site.
Googlebot Smartphone does not have the largest share of all crawl requests.
You publish a lot of new content, but only a small share of the crawl requests has discovery as its purpose.
The above are some examples of situations in which your crawl budget could be a problem. In most cases, the bulk of the requests should consist of OK responses (200), HTML files, refresh or discovery as the purpose, and Googlebot Smartphone as the crawler type.
How can I optimize my crawl budget?
Unfortunately, increasing your crawl budget isn’t a matter of moving a slider to the left or right. Did the analyses above give you the impression that crawl budget is a problem for your site? Then there are a few things you can do to increase it.
Server capacity
Do you notice that Google wants to crawl more, but is being held back by your server capacity? Then it’s a good idea to discuss with your developers how your server capacity can be increased.
Optimize your blog’s speed
Another optimization you should discuss with your developers is the speed of the site. When pages load faster, Google can crawl more pages in the same amount of time.
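What this looks like depends entirely on your stack, so treat the following as a minimal sketch rather than a recipe. Assuming an Nginx server (an assumption for illustration, not something this article prescribes), enabling compression and browser caching for static assets is a common starting point:

# Illustrative Nginx settings; adjust paths and values to your own setup
# Compress text-based responses so they download faster
gzip on;
gzip_types text/css application/javascript application/json image/svg+xml;

# Let browsers cache static assets so repeat visits cause fewer requests
location ~* \.(css|js|png|jpg|jpeg|gif|svg|woff2)$ {
    expires 30d;
    add_header Cache-Control "public";
}

Faster responses don’t just help visitors; as described above, a lower response time also gives Google room to raise its crawl capacity limit.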
Robots.txt
Through your robots.txt file, you can instruct Googlebot not to crawl certain URLs. In this way, you prevent Google from spending time and resources crawling unnecessary pages.
Are you hosting your site on a popular CMS? Then there’s a good chance your robots.txt already contains a few lines. Do you want to start using robots.txt yourself? Then read up on how to write these rules.
Common rules you can create (a combined example follows after the warning below):
Exclude sorting parameters in online stores. Is ?sort= added to the URL, for example? Then you can exclude those URLs with Disallow: /*?sort.
Exclude login URLs. Login URLs are often long and unique. Do they start with /login/ followed by a unique value, for example? Then you can exclude them with Disallow: /login/.
Exclude internal search result pages. Does a search on your website lead to /search?q=search term, for example? Then you can exclude those pages with Disallow: /search.
Warning! Only use robots.txt if you are sure that URLs are being crawled unnecessarily.
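Put together, a minimal robots.txt sketch for the hypothetical URL patterns above could look like this (the paths are examples, not rules to copy blindly):

User-agent: *
# Block sorted category views such as /category/?sort=popular
Disallow: /*?sort
# Block login URLs such as /login/abc123
Disallow: /login/
# Block internal search result pages such as /search?q=term
Disallow: /search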
Blog hygiene
Maintaining the hygiene of your site is generally a good thing to do. By correcting errors on your site, you can also optimize your crawl budget.
Avoid duplicate content
Are there several pages that are very similar? Google then determines the main page that will be included in the index.
To take the sorting parameter as an example again: https://www.example.ao/category/ and https://www.example.ao/category/?sort=popular are practically the same page, so Google will not index both. By adding a canonical URL on https://www.example.ao/category/?sort=popular that points to https://www.example.ao/category/, you give Google two signals:
I am aware that this page is suspiciously similar to another page, and the canonical URL is the one I want in the index. That’s how you help Google pick the right page.
Because this page is a duplicate, you don’t need to crawl it as often as the canonical URL. Google crawls duplicate URLs that point to a canonical less often.
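In practice, that canonical signal is a link element in the head of the sorted page; a sketch using the same example URLs:

<!-- In the <head> of https://www.example.ao/category/?sort=popular -->
<link rel="canonical" href="https://www.example.ao/category/" />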
Avoid (soft) 404s and 301s
When crawling your site, Google follows the internal links on your pages. Does a link lead to a 404? Then Google can’t go any further. Does it lead to a 301? Then Google has to check where the URL was moved to.
With a program like Screaming Frog, you can map where 404 and 301 errors can be found on your site.
Where does that URL point now, and can we replace the internal link with a direct link to the page that should be crawled? This way you ensure that Google only encounters URLs with a 200 status code on your site. That makes it more likely that Google crawls those URLs, and it’s better for the user as well.
TIP! Pay extra attention to URLs that come back often, for example redirects in your navigation or footer. These internal links are easy to adjust.
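If you want to spot-check a few internal links yourself, outside of Screaming Frog, a minimal sketch in Python (assuming the requests library is installed; the URLs below are placeholders) could look like this:

import requests

# Hypothetical internal URLs you want to spot-check
urls = [
    "https://www.example.ao/category/",
    "https://www.example.ao/old-page/",
]

for url in urls:
    # allow_redirects=False so a 301/302 is reported instead of silently followed
    response = requests.head(url, allow_redirects=False, timeout=10)
    if response.status_code != 200:
        target = response.headers.get("Location", "")
        print(url, "->", response.status_code, target)

Any line this prints is an internal link worth updating to its final 200 destination.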
Also, read my other article on how to better configure Screaming Frog so that you can find all URLs.
Resolving soft 404s is also worthwhile. These are pages that return a 200 status code but that Google treats as a 404. You can find a list of URLs that Google sees as soft 404s in the indexing report: go to the excluded URLs and then to ‘Soft 404’.
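A common cause is a “not found” page that is served with a 200 status code. As a minimal sketch of the fix, assuming a Python/Flask application (an assumption for illustration, not something this article prescribes), the idea is to return a real 404:

from flask import Flask, abort

app = Flask(__name__)

# Hypothetical product data; in a real shop or blog this comes from a database
PRODUCTS = {"blue-shirt": "Blue shirt"}

@app.route("/product/<slug>")
def product(slug):
    if slug not in PRODUCTS:
        # Return a real 404 status instead of a 200 "not found" page,
        # so Google doesn't register the URL as a soft 404
        abort(404)
    return PRODUCTS[slug]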
Better content
Finally, you can apply all the usual SEO optimizations to your site to improve your content. The better your content and the more links to your site, the higher Google’s crawl demand will be.
A crawl budget problem doesn’t necessarily mean a technical problem.
In addition, other site-specific solutions can be applied to optimize a crawl budget.
Not sure whether you have a problem with your crawl budget? Then it’s advisable to have it investigated.
The coined term
Crawl budget is a term we see often, but it isn’t always a problem for websites. It was invented by the SEO community; Google itself talks about crawl rate and crawl demand. Do you have a problem with your crawl budget? Then you can optimize it to make sure your pages get crawled.