What is a Crawl Budget?

Search engines such as Google don't always crawl every page on a site right away. Sometimes it can take weeks. Crawling is the process by which a search engine's bot discovers and fetches pages so that they can be indexed. The crawl budget is therefore the number of pages a search engine will crawl on a particular website within a given period.

Crawl budget can be described as the level of attention search engines give your site. It’s one of the key factors determining your search visibility: if your pages don’t get crawled, they won’t be indexed or displayed in the search results. A site's crawl budget is determined by the search engine, and once it is used up, the crawler stops accessing content on the site.

For most sites, crawl budget is not something to worry about. It's doubtful that Google would omit any pages on a typical small business website with only tens of pages. For very large sites, however, it becomes worth looking at.

On a large site, Googlebot crawls only the URLs that are assigned a high enough priority, because Google has limited resources. There is also a lot of spam on the web, so Google needs mechanisms that let it avoid visiting low-quality pages.

Google also prioritizes crawling the most important pages. Googlebot is designed to be a good citizen of the web, so it tries not to waste the resources of the websites it crawls.

A site's crawl budget (sometimes referred to as crawl space or crawl time) is based on two factors:

  • Crawl limit: how much crawling a website can handle, and what its owner's preferences are.
  • Crawl demand: which URLs are worth (re)crawling the most, based on their popularity and how often they are updated.
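
As a rough illustration of how the two factors interact, the sketch below ranks URLs by a toy demand score and keeps only as many as the crawl limit allows. This is not how any search engine actually computes its budget: the `Url` fields, the `demand` formula, and the `select_for_crawl` helper are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class Url:
    path: str
    popularity: float        # hypothetical score between 0 and 1
    updates_per_week: float  # hypothetical change frequency


def demand(url: Url) -> float:
    # Toy demand score (an assumption): popular, frequently updated
    # URLs are considered more worth (re)crawling.
    return url.popularity * (1.0 + url.updates_per_week)


def select_for_crawl(urls: list[Url], crawl_limit: int) -> list[str]:
    # Rank by demand, then keep only what the crawl limit allows.
    ranked = sorted(urls, key=demand, reverse=True)
    return [u.path for u in ranked[:crawl_limit]]


# Illustrative data only; the paths and numbers are made up.
urls = [
    Url("/", 0.9, 7),
    Url("/blog/latest", 0.6, 3),
    Url("/tag/misc?page=42", 0.05, 0),
    Url("/about", 0.3, 0),
]

selected = select_for_crawl(urls, crawl_limit=2)
print(selected)
```

In this toy model, low-demand URLs such as the paginated tag page are the first to be left out when the limit is reached, which is the effect a large site would want to watch for.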

To observe a site’s crawl budget, you can use Google’s Search Console, Baidu Ziyuan (the webmaster tools for the Chinese search engine Baidu) or Yandex Webmaster. Your website's log files also provide a wealth of information that helps you detect and understand how search engine robots visit the URLs of a site.
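
A minimal sketch of the log-file approach might look like the following. It assumes the server writes the common "combined" access log format and that matching the `Googlebot` token in the user agent is good enough for a first look; the sample lines, the `LOG_LINE` pattern, and the `count_bot_hits` helper are illustrative, not taken from any particular tool.

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of a
# "combined" format access log line (an assumption about the log layout).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def count_bot_hits(lines, bot_token="Googlebot"):
    """Count requests per URL path whose user agent contains bot_token."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and bot_token.lower() in match.group("agent").lower():
            hits[match.group("path")] += 1
    return hits


# Made-up sample lines using documentation-only IP addresses.
sample_log = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /products HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [10/Oct/2023:14:02:10 +0000] "GET /products HTTP/1.1" 200 2326 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.1 - - [10/Oct/2023:14:05:44 +0000] "GET /about HTTP/1.1" 200 1180 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.7 - - [10/Oct/2023:14:06:01 +0000] "GET /contact HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]

hits = count_bot_hits(sample_log)
for path, count in hits.most_common():
    print(count, path)
```

Because a user agent string can be spoofed, counts like these are only a first approximation; confirming that a request really came from a search engine usually requires an extra step such as a reverse DNS lookup on the requesting IP.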