Crawl Budget
Crawl budget is the limited number of requests, and the limited attention, that search engine crawlers give a site. It influences how quickly changes are discovered and indexed.
Also known as: crawler-budget · crawl-allocation
Why it matters
Large sites, faceted navigation, and stale parameter URLs can soak up crawl activity on junk URLs while money pages wait. An efficient architecture helps crawlers spend their time on URLs that should rank and convert.
How it works
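Crawlers prioritize URLs based on popularity, freshness, internal links, and sitemap signals. Low-value duplicates, 404 storms, and infinite faceted paths drain budget. Robots.txt rules, canonical tags, noindex directives, and URL consolidation steer crawlers toward important URLs. Only robots.txt prevents the request itself: a crawler has to fetch a page before it can see a canonical tag or a noindex directive on it.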
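To make the robots.txt part concrete, here is a minimal sketch in Python that checks which URLs a set of rules would leave crawlable. The rules, the example.com URLs, and the helper name crawlable are assumptions for illustration, not taken from any real site. Python's standard-library parser matches by path prefix, so the sketch avoids wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block internal search and a faceted filter path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /filter/
"""

def crawlable(urls, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return {url: bool} saying whether `agent` may fetch each URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}

report = crawlable([
    "https://example.com/pricing",
    "https://example.com/search?q=shoes",
    "https://example.com/filter/color-red/size-9",
])
# /pricing stays crawlable; the search and filter URLs are blocked.
for url, allowed in report.items():
    print("allow" if allowed else "block", url)
```

Running a check like this against a list of money pages before deploying new rules is one way to catch an overly broad Disallow line.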
Common mistakes
- Leaving internal search result pages crawlable and linked.
- Publishing massive low-value tag pages with no unique content.
- Ignoring 404 and 5xx spikes after deploys.
- Blocking CSS/JS needed for rendering while expecting full indexing.
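For the 404 and 5xx point above, a rough sketch of how a post-deploy check might look in Python. It assumes the access logs have already been parsed into (user_agent, status_code) pairs; the function name crawler_error_rate and the sample data are hypothetical. Matching on the user-agent string alone is a simplification, since that string can be spoofed.

```python
from collections import Counter

def crawler_error_rate(hits, agent_token="Googlebot"):
    """Share of crawler requests that returned 404 or any 5xx status.

    hits: iterable of (user_agent, status_code) pairs from parsed logs.
    """
    outcomes = Counter(
        "error" if status == 404 or 500 <= status <= 599 else "ok"
        for user_agent, status in hits
        if agent_token in user_agent
    )
    total = sum(outcomes.values())
    return outcomes["error"] / total if total else 0.0

# Hypothetical sample: four crawler hits, two of them errors.
sample_hits = [
    ("Googlebot/2.1", 200),
    ("Googlebot/2.1", 404),
    ("Mozilla/5.0", 500),   # not a crawler hit, ignored
    ("Googlebot/2.1", 503),
    ("Googlebot/2.1", 200),
]
rate = crawler_error_rate(sample_hits)
```

Comparing this rate before and after a deploy gives a simple signal that crawlers have started hitting dead or failing URLs.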
Best practices
- Review the Crawl Stats report in Google Search Console (GSC) after major template launches.
- Canonicalize or noindex parameterized duplicates.
- Fix internal links that point at dead URLs or redirect chains.
- Keep XML sitemaps limited to indexable, canonical URLs.
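The canonicalization and sitemap practices above can be sketched in Python as follows. The parameter list LOW_VALUE_PARAMS and both function names are assumptions for illustration; which parameters are safe to drop depends on the site. The sketch re-encodes the query string, so a URL whose parameters are unsorted or unusually encoded will not count as canonical here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameters that create duplicates without changing content.
LOW_VALUE_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sort", "sessionid"}

def canonical_url(url):
    """Drop low-value parameters and the fragment; sort what remains."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in LOW_VALUE_PARAMS
    )
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def sitemap_candidates(urls):
    """Keep only URLs already in canonical form, without repeats."""
    kept = []
    for url in urls:
        if canonical_url(url) == url and url not in kept:
            kept.append(url)
    return kept

candidates = sitemap_candidates([
    "https://example.com/pricing",
    "https://example.com/shoes?sort=price",
    "https://example.com/pricing",
])
```

The value returned by canonical_url is also a reasonable target for a canonical tag on the parameterized variant, assuming the variant really shows the same content.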
Learn Domains perspective
Faceted search created thousands of thin URLs while your pricing page waited days to get recrawled. Learn Domains surfaces index coverage anomalies from Search Console and keeps your URL Library focused on pages worth linking to, so internal link suggestions do not send crawlers into dead ends.
FAQ
- Do small sites need to worry about crawl budget?
- Usually less than enterprise sites, but fixing obvious duplicate paths still helps discovery speed.
- Does robots.txt save crawl budget?
- It prevents crawling of disallowed paths, but links to disallowed URLs are still discovered and may waste some attention, so update or remove those links as well.
- How fast will Google recrawl after a fix?
- It depends on site authority and other signals, and typically takes days to weeks. For critical URLs, request indexing in Search Console.
Next steps
1. Identify top excluded URL patterns in GSC.
2. Noindex or canonicalize low-value parameter pages.
3. Clean sitemap entries that reference non-indexable URLs.
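For the first step, one possible way to find patterns is to reduce each excluded URL to a coarse signature and count the signatures. The sketch below is in Python and assumes the excluded URLs have been exported to a plain list; the function names and sample URLs are hypothetical, and the signature (first path segment plus parameter names) is just one reasonable choice.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit

def url_pattern(url):
    """Reduce a URL to its first path segment plus sorted parameter names."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    prefix = "/" + segments[0] if segments else "/"
    names = sorted({key for key, _ in parse_qsl(parts.query, keep_blank_values=True)})
    return prefix + ("?" + "&".join(names) if names else "")

def top_patterns(urls, n=3):
    """Return the n most common patterns with their counts."""
    return Counter(url_pattern(url) for url in urls).most_common(n)

# Hypothetical export of excluded URLs.
excluded = [
    "https://example.com/search?q=a",
    "https://example.com/search?q=b",
    "https://example.com/shoes?color=red&sort=price",
    "https://example.com/tag/sale",
]
print(top_patterns(excluded, 1))  # [('/search?q', 2)]
```

Patterns that dominate the count are the natural candidates for the noindex, canonical, or sitemap cleanup in steps 2 and 3.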
Knowledge graph
Parent terms
Related concepts
index-coverage · programmatic-seo · url-library