Understanding the Index Coverage Gap
The gap between pages you submit in your sitemap and pages Google actually indexes is one of the most revealing SEO diagnostics available. In GSC, the legacy Coverage report split URLs into four categories (Error, Valid with warnings, Valid, and Excluded); the current Pages report simplifies these to Indexed and Not indexed, with a stated reason for each exclusion. The difference between your total submitted URLs and the indexed ("Valid") count is your coverage gap. A healthy site maintains a coverage ratio above 85%; anything below 70% signals systemic problems.
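The arithmetic is simple enough to sketch directly (the counts below are illustrative, not from a real property):

```python
def coverage_ratio(valid_indexed: int, total_submitted: int) -> float:
    """Return the index coverage ratio as a percentage."""
    if total_submitted == 0:
        return 0.0
    return valid_indexed / total_submitted * 100

# Illustrative numbers: 4,200 of 5,000 submitted URLs are indexed.
ratio = coverage_ratio(4200, 5000)
gap = 5000 - 4200
print(f"coverage ratio: {ratio:.1f}%  gap: {gap} URLs")  # 84.0%, just below the 85% target
```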
Why Pages Get Excluded
Crawled - Currently Not Indexed
Google crawled the page but decided not to index it. This is the most frustrating exclusion because it means Google saw your content and deemed it insufficient. Causes include thin content (under 300 words with no unique value), duplicate or near-duplicate content across your own site, pages with a high ratio of boilerplate to unique content, and pages that load critical content via JavaScript that Googlebot fails to render.
Discovered - Currently Not Indexed
Google knows the URL exists (found it in a sitemap or internal link) but has not crawled it yet. On large sites, this backlog can grow to thousands of pages. It means your crawl budget is exhausted before Google reaches these URLs. Fix this by improving internal linking to priority pages, reducing crawl waste on low-value URLs, and ensuring your server responds fast enough to handle Googlebot's crawl rate.
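One way to spot crawl waste is to measure where Googlebot actually spends its requests. A minimal sketch, assuming access-log lines in combined log format (the sample lines and section names below are hypothetical):

```python
from collections import Counter
from urllib.parse import urlparse

def googlebot_hits_by_section(log_lines):
    """Count Googlebot requests per top-level path section.

    A disproportionate share of hits on low-value sections (internal
    search, faceted filters) suggests crawl budget is being wasted there.
    """
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        # The request target sits inside the first quoted field:
        # ... "GET /blog/post?x=1 HTTP/1.1" ...
        try:
            path = line.split('"')[1].split()[1]
        except IndexError:
            continue
        section = "/" + urlparse(path).path.strip("/").split("/")[0]
        counts[section] += 1
    return counts

logs = [
    '66.249.66.1 - - [01/Jan/2025] "GET /search?q=shoes HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [01/Jan/2025] "GET /products/red-shoe HTTP/1.1" 200 9001 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [01/Jan/2025] "GET /products/red-shoe HTTP/1.1" 200 9001 "-" "Mozilla/5.0"',
]
print(googlebot_hits_by_section(logs))
```

If `/search` dominates the counts, blocking or nofollowing those URLs frees budget for the pages stuck in the discovered-not-indexed backlog.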
Duplicate Without User-Selected Canonical
Google found duplicate content and chose a canonical different from what you specified (or you did not specify one at all). Audit your canonical tags across all templates. Every indexable page needs an explicit self-referencing canonical tag.
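A canonical audit can be automated. The sketch below uses only the standard library's HTML parser and classifies each page's canonical signal; the example URLs are hypothetical:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags."""
    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and (a.get("rel") or "").lower() == "canonical":
            self.canonicals.append(a.get("href"))

def check_canonical(html: str, page_url: str) -> str:
    """Return 'self', 'cross', 'missing', or 'multiple' for a page."""
    finder = CanonicalFinder()
    finder.feed(html)
    if not finder.canonicals:
        return "missing"
    if len(finder.canonicals) > 1:
        return "multiple"  # conflicting signals; Google may ignore both
    return "self" if finder.canonicals[0] == page_url else "cross"

html = '<html><head><link rel="canonical" href="https://example.com/a"></head></html>'
print(check_canonical(html, "https://example.com/a"))  # self
print(check_canonical(html, "https://example.com/b"))  # cross
```

Run this across a crawl of every template: any `missing` or `multiple` result is a page where Google is free to choose its own canonical.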
Alternate Page With Proper Canonical Tag
This status is typically fine. It means Google found the URL, recognized the canonical points elsewhere, and is respecting your directive. Check that the canonical target is actually the page you want indexed.
Diagnosing the Gap
- Export both reports: Download your sitemap URL list and the GSC Pages exclusion report
- Cross-reference: Match excluded URLs against your sitemap to find which submitted pages are not being indexed
- Categorize by exclusion reason: Group the gaps by GSC's stated reason
- Prioritize by traffic potential: Use keyword data to identify which excluded pages would drive the most traffic if indexed
- Check rendering: Use GSC's URL Inspection "View Tested Page" to see if JavaScript content is rendering for Googlebot
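The first three steps above reduce to a set operation once both exports are in hand. A minimal sketch, assuming the GSC export yields (URL, reason) pairs (the sample data is hypothetical):

```python
def coverage_gap(sitemap_urls, excluded_rows):
    """Cross-reference submitted URLs against GSC exclusions.

    sitemap_urls: iterable of URL strings from your sitemap export.
    excluded_rows: iterable of (url, reason) pairs from the GSC export.
    Returns {reason: sorted list of submitted URLs excluded for that reason}.
    """
    submitted = set(sitemap_urls)
    gaps = {}
    for url, reason in excluded_rows:
        if url in submitted:  # only care about pages we asked Google to index
            gaps.setdefault(reason, []).append(url)
    return {reason: sorted(urls) for reason, urls in gaps.items()}

sitemap = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
excluded = [
    ("https://example.com/b", "Crawled - currently not indexed"),
    ("https://example.com/c", "Discovered - currently not indexed"),
    ("https://example.com/old", "Page with redirect"),  # not submitted, ignored
]
for reason, urls in coverage_gap(sitemap, excluded).items():
    print(reason, "->", urls)
```

Grouping by reason first (step 3) matters because each reason maps to a different fix, as the table below shows.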
Common Fixes by Exclusion Type
| Exclusion Reason | Root Cause | Fix |
|---|---|---|
| Crawled, not indexed | Thin or duplicate content | Add 500+ words of unique content or consolidate pages |
| Discovered, not indexed | Crawl budget exhaustion | Improve internal linking, block low-value URLs |
| Duplicate without canonical | Missing canonical tags | Add self-referencing canonicals to all templates |
| Blocked by robots.txt | Overly broad disallow rules | Narrow robots.txt rules to target only non-indexable paths |
| Excluded by noindex tag | Accidental noindex deployment | Audit meta robots tags across all page templates |
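For the robots.txt row, you can verify whether a proposed rule set blocks only what you intend before deploying it. The standard library's `urllib.robotparser` applies the same longest-prefix matching semantics; the rules and URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

def blocked_for_googlebot(robots_txt: str, urls):
    """Return the subset of urls that robots_txt would block for Googlebot."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch("Googlebot", u)]

robots = """\
User-agent: *
Disallow: /search
Disallow: /cart
"""
urls = [
    "https://example.com/search?q=shoes",
    "https://example.com/products/red-shoe",
]
print(blocked_for_googlebot(robots, urls))  # only the /search URL is blocked
```

Running your full sitemap through a check like this catches the "overly broad disallow" case: any submitted URL that appears in the blocked list indicates a rule that needs narrowing.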
Monitoring the Ratio Over Time
Track your index coverage ratio weekly: (Valid pages / Total submitted URLs) * 100. Plot this on a time-series chart alongside deployment dates. Any drop exceeding 5 percentage points within a week warrants immediate investigation.
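The weekly check can be automated with a few lines; the week labels and ratios below are illustrative:

```python
def flag_coverage_drops(weekly_ratios, threshold=5.0):
    """Flag week-over-week drops in coverage ratio exceeding `threshold` points.

    weekly_ratios: list of (week_label, ratio_percent) in chronological order.
    Returns [(week_label, drop_in_points)] for weeks needing investigation.
    """
    flags = []
    for (_, prev), (week, curr) in zip(weekly_ratios, weekly_ratios[1:]):
        drop = prev - curr
        if drop > threshold:
            flags.append((week, round(drop, 1)))
    return flags

history = [("W1", 91.2), ("W2", 90.8), ("W3", 83.5), ("W4", 84.0)]
print(flag_coverage_drops(history))  # [('W3', 7.3)]
```

Correlating each flagged week against deployment dates usually identifies the release that introduced the regression.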
For large sites, segment this ratio by section (blog, products, categories) to pinpoint which area is losing coverage. The GSC API lets you pull this data programmatically for automated dashboards.
When Exclusions Are Acceptable
Not every exclusion is a problem. Intentionally noindexed pages (admin panels, thank-you pages, internal search results), properly canonicalized variants (HTTP to HTTPS, www to non-www), and paginated archives all appear in the excluded count. Document your expected exclusions so you can distinguish intentional from accidental.