Google Analytics FAQs


Google's Page Indexing Non-Issue

I occasionally receive panicked emails from clients after Google notifies them that many of their website pages are not being indexed (meaning that Google is not including them in its search results). The limited context that Google provides in its reports makes it seem like something has gone terribly wrong, when in reality there is usually no problem at all and everything is running as intended. This article explains why.

Google Page Indexing chart screenshot
A screenshot of an example Google Page Indexing chart. The chart is misleading because it makes it seem like only 819 out of 5,320 website pages are being indexed. In reality the website only has around 819 actual pages; the remaining 4,000+ entries are mostly just URL address variations that were never meant to be indexed.
Screenshot of Google Page Indexing reporting
This example screenshot of a Google Page Indexing report is more helpful because it lists the reasons why pages aren't indexed, but it's also misleading because it implies that these are all problems that need to be fixed. When you understand each listing, however, you can see that these are almost all valid, intended reasons for non-indexing.

Let's go through each of Google's listed reasons why pages aren't being indexed, to explain in detail what's going on.

Alternate page with proper canonical tag

→ INTENTIONAL, EXPECTED, NO PROBLEM HERE

This is by far the biggest reason why many page addresses aren't indexed, and it's fully intentional and expected - in fact this is a feature that prevents massive "duplicate content" ranking problems.

Keep in mind that many pages of your website potentially have several variations of their URL addresses. For example, a photo page in my "Colorado" gallery might have an address like:

https://www.mountainphotography.com/photo/capitol-peak-sunset/


But I've also included that photo in my "Fourteeners" gallery. When I'm browsing through the photos in the Fourteeners gallery, the photo page address has an additional parameter to mark that we are browsing the Fourteeners gallery, so that the website knows which photo is next and which gallery page to return to (rather than the photo's default Colorado gallery). So in this circumstance, the same photo page will have a slightly different URL address of:

https://www.mountainphotography.com/photo/capitol-peak-sunset/?gallery=fourteeners


See the difference there? The second address has "?gallery=fourteeners" appended to the end. It's the same page, but the extra parameter is necessary for the proper gallery browsing functionalities. This poses a dilemma, though, because in Google's eyes these are two different page addresses, but in reality there's only one page address that should be indexed.

To solve this confusion, the page has a canonical tag (a link element in the page's head) that tells Google which address is the main one: the base URL address (the first address above). As a result, both of these addresses (and any other potential variations) are normally indexed and ranked under the one main canonical address.

Without this canonical tag, Google would see two different addresses with identical content and would have to guess which one to index, which can split ranking signals between the "duplicate content" addresses. The canonical tag solves this issue and keeps everything nice and tidy for Google.

So, in a nutshell, those 4,622 "pages" listed in this category of the report above are not actually unique pages - they are just URL address variations of other existing indexed pages, and are not meant to be indexed. Google is stating that these addresses have proper canonical tags, and that is a good thing.
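To make the canonical mechanism above concrete, here is a minimal Python sketch of how a URL variation could be mapped back to its base address. It is only an illustration, not the website's actual code: the function names are hypothetical, and it assumes that every query parameter is a browsing aid that never changes the page content.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Return the URL with its query string and fragment removed.

    Assumption: query parameters (such as ?gallery=...) never change
    the page content, so the base address is always the canonical one.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> element for the page's <head>."""
    return f'<link rel="canonical" href="{canonical_url(url)}">'


variants = [
    "https://www.mountainphotography.com/photo/capitol-peak-sunset/",
    "https://www.mountainphotography.com/photo/capitol-peak-sunset/?gallery=fourteeners",
]

# Both address variations produce the same canonical tag.
for variant in variants:
    print(canonical_tag(variant))
```

Both addresses yield the same tag pointing at the base address, which is why Google reports the second one as an "alternate page" rather than as a separate indexed page.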

Not found (404)

→ LIKELY NOT AN ISSUE

404 pages are page addresses that do not exist on your website. There could be various reasons for this, including:

  • Perhaps you deleted a page (a photo, a gallery, a product, etc.).
  • Perhaps you changed the URL address of a page, so the old address no longer exists.
  • Perhaps these are old addresses from a previous website that no longer exist on the new website.
  • Perhaps you manually inserted an incorrect link someplace in the text on your own website.
  • Perhaps somebody else created an incorrect link to your website from some other website.

Whatever the case, since these pages don't exist on your site, of course they should not and will not be indexed by Google. So for the vast majority of these you can simply ignore them, as a 404 response is the correct response in this case.

The only scenario in which action should be considered is if any of the listed 404 addresses are very important pages - in this case let me know the old address and the corresponding new address, then I can set up a redirect for you.
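As a rough illustration of what such a redirect does, the Python sketch below decides how to answer a request: existing pages are served normally, old addresses with a known replacement get a permanent (301) redirect, and everything else gets a 404. The paths and the names REDIRECTS, EXISTING_PAGES, and respond are invented for this example and do not reflect how the website is actually implemented.

```python
from typing import Optional, Tuple

# Hypothetical redirect table: old path -> new path (both made up).
REDIRECTS = {
    "/photo/old-capitol-peak/": "/photo/capitol-peak-sunset/",
}

# Hypothetical set of paths that currently exist on the site.
EXISTING_PAGES = {"/photo/capitol-peak-sunset/"}


def respond(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect target or None) for a requested path."""
    if path in EXISTING_PAGES:
        return 200, None             # page exists: serve it normally
    if path in REDIRECTS:
        return 301, REDIRECTS[path]  # moved: permanent redirect to the new address
    return 404, None                 # unknown address: correct "not found" response


print(respond("/photo/capitol-peak-sunset/"))
print(respond("/photo/old-capitol-peak/"))
print(respond("/photo/never-existed/"))
```

Only the addresses that matter need an entry in the redirect table; the rest can keep returning 404.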

Page with redirect

→ INTENTIONAL, EXPECTED, NO PROBLEM HERE

These are page addresses that have been redirected to a different page. This is always intentional.

Excluded by 'noindex' tag / Blocked by robots.txt

→ INTENTIONAL, EXPECTED, NO PROBLEM HERE

These are pages that are purposely excluded from search indexing. For example, the Cart page will never be indexed.

Also, some pages can optionally be excluded from search indexing by you, in the admin. For instance, you can set some photos to NOT be included in search results - those will be excluded from Google indexing. Or, if you have other private pages that you don't want showing up in Google, you can add those to the "Excluded" list on the Sitemap page in the admin; those pages will also then get a 'noindex' tag so that Google doesn't index them.
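For readers curious how these two exclusion methods look in practice, here is a small Python sketch. The robots.txt content, the "/cart/" and "/private-page/" paths, and the names EXCLUDED_PATHS and robots_meta are assumptions made up for illustration; the actual rules on any given website may differ. The robots.txt check uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks crawlers from the cart.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical list of pages the site owner has chosen to exclude.
EXCLUDED_PATHS = {"/private-page/"}


def is_crawlable(url: str) -> bool:
    """True if the robots.txt rules above allow Googlebot to fetch the URL."""
    return parser.can_fetch("Googlebot", url)


def robots_meta(path: str) -> str:
    """Return a 'noindex' meta tag for excluded pages, else an empty string."""
    if path in EXCLUDED_PATHS:
        return '<meta name="robots" content="noindex">'
    return ""


print(is_crawlable("https://www.mountainphotography.com/cart/"))
print(is_crawlable("https://www.mountainphotography.com/photo/capitol-peak-sunset/"))
print(robots_meta("/private-page/"))
```

The two methods work differently: robots.txt stops Google from fetching a page at all, while a 'noindex' tag lets Google fetch the page but asks it not to list the page in search results.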

Duplicate without user-selected canonical

→ LIKELY NOT AN ISSUE

These are pages that Google has determined, for whatever reason, to be too similar to another indexed page, so it has decided not to index them.

If you click on this listing in the report and look at the listed page URLs, they are often photo search pages, which makes sense because the photo search results may end up looking quite similar to other gallery pages that include the same photos.

Soft 404

→ LIKELY NOT AN ISSUE

This means that the page appears to be missing or empty, but for some reason the website returned a normal "success" response instead of a 404 "missing page" response.
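The difference between a soft 404 and a proper 404 can be sketched in a few lines of Python. This is a simplified illustration with a made-up page store (PAGES) and hypothetical function names, not a description of how any particular website handles missing pages.

```python
from typing import Dict, Tuple

# Hypothetical page store: path -> page body.
PAGES: Dict[str, str] = {
    "/photo/capitol-peak-sunset/": "<h1>Capitol Peak Sunset</h1>",
}

NOT_FOUND_BODY = "<h1>Page not found</h1>"


def soft_404_response(path: str) -> Tuple[int, str]:
    """Problematic: a missing page still reports success (status 200)."""
    return 200, PAGES.get(path, NOT_FOUND_BODY)


def proper_response(path: str) -> Tuple[int, str]:
    """Preferred: a missing page reports status 404 along with the message."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, NOT_FOUND_BODY


print(soft_404_response("/photo/missing/"))
print(proper_response("/photo/missing/"))
```

In the first case a visitor sees the same "Page not found" message, but Google receives a success status and has to infer from the content that the page is really missing.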

Crawled - currently not indexed

→ MOSTLY NOT AN ISSUE, BUT YOU MIGHT WANT TO REVIEW SOME OF THESE PAGES

This is a list of pages that Google is aware of and has reviewed, but has chosen not to index. Looking at the list of these addresses, the vast majority of them tend to have URL parameters and canonical tags, which means that they are essentially in the same "Alternate page with proper canonical tag" category as above, in which case indexing would not be expected.

That said, some pages in here are simply not indexed because for whatever reason Google has deemed them to be unworthy of indexing. The reason is most likely that there is not enough unique content on those pages. Common suspects here are photo pages whose photos have no caption, or only a short caption that doesn't really say much.

This is a good reminder as to why it's important to write unique, descriptive, and informative captions for your photos, and also to write a paragraph or two of text on each gallery page to offer some information about the subject of that page. (Read more about SEO best practices here.)

Duplicate, Google chose different canonical than user

→ LIKELY NOT AN ISSUE

This is a curious but usually minor scenario in which Google has decided to index a page under a slightly different address than the one the website specified via the "canonical" tag. This typically has to do with pagination; for example, if a gallery has multiple pages of thumbnails, Google may decide to index the page that shows all the thumbnails, instead of the first page.

Discovered - currently not indexed

→ NOT AN ISSUE

This is just saying that Google is aware that the page exists (usually via the website's Sitemap) but has not yet gotten around to crawling and indexing it.


Hopefully this helped explain why the vast majority of these "not indexed" listings are normal and expected. Rest assured that our WideRange websites are very Google-friendly and highly optimized for search engine performance! Please let me know if you have any questions.