Every day in SEO, there are worries, wonders, and what-ifs. These include questions such as, “Does Google like my site?” and, “Am I maximizing my bot crawl or am I giving them too much to crawl?”
While we only want to give search engines quality content to consider for indexation, it can be tempting to aggressively weed down your indexable content.
You might also worry that search engines don't understand your content because similar pages exist due to product variations and the like.
You can seriously hurt your SEO by making rash decisions that might seem to make Google happy but ultimately damage your organic visibility potential.
Let’s look at why you may want to shape your indexation/crawlable pages as well as a few examples of taking this all too far.
Canonicalize?
Using canonical tags can be a great way to tell a crawling search engine which version of duplicate or similar content should be treated as the representative one. All you have to do is place this tag within the head section of the source code.
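For reference, here is a minimal sketch of the tag itself; the URL is a hypothetical placeholder for whichever version of the page you want treated as the representative one:

```html
<!-- Placed in the <head> of the duplicate or similar page -->
<!-- example.com URL is hypothetical; point it at your preferred version -->
<link rel="canonical" href="https://www.example.com/preferred-page/" />
```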
The canonical tag can be a great way to deal with content that you know is duplicate or similar but that must remain on the site, whether for user needs or because a slow site maintenance team can't remove it.
How do you figure out if you have similar or duplicate content on your site?
The easiest test is to manually review your site and address site sections that appear to have separate URLs but similar content (e.g., copy, images, headings, title elements). Take a few of these URLs and use a tool like Similar Page Checker. You can also use Siteliner, which will review your site for similar content.
Now that you have a good feel for cases of similarity, you need to understand if this lack of uniqueness is worthy of canonicalization. Here are a few examples and solutions.
Case 1: Your site's pages exist at both HTTP and HTTPS URLs.
Solution: Canonical tag to the page version that has the most links, internal links, etc., until you can implement a one-to-one redirect of all duplicating pages.
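As a rough sketch, assuming the HTTPS version is the stronger one, each HTTP page would carry a canonical tag referencing its HTTPS counterpart until the one-to-one redirects are in place (the URLs here are hypothetical):

```html
<!-- In the <head> of http://www.example.com/product-page/ (hypothetical URL) -->
<!-- Points search engines at the HTTPS version until a 301 redirect replaces it -->
<link rel="canonical" href="https://www.example.com/product-page/" />
```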
Case 2: You sell different types of T-shirts with several shirt topics for each shirt type. There is no unique copy on these pages, just the unique name, image, price, etc. Should you canonically point the child product pages to the parent shirt page?
Solution: Do nothing. These pages are unique enough to be indexed. Their unique names differentiate them, and that can help you win long-tail keyword searches.
Case 3: You sell T-shirts but have a separate page for every color of every shirt.
Solution: Canonical tag the color pages to reference the parent shirt page. Each color page isn't a separate product, just a simple variation.
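As a sketch with hypothetical URLs, the head of each color page would reference the parent shirt page:

```html
<!-- In the <head> of /t-shirts/classic-tee-blue/ (hypothetical URL) -->
<!-- The parent shirt page is the representative version for all color variations -->
<link rel="canonical" href="https://www.example.com/t-shirts/classic-tee/" />
```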
Use Case: Canonical Tagging Content That’s Unique Enough to Succeed
Similar to the example presented above, I wanted to explain that sometimes slightly similar content can still be appropriate for indexation. What if it were shirts with child pages for different shirt types like long-sleeve, tank top, etc.? Each of those is now a different product, not just a variation.
A great example of this is an automotive sales site that features pages for car makes, their associated models, and variations of those models (2Dr, 4Dr, V8, V6, deluxe edition, etc.). The initial thought with this site was that all of the variations were simply near duplications of the model pages. So why annoy search engines with this near duplicative content when we could canonicalize these pages to point to the model page as the representative version? We initially moved in this direction, but anxiety over whether these variation pages could succeed on their own led us to change the canonical tags to reference each respective page instead.
So, what did we see?
We found that organic traffic increased not only to these child pages but also to the parent pages. It's my opinion that when you give credit back to the child pages, the parent page appears to have more authority, since it now has many child pages passing that "credit" back to it.
Monthly organic traffic to this area of the site has grown 5x since September, when we revised the canonical tags, with 754 pages now driving organic traffic compared to the 154 recognized in the early fall.
Don’t Make These Canonicalization Mistakes
- Setting canonical tags that pass through a redirect before resolving to the final page. Don't slow down search engines by forcing them through redirects, and don't point importance at a URL that doesn't resolve directly.
- Setting canonical tags that point to 404 error pages.
- Canonical tagging to the wrong page version, e.g., www./non-www. or HTTP/HTTPS.
Noindex?
You can also use the meta robots noindex tag to fully exclude similar or duplicate content. Placing the noindex tag in the head section of your source code will stop search engines from indexing these pages.
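For reference, a minimal sketch of the tag looks like this:

```html
<!-- Placed in the <head> of a page you want kept out of the index -->
<meta name="robots" content="noindex" />
```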
While the meta robots noindex tag is a quick way to remove duplicate content from ranking consideration, it can be dangerous to your organic traffic if you fail to use it appropriately.
This tag has been used in the past to weed down large sites so that only search-critical pages are presented and crawl spend is as efficient as possible. However, you still want search engines to see all relevant site content so they can understand your site taxonomy and hierarchy of pages.
Here are a couple of ways noindexing might be discussed as a solution.
Case 1: To aid your customers, you provide documentation from the manufacturer, even though the manufacturer already features it on its own site.
Solution: Continue to provide the documentation to aid your on-site customers, but noindex these pages. That content is already owned and indexed by the manufacturer, which likely has much more domain authority than you. Essentially, you aren't going to outrank them for this content.
Case 2: You offer several different but similar products, where the only differentiation is color, size, count, etc. You don't want to waste crawl spend.
Solution: Solve this with canonical tags rather than noindex. Because a canonical tag is a hint rather than a hard directive, a given page could still be indexed and able to rank, so a long-tail search could still drive qualified traffic.
Case 3: You have a lot of old products that you don't sell much of anymore and that are no longer a major focus.
Solution: Either canonically point these pages to relevant category pages or redirect them to those category pages. These pages have age and trust, may have links, and may still possess rankings.
Use Case: Don’t Sacrifice Rankings/Traffic for Crawl Spend Considerations
When it comes to our site, we know we want to put our best foot forward for search engines. We don’t want to waste their time when crawling and we don’t want to create a perception that most of our content lacks uniqueness.
In the example below, meta robots noindex tags were placed on child product variation pages during a domain transition/relaunch in order to keep somewhat similar product page content out of search engine review.
The graph below shows the total number of ranking keywords that transitioned from one domain to the other. When the meta robots noindex tags were removed, the overall number of ranking terms grew by 50 percent.
Don’t Make These Meta Robots Noindex Mistakes
- Don’t place a meta robots noindex tag on a page with inbound link value. If so, the page in question should be permanently redirected to another relevant site page.
- If you’re noindexing a page which is included in main, footer, or supporting navigation, make sure that the directive isn’t “noindex, nofollow” but “noindex, follow” so search engines that are crawling the site can still pass through the links on the noindexed page.
Conclusion
The canonical and meta robots noindex tags can be useful tools to instruct search engines on known content similarity and duplication, or the removal of needless content from search engine indexes.
Just be careful how you tag! It's easy to hinder your organic search visibility potential by poorly architecting how your site should be crawled.
Image Credits
Featured image courtesy of Shutterstock and edited in Canva, May 2017.
Screenshots by Josh McCoy. Taken May 2017.