Canonical URLs – Why You Should Care

Your site may have a page that can be reached through more than one URL variation, with and without WT.mc_id, for example. This can cause search engine spiders to record more than one URL for that page, and that’s a bad thing. You can prevent tracking parameters (such as WT.mc_id) from being recorded by spiders by using canonical link-tags.

Until Google published their Canonical URL link-tag standard in February 2009, we Outsiders hadn’t seen the word “canonical” in actual written form since grad school.

Anyway, one meaning of the word canonical is “the simplest form.” For example, a “canonical” mathematical model is the model with the fewest possible rules and variables, out of all possible mathematical models for a thing.

We love the irony of an obscure multisyllabic word that means “simple.”

They might have called it the “Standard for Preventing URL variations from being indexed.”  The SFPUVFBI.

You should care about canonical URLs when your site has a page that can be reached through multiple URLs because of tracking parameters.

Example:

Your page http://yoursite.com/promo.asp can be reached through the plain URL.

But:

  • Because you want to track clicks from a special promo graphic on your pages, you’ve hard-coded the promo graphic’s link to go to /promo.asp?WT.ac=fromhomepage.  Or maybe /promo.asp?prevpage=homepage.
  • There are affiliate sites with links to your promo.asp page, and they have helped your tracking by hard-coding their links to go to /promo.asp?source=affiliatesitename.
  • A search engine has followed a banner, affiliate, or even paid-search link that contains campaign parameters, and now displays an organic search listing that goes to /promo.asp?WT.mc_id=oprahbanner or /promo.asp?WT.mc_id=paidsearchmsn.
  • Somebody followed a link from one of your campaign emails and copied what they saw in their address bar into their blog, resulting in links going to /promo.asp?source=Feb2011email.
  • A search engine picks up that same link from that blog and puts it in the index.

See, that promo page can be reached through five or six different URLs, a plain one and several others with campaign parameters in them.
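A quick way to see the problem: the minimal Python sketch below takes every URL from the list above (counting both of the home-page options) and strips a hypothetical set of tracking parameters (WT.mc_id, WT.ac, source, prevpage) from each one. Seven URLs that a spider could record collapse into a single page. It assumes those four names are your only tracking parameters; it also drops any #fragment and re-encodes whatever query parameters remain.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical set of tracking parameters; list whatever your campaigns actually use.
TRACKING_PARAMS = {"WT.mc_id", "WT.ac", "source", "prevpage"}

def strip_tracking(url: str) -> str:
    """Drop known tracking parameters (and any #fragment), keeping other query parameters."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if name not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# The variations described above, all pointing at the same promo page.
variants = [
    "http://yoursite.com/promo.asp",
    "http://yoursite.com/promo.asp?WT.ac=fromhomepage",
    "http://yoursite.com/promo.asp?prevpage=homepage",
    "http://yoursite.com/promo.asp?source=affiliatesitename",
    "http://yoursite.com/promo.asp?WT.mc_id=oprahbanner",
    "http://yoursite.com/promo.asp?WT.mc_id=paidsearchmsn",
    "http://yoursite.com/promo.asp?source=Feb2011email",
]

print(len(set(variants)))                     # 7 distinct URLs a spider could record
print({strip_tracking(u) for u in variants})  # {'http://yoursite.com/promo.asp'}
```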

It should be obvious that the only one you want in the search engine index is the plain one. If the wonky URLs are indexed and clicked on, your campaign reports will count visits that appear to come from a campaign … but actually came from an organic search listing.
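If you suspect this is already happening, one rough check is to look for hits whose URL carries a WT.mc_id from a non-search campaign (a banner or an email, say) but whose referrer is a search engine. The sketch below assumes you can pull each hit's page URL and referrer out of your logs; the search-engine names and the PAID_SEARCH_CAMPAIGNS set are hypothetical placeholders. Paid-search campaign IDs are deliberately skipped, because a genuine paid click also arrives from a search engine, so the referrer alone can't tell it apart from an organic listing.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical values: adjust to your own search engines and campaign IDs.
SEARCH_ENGINE_LABELS = {"google", "bing", "yahoo", "ask"}
PAID_SEARCH_CAMPAIGNS = {"paidsearchmsn"}  # real paid clicks also have a search referrer

def suspicious_campaign_hit(page_url: str, referrer: str) -> bool:
    """Flag a hit whose URL has a non-search WT.mc_id but whose referrer is a search engine."""
    campaign_ids = parse_qs(urlsplit(page_url).query).get("WT.mc_id", [])
    if not campaign_ids or campaign_ids[0] in PAID_SEARCH_CAMPAIGNS:
        return False  # no campaign tag, or a paid-search tag we cannot separate from organic
    # Compare whole hostname labels so e.g. "mask.example.com" does not match "ask".
    host_labels = (urlsplit(referrer).hostname or "").split(".")
    return any(label in SEARCH_ENGINE_LABELS for label in host_labels)

print(suspicious_campaign_hit(
    "http://yoursite.com/promo.asp?WT.mc_id=oprahbanner",
    "https://www.google.com/search?q=promo",
))  # True
```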

This is bad.

There are other problems, too. At best, multiple versions of a page’s URL will water down the rankings or PageRank for that page, because links and other ranking signals get split across several URLs instead of adding up on one. At worst, the search engine will assume it’s seeing spam, i.e. duplicate content on multiple pages.

The Canonical URL tag will fix all of the above

All of this can be avoided by adding one line of code to your page’s <head> section. Google announced this tag in February 2009, and it has since been adopted, at least in intention, by other search engines such as Microsoft’s Live Search (now Bing), Yahoo and Ask.

(February 2011 update: Bing/Yahoo still ignore canonical tags! We’re not sure about Ask. We’ve written another post that uses Webmaster Tools to do pretty much the same thing as canonical tags, and that approach WILL affect Bing/Yahoo.)

The code snippet is used only by search engine spiders. It tells them the one and only URL under which you want the page to appear in their indexes.

<link rel="canonical" href="http://yoursite.com/promo.asp" />

Remember, it goes inside the <head> section of the page.
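To double-check that a page really carries the tag in its <head> (and not, say, in the body where a template dropped it), you could parse the HTML. This is a minimal spot-check sketch using Python's standard html.parser module; the names CanonicalFinder and find_canonicals are ours, and it only recognises an explicit <head> element. It is no substitute for the search engines' own webmaster tools.

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Collect the href of every rel="canonical" <link> that appears inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.canonicals = []  # href values, in document order

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; self-closing <link ... />
        # tags also arrive here through the default handle_startendtag.
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            rel_values = (attr.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.canonicals.append(attr.get("href"))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False

def find_canonicals(page_html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(page_html)
    finder.close()
    return finder.canonicals

page = """<html><head><title>Promo</title>
<link rel="canonical" href="http://yoursite.com/promo.asp" />
</head><body>...</body></html>"""

print(find_canonicals(page))  # ['http://yoursite.com/promo.asp']
```

A tag typed with curly quotes, as word processors like to produce, is not recognised either: the parser, like a browser, treats the curly quote characters as part of the attribute value.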

By the way, this tag doesn’t affect what WebTrends sees or SDC records at all.  It’s only used by search engine spiders.

Think about how many of your pages might be affected by this problem, given your banners, pay-per-click ads, affiliates, and on-site advertising. If you have a lot of them, like we do, you might want to program your content management system to automatically put the canonical link-tag into the <head> section of every page.
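For the CMS route, the helper only needs the page's plain path, which the CMS already knows, so whatever tracking parameters arrive on the request never reach the tag. Below is a minimal Python sketch; SITE_ROOT, canonical_link_tag and insert_into_head are hypothetical names, and a real CMS would more likely emit the tag from its page template than string-edit finished HTML.

```python
import re
from html import escape
from urllib.parse import urljoin

# Hypothetical site root; a real CMS would read this from its configuration.
SITE_ROOT = "http://yoursite.com/"

def canonical_link_tag(page_path: str, site_root: str = SITE_ROOT) -> str:
    """Build the canonical <link> tag from the page's stored path (no tracking parameters)."""
    href = urljoin(site_root, page_path)
    # escape() turns &, <, > and quotes into entities so the attribute stays well-formed.
    return f'<link rel="canonical" href="{escape(href, quote=True)}" />'

def insert_into_head(page_html: str, tag: str) -> str:
    """Insert the tag just before the first </head>; leave the page unchanged if there is none."""
    # A function replacement avoids re treating backslashes in the tag as escapes.
    return re.sub(r"</head>", lambda m: tag + "\n" + m.group(0),
                  page_html, count=1, flags=re.IGNORECASE)

print(canonical_link_tag("/promo.asp"))
# <link rel="canonical" href="http://yoursite.com/promo.asp" />
```

Building the href from the stored path, rather than stripping parameters from the requested URL, means a new campaign parameter you forgot to list can't slip into the canonical URL.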