Today is: Friday, 3rd September 2010
Can you hurt your PR by blocking pages in robots.txt?

Recently I have been working hard to reduce the number of errors reported for my website in my Google Webmaster Tools account under Diagnostics > Content Analysis. At the moment, I’m seeing many errors under “Duplicate meta descriptions” and “Duplicate title tags”. Each one involves the same URL appearing both with and without the tracking parameter we use to track conversions from external traffic sources.
For example:
http://mywebsite.com/page1.html
http://mywebsite.com/page1.html?source=abc
My original solution was to add this entry to my robots.txt:
Disallow: /*?source=
This way I could stop Googlebot from crawling the tracked URLs and reporting them as duplicates of the original pages. However, I was a bit concerned that some of these tracked URLs are linked from external sites and pass SEO link juice to my site. If I block them for Googlebot, I may be shooting myself in the foot by losing that link juice and hurting my PageRank (PR). What a nightmare! So I asked myself:
1) To make a more informed decision, is there a tool that can measure how many links with ?source= are out there passing SEO juice to my site? (Maybe it is not a big deal and I could block these URLs with minimal pain!)
2) Is there any other way to remove these errors besides blocking these pages in robots.txt? (Please note that I can’t get rid of these tracking codes: we use them to measure conversions from several traffic sources, and replacing them would be very difficult.)
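Incidentally, the exact robots.txt pattern matters. Google documents that in Allow/Disallow rules `*` matches any sequence of characters, a trailing `$` anchors the end of the URL, and otherwise a rule matches any path (query string included) that starts with it. Python’s standard urllib.robotparser only does plain prefix matching, so the minimal sketch below writes that wildcard matching out by hand. It is an illustration of the documented semantics, not Google’s parser. It shows that `/*?source=` catches /page1.html?source=abc, while the variant `/*/?source=*` does not, because it demands a `/` immediately before the `?`.

```python
import re


def robots_rule_matches(rule: str, path: str) -> bool:
    """Return True if a robots.txt Allow/Disallow value matches path.

    Mimics Google's documented wildcard semantics: '*' matches any run of
    characters, a trailing '$' anchors the end, and everything else is a
    prefix match against the path plus query string.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where the '*'s were.
    regex = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored:
        regex += r"\Z"
    # re.match anchors only at the start, which gives prefix matching.
    return re.match(regex, path) is not None


if __name__ == "__main__":
    tagged = "/page1.html?source=abc"
    for rule in ["/*/?source=*", "/*?source="]:
        print(f"Disallow: {rule:<14} blocks {tagged}? {robots_rule_matches(rule, tagged)}")
```

Running a few of your real URLs through a check like this before editing robots.txt is cheaper than waiting for the next crawl to find out.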
The answers were simple:
Firstly, Google Webmaster Tools lets you see how many pages link to any particular page. This feature can be used to find links to 404s (as shown here: mattcutts.com/blog/free-direct-text-links/ ) or to any other type of page.
Secondly, this link analysis tool, http://www.blogstorm.co.uk/link-analysis-tool/502/ (which requires setting up a MySQL database on your server), looks at your site’s pages indexed in Yahoo and then shows how many backlinks Yahoo reports for each page.
Finally, Majestic SEO, http://www.majesticseo.com/ , lets you download a backlink report for your site for free.
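If the tool you use can export its backlink report as CSV, a few lines of Python can answer question 1: how many links point at ?source= URLs, broken down by source value. This is a sketch under assumptions: the column holding the linked-to URL is called TargetURL here, which is a placeholder, so check the header row of your actual export and pass the right name as target_column. The sample rows are made up.

```python
import csv
import io
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def count_tracked_backlinks(report, target_column="TargetURL", param="source"):
    """Count backlinks whose target URL carries `param`, keyed by its value.

    `report` is any file-like object holding a CSV backlink export;
    `target_column` is a placeholder name for the linked-to URL column.
    """
    counts = Counter()
    for row in csv.DictReader(report):
        query = parse_qs(urlsplit(row[target_column]).query)
        for value in query.get(param, []):
            counts[value] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows; in practice use open("backlinks.csv", newline="").
    sample = io.StringIO(
        "SourceURL,TargetURL\n"
        "http://partner-a.example/,http://mywebsite.com/page1.html?source=abc\n"
        "http://partner-b.example/,http://mywebsite.com/page2.html?source=abc\n"
        "http://partner-c.example/,http://mywebsite.com/page1.html?source=xyz\n"
        "http://blog.example/,http://mywebsite.com/page1.html\n"
    )
    counts = count_tracked_backlinks(sample)
    print(counts.most_common())
    print("total tracked backlinks:", sum(counts.values()))
```

If the total is small, blocking or redirecting the tagged URLs costs very little either way; if it is large, the redirect approach matters more.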
However, a better way to remove these errors would be to 301 redirect the tagged URLs to the associated core URL. I am not great with .htaccess, but a competent programmer should be able to write the rules needed to 301 redirect the source= version of a page to the regular page.
This way you get the PageRank and anchor text benefit of the links without any duplicate content worries.
As for this idea from your question:
“One solution I thought was to add this entry to my robots.txt:
Disallow: /*?source=”
This is a bad idea.
Suppose you have the same content under 10 different source values. Google sees these as 10 individual pages. Even if you blind Google to 9 of them, you are still spilling PageRank left and right (Google is trying to rank 10 pages when it should be ranking one), and the PageRank that external links pass to the blocked URLs stops at the border of your site instead of flowing through your internal links to more important pages.
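To make the “stopping at the border” point concrete, here is a toy model, not Google’s actual algorithm: the textbook PageRank power iteration with damping 0.85 on a four-page graph. An external page E links to the tagged URL T, and your canonical page C and an internal page I link to each other. In the blocked case Googlebot cannot crawl T, so T has no known outgoing links and the rank it receives goes nowhere; in the redirected case T is treated as passing its rank on to C. The graph, the damping factor and the choice to let a dead-end page’s rank simply leak away (many formulations redistribute it instead) are all simplifying assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook power-iteration PageRank over a dict of page -> outgoing links.

    Pages with no outgoing links are dead ends: the rank they receive is not
    passed on (it simply leaks out of the graph), which is a simplification.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outgoing in links.items():
            for target in outgoing:
                new_rank[target] += damping * rank[page] / len(outgoing)
        rank = new_rank
    return rank


# E = external site, T = page1.html?source=abc, C = page1.html, I = internal page.
blocked = {"E": ["T"], "T": [], "C": ["I"], "I": ["C"]}        # T disallowed in robots.txt
redirected = {"E": ["T"], "T": ["C"], "C": ["I"], "I": ["C"]}  # T 301-redirects to C

if __name__ == "__main__":
    for name, graph in [("blocked", blocked), ("redirected", redirected)]:
        r = pagerank(graph)
        print(f"{name:>10}: C={r['C']:.3f}  I={r['I']:.3f}")
```

In this toy graph the internal page I ends up with about 0.43 of the rank when T redirects, versus 0.25 when T is blocked, because the external link’s contribution now reaches C and flows on to I.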
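The better option is a 301 redirect with mod_rewrite in your .htaccess file. Note that RewriteRule only ever sees the URL path, never the query string, so the source parameter has to be matched with a RewriteCond on %{QUERY_STRING}. These rules assume the .htaccess file sits in your document root:
RewriteEngine On
- Redirect URLs where source is the only query parameter (the trailing ? drops the query string):
RewriteCond %{QUERY_STRING} ^source=[^&]*$
RewriteRule ^(.*)$ /$1? [R=301,L]
- Redirect URLs where source is the first of several parameters:
RewriteCond %{QUERY_STRING} ^source=[^&]*&(.*)$
RewriteRule ^(.*)$ /$1?%1 [R=301,L]
- Redirect URLs where source is not the first parameter:
RewriteCond %{QUERY_STRING} ^(.*)&source=[^&]*(.*)$
RewriteRule ^(.*)$ /$1?%1%2 [R=301,L]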
If you want to know WHY these work, your local university has a CS degree program available; alternatively, you can look up “regular expressions” and prepare for some pain.
As always, I write these things without testing, so test them in a temporary directory before deploying.
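One way to pin down what each redirect should produce before deploying is to write the expected mapping out and check it. The Python sketch below is not the rewrite rules themselves; it assumes the tracking parameter is literally named source and simply drops it from the query string, keeping the other parameters in their original order and encoding, which is the mapping the three rules aim for. You could compare its output with the Location header your server actually sends for each test URL.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_tracking_param(url: str, param: str = "source") -> str:
    """Return url with every `param` entry removed from its query string.

    Other parameters keep their original order and percent-encoding, because
    the query string is split on '&' rather than decoded and re-encoded.
    """
    parts = urlsplit(url)
    kept = [
        pair
        for pair in parts.query.split("&")
        if pair and pair.split("=", 1)[0] != param
    ]
    # urlunsplit leaves out the '?' entirely when the query is empty.
    return urlunsplit(parts._replace(query="&".join(kept)))


if __name__ == "__main__":
    for url in [
        "http://mywebsite.com/page1.html?source=abc",
        "http://mywebsite.com/page1.html?source=abc&page=2",
        "http://mywebsite.com/page1.html?page=2&source=abc&sort=asc",
    ]:
        print(url, "->", strip_tracking_param(url))
```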
Whether the redirect breaks your conversion tracking depends on how your site is configured. Many affiliate-driven systems drop a cookie during the redirection process, but some redirects may not, so you have to check how your site records the source from a programming standpoint.
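A minimal sketch of the cookie approach, assuming a Python WSGI application rather than whatever your site actually runs: on a tagged request it stores the source value in a cookie and then issues the 301 to the clean URL, so the conversion can still be attributed at checkout. The cookie name traffic_source and the 30-day lifetime are made up for illustration. This only helps if the application performs the redirect itself; a mod_rewrite rule in .htaccess redirects before your code ever sees ?source=.

```python
from urllib.parse import parse_qsl, quote, urlencode

COOKIE_NAME = "traffic_source"   # hypothetical cookie name
COOKIE_MAX_AGE = 30 * 24 * 3600  # hypothetical 30-day attribution window


def app(environ, start_response):
    """WSGI app: record `source` in a cookie, then 301 to the untagged URL."""
    query = parse_qsl(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    sources = [value for key, value in query if key == "source"]
    if sources:
        # Keep every other parameter, drop the tracking one.
        rest = urlencode([(k, v) for k, v in query if k != "source"])
        location = quote(environ.get("PATH_INFO", "/") or "/") + ("?" + rest if rest else "")
        cookie = f"{COOKIE_NAME}={quote(sources[0], safe='')}; Path=/; Max-Age={COOKIE_MAX_AGE}"
        start_response("301 Moved Permanently", [("Location", location), ("Set-Cookie", cookie)])
        return [b""]
    # Untagged request: serve the page; checkout code would read the cookie.
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"page content"]
```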
If you are tracking conversions, you can always click one of your own affiliate or tracking links, buy from yourself, and check whether the conversion was recorded.