Pawel Szulencki Search Engine Optimization/Marketing blog.
Two things matter when it comes to duplicate content in SEO: the duplicate content filter and the duplicate content penalty.
The duplicate content filter
The first one is a filter in Google's ranking and indexing algorithms that looks for duplicated information and filters it out. When two similar documents are found at separate URLs, Google decides which document is more important: which one is the “original” and which one is a copy. The duplicate is then dropped from the index, and only one document is shown in the search results.
Filtering is not especially dangerous to websites, because its goal is to keep the index free of duplicated content, which in many cases is spam. That way Google serves more relevant results and keeps track of the original content.
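Google does not publish how it decides that two documents are “similar”, so the following is only a rough sketch of one common textbook approach: split each text into overlapping word groups (shingles) and compare the two sets with Jaccard similarity. The function names, the shingle size of 3 and the 0.8 threshold are my own assumptions for illustration, not anything Google has confirmed.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        # Text shorter than one window: treat the whole text as one shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets: 0.0 (disjoint) to 1.0 (same)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


def looks_duplicated(a, b, threshold=0.8, k=3):
    """Hypothetical rule of thumb: flag pairs at or above the chosen threshold."""
    return jaccard(a, b, k) >= threshold


if __name__ == "__main__":
    regular = "the quick brown fox jumps"
    printer = "the quick brown fox leaps"
    print(jaccard(regular, printer))           # share of shingles in common
    print(looks_duplicated(regular, regular))  # identical text is always flagged
```

You could run something like this over your own “regular” and “printer” pages to get a feel for how close two versions are before a search engine has to make that call for you.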
In Google's own words: “This filtering means, for instance, that if your site has articles in ‘regular’ and ‘printer’ versions and neither set is blocked in robots.txt or via a noindex meta tag, we’ll choose one version to list.”
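If you want to choose the listed version yourself, you first need to know whether the printer version is already blocked. Here is a minimal sketch, using only the Python standard library, that checks a robots.txt text and a page's HTML for the two blocking methods mentioned above. The example.com URLs, the /print/ path and the function names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt, url, agent="Googlebot"):
    """True if the given robots.txt text disallows `url` for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


class _RobotsMetaFinder(HTMLParser):
    """Looks for <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        is_robots = attr.get("name", "").lower() == "robots"
        if is_robots and "noindex" in attr.get("content", "").lower():
            self.noindex = True


def has_noindex(html):
    """True if the HTML contains a robots meta tag with a noindex directive."""
    finder = _RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex


if __name__ == "__main__":
    # Hypothetical site layout: printer versions live under /print/.
    robots_txt = "User-agent: *\nDisallow: /print/\n"
    print(blocked_by_robots(robots_txt, "http://example.com/print/article-1"))
    print(has_noindex('<meta name="robots" content="noindex, follow">'))
```

If neither check returns True for the printer version, both versions are open to indexing and the search engine will pick one for you.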
The duplicate content penalty
Besides filtering, Google applies a special penalty that results in “appropriate adjustments in the indexing and ranking of the sites involved” in duplicating content. “As a result, the ranking of the site may suffer, or the site might be removed entirely from the Google index, in which case it will no longer appear in search results.”
So, for instance, Google finds two websites with identical content and determines that the duplication was made on purpose and not by accident. In most cases it's about scraped content, spam attempts, doorway pages, too much similar content on two websites, etc. Google then determines the original source of the content and adjusts the pages' rankings for certain keywords. The penalty concerns only the pages involved in the duplication, not whole websites.
If you see a sudden fall in rankings for certain keywords, or some pages of your website stop showing in the results or show up only far down the list, you may be experiencing a duplicate content penalty.
The most dangerous form of the duplicate content penalty is when your home page or other key pages of your website get penalized. In that case your home page (which usually attracts the most visitors and gets the highest rankings in search engines) will not rank well for your desired keywords, or may not appear at all. In that situation you may lose a lot of potential customers and traffic.
Nevertheless:
“…we prefer to focus on filtering rather than ranking adjustments … so in the vast majority of cases, the worst thing that’ll befall webmasters is to see the ‘less desired’ version of a page shown in our index” - Google Webmaster Central.
Pawel Szulencki is a certified SEO (Search Engine Optimization) and Marketing specialist who is interested in organic SEO, paid campaigns (PPC) and Social Media Marketing channels.
ketan (2 comments.)
December 5th, 2008 at 7:07 am
Hi. Thanks for sharing this wonderful information.
This post will help me manage my site in a better way.
ketan (2 comments.)
December 5th, 2008 at 7:11 am
Can you help me get ranked higher in Google?
Pawel Szulencki (171 comments.)
December 8th, 2008 at 8:07 pm
@ketan: I'm happy to hear you find this information useful. Browse this blog for the information you need, and if you would like to use my SEO/SEM services please contact me for further details (blog [at] seoblogr.com)
uReba (1 comments.)
January 29th, 2009 at 1:47 am
Very informative article. I think it is very important to stay out of that duplicate content filter. I have read other stories that indicated that some of these scrapers can actually make themselves appear as the originator of the content. It gets very scary when you start throwing filters like this into the equation.
I wonder how much is true about the filter and them actually removing sites, unless it was blatant. For example, with a site that has many pages and the occasional dupe page, will they just weaken those pages or punish the whole site?
Perhaps one day we will know.
Pawel Szulencki (171 comments.)
January 30th, 2009 at 5:10 pm
@uReba: About the stories of scrapers pretending to be the originators - it can be true sometimes. I have also read some stories about that, but I never experienced it myself (luckily).
I believe that a reputable website whose content has been scraped (or appears duplicated by mistake or search engine misinterpretation) will not get hurt much, if at all. But I have no 100% guarantee on that.
James S (1 comments.)
February 2nd, 2009 at 9:34 pm
With regard to the duplicate content part of this blog post, I personally use the http://www.copygator.com website to find and stop duplicate content:
1. It’s automated and brings me results instead of me searching for duplicated content. All I had to do was submit my feed, and it started monitoring it and showing me who’s republished my articles on the web.
2. I get notified by email when it finds copies of my articles online.
3. I use their image badge feature to alert me directly on my website when my content is being lifted.
4. It’s a free service, as opposed to the “per page” cost of copyscape/copysentry.
Pawel Szulencki (171 comments.)
February 3rd, 2009 at 9:37 am
@James S: Thx for the link. It is a good tool for keeping an eye on copyright issues.