Pawel Szulencki Search Engine Optimization/Marketing blog.
To help your users, you may want to publish the same information in several formats, such as a web page, a PDF or a Word document, so that people can view or download the version they prefer. Useful as this is, it can create a duplicate content issue.
The problem is that search engines also index other popular document formats and present them in search results. If they fail to recognize that the documents are simply the same content in different formats, they may assume the content was duplicated on purpose to spam search engines, and the site may run into duplicate content trouble.
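To see why the same text in two formats can look duplicated once it is extracted, here is a minimal Python sketch that compares two pieces of text by their overlapping three-word sequences (Jaccard similarity on word shingles). This is only an illustration under my own assumptions: the function names are made up for this example, and it does not describe how any search engine actually detects duplicates.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short text: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: str, b: str, size: int = 3) -> float:
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 1.0  # two empty texts are trivially identical
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    # Hypothetical text extracted from a web page and from its PDF version.
    html_text = "Publish the same guide in several formats for your users."
    pdf_text = "Publish the same guide in several formats for your users."
    print(jaccard(html_text, pdf_text))
```

Text extracted from a web page and from its PDF copy will usually score at or near 1.0, which is exactly the situation described above.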
Pros of different document formats:
Cons of different document formats:
Conclusion
In most cases it is fine to publish the same content in several document formats on your website, as search engines can tell the formats apart and understand why they are published on the same site.
Nevertheless, if you want to make sure that your website will not face duplicate content issues, I recommend limiting the number of formats, or keeping the extra formats out of search results by disallowing them in your robots.txt file or marking them with “noindex, nofollow” meta tags.
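As a rough illustration of the robots.txt and “noindex, nofollow” options, here is a minimal Python sketch. It assumes the alternate formats live under a hypothetical /downloads/ folder on example.com; neither name comes from this post. It uses Python's standard urllib.robotparser module to check what a Disallow rule blocks. Because a meta tag can only sit inside an HTML page, the sketch also shows the X-Robots-Tag HTTP response header, which carries the same directive for files such as PDF or Word documents. How you attach that header depends on your web server.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: all alternate-format files sit under /downloads/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /downloads/
"""

# Meta tag for HTML pages that should stay out of the index.
NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'

# File types assumed (for this example) to be alternate formats.
ALTERNATE_FORMATS = (".pdf", ".doc", ".docx")


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Check whether the given robots.txt rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def noindex_headers(path: str) -> dict:
    """Response headers to send with non-HTML files, which cannot hold a meta tag."""
    if path.lower().endswith(ALTERNATE_FORMATS):
        return {"X-Robots-Tag": "noindex, nofollow"}
    return {}


if __name__ == "__main__":
    print(is_crawlable(ROBOTS_TXT, "https://example.com/downloads/guide.pdf"))
    print(is_crawlable(ROBOTS_TXT, "https://example.com/guide.html"))
    print(noindex_headers("/downloads/guide.pdf"))
```

The standard-library parser matches path prefixes only, which is why the sketch groups the files in one folder rather than matching on file extension.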
Pawel Szulencki is a certified SEO (Search Engine Optimization) and Marketing specialist who is interested in organic SEO, paid campaigns (PPC) and Social Media Marketing channels.
Money Academy (2 comments.)
January 12th, 2009 at 3:43 am
This is good for documents, but when we talk about blog or website content, I don’t know how it is treated.
If someone copies my article, and my article (the original) is not indexed but the copy is, what does Google do?
Does it count my article as the duplicate and the other one as the original?
That is not fair.
Can you explain?
Pawel Szulencki (171 comments.)
January 13th, 2009 at 11:16 am
@Money Academy: If your article is not indexed for some reason and someone copies it, Google may treat the other source as the original (if Google finds that copy first and it has no link back to the article on your website).
When Google later indexes your article, it will try to determine which version is the original, and there is no guarantee that yours will be chosen.
But why do you assume that your article will not be indexed in the first place? Your job is to do all you can to help Google determine that your content is the original. That means making sure your document is indexed first and that the document itself shows you created it (include your website’s URL or any other details that identify you as the author).
That should help.
Money Academy (2 comments.)
January 26th, 2009 at 4:33 pm
Thank you, Pawel. I hope mine will be the original.
avoiding plagiarism (1 comments.)
May 22nd, 2009 at 1:55 pm
Thanks for that Pawel! I really like your post. Even the most reputable authors can “steal” other people’s work without realizing it. We have to stop that, I recommend checkforplagiarism.net