Pawel Szulencki Search Engine Optimization/Marketing blog.
What is a robots.txt file?
A robots.txt file is a plain text file (no HTML). It tells search engine crawlers which parts of your website to leave out of their indexes. When crawlers visit your site, they try to follow every link and index all your content, including folders and subfolders. If you have anything you want to keep out of search engines, or you want to prevent duplicate content, you should create a robots.txt file.
How to create a robots.txt file?
First of all, the robots.txt file MUST be placed in the root directory of your website, which is usually the place where you keep your index.htm, index.html, index.php or whatever file you use as your home page.
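To see where crawlers will look for the file, here is a small Python sketch that works out the robots.txt address for any page on a site. The function name robots_txt_url and the example.com address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical page deep inside a site:
print(robots_txt_url("http://www.example.com/blog/2009/post.html?x=1"))
# prints http://www.example.com/robots.txt
```

Whatever page you start from, the answer is always the same file at the top of the site, which is why a robots.txt placed in a subfolder has no effect.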
An example robots.txt file may look as follows:
User-agent: *
Disallow:
User-agent: the asterisk (*) refers to any bot
Disallow: here you list the pages, folders or files you want excluded from search engine indexes
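The two-line file above is just a list of "field: value" pairs. As a rough illustration of that layout (not how any particular search engine reads it), this Python sketch splits the text into pairs. The helper name parse_fields is an assumption of mine.

```python
def parse_fields(robots_txt: str) -> list[tuple[str, str]]:
    """Split robots.txt text into (field, value) pairs, skipping comments and blanks."""
    fields = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        fields.append((name.strip().lower(), value.strip()))
    return fields

print(parse_fields("User-agent: *\nDisallow:\n"))
# prints [('user-agent', '*'), ('disallow', '')]
```

An empty Disallow value, as in this file, means nothing is excluded.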
Examples.
Allow all search engines to index the site:
User-agent: *
Disallow:
Disallow all search engines from indexing the site:
User-agent: *
Disallow: /
Disallow the Google search engine from indexing the “private” folder and anything inside that folder:
User-agent: googlebot
Disallow: /private/
Disallow MSN from indexing anything, while all other search engines can crawl all your content:
User-agent: MSN
Disallow: /
Allow only Google to index your site; all other search engines are disallowed:
User-agent: googlebot
Disallow:
User-agent: *
Disallow: /
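If you want to check rules like the ones above before uploading them, Python's standard urllib.robotparser module can read the text and tell you whether a given bot may fetch a given address. This is only a sketch: the helper can_crawl, the bot name "somebot" and the example.com address are assumptions, and real search engines may treat edge cases differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The example files from this post, written as strings.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
GOOGLE_NO_PRIVATE = "User-agent: googlebot\nDisallow: /private/\n"
ONLY_GOOGLE = "User-agent: googlebot\nDisallow:\n\nUser-agent: *\nDisallow: /\n"

page = "http://www.example.com/private/notes.html"  # hypothetical address

print(can_crawl(ALLOW_ALL, "somebot", page))            # True
print(can_crawl(BLOCK_ALL, "somebot", page))            # False
print(can_crawl(GOOGLE_NO_PRIVATE, "googlebot", page))  # False
print(can_crawl(GOOGLE_NO_PRIVATE, "somebot", page))    # True
print(can_crawl(ONLY_GOOGLE, "googlebot", page))        # True
print(can_crawl(ONLY_GOOGLE, "somebot", page))          # False
```

Note that a bot with no matching User-agent block, and no * block to fall back on, is allowed everywhere.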
A list of search engine bots is available at the search engine dictionary website.
Robots.txt for SEO.
You may disallow the Google Image crawler from indexing your pictures, in order to save bandwidth on your server, by putting the following lines into your robots.txt file:
User-agent: Googlebot-Image
Disallow: /my-picture-folder/
You can exclude search engines from indexing certain parts of your server, so that those files do not show up in search engine result pages:
User-agent: *
Disallow: /archives/
Disallow: /passwords/
Disallow: /cgi-bin/
Disallow: /private/top-secret-files/
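With several Disallow lines it is easy to lose track of what is actually excluded. The Python sketch below, using the standard urllib.robotparser module, runs a handful of paths against the rules above and lists the ones that are blocked. The helper blocked_paths, the bot name "somebot" and the sample paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /archives/
Disallow: /passwords/
Disallow: /cgi-bin/
Disallow: /private/top-secret-files/
"""

def blocked_paths(robots_txt: str, paths: list[str], user_agent: str = "somebot") -> list[str]:
    """Return the paths that user_agent is not allowed to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]

# Hypothetical paths on the site.
sample = [
    "/index.html",
    "/archives/2008.html",
    "/cgi-bin/form.pl",
    "/private/readme.html",
    "/private/top-secret-files/plan.txt",
]
print(blocked_paths(RULES, sample))
# prints ['/archives/2008.html', '/cgi-bin/form.pl', '/private/top-secret-files/plan.txt']
```

Each Disallow line matches by prefix, so /private/readme.html stays crawlable: only the top-secret-files subfolder is excluded, not the whole /private/ folder.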
By editing the robots.txt file you gain more control over what search engines index on your site. It's the only time you can actually order search engines to do something… or, to be clear, what not to do.
For instance, if you are testing different versions of the same page, you should exclude that testing folder from indexing so that there is no duplicate content issue within your site.
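If you add and remove test folders often, you could generate the file instead of editing it by hand. Here is a minimal Python sketch that builds the robots.txt text from a mapping of bot names to excluded paths. The function build_robots_txt and the folder name /testing/ are hypothetical, so swap in your own.

```python
def build_robots_txt(rules: dict[str, list[str]]) -> str:
    """Build robots.txt text from {user_agent: [disallowed paths]}."""
    blocks = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        if paths:
            lines += [f"Disallow: {path}" for path in paths]
        else:
            lines.append("Disallow:")  # empty value: nothing is excluded
        blocks.append("\n".join(lines))
    # Separate the blocks for different bots with a blank line.
    return "\n\n".join(blocks) + "\n"

# Keep every bot out of a hypothetical testing folder.
print(build_robots_txt({"*": ["/testing/"]}))
```

Save the printed text as robots.txt in the root directory of your site.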
Check The Web Robots Page for more details on the robots.txt web standard.
Pawel Szulencki is an SEO (Search Engine Optimization) and Marketing certified specialist who is interested in organic SEO, paid campaigns (PPC) and Social Media Marketing channels.