by Rayz
Published on: January 29, 2009
Categories: On Page Optimization

Robots.txt is a plain-text file kept in the root directory of a website. It instructs search engine spiders as to which pages of the website they may crawl and which pages they should ignore.

The robots.txt file is built from specific directives that a spider looks for and that lay down directions for the crawler.

Let’s analyze what is contained in a robots.txt file:

• User-agent
• Disallow
• Path patterns: “/” (the site root) and the wildcard “*”

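User-agent: Names the search engine spider that the rules following it apply to.

For example:
User-agent: googlebot, addresses the rules that follow to Google’s spider only. On its own it does not block any other spider.

User-agent: *, addresses the rules that follow to every search engine’s spider.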
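
To see how a spider picks the group of rules addressed to it, here is a minimal sketch using Python's standard `urllib.robotparser` module. The `/drafts/` folder and the spider name `examplebot` are made up for illustration, and real search engines may apply their own matching rules, so treat this as an approximation rather than a description of any particular crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one group for Google's spider
# and one catch-all group for every other spider.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /drafts/

User-agent: *
Disallow: /images/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# googlebot follows only its own group, so /images/ is open to it
# while /drafts/ is closed.
google_images = parser.can_fetch("googlebot", "/images/a.jpg")
google_drafts = parser.can_fetch("googlebot", "/drafts/x.html")

# Any other spider (here the made-up "examplebot") falls back to the "*" group.
other_images = parser.can_fetch("examplebot", "/images/a.jpg")
other_drafts = parser.can_fetch("examplebot", "/drafts/x.html")

print(google_images, google_drafts, other_images, other_drafts)
```

In this parser, a spider that has a group of its own ignores the `*` group entirely, which is why the two spiders get opposite answers for the same paths.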

Disallow: Used to specify folders or files which a spider is not allowed to crawl.

For example:
Disallow: /, specifies that none of the files in the website should be crawled.

Disallow: /images/, specifies that any file within the folder ‘images’ should not be crawled.

A few command sets that could be useful for a webmaster while creating robots.txt are given below:

Allow all search engine spiders to crawl all files

User-agent: *
Disallow:

Allow only Google’s search engine spiders to crawl the website (the second group blocks every other spider)

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /

To ignore all files in a specific directory

User-agent: *
Disallow: /images/

To ignore only a specific file

User-agent: *
Disallow: /images/sample.jpg
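
The difference between blocking a directory and blocking a single file can be checked with a short sketch, again assuming Python's standard `urllib.robotparser` as the parser. The helper `allowed_paths`, the spider name `examplebot`, and the paths `/index.html` and `/images/other.jpg` are hypothetical and exist only for this illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_paths(robots_txt, agent, paths):
    """Return {path: bool} saying whether `agent` may crawl each path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


# The two rule sets from the article: block a whole folder vs. one file.
BLOCK_DIRECTORY = "User-agent: *\nDisallow: /images/\n"
BLOCK_ONE_FILE = "User-agent: *\nDisallow: /images/sample.jpg\n"

# Illustrative paths; only /images/sample.jpg comes from the article.
PATHS = ["/index.html", "/images/sample.jpg", "/images/other.jpg"]

directory_result = allowed_paths(BLOCK_DIRECTORY, "examplebot", PATHS)
file_result = allowed_paths(BLOCK_ONE_FILE, "examplebot", PATHS)

for path in PATHS:
    print(path, directory_result[path], file_result[path])
```

Because a Disallow value is matched as a path prefix, the directory rule closes every file under `/images/`, while the single-file rule leaves the rest of the folder open.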

If you don’t want any search engine to crawl any files on your website, use the following:

User-agent: *
Disallow: /
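
A single character separates "block everything" from "allow everything", so it can be worth testing the file before publishing it. The sketch below is one way to do that with Python's standard `urllib.robotparser`; the helper `can_crawl`, the spider name `examplebot`, and the page `/index.html` are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, path, agent="examplebot"):
    """Parse the given robots.txt text and ask whether `agent` may crawl `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# "Disallow: /" closes the whole site; an empty "Disallow:" closes nothing.
BLOCK_EVERYTHING = "User-agent: *\nDisallow: /\n"
ALLOW_EVERYTHING = "User-agent: *\nDisallow:\n"

blocked = can_crawl(BLOCK_EVERYTHING, "/index.html")
unblocked = can_crawl(ALLOW_EVERYTHING, "/index.html")

print(blocked, unblocked)
```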
