Robots.txt is a plain-text file kept in the root directory of a website and is used to instruct search engine spiders as to which parts of the site may be crawled and which should be ignored. Because the location is fixed, a spider always requests the file from the same place; for www.example.com it would look for https://www.example.com/robots.txt. The file is built from specific directives that a compliant crawler reads and follows before fetching any other page.
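As a quick illustration (not part of the protocol itself), the short Python sketch below uses only the standard library to fetch a site's robots.txt from exactly the place a spider would look; Google's own file is used here simply because it is known to exist, and any domain can be substituted.

import urllib.request

# A spider always requests robots.txt from the root of the host it crawls.
url = "https://www.google.com/robots.txt"

with urllib.request.urlopen(url) as response:
    # The file is plain text: print the directives the site publishes.
    print(response.read().decode("utf-8"))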
Let’s analyze what is contained in a robots.txt file:
• Wildcard “*”: Stands for any value; in a User-agent line it matches every crawler. Note that “/” is not a wildcard but the site root, so a path ending in “/” covers everything beneath it.
• User-agent: Names the search engine spider that the rules following it apply to.
User-agent: googlebot, addresses only Google’s spider, so the rules that follow apply to Google alone.
User-agent: *, addresses every search engine’s spider.
• Disallow: Used to specify folders or files which a spider is not allowed to crawl.
Disallow: /, specifies that none of the files on the website should be crawled.
Disallow: /images/, specifies that any file within the folder ‘images’ should not be crawled.

A few command sets that could be useful for a webmaster while creating Robots.txt are given below:

Allow all search engine spiders to index all files:

User-agent: *
Disallow:

Allow only Google’s search engine spiders to index the website:

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /

To ignore all files in a specific directory (here, the ‘images’ folder):

User-agent: *
Disallow: /images/

To ignore only a specific file, for example an image named logo.jpg inside that folder:

User-agent: *
Disallow: /images/logo.jpg

If you don’t want any search engines to index any files on your website, use the following:

User-agent: *
Disallow: /
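To check that a command set behaves as intended before uploading it, the rules can be run through Python's standard urllib.robotparser module. The sketch below parses the “ignore all files in a specific directory” rules from above, paired with a googlebot section that disallows nothing; the bot name SomeOtherBot and the example.com URLs are illustrative placeholders.

from urllib import robotparser

# The directory-blocking rules from above, plus a googlebot
# section with an empty Disallow (meaning: allow everything).
rules = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /images/
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(rules)

# Google’s spider matches its own section, which disallows nothing.
print(parser.can_fetch("googlebot", "https://example.com/images/logo.jpg"))     # True

# Any other spider falls under the “*” section and is blocked from /images/.
print(parser.can_fetch("SomeOtherBot", "https://example.com/images/logo.jpg"))  # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/index.html"))       # True

Running a draft file through a parser like this catches mistakes such as a stray “/” that blocks the whole site before any real crawler ever sees it.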