
Robots & Spiders

by Rayz
Published on: January 29, 2009
Categories: Google

A web spider, also known as a web crawler, is a program that crawls websites and indexes their pages to feed a search engine's database.

Depending on its proprietary algorithm, a spider checks every web page, weighs it, assigns a rank based on the page's relevancy, and writes that rank to the search engine's index. The search engine then uses this index to list web pages for every search conducted on its portal.
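As a rough illustration of how link-based ranking works, the sketch below runs a minimal PageRank-style power iteration over a tiny made-up link graph. The pages, damping factor, and iteration count are all assumptions for the example; real search engines use far more elaborate, proprietary scoring.

```python
# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}

damping = 0.85            # standard PageRank damping factor
n = len(links)
rank = {page: 1.0 / n for page in links}  # start with a uniform rank

for _ in range(50):
    new_rank = {}
    for page in links:
        # Each page that links here contributes its rank, split
        # evenly across all of its outgoing links.
        incoming = sum(rank[src] / len(outs)
                       for src, outs in links.items() if page in outs)
        new_rank[page] = (1 - damping) / n + damping * incoming
    rank = new_rank

for page, score in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Here c.html ends up with the highest rank, since both other pages link to it; pages with more (and better-ranked) incoming links accumulate more rank.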

Robots.txt is a file kept in the root of a website; it instructs a search engine spider which areas of the site it is and is not allowed to crawl.
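For illustration, a minimal robots.txt might look like this (the paths and user-agent names are hypothetical):

```
# Rules for all crawlers
User-agent: *
Disallow: /admin/
Disallow: /private/

# Additional rule for Google's crawler only
User-agent: Googlebot
Disallow: /no-google/
```

Each `User-agent` line names the crawler a group of rules applies to, and each `Disallow` line marks a path prefix that crawler should not fetch.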

Spiders crawl a website according to the instructions in the robots file, which is built from specific directives the crawler looks for. The spider then crawls the site, collecting information, indexing the words on its pages, and following every link it finds. Spiders can read most text on a web page (though not text embedded in images, video, or Flash), as well as headings, alt attributes, link titles, keywords, and hidden text. They can even follow URLs that lead to domains outside the current website.
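As a sketch of how a crawler honors these directives, Python's standard-library `urllib.robotparser` can parse a robots file and answer whether a given URL may be fetched. The rules, crawler name, and URLs below are made up for illustration; a real crawler would first download the file from the site's `/robots.txt` path.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed directly for this example.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The crawler consults the parsed rules before following each link.
print(parser.can_fetch("MyCrawler", "https://example.com/index.html"))           # allowed
print(parser.can_fetch("MyCrawler", "https://example.com/private/report.html"))  # disallowed
```

A well-behaved spider makes this check for every URL it discovers and simply skips the disallowed ones.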

Together, the robots file and the spider help a search engine analyze a website and fetch accurate information about every page it is permitted to crawl.
