Hashemian Blog

Huh? What?

Monday, May 23, 2005

robots.txt 

If you operate a public web site, you would no doubt like an occasional visit from search engine minions, known as robots. Robots are little agents that search engines dispatch to your site to scan your page contents and send them back to the mother ship for cataloguing and, finally, inclusion in search engine results pages (SERPs).

If you have ever scanned your web logs, you have undoubtedly noticed these agents. They come with different names like googlebot, msnbot, and Yahoo Slurp. Almost all legitimate robots ask for permission before crawling a site, and they do so by requesting a file named robots.txt. This capability has been around since the early days of search engines, but it is perhaps one of those often forgotten details. The reason is that if a robot can't locate /robots.txt at a Web site's root, it takes that as a green light to crawl and index the whole site.
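
As a rough illustration of where a robot looks, here is a minimal Python sketch that derives the robots.txt address from any page address on a site. The function name robots_url and the page path in the usage line are made up for this example; only the rule that the file sits at the site root comes from the text above.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the /robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt always lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page path, used only to show the call.
print(robots_url("http://www.hashemian.com/blog/some-post.html"))
```

Whatever page the robot starts from, the query and path are dropped and only the root-level file is requested.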

robots.txt is a flat ASCII file with a simple format. It is placed in the root directory of the Web site, so, for example, it can be accessed this way: http://www.hashemian.com/robots.txt. If you want search engines to crawl your whole site, you would specify this inside robots.txt:
User-agent: *
Disallow:

If you want to block robots from a certain location on your site, you would specify this:
User-agent: *
Disallow: /certain-location
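
To see how a well-behaved robot might read these two files, here is a small sketch using Python's standard urllib.robotparser module. The helper name is_allowed, the user agent string, and the example.com addresses are assumptions made for illustration; the two rule sets are the ones shown above.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The two rule sets from the post.
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ONE = "User-agent: *\nDisallow: /certain-location\n"

# Hypothetical agent name and URLs.
print(is_allowed(ALLOW_ALL, "googlebot", "http://www.example.com/anything.html"))
print(is_allowed(BLOCK_ONE, "googlebot", "http://www.example.com/certain-location/page.html"))
print(is_allowed(BLOCK_ONE, "googlebot", "http://www.example.com/index.html"))
```

An empty Disallow value blocks nothing, while a Disallow path blocks every address that begins with that path.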

I won't bore you with the details. You can read about the stuff here.

Now the question is: if a missing robots.txt file is an open permission to crawl, why bother creating one? The best reason is to save on bandwidth. Many sites are configured to return a standard help page whenever a visitor requests a page that doesn't exist. A robot looking for a missing /robots.txt file would also receive this page. In most instances the standard error page will not cause any harm, but the robot still has to download and parse it, wasting bandwidth and resources. A safe practice to avoid this waste is to place an empty robots.txt on your Web site, which robots treat the same way as full permission to crawl.
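
As a quick check that an empty file really is harmless, the sketch below feeds an empty robots.txt to Python's standard urllib.robotparser module and asks whether a page may be fetched. The helper name allows_everything, the agent name, and the example.com addresses are assumptions for illustration, and other robots may implement their parsing differently.

```python
from urllib.robotparser import RobotFileParser


def allows_everything(robots_txt, sample_urls, user_agent="examplebot"):
    """Return True if robots_txt permits user_agent to fetch every sample URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(user_agent, url) for url in sample_urls)


# Hypothetical URLs standing in for pages on your site.
samples = [
    "http://www.example.com/",
    "http://www.example.com/blog/post.html",
]
print(allows_everything("", samples))
```

With no rules to apply, the parser grants access, so the empty file costs a few bytes of response headers and nothing else.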

Finally, understand that /robots.txt works based on the honor system. While most legitimate search engines follow its instructions, there is no way to enforce obedience via this file.
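
The honor system is easy to see from the robot's side. In the sketch below, the check against robots.txt is just an if statement in the robot's own code; a robot that leaves it out would fetch the page anyway. The function polite_fetch, the fetch callback, the agent name, and the example.com address are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def polite_fetch(url, user_agent, robots_txt, fetch):
    """Call fetch(url) only if robots_txt permits it; otherwise return None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        # The robot chooses to stay out; nothing on the server stopped it.
        return None
    return fetch(url)


rules = "User-agent: *\nDisallow: /certain-location\n"
# A stand-in for a real download function, so the example needs no network.
fake_fetch = lambda url: "contents of " + url

print(polite_fetch("http://www.example.com/certain-location/a.html", "examplebot", rules, fake_fetch))
print(polite_fetch("http://www.example.com/index.html", "examplebot", rules, fake_fetch))
```

If you need to keep a location truly private, protect it on the server side rather than relying on this file.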
    © 2001-2008 Robert Vahid Hashemian