15Degrees-NORTH Blog



March 25, 2005

Internet Tech: Robots.txt

Now here is a subject that's close to my heart :) Having built no fewer than 3 robots from scratch, I know a fair bit about the subject.

However, this post is about another type of robot altogether: the little spider bots that roam the net 24 hours a day. More specifically, this post is about the robots.txt file and why you should have one.

Search engines will look in the root of your domain for a special file named "robots.txt" (http://www.15dn.com/robots.txt). The file tells the robot (spider) which files it may and may not spider (download). This system is called the Robots Exclusion Standard.

The reason you want a robots.txt file is so you can exclude certain files/directories from being spidered and indexed. Now why would you want to do that? Well, essentially it helps keep pages that you don't want people to see from popping up somewhere in a search engine. This is my robots.txt file here at 15dn:

User-agent: *
Disallow: /cgi-bin/
Disallow: /15dn Subscribe Pages/
Disallow: /Aff/
Disallow: /arp3/
Disallow: /Banners/
Disallow: /BonusBooksGiveAway/
Disallow: /Corey4StepsDownloadBin/
Disallow: /emailTemplate/
Disallow: /Flash/
Disallow: /FreeToSellDownLoadBin/
Disallow: /Funny/
Disallow: /Linkpage template/
Disallow: /MailMerge/
Disallow: /movabletype/
Disallow: /Spam/
Disallow: /wusage/
Disallow: /En/
Disallow: /Templates/

User-agent: *

means all robots, and you can see the directories I have disallowed below that line. The tutorial here walks you through creating a robots.txt file for yourself.
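If you want to check how a robot would read rules like these, here is a minimal sketch using Python's standard urllib.robotparser module. It only uses three of the Disallow lines from my file, and the robot name "ExampleBot" and the index.html page are made up for illustration, so treat it as a rough test rig rather than how any particular search engine works.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules above (three Disallow lines only).
RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /Aff/
Disallow: /movabletype/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network access is needed.
parser.parse(RULES.splitlines())


def may_fetch(url, agent="ExampleBot"):
    """Return True if the rules allow `agent` to download `url`.

    "ExampleBot" is a hypothetical robot name; because the rules use
    "User-agent: *", any name is treated the same way.
    """
    return parser.can_fetch(agent, url)


# A URL under a disallowed directory versus an ordinary (assumed) page.
print(may_fetch("http://www.15dn.com/cgi-bin/mt/mt-tb.cgi"))
print(may_fetch("http://www.15dn.com/index.html"))
```

The first URL sits under /cgi-bin/ so it is refused, while the second matches none of the Disallow lines and is allowed.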

Of course, only spiders that adhere to the robots.txt standard bother to read the file. Malicious robots don't bother, and will still try to harvest email addresses and so on from your site even if you disallow those pages in your robots.txt file.
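To make that point concrete, here is a rough sketch of what "adhering" looks like from the robot's side, again with Python's standard library. All the checking happens inside the robot's own code, which is exactly why a badly behaved robot can skip it. The function names, the "ExampleBot" name and the page URLs are all hypothetical, and a real crawler would download the robots.txt file instead of being handed its text.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Work out where robots.txt lives for the site hosting `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, swap the path for /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def crawl_plan(page_urls, robots_text, agent="ExampleBot"):
    """Return only the URLs a well-behaved robot would download.

    `robots_text` is the content of the site's robots.txt, passed in
    as a string here so the sketch runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return [url for url in page_urls if parser.can_fetch(agent, url)]


rules = "User-agent: *\nDisallow: /Spam/\nDisallow: /MailMerge/\n"
pages = [
    "http://www.15dn.com/Spam/contact.html",  # hypothetical page
    "http://www.15dn.com/index.html",         # hypothetical page
]

print(robots_url(pages[0]))
print(crawl_plan(pages, rules))
```

A polite robot only downloads what crawl_plan returns. A malicious one simply never calls anything like it, and nothing in robots.txt can stop that, so the file is a request and not a lock.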


Mark...

PS
Have a great Easter weekend

Posted by Mark at March 25, 2005 8:53 AM
