
March 25, 2005

Internet Tech: Robots.txt

Now here is a subject that's close to my heart :) Having built no fewer than three robots from scratch, I know a fair bit about the subject.

However, this post is about another type of robot altogether: the little spider bots that roam the net 24 hours a day. More specifically, this post is about the robots.txt file and why you should have one.

Search engines look in the root of your domain for a special file named "robots.txt" (for example, http://www.15dn.com/robots.txt). The file tells the robot (spider) which files it may spider (download). This system is called the Robots Exclusion Standard. The reason you want a robots.txt file is so you can exclude certain files and directories from being spidered and indexed. Why would you want to do that? Essentially, it keeps well-behaved search engines from crawling pages you don't want popping up in their results. This is my robots.txt file here at 15dn:

User-agent: *
Disallow: /cgi-bin/
Disallow: /15dn Subscribe Pages/
Disallow: /Aff/
Disallow: /arp3/
Disallow: /Banners/
Disallow: /BonusBooksGiveAway/
Disallow: /Corey4StepsDownloadBin/
Disallow: /emailTemplate/
Disallow: /Flash/
Disallow: /FreeToSellDownLoadBin/
Disallow: /Funny/
Disallow: /Linkpage template/
Disallow: /MailMerge/
Disallow: /movabletype/
Disallow: /Spam/
Disallow: /wusage/
Disallow: /En/
Disallow: /Templates/

User-agent: *

means the rules apply to all robots, and below that line you can see the directories I have disallowed. The tutorial here walks you through creating a robots.txt file for yourself.
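To see how a well-behaved spider applies these rules, here is a minimal sketch in Python using the standard library's urllib.robotparser module. It assumes a shortened copy of the file above (only a few of its Disallow lines) and a made-up user-agent name, "SomeBot"; a real spider would download the file from the site rather than read it from a string.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Assumption: a shortened excerpt of the robots.txt shown above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /Banners/
Disallow: /movabletype/
Disallow: /Templates/
"""


def robots_url(page_url: str) -> str:
    # Spiders look for robots.txt at the root of the host,
    # no matter which page they started from.
    return urljoin(page_url, "/robots.txt")


def build_parser(text: str) -> RobotFileParser:
    parser = RobotFileParser()
    # parse() accepts the file's lines; read() would fetch them over the network.
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(robots_url("http://www.15dn.com/archives/2005/03/index.html"))
    # "SomeBot" is a hypothetical user agent; "User-agent: *" covers it.
    for path in ["/", "/cgi-bin/mt/mt-tb.cgi/116", "/Banners/top.gif"]:
        print(path, rp.can_fetch("SomeBot", "http://www.15dn.com" + path))
```

Because the rules sit under "User-agent: *", any spider name gets the same answer: the home page is allowed, while anything starting with /cgi-bin/ or /Banners/ is not.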

Of course, only spiders that adhere to the robots.txt standard bother to read the file. Malicious robots don't bother, and they will still try to harvest email addresses etc. from your site even if you disallow those pages in your robots.txt file.


Mark...

PS
Have a great Easter weekend

Posted by Mark at March 25, 2005 8:53 AM
