Now here is a subject that's close to my heart 🙂 Having built no fewer than three robots from scratch, I know a fair bit about the subject.

However, this post is about another type of robot altogether: the little spider bots that roam the net 24 hours a day. More specifically, this post is about the robots.txt file and why you should have one.

Search engines will look in the root of your domain for a special file named “robots.txt” (https://www.15dn.com/robots.txt). The file tells the robot (spider) which files it may spider (download). This system is called the Robots Exclusion Standard. The reason you want a robots.txt file is so you can exclude certain files and directories from being spidered and indexed. Why would you want to do that? Essentially, it prevents pages you don't want people to see from popping up in a search engine. This is my robots.txt file here at 15dn:

User-agent: *
Disallow: /cgi-bin/
Disallow: /15dn Subscribe Pages/
Disallow: /Aff/
Disallow: /arp3/
Disallow: /Banners/
Disallow: /BonusBooksGiveAway/
Disallow: /Corey4StepsDownloadBin/
Disallow: /emailTemplate/
Disallow: /Flash/
Disallow: /FreeToSellDownLoadBin/
Disallow: /Funny/
Disallow: /Linkpage template/
Disallow: /MailMerge/
Disallow: /movabletype/
Disallow: /Spam/
Disallow: /wusage/
Disallow: /En/
Disallow: /Templates/

User-agent: *

means all robots, and below that line you can see the directories I have disallowed. The tutorial walks you through creating a robots.txt file for yourself.
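
If you're curious what a well-behaved spider actually does with this file, here is a minimal sketch in Python using the standard library's robots.txt parser (urllib.robotparser). The URLs are just my own domain used as an example; swap in your own:

# Minimal sketch of how a compliant spider consults robots.txt
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url("https://www.15dn.com/robots.txt")  # the spider looks here first
rp.read()                                      # fetch and parse the file

# A polite robot asks before downloading each page:
print(rp.can_fetch("*", "https://www.15dn.com/cgi-bin/test.cgi"))  # False: /cgi-bin/ is disallowed
print(rp.can_fetch("*", "https://www.15dn.com/index.html"))        # True: nothing forbids it

That can_fetch check is the whole handshake: the spider matches its own User-agent against the rules in the file and skips anything disallowed.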

Of course, only spiders that adhere to the robots.txt standard bother to read the file. Malicious robots ignore it and will still try to harvest email addresses and the like from your site, even if you disallow those pages in your robots.txt file.

Mark…

PS
Have a great Easter weekend