# See for
# detailed info on excluding robots from a site.
#
# See for
# a way to validate the contents of this file.
#
# updated: 14-Jan-2006, George A. Theall
# Link checkers get pretty much free rein.
#
# o MOMspider can go everywhere except /nogo.
User-agent: MOMspider
Disallow: /nogo

# Selected search engine 'bots get pretty much free rein.
# nb:
# appie => Walhello, http://www.walhello.com/
# boitho.com-dc => Boitho, http://www.boitho.com/, Norwegian search engine
# fast => Fastsearch (used by alltheweb.com)
# gaisbot => Gais, http://gais.cs.ccu.edu.tw/, Taiwanese search engine
# GalaxyBot => Galaxy, http://www.galaxy.com/
# Googlebot => Google
# Mercator + Scooter => AltaVista
# Mj12bot => Majestic-12, http://www.majestic12.co.uk/projects/dsearch/mj12bot.php, a distributed search engine.
# mogimogi => http://www.goo.ne.jp/, Japanese search engine.
# mozDex => http://www.mozdex.com/, an open source search engine
# msnbot => MSN Search.
# NG => Exalead, http://www.exalead.com/, French search engine
# Nutch => http://www.nutch.org/, open-source search engine
# Pompos => dir.com, http://dir.com, French search engine
# QuepasaCreep => quepasa.com, Latin American portal / search engine
# Slurp => Inktomi (includes MSN Search and HotBot)
# VIAS => http://vias.ncsa.uiuc.edu/viasarchivinginformation.html
# VoilaBot => http://www.voila.com (French search engine)
# Zao => Kototai, http://www.kototai.org/, Japanese search engine research project
# ZyBorg => WiseNut, http://www.wisenut.com/, and Looksmart
User-agent: appie
User-agent: boitho.com-dc
User-agent: fast
User-agent: gaisbot
User-agent: GalaxyBot
User-agent: Googlebot
User-agent: Mercator
User-agent: Mj12bot
User-agent: mogimogi
User-agent: mozDex
User-agent: msnbot
User-agent: NG
User-agent: Nutch
User-agent: Pompos
User-agent: QuepasaCreep
User-agent: Scooter
# NB: during July 2004, Inktomi's Slurp 'bot did nothing
# but try to grab invalid URLs (other than robots.txt), URLs that
# *never* existed here. Can you say "database corruption"? :-(
#User-agent: Slurp
User-agent: VIAS
User-agent: VoilaBot
User-agent: Zao
# NB: starting in January 2005, LookSmart seems to have switched from
# WiseNut to grub for its crawler. The latter doesn't bother
# requesting robots.txt and doesn't seem to understand 403 response
# codes. So should WiseNut ever come back, screw 'em.
# User-agent: Zyborg
Disallow: /cgi-bin
Disallow: /code
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~amanda/pics
Disallow: /~amanda/videos
Disallow: /~gpt/pics
Disallow: /~gpt/videos
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding

# Other 'bots that I'm ok with.
# o IBM Almaden Research Center.
User-agent: http://www.almaden.ibm.com/cs/crawler
Disallow: /cgi-bin
Disallow: /code
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~amanda/pics
Disallow: /~amanda/videos
Disallow: /~gpt/pics
Disallow: /~gpt/videos
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding

# o The Internet Archive, http://www.archive.org/.
User-agent: ia_archiver
Disallow: /cgi-bin
Disallow: /code
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~amanda/pics
Disallow: /~amanda/videos
Disallow: /~gpt/pics
Disallow: /~gpt/videos
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding

# o LinkWalker, http://www.seventwentyfour.com/, for checking links.
User-agent: LinkWalker
Disallow: /cgi-bin
Disallow: /code
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~amanda/pics
Disallow: /~amanda/videos
Disallow: /~gpt/pics
Disallow: /~gpt/videos
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding

# o Research project from the Kitsuregawa Laboratory, University of Tokyo.
User-agent: Steeler
Disallow: /cgi-bin
Disallow: /code
Disallow: /hidden
Disallow: /icons
Disallow: /nogo
Disallow: /zips
Disallow: /~amanda/pics
Disallow: /~amanda/videos
Disallow: /~gpt/pics
Disallow: /~gpt/videos
Disallow: /~theall/bookmarks
Disallow: /~theall/wedding

# All robots are excluded by default. Please direct requests to
# allow access to webmaster@tifaware.com.
#
# 'bots I know about but don't want to bother with
# o Girafabot
# Used by girafa.com to visualize search results. I'd be ok
# with this if only they'd respect robots.txt.
# o grub-client, http://grub.org/html/documents.php?op=robots-faq
# Distributed crawler for the grub search engine. I'd be ok
# with this if only they'd respect robots.txt.
# o lachesis, ftp://ftp.imag.fr/pub/labo-LSR/DRAKKAR/internet-performance/lachesis/
# Supposedly an Intel tool for measuring ISP latency, although
# after examining it I think it's mis-identified.
# o larbin, http://larbin.sourceforge.net/index-eng.html
# Multi-purpose web crawler.
# o Mozilla/4.0 (efp@gmx.net)
# Spammer tool to scrape email addresses.
# o NPBot, http://www.nameprotect.com/botinfo.html
# Used by NameProtect to scan for brand / IP violations.
# o Psbot, http://www.picsearch.com/bot.html
# Used by Picsearch to index pictures. I don't really have any
# pictures here that I want indexed.
# o Teoma
# Used by the Ask Jeeves search engine. I'd be ok with it if only
# it would respect exclusions in robots.txt.
# o TurnitinBot, http://www.turnitin.com/robot/crawlerinfo.html
# Used by Turnitin.com to prevent plagiarism.
User-agent: *
Disallow: /
Disallow: /nogo