Default robots.txt File For Web-Server

How do I create a default robots.txt file for the Apache web-server running on a Linux/Unix/MS-Windows server?

Web spiders (also known as robots or crawlers) are programs, typically run by search engines, that “crawl” across the Internet and index pages on web servers. The robots.txt file helps webmasters and site owners keep web crawlers from accessing all or part of a website. Site owners use the robots.txt file to give instructions about their site to web robots using the Robots Exclusion Protocol.

robots.txt File Syntax and Rules

The robots.txt file uses basic rules as follows:

  1. User-agent: The robot the following rules apply to.
  2. Disallow: The URL path you want to block.
  3. Allow: The URL path you want to allow.
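
As a rough illustration of how these three directives work together, the sketch below feeds a small rule set to Python's standard `urllib.robotparser` module and asks whether a given path may be fetched. The bot name `ExampleBot`, the helper `build_parser`, and the paths are made up for the demonstration; they are not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one entry that applies to every robot.
RULES = """\
User-agent: *
Allow: /public/
Disallow: /private/
"""

def build_parser(text):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(RULES)

# can_fetch(user_agent, url) answers: may this robot request this URL?
print(parser.can_fetch("ExampleBot", "/public/index.html"))    # True
print(parser.can_fetch("ExampleBot", "/private/report.html"))  # False
print(parser.can_fetch("ExampleBot", "/other.html"))           # True (no rule matches)
```

A path that matches no rule is allowed, which is why the last check succeeds.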

Examples: The default robots.txt

To block all robots from the entire server, create or upload a robots.txt file as follows:

User-agent: *
Disallow: /

The two lines above are considered a single entry in the file. To allow all robots complete access to the entire server, create or upload a robots.txt file as follows:

User-agent: *
Disallow:

An empty Disallow value means that nothing is blocked.

Please note that User-agent: * means match “any robot”. You can include as many entries as you want, and each entry can contain multiple Disallow or Allow lines and multiple user-agents. The following example tells robots to stay away from the /foo/bar.php file:

User-agent: *
Disallow: /foo/bar.php

In this example, you instruct all robots not to enter the /cgi-bin/ and /print/ directories:

User-agent: *
Disallow: /cgi-bin/
Disallow: /print/
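
To see which paths such an entry actually covers, here is a minimal sketch that loads the same two Disallow lines into Python's standard `urllib.robotparser` and filters a list of sample paths. The agent name `AnyBot`, the helper `blocked_paths`, and the sample paths are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

def blocked_paths(parser, agent, paths):
    """Return the subset of paths that the given agent may not fetch."""
    return [p for p in paths if not parser.can_fetch(agent, p)]

# Hypothetical paths; note that /printable.html is outside the /print/ directory.
candidates = [
    "/cgi-bin/search.cgi",
    "/print/page1.html",
    "/index.html",
    "/printable.html",
]
print(blocked_paths(parser, "AnyBot", candidates))
# ['/cgi-bin/search.cgi', '/print/page1.html']
```

The trailing slash matters here: the rule is a path prefix, so `/print/` does not cover `/printable.html`.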

This example tells a specific robot called fooBar to stay away from your entire website. Here fooBar is a placeholder; replace it with the actual user-agent of the bot:

User-agent: fooBar
Disallow: /

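To block files of a specific type, say all *.png image files, use the following syntax for Googlebot. The * wildcard and the $ end-of-URL anchor are pattern extensions that Googlebot supports; some other robots may not honor them: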

User-agent: Googlebot
Disallow: /*.png$
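
The wildcard pattern above can be approximated with a regular expression. The following is only a sketch of that idea, not the matching code any particular crawler uses; the helper names `pattern_to_regex` and `is_blocked` and the sample paths are hypothetical.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regular expression.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL. Everything else is matched literally from the start of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(pattern, path):
    """True if the path matches the Disallow pattern."""
    return pattern_to_regex(pattern).match(path) is not None

print(is_blocked("/*.png$", "/images/logo.png"))         # True
print(is_blocked("/*.png$", "/images/logo.png?size=2"))  # False: does not end in .png
print(is_blocked("/*.png$", "/images/logo.jpg"))         # False
```

Without the trailing `$`, the same pattern would also match the second path, because the rule would then only need to match a prefix of the URL.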

The following example disallows a Robot named “fooBar” from the paths “/cgi-bin/” and “/pdfs/”:

# Tell "fooBar" where it can't go
User-agent: fooBar
Disallow: /cgi-bin/
Disallow: /pdfs/
# Allow all other robots to browse everywhere
User-agent: *
Disallow:

In this example, I allow only a web spider named “Googlebot” into the site, while denying all other spiders:

# Allow "googlebot" in the site
User-agent: Googlebot
Disallow:
# Deny all other spiders
User-agent: *
Disallow: /
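
A quick way to sanity-check a "one bot only" policy is to run it through Python's standard `urllib.robotparser`. The sketch below assumes the Googlebot entry carries an explicit empty Disallow line; without one, consecutive User-agent lines are read as a single entry and the final Disallow: / would apply to Googlebot as well. The name `SomeOtherBot` is a made-up stand-in for any other crawler.

```python
from urllib.robotparser import RobotFileParser

# Googlebot gets an empty Disallow (nothing blocked); everyone else is blocked.
RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

for agent in ("Googlebot", "SomeOtherBot"):
    print(agent, parser.can_fetch(agent, "/index.html"))
# Googlebot True
# SomeOtherBot False
```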

How do I create a robots.txt file on my server?

Please note that robots.txt is a plain text file that must be located in your web server’s document root, so that it is served at /robots.txt. Web robots are not required to respect robots.txt files, but most well-written web spiders follow the rules you define. You can create robots.txt on your own system and upload it using an FTP client.

You can also log in to your server using the ssh command and use a text editor such as vi to create a robots.txt file. In this example, I log in to the server and create the file in the /var/www/html directory from an OS X or Linux/Unix based desktop system. MS-Windows users can try the PuTTY ssh client:

cd /var/www/html
vi robots.txt
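
If you would rather script this step than edit the file by hand, something along these lines could work. It is a minimal sketch: the helper `write_robots_txt` is hypothetical, the default rules are the "allow everything" entry, and the demonstration writes to a temporary directory rather than /var/www/html so that running it does not touch a real site.

```python
from pathlib import Path
import tempfile

# "Allow everything" entry: an empty Disallow blocks nothing.
DEFAULT_RULES = "User-agent: *\nDisallow:\n"

def write_robots_txt(doc_root, rules=DEFAULT_RULES):
    """Write robots.txt into doc_root and return the path of the new file."""
    target = Path(doc_root) / "robots.txt"
    target.write_text(rules, encoding="utf-8")
    return target

# Demonstration against a throwaway directory instead of the real document root.
with tempfile.TemporaryDirectory() as doc_root:
    created = write_robots_txt(doc_root)
    content = created.read_text(encoding="utf-8")
    print(created.name)
    print(content, end="")
```

On a real server you would pass the document root (for example /var/www/html) as `doc_root` and run the script as a user with write permission there.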

Sample robots.txt file

Sample robots.txt file from

#Allow Google Media Partners bot
User-agent: Mediapartners-Google
Disallow:
#Block the bad bots
User-agent: ia_archiver
Disallow: /
User-agent: VoilaBot
Disallow: /
User-agent: Baiduspider
Disallow: /
User-agent: MJ12bot
Disallow: /
User-agent: BecomeJPBot
Disallow: /
User-agent: Exabot
Disallow: /
User-agent: 008
Disallow: /
User-agent: Sosospider
Disallow: /
#Block specific urls and directories for all bots
User-agent: *
Disallow: /low.html
Disallow: /lib/
Disallow: /rd/
Disallow: /tools/
Disallow: /tmp/
Disallow: /*?
Disallow: /view/pdf/faq/*.php
Disallow: /view/pdf/tips/*.php
Disallow: /view/pdf/cms/*.php

Posted by: SXI ADMIN

The author is the creator of SXI LLC and a seasoned sysadmin, DevOps engineer, and a trainer for the Linux operating system/Unix shell scripting.
