Imagine you’re an Internet marketing service company and you have been working hard to get a top ranking in the search engines for your customer.
Even after several weeks, the customer’s Web site hasn’t been listed in any search engine. Then you realize that the search engine spiders and robot programs cannot access the Web site because your customer is blocking them by mistake.
There are two ways to block search engine robots: a) with a simple text file in the root directory of the host server, or b) with a certain META tag in the Web pages.
a) The robots.txt file
The host server might have a plain text file named “robots.txt” in the root directory. It contains rules for the search engine spiders. The rules in the robots.txt file follow the Robots Exclusion Protocol, a document designed to help Web administrators and authors of Web spiders agree on a way to navigate and catalog Web sites.
The content of the robots.txt file consists of two main commands: “User-agent” and “Disallow”.
The User-agent command specifies the name of the robot to which the following commands apply. You can set this to “*” to have the commands applied to every robot.
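The second command, “Disallow”, specifies a partial URL that the Web robot should not visit. Any URL on the site that begins with this partial URL is excluded, so a single “/” excludes the whole site.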
Here is what it looks like:
User-agent: *
Disallow: /
This small piece of code tells all search engine spider programs to go away. If you find a text file called “robots.txt” in the root directory of the host server with the above content, you should delete it immediately. The text file says that no search engine is allowed to index your Web site.
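If you would rather test a robots.txt file than read it by eye, a short script can do the check for you. The following is a minimal sketch in Python, assuming the standard library module urllib.robotparser is available. The function name is_blocked, the robot name “ExampleBot” and the example.com addresses are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if the given robots.txt text forbids user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() expects the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical examples: "ExampleBot" and example.com are placeholders.
block_everything = "User-agent: *\nDisallow: /"
block_one_folder = "User-agent: *\nDisallow: /private/"

print(is_blocked(block_everything, "ExampleBot", "https://www.example.com/"))            # True
print(is_blocked(block_one_folder, "ExampleBot", "https://www.example.com/index.html"))  # False
```

The first call reports a block because “Disallow: /” matches every URL on the site. The second does not, because only the /private/ directory is excluded.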
Even if your robots.txt file doesn’t contain the above commands, you should make sure that its syntax is correct. A robots.txt file with faulty syntax can also prevent search engine spiders from indexing your Web site.
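A rough syntax check can be automated as well. The sketch below, again in Python, flags lines that lack the “field: value” form or that use an unrecognized field name. The list of accepted fields is an assumption: it holds User-agent and Disallow plus Allow, Sitemap and Crawl-delay, which many robots also understand. The function name find_suspect_lines is hypothetical, and this is not a full validator.

```python
# Assumed set of accepted field names (lowercase); adjust it to your needs.
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def find_suspect_lines(robots_txt):
    """Return a list of (line_number, problem) pairs for lines that look faulty."""
    problems = []
    for number, raw_line in enumerate(robots_txt.splitlines(), start=1):
        # Drop comments (everything after '#') and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        field, separator, _value = line.partition(":")
        if not separator:
            problems.append((number, "missing ':' separator"))
        elif field.strip().lower() not in KNOWN_FIELDS:
            problems.append((number, f"unknown field {field.strip()!r}"))
    return problems


# Hypothetical faulty file: line 2 lacks the colon, line 3 is misspelled.
sample = "User-agent: *\nDisallow /\nDisalow: /tmp/"
for line_number, problem in find_suspect_lines(sample):
    print(line_number, problem)
```

An empty result only means that no obvious mistakes were found; it does not prove that every robot will read the file the way you intend.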
b) The META ROBOTS tag
There’s a second way to stop search engine robot programs from indexing your Web site: the META ROBOTS tag. If you find the following HTML tag in your Web pages:
<META NAME="robots" CONTENT="noindex,nofollow">
you should replace it immediately with
<META NAME="robots" CONTENT="index,follow">
If you want all search engine spiders to index all Web pages, you can also remove the META ROBOTS tag from your Web pages.
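Checking many pages by hand for this tag is tedious, so here is a minimal sketch of how the check could be scripted. It assumes Python and its standard library module html.parser; the class name RobotsMetaFinder and the function name page_blocks_indexing are made up for illustration, and the sketch only looks for the “noindex” value described above.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the values of every META ROBOTS tag found in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "robots":
            content = attributes.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def page_blocks_indexing(html):
    """Return True if the page carries a META ROBOTS tag with 'noindex'."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" in finder.directives


page = '<html><head><META NAME="robots" CONTENT="noindex,nofollow"></head></html>'
print(page_blocks_indexing(page))  # True
```

A page without any META ROBOTS tag returns False here, which matches the advice above that removing the tag lets the spiders index the page.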
Further information about both ways to stop search engines from indexing your Web site can be found at: How Important is the Robots.Txt