Did you know there is a file that can single-handedly prevent your entire site from ever ranking on Google? Did you know the same file can also be used to help your site’s SEO? That file is called robots.txt, and this post will go over everything you need to know to leverage it and get the most out of your SEO efforts.
What is Robots.txt?
Robots.txt is the first file search engine bots look at when they crawl a website. It tells the bots how to crawl the site through instructions written under the Robots Exclusion Protocol. Robots.txt is typically used to prevent bots from crawling certain parts of a site and to keep certain content out of search results.
While many people would like every page of their site to be prominently displayed on the first page of Google, there are several instances in which you need to hide pages from search engines.
For example:
- Parts of the site that are still under development
- Copies of certain landing pages used for targeted ad campaigns that would otherwise be considered duplicate content and hurt search results rankings
- A website may share a piece of content that was published by another site
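As an illustration, a robots.txt covering the first two cases might look like this (the /dev/ and /landing-copies/ directory names are hypothetical placeholders):

User-agent: *
Disallow: /dev/
Disallow: /landing-copies/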
Using Robots.txt
Even though robots.txt is a critical SEO tool, websites are not required to have a robots.txt file. In fact, many site owners and webmasters never feel the need to create one. Without a robots.txt file in place, search engine bots are free to crawl the entire site unimpeded. This usually isn’t a problem for small or simple websites, but if you’re running ad campaigns or have a more sophisticated site, you’re going to want to use robots.txt to have SEO success. If your site does need a robots.txt file, whoever creates it MUST know what they are doing. You never want to risk blocking the wrong part, or potentially all, of your site from search engines. Fortunately, the code used in the file is pretty straightforward.
For example:
Suppose http://www.examplewebsite.com wants to block all bots from the contact page. Let’s call this page contact.php. The code would be:
User-agent: *
Disallow: /contact.php
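Here, User-agent: * means the rule applies to every bot, and each Disallow line lists a path (relative to the site root) that bots should not crawl. Ending a path with a slash, such as a hypothetical Disallow: /private/, blocks an entire directory.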
Robots.txt can also allow certain bots to crawl a site while blocking others. If that same site wanted to block only Bing from the contact page, it would enter:
User-agent: bingbot
Disallow: /contact.php
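Taking this further, a site could let a single crawler in while shutting all other bots out entirely. A minimal sketch, using Google’s crawler as the example (an empty Disallow line blocks nothing for that bot, while Disallow: / blocks everything):

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /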
If your needs are a little more complex, you can use meta robots tags. These are written in the <head> of individual pages and can include more detailed instructions. For example, suppose http://www.examplewebsite.com still doesn’t want its contact page to be included in search results. However, the contact page has some cool features that have generated valuable backlinks. To keep the SEO value of those backlinks while blocking the page from search results, the site would add the following meta robots tag:
<head>
<title>Contact Us</title>
<meta name="description" content="Interested in our services? Contact us today!" />
<meta name="robots" content="NOINDEX, FOLLOW">
</head>
You can use different combinations of "NOINDEX", "INDEX", "NOFOLLOW", and "FOLLOW" to tell the bots what to do.
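For instance, if that same page should be hidden from search results and its links shouldn’t be followed either, a hypothetical variation of the tag above would be:

<meta name="robots" content="NOINDEX, NOFOLLOW">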
Be Careful What You Block
Search engines want access to everything they need to properly understand and categorize your site, so robots.txt should only block the files that truly need to be blocked. It’s also important to note that all robots.txt files are public. In fact, you can reach any site’s robots.txt just by adding /robots.txt to the end of its domain. Because of this, you need to make sure any files containing sensitive information have real security features in place; blocking them in robots.txt does not protect the data.
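For example, the file for the site above would live at http://www.examplewebsite.com/robots.txt.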
A Powerful but Delicate Tool
Robots.txt and meta robots tags can be very useful, but you MUST be sure to use them properly or risk losing rankings. If you have any questions about robots.txt or want to add it to your site, be sure to talk to your webmaster before you write any code.