Robots.txt
A robots.txt file is like a fence around your property. A fence is meant to keep trouble out, but people can still see through it. For a website, the robots.txt file sits in the root folder and tells web crawlers which parts of the site you do or do not want them to access. You can target individual files, file types, and folders, and you can address rules to specific bots. It cannot block visitors by IP address; that has to be done on the server itself.
Why should you care about the robots.txt file?
- Improper usage of the robots.txt file can hurt your Search Engine ranking*
- The robots.txt file controls how some bots and spiders see and interact with your website
- This file contains the instructions for the bots to interact with your site and is a fundamental part of how search engines work
*Google stopped supporting the noindex rule in robots.txt on September 1, 2019.
What is the robots.txt file?
Robots.txt is a single plain-text file that follows the Robots Exclusion Standard, a protocol with a small set of directives for granting or denying access to sections of your site, optionally per crawler (such as mobile crawlers vs desktop crawlers). It lets you ask crawlers to stay out of areas of your website that you may not want them to find (like member-only areas). Compliance is voluntary on the crawler's part, so using a robots.txt file is a step, but not the only step, you should take to mark off areas of your site you may not want crawlers entering.
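To make the idea of per-crawler sections concrete, here is a minimal sketch using Python's standard urllib.robotparser module. The bot names (ExampleImageBot, OtherBot), the paths, and the example.com URLs are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one group for a named bot and one for everyone else.
ROBOTS_TXT = """\
User-agent: ExampleImageBot
Disallow: /photos/

User-agent: *
Disallow: /members/
"""

def can_crawl(agent: str, url: str) -> bool:
    """Return True if the rules in ROBOTS_TXT let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    for agent in ("ExampleImageBot", "OtherBot"):
        for path in ("/photos/cat.jpg", "/members/login"):
            url = "https://www.example.com" + path
            print(agent, path, can_crawl(agent, url))
```

One thing this shows is that a bot with its own group follows only that group. In this sketch ExampleImageBot is kept out of /photos/ but is not kept out of /members/, because the rules under "User-agent: *" apply only to bots that have no group of their own.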
Webpages (HTML/PHP)
For non-image files (that is, web pages), robots.txt is used to control crawling traffic, typically because you don’t want your server to be overwhelmed by a search engine crawler.
Image files
A robots.txt rule does prevent image files from appearing in Google search results. This can be a good way of keeping your images out of Google image search if, for example, you are a photographer who sells work online. It does not prevent other pages or users from linking to your image. That is fine, because you do want people to share your pages and work with friends on social media.
Resource files
You can use robots.txt to block resource files such as unimportant image, script, or style files. Keep in mind that if these files are needed to render your website, blocking them may affect your site’s searchability. If the files are blocked, the crawler will not load them, even if the page calls them. Will your site look the same on mobile if you remove the CSS that is intended for mobile? How will Google think your site looks if it can’t see the CSS?
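One way to catch this before it hurts you is to list the resources a page needs and check them against your rules. The sketch below assumes a made-up robots.txt and made-up example.com resource URLs, and uses Python's standard urllib.robotparser, which only approximates how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that (perhaps by accident) block a stylesheet folder.
ROBOTS_TXT = """\
User-agent: *
Disallow: /assets/css/
Disallow: /tmp/
"""

def blocked_resources(robots_txt, resource_urls, agent="Googlebot"):
    """Return the URLs from `resource_urls` that `agent` may not fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in resource_urls if not parser.can_fetch(agent, url)]

# Hypothetical files a page needs in order to render.
resources = [
    "https://www.example.com/assets/css/mobile.css",
    "https://www.example.com/assets/js/app.js",
    "https://www.example.com/images/logo.png",
]
blocked = blocked_resources(ROBOTS_TXT, resources)
print(blocked)
```

Here the mobile stylesheet is the only file reported as blocked, which is exactly the kind of rule worth a second look.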
A Note from Google
You should not use robots.txt as a means to hide your web pages from Google Search results. This is because other pages might point to your page, and your page could get indexed that way, avoiding the robots.txt file. If you want to block your page from search results, use another method such as password protection or noindex meta tags or directives directly on each page.
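The on-page method usually takes the form of a robots meta tag in the page's head, such as <meta name="robots" content="noindex">. As a rough sketch, the following Python uses the standard html.parser module to check whether a page's HTML carries that tag. The class and function names are made up for this example, and it does not look at HTTP headers, which can also carry a noindex directive.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            content = attr.get("content", "")
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]

def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives

if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noindex(page))
```

Remember that a crawler has to be able to fetch the page to see this tag, so a page carrying noindex should not also be blocked in robots.txt.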
Basic Examples of robots.txt
Here are some common robots.txt setups.
Allow Full Access
User-agent: *
Disallow:
Block All Access
User-agent: *
Disallow: /
Block One Folder
User-agent: *
Disallow: /folder/
Block One File
User-agent: *
Disallow: /file.html
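If you want to see what these setups do in practice, the sketch below feeds each one to Python's standard urllib.robotparser and asks whether a URL may be fetched. The bot name ExampleBot and the example.com URLs are placeholders, the block-all case is written in its standard form "Disallow: /", and real search engine crawlers may differ from this parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The four basic setups, keyed by a descriptive name.
SETUPS = {
    "allow_all": "User-agent: *\nDisallow:",
    "block_all": "User-agent: *\nDisallow: /",
    "block_folder": "User-agent: *\nDisallow: /folder/",
    "block_file": "User-agent: *\nDisallow: /file.html",
}

def allowed(robots_txt: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

if __name__ == "__main__":
    urls = [
        "https://www.example.com/",
        "https://www.example.com/folder/page.html",
        "https://www.example.com/file.html",
    ]
    for name, rules in SETUPS.items():
        for url in urls:
            print(name, url, allowed(rules, url))
```

Rules match by prefix, so "Disallow: /folder/" covers everything inside that folder, while an empty "Disallow:" blocks nothing.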
Does your site already have a robots.txt file?
You can check for a robots.txt file from any web browser. The robots.txt file is always located in the same place on any website, so it is easy to determine if a site has one. Just add “/robots.txt” to the end of the domain name, for example www.example.com/robots.txt.
If a file is there, it is your robots.txt file. You will see one of three things: a file with text in it, an empty file, or a 404 error page.
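The same check can be scripted. This is a minimal sketch using only Python's standard library; the function names are made up for this example, and check_site needs network access and only handles the simple outcomes described above.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    """Build the robots.txt address for the site that `page_url` belongs to."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def describe(status: int, body: str) -> str:
    """Map an HTTP status and response body to one of the outcomes above."""
    if status == 404:
        return "no robots.txt file (404)"
    if status == 200 and body.strip():
        return "robots.txt file with text in it"
    if status == 200:
        return "empty robots.txt file"
    return f"unexpected status {status}"

def check_site(page_url: str) -> str:
    """Fetch the site's robots.txt (requires network access) and describe it."""
    try:
        with urllib.request.urlopen(robots_txt_url(page_url), timeout=10) as resp:
            return describe(resp.status, resp.read().decode("utf-8", "replace"))
    except urllib.error.HTTPError as err:
        return describe(err.code, "")

if __name__ == "__main__":
    print(robots_txt_url("https://www.example.com/blog/post?id=1"))
```

Whatever page you start from, the file is looked up at the root of the domain, which is why robots_txt_url throws away the path and query string.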
Testing your robots.txt file
If you have access and permission, you can use Google Search Console to test your robots.txt file. Instructions for testing it are in Google's Search Console help documentation.
To be sure your robots.txt file is not blocking anything you want crawled, you need to understand what its rules say. I will cover that below.
Do you need a robots.txt file?
You may not even need to have a robots.txt file on your site. In fact, it is often the case you do not need one.
Reasons you may want to have a robots.txt file:
- You have content you want to be blocked from search engines
- You are developing a site that is live, but you do not want search engines to index new pages yet
- You want to fine-tune access to your site from reputable bots and crawlers
- You are using paid links or advertisements that need special instructions for bots
- It helps you follow some Google guidelines in some situations
Reasons you may not want to have a robots.txt file:
- Your site is simple and error-free and you want everything indexed
- You do not have any files you want or need to be blocked from search engines
- None of the reasons listed above for having a robots.txt file apply to you
It is okay to not have a robots.txt file. When you do not have one, search engine robots like Googlebot will have full access to your site. This is a normal, simple setup that is very common.
Keys to robots.txt
- If you use a robots.txt file, make sure it is being used properly
- An incorrect robots.txt file can block bots and crawlers from every page on your site
- Ensure you are not blocking pages or elements that Google needs to read, render and rank your pages