A robots.txt file is like a fence around your yard. Some fences let you see through, and others are built to keep everything out. The robots.txt file sits at the root of your website and indicates which parts of your site you do or don’t want web crawlers to access.
Why should you care about the robots.txt file?
- Improper use of the robots.txt file can hurt your search engine rankings
- The robots.txt file controls how search engine spiders see and interact with your website
- Google cares about it: the file is mentioned in several of Google's guidelines
- The file holds the instructions for the bots that search engines send to your site, and it is a fundamental part of how search engines work
What is the robots.txt file?
The file follows the Robots Exclusion Standard, a protocol with a small set of commands that indicate which sections of your site specific kinds of web crawlers may access (for example, an image crawler versus a general web crawler).
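A robots.txt file is made of groups: a `User-agent` line naming a crawler, followed by `Allow` and `Disallow` rules for URL path prefixes. The sketch below checks a small, made-up file with Python's standard-library `urllib.robotparser`; `ExampleBot` and the `/drafts/` and `/admin/` paths are hypothetical. It also shows a common surprise: a crawler that matches its own group follows only that group and ignores the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for a named crawler, one catch-all group.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /drafts/

User-agent: *
Disallow: /admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(agent: str, path: str) -> bool:
    # can_fetch() takes a full URL; only its path is compared against the rules.
    return parser.can_fetch(agent, "https://www.example.com" + path)

# ExampleBot obeys only its own group, so /admin/ is NOT blocked for it.
print(allowed("ExampleBot", "/drafts/post.html"))  # False
print(allowed("ExampleBot", "/admin/"))            # True
# Any other crawler falls back to the * group.
print(allowed("OtherBot", "/admin/"))              # False
print(allowed("OtherBot", "/drafts/post.html"))    # True
```

If you want a named crawler to respect the catch-all rules too, repeat those rules inside its own group.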
For non-image files (that is, web pages), robots.txt is used to manage crawling traffic, typically because you don’t want your server to be overwhelmed by a search engine's crawler.
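As a sketch of managing crawl traffic, the hypothetical rules below keep all crawlers out of internal search-result URLs, which can multiply into many near-duplicate pages, and add a `Crawl-delay` line. `Crawl-delay` is not part of the original standard: some crawlers honor it and Python's `urllib.robotparser` reports it, but Google's documentation says Googlebot ignores it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of internal search results and
# ask (non-Google) crawlers to wait 10 seconds between requests.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.example.com"
print(parser.can_fetch("SomeBot", site + "/search?q=shoes"))  # False
print(parser.can_fetch("SomeBot", site + "/products/shoes"))  # True
# Python reports the delay, but whether a crawler honors it is up to the crawler.
print(parser.crawl_delay("SomeBot"))  # 10
```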
A robots.txt file can prevent image files from appearing in Google search results. If you are a photographer who sells your work online, for example, this is a good way to keep your images out of Google Images. It does not prevent other pages or users from linking to your images, which is good, because you do want people to share your pages and work with friends on social media.
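To keep a folder of photos out of Google Images, a file can address Google's image crawler by its user-agent token, `Googlebot-Image`. The `/photos/` folder below is hypothetical, and the check uses Python's `urllib.robotparser`. Because no group names `Googlebot` itself, the parser reports that Google's main web crawler may still fetch the same URL.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules that keep Google's image crawler out of /photos/
# while leaving the rest of the site open to other crawlers.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /photos/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

photo = "https://www.example.com/photos/sunset.jpg"
print(parser.can_fetch("Googlebot-Image", photo))  # False
print(parser.can_fetch("Googlebot", photo))        # True
```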
You can use robots.txt to block resource files such as unimportant image, script, or style files. Keep in mind that if these files are needed to render your website, blocking them may affect your site’s searchability: a blocked file is not loaded by the crawler, even if the page calls it. Will your site look the same on mobile if you remove the CSS intended for mobile? How will Google think your site looks if it can’t see the CSS?
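Here is a hedged sketch of blocking unimportant resources without hiding the stylesheets a page needs to render. The `/assets/` layout is hypothetical. `Allow` is listed before `Disallow` because Python's `urllib.robotparser` applies the first rule that matches, while Google applies the most specific (longest) matching rule; in this order both give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical layout: /assets/ holds scripts and images you consider
# unimportant, but /assets/css/ holds stylesheets pages need to render.
ROBOTS_TXT = """\
User-agent: *
Allow: /assets/css/
Disallow: /assets/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

site = "https://www.example.com"
for path in ("/assets/css/mobile.css", "/assets/js/tracking.js"):
    print(path, parser.can_fetch("Googlebot", site + path))
# /assets/css/mobile.css True
# /assets/js/tracking.js False
```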
A Note from Google
You should not use robots.txt as a means to hide your web pages from Google Search results. Other pages might link to your page, and it could get indexed that way, bypassing the robots.txt file. If you want to block your page from search results, use another method such as password protection or a noindex tag or directive.
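One noindex directive can be sent as an HTTP response header, `X-Robots-Tag: noindex`; for HTML pages the equivalent is `<meta name="robots" content="noindex">`. The sketch below is a minimal, hypothetical WSGI app (plain Python, no framework) that adds the header. A caveat from Google's documentation: a crawler can only see a noindex rule on a page it is allowed to fetch, so don't also disallow that URL in robots.txt.

```python
from typing import Callable, Iterable

# Hypothetical WSGI app: serves a page but asks search engines not to index it.
def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    body = b"<html><body>Thanks for your order!</body></html>"
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
        # The noindex directive, sent as a response header.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, you could serve it with the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` and inspect the response headers in your browser's developer tools.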
Does your site already have a robots.txt file?
You can check from any browser. The robots.txt file is always located in the same place on any website, so it is easy to determine whether a site has one. Just add “/robots.txt” to the end of the domain name, for example https://www.example.com/robots.txt.
If a file is there, it is the site's robots.txt file. You will find a file with text in it, an empty file, or no file at all.
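If you prefer to script this check, here is a minimal Python sketch using only the standard library. The function names are hypothetical, `site` should be a full URL including `https://`, and the labels map to the three cases above; any other status code is reported as-is rather than interpreted.

```python
import urllib.error
import urllib.parse
import urllib.request

def robots_url(site: str) -> str:
    """Build the robots.txt URL for any URL on a site: it always sits at the host root."""
    parts = urllib.parse.urlsplit(site)
    return urllib.parse.urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

def classify(status: int, body: str) -> str:
    """Map an HTTP status and response body to the three possible outcomes."""
    if status == 404:
        return "no file"
    if status == 200:
        return "file with content" if body.strip() else "empty file"
    return f"unexpected status {status}"

def check_site(site: str, timeout: float = 10.0) -> str:
    """Fetch robots.txt over the network and classify the result."""
    url = robots_url(site)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return classify(resp.status, body)
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status code is on the error.
        return classify(err.code, "")
```

For example, `check_site("https://www.example.com/blog/")` fetches `https://www.example.com/robots.txt` and returns one of the labels.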
Testing your robots.txt file
If you have access and permission, you can use Google Search Console to test your robots.txt file. Google's Search Console documentation explains how to do so.
To be sure your robots.txt file is not blocking anything you want crawled, you need to understand what it says. We cover that below.
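Once you know what your rules say, one way to keep them honest is a small regression check: list the URLs that must stay crawlable and confirm none of them is disallowed. The sketch below uses Python's `urllib.robotparser`; the URL list and the draft file are hypothetical. That parser implements the original standard (prefix matching, first matching rule wins) and does not understand Google's `*` and `$` path wildcards, so treat Google's own tools as authoritative for how Googlebot reads your file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of URLs that must stay crawlable; edit to match your site.
MUST_CRAWL = [
    "https://www.example.com/",
    "https://www.example.com/products/blue-widget",
    "https://www.example.com/static/site.css",
]

def blocked(robots_txt: str, agent: str, urls: list[str]) -> list[str]:
    """Return the URLs from `urls` that robots_txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(agent, u)]

# A draft file that accidentally blocks every stylesheet.
draft = """\
User-agent: *
Disallow: /static/
"""
print(blocked(draft, "Googlebot", MUST_CRAWL))
# ['https://www.example.com/static/site.css']
```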
Do you need a robots.txt file?
You may not even need a robots.txt file on your site. In fact, you often do not need one.
Reasons you may want to have a robots.txt file:
- You have content you want blocked from search engines
- You are developing a site that is live, but you do not want search engines to index it yet
- You want to fine tune access to your site from reputable robots
- You are using paid links or advertisements that need special instructions for robots
- It can help you follow some Google guidelines in certain situations
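For the case of a site that is live but not ready, the usual robots.txt asks every crawler to stay out of the whole site with `Disallow: /`. The sketch below confirms that with Python's `urllib.robotparser`; the staging hostname and crawler names are only examples. As Google's note above explains, blocked URLs can still be indexed if other pages link to them, so password protection is the safer way to keep an unfinished site out of search results.

```python
from urllib.robotparser import RobotFileParser

# A "block everything" file, e.g. for a staging copy of a site.
STAGING_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(STAGING_ROBOTS_TXT.splitlines())

for agent in ("Googlebot", "Bingbot"):
    print(agent, parser.can_fetch(agent, "https://staging.example.com/any/page"))
# Googlebot False
# Bingbot False
```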
Reasons you may not want to have a robots.txt file:
- Your site is simple and error-free, and you want everything indexed
- You do not have any files you want or need to be blocked from search engines
- You do not find yourself in any of the situations listed in the above reasons to have a robots.txt file
It is okay not to have a robots.txt file. When you do not have one, search engine robots such as Googlebot have full access to your site. This is a common, perfectly normal setup.
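Python's standard-library parser behaves the same way when there is nothing to apply: with no rules, every URL is allowed. The sketch below shows that an empty file and a file containing only a comment both grant full access, while a real `Disallow` rule does not; the paths checked are arbitrary examples.

```python
from urllib.robotparser import RobotFileParser

def allows_everything(robots_txt: str, agent: str = "Googlebot") -> bool:
    """True if robots_txt blocks none of a few sample (hypothetical) paths."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    paths = ("/", "/blog/post", "/images/logo.png")
    return all(parser.can_fetch(agent, "https://www.example.com" + p) for p in paths)

print(allows_everything(""))                                   # True: empty file
print(allows_everything("# no rules yet\n"))                   # True: comments are ignored
print(allows_everything("User-agent: *\nDisallow: /blog/\n"))  # False
```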
Keys to robots.txt
- If you use a robots.txt file, make sure it is being used properly
- An incorrect robots.txt file can block Googlebot from crawling your pages, which hurts how they are indexed and ranked
- Ensure you are not blocking pages or elements that Google needs to read, render and rank your pages
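One way an incorrect file blocks Googlebot is a single stray character: `Disallow:` with an empty value blocks nothing, while `Disallow: /` blocks the entire site. A quick sketch with Python's `urllib.robotparser`, using a hypothetical home page URL:

```python
from urllib.robotparser import RobotFileParser

def googlebot_can_fetch(robots_txt: str, url: str) -> bool:
    """Parse robots_txt and ask whether Googlebot may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)

home = "https://www.example.com/"
print(googlebot_can_fetch("User-agent: *\nDisallow:\n", home))    # True: nothing blocked
print(googlebot_can_fetch("User-agent: *\nDisallow: /\n", home))  # False: whole site blocked
```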