Is your website ready for search engines? Do you want to promote your website online and increase its visibility on search engine results pages (SERPs)?
If your answer is yes, first check whether your website is ready to be crawled by search engines. A robots.txt file controls how search engines interact with your website.
What is a robots.txt file?
A robots.txt file tells search engine crawlers which pages of your website they may crawl and which they may not. It can also block one search engine’s bot while allowing others.
In other words, the robots.txt file speaks directly to search engine crawlers and tells them which parts of the website are open to them and which are not.
You may be wondering: what is a search engine crawler?
Search Engine Crawler
Search engine robots are called crawlers. Other terms for them are spiders, bots, and web crawlers. In a robots.txt file, these crawlers are referred to as user agents.
But wait! These are not their official names. Before you write a robots.txt file, you need to know the official name of each crawler you want to instruct. For example, Google’s crawler is named Googlebot and Bing’s crawler is named Bingbot, and there are many more.
A robots.txt file is a plain text file that contains instructions for search engine crawlers. Below are the terms used in a robots.txt file:
- User-agent: Names the search engine crawler the rules apply to (Googlebot, Bingbot, etc.). Most search engines list their crawler names in their own documentation.
- Disallow: Contains the path, URL, or directory name you want to block from crawling.
- Allow: Permits a particular URL or subfolder inside a disallowed section. It is accepted by Googlebot and other major crawlers.
- Crawl-delay: Tells crawlers how many seconds to wait between successive requests. Googlebot ignores this directive; Google has instead offered a separate crawl rate setting inside Search Console.
- Sitemap: Tells crawlers the location of an XML sitemap.
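To show how these terms fit together, here is a minimal sketch of a robots.txt file. The paths, the crawler name ExampleBot, and the sitemap URL are made up for illustration, so replace them with your own values.

```text
# Rules for Googlebot only
User-agent: Googlebot
Disallow: /private/
Allow: /private/public-page.html

# Rules for Bingbot only; wait 10 seconds between requests
User-agent: Bingbot
Disallow: /private/
Crawl-delay: 10

# Block a hypothetical crawler named ExampleBot from the whole site
User-agent: ExampleBot
Disallow: /

# Location of the XML sitemap (hypothetical URL)
Sitemap: https://www.exampledomain.com/sitemap.xml
```

Each User-agent line starts a new group of rules, and a crawler follows the group that matches its name. Lines that start with # are comments and are ignored by crawlers.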
Two characters are used for pattern matching: the asterisk (*) and the dollar sign ($).
- The asterisk (*) matches any sequence of characters. As a User-agent value, it means all crawlers.
- The dollar sign ($) matches the end of the URL.
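As a rough illustration of both characters, the sketch below uses two example rules. The .pdf extension and the query-string rule are only assumptions chosen for the example.

```text
User-agent: *
# Block any URL that contains a question mark (a query string)
Disallow: /*?
# Block every URL that ends in .pdf
Disallow: /*.pdf$
```

Without the trailing $, the second rule would also match URLs that merely contain .pdf somewhere in the path.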
Where to place a robots.txt file on the website?
Your robots.txt file should sit at the root of the website, because that is where every crawler looks for it. Whenever a crawler visits a website, it checks for www.exampledomain.com/robots.txt. If it does not find a robots.txt file, it assumes it may crawl every page of the website.
So, make sure that your robots.txt file is at the root of the website and can be opened by visiting www.exampledomain.com/robots.txt.
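To tell all search engine crawlers not to crawl any page of your website, including the home page, apply a Disallow rule for the root path (/) to every user agent (*). Note that this blocks crawling, not indexing: a blocked URL can still appear in search results if other pages link to it.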
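A minimal sketch of a file that blocks every crawler from the whole site:

```text
User-agent: *
Disallow: /
```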
To tell search engine crawlers that they may crawl every page of the website, leave the Disallow value empty for every user agent (*).
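A minimal sketch of a file that lets every crawler crawl the whole site:

```text
User-agent: *
Disallow:
```

An empty Disallow value blocks nothing, so every page stays open to crawling.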
Some major points to remember
- You can check the robots.txt file of any website by adding /robots.txt to the end of the root domain name.
- To be found by search engine crawlers, the robots.txt file must be at the root level of the website.
- The file name is case sensitive: Robots.txt and robots.txt are treated as different files, so keep the name in lower case, exactly as robots.txt. That is why I have typed robots.txt in lower case everywhere except in this point, to avoid confusion.
- Subdomains and the main domain each need their own robots.txt file. If you have many subdomains, create a separate robots.txt file for each one.
- It’s common practice to include the sitemap URL in the robots.txt file.
If you are not sure or still have questions, an SEO service provider can help you set up a robots.txt file on your website.
Setting up a robots.txt file takes very little time, and it is largely a one-time task. It supports your website’s SEO and its visibility on search engine results pages (SERPs). So why not give it a try?
I would personally suggest that everyone set up a robots.txt file on their website. What do you think? Please share your thoughts with me.