What is robots.txt?
Robots.txt is a plain text file in the site’s root directory that tells search engine crawlers (also called spiders) which pages and files you do or don’t want them to visit.
When to use robots.txt and why is it important?
The robots.txt file is needed only if you would like some content on your website excluded from search engines. If you don’t need to exclude anything, meaning you want everything available to search engines, then you don’t need a robots.txt file. However, when the file is missing, the server may return a 404 error each time a crawler requests it. So it’s better to have a robots.txt file, whether it is blank or contains rules that allow access to everyone.
How to create a robots.txt file?
Create a new text file and save it under the name “robots” with the .txt extension, so the file is called robots.txt. You can use Notepad on Windows PCs or TextEdit on Macs, then choose “Save As” and save it as a plain text file. Upload it to the root directory of your website.
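If you would rather generate the file with a script than with a text editor, here is a minimal Python sketch. The function name write_robots_txt and the temporary folder standing in for your web root are assumptions made for illustration; you would still need to upload the resulting file to the real root directory of your site.

```python
import tempfile
from pathlib import Path


def write_robots_txt(directory: str, rules: list[str]) -> Path:
    """Write the given rule lines to <directory>/robots.txt as plain UTF-8 text."""
    target = Path(directory) / "robots.txt"
    # One directive per line, with a trailing newline at the end of the file.
    target.write_text("\n".join(rules) + "\n", encoding="utf-8")
    return target


# Demo: a temporary directory stands in for the website's root directory.
with tempfile.TemporaryDirectory() as web_root:
    path = write_robots_txt(web_root, ["User-agent: *", "Disallow:"])
    file_name = path.name
    content = path.read_text(encoding="utf-8")
```

The two rules used in the demo allow every bot to visit everything, which is a safe default for a first robots.txt file.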
Basics of robots.txt syntax:
User-agent: *
Disallow: /
Note that the code above blocks all bots from the whole site. I would choose to have a robots.txt file that allows access to everything for all bots (the same code with nothing after “Disallow:”) rather than having an empty or no robots.txt file.
The asterisk after “User-agent” means that the rules apply to all web robots that visit the site; in its place you can put the name of a specific search engine crawler. The slash after “Disallow” tells the robot not to visit any pages on the site. Each line starts with a directive, followed by a colon and its value.
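To see how a crawler could interpret these two lines, you can feed them to Python’s standard urllib.robotparser module. This is only a sketch: the helper name is_allowed, the bot name AnyBot, and the example.com URL are made up for illustration, and real crawlers may apply their own matching rules.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from memory, no network request
    return parser.can_fetch(user_agent, url)


# A slash after Disallow blocks the whole site for every bot.
block_all = ["User-agent: *", "Disallow: /"]
# An empty Disallow value blocks nothing, so everything may be visited.
allow_all = ["User-agent: *", "Disallow:"]

blocked = is_allowed(block_all, "AnyBot", "https://example.com/page.html")
open_site = is_allowed(allow_all, "AnyBot", "https://example.com/page.html")
```

Here blocked comes out False and open_site comes out True, which shows how much difference the single slash makes.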
Look at different examples of how you may want to use the robots.txt file:
Prevent all web crawlers from crawling the whole site:
User-agent: *
Disallow: /
Allow all web crawlers to crawl the whole site:
User-agent: *
Disallow:
Prevent only specific directories from being crawled (add one Disallow line per directory):
User-agent: *
Disallow: /cgi-bin/
Prevent a specific web crawler from crawling the site:
User-agent: Googlebot
Disallow: /
Disallow different bots from different directories:
User-agent: *
Disallow:

User-agent: Googlebot
Disallow: /restricted/

User-agent: BadBot
Disallow: /disallow_access/
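A quick way to check a file with several groups like this one is to test each bot against each directory. The sketch below uses Python’s standard urllib.robotparser; the helper name can_visit, the bot name OtherBot, and the example.com URLs are assumptions for illustration, and a real crawler may resolve overlapping rules differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow:

User-agent: Googlebot
Disallow: /restricted/

User-agent: BadBot
Disallow: /disallow_access/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # blank lines separate the groups


def can_visit(user_agent, path):
    """Check one bot against one path on a hypothetical example.com site."""
    return parser.can_fetch(user_agent, "https://example.com" + path)


# Each named bot is blocked only from its own directory.
googlebot_restricted = can_visit("Googlebot", "/restricted/page.html")
googlebot_other = can_visit("Googlebot", "/disallow_access/page.html")
badbot_blocked = can_visit("BadBot", "/disallow_access/page.html")
# A bot without its own group falls back to the "*" group, which allows everything.
otherbot_restricted = can_visit("OtherBot", "/restricted/page.html")
```

With this parser, Googlebot is refused only under /restricted/, BadBot only under /disallow_access/, and any other bot can visit both directories.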
Robots Tag
The robots meta tag can be used to tell robots not to index the content of a page. It can also be used to allow or disallow crawlers to follow the links on the page.
The syntax is:
<html>
<head>
<title>...</title>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head>
To prevent the page from being indexed by search engines while still allowing crawlers to follow the links on the page, use “noindex, follow” in the tag on your page:
<meta name="robots" content="noindex, follow">
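If you want to double-check which directives a page actually carries, a small script can read the robots meta tag for you. This is a minimal sketch using Python’s standard html.parser module; the class name RobotsMetaParser and the function name robots_directives are invented for this example, and it only looks at meta tags named “robots”.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for part in content.split(","):
            if part.strip():
                self.directives.append(part.strip().lower())


def robots_directives(html):
    """Return the robots directives of a page as a list of lowercase strings."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><title>Demo</title><meta name="robots" content="noindex, follow"></head></html>'
directives = robots_directives(page)
```

For the sample page, directives holds "noindex" and "follow", so the page stays out of the index while its links can still be followed.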
I hope this post is helpful for your website. If you have any questions, comment below. Our experienced developer at Lathiya Solutions is ready to help you.