mobile-first indexing approach to rank all new domains.
Google has now announced another change: it is officially dropping support for the noindex directive in robots.txt. For decades, robots.txt has been a text file that website admins upload to their servers to instruct search engine robots on how to crawl and index pages on their site. Webmasters who rely on the directive will need to find an alternative by September 1st, when the change comes into effect.
The company open sourced its robots.txt parser yesterday. The Robots Exclusion Protocol (REP) was first proposed by Dutch software engineer Martijn Koster in 1994 and has since become the de facto web standard for telling crawlers which parts of a website should not be crawled. While Google has honored the noindex directive in robots.txt in the past, that will no longer be the case going forward.
The robots.txt noindex directive used to be very effective for webmasters.
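For illustration, a robots.txt file relying on the now-retired directive might have looked like this (the paths here are made up):

```text
User-agent: *
Disallow: /admin/
# Unsupported rule, ignored by Google after September 1, 2019:
Noindex: /internal-reports/
```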
Google’s official tweet reads:
Today we’re saying goodbye to undocumented and unsupported rules in robots.txt If you were relying on these rule… https://t.co/vXkrmFTnBs
— Google Webmasters (@googlewmc) July 2, 2019
The company says it is retiring all code that handles unsupported and unpublished rules (read: 'noindex') on September 1, 2019. The decision was taken to maintain a healthy ecosystem and to prepare for future open source releases.
Alternative options suggested by Google's webmasters team:
• noindex in robots meta tags: Supported both as an HTTP response header and as an HTML meta tag, the noindex directive is the most effective way to remove URLs from the index when crawling is allowed.
• 404 and 410 HTTP status code: Both status codes suggest that the page does not exist. Google will automatically drop such URLs.
• Password protection: Hiding a page behind a login is a common way to remove it from Google's index.
• Disallow in robots.txt: Search engines can only index pages that they know about, so blocking the page from being crawled usually means its content won’t be indexed. While the search engine may also index a URL based on links from other pages, without seeing the content itself, we aim to make such pages less visible in the future.
• Search Console Remove URL tool: Google offers a dedicated tool to easily remove a URL from Google's search results. You simply have to enter the URL that you wish to remove; note that this removal is temporary, so it should be paired with one of the methods above for permanent de-indexing.
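To find rules affected by the change, a site's robots.txt can be scanned for directives Google will no longer honor. The sketch below is an illustrative assumption, not an official parser (Google's open-sourced C++ parser is the reference implementation); the announcement names noindex among the retired unsupported rules, and crawl-delay and nofollow were likewise unpublished rules.

```python
# Minimal sketch: flag robots.txt rules that Google stops honoring on
# September 1, 2019. The rule set and line parsing here are simplified
# assumptions for illustration only.

UNSUPPORTED_RULES = {"noindex", "nofollow", "crawl-delay"}

def find_unsupported_rules(robots_txt: str) -> list:
    """Return lines whose rule name Google will ignore going forward."""
    flagged = []
    for line in robots_txt.splitlines():
        # Strip comments and surrounding whitespace, as crawlers do.
        rule = line.split("#", 1)[0].strip()
        if not rule or ":" not in rule:
            continue
        name = rule.split(":", 1)[0].strip().lower()
        if name in UNSUPPORTED_RULES:
            flagged.append(rule)
    return flagged

example = """\
User-agent: *
Disallow: /private/
Noindex: /drafts/   # ignored after September 1, 2019
Crawl-delay: 10
"""
print(find_unsupported_rules(example))
```

Any line the scan flags would need to be migrated to one of the supported alternatives listed above, such as a noindex robots meta tag on the affected pages.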