Optimizing a website's SEO sometimes involves deindexing pages that are useless or of poor quality. How should you proceed? What options are available, and which are the most effective? Here is a short summary of the options available to you for deindexing pages of a site from Google.
DEINDEXING PAGES ON GOOGLE: INTRODUCTION
Before going further, it is important to understand that the recommended deindexing techniques vary depending on whether or not you want to keep the pages accessible to Internet users. If you want to delete pages and also remove them quickly from the Google index, the actions to take have a few specific requirements.
Moreover, it is also important to understand the difference between blocking indexing and blocking Google's crawl. Adding a “Disallow:” directive to robots.txt is not intended to deindex pages but to prevent Google from crawling them, which is not at all the same thing. This robots.txt directive should not be used to deindex one or more pages. A common deindexing mistake is to add “Disallow:” directives to robots.txt to block the crawl of pages that carry meta noindex tags. This prevents Google from crawling those pages, so it can never read the noindex tags and deindex the pages quickly, which is counterproductive.
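To illustrate the mistake described above, here is what the counterproductive robots.txt combination looks like (the path is a placeholder example):

```text
# Counterproductive: Googlebot cannot crawl /old-section/,
# so it will never see the <meta name="robots" content="noindex">
# tags placed on those pages, and they can stay indexed.
User-agent: *
Disallow: /old-section/
```

To let Google read the noindex tags, the pages must remain crawlable: remove the “Disallow:” line and keep the meta noindex tag on the pages themselves.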
HOW TO DEINDEX A LARGE VOLUME OF PAGES FROM GOOGLE?
The techniques presented below each allow you to deindex one or more pages:
- Add a noindex tag in the <head> section of the pages to deindex: <meta name="robots" content="noindex">
- Then create a sitemap listing the URLs to deindex and submit it to Google via Search Console and via the robots.txt “Sitemap:” directive followed by the full URL of the sitemap
- This technique makes it possible to deindex pages fairly quickly
- Use the X-Robots-Tag header in .htaccess: this technique deindexes pages or files without having to modify their source code, by sending the HTTP header X-Robots-Tag: noindex
<Files ~ "\.pdf$"> Header set X-Robots-Tag "noindex, nofollow" </Files>
- This code, for example, prevents all the PDF files of a site from being indexed.
- Use the “Noindex:” directive in robots.txt: this directive was never officially supported by Google; it did work in practice for a time, but Google has since announced that it no longer honors it
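For the first technique above (noindex tags plus a sitemap of the URLs to deindex), the sitemap file can be generated with a short script. A minimal sketch, here in Python; the URL list, domain, and output file name are placeholder examples:

```python
# Sketch: build a "deindexation sitemap" listing the URLs to deindex,
# so Google recrawls them faster and sees their noindex tags.
from xml.sax.saxutils import escape


def build_sitemap(urls):
    """Return sitemap XML (sitemaps.org protocol) listing the given URLs."""
    entries = "\n".join(
        f"  <url><loc>{escape(u)}</loc></url>" for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )


if __name__ == "__main__":
    # Placeholder URLs: the pages carrying the meta noindex tag.
    pages_to_deindex = [
        "https://www.example.com/old-page-1",
        "https://www.example.com/old-page-2",
    ]
    with open("sitemap-deindex.xml", "w", encoding="utf-8") as f:
        f.write(build_sitemap(pages_to_deindex))
```

The resulting file is then submitted in Search Console and referenced with a “Sitemap:” line in robots.txt, as described above.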
HOW TO DEINDEX RECENTLY DELETED PAGES FROM GOOGLE?
If you delete a page, Google does not deindex it instantly, and deindexing can take some time whether or not the deleted pages are still linked within the site structure.
DE-INDEXING UNNECESSARY PAGES
To quickly deindex a large volume of unnecessary pages, here is what is recommended:
- Return an HTTP 404 (Not Found) or an HTTP 410 (Gone) on the deleted pages, then wait for Google to recrawl them and take the removal into account
- To speed up the process, if necessary, you can generate a deindexation sitemap containing only the deleted URLs that return a 404 or a 410.
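On an Apache server, the steps above can be implemented with a few .htaccess lines that make deleted URLs return a 410 rather than the default 404; a minimal sketch, assuming mod_alias is enabled and using placeholder paths:

```apacheconf
# Return HTTP 410 (Gone) for deleted pages so Google drops them faster.
# The paths below are placeholder examples.
Redirect gone /old-category/
Redirect gone /obsolete-page.html
```

A 410 signals the removal is deliberate and permanent, which can prompt Google to drop the URL slightly faster than a plain 404.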
DEINDEXING PAGES BECAUSE NEW ONES ARE MORE RELEVANT
If you want to deindex a page, or a large number of pages, because you think other pages are more relevant, here is the recommended approach:
- Implement 301 redirects from the old pages to the new ones: 301 redirects are particularly recommended if the old pages were receiving backlinks, since they limit the loss of link juice and boost the new pages.
- To speed up Google’s processing of these 301s, you can again create a sitemap containing all the URLs that redirect to the new pages and submit it via your Search Console account and via robots.txt.
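On Apache, the 301 redirects described above can be declared in .htaccess; a minimal sketch, assuming mod_alias and placeholder paths:

```apacheconf
# 301 (permanent) redirects from old pages to their new equivalents.
# The paths and domain are placeholder examples.
Redirect 301 /old-page.html /new-page.html
Redirect 301 /old-section/ https://www.example.com/new-section/
```

Each old URL should redirect to its single most relevant replacement; mass-redirecting everything to the homepage is generally treated by Google like a soft 404.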
HOW TO QUICKLY REMOVE A PAGE FROM GOOGLE WITH SEARCH CONSOLE?
Do you really need to remove/deindex a page from Google very quickly? You can make an “express” but temporary request directly from the Search Console.
To do so, you must visit this page of Webmaster Tools and then specify the URL to de-index temporarily.
To permanently deindex the pages, you will then have to use the techniques mentioned above (meta noindex tag, X-Robots-Tag header, 404/410 status codes, 301 redirects); the temporary removal in Search Console simply buys you time while those measures take effect.