5 Easy Steps To Fix Secure Page (https) Duplicate Content
I have had a couple of cases (and queries) involving secure pages (https://) and duplicate content recently, so I thought it would be a useful area to discuss.
Hypertext transfer protocol secure (https) pages are often used for payment transactions, logins and shopping baskets to provide an encrypted and secure connection. Secure pages can of course be crawled and indexed by the search engines like regular pages. Although it might be hard to spot the difference between an http and an https version of a page, they are technically different URIs (the only difference might be an 's'!) and they will be treated as separate pages by the search engines.
So as an example, the two URIs below would be seen as different pages:
http://www.screamingfrog.co.uk/
https://www.screamingfrog.co.uk/
This is often not a major issue, but we know duplicate content can be a problem, as it causes dilution of link equity (pagerank is split between pages rather than combined on one target) as well as wasting crawl allowance.
So How Do Secure Pages Get In The Index?
Well, like any URI, they are found via either internal or external links. So either you are linking to the secure page from the website, or someone else is linking to the page externally (or to another internal page connected to it!), which is why it has been crawled and indexed. You can find secure pages in Google's index via the site: and inurl:https commands, like this example (we have zero results, wahey!).
However, one of the most common things we find is a single secure page, such as a login or shopping cart / basket, which contains relative URLs. For example:
“/this-is-a-relative-url/”
As shown above, relative URLs don't contain protocol information (whether they are http or https!). They simply use the same protocol as the parent page (unless stipulated in another way, like a base tag). Hence, when crawled from a secure page, the URL will also be secure (https). Often entire websites can then be crawled in secure format by a simple switch like this!
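To make the inheritance concrete, here is a minimal sketch in Python using the standard library's urljoin, which resolves a relative URL against its parent page in the same way a crawler would. The www.example.com host and the /products/ and /basket/ parent pages are assumptions made up for illustration.

```python
from urllib.parse import urljoin

relative_url = "/this-is-a-relative-url/"

# The same relative link, found on a non-secure and on a secure parent page.
# Both parent URLs are hypothetical examples.
http_parent = "http://www.example.com/products/"
https_parent = "https://www.example.com/basket/"

# The resolved URL takes its protocol from whichever page it was found on.
resolved_from_http = urljoin(http_parent, relative_url)
resolved_from_https = urljoin(https_parent, relative_url)

print(resolved_from_http)   # http://www.example.com/this-is-a-relative-url/
print(resolved_from_https)  # https://www.example.com/this-is-a-relative-url/
```

The link markup is identical in both cases; only the parent page changes, and with it the protocol of every page reached from there.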
So, What Steps Should You Take To Ensure Your Secure Pages Are Not Indexed?
1) First of all, make sure you use the correct protocol on the correct pages. Only pages that genuinely need to be secure should be, such as shopping basket, login or checkout pages. Product pages, on the whole, shouldn't be, so make sure users can't browse secure versions of these pages and potentially link to them.
2) Use absolute URLs: absolute URLs define the hypertext transfer protocol and don't leave it to chance. So if you have a secure page that can be crawled (via internal or external links), make sure the links on it are absolute URLs.
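As a rough way to check a secure page for links that leave the protocol to chance, the sketch below uses Python's built-in HTML parser to list every anchor href that does not state its own protocol. The RelativeLinkFinder class, the find_relative_links helper and the sample HTML are all hypothetical names and data invented for this example, not part of any particular tool.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RelativeLinkFinder(HTMLParser):
    """Collects anchor href values that do not state their own protocol."""

    def __init__(self):
        super().__init__()
        self.relative_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # An empty scheme means the link inherits the parent page's protocol.
            if name == "href" and value and not urlsplit(value).scheme:
                self.relative_links.append(value)


def find_relative_links(html):
    finder = RelativeLinkFinder()
    finder.feed(html)
    return finder.relative_links


# Hypothetical markup from a secure basket page.
sample_html = """
<a href="/this-is-a-relative-url/">Relative</a>
<a href="http://www.example.com/product/">Absolute</a>
<a href="//www.example.com/other/">Protocol-relative</a>
"""

relative_links = find_relative_links(sample_html)
print(relative_links)
```

Protocol-relative links (those starting with //) are reported too, since they inherit the parent page's protocol in the same way. This sketch ignores base tags, so treat its output as a starting point rather than a full audit.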
3) You could also block a shopping basket or login page in robots.txt so the search engines don't crawl it. Be careful not to block any other secure pages that you DO want in the index, or any secure pages which might have already accrued some link equity (see point 5!). You can also consider using a 'nofollow' link attribute on links to the login/shopping basket page. This is the only page we might recommend using a nofollow on for internal links. Matt Cutts from Google commented on this previously in a Google Webmaster Help video. Please note, you shouldn't have to take this step if you can follow the other steps in this guide. Ideally, if you don't want your shopping or login page in the index, use a meta noindex tag.
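Before relying on a robots.txt rule, it can help to confirm that it blocks only what you intend. The sketch below uses Python's standard urllib.robotparser to test a couple of URLs against a set of rules. The /basket/ and /login/ paths and the www.example.com host are assumptions for illustration, and this parser is not guaranteed to interpret every rule exactly as each search engine does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules blocking only the basket and login pages.
robots_lines = [
    "User-agent: *",
    "Disallow: /basket/",
    "Disallow: /login/",
]

parser = RobotFileParser()
parser.parse(robots_lines)

# The basket page should be blocked; a product page should stay crawlable.
basket_allowed = parser.can_fetch("*", "https://www.example.com/basket/")
product_allowed = parser.can_fetch("*", "https://www.example.com/product/")

print("basket crawlable:", basket_allowed)
print("product crawlable:", product_allowed)
```

Remember that crawlers fetch robots.txt separately for each protocol and host, so the rules that apply to the https pages are the ones served at the https address.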
What Should I do If I Already Have Duplicate Secure Pages In The Index?
4) Find out why you have secure pages in the index, whether through internal or external links, and follow the steps already outlined above. If you can't find the link source internally (shameless plug), try the SEO Spider, which will do it for you. If it's not an internal link, then there could be external links in play.
5) 301 permanently redirect the secure (https) page to the correct http version. This will mean the search engines drop the https out of the index, rank the correct http version and pass any link equity (or pagerank!) to the correct version of the page. If you can’t use a 301 redirect, then try using the canonical link element instead. Obviously make sure you haven’t blocked any of the pages you are going to redirect via robots.txt!
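The redirect decision itself can be sketched as a small function: secure URLs that don't need to be secure get a 301 to their http equivalent, while pages that must stay secure are left alone. This is only an illustration in Python under stated assumptions. The SECURE_PATH_PREFIXES list and the redirect_for name are hypothetical, and in practice the rule would normally live in your server configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical paths that genuinely need to stay on https.
SECURE_PATH_PREFIXES = ("/basket/", "/login/", "/checkout/")


def redirect_for(url):
    """Return (301, target) if a secure URL should redirect to http, else None."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return None  # already the non-secure version, nothing to do
    if parts.path.startswith(SECURE_PATH_PREFIXES):
        return None  # this page should remain secure
    # Same host, path and query string; only the protocol changes.
    target = urlunsplit(("http", parts.netloc, parts.path, parts.query, parts.fragment))
    return (301, target)


print(redirect_for("https://www.example.com/product/?id=1"))
print(redirect_for("https://www.example.com/basket/"))
```

Keeping an explicit list of pages that stay secure means they are never caught by the redirect, so basket, login and checkout pages keep their encrypted connection.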
Hopefully this article will provide a useful guide to help remove any duplicate secure pages (https).







I always thought it best to use secure.whatever.tld for https all the time, keeping ssl off of the main site. I know people feel more secure when they see it, and since it gets indexed separately, there is less indexing trouble. But the biggest reason was for robots.txt management.
If you keep ssl on the same domain as non-ssl, as you do here, how do you serve up http://whatever.tld/robots.txt and also https://whatever.tld/robots.txt ? If you did want those to be different, you’d have a conundrum, no?
I believe it would also be good to add a canonical link element to the head section, to guard against worst-case scenarios we might not think of that lead to duplication.
Add this line to meta section for each of the respective webpages to avoid duplication
E.g For this webpage that we are reading now, needs to avoid duplication then it would be good to add
For me this has come in handy in many situations.
I have a question about this: how do you separate the home/landing page and marcom pages, which are currently https, and move them to http, away from the rest of the web pages which we want to keep as https? Is there a way to do this? Thanks!!
Hi Heddi,
Yes you can and it depends on your server set-up.
I'd advise having a chat with your development team. My preference would certainly be to only use https on pages that really do need to be secure.
Thanks,
Dan
This is by far the most complete article about removing dupes between the http and https protocol.
Thumbs up for suggesting changing relative paths to absolute ones. However, if that is not possible, a good alternative would be using rel="canonical" on the https pages so they all point to the http ones.
The 301 redirect solution would be the best, but it is a bit risky if there are https pages that need to be excluded.
Thanks for the explanation of how to avoid indexing the SSL pages.
But why would I want that? I didn’t find any reasons so far.
Google is developing its next generation HTTP (SPDY), which is ALWAYS encrypted/secure. And the browser showing that the connection is secured creates trust in the shop.
But it has to be secure BEFORE using the login, otherwise how could the user tell?
So why would I want to remove HTTPS URLs from the index, instead of turning it around and ONLY indexing the secure pages?
From a usability point of view, redirecting users from their secure connection to an unsecure one sucks! We would just have one more irritating page that makes it hard for users to understand whether a connection is secure or not.
The only argument I have read so far is that people would link to the non-secure page. But when investing in link building, why not start by promoting a secure URL?
Please enlighten me guys! I'd be glad to come to an understanding here.
Hi Andy,
You seem passionate. But the issue isn't SSL URLs specifically, it's when they cause duplicate versions of http URLs.
I explained why at the start of the post: "This is often not a major issue, but we know duplicate content can be a problem, as it causes dilution of link equity (pagerank is split between pages rather than combined on one target) as well as wasting crawl allowance."
So you want one or the other for a URL, not both. Some sites choose to go completely secure, which is cool. I think it's unnecessary though!
Redirecting from a secure page to a non-secure one makes sense if the page doesn't need to be secure. If it should be secure, don't do it. Another way would be to use a canonical, as mentioned.
Thanks for the comments.
Cheers.