At work, we’re looking at a major website redesign, and as part of that we’ve been looking at other similar programs’ websites to see what they’re doing… and one big thing has struck me in the process: almost no one uses redirects to force a canonical URL. For example: http://www.staze.org vs. http://staze.org. In the best case, both work and present the same content (bad for SEO); in the worst case, one works and the other doesn’t. Almost none of the sites we looked at handled this correctly.
Really, it’s extremely easy to fix. Either in .htaccess, or in your virtual host file, just add something like:
RewriteEngine On
# Leave requests for the local machine alone (localhost / 127.0.0.1)
RewriteCond %{HTTP_HOST} ^localhost [NC,OR]
RewriteCond %{HTTP_HOST} ^127\.0\.0\.1
RewriteRule ^(.*) - [L]
# Anything not asking for www.example.com gets a permanent redirect there
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]
UPDATE: Please see the corrected code above, which now leaves anything referencing your site on the local machine via localhost or 127.0.0.1 alone… parts of my site broke without me noticing until today. DOH!
This says “if the request isn’t for www.example.com (it’s for example.com, foo.example.com, etc.), then redirect it to www.example.com”. Now, if you use HTTPS for your site and you put the above rule in your .htaccess file, you’ll need to account for that too. Probably something like:
RewriteEngine On
# rewrite plain-HTTP requests for the bare domain
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteCond %{SERVER_PORT} !^443$
RewriteRule ^/?(.*)$ http://www.example.com/$1 [R=301,L]
# rewrite all HTTPS requests
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^/?(.*)$ https://www.example.com/$1 [R=301,L]
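If keeping two near-identical blocks in sync bugs you, the same thing can be collapsed into one rule by capturing the scheme first. This is just a sketch of that idea, not what I run: the proto environment variable name is made up for the example, and it assumes an Apache where mod_rewrite exposes the %{HTTPS} variable:
RewriteEngine On
# remember whether this request came in over TLS
RewriteCond %{HTTPS} =on
RewriteRule ^ - [E=proto:https]
RewriteCond %{HTTPS} !=on
RewriteRule ^ - [E=proto:http]
# any host other than www.example.com gets bounced there, keeping the scheme
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^/?(.*)$ %{ENV:proto}://www.example.com/$1 [R=301,L]
(Note this sketch skips the localhost exclusion from the first block, so you’d want to add that back in.)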
Either way, it might be overkill, because you should be preventing Google from indexing HTTPS anyway. I do that with a little trickery in my HTTPS vhost file:
RewriteRule ^/robots\.txt$ robots_ssl.txt [P]
Basically, that says “anything requesting robots.txt via HTTPS gets robots_ssl.txt instead”, which contains simply:
User-agent: Googlebot
Disallow: /
User-agent: *
Disallow: /
The point is, I don’t want anything indexed over HTTPS. It’s duplicate content, and letting Google (or any spider) hammer away at HTTPS just bogs down my server.
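If you’d rather not maintain a second robots file, a noindex header sent from the HTTPS vhost gets you much the same result. That’s not what I do here, just a sketch of an alternative, assuming mod_headers is loaded:
# Alternative sketch: tell spiders not to index anything served over HTTPS
# by sending an X-Robots-Tag header from the HTTPS vhost (needs mod_headers)
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
Unlike the robots.txt trick, though, this only stops indexing, not crawling, so it won’t save your server from spiders fetching pages over HTTPS.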
So, moral here… fix your site. Don’t serve your content at whatever URL visitors feel like typing, and worse, don’t let shit break when they do. Worst case, if your domain is example.com and you don’t want a www on the website URL, set up a wildcard in DNS to point everything at your webserver, then set up the above redirects flipped around. That way someone can type in whosyourmomma.example.com and still end up at http://example.com.
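For that no-www case, the rule is just the mirror image of the one above. A rough sketch (example.com is a placeholder, the wildcard A record uses a documentation IP, and it assumes that wildcard already points at your webserver):
# DNS side (zone file): point everything at the webserver
# *.example.com.   IN  A  192.0.2.10
RewriteEngine On
# any host that isn't exactly example.com gets bounced to the bare domain
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule ^/?(.*)$ http://example.com/$1 [R=301,L]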
Oh, and don’t even get me started on the whole www vs. no-www debate. I’m not really of one mind on it… I think it varies from case to case. On my site, I enforce the www. At work, I enforce against it on our website. Honestly, I think it’s an aesthetics issue. Long URLs suck, and if typing foo.example.com gets you a department, you should get a webpage; you shouldn’t need www.foo.example.com. But I’m sure this discussion is nearly as bad as top vs. bottom posting.
For Google’s take, look here.