There are few things that will make your site look more unprofessional than users following a broken link and receiving a 404 – File not found message. For a good developer it’s easy to ensure that a site is free from broken links. If you’re a Linux user you can use ht://check; if you prefer Windows you can use Xenu; and Mac users can try BLT.
However, not all Web developers are as diligent (or competent) as you are. I’m sure you choose Cool URIs that never change and, in those very few circumstances where you do have to change a URI, you put in a proper server-side redirect. But what about those other Web developers, the ones that link to you? How many of them check links after they’ve created them?
What do you do if another developer makes a mess of linking to you so that it causes a 404 Page Not Found error on your site?
As I see it, you’ve got three choices:
- Wait and see if they notice
- Try and find a contact address, email the Webmaster and ask them to correct the link
- Fix the problem yourself
Wait and see if they notice
They might do, but chances are they won’t. If they didn’t notice that the link was broken when they created it, they’re unlikely to come back later and fix it.
Contacting the Webmaster
Track down the page that contains the broken link (usually easy enough) and find an appropriate email address to email your request to. You then have to sit back and wait for something to happen. In my experience, the success rate for this is less than 50%. So, having expended the time and effort you’re not guaranteed any success anyway.
Fix it yourself
If you want a job done properly, do it yourself. I recently watched a video of the presentation given by Matt Cutts at WordCamp 2007 in San Francisco. The presentation was called ‘Whitehat SEO tips for Bloggers’ and is very informative indeed. In one part of the talk, Matt commented on the use of 404 errors to get extra links (and divert users from the traditional 404). He said:
On this page [from Google Webmaster tools] I have got 35 links that are broken. If you look at these links, they are to upper case blog, blog parentheses percent 2-0 now these are not my links cuz WordPress does good links, and you know I know I never make a mistake, right? I could never have a broken link, but if you see these you can figure out Oh, maybe I can handle 404’s differently. If I use 404’s in a different way, I can get 35 links for free so you could use your 404 handler just do a redirect to your root page or something like that.
However, I think you can go a bit further than this. I think it’s better to work out where users should be going and send them there.
By using Google Webmaster tools and your own server logs you can see the broken links that the Googlebot and actual users follow to your site. Invariably these are the result of simple typos or rubbish HTML created by someone else. For example, some broken links that I’ve observed on one of my sites (domain obfuscated) are:
- http://www.example.com/cnotact.html
- http://www.example.com/about.html<p>
By looking at these it’s pretty clear where the link should go so, in your Web server config (or an .htaccess file), you can set up a simple redirect. If you use Apache you’d use something like:
Redirect permanent /cnotact.html http://www.example.com/contact.html
Redirect permanent /about.html<p> http://www.example.com/about.html
As a result of this, you don’t need to contact the Webmaster of the site with the dodgy links and ask them to change them. Most importantly, a user following the link will reach the right place and not realise they’ve been redirected. Now, when the Googlebot follows the duff link, it won’t get a 404 File Not Found error but a 301 code stating that the object has permanently moved. So, it’s good for users and your page rank and you no longer have to rely on the diligence of others.
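If you’d rather dig the broken links out of your own server logs, a short script can do the donkey work. What follows is only a rough sketch in Python, and it assumes your logs are in Apache’s “combined” format, where each line carries the request, the status code and the referrer. The function name `broken_links` and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Picks out the request path, status code and referrer from a line in
# Apache's "combined" log format (assumed here; adjust for other formats).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def broken_links(lines):
    """Count (path, referrer) pairs for requests that ended in a 404."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            hits[(match.group("path"), match.group("referrer"))] += 1
    # Most frequently followed broken links first.
    return hits.most_common()


# Invented sample lines; in practice you would pass in an open log file.
sample = [
    '203.0.113.5 - - [10/Oct/2007:13:55:36 +0000] "GET /cnotact.html HTTP/1.1" 404 209 "http://other.example.org/links.html" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2007:13:56:02 +0000] "GET /contact.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2007:14:01:12 +0000] "GET /cnotact.html HTTP/1.1" 404 209 "http://other.example.org/links.html" "Mozilla/5.0"',
]

for (path, referrer), count in broken_links(sample):
    print(count, path, "linked from", referrer)
```

Sorting by count means the links that trip up the most visitors come out at the top, and the referrer tells you which page the dodgy link lives on should you still fancy emailing its Webmaster.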
One drawback is that, when a new broken link is established, there will be a certain proportion of users that experience it before you can correct it. How you deal with this is perhaps a reflection of how quickly you can respond to the broken links or how popular your site is. So, why not replace your normal 404 page with a hook into your site search tool and return links to the closest matching URLs? This can act as a stop-gap solution until you get the redirect in place.
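To give a flavour of the “closest matching URLs” idea, here is a minimal sketch in Python. It uses plain string similarity from the standard library’s `difflib` module rather than a proper site search, so treat it as one possible approach and not the definitive one. The list `KNOWN_PATHS` and the function `suggest_paths` are hypothetical; a real site would build the list from its sitemap or file system.

```python
from difflib import get_close_matches

# Hypothetical list of pages that really exist on the site.
KNOWN_PATHS = ["/index.html", "/about.html", "/contact.html", "/products.html"]


def suggest_paths(requested, known=KNOWN_PATHS, limit=3, cutoff=0.6):
    """Return up to `limit` known paths that most resemble the requested one.

    `cutoff` is the minimum similarity (0 to 1) a path needs to be suggested;
    0.6 is difflib's default and is only a starting point.
    """
    return get_close_matches(requested, known, n=limit, cutoff=cutoff)


# The two broken links from earlier in the article.
for bad_path in ["/cnotact.html", "/about.html<p>"]:
    print(bad_path, "->", suggest_paths(bad_path))
```

Your 404 handler would call something like `suggest_paths` with the requested path and list the results as links on the error page. If nothing clears the cutoff you get an empty list back, at which point the usual 404 page is the honest answer.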
Whatever you do, it’s time to make 404s work for you and not distract your users.
Since I posted the above, this article has been published on IBM developerWorks:
Make your 404 pages smarter with metaphone matching