Use archive.org to park old websites without link rot

I’ve built dozens of websites over the years – both professionally and as part of artistic, research, teaching, or freelance projects. I’m still very proud of some of them and would like to show them to people and link to them. Other people have also linked to them extensively over the years, and those inbound links are useful.

The problem is that keeping all this stuff online takes maintenance and often causes headaches. I think I’ve found a ten-second technical solution, and I hope it doesn’t annoy the good people at archive.org.

Quick how-to:

  1. Find the best possible instance of your website on archive.org’s Wayback Machine.
  2. Create two redirect rules on your web server: one to block archive.org’s archiving script, and one to redirect all other traffic to archive.org.

Breakdown:

Finding the best instance of your website

Archive.org indexes your website with varying regularity. You might want their latest version – but it doesn’t just depend on your website. Archive.org archives the environment of your site too.

Other websites may have gone offline before yours did. If you link to the latest version that archive.org has, outbound links from your site may be dead, or may point at domain squatters who moved in when the people you linked to moved out.

The solution is to use archive.org’s time navigation bar to find the ‘optimum’ time, when your site and its significant neighbours were in their heyday. Use that snapshot as the basis of the Apache redirect rule.
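Once you have a rough date in mind, archive.org’s public availability endpoint (https://archive.org/wayback/available) can tell you which capture sits closest to it. The sketch below is only a convenience, not part of my original setup: the function names are my own, the date 20061205 is just an example, and it assumes the JSON layout the endpoint documents (an archived_snapshots object containing a closest entry). You still need to browse the result by hand to check that the neighbouring sites look right.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine availability endpoint.
API = "https://archive.org/wayback/available"


def availability_query(site, timestamp):
    """Build the query URL for the capture closest to `timestamp` (YYYYMMDD...)."""
    return API + "?" + urlencode({"url": site, "timestamp": timestamp})


def closest_snapshot(payload):
    """Return (timestamp, snapshot_url) from a decoded response, or None.

    Assumes the documented layout:
    {"archived_snapshots": {"closest": {"available": ..., "url": ..., "timestamp": ...}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def lookup(site, timestamp):
    """Ask archive.org for the closest capture (makes a network request)."""
    with urlopen(availability_query(site, timestamp)) as response:
        return closest_snapshot(json.load(response))
```

Calling lookup("twenteenthcentury.com", "20061205") should return the timestamp and URL of the nearest capture, or None if archive.org has nothing. The timestamp is the part that goes into the redirect rule.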

Redirect rules

Most web servers allow you to define redirect rules. I use Apache, which provides several ways of doing this: you can either create the rules in your Apache configuration files (for example, a virtual host file under sites-available) or add them to an .htaccess file in the root directory of your domain.

My redirect rule for my old art collective’s website looks like this:

<IfModule mod_rewrite.c>
        Options +FollowSymLinks
        RewriteEngine on

        # if the request comes from archive.org's crawler, refuse it (403)
        # so that archive.org doesn't end up archiving a redirect to itself
        RewriteCond %{HTTP_USER_AGENT} ^ia_archiver
        RewriteRule ^.* - [F,L]

        # otherwise send a permanent redirect to the chosen snapshot,
        # keeping the requested path
        RewriteRule (.*) http://web.archive.org/web/20061205014515/http://twenteenthcentury.com/$1 [R=301,L]
</IfModule>
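To make the effect of the two rules concrete, here is a small Python model of what they do to an incoming request. It is a sketch of the intended behaviour, not a reimplementation of mod_rewrite: the function name resolve is hypothetical, it assumes the rules live in an .htaccess file in the document root (where Apache strips the leading slash before matching), and it ignores query strings.

```python
import re

# The snapshot chosen in the rule above; every request path is appended to it.
SNAPSHOT = "http://web.archive.org/web/20061205014515/http://twenteenthcentury.com/"


def resolve(path, user_agent):
    """Return (status, location) for a request, mirroring the two rules.

    Rule 1: a User-Agent starting with "ia_archiver" gets 403 Forbidden.
    Rule 2: everything else gets a 301 to the same path inside the snapshot.
    """
    # Same anchored test as: RewriteCond %{HTTP_USER_AGENT} ^ia_archiver
    if re.match(r"ia_archiver", user_agent):
        return 403, None

    # In .htaccess context the pattern sees the path without its leading slash.
    relative = path[1:] if path.startswith("/") else path
    return 301, SNAPSHOT + relative
```

So an ordinary visitor asking for /about/index.html is sent to the archived copy of that same page, while archive.org’s own crawler is refused and never records the redirect.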

Thoughts?

I’d be interested to know what archive.org thinks of this use of their service. It seems such an obvious solution to a very widespread problem.

So many of my artist friends – particularly those who developed their own web skills for artistic purposes – now spend inordinate amounts of time keeping their websites alive across server and database incompatibilities, changes in the programming languages they used to create their services, and various other headaches.

This isn’t ideal. Obviously the services themselves can’t run on archive.org, and sadly, some of the more interesting bits of work I did were not very friendly to web crawlers, so they didn’t give archive.org much to go on. But that’s a good pointer for future work: make sure that your web projects are easily crawled by archive.org so you don’t have to sysadmin them for the rest of time.
