As promised earlier, we did in fact install a new version of the PURL server early on July 27, 2009. We had a couple of surprises, but they have been fairly minor, and with some adjustments we are now completely transitioned to the new software.
The software is open source and was written for us by Zepheira. The original specifications for the new software allowed for some variation in the exact operation of PURLs. Our thinking was that part of the reason we were redoing the software was to make improvements to PURLs, and we wanted to impose as few restrictions on that as we could. As we attempted the transition, however, reality collided with our best intentions and we found that changes that seemed quite reasonable had too great an impact on existing users. Resolving those issues took quite a bit of work and a lot of testing, but I think we finally managed it.
Probably the most important new feature in PURLs is the ability to create advanced PURLs for which you can control the HTTP status code (e.g. the new 303 See Other redirection code used in the Semantic Web). You can also delete PURLs. A deleted PURL is tombstoned: a record of it remains in the PURL server, even though the PURL is no longer available for redirection.
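For readers who think in code, here is a minimal sketch of how a resolver might treat a per-PURL status code and a tombstone. It is an illustration only, not the actual server code: the `PurlRecord` class, the `resolve` and `delete` functions, the example paths, and the choice of answering 410 Gone for a tombstoned PURL are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Status codes that carry a Location header in this sketch.
REDIRECT_CODES = {301, 302, 303, 307}


@dataclass
class PurlRecord:
    """Hypothetical in-memory record for one PURL."""
    target: Optional[str]      # URL the PURL redirects to
    status: int = 302          # HTTP status code chosen by the PURL owner
    tombstoned: bool = False   # True once the PURL has been deleted


def resolve(table: Dict[str, PurlRecord], path: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location value or None) for a requested path."""
    record = table.get(path)
    if record is None:
        return 404, None       # never registered
    if record.tombstoned:
        return 410, None       # assumption: a tombstone answers 410 Gone
    if record.status in REDIRECT_CODES:
        return record.status, record.target
    return record.status, None


def delete(table: Dict[str, PurlRecord], path: str) -> None:
    """Tombstone a PURL: the record stays, but it stops redirecting."""
    table[path].tombstoned = True
```

With a table such as `{"/net/example": PurlRecord("http://example.org/doc", 303)}`, calling `resolve` gives the 303 redirect; after `delete`, the entry is still in the table but no longer yields a target.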
In case you are interested, here are some of the problems we encountered:
- Non-standard characters. Some characters (e.g. the dollar sign $) are valid in some parts of a URI, but not in others. We still have a few PURLs with valid but unusual characters that we aren't handling correctly.
- Undocumented features. There were some cases where the old PURL server would try to 'do the right thing' even though the 'right thing' was neither documented nor expected, for instance treating a PURL as a partial indirect even though it was not. In almost all cases we were able to fix these problems by more conventional PURL techniques.
- DNS problems. There was at least one case where someone had redirected a domain name to purl.org. We have actually used that trick in-house, but did not realize anyone else was doing it. The new software is more aware of where requests originate than the old system and blocks this sort of thing. In the case in question we were able to accommodate them, but it isn't something we are encouraging.
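One way to picture the DNS issue is as a check on the Host header of each request. The sketch below is a guess at the general idea, not a description of how the new software actually does it: the `ALLOWED_HOSTS` set, the function names, and the `purl.example.com` alias are hypothetical.

```python
# Hypothetical list of host names the server is willing to answer for.
ALLOWED_HOSTS = {"purl.org", "www.purl.org"}


def normalize_host(host_header: str) -> str:
    """Lower-case the Host header and drop any port and trailing dot."""
    host = host_header.strip().lower()
    name, sep, port = host.rpartition(":")
    if sep and port.isdigit():
        host = name            # "purl.org:80" -> "purl.org"
    return host.rstrip(".")


def is_allowed(host_header: str, allowed=ALLOWED_HOSTS) -> bool:
    """Accept a request only if it was addressed to a known host name."""
    return normalize_host(host_header) in allowed
```

A request that arrives through an outside domain pointed at the server, say `purl.example.com`, fails the check unless that name is added to the allowed set, which is roughly what accommodating a particular case would amount to in this sketch.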
The DNS problem in particular shows how difficult it can be to thoroughly test this type of software. PURLs are very much embedded in the Web and it is difficult to predict all the ways various PURLs interact. We did get better at it with (rather painful) experience, however.
Since purl.org resolves something close to two million PURLs/day (about 20/second), we do everything we can to avoid disrupting that flow, and had only minimal interruptions in this transition.
--Th