I hadn't run into microformats until Eric Childress called them to my attention. The idea is to mark up your XHTML so that it's easy to parse yet still human readable. This reminds me of the KEV (key/encoded-value) versions of OpenURL 1.0. Everything you refer to in an OpenURL needs to be defined in the OpenURL registry, and the KEV definitions are done in XHTML, so they display nicely in a browser, but are tagged (mostly using the 'class' attribute) so they aren't that hard to parse.
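To make that concrete, here's a toy Python sketch (my own, not from the registry code) that pulls class-tagged values out of an XHTML page using only the standard library — the same trick that lets class-attribute markup stay human-readable while remaining machine-parseable:

```python
from html.parser import HTMLParser

class ClassScraper(HTMLParser):
    """Collect the text of elements tagged with a given 'class' value.
    This is a toy sketch, not the registry's actual parsing code."""
    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.depth = 0      # >0 while inside a matching element
        self.values = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1             # nested tag inside a match
        elif dict(attrs).get('class') == self.wanted:
            self.depth = 1
            self.values.append('')      # start collecting a new value

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.values[-1] += data

page = '<p>Title: <span class="jtitle">Nature</span></p>'
scraper = ClassScraper('jtitle')
scraper.feed(page)
print(scraper.values)   # ['Nature']
```

A real scraper would need to cope with multiple class tokens per element and matching elements nested inside each other, but the depth counter covers the common case.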
Actually, some things, like metadata formats, are often defined twice in the registry, once for the XML encoding and once for the KEV encoding. This complicates things. The code I posted last week tried to massage KEV journal metadata into the XML form. Since this is difficult to do automatically (and you aren't even guaranteed that a corresponding XML encoding exists), I've given up on that, which simplifies the latest code somewhat. For display of KEV information the program now tries to read in the registry definition and uses the information there for descriptions, but doesn't try to send back valid context objects in XML.
Almost more as an exercise than anything else, I've occasionally tried to make a computer program as short as possible (see OAI-PMH and OpenURL 1.0 for example). The interesting thing is, in general the program was much better for the effort. Not that the OAI-PMH code is particularly readable, but that was just to get it on a T-shirt. Taking out the tricks would still result in a very short OAI harvester (maybe two pages instead of one) which would be both readable and relatively short.
Looking at some programs, I'm struck by how often you see regularity and repetition in the code, and how much better off everyone would be if it were squeezed out. Usually this results in a shorter, easier to understand, and often faster program. I think this is one reason I like Python -- it doesn't require as much boilerplate as languages like Java.
The master of this is Chuck Moore who invented (and seemingly spent his life in) Forth. Screen shots of his Forth code look almost like noise, but he's able to do things like sit down at a bare machine and bring up an operating system on it in a few hours. The operating system (along with editors, assemblers, and all your applications) would fit comfortably on a floppy disk (a few hundred kilobytes for those who've never used them). Or a VLSI design tool in 500 lines!
One thing to keep in mind is that it isn't how hard it is to understand each page of code that's important, but how hard it is to understand the whole program. I claim that one page of dense code that does what ten pages of looser code would do is easier to read, understand, and maintain.
I once wrote an incremental Forth compiler to run on OCLC's Sigma 9 machines (in assembler, of course). Can't say I'd recommend the language to anyone to program in now, but it certainly was both fast and compact, characteristics we really needed 25 years ago. I've always regretted not attending one of the Forth conferences they used to have back then and not getting to meet Mr. Moore.
I've spent some time this week looking at the OpenURL 1.0 spec, mainly because Jeff Young is using it in the WikiD (used to be called MetaWiki) processor. It's a lot more concrete than what I heard Herbert Van de Sompel talk about a few years ago, but abstract enough to build a structured Wiki on top of it, which is probably abstract enough for most of us.
About the only way I can understand this sort of thing is to write some code that processes it. It's easy to fool yourself into thinking you understand something, but a lot harder to be fooled when your program fails because you really didn't. Has anyone else ever been confused trying to keep track of the difference between the referent, referrer, referring-entity, requester, and resolver? Too many R's, but I'm clear on them now, at least for the day.
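For anyone else juggling the R's, here's the mnemonic I ended up with, sketched as a Python dataclass — the field names and one-line glosses are my own summary of the ContextObject entities, not the spec's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ContextObject:
    """A rough mnemonic for the OpenURL 1.0 ContextObject entities
    (my own sketch, not the spec's XML schema)."""
    referent: dict          # the thing being referenced (e.g. an article)
    referring_entity: dict  # where the reference appeared (e.g. a citing paper)
    referrer: dict          # the system that generated the OpenURL (e.g. a database)
    requester: dict         # who is asking (e.g. a patron's browser)
    service_type: dict      # what is wanted (e.g. full text)
    resolver: dict          # the resolver being addressed

co = ContextObject(
    referent={"jtitle": "Nature"},
    referring_entity={},
    referrer={"id": "info:sid/example.org:demo"},
    requester={},
    service_type={"fulltext": "yes"},
    resolver={},
)
```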
Somewhat in the spirit of our one and two page OAI-PMH harvester and repository, I've managed to cram an OpenURL resolver into around 200 lines of Python code that will at least handle the examples Jeff has up here. It doesn't actually do anything other than parse the Context Object and echo it back as XSLT-styled XML. I know there are some things missing, but I think the code could be extended.
Probably the most difficult part of getting this to work was coping with the two basic ways of encoding the new OpenURLs: KEV is what you need to put a simple OpenURL into a URL, and XML is for anything more complicated.
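For the KEV side, here's a minimal sketch of the parsing step, built on the standard library's query-string parser. The parse_kev helper and the group-by-prefix scheme are my own simplification — a real resolver would also need to check url_ver, handle multi-valued and by-reference keys, and so on:

```python
from urllib.parse import parse_qs

def parse_kev(query):
    """Split a KEV-encoded OpenURL query string into buckets keyed by
    entity prefix (rft = referent, rfr = referrer, rfe = referring
    entity, req = requester, svc = service type, res = resolver).
    Flat keys without a dot (url_ver, rfr_id, ...) land in their own
    bucket.  A toy sketch, not the 200-line resolver itself."""
    entities = {}
    for key, values in parse_qs(query).items():
        prefix, _, rest = key.partition('.')
        entities.setdefault(prefix, {})[rest or prefix] = values[0]
    return entities

sample = ("url_ver=Z39.88-2004"
          "&rft_val_fmt=info:ofi/fmt:kev:mtx:journal"
          "&rft.genre=article"
          "&rft.jtitle=Journal+of+Irreproducible+Results"
          "&rft.volume=42"
          "&rfr_id=info:sid/example.org:demo")

print(parse_kev(sample)["rft"])
```

Note that parse_qs already takes care of the percent-decoding and '+'-to-space conversion, which is most of what makes KEV fit into a URL in the first place.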
In case you're interested in the code, here's a compressed tar file of the 200+ lines of code and the support files to exercise it. No promises that it's abstract enough to build a Wiki on, but I did go to the trouble of adding the gratuitous white space that Python programmers seem to expect, to make it more readable. There's a README.txt file in the openurl directory. I'd be interested in hearing from anyone who tries it out.
I don't always get to ALA, but I did this summer and had a few minutes to go through the exhibits. One thing I was struck by was the number of robots on display. One booth had a whole series of CD/DVD cleaning robots, which were fun to watch (I did see one mess up once trying to pick up a disc). More fun, though, were the material sorters, of which there was more than one. They'd dump a load of books into the machine's hopper, and then conveyor belts and rollers would sort them out based on RFID tags in the books. The ones I saw seemed to work well, although I bet making them work reliably in a library is a bit more of a trick.
The thought occurred to me that these sorters may make it possible to ease into putting RFID tags in books. Since I assume they could separate those items that don't have tags from those that do, a library might be able to get some benefit from RFID even if their whole collection wasn't converted.
I thought Wal-Mart was making its suppliers put RFID tags on their merchandise, which sounded a bit strange, but that isn't quite what they are doing. They've chosen 100 suppliers (moving to 200) that they require to put RFID tags on each case of merchandise. Evidently most of their suppliers aren't convinced that this is to their benefit and are tagging only their Wal-Mart shipments, not everything they ship.
We're doing more and more things on Linux here at OCLC. All of our new *nix boxes run Linux, as does our Beowulf cluster. One of the reasons for choosing Linux is its stability. You hear of people running servers and workstations for months without rebooting them. We've seen that too, but lately things haven't been quite that good.
Some of our Linux boxes are starting to have dozens of services (like PURLs and xISBN) running in multiple HTTP servers, with several people writing code and installing things like servlets in multiple versions of Apache Tomcat. And then the box hangs somehow, or crashes altogether, and we're pretty much left wondering what happened. We've been working on getting map-reduce running on our Rocks/Beowulf cluster, and some of the nodes seem to crash when we run more than four jobs on them, again leaving no trace of exactly what went wrong. These are admittedly jobs that take a lot of CPU and memory, but it's all pure Python and shell scripts, nothing that should be able to bring a system down. These are 'research' machines (although we don't directly administer them), and we have limited resources to track these sorts of problems down.
Contrast that with our (now fading) experience with Solaris. We ran some of those Sun boxes into the ground and it was very rare to have to reboot them, and they almost always gave us some warning that things weren't right before that. Not to mention that they did a better job allocating resources among competing jobs.
We're running Linux 2.4.21 on the PURL server. The cluster is also running 2.4.21 under Rocks release 3.1.0.