Once in a while I'll write a blog post but never publish it. I found this one in my list of unpublished entries. Since we in OCLC Research are in the midst of moving our offices, I thought it was worth publishing, even though it was written last July.
Peter J. Denning has an interesting column, "Computing is a Natural Science" (self-archived version), in the July issue of Communications of the ACM (v50i7) [thanks to Confessions of a Science Librarian for the links]. The article's main point is that as we understand the world better, we see that computation is embedded very deeply in many aspects of it. Probably the best example is biology, which now encompasses the study of DNA encoding and increasingly focuses on the computation performed on that information. (I wonder how long it will be before we can predict what an organism looks like solely from its DNA.) Another example is a recent article, "The Memory Code" by Joe Z. Tsien in the July 2007 Scientific American. It describes watching the computation going on in a mouse's brain as the mouse reacts to being shaken (be sure to watch the video).
We've seen a similar progression in our field. As metadata went digital, organizations such as OCLC allowed it to be shared to an unprecedented degree. Our cataloging service seems close to the 'infinite games' that Denning describes. We are also gradually turning maintenance of that data from a human task into a computation. Of course, Denning's point would be that it has always been a computation; what we are doing is understanding that computation and making it run on our computers.
As source materials become available in digital form, organizations such as Google are doing something similar. I'm not sure, though, what the counterpart to automated metadata maintenance would be for the sources themselves. Possibly it is their continued refinement of the information extracted from page images, such as rerunning character recognition and reindexing. For Wikipedia it probably corresponds to greater automation of the content itself. Currently Wikipedia seems to resist much of that automation. Still, I suspect that 'resistance is futile' and that it will incorporate more and more automatically selected and edited material.
So, are there any implications for librarianship? The ones that come to mind are probably obvious:
- More of our metadata will be automatically extracted from source materials (especially as the digital form becomes available earlier in the process)
- This will include suggestions for classification and subject headings
- Authority control of names, etc. has much to gain from automated analysis of both our existing metadata and associated texts
- Access to the full text of collections is going to change everything
The simpler library tasks have already been computerized. More and more of them will be.
I had to laugh a bit when I visited Great Principles of Computing, a site Denning contributes to, and found that the links on the welcome screen didn't work in Firefox (magnification had caused a heading to block access to them). There is a lot of information there, but 'usability' isn't the main thrust, nor is 'programming'. I often hear complaints that computer science graduates lack programming skills. Denning, however, has long felt that thinking of computing as programming is out of date at best. See his CACM column of three years ago.
--Th