
LibX and xISBN

LibX has a screencast about how it is using xISBN.  LibX bills itself as 'A Firefox Extension for Libraries' and is a joint project of the Virginia Tech Newman Library and the VT Department of Computer Science.  In addition to understanding xISBN, it recognizes COinS links, standard identifiers such as DOIs and PubMedIDs, and bundles a number of features to help you use your library through your browser.

I suspect that eventually the LibX xISBN support will become both less visible and more automatic, but LibX's use of it is a good example of how this sort of functionality can be integrated on top of existing systems without great effort.

--Th

xISBN and stable identifiers

There has been an extended discussion on Code4Lib about how to use the xISBN service to create identifiers that could be used for grouping records in an OPAC.  Many assumed there was probably such an ID associated with xISBN's groups of numbers, but that's not the case.  We do create an identifier for each of the FRBR groups, which in turn are used to create the xISBN groups, but that identifier is not guaranteed to be stable across runs.  The resulting xISBN groups are just groups, with no identifier other than the individual ISBNs in the group.

Ben Ostrowsky is going to the trouble of using xISBN to look up something like 500,000 ISBNs from the Tampa Bay Library Consortium catalog, and creating identifiers for record display grouping.  This should work for a while, but as our code and data change, groups in xISBN get joined and split, and identifiers derived from them could cause problems in the longer term.
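To make the stability problem concrete, here is a minimal sketch of one way a harvester might derive its own grouping key.  It is not how xISBN or Ben's project works; the choice of "smallest ISBN in the group" as the key, the function names, and the placeholder ISBNs are all assumptions made for illustration.

```python
def normalize_isbn(isbn):
    """Strip hyphens and spaces and upper-case a trailing 'x' check character."""
    return isbn.replace("-", "").replace(" ", "").upper()


def group_key(isbns):
    """Hypothetical local key: the smallest normalized ISBN in the group."""
    normalized = {normalize_isbn(i) for i in isbns}
    if not normalized:
        raise ValueError("empty ISBN group")
    return min(normalized)


def assign_keys(groups):
    """Map every ISBN to the key of the group that contains it."""
    keys = {}
    for group in groups:
        key = group_key(group)
        for isbn in group:
            keys[normalize_isbn(isbn)] = key
    return keys


def changed_keys(old, new):
    """ISBNs whose key differs between two harvests (or that are new)."""
    return {isbn for isbn, key in new.items() if old.get(isbn) != key}


# Placeholder ISBNs (not real ones): two groups in the first harvest,
# merged into a single group in a later one.
before = assign_keys([["0000000001", "0000000002"], ["0000000003"]])
after = assign_keys([["0000000001", "0000000002", "0000000003"]])
print(changed_keys(before, after))
```

When the two groups merge, the third ISBN silently changes its key, which is exactly the sort of drift a locally derived identifier has to be re-checked for after each xISBN update.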

We (OCLC Research) plan to keep the xISBN service up indefinitely, but OCLC is looking into creating a more 'production' service.  What this will look like isn't clear right now, but some obvious things might be more frequent updating (currently xISBN's database is updated every six months) and guaranteed levels of service.  Oh, and maybe a stable identifier for the groups would be a good idea too, so a project like Ben's could incrementally update its database.

We've actually been struggling with the same issue of stable identifiers for the VIAF project.  From a research point of view, avoiding the need to maintain identifiers over time is very attractive (we can just recompute the whole thing without reference to previous matches).  From a production and distribution point of view, stable identifiers seem necessary.

By the way, if anyone else plans to do a harvest like this, please let us know ahead of time (as Ben did), and try not to hit the server too hard.  We seem to be able to support multiple hits/second, but we've never load tested xISBN, and the system it runs on is running out of capacity (we should have a replacement up in a few weeks).
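For anyone scripting such a harvest, a simple throttle goes a long way.  The sketch below is only an illustration: the `fetch` callable stands in for whatever code actually performs the lookup, and the one-second default interval is an arbitrary assumption, not a limit we have published.

```python
import time


def harvest(isbns, fetch, min_interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch(isbn) for each ISBN, leaving at least min_interval seconds
    between the starts of consecutive requests.

    fetch is a caller-supplied (hypothetical) function that performs one lookup.
    sleep and clock are parameters only so the loop can be tested without waiting.
    """
    results = {}
    last_start = None
    for isbn in isbns:
        if last_start is not None:
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results[isbn] = fetch(isbn)
    return results
```

Passing a larger `min_interval` for a big batch, and running it overnight, is the kind of politeness that keeps a small research server usable for everyone else.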

--Th

An interview

A couple of weeks ago I did an interview with Matt Pasiewicz of EDUCAUSE.  I've done videos before, but this is the first recorded interview I can remember.

The day before we were going to do the interview, Matt asked me whether I preferred 'Skype or Gizmo?'  Since I hadn't used either (and OCLC currently discourages use of voice over IP (VoIP)) we had to reschedule the interview until I was set up.  We ended up using Skype, which worked pretty well, although digital artifacts are fairly apparent in the audio quality.  While I was signing up with Skype I went ahead and paid them $10 so that I could 'dial-out' and get to regular phone numbers.  It works, but the audio quality isn't very good.

While paying my fee, one of the things I noticed was that Skype couldn't accept a credit card number with embedded blanks.  Jon Udell had a recent post about how idiotic this is.  Skype has hundreds of millions of customers, more than six million concurrent users while I was on, and they can't accept a credit card number in its most natural format.  It doesn't give you a lot of confidence in the rest of their system (e.g. the security of that credit card number I persuaded them to take).
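Accepting the number the way people naturally type it takes only a few lines.  This is a sketch of the general idea, not anything Skype does or should be assumed to do: it strips spaces and hyphens and then applies the standard Luhn checksum.  The function names are made up, and the number in the example is the widely used all-ones test number, not a real card.

```python
import re


def normalize_card_number(raw):
    """Remove spaces and hyphens; reject anything that is not then all digits."""
    digits = re.sub(r"[ -]", "", raw.strip())
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError("card number may contain only digits, spaces, and hyphens")
    return digits


def luhn_valid(digits):
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    total = 0
    for index, ch in enumerate(reversed(digits)):
        d = int(ch)
        if index % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


number = normalize_card_number("4111 1111 1111 1111")
print(number, luhn_valid(number))
```

The point is that the cleanup belongs on the server side of the form, not in the customer's head.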

--Th

Worse is better?

'Worse Is Better' is a bit of programmer lore that has fairly wide acceptance.  People often accept this as the reason why VHS triumphed over Betamax, and why people use Microsoft products.  The standard essay about it (see this and this) is about the problems Lisp has had becoming widely used.

I was reminded of this on hearing that SGI, which got its start making high-end graphics workstations, has gone bankrupt.  I'm not sure I think 'worse is better' is ever an explanation of one technology winning over another.  In most cases what matters is not so much that one technology is better than another as how broadly useful it is.  We see this over and over.  The IBM PC was more popular than Apple because it was a more flexible architecture, even though, for what it did, the Apple was at least as good.  VHS had a longer recording time than Betamax and was more widely licensed, so it was more generally useful.

The explanation for this sort of thing is more along the lines of Clayton Christensen's Innovator's Dilemma than 'worse is better'.  The dilemma is that disruptive technologies emerge that address the low end of your business.  This is fine at first, since there is little margin at the low end.  Unfortunately these technologies tend to improve to the point where they take more and more of your business until yours is gone.  All during this process, people are using the 'best' technology for their purposes.  I suspect this happened to SGI's original graphics business as standard PC graphics progressed from terrible to amazing.  You can see a very similar process going on as more and more computers are based on the x86 chip architecture rather than special-purpose designs.

--Th

Buchareşti & ELAG 2006

ELAG (European Library Automation Group) 2006 was held in Romania this year at the Central University Library in Bucharest.  The library is across from a square (Piata Revolutiei) where quite a bit of fighting went on during the revolution of late 1989, and the library itself (along with its card catalog) was completely burned during the troubles.  Since then the university has built a new library building which is connected to the now nearly-restored original library where ELAG met.

ELAG was its usual sprinkling of interesting papers and more interesting people.  Since this was held in Eastern Europe there were quite a few attendees from Slovenia, Slovakia and the Czech Republic.  Of the papers from E. Europe I was especially struck by one by Adolf Knoll, of the National Library of the Czech Republic in Prague.  They seem to have developed quite a sophisticated system.

Bucharest itself, however, might be more interesting.  This is obviously a city with some infrastructure problems, although the airport was fine and the roads seem to be getting quite a bit of attention.  Drivers tend to be aggressive, but they will stop for people in crosswalks; they just don't want you to think so.

--Th

Scanned books and the DDC

Last Sunday (14 May 2006) the New York Times Magazine had a story about scanning books, especially the Google Book Search project.  No article like this is complete without a swipe at 'out-of-date schemes like the Dewey Decimal System, particularly in frontier or fringe areas like nanotechnology or body modification.'

I suppose it's nice to have the classification scheme everyone has heard of, and the DDC certainly has some limitations, especially if you want to use it to index things like journal articles, but this article is mostly talking about books.  Books in libraries.  Books in libraries, most of which already have Dewey numbers assigned.  This is what the DDC is designed for.  More informal tags are great, but I'd be surprised if people use them to group books (or even parts of books) much by what librarians think of as subjects.  Unlike pictures, which are notoriously hard to search for, books can be turned into text objects that lend themselves to searching.

To a certain extent the availability of full text takes away the need for very specific indexing, but the DDC offers a great complement to that, at a level and uniformity that would be difficult to achieve in other ways.  And I suspect that even 'body modification' has a place in the DDC.  Probably more than one depending on what aspect of it is being considered*.

--Th

*See Tags and Dewey in 025.431: The Dewey blog

Name searching

For some time we've had a simple Web service up for searching the NACO name authority file.  This grew out of our (somewhat limited) participation in the eprintsUK project.  One of our plans was to have a service that could be used to verify names for institutional repositories.  We haven't given up on that, and Ralph LeVan in particular is interested in linking local names to global names.  A couple of weeks ago Ralph redid the matching algorithm used in our name lookup.

This is something we've known was needed for some time.  As a matter of fact, I worked out a new ranking algorithm which improved the retrievals, but it never made our public service.  Ralph's, though, is substantially better since it can tolerate many spelling errors and is smart about ranking based on usage counts and preferred form vs. cross references in the authority file.

I'm pretty far down the list if you just search for Hickey (my 'name rank' in WorldCat was something like 650,000th the last time I looked), but a search for T B Hickey retrieves only my record.  Because of the tolerance for spelling variants, the searches T B Hicky and T B Hikcey both work too.  Here's another search that failed in the old service: Jim Gray.  The Jim Gray I was after comes up fourth in the list (we need to get the $q displaying to differentiate the second and third entries).  This one is a bit tricky, since Jim's real name is James Nicholas Gray, but he writes and is established as Jim Gray.  Our old system lost him in a sea of James Grays.  Even the new system won't find him as Jim Grey (with an e), since that form is quite common.
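For readers curious about the general shape of this kind of lookup, here is a toy sketch.  It is emphatically not Ralph's algorithm: it swaps in Python's standard difflib.SequenceMatcher as the similarity measure, and the record layout, field names, threshold, and usage counts are all invented for the example.  It only shows the three ingredients mentioned above working together: tolerance for misspellings, a preference for the established heading over a cross reference, and usage counts as a tie-breaker.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lower-case, drop commas and periods, and sort the words so that
    'Hickey, T. B.' and 'T B Hickey' compare as the same string."""
    words = name.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(words))


def similarity(a, b):
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def rank(query, records, threshold=0.8):
    """Return matching records, best first.

    Each (hypothetical) record is a dict with a 'heading', an optional
    'see_from' list of cross references, and a 'usage' count.
    """
    scored = []
    for rec in records:
        best, preferred = similarity(query, rec["heading"]), True
        for ref in rec.get("see_from", []):
            score = similarity(query, ref)
            if score > best:
                best, preferred = score, False
        if best >= threshold:
            scored.append((best, preferred, rec))
    # Higher similarity first, then preferred headings, then heavier usage.
    scored.sort(key=lambda t: (-t[0], not t[1], -t[2]["usage"]))
    return [rec for _, _, rec in scored]


# Invented sample data; the usage counts are placeholders.
records = [
    {"heading": "Hickey, T. B.", "see_from": [], "usage": 12},
    {"heading": "Gray, Jim", "see_from": ["Gray, James Nicholas"], "usage": 40},
    {"heading": "Gray, Jim", "see_from": [], "usage": 3},
    {"heading": "Gray, James", "see_from": [], "usage": 90},
]

for query in ("T B Hicky", "T B Hikcey", "Jim Gray"):
    print(query, "->", [(r["heading"], r["usage"]) for r in rank(query, records)])
```

A real authority file is far too large to compare a query against every heading this way, so a production service would need an index to narrow the candidates first; the sketch ignores that entirely.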

We think the system works pretty well.  Give it a try, and if you find something that doesn't work the way you expected let us know and we'll see if we can fix it.

--Th

The day I almost met James Gosling

James Gosling is understandably famous.  Although he wrote the first C version of the Emacs editor, I first became aware of him while trying out the Andrew window manager in the mid eighties.  The Andrew WM was interesting in that it tiled windows rather than overlapped them, and I rather liked it.  Then, in the late eighties, he designed NeWS (Network extensible Window System), which was built using PostScript (with little or no relationship to Display PostScript, which was Adobe's idea of how to use PostScript on screens) and was also very interesting.  I designed a couple of very simple windowing systems in the early eighties for page display on MS-DOS, and so had some idea of the complications involved.

Now Gosling is mainly known for developing Java.  When Java was first announced a couple of us at OCLC jumped at it and started using it.  Our main problem was speed.  One of our applications decoded Group-4 encoded images for page display, and when written in Java it was very slow.  One of the main Java developers got interested in it and rewrote it to speed it up some, and Keith Shafer (no longer at OCLC) and I went out to California to visit and talk about our applications.

I was in charge of the car, and during the visit went out to move it to avoid a parking ticket.  While I was out, Gosling appeared and Keith got to talk to him a bit.  I still wonder whether I would have had the nerve to ask him if he would sign my copy of The NeWS book I was carrying with me.

Here's an interesting article by him: Window System Design: If I had it to do over again in 2002.

--Th
