
FRBR Ranking

There has been an interesting discussion on the Code4Lib mailing list over the last couple of days about how to rank results in a FRBR environment. I weighed in with the common opinion around here (at least in OCLC Research) that the major factor in ranking should be some sort of popularity score. We typically use the total number of WorldCat holdings for the work, but circulation data could presumably be used as well. I argued that other ranking criteria, such as the number of times a search term occurs, are secondary at best.

Shortly after posting that, we had a visitor who pointed out a weakness in ranking only by library counts. Diane Vizine-Goetz was demonstrating a soon-to-be-released version of FictionFinder by searching for 'Don Quixote', and the second-ranked item was Henry Fielding's History of the Adventures of Joseph Andrews, "A Henry Fielding novel written to imitate the action of Cervantes' romantic-heroic character, Don Quixote."

Now, obviously the Fielding novel is related to Don Quixote, but it doesn't seem as though it should be second in the list, especially because several other 'works' were listed that look as though they should have been included in the main Don Quixote group but were missed because of title variants (e.g. The Ingenious Gentleman Don Quixote de la Mancha). It's even conceivable that Joseph Andrews could have come ahead of Don Quixote itself if it had more library holdings. (Actually, it isn't even close: 4,866 holdings versus 40,257 for Don Quixote.)

So, I think it is clear that a simple library count isn't the best possible way to rank FRBR work-sets. What should be done to fix it is less clear. In the example above, the string 'Cervantes, author of Don Quixote' actually appears in the subtitle of many manifestations of Joseph Andrews. Still, ranking by library holdings is easy to understand and, in our experience, works very well.
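Here is a minimal sketch of the holdings-first ordering discussed above, written in Python. The WorkSet shape, the term_hits field and the rank_work_sets name are hypothetical illustrations, not FictionFinder's or WorldCat's actual data model. Holdings is the primary sort key and term occurrences only break ties, which reflects the view that term counts are secondary.

```python
from dataclasses import dataclass


@dataclass
class WorkSet:
    """A FRBR work-set as it might appear in a result list (hypothetical shape)."""
    title: str
    holdings: int   # total WorldCat holdings across the work's manifestations
    term_hits: int  # how often the query terms occur in the work's records


def rank_work_sets(work_sets):
    """Order work-sets by holdings, highest first; term hits only break ties."""
    return sorted(work_sets, key=lambda w: (-w.holdings, -w.term_hits))


if __name__ == "__main__":
    results = [
        # Holdings figures are the ones quoted in the post; term_hits values are made up.
        WorkSet("History of the Adventures of Joseph Andrews", 4866, 12),
        WorkSet("Don Quixote", 40257, 3),
    ]
    for w in rank_work_sets(results):
        print(f"{w.holdings:>6}  {w.title}")
```

With the two holdings figures from the post, Don Quixote sorts first even though Joseph Andrews has more (made-up) term hits. The sketch also shows the limitation: nothing in this key knows that a variant-title work-set such as The Ingenious Gentleman Don Quixote de la Mancha belongs with Don Quixote, so it can still land below Joseph Andrews if its own holdings are lower.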

--Th

Pervasive content

Birte Christensen-Dalsgaard is visiting OCLC this week and just gave an interesting lecture on the efforts of Denmark's Statsbiblioteket to improve the user experience. One point she made is that we are approaching a time of 'pervasive content', much along the lines of the long-predicted pervasive computing. By this she means that content (books, articles, music, etc.) will be available wherever you need or want it.

Of course, pervasive computing is a prerequisite for pervasive content, but I think we would all agree that computing is certainly becoming pervasive. And I think Ms. Christensen-Dalsgaard is probably right that libraries should be planning for easy access everywhere, not only to freely accessible Web pages but also to licensed content.

Lorcan Dempsey was just in my office pointing out Malcolm McCullough's analogy between writing and computing: writing moved from the scriptorium to street signs, while computing is moving from the machine room into the fabric of our lives. Here are some more of Lorcan's thoughts on Digital Ground.

--Th

MXG and OpenSearch

Anyone interested in lowering the barriers to inter-system searching should look at the work of the NISO MetaSearch Initiative. This group has been nibbling at the interoperability problem for some time, and Task Group 3 (Search/Retrieve) has developed the MXG (Metasearch XML Gateway) protocol. What this amounts to is a prescribed way to dumb down SRU, almost all the way to OpenSearch, but in a way that stays compatible with SRU. The idea is that, sooner or later, your system is going to need at least some of the facilities of SRU, and using a protocol that is compatible with it will make everyone's life easier when that day comes.

Ralph LeVan has been active in both the SRU and MXG work (the Metasearch XML Gateway Implementors Guide is mostly his work) and has produced a 'Functional Matrix' that compares OpenSearch, the three levels of MXG, and SRU:

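Each feature below is followed by the protocols that support it (OS 1.0 is OpenSearch 1.0; MXG L1 to L3 are the three MXG levels):

  • Request record starting point: OS 1.0, MXG L1, MXG L2, MXG L3, SRU
  • Request number of records: OS 1.0, MXG L1, MXG L2, MXG L3, SRU
  • Request record schema: MXG L1, MXG L2, MXG L3, SRU
  • Defined query grammar: MXG L3, SRU
  • Specify sort order: SRU
  • Specify ranking order: SRU
  • Diagnostic messages: MXG L1, MXG L2, MXG L3, SRU
  • XML response: OS 1.0, MXG L1, MXG L2, MXG L3, SRU
  • Record count in response: OS 1.0, MXG L1, MXG L2, MXG L3, SRU
  • Records in known schema: OS 1.0, MXG L1, MXG L2, MXG L3, SRU

Ralph marks each entry as either full or limited support.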

I extracted this matrix from a presentation by Ralph and LC's Ray Denenberg at the Computers in Libraries 2006 workshop on Interoperability Standards and Searching Multiple Repositories.

Here's a short description of the MXG levels from the Implementors Guide:

  • Level 1 defines a standard URL which will accommodate arbitrary query grammars.
  • Level 2 extends Level 1 by adding the requirement that servers provide a standard XML record that defines the capabilities of the server.
  • Level 3 extends Level 2 by adding the requirement that servers support a standard query grammar: CQL.
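Since MXG is described as a constrained form of SRU, a request under any level looks roughly like an SRU searchRetrieve URL. The exact parameters each MXG level requires are spelled out in the Implementors Guide and are not reproduced here, so this Python sketch simply assumes the SRU 1.1 parameter names (operation, version, query, startRecord, maximumRecords, recordSchema) and a hypothetical base URL. At Level 3 the query would be CQL; at Levels 1 and 2 it could be in whatever grammar the server accepts.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute a real server's base URL.
BASE_URL = "http://example.org/sru"


def search_retrieve_url(query, start_record=1, maximum_records=10,
                        record_schema=None, version="1.1"):
    """Build an SRU 1.1-style searchRetrieve GET URL (a sketch, not the MXG spec).

    recordSchema is optional and is omitted when not given.
    """
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": query,                     # CQL at MXG Level 3 and in SRU
        "startRecord": start_record,        # "request record starting point"
        "maximumRecords": maximum_records,  # "request number of records"
    }
    if record_schema is not None:
        params["recordSchema"] = record_schema  # "request record schema"
    # urlencode percent-encodes the values and encodes spaces as '+'.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(search_retrieve_url('dc.title = "don quixote"',
                              maximum_records=5, record_schema="dc"))
```

Keeping the parameter names identical to SRU is what makes the upgrade path painless: a client written against the simpler levels can later add sorting or a CQL query without changing how it builds requests.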

It's probably worth mentioning that Levels 1 and 2 are non-compliant subsets of SRU. The recent SRU Implementors Group Meeting Report discusses the effort to harmonize SRU with OpenSearch. It's actually fairly interesting reading; SRU is under active development. Among other things, the report mentions the definition of an OAI-PMH profile for SRU, something we've been doing here in OCLC Research for some time (i.e. OAI over SRU).

Thanks to Eric Childress for suggesting this post.

Thanks also to Ray Denenberg, who pointed out the SRU meeting report and noted that the move toward compatibility between OpenSearch and SRU isn't currently being driven by the MetaSearch Initiative.

--Th
