xISBN may have moved on, but we're still working on some new things here in OCLC Research. My latest project is WorldCat Identities. Identities has a summary page for every person (and soon corporate body) based on information gleaned from WorldCat. Ralph LeVan did the fuzzy searching module, Tom Dehn most of the XSL work, and J.D. Shipengrover most of the design, and the project incorporates lots of comments and suggestions from Eric Childress, Diane Vizine-Goetz and Karen Smith-Yoshimura. Much of the inspiration for the pages comes from FictionFinder, the no-longer-with-us RedLightGreen system that RLG did before our merger, and Janifer Gatenby's idea that a Wiki would be a good complement to the VIAF project.
I haven't been blogging very much the last few months since Identities is about all I could think about and it wasn't ready to show. It's now a lot closer to being ready and the RLG Programs group has started a formal beta test with a great group of libraries (70 contacts at 18 institutions). They have already made some good suggestions (some easier to implement than others, of course).
The cloud at the top of this post is based on the surnames of the most widely held identities in WorldCat. One of the most striking aspects of the cloud is the number of composers that are widely held (about a third of the top 100). This project has been a lot of fun and I think it shows quite well some of the possibilities for the use of library bibliographic data.
Of course, we couldn't stop with just people. Following FictionFinder's example (and some of their regular expressions), fictional/legendary characters (Robin Hood) and famous animals (Secretariat) have their own pages.
--Th
Update: I should have mentioned that David Palfrey at the University of Cambridge supplied the Wikipedia links.
This is pretty cool stuff, Thom. I wonder if you've all considered making the URLs cool so that people could use them as identifiers in their data.
Response: We'll be creating some simpler ways to link to the pages, but the URL as shown in the browser should (almost always) be good until the whole service gets moved somewhere else. --Th
Posted by: Ed Summers | February 13, 2007 at 11:49
As I probably told you when I saw you last week, this is so cool! I like the way we can mine the data in the collective catalog to create interesting and useful ways for people to discover and use bibliographic information. One minor bug report, though. Using quotes in a search (to force a phrase search, as we are now accustomed to do) throws a Java exception.
Posted by: Roy Tennant | February 13, 2007 at 12:44
How is it Rowling didn't make the top 100 tag cloud?
Posted by: P. Melanchthon | February 13, 2007 at 16:10
Excellent stuff! Well designed, looks good, gives relevant information, makes me want to read my old story books all over again. Congrats to all of you; this is in some sense a quantum leap for library systems, and not a moment too late, either.
Posted by: Alex | February 13, 2007 at 20:18
This looks really interesting. Good job.
I especially like the visual timeline and the related names feature. The timeline feature really contains a lot of information that is easily gleaned from the way it is presented and the related names feature is not only close to what people have become used to from commercial enterprises - it also makes a lot of sense in this particular context.
Clean interface and easy to use - what more could we ask for?
:)
Posted by: Rebekka Kinimond | February 14, 2007 at 03:27
When I entered my name in the search box in the inverted form customary for a library search (Bowman, James Ray), I got hits for the surname Ray! If the name is to be entered in direct order, an instruction to do so would be helpful.
Response: While that might be possible for some names, I'm not seeing it for this search. All the names seem to have surname of Bowman.
--Th
Posted by: James Ray Bowman | February 15, 2007 at 12:48
I got hits for the surname Ray because I omitted a comma after Bowman. I just confirmed that. Sorry.
Posted by: James Ray Bowman | February 15, 2007 at 19:51
I would like to add that a search in the Library of Congress Authorities File works either with or without a comma after a beginning surname. So does a search in the catalog of the Arlington County Public Library in Arlington, VA. I suggest your system should do likewise.
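A minimal sketch of what comma-insensitive matching could look like, not how Identities or the catalogs mentioned actually work. It assumes the query and the stored headings are both reduced to the same punctuation-free key before comparison; the function names `name_key` and `matches` and the sample headings are made up for illustration.

```python
import re
import unicodedata


def name_key(name: str) -> str:
    """Reduce a name to a key that ignores case, accents, and punctuation."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace every character that is neither a word character nor
    # whitespace (commas, periods, hyphens, ...) with a space.
    cleaned = re.sub(r"[^\w\s]", " ", stripped)
    # Lowercase and collapse runs of whitespace.
    return " ".join(cleaned.casefold().split())


def matches(query: str, heading: str) -> bool:
    """True if the heading starts with the same name tokens as the query."""
    q = name_key(query).split()
    h = name_key(heading).split()
    return h[: len(q)] == q


# Hypothetical headings, in the inverted form used by library catalogs.
headings = ["Bowman, James Ray", "Ray, James Bowman"]

# With or without the comma, the query finds only the Bowman heading.
print([h for h in headings if matches("Bowman, James Ray", h)])
print([h for h in headings if matches("Bowman James Ray", h)])
```

Because the comma is dropped on both sides, the first token of the query is always compared with the first token of the heading, so the surname still has to come first.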
Posted by: James Ray Bowman | February 15, 2007 at 20:31
Looks clean, useful and fun. It also brings forward, of course, mistakes made in a variety of catalogues. Is it an option to add a 'virtual merge', so results on a page can be combined with the preferred heading?
Example:
If I search for [Verhoeven, Paul] I get 16 results.
#1 is the preferred heading
#2 is a mix between 2 directors, one German and one same as #1
#3 is a wrong year of birth, same as #1
#4 has extra qualifier, same as #1
#10 has misspelling, same as #1
Would be great if it could also be used to correct widespread mistakes.
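A minimal sketch of what such a 'virtual merge' could look like, assuming a hand-maintained mapping from variant headings to the preferred one. The numbers are the result positions from the example above; the `same_as` mapping and the function name `virtual_merge` are hypothetical and not part of the Identities service.

```python
from collections import defaultdict


def virtual_merge(results, same_as):
    """Group results under their preferred heading; the records stay unchanged."""
    merged = defaultdict(list)
    for result_id in results:
        # A result with no mapping entry is treated as its own preferred heading.
        preferred = same_as.get(result_id, result_id)
        merged[preferred].append(result_id)
    return dict(merged)


# Result positions from the example: #3, #4 and #10 are variants of #1.
# #2 mixes two different people, so it is deliberately left unmerged.
results = [1, 2, 3, 4, 10]
same_as = {3: 1, 4: 1, 10: 1}

print(virtual_merge(results, same_as))
# prints {1: [1, 3, 4, 10], 2: [2]}
```

Keeping the mapping separate from the records means a wrong merge can be undone by deleting one entry, and mixed headings like #2 can wait until someone sorts them out by hand.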
Posted by: Peter Schouten | February 19, 2007 at 10:21
It is interesting that composers feature so heavily - but what does it mean? Judging from your comment about how the calculation is done, I guess the answer is that most works of music have many more manifestations than literary works?
This makes sense, as we can see immediately that each work will have a written score and a performance - but because every performance is unique, it seems likely there are many manifestations on the performance side.
If I'm right about this, then I start to wonder - why aren't more of the top 100 composers? I'm also surprised at the number of names on the list that I've never heard of - that may just be my ignorance, of course, but Adler, Greeley, LaHaye - none of these are familiar to me.
Posted by: Owen Stephens | February 24, 2007 at 03:02
Question for you: how much of Worldcat is in the Identities project? If a sample, what percentage of Worldcat? Looks great and really an interesting way to use Worldcat.
Response: That's a difficult question to answer. We look at all the records, so every unique name (after normalization and some cross references) has a page, and they are all reflected to a certain extent (e.g. in the counts for the timeline and languages). I would guess that there are citations for a large percentage of works, especially if they were weighted by library holdings, but I don't have a count. --Th
Posted by: beverly | February 28, 2007 at 15:18