
A musicians cloud

Clicking on the image at left should bring up an HTML cloud of links into WorldCat Identities.  The names in the current cloud on the main Identities page are chosen by a measure based on the number of records and library holdings.  David Palfrey (Univ. of Cambridge), who computed the Wikipedia links on the Identity pages, challenged me on the measure.  Not being able to defend it strongly (it seemed to work?), I rewrote the ranking.  The musicians cloud was built by finding all the Identity pages that seem to be musicians (based on subject heading and genre), ranking them by number of works, manifestations, and library holdings, and picking the top identities from the three lists.  A better way might be to find all the music records in WorldCat and work from that, but I wanted to see how well the Identity records themselves would work.

At first I also included roles, but it seems that many widely held authors have done some sort of musical work that shows up in their role list, so right now that is being ignored.
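The selection step described above (rank by each of three measures, then take the top entries from each list) can be sketched roughly as follows. This is only an illustration, not the code behind the cloud: the `Identity` record, the `pick_top` function, and the placeholder data are all assumptions made for the example.

```python
from typing import NamedTuple


class Identity(NamedTuple):
    # Hypothetical record; the field names are assumptions for this sketch.
    name: str
    works: int
    manifestations: int
    holdings: int


def pick_top(identities, n):
    """Return the union of the top-n identities under each of the three measures."""
    chosen = {}  # keyed by name so an identity appearing in several lists is kept once
    for measure in ("works", "manifestations", "holdings"):
        ranked = sorted(identities, key=lambda i: getattr(i, measure), reverse=True)
        for identity in ranked[:n]:
            chosen.setdefault(identity.name, identity)
    return list(chosen.values())


if __name__ == "__main__":
    # Placeholder data, not real WorldCat counts.
    sample = [
        Identity("A", works=10, manifestations=50, holdings=900),
        Identity("B", works=40, manifestations=20, holdings=300),
        Identity("C", works=5, manifestations=80, holdings=100),
        Identity("D", works=1, manifestations=2, holdings=3),
    ]
    print([i.name for i in pick_top(sample, 1)])
```

Taking the union of three lists means the cloud can hold anywhere from n to 3n names, depending on how much the lists overlap.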

--Th

Identities and authorities

Turing Jakob had some interesting comments on a recent post.  Among other things he is interested in the relationship between WorldCat Identities and authority files.

Those of you familiar with WorldCat may understand that until a few years ago the Library of Congress/NACO authority file was the official source of the controlled form for names.  This has changed, however, as OCLC adds metadata from sources outside the English-speaking world.  Now, if the language of the metadata is not English, you can no longer assume that the form of the name follows the NACO file.  OCLC is in the process of loading a number of national bibliographic files, and the records are not merged with existing records even if we can identify the records as describing the same manifestation.  Notice that it is the language of the metadata that is important, not the language of the item being described.  Not all that subtle a point, but one that is sometimes confusing.

Back to Jakob's first question — what is the relationship of WorldCat Identities to the LC/NACO authority file?  The Identity pages are extracted from WorldCat, so any unique string (after normalization) will result in a separate page, except when we have been able to map it to a more standard form.  This mapping is done using information gleaned from the NACO file.  We are also mapping German names into NACO names using preliminary matches provided by the VIAF project.  In an ideal world the pages would then display using the form of name appropriate to your locale, but right now they display using the NACO form.  We'll be folding in a more international view as our work with VIAF progresses.  We hope to add links to the German PND file of personal name authorities fairly soon.

--Th

Identity cycles

One of the things I find most interesting in WorldCat Identities is looking at the related names and their Identity pages.  The other night as I was clicking through some of them I wondered what the longest cycle would be if you just kept clicking on the first related name.  A little code and a bunch of CPU time later, the answer (this week at least) seems to be 5:

Atiliano Sánchez → Marcos Brizuela → Francisco Gómez del Palacio → Miguel G Careaga → Basilisa de la Calera → Atiliano Sánchez

Actually I was hoping for some better known names.  There are lots of other ways the game could be played, but maybe that is best left as an exercise for the reader.
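For readers who want to try the exercise, here is one way the search for the longest cycle might be sketched. It assumes the data has already been reduced to a dictionary mapping each name to its first related name; that dictionary, the `longest_cycle` function, and the extra X/Y/Z entries are hypothetical, and this is not the code that produced the result above.

```python
def longest_cycle(first_related):
    """Return the longest cycle reached by always following the first related name.

    first_related maps a name to its first related name (or omits the name
    when there is none). Each name is walked at most once, so the work is
    linear in the number of names.
    """
    done = set()  # names already explored in an earlier walk
    best = []
    for start in first_related:
        position = {}  # name -> index within the current walk
        path = []
        node = start
        while node is not None and node not in done and node not in position:
            position[node] = len(path)
            path.append(node)
            node = first_related.get(node)
        # The walk closed on itself only if it stopped at a name from this same walk.
        if node is not None and node in position:
            cycle = path[position[node]:]
            if len(cycle) > len(best):
                best = cycle
        done.update(path)
    return best


if __name__ == "__main__":
    links = {
        # Hypothetical short cycle with a tail leading into it.
        "Z": "X",
        "X": "Y",
        "Y": "X",
        # The five-name cycle from the post.
        "Atiliano Sánchez": "Marcos Brizuela",
        "Marcos Brizuela": "Francisco Gómez del Palacio",
        "Francisco Gómez del Palacio": "Miguel G Careaga",
        "Miguel G Careaga": "Basilisa de la Calera",
        "Basilisa de la Calera": "Atiliano Sánchez",
    }
    print(" → ".join(longest_cycle(links)))
```

Because every name has at most one outgoing link in this version of the game, each walk either runs off the end, joins an earlier walk, or loops back on itself, which is what keeps the search simple.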

--Th

Identities update

There have been several comments on yesterday's post about WorldCat Identities.  Here are some reactions to the comments:

Ed Summers asked about 'cool' URLs.  We plan on having more straightforward URLs in the future, but haven't worked out the details yet.  The current URLs are SRU search requests using the NACO normalized version of the name and should be stable, or at least as stable as the Web site and the form of the name are.

Roy Tennant wondered why J. K. Rowling didn't make the top 100 cloud.  For the cloud I computed a score equal to the number of manifestations associated with the name plus the square root of the holdings associated with those records.  The 100th entry (David A. Adler) had a score of 157,414, while J. K. Rowling was 402nd with a score of 78,595.  It's probably worth noting that the holdings count is the number of libraries that have said they have the item, not the number of copies that they have.
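As a rough illustration of that score, the sketch below adds the manifestation count to the square root of the holdings count and keeps the highest-scoring names. The function names and the sample counts are made up for the example, and it assumes the holdings are summed across a name's records before the square root is taken, which the description above does not spell out.

```python
import math


def cloud_score(manifestations, holdings):
    """Manifestation count plus the square root of total holdings (assumed form)."""
    return manifestations + math.sqrt(holdings)


def top_cloud(counts, size=100):
    """Return the `size` highest-scoring names.

    counts maps a name to a (manifestations, holdings) pair.
    """
    ranked = sorted(counts, key=lambda name: cloud_score(*counts[name]), reverse=True)
    return ranked[:size]


if __name__ == "__main__":
    # Placeholder counts, not real WorldCat figures.
    counts = {"A": (100, 400), "B": (110, 25), "C": (50, 10000)}
    for name in top_cloud(counts, size=2):
        print(name, cloud_score(*counts[name]))
```

The square root damps the effect of holdings, so a name with a handful of very widely held records does not automatically outrank one with many manifestations.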

Roy also found a way to make the search misbehave.  We'll be fixing that soon.

--Th

WorldCat Identities

xISBN may have moved on, but we're still working on some new things here in OCLC Research.  My latest project is WorldCat Identities.  Identities has a summary page for every person (and soon corporate body) based on information gleaned from WorldCat.  It has a fuzzy searching module done by Ralph LeVan; Tom Dehn did most of the XSL work and J.D. Shipengrover most of the design; and it incorporates lots of comments and suggestions from Eric Childress, Diane Vizine-Goetz and Karen Smith-Yoshimura.  Much of the inspiration for the pages comes from FictionFinder, the no-longer-with-us RedLightGreen system that RLG did before our merger, and Janifer Gatenby's idea that a Wiki would be a good complement to the VIAF project.

I haven't been blogging very much the last few months since Identities is about all I could think about and it wasn't ready to show.  It's now a lot closer to being ready and the RLG Programs group has started a formal beta test with a great group of libraries (70 contacts at 18 institutions).  They have already made some good suggestions (some easier to implement than others, of course).

The cloud at the top of this post is based on the surnames of the most widely held identities in WorldCat.  One of the most striking aspects of the cloud is the number of composers that are widely held (about a third of the top 100).  This project has been a lot of fun and I think it shows quite well some of the possibilities for the use of library bibliographic data.

Of course, we couldn't stop with just people.  Following FictionFinder's example (and some of their regular expressions), fictional/legendary characters (Robin Hood) and famous animals (Secretariat) have their own pages.

--Th

Update:  I should have mentioned that David Palfrey at the University of Cambridge supplied the Wikipedia links.

xISBN Moving

I'm pleased to report that the Openly group here at OCLC is about to take over the operation of xISBN.  Here is a (slightly edited) message that Eric Hellman sent out earlier today:

"As many of you are aware, the xISBN service, developed by OCLC's Office of Research, has been running in an experimental, semi-supported mode, and it has proved quite popular. Last year, OCLC charged the OCLC Openly Informatics Division with the task of making it a fully supported WorldCat service.


At about 4PM EST on Tuesday, February 13, a switch will be flipped, and traffic aimed at the experimental version of xISBN will begin to be routed to a replacement xISBN service supported by the Openly Informatics Division of OCLC. Any application that follows HTTP redirects (this should be most xISBN client applications) will continue to work without needing changes. The timing of this switch has been dictated by the decommissioning of a server, and we apologize if this short notice seriously impacts anyone.

After the switch, the traffic currently sent to "http://labs.oclc.org/xisbn/[ISBN]" will be redirected to "http://old-xisbn.oclc.org/webservices/xisbn/[ISBN]". This service will respond in almost exactly the same way that the research version has responded; you can change your applications to use the replacement address effective immediately. Of immediate benefit to all users of xISBN will be the drastically improved currency and frequent updates of the xISBN data set.

As you might guess from the replacement system host name, there will soon be a "new" version of the xISBN service. Xiaoming Liu, who has been working on xISBN for 3 months, will unveil the "WorldCat xISBN Service" at the Code4Lib conference at the end of the month.

There is a small difference in the behavior of the replacement service. If you send the replacement service a 13 digit ISBN, the entire result set will be returned with 13 digits.

If you expect your xISBN client service to use more than 1000 queries per day, please let us know (xisbn-support@oclc.org), as the traffic control systems have also changed.

To make sure that you are alerted of all of the coming changes surrounding xISBN, please make sure to sign up for the XIDENTIFIER-L listserv. Sign up at http://listserv.oclc.org/scripts/wa.exe?A0=XIDENTIFIER-L


--

Eric Hellman, Director                            OCLC Openly Informatics Division"

In my (Thom Hickey's) opinion this transfer is a very good thing.  It gives xISBN a home and a chance to grow into a much better service.  The new system does have a 1,000 query/day limit, but Eric seems very willing to work with those for whom this proves a problem.  OCLC and its members are fortunate to have Eric and his group available to make xISBN a continuing service.

I'd like to take this chance to thank everyone who has helped us get the xISBN service this far.  We've had many suggestions over the last couple of years, all of which we have listened to, and some of which we've been able to use to improve the service.

--Th
