
Relator codes and terms

MARC 21 has the concept of relator codes and relator terms, which can be associated with a contributor to indicate that person's role in creating an item. The Library of Congress maintains the list. I'm sure there's a reason why you can use either $e for a term or $4 for a code, but from a processing point of view that's just one more redundancy we have to handle. I recently scanned our July 2006 copy of WorldCat and found 369 different codes and 9,456 different terms after some normalization. Here are the top codes, along with the number of times they occurred:

prf (1,080,900)
cnd (203,921)
voc (78,921)
itr (77,058)
aut (72,700)
act (56,518)
arr (50,621)
edt (49,205)
trl (43,608)
ill (42,657)

and the top relator terms:

ed (629,083)
joint author (474,307)
ill (214,764)
tr (172,801)
comp (123,239)
printer (60,070)
photographer (45,115)
orient (40,064)
illus (38,201)
former owner (34,892)
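Counts like these come from tallying the $4 (code) and $e (term) subfields of contributor fields separately. Below is a minimal Python sketch of such a tally. It is not the normalization used for the WorldCat scan. The input shape (each field as a list of (subfield code, value) pairs) and the normalization rule (trim surrounding whitespace and punctuation, then lowercase) are both assumptions made for illustration.

```python
import string
from collections import Counter

def normalize(value):
    """Trim surrounding whitespace/punctuation and lowercase (assumed rule).

    For example, 'Joint author.' becomes 'joint author' and '[ed.]' becomes 'ed'.
    """
    return value.strip(string.whitespace + string.punctuation).lower()

def tally_relators(fields):
    """Count relator codes ($4) and relator terms ($e) separately.

    `fields` is a hypothetical input: an iterable of contributor fields,
    each given as a list of (subfield_code, value) pairs.
    """
    codes, terms = Counter(), Counter()
    for subfields in fields:
        for code, value in subfields:
            if code == "4":
                codes[normalize(value)] += 1
            elif code == "e":
                terms[normalize(value)] += 1
    return codes, terms

fields = [
    [("a", "Smith, John,"), ("e", "ed.")],
    [("a", "Doe, Jane,"), ("e", "Joint author.")],
    [("a", "Silva, Ana."), ("4", "prf")],
    [("a", "Roe, Richard,"), ("e", "ed")],
]
codes, terms = tally_relators(fields)
print(codes.most_common())  # [('prf', 1)]
print(terms.most_common())  # [('ed', 2), ('joint author', 1)]
```

Normalizing before counting is what collapses variants such as "ed." and "ed" into a single term.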

I was struck by how little overlap there is between the two lists. Most of the terms and codes are self-explanatory; among the less obvious ones, prf is performer and itr is instrumentalist. It wasn't obvious what orient means, but since it occurred in Portuguese thesis records, I asked my friend Ana Pavoni in Brazil, who happens to be a world expert on ETDs (electronic theses and dissertations). She says it stands for 'orientador', which means mentor or supervisor. Former owner refers to someone who once owned the item (usually a book). It surprised me to see it occur so frequently, but for some items that history is of interest.

The total number of occurrences was a bit higher than I expected, and it looks like both fields need to be inspected. I've made a table of the more frequent relator terms found in records, which I use to translate those terms into relator codes for my processing. I've found that if a high proportion of a person's records carry the cmp (composer) code and a high proportion give zxx (non-language material) as their language, that person is what most people think of as a composer.
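Here is a minimal Python sketch of both ideas: folding $e terms into codes through a lookup table, then applying the composer heuristic. The table is a small illustrative subset, not the actual translation table, and two of its mappings (marked in comments) are assumptions. The 0.5 thresholds and the PersonRecord shape are hypothetical, since the text only says "a high proportion".

```python
import string
from dataclasses import dataclass

# Illustrative subset of a term-to-code table (not the author's actual table).
TERM_TO_CODE = {
    "ed": "edt",
    "joint author": "aut",
    "ill": "ill",
    "illus": "ill",
    "tr": "trl",
    "comp": "com",          # assumption: AACR2 'comp.' read as compiler
    "printer": "prt",
    "photographer": "pht",
    "orient": "ths",        # assumption: 'orientador' read as thesis advisor
    "former owner": "fmo",
}

def normalize(value):
    """Trim surrounding whitespace/punctuation and lowercase."""
    return value.strip(string.whitespace + string.punctuation).lower()

def relator_codes(codes, terms):
    """Merge $4 codes with $e terms translated via TERM_TO_CODE.

    Terms missing from the table are simply dropped in this sketch.
    """
    result = {normalize(c) for c in codes}
    for term in terms:
        code = TERM_TO_CODE.get(normalize(term))
        if code is not None:
            result.add(code)
    return result

@dataclass
class PersonRecord:
    """One record as seen from one person (hypothetical shape)."""
    codes: set      # relator codes for this person in the record
    language: str   # MARC language code; 'zxx' = non-language material

def looks_like_composer(records, min_cmp=0.5, min_zxx=0.5):
    """True if enough of a person's records carry cmp and are zxx.

    The 0.5 defaults are hypothetical thresholds.
    """
    if not records:
        return False
    n = len(records)
    cmp_share = sum("cmp" in r.codes for r in records) / n
    zxx_share = sum(r.language == "zxx" for r in records) / n
    return cmp_share >= min_cmp and zxx_share >= min_zxx

print(sorted(relator_codes(["prf"], ["Ed.", "Joint author."])))
# ['aut', 'edt', 'prf']
```

Translating terms into codes first means the composer check only ever has to look at one vocabulary.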

--Th

VIAF at Access 2006

A little over a week ago I spoke at the Access 2006 conference in Ottawa about how we are processing the data for VIAF and our new WorldCat Identities project.

Podcasts and slides for many of the Access talks are now available. I wasn't there for the whole conference, but I can recommend the talk by Slavko Manojlovich about the Atlantic Scholarly Information Network.

--Th

Melvyl Recommender Project

The FRBR Blog shows an excerpt from the Full Text Extension Supplementary Report of the Melvyl Recommender Project describing their current procedure for deciding whether two items should be considered the same work. They do this by calculating a score based on how well authors, titles, dates, and identifiers match. All in all, their procedure probably does a fairly good job of bringing together similar items, but I've never been a fan of assigning scores and then adding them up. They mention 'twiddling of knobs' to adjust the scores, and my experience is that you never finish twiddling and that changes and additions to the scoring are very difficult to get right.

My preference is to use a decision table.  Here's one for the Melvyl matching:

Titles    E  E  E  -  P  P  P
Authors   P  E  -  E  E  P  P
Idents    -  -  E  E  -  -  P
Dates     P  -  E  E  P  E  -

Here's how to use the table. For each row, you decide whether the records have an Exact, Partial, or no match. The levels are ordered, so a P in the table means that value has to be at least a partial match. The first column then says that if you have an Exact title match and at least a Partial author and date match, your records match. The hyphen in the Idents row of that column means it doesn't matter how well the identifiers match. The last column shows that partial matches on everything but dates result in a match, whether the dates match or not. For two records to match, they have to satisfy at least one column.
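The column-by-column check is easy to code. Below is a minimal Python sketch. It assumes the Exact/Partial/none level for each row has already been computed by whatever title, author, identifier, and date comparisons you use; those comparisons are not shown. Each tuple in COLUMNS is one column of the table, read top to bottom. It holds the minimum level required for each row, and the hypothetical name ANY stands in for the hyphen.

```python
from enum import IntEnum

class Match(IntEnum):
    """Ordered match levels, so EXACT >= PARTIAL >= NONE."""
    NONE = 0
    PARTIAL = 1
    EXACT = 2

E, P, ANY = Match.EXACT, Match.PARTIAL, None

# One tuple per column of the table: (titles, authors, idents, dates).
# Each entry is the minimum level required; ANY means "doesn't matter".
COLUMNS = [
    (E,   P,   ANY, P),
    (E,   E,   ANY, ANY),
    (E,   ANY, E,   E),
    (ANY, E,   E,   E),
    (P,   E,   ANY, P),
    (P,   P,   ANY, E),
    (P,   P,   P,   ANY),
]

def satisfies(column, levels):
    """True if every row meets the column's minimum (ANY always passes)."""
    return all(req is None or have >= req for req, have in zip(column, levels))

def records_match(titles, authors, idents, dates):
    """Two records match if at least one column is satisfied."""
    levels = (titles, authors, idents, dates)
    return any(satisfies(column, levels) for column in COLUMNS)

N = Match.NONE
print(records_match(E, P, N, P))  # True  (first column)
print(records_match(P, P, P, N))  # True  (last column: dates ignored)
print(records_match(P, P, N, N))  # False (no column satisfied)
```

Using an IntEnum makes "at least a partial match" a plain >= comparison, so each column reads exactly like the printed table.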

We've found this sort of table easier to understand, extend, and modify than summed scores. The Melvyl scoring fit very nicely into this form because each of their criteria had three levels, and in our experience that's the right granularity.

--Th

Things to read

Yesterday I ran into a number of things worth looking at, so just in case you don't have enough to read, here are some more.

I spent some of the weekend at Access 2006 in Ottawa and heard Clifford Lynch's talk about what he's tracking and influencing. He mentioned a speech that Mary Sue Coleman, president of the University of Michigan, gave in February 2006 to the Professional/Scholarly Publishing Division of the Association of American Publishers, defending her library's decision to participate in the Google Book project. Clifford pointed out that one consequence of this is that, for the first time in its history, the university has a backup plan for the library. Ms. Coleman's speech is worth reading. My guess is that the courts are going to stop the digitizing of still-in-copyright material, and that this may be a tragedy. When you think in terms of decades or centuries, there is a reasonable chance that one or more of these libraries will disappear, and a digital surrogate is better than nothing, not to mention the better disclosure it provides in the meantime.

Two items touch on non-Latin authority control. The first is the report of the Association for Library Collections and Technical Services Task Force on Non-English Access (noticed in Catalogablog). Sorry if the link is stale by the time you read this; library reports really should have cooler URLs. The second is a paper by Heidi Lerner in Library Resources & Technical Services v. 50 #4, October 2006, pp. 252-261, an extensive review entitled 'Anticipating the Use of Hebrew Script in the LC/NACO Authority File' (no danger of an LRTS link going stale, since you can't make one). Someday we'll have non-Latin scripts in NACO. Maybe the VIAF project can help with this.

And finally, a hilarious pamphlet lent to me by Eric Childress: Lunacy and the Arrangement of Books by Terry Belanger, which I recommend to all bibliophiles, would-be bibliophiles, and classification enthusiasts.

--Th
