Comments

Patt Leonard

Re: your comment "Our current work clustering always uses the 240 in preference to the title proper reflected in the 245 (title statement) field." I ran into a problem with that logic while trying to search for anthologies of comics in WorldCat.org. Search "au:henley marian" brings up 10 records. "Maxine" is one of the titles I wanted. When I click on that title, I see a note "2 editions". Clicking on that I see another title (Laughing Gas), which is not another edition at all--it's a different compilation of the Maxine cartoons, but both records have "Maxine! |k Selections" in the 240 field, and so WorldCat.org groups them together. In this instance, that 240-over-245 logic should not be used.

Response:
Yes, what we are proposing would avoid that match.

--Th
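
To make the problem above concrete, here is a minimal sketch of a clustering key that prefers the 240 over the 245. It is not OCLC's actual algorithm: the dictionary layout, the `work_key` and `cluster` names, and the simplified field values are all assumptions for illustration.

```python
from collections import defaultdict

def work_key(record, prefer_240=True):
    """Build an author/title key (hypothetical record layout: tag -> string)."""
    if prefer_240 and record.get("240"):
        title = record["240"]
    else:
        title = record["245"]
    return (record["100"].lower(), title.lower())

def cluster(records, prefer_240=True):
    """Group 245 titles under the key produced by work_key."""
    groups = defaultdict(list)
    for record in records:
        groups[work_key(record, prefer_240)].append(record["245"])
    return dict(groups)

# Simplified stand-ins for the two records described in the comment.
records = [
    {"100": "Henley, Marian", "240": "Maxine! Selections", "245": "Maxine"},
    {"100": "Henley, Marian", "240": "Maxine! Selections", "245": "Laughing gas"},
]

merged = cluster(records)                      # 240 preferred: one cluster
separate = cluster(records, prefer_240=False)  # 245 only: two clusters
```

With the 240 preferred, both compilations land in one cluster because they share the same generic uniform title; keyed on the 245 alone, they stay apart.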

Hal Cain

Doesn't this problem indicate that the use of MARC tag 243 (bibliographic), which hasn't been implemented in systems following LC practice, would be worthwhile? Would it be feasible to implement it retrospectively, using the technique described in your blog about automated linking of bib headings to authorities? There's no comparable provision for name-title tags.

Unfortunately, in the bib format 1XX $u is already taken (though it's unassigned in the authority format).

Response:
Yes, I suppose this would be possible, but you would probably have to get LC to change its practice.

--Th

Stephen Hearn

The conventional title 240 is definitely a different animal, and not a FRBR "work" title. A thornier question: what does WC Identities mean by "most widely held works"? In Dostoyevsky's list, "The Gambler" appears three times. Is it three works? Crime and Punishment appears both at the top and at the bottom of the expanded list. So what does "work" mean in WC Identities? In this list, it appears that the FRBR "work" doesn't really "count."

Response:
We group the manifestations into 'works' as best we can based on author/title. In this case, 'The Gambler' was combined with other works, confusing us. In the future we hope to recognize that a manifestation might contain multiple works, but our software isn't up to that challenge yet (they can get very messy with our current cataloging).

--Th

Casey Mullin

As part of our Variations3 project here at IU, we are investigating ways to derive work records from MARC records. Not surprisingly, we have spent a great deal of effort dealing with these "collective titles". Happily, in most cases, when the 240 has one of these values, there are analytical 700 added-entry fields from which to derive works. Beyond that, the other source for work-record data is the 505 contents field, which is significantly less bound to a specific format (and thus all but impossible for a machine to parse meaningfully). The upshot of this is: we also ignore these "generic" 240 values. In some cases, they are simply place-holders in my opinion, and useful only for filing in catalog displays. In a FRBR-ized environment, however, they would be even less necessary. Nonetheless, I do not agree with the current effort by the Joint Steering Committee to eliminate the use of "selections" altogether.

Casey Mullin
Metadata Assistant -- Variations3 Digital Music Library
Indiana University

Kathy Glennan

As you note, "Music has its own highly developed approach to uniform titles." So, what is the impact of this decision that considers more than a dozen music-related collective uniform titles "noise"? Isn't the real problem, as Hal Cain has already noted, the fact that these are not coded as collective uniform titles? When WorldCat attempts to group bibliographic records into works, when is it appropriate to use some of these collective uniform titles? For example, do you want to bring together *all* sound recording expressions containing Beethoven's nine symphonies, or should this be restricted to the same performances (e.g., on LP, cassette, CD)?

Response: Actually we are taking another look at some of these. In particular, manifestations that use 'Symphonies' don't seem to have any place better to go. I'll blog about this after we get a little more data.

--Th

Giles Martin

A couple of uniform titles on your list are in fact intended to be used for single works, but only make sense in conjunction with the accompanying 1XX field. These are "Constitution" and "Annual report".

The MARC field "240 Constitution" does not refer in itself to a single work, but the combination of "110 United States" with "240 Constitution" does refer to a single work. Similarly, "240 Annual report" by itself is not a single work, but in conjunction with "110 OCLC" it does refer to a single (serial) work.

Others on your list can refer to identifiable collections of works that are useful to treat as a single entity. For example, "240 Symphonies" in conjunction with "100 Beethoven, Ludwig van, 1770-1827" refers to a collection of 9 works which are often published together, and so are useful to refer to together. So that collection would usefully have its own identity.

I don't think you can discuss 240 headings without looking at their relationship with 1XX headings at all, just as you can't use 245 fields by themselves (since it's quite common for different works to have the same 245 title).

Response:
Yes, we always combine the 240 with a 1XX.

--Th
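
The point about pairing the 240 with a 1XX can be sketched as follows. This is only an illustration under assumed names: `name_title_key`, the dictionary layout, and the "Example Society" record are hypothetical and do not come from the post.

```python
def name_title_key(record):
    """Combine the first 1XX heading found with the 240 uniform title."""
    name = next(
        (record[tag] for tag in ("100", "110", "111") if tag in record), ""
    )
    return (name.strip().lower(), record["240"].strip().lower())

us = {"110": "United States", "240": "Constitution"}
oclc = {"110": "OCLC", "240": "Annual report"}
other = {"110": "Example Society", "240": "Annual report"}  # hypothetical record

# The 240 alone cannot tell the two annual reports apart...
same_240 = oclc["240"] == other["240"]
# ...but the combined name/title key can.
distinct_keys = name_title_key(oclc) != name_title_key(other)
```

Keyed this way, "Annual report" under OCLC and "Annual report" under another body are different works even though their 240 fields are identical.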

Michelle Hahn

Out of curiosity, are you planning on only ignoring the 25 common uniform titles when they are used for more than one specific work?

For example, would you be ignoring "Symphonies", but NOT ignoring "Symphonies, no. 3, op. 55, Eb major"?

Based on the numbers, it looks like you are planning on ignoring the all-encompassing titles, rather than the titles for individual works, but I just wanted to be clear!

Response:
Yes, it would only be for those with 'Symphonies'. As I mentioned on another comment, though, Symphonies is probably one that we won't use, along with a few of the others. We haven't quite finished analyzing these.

--Th
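
The distinction in this exchange (ignore the bare collective title, keep the title of an individual work) amounts to an exact match against a list of generic titles. A minimal sketch, assuming a hypothetical `GENERIC_TITLES` subset and a hypothetical fallback to the 245; the real list of 25 titles is not reproduced here, and, as the response says, its membership was still being analyzed.

```python
# Hypothetical subset of generic titles; membership is an assumption.
GENERIC_TITLES = {"selections", "symphonies"}

def clustering_title(record):
    """Use the 240 unless it exactly matches a generic collective title."""
    uniform = record.get("240", "").strip()
    if uniform and uniform.lower() not in GENERIC_TITLES:
        return uniform
    return record["245"]  # fall back to the title proper

collective = {"240": "Symphonies", "245": "The nine symphonies"}
single = {"240": "Symphonies, no. 3, op. 55, Eb major", "245": "Eroica"}

collective_title = clustering_title(collective)
single_title = clustering_title(single)
```

Because the comparison is on the whole 240 value rather than a prefix, "Symphonies, no. 3, op. 55, Eb major" is kept while the bare "Symphonies" is set aside.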

Conal

How are the computed clusters being persisted?

Do you store a control number in the $0 subfield?

Response:
OCLC bibliographic records are stored in an internal XML schema (we call it CDF for Common Data Format), and the work ID is just another field in that. In the Research tests I'm describing, we don't have a persistent identifier other than the author/title key we finally end up with after quite a bit of processing.

--Th

Conal

Thom, I was really wondering whether this authority work was going to leak back out into the MARC world, or remain internal to OCLC. I realise the work is still experimental, but do you have plans to publish the clusters in MARC form?

Response:
I am not aware of any plans for publishing them (beyond WorldCat), but it is an interesting suggestion, and I'll give it some thought.

--Th

The comments to this entry are closed.
