We are changing how we process 240's (uniform titles) in a second way, beyond skipping the generic ones as described in the last post. This time we are paying more attention to them.
We try to respect the LC/NACO Name Authority File (NAF). If we want to compare two titles and they both have name/title entries in the NAF, then we don't merge them into a single workset no matter how similar they are. We also use the uniform title for title comparisons (except possibly for some of the more generic ones). This works fine for titles that are in the authority file. However, many uniform titles appear in records as 240's but never get their own name/title record in the NAF. The practice seems to be that unless additional information needs to be attached to the title, the 240 in the record is sufficient.
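A minimal sketch of that rule, not the actual implementation: the function names, the crude key normalization, and the idea of holding authority entries in a Python set are all assumptions made for illustration.

```python
def normalize(text):
    """Crude normalization (an assumption): lowercase, drop punctuation, collapse spaces."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def name_title_key(author, title):
    """Build a hypothetical name/title key from an author and a (uniform) title."""
    return (normalize(author), normalize(title))


def may_merge(key_a, key_b, authority_keys):
    """Return False when both keys are distinct name/title entries in the authority set.

    True only means the pair is not blocked; any similarity test would still
    have to run afterwards.
    """
    if key_a == key_b:
        return True
    both_in_authority = key_a in authority_keys and key_b in authority_keys
    return not both_in_authority


# Illustrative data only; the author heading is simplified.
novel = name_title_key("Dostoyevsky, Fyodor", "Prestuplenie i nakazanie")
notebooks = name_title_key(
    "Dostoyevsky, Fyodor", "Prestuplenie i nakazanie; neizdannye materialy"
)

naf_only = {novel}                      # only the novel has a name/title record
print(may_merge(novel, notebooks, naf_only))
print(may_merge(novel, notebooks, naf_only | {notebooks}))
```

With only one of the two keys in the authority set, nothing stops a similarity-based merge, which is the gap described above.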
The example we found was Dostoyevsky's Crime and Punishment versus The Notebooks for Crime and Punishment (actually Prestuplenie i nakazanie vs. Prestuplenie i nakazanie; neizdannye materialy). Our latest experiments in Research brought those together (because only Prestuplenie i nakazanie has a name/title record). The cataloging (with two distinct 240's) indicates that merging them is an error.
So, now we plan to treat 240's in LC records as authoritative. That will clear up the Crime and Punishment problem, but we are still struggling to separate Mrs. Piggle Wiggle from Mrs. Piggle Wiggle's Farm without breaking too many other good matches.
Update: We are seeing just under 86,000 new name/titles derived from the 1XX and 240 in LC records. Some of them are in records that total thousands of library holdings.
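For readers who want to try something similar, here is a rough sketch of deriving name/title pairs from the 1XX and 240 of LC records. It is not the code we run: the record layout (a dict mapping a MARC tag to a list of subfield dicts), the use of 040 $a equal to "DLC" to recognize an LC record, the restriction to subfield $a, and the function names are all assumptions for illustration.

```python
AUTHOR_TAGS = ("100", "110", "111")  # main entry fields (the 1XX block)
TRIM = " .,;:/"                      # trailing punctuation to drop (an assumption)


def first_subfield_a(record, tags):
    """Return the first non-empty $a found in any of the given tags, else None."""
    for tag in tags:
        for field in record.get(tag, []):
            value = field.get("a")
            if value:
                return value
    return None


def is_lc_record(record):
    """Assumption: treat a record as LC cataloging when 040 $a is 'DLC'."""
    return any(field.get("a") == "DLC" for field in record.get("040", []))


def derived_name_titles(records):
    """Collect (author, uniform title) pairs from LC records that carry a 240."""
    pairs = set()
    for record in records:
        if not is_lc_record(record):
            continue
        author = first_subfield_a(record, AUTHOR_TAGS)
        uniform_title = first_subfield_a(record, ("240",))
        if author and uniform_title:
            pairs.add((author.strip(TRIM).lower(), uniform_title.strip(TRIM).lower()))
    return pairs


# Hypothetical records, heavily simplified.
sample = [
    {"040": [{"a": "DLC"}],
     "100": [{"a": "Dostoyevsky, Fyodor,"}],
     "240": [{"a": "Prestuplenie i nakazanie; neizdannye materialy."}]},
    {"040": [{"a": "XYZ"}],  # not LC cataloging, so skipped
     "100": [{"a": "Dostoyevsky, Fyodor,"}],
     "240": [{"a": "Some other title."}]},
]
print(derived_name_titles(sample))
```

The resulting pairs could then be added to whatever set of authoritative name/titles blocks a merge.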
--Th
This stuff is very interesting, thanks for giving us some hints as to what you were doing.
I would be overjoyed if these new algorithmic refinements were released open source as the very basic original workset grouping algorithm was.
Response:
Well, we can describe what we do, but a lot of our recent work has been trying to bring together more of the author/title variants, and that entails some rather complicated heuristics that are very dependent on WorldCat. We do end up with long lists of a/t variants which might be useful to others.
--Th
Posted by: Jonathan Rochkind | May 02, 2008 at 14:43