
Who are we controlling

Most nights we run a job that goes through a list of personal names in WorldCat that should be linked to the LC/NACO authority file, and it links about 400,000 of them. Here are the top names from last night:

Chaucer, Geoffrey,‡dd. 1400
Sullivan, Arthur,‡cSir,‡d1842-1900
Armstrong, Louis,‡d1901-1971
Daudet, Alphonse,‡d1840-1897
Hauptmann, Gerhart,‡d1862-1946
Wodehouse, P. G.‡q(Pelham Grenville),‡d1881-1975
Disraeli, Benjamin,‡cEarl of Beaconsfield,‡d1804-1881
Bennett, Arnold,‡d1867-1931
Sousa, John Philip,‡d1854-1932
FitzGerald, Edward,‡d1809-1883.
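
The headings above are shown in MARC style: ‡ marks the start of a subfield, and the character after it is the subfield code (‡c for titles such as "Sir", ‡q for a fuller form of the name, ‡d for dates). Below is a minimal Python sketch of splitting such a heading into its parts. It assumes headings arrive as plain strings in exactly this form and that the text before the first ‡ is subfield ‡a (the name). The function name `parse_heading` is hypothetical, and this is not the job's actual code.

```python
def parse_heading(heading: str) -> list[tuple[str, str]]:
    """Split a MARC-style heading such as 'Chaucer, Geoffrey,‡dd. 1400'
    into (subfield code, value) pairs.

    Assumption: the text before the first '‡' is subfield 'a' (the name),
    and every '‡' is immediately followed by a one-character subfield code.
    """
    first, *rest = heading.split("‡")
    subfields = [("a", first)]
    for chunk in rest:
        if chunk:  # ignore an empty chunk left by a stray trailing '‡'
            subfields.append((chunk[0], chunk[1:]))
    return subfields


if __name__ == "__main__":
    for h in ["Chaucer, Geoffrey,‡dd. 1400",
              "Sullivan, Arthur,‡cSir,‡d1842-1900",
              "Wodehouse, P. G.‡q(Pelham Grenville),‡d1881-1975"]:
        print(parse_heading(h))
```

For the Sullivan heading this yields the name, the ‡c title "Sir," and the ‡d dates "1842-1900" as three separate pairs.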

Everyone except Louis Armstrong was born before 1900.  Here's the list from the previous night:

Marx, Karl,‡d1818-1883
Calvin, Jean,‡d1509-1564
Rilke, Rainer Maria,‡d1875-1926
Tagore, Rabindranath,‡d1861-1941
Vega, Lope de,‡d1562-1635
Hume, David,‡d1711-1776
Webster, Daniel,‡d1782-1852
La Fontaine, Jean de,‡d1621-1695
McClure, Louis Charles,‡d1867-1957.
Potter, Beatrix,‡d1866-1943

The most recent birth date is 1875.  Another list from over the weekend (from 700,000 headings):

Prokofiev, Sergey,‡d1891-1953
Grimm, Jacob,‡d1785-1863
Erasmus, Desiderius,‡dd. 1536
Gogh, Vincent van,‡d1853-1890
Gershwin, George,‡d1898-1937
Foster, Stephen Collins,‡d1826-1864
Tillich, Paul,‡d1886-1965
Johnson, Lyndon B.‡q(Lyndon Baines),‡d1908-1973
García Márquez, Gabriel,‡d1928-
Milne, A. A.‡q(Alan Alexander),‡d1882-1956

The most recent birth date here is 1928 (or 1927 according to the German authority file), for someone who is still alive. It is interesting that a Colombian author rates so highly (he won the Nobel Prize in Literature in 1982).

Now these aren't random names (they all have at least two subfields), but still, I'm struck by how early these common names in WorldCat tend to be.
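
The birth-date observations above come from reading the ‡d subfield. Here is a rough Python sketch of that check. It assumes ‡d holds either a "birth-death" range such as "1842-1900", an open range such as "1928-", or a death-only form such as "d. 1400", and it skips headings with fewer than two subfields, as in these lists. The function names are hypothetical. Real ‡d values also use forms like "b.", "ca." and "fl." that this does not handle.

```python
import re
from typing import Optional

# Assumption: a birth year is a four-digit year at the start of the ‡d value,
# followed by a hyphen. Death-only forms such as 'd. 1400' do not match.
BIRTH_RE = re.compile(r"^(\d{4})-")


def birth_year(heading: str) -> Optional[int]:
    """Return the birth year from a heading's ‡d subfield, or None."""
    parts = heading.split("‡")
    if len(parts) < 2:  # fewer than two subfields: outside the sample
        return None
    for chunk in parts[1:]:
        if chunk.startswith("d"):  # subfield code 'd' = dates
            m = BIRTH_RE.match(chunk[1:])
            if m:
                return int(m.group(1))
    return None


def latest_birth(headings: list[str]) -> Optional[tuple[int, str]]:
    """Return (year, heading) for the most recently born person, if any."""
    dated = [(birth_year(h), h) for h in headings]
    dated = [(y, h) for y, h in dated if y is not None]
    return max(dated) if dated else None


if __name__ == "__main__":
    weekend = ["Prokofiev, Sergey,‡d1891-1953",
               "Erasmus, Desiderius,‡dd. 1536",
               "Johnson, Lyndon B.‡q(Lyndon Baines),‡d1908-1973",
               "García Márquez, Gabriel,‡d1928-",
               "Milne, A. A.‡q(Alan Alexander),‡d1882-1956"]
    print(latest_birth(weekend))  # (1928, 'García Márquez, Gabriel,‡d1928-')
```

Erasmus drops out because "d. 1536" gives only a death date, which is also why Chaucer would not count toward the "born before 1900" tally.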

--Th

Uniform titles 2

We are changing our processing of 240's (uniform titles) in another way besides skipping the generic ones described in the last post. In this case we are paying more attention to them.

We try to respect the LC/NACO Name Authority File (NAF). If we want to compare two titles and they both have name/title entries in the NAF, then we don't merge them into a single workset, no matter how similar they are. We also use the uniform title for title comparisons (except possibly for some of the more generic ones). This works fine for titles that are in the authority file; however, many uniform titles appear in records as 240's but never get their own name/title record in the NAF. The practice seems to be that unless additional information needs to be attached to the title, the 240 in the bibliographic record is sufficient.
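
The rule just described can be sketched as a guard placed in front of whatever similarity test decides whether two records go into one workset. This is an illustrative sketch, not OCLC's code. The `TitleRecord` fields and the `same_workset` name are hypothetical. The assumption is that two records carrying the *same* NAF name/title record may still be merged. Exact equality of normalized strings stands in for the real, fuzzier similarity test.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class TitleRecord:
    title: str                         # transcribed title (245)
    uniform_title: Optional[str]       # 240, if present
    naf_name_title_id: Optional[str]   # hypothetical ID of a NAF name/title record


def normalize(s: str) -> str:
    """Crude normalization: drop diacritics, lowercase, turn punctuation into spaces."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    s = "".join(c if c.isalnum() else " " for c in s.lower())
    return " ".join(s.split())


def same_workset(a: TitleRecord, b: TitleRecord) -> bool:
    # Respect the NAF: distinct name/title authority records are never merged,
    # however similar the titles look.
    if a.naf_name_title_id and b.naf_name_title_id:
        return a.naf_name_title_id == b.naf_name_title_id
    # Otherwise compare on the uniform title when there is one.
    key_a = normalize(a.uniform_title or a.title)
    key_b = normalize(b.uniform_title or b.title)
    return key_a == key_b
```

The design point is only the order of the checks: the authority file is consulted first, and title similarity is used only when it cannot decide.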

The example we found was Dostoyevsky's Crime and Punishment versus The Notebooks for Crime and Punishment (actually Prestuplenie i nakazanie vs. Prestuplenie i nakazanie; neizdannye materialy). Our latest experiments in Research brought those together, because only Prestuplenie i nakazanie has a name/title record. The cataloging, with two distinct 240's, indicates that this is an error.

So now we plan to treat 240's in LC records as authoritative. That will clear up the Crime and Punishment problem, but we are still struggling to separate Mrs. Piggle-Wiggle from Mrs. Piggle-Wiggle's Farm without breaking too many other good matches.

Update: We are seeing just under 86,000 new name/titles derived from the 1XX and 240 fields in LC records. Some of them appear in records that together account for thousands of library holdings.
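
Treating LC 240's as authoritative amounts to deriving a name/title from an LC record's 1XX and 240 and giving it the same weight as a NAF name/title record. Here is a minimal sketch of that idea. It assumes records are plain dicts with hypothetical keys `is_lc`, `1xx`, `240` and `naf_name_title`. How LC records are actually recognized is not specified here and is left as an assumption. The Dostoyevsky heading is illustrative only.

```python
from typing import Optional


def derived_name_title(record: dict) -> Optional[str]:
    """Build a name/title string from an LC record's 1XX and 240, or None.

    Assumption: records are dicts with hypothetical keys 'is_lc', '1xx', '240'.
    """
    if not record.get("is_lc") or not record.get("1xx") or not record.get("240"):
        return None
    return f"{record['1xx'].rstrip(' .')}. {record['240'].rstrip(' .')}"


def authoritative_key(record: dict) -> Optional[str]:
    """Prefer a NAF name/title record; fall back to one derived from an LC 240."""
    return record.get("naf_name_title") or derived_name_title(record)


def may_merge(a: dict, b: dict) -> bool:
    """Records whose authoritative name/titles differ are kept apart."""
    ka, kb = authoritative_key(a), authoritative_key(b)
    if ka and kb:
        return ka == kb
    return True  # defer to the ordinary title-similarity test (not shown)


if __name__ == "__main__":
    novel = {"naf_name_title": "Dostoyevsky, Fyodor, 1821-1881. Prestuplenie i nakazanie"}
    notebooks = {"is_lc": True,
                 "1xx": "Dostoyevsky, Fyodor, 1821-1881.",
                 "240": "Prestuplenie i nakazanie; neizdannye materialy"}
    print(may_merge(novel, notebooks))  # False
```

Counting the distinct non-None results of `derived_name_title` over a set of records gives the kind of "new name/titles" figure mentioned in the update, under the same assumptions.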

--Th
