There is a blog I look at occasionally called Typo of the Day for Librarians. Every day they post a new typographical error and talk about how and where it occurs in library catalogs. Today, while debugging WorldCat Identities, I ran across what must be a fairly new kind of error: an XML processing error embedded in the data. In particular, the record had `&amp;amp;` in it (instead of just `&amp;`). This happens when a string containing an ampersand gets 'escaped' for insertion into XML twice:
- The string starts out as `New York & Pennsylvania`
- It gets escaped into `New York &amp; Pennsylvania`
- Escaping it again gives you `New York &amp;amp; Pennsylvania`
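The three steps above can be reproduced with Python's standard-library `escape` function. This is only an illustration of the general mechanism. It is not the code that produced the WorldCat records, which I haven't tracked down.

```python
from xml.sax.saxutils import escape

original = "New York & Pennsylvania"

# Escaping once is correct: this is what belongs in the raw XML.
once = escape(original)

# Escaping the already-escaped string is the bug: the "&" at the start
# of "&amp;" gets escaped too.
twice = escape(once)

print(once)   # New York &amp; Pennsylvania
print(twice)  # New York &amp;amp; Pennsylvania
```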
So if you look at the raw XML, you see `&amp;amp;`. On the screen it looks like `&amp;` (which, confusingly, is what should be stored in the XML). Interestingly enough, WorldCat.org does some sort of magic so that the public view of the records displays correctly. Today WorldCat has quite a few records with this error in them, but I've given the list to bibchange to look at.
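Here is a rough sketch of why the screen shows `&amp;`, along with one way such records might be flagged. A parser removes exactly one level of escaping, so twice-escaped text comes out still containing a literal `&amp;`. The `<title>` fragment and the `looks_double_escaped` helper are hypothetical. The pattern is a heuristic, not the method actually used to build the list: a record that legitimately discusses entities would also match.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical record fragment containing the double-escaped ampersand.
raw = "<title>New York &amp;amp; Pennsylvania</title>"

# What a parser (and hence the screen) shows: one level of escaping is
# removed, leaving the literal text "&amp;".
print(ET.fromstring(raw).text)  # New York &amp; Pennsylvania

# Heuristic: "&amp;" immediately followed by another entity reference
# suggests the text was escaped twice.
DOUBLE_ESCAPED = re.compile(r"&amp;(?:amp|lt|gt|quot|apos|#\d+|#x[0-9A-Fa-f]+);")


def looks_double_escaped(raw_xml: str) -> bool:
    """Return True if the raw (unparsed) XML appears to be escaped twice."""
    return DOUBLE_ESCAPED.search(raw_xml) is not None


print(looks_double_escaped(raw))  # True
print(looks_double_escaped("<title>New York &amp; Pennsylvania</title>"))  # False
```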
If you are interested in standard typographical errors in catalogs, another site (associated with Typo of the Day for Librarians) is Typographical Errors in Library Databases.
--Th
Update (2011 November 4). Here's a more current URI for the last link: http://www.terryballard.org/typos/typoscomplete.html.