UNIMARC
I spent a couple of hours yesterday trying to read some UNIMARC records. I found the UNIMARC Manual and started coding up a Python class to read them. A record starts with a Record Label that needs to be decoded, followed by a directory, very much like the Leader and Directory in MARC-21. So I'm merrily coding along and noticing that this looks very similar to our class that imports OCLC MARC. Very similar. In fact, as far as just getting the records read in goes, it was identical, apart from the Unicode indicator (byte 9 of the leader) in MARC-21. I felt a little dumb when this is what I ended up with:
    import omarc  # our OCLC MARC reader module

    # inherit from OCLC MARC
    class UniMarc(omarc.OMarc):
        def isUnicode(self):
            return True  # just assume it is!
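For comparison, here is roughly what the MARC-21 side of that check involves. This is a minimal sketch that assumes the reader has the 24-character leader as a string. `is_unicode_marc21` is a hypothetical name, not OMarc's actual method. In MARC 21, leader position 09 (counting from 0) is the character coding scheme: `a` means UCS/Unicode and a blank means MARC-8. UNIMARC's record label does not carry that flag. We believe character sets are recorded in the coded data of field 100 instead. That is why the override above has to fall back on an assumption.

```python
def is_unicode_marc21(leader: str) -> bool:
    """MARC 21 leader position 09 is the character coding scheme:
    'a' = UCS/Unicode, blank = MARC-8."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is exactly 24 characters")
    return leader[9] == "a"


# Made-up leaders, built only to exercise the check.
unicode_leader = "00714cam a2200205 a 4500"
marc8_leader = "00714cam  2200205 a 4500"
print(is_unicode_marc21(unicode_leader))  # True
print(is_unicode_marc21(marc8_leader))    # False
```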
Of course the field tags are all different, and I'm sure there are lots of subtle differences, but at the level of record structure, code that handles MARC Communications Format records can read UNIMARC records (both follow the ISO 2709 record structure). I suppose this is common knowledge for many, but I was surprised. Has anyone tried to document what the differences are, especially in the fixed-field elements?
--Th
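Both formats share the ISO 2709 record structure:

- A 24-character leader (record label). The record length is in positions 0-4 and the base address of data is in positions 12-16.
- A directory of 12-character entries. Each entry has a 3-character tag, a 4-digit field length and a 5-digit starting position.
- The variable fields, each ending in a field terminator (0x1E).
- A record terminator (0x1D) at the very end.

The sketch below is a minimal, format-neutral reader built on those assumptions. It is not OCLC's OMarc code. `build_iso2709` is a hypothetical helper, included only so the example can construct a record to read back. Nothing in `parse_iso2709` looks at what a tag means, which is why one reader can serve both flavours.

```python
from __future__ import annotations

FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"


def parse_iso2709(record: bytes) -> tuple[str, list[tuple[str, bytes]]]:
    """Split one ISO 2709 record into its leader and (tag, field data) pairs.
    Assumes the usual entry map: 3-byte tag, 4-digit length, 5-digit start."""
    leader = record[:24].decode("ascii")
    base_address = int(leader[12:17])  # where the variable fields begin
    directory = record[24:base_address - 1]  # drop the directory's terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = record[base_address + start:base_address + start + length]
        fields.append((tag, data.rstrip(FIELD_TERMINATOR)))  # length includes 0x1E
    return leader, fields


def build_iso2709(leader_template: str, fields: list[tuple[str, bytes]]) -> bytes:
    """Hypothetical helper: assemble a record, filling in the record length
    (positions 0-4) and base address (12-16) of the leader template."""
    directory, data = b"", b""
    for tag, content in fields:
        chunk = content + FIELD_TERMINATOR
        directory += f"{tag}{len(chunk):04d}{len(data):05d}".encode("ascii")
        data += chunk
    directory += FIELD_TERMINATOR
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + 1  # +1 for the record terminator
    leader = (f"{record_length:05d}{leader_template[5:12]}"
              f"{base_address:05d}{leader_template[17:]}")
    return leader.encode("ascii") + directory + data + RECORD_TERMINATOR


record = build_iso2709(
    "00000nam0 2200000   450 ",  # illustrative UNIMARC-style record label
    [("001", b"rec-1"), ("200", b"1 \x1faLe petit prince")],
)
leader, fields = parse_iso2709(record)
for tag, data in fields:
    print(tag, data)
```

Field data comes back as raw bytes, with the indicators and 0x1F subfield delimiters intact. Interpreting the tags (245 vs. 200 for the title, for instance) is left to format-specific code.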
Thom,
The British Library must have done that work. They have a tool, USEMARCON Plus, to convert between different flavors of MARC.
http://tinyurl.com/okyxg
Posted by: David Bigwood | August 10, 2006 at 15:03
Yes, you would think so.
I have this theory that we should keep the records we receive in something close to their original format, and have routines that understand that format pull information out of them. There's always a trade-off between the information lost in translation and ease of use, and for research it seems as though ease of use should lose.
--Th
Posted by: Thom | August 10, 2006 at 15:55
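Thom's suggestion could look something like the following. The idea is to keep records in their original tagging and put the format knowledge in the routines that read them. This is a minimal sketch, not a description of any OCLC system. It makes these assumptions:

- The format labels `"MARC21"` and `"UNIMARC"` and the function names are hypothetical.
- Fields arrive as (tag, data) pairs, with two indicator characters followed by 0x1F-delimited subfields.

The only format-specific fact it relies on is where the title proper lives: 245 $a in MARC 21 and 200 $a in UNIMARC.

```python
from __future__ import annotations

SUBFIELD_DELIMITER = "\x1f"

# Which tag holds the title in each format; both put the title proper in $a.
TITLE_TAGS = {"MARC21": "245", "UNIMARC": "200"}


def subfield(field_data: str, code: str) -> str | None:
    """Return the first subfield with the given code. Data fields start with
    two indicator characters, then delimiter-prefixed subfields."""
    for chunk in field_data[2:].split(SUBFIELD_DELIMITER)[1:]:
        if chunk[:1] == code:
            return chunk[1:]
    return None


def title_proper(record_format: str, fields: list[tuple[str, str]]) -> str | None:
    """Read the title from a record left in its original tagging, instead of
    converting the whole record to another MARC flavour first."""
    wanted = TITLE_TAGS[record_format]
    for tag, data in fields:
        if tag == wanted:
            return subfield(data, "a")
    return None


unimarc_fields = [("001", "rec-1"), ("200", "1 \x1faLe petit prince\x1feconte")]
marc21_fields = [("245", "10\x1faThe little prince /\x1fcAntoine de Saint-Exupéry.")]
print(title_proper("UNIMARC", unimarc_fields))  # Le petit prince
print(title_proper("MARC21", marc21_fields))    # The little prince /
```

Supporting another flavour then means adding a mapping entry rather than converting every stored record. Because the stored fields are never rewritten, nothing is lost along the way.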