You might think that a well-understood format such as MARC would have a single straightforward way of being represented in JSON. Not so! There are lots of ways of doing it, all with their own advantages (see some references below). Still, I couldn't resist creating yet another.
This encoding grew out of some experimentation with Go (Golang), in which encoding MARC in JSON was one of my test cases, as was the speed at which the resulting encoding could be processed. Another inspiration was Rich Hickey's ideas about the relationship of data and objects:
...the use of objects to represent simple informational data is almost criminal in its generation of per-piece-of-information micro-languages, i.e. the class methods, versus far more powerful, declarative, and generic methods like relational algebra. Inventing a class with its own interface to hold a piece of information is like inventing a new language to write every short story.
That said, how to represent the data still leaves lots of options, as the multiple encodings of MARC into JSON show.
Go's emphasis on strict matching of types pushed me towards a very flat structure:
- The record is encoded as an array of objects
- Each object has a 'Type' and represents either the leader or a field
Here are examples of the different fields:
{"Type":"leader", "Data": "the leader goes here"}
{"Type":"cfield", "Tag":"001", "Data":"12345"}
{"Type":"dfield", "Tag":"245", "Inds":"00", "Data":"aThis is a title$bSubtitle"}
Note that the subfields do not get their own objects. They are concatenated together into one string using standard MARC subfield delimiters (represented by a $ above), essentially the way they appear in an ISO 2709 encoding. In Python (and in Go) it is easy to split these strings on the delimiter into subfields as needed.
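As a rough illustration of that split, here is a minimal Python sketch. It assumes the delimiter is the usual MARC unit separator (U+001F) and that the Data string starts directly with the first subfield code, as in the examples here. The helper name split_subfields is made up for this post and is not part of any library.

```python
# MARC subfield delimiter (the ISO 2709 unit separator), shown as "$" in the examples above.
SUBFIELD_DELIM = "\x1f"


def split_subfields(data, delim=SUBFIELD_DELIM):
    """Split a dfield Data string into (code, value) pairs.

    Assumes the string begins with the first subfield code, with no
    leading delimiter. Empty chunks are skipped.
    """
    return [(chunk[0], chunk[1:]) for chunk in data.split(delim) if chunk]


# A hypothetical dfield object in the flat encoding described above.
field = {
    "Type": "dfield",
    "Tag": "245",
    "Inds": "14",
    "Data": "aThe freewheelin' Bob Dylan\x1fh[sound recording].",
}

subfields = split_subfields(field["Data"])
for code, value in subfields:
    print(code, value)
```

Because the split happens only when a field is actually inspected, fields that are merely passed through never pay the cost of being broken apart.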
In addition to making it easy to import the JSON structure into Go (everything is easy in Python), the lack of structure makes reading and writing the list of fields very fast and simple. The main HBase table that supports WorldCat now has some 1.7 billion rows, so fast processing is essential, and we find that this encoding is much faster to process than the XML representation. Although we do put the list of fields into a Python object, that object is derived from the list itself, so we can treat it as such, including adding new fields (and Types) as needed, which then get automatically carried along in the exported JSON.
We are also finding that a simple flat structure makes it nearly effortless to add information that doesn't fit into standard MARC (e.g. administrative metadata).
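To make the "object derived from the list" idea concrete, here is a small sketch of how such a class might look. The class name Record and its methods are invented for illustration and are not our production code. The point is only that a list subclass serializes with the standard json module without any custom encoder.

```python
import json


class Record(list):
    """A MARC record that is simply a list of field dicts (hypothetical helper)."""

    def leader(self):
        # Return the Data of the first leader entry, or None if there is none.
        for f in self:
            if f.get("Type") == "leader":
                return f["Data"]
        return None

    def fields(self, tag):
        # Return every field object carrying the given tag, in record order.
        return [f for f in self if f.get("Tag") == tag]


raw = ('[{"Type":"leader","Data":"the leader goes here"},'
       '{"Type":"cfield","Tag":"001","Data":"12345"}]')

record = Record(json.loads(raw))

# Ordinary list operations still work, since Record is a list.
record.append({"Type": "dfield", "Tag": "245", "Inds": "00",
               "Data": "aThis is a title\x1fbSubtitle"})

# json.dumps treats the subclass as a plain list, so the new field is exported too.
exported = json.dumps(record)
print(exported)
```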
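As a sketch of what that can look like, the snippet below appends a non-MARC entry to a record and shows that it survives a JSON round trip. The "admin" Type and its "Source" key are purely hypothetical names chosen for this example; the encoding itself does not define them.

```python
import json

record = [
    {"Type": "leader", "Data": "the leader goes here"},
    {"Type": "cfield", "Tag": "001", "Data": "12345"},
]

# Hypothetical non-MARC entry: the "admin" Type and "Source" key are invented here.
record.append({"Type": "admin", "Source": "batch-load"})

# Consumers that only understand MARC can filter on Type and ignore the rest.
MARC_TYPES = {"leader", "cfield", "dfield"}
marc_only = [f for f in record if f["Type"] in MARC_TYPES]

# The extra entry is carried along unchanged through export and import.
round_tripped = json.loads(json.dumps(record))
print(round_tripped[-1])
```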
Here are a few MARC in JSON references (I know there have been others in the past). As far as I can tell, Ross's is the most popular:
Ross Singer: http://dilettantes.code4lib.org/blog/2010/09/a-proposal-to-serialize-marc-in-json/
Clay Fouts: http://search.cpan.org/~cfouts/MARC-File-JSON-0.001/
Galen Charlton: http://search.cpan.org/~gmcharlt/MARC-File-MiJ-0.04/
Bill Dueber: http://robotlibrarian.billdueber.com/2010/02/new-interest-in-marc-hash-json/index.html
A more general discussion by Jakob Voss: http://jakoblog.de/2011/04/13/mapping-bibliographic-record-subfields-to-json/
Here is a full example of a record using the same example Ross Singer uses (although the record itself appears to have changed):
[{"Data": "01471cjm a2200349 a 4500", "Type": "leader"},
{"Data": "5674874", "Tag": "001", "Type": "cfield"},
{"Data": "20030305110405.0", "Tag": "005", "Type": "cfield"},
{"Data": "sdubsmennmplu", "Tag": "007", "Type": "cfield"},
{"Data": "930331s1963 nyuppn eng d", "Tag": "008", "Type": "cfield"},
{"Data": "9(DLC) 93707283", "Tag": "035", "Type": "dfield", "Inds": " "},
{"Data": "a7\u001fbcbc\u001fccopycat\u001fd4\u001fencip\u001ff19\u001fgy-soundrec", "Tag": "906", "Type": "dfield", "Inds": " "},
{"Data": "a 93707283 ", "Tag": "010", "Type": "dfield", "Inds": " "},
{"Data": "aCS 8786\u001fbColumbia", "Tag": "028", "Type": "dfield", "Inds": "02"},
{"Data": "a(OCoLC)13083787", "Tag": "035", "Type": "dfield", "Inds": " "},
{"Data": "aOClU\u001fcDLC\u001fdDLC", "Tag": "040", "Type": "dfield", "Inds": " "},
{"Data": "deng\u001fgeng", "Tag": "041", "Type": "dfield", "Inds": "0 "},
{"Data": "alccopycat", "Tag": "042", "Type": "dfield", "Inds": " "},
{"Data": "aColumbia CS 8786", "Tag": "050", "Type": "dfield", "Inds": "00"},
{"Data": "aDylan, Bob,\u001fd1941-", "Tag": "100", "Type": "dfield", "Inds": "1 "},
{"Data": "aThe freewheelin' Bob Dylan\u001fh[sound recording].", "Tag": "245", "Type": "dfield", "Inds": "14"},
{"Data": "a[New York, N.Y.] :\u001fbColumbia,\u001fc[1963]", "Tag": "260", "Type": "dfield", "Inds": " "},
{"Data": "a1 sound disc :\u001fbanalog, 33 1/3 rpm, stereo. ;\u001fc12 in.", "Tag": "300", "Type": "dfield", "Inds": " "},
{"Data": "aSongs.", "Tag": "500", "Type": "dfield", "Inds": " "},
{"Data": "aThe composer accompanying himself on the guitar ; in part with instrumental ensemble.", "Tag": "511", "Type": "dfield", "Inds": "0 "},
{"Data": "aProgram notes by Nat Hentoff on container.", "Tag": "500", "Type": "dfield", "Inds": " "},
{"Data": "aBlowin' in the wind -- Girl from the north country -- Masters of war -- Down the highway -- Bob Dylan's blues -- A hard rain's a-gonna fall -- Don't think twice, it's all right -- Bob Dylan's dream -- Oxford town -- Talking World War III blues -- Corrina, Corrina -- Honey, just allow me one more chance -- I shall be free.", "Tag": "505", "Type": "dfield", "Inds": "0 "},
{"Data": "aPopular music\u001fy1961-1970.", "Tag": "650", "Type": "dfield", "Inds": " 0"},
{"Data": "aBlues (Music)\u001fy1961-1970.", "Tag": "650", "Type": "dfield", "Inds": " 0"},
{"Data": "3Preservation copy (limited access)\u001fuhttp://hdl.loc.gov/loc.mbrsrs/lp0001.dyln", "Tag": "856", "Type": "dfield", "Inds": "41"},
{"Data": "aNew", "Tag": "952", "Type": "dfield", "Inds": " "},
{"Data": "aTA28", "Tag": "953", "Type": "dfield", "Inds": " "},
{"Data": "bc-RecSound\u001fhColumbia CS 8786\u001fwMUSIC", "Tag": "991", "Type": "dfield", "Inds": " "}
]
--Th
Note: As far as I know Rich Hickey and I are not related.