Occasionally I run into someone trying to extract information out of VIAF and having a difficult time. Here's a simple example of how I'd begin extracting titles for a given VIAF ID. It's far from industrial strength, but it might get you started.
The problem: you have a file of VIAF IDs (one per line) and want a file of titles, each preceded by the VIAF ID of the record it was found in.
There are lots of ways to do this, but my inclination is to do it in Python (I ran this in version 2.7.1) and to use the raw VIAF XML record:
from __future__ import print_function
import sys, urllib
from xml.etree import cElementTree as ET
# reads a list of VIAF IDs from stdin, one per line
# writes VIAFID<tab>Title to stdout, one per line
# titles live in the VIAF namespace, so bind a prefix ('v') to its URI
ns = {'v':'http://viaf.org/viaf/terms#'}
ttlPath = 'v:titles/v:work/v:title'
def titlesFromVIAF(viafXML, path):
    vel = ET.fromstring(viafXML)
    for el in vel.findall(path, ns):
        yield el.text
for line in sys.stdin:
    viafid = line.strip()
    # request the raw XML record rather than the HTML page
    viafURL = 'https://viaf.org/viaf/%s/viaf.xml' % viafid
    viafXML = urllib.urlopen(viafURL).read()
    for ttl in titlesFromVIAF(viafXML, ttlPath):
        print('%s\t%s' % (viafid, ttl.encode('utf-8')))
That's about as short as I could get it while keeping it readable in this narrow window. We've been using the new print function (and division!) from __future__ for some time now, with an eye towards Python 3.
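Since the script was written with Python 3 in mind, here is a minimal sketch of what a Python 3 port might look like. It keeps the same namespace map and title path. The function names, the skipping of blank input lines and empty titles, and the injectable fetch parameter (which lets the loop be tried without network access) are illustrative additions, not part of the original script. The /viaf/{id}/viaf.xml URL pattern is an assumption that should be checked against VIAF's current documentation.

```python
import sys
import urllib.request
import xml.etree.ElementTree as ET

NS = {'v': 'http://viaf.org/viaf/terms#'}
TITLE_PATH = 'v:titles/v:work/v:title'
# Assumption: the raw XML record is served at /viaf/{id}/viaf.xml.
VIAF_XML_URL = 'https://viaf.org/viaf/%s/viaf.xml'

def titles_from_viaf(viaf_xml, path=TITLE_PATH):
    """Yield the text of every title element in one VIAF XML record."""
    root = ET.fromstring(viaf_xml)
    for el in root.findall(path, NS):
        if el.text:  # skip empty elements rather than printing 'None'
            yield el.text

def fetch_viaf_xml(viaf_id):
    """Download the raw XML record for one VIAF ID (returns bytes)."""
    with urllib.request.urlopen(VIAF_XML_URL % viaf_id) as resp:
        return resp.read()

def run(lines, out, fetch=fetch_viaf_xml):
    """Read VIAF IDs from lines and write 'VIAFID<TAB>Title' rows to out."""
    for line in lines:
        viaf_id = line.strip()
        if not viaf_id:
            continue  # ignore blank lines in the input file
        for ttl in titles_from_viaf(fetch(viaf_id)):
            print('%s\t%s' % (viaf_id, ttl), file=out)

if __name__ == '__main__':
    run(sys.stdin, sys.stdout)
```

It runs the same way as the original, e.g. python3 viaf_titles.py < ids.txt > titles.tsv (file names hypothetical). Because Python 3 strings are already Unicode, the explicit .encode('utf-8') goes away; the output encoding then follows sys.stdout, which can be forced with PYTHONIOENCODING=utf-8 if needed.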
--Th
Update 2015.09.16: Cleaned up how namespace is specified
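To see what the namespace map is doing, the following sketch (Python 3, standard library only) runs the same title path against a tiny hand-made record, which is not real VIAF output, and compares it with ElementTree's {uri}tag spelling, which needs no map at all. It does not reconstruct how the namespace was specified before the update; it only shows two equivalent forms.

```python
import xml.etree.ElementTree as ET

VIAF_NS = 'http://viaf.org/viaf/terms#'

# Tiny hand-made record for illustration; the root element name is arbitrary.
# Its prefix ('x') differs from the script's 'v': only the URI has to match.
sample = ('<x:cluster xmlns:x="%s"><x:titles>'
          '<x:work><x:title>First Title</x:title></x:work>'
          '<x:work><x:title>Second Title</x:title></x:work>'
          '</x:titles></x:cluster>' % VIAF_NS)
root = ET.fromstring(sample)

# Prefix map, as in the script: 'v' is bound to the VIAF URI via the dict.
with_map = [el.text for el in
            root.findall('v:titles/v:work/v:title', {'v': VIAF_NS})]

# Clark notation: the URI in braces before each local name, no dict needed.
q = '{%s}' % VIAF_NS
with_clark = [el.text for el in
              root.findall(q + 'titles/' + q + 'work/' + q + 'title')]

print(with_map)    # -> ['First Title', 'Second Title']
print(with_clark)  # -> ['First Title', 'Second Title']
```

The prefix-map form keeps the path short and readable, which matters in a narrow window; the Clark form is handy when a path is built programmatically.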