Those of you who noticed a recent news release about VIAF (English, French, Dutch with more to come) will know we've been busy. One of the major changes to VIAF is that we have settled on an open license for the VIAF data. Up to now VIAF has been running without a license. Access to bulk downloads of the data was available only by special request, and the BnF, DNB, LC, and OCLC all had to agree to provide it.
We don't have everything in place, but we hope to release data under ODC-By this month (April 2012). In the interest of being linked-data friendly we are publishing a dataset description as a VoID (Vocabulary of Interlinked Datasets) document. There is a test version of the description on the VIAF test site: http://test.viaf.org/viaf/data. The page describes the dataset and the files available, references the license, and provides guidelines on applying the license. Within each of the data dumps themselves, the records will contain a reference to that VoID document.
Attribution is to VIAF (Virtual International Authority File), and we specify in the VoID document that the use of the canonical VIAF URI (e.g. http://viaf.org/viaf/49224511) qualifies as attribution if more traditional ways of acknowledgement are difficult.
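As a rough illustration of attribution by URI, here is a minimal Python sketch. The helper names viaf_uri and attribution_link are made up for this example, and it assumes VIAF identifiers are purely numeric, as in the example URI above. It is not part of any VIAF tooling.

```python
import re

# Canonical prefix, taken from the example URI http://viaf.org/viaf/49224511
VIAF_BASE = "http://viaf.org/viaf/"


def viaf_uri(viaf_id):
    """Return the canonical VIAF URI for an identifier (hypothetical helper).

    Assumption: identifiers are plain digit strings, as in the example above.
    """
    viaf_id = str(viaf_id).strip()
    if not re.fullmatch(r"[0-9]+", viaf_id):
        raise ValueError(f"not a numeric VIAF ID: {viaf_id!r}")
    return VIAF_BASE + viaf_id


def attribution_link(viaf_id, label="VIAF"):
    """Build a simple HTML link pointing at the canonical URI (hypothetical helper)."""
    uri = viaf_uri(viaf_id)
    return f'<a href="{uri}">{label}</a>'


if __name__ == "__main__":
    print(viaf_uri(49224511))
    print(attribution_link(49224511))
```

An application that merges VIAF data into its own records could keep such a link next to the affected entry, or simply cite VIAF once on a general page if tracking individual elements is impractical.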
We are working on an online application form and for the first time we have a page up about how to participate in VIAF, and an official email address: mailto:[email protected], which would be a good place to react to all this if you do not wish to add a comment here. And we are interested in people's reactions. OCLC and the VIAF participants have been planning the transition of VIAF to an official OCLC service since last August when we met in San Juan.
I have been involved in a couple of workshops about maintaining scholarly identity over the last month, and it is very difficult to come up with convincing business models for the services to make them 'sustainable'. OCLC's interest in supporting VIAF both as part of our public purpose and for use within WorldCat is one of the few such models that is convincing. Of course, a large part of VIAF's success depends on the interest and good will of the VIAF participants!
--Th
Update (2012-04-11): I probably should have said more about the VoID document (see comments below). As Ed Summers points out, it contains both microdata and RDFa, trying to work at multiple levels at the same time. Jeff Young put it together -- evidently it is a bit of a trick to get the two working at once. This led us to HTML 5, which cleans up a lot of the cruft that has accumulated in HTML over the years. JD Shipengrover worked on the display, but it may not be that great in older browsers. I admit that there's some truth in Jonathan Rochkind's observation about 'sort of proof of concept'. We are pushing the boundaries a little and seeing what the technology can do, but I claim we are quite pragmatic about it even so.
This is awesome news.
For real world likely use cases of VIAF, data from VIAF is going to be merged into other data sets, in complex ways.
I suggest you be clear about your 'attribution', about where/how the attribution needs to be given. On every individual page (or API response?) shown to a user (or software) that might possibly include a data element that may have come from VIAF? (A given app may or may not find it easy to track whether a given data element DID come from VIAF; if this is required, it raises the barriers to use).
Or just a single attribution in the application 'about' page or documentation saying "some data from VIAF"? Or something in between?
Also, I may be revealing myself as crotchety old anti-RDF man here, but trying to comprehend the abstract and indirect graph of data here (VoID documents, what?), and start thinking out how to write the software to use it -- is reminding me unpleasantly of dealing with SOAP. I understand why you want to vend the data in RDF, for its maximal abstraction, and (at least in my opinion) as a sort of proof of concept of RDF for real data. But if you were able to also provide the data in a simpler, more concise, less abstract/indirect format fit to the VIAF data specifically (likely JSON) -- I suspect you'd get more consumers using it.
Posted by: Jonathan Rochkind | April 09, 2012 at 18:37
We don't have anything against JSON, but haven't gotten around to it. The data is available in RDF, MARC-21, and the native XML. Attribution on a Web page somewhere would be fine. The ODC-By license is really at the dataset level, not the record. Since tracing all the bits would be impractical, it is neither expected nor required.
Sorry you don't like the VoID document. We thought it was a good way to collect together a description of the dataset. You can always read it as HTML, which we have tried to make readable.
--Th
Posted by: Thom | April 09, 2012 at 22:07
I like the concept of VoID files, and yours renders nicely as HTML.
One minor issue is that the gzip'ed text file with links is not available. Could you check that, please?
Posted by: Bencomp | April 10, 2012 at 06:16
The actual files for distribution aren't ready yet (nor is the VoID document in its final place in production). Should be up later this month.
--Th
Posted by: Thom | April 10, 2012 at 08:58
Hi Thom, thanks for this update. It is really exciting, particularly the news about the license and the various data dumps. It's a huge step forward.
I took a very quick look at the VoID document, and saw the embedded RDFa and Microdata. I ran it through the RDFa Distiller (http://www.w3.org/2007/08/pyRdfa/) to examine the triples, and it didn't look like much made it through. I did some quick fiddling to see if I could get it to work but nothing obvious popped out. Were you able to see the triples using another tool?
Posted by: Ed Summers | April 11, 2012 at 04:44
My bad, it looks like the W3C distiller that is compatible with RDFa 1.1 (http://www.w3.org/2012/pyRdfa/) is able to extract the triples just fine. I was using the RDFa 1.0 distiller at http://www.w3.org/2007/08/pyRdfa/
I think 1.1 isn't a W3C REC yet, but @JeniT says it is close (https://twitter.com/#!/JeniT/status/190003638788308992). I guess it couldn't hurt to use 1.0 equivalent features until 1.1 support is solid if you want to be (perhaps overly) cautious.
Posted by: Ed Summers | April 11, 2012 at 05:21