VIAF has long interchanged data with Wikipedia, and the resulting links between library authorities and Wikipedia are widely used. Unfortunately we only harvested data from the English Wikipedia (en.wikipedia.org), so we missed names, identifiers and other information in non-English Wikipedia pages.
Fortunately the problem VIAF had with Wikipedia was similar to the problems that Wikipedia itself had in sharing data across language versions. Wikidata is Wikimedia's solution to the problem, and over the last year or two it has grown from promising to useful. In fact, from VIAF's point of view Wikidata now looks substantially better than just working with the English pages. In addition to picking up many more titles for names, we are finding a million names that do not occur in the English pages, and the number of names that match those in other VIAF sources has nearly doubled, to 800 thousand from 440 thousand.
Since we (i.e. Jenny Toves) were reexamining the process, we took the opportunity to harvest corporate/organization names as well, something we have wanted for some time. Some 300K of the increase comes from those.
We expect to have the new data in VIAF in mid to late April 2015, and it is visible now in our test system at http://test.viaf.org.
The advantages we see:
- Much less bias towards English
- More entities (people and organizations)
- More coded information about the entities
- More non-Latin forms of names
- More links into Wikipedia
This will cause some changes in the data that are visible in the VIAF interface. One of these is that VIAF will link to the Wikidata pages rather than the English Wikipedia pages, and we are changing the WKP icon to reflect that. This means that Jane Austen's WKP identifier (WKP is VIAF's abbreviation for Wikipedia) will change from WKP|Jane_Austen to WKP|Q36322. Links to the WKP source page will change from
http://en.wikipedia.org/wiki/Jane_Austen
to
http://www.wikidata.org/entity/Q36322
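For anyone who builds these source links in their own code, here is a minimal Python sketch of the change. The function name and the "Q followed by digits" test are our assumptions for illustration, not part of VIAF itself, and the test would misclassify an old-style identifier whose page title happens to look like a Wikidata ID.

```python
import re

# Base URLs taken from the two example links above.
WIKIDATA_ENTITY_BASE = "http://www.wikidata.org/entity/"
EN_WIKIPEDIA_BASE = "http://en.wikipedia.org/wiki/"

# Assumption: a new-style local ID is "Q" followed by digits (e.g. Q36322).
_QID = re.compile(r"Q[1-9]\d*")


def wkp_source_url(identifier: str) -> str:
    """Return the source page URL for a 'WKP|...' identifier (hypothetical helper)."""
    source, sep, local_id = identifier.partition("|")
    if source != "WKP" or not sep or not local_id:
        raise ValueError(f"not a WKP identifier: {identifier!r}")
    if _QID.fullmatch(local_id):
        # New style: the local ID is a Wikidata entity ID.
        return WIKIDATA_ENTITY_BASE + local_id
    # Old style: the local ID is an English Wikipedia page title.
    return EN_WIKIPEDIA_BASE + local_id


if __name__ == "__main__":
    print(wkp_source_url("WKP|Jane_Austen"))
    print(wkp_source_url("WKP|Q36322"))
```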
Although it is possible to jump from the Wikidata pages to Wikipedia pages in specific languages, we feel these links are important enough that we will be importing all the language-specific Wikipedia page links we find in Wikidata. These will show up as 'external links' in the 'About' section of the display.
A commonly used bulk file from VIAF is the 'links' file that shows all the links made between VIAF identifiers and source file identifiers (pointers to the bulk files can be found here). The links file includes external links, so the individual Wikipedia pages will show up in the file along with the Wikidata WKP IDs. Here are some of the current links in the file for Lorcan Dempsey:
http://viaf.org/viaf/36978042 BAV|ADV11117013
http://viaf.org/viaf/36978042 BNF|12276780
. . .
http://viaf.org/viaf/36978042 SUDOC|031580661
http://viaf.org/viaf/36978042 WKP|Lorcan_Dempsey
http://viaf.org/viaf/36978042 XA|2219
The new file will change to:
http://viaf.org/viaf/36978042 BAV|ADV11117013
http://viaf.org/viaf/36978042 BNF|12276780
. . .
http://viaf.org/viaf/36978042 WKP|Q6678817
http://viaf.org/viaf/36978042 WKP|http://en.wikipedia.org/wiki/Lorcan_Dempsey
http://viaf.org/viaf/36978042 XA|2219
Lorcan only has one Wikipedia page, the English language one. Jane Austen has more than a hundred, and all those links will be there.
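If you process the links file, you will probably want to tell the two kinds of WKP lines apart. The sketch below is one way to do it in Python, assuming each line is a VIAF URI, then whitespace, then SOURCE|ID, as in the sample above. The function names are hypothetical, and treating any WKP ID that starts with "http" as a Wikipedia page link is our assumption based on that sample.

```python
from collections import defaultdict


def parse_links(lines):
    """Yield (viaf_uri, source, local_id) for each well-formed line."""
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue
        viaf_uri, link = parts
        source, sep, local_id = link.strip().partition("|")
        if not sep:
            continue  # skips elision lines such as ". . ."
        yield viaf_uri, source, local_id


def split_wkp(lines):
    """Return (wikidata_ids, wikipedia_pages), each keyed by VIAF URI."""
    wikidata_ids = defaultdict(list)
    wikipedia_pages = defaultdict(list)
    for viaf_uri, source, local_id in parse_links(lines):
        if source != "WKP":
            continue
        if local_id.startswith(("http://", "https://")):
            # External link to a language-specific Wikipedia page.
            wikipedia_pages[viaf_uri].append(local_id)
        else:
            # Wikidata entity ID, e.g. Q6678817.
            wikidata_ids[viaf_uri].append(local_id)
    return dict(wikidata_ids), dict(wikipedia_pages)


if __name__ == "__main__":
    sample = [
        "http://viaf.org/viaf/36978042 BNF|12276780",
        "http://viaf.org/viaf/36978042 WKP|Q6678817",
        "http://viaf.org/viaf/36978042 WKP|http://en.wikipedia.org/wiki/Lorcan_Dempsey",
    ]
    ids, pages = split_wkp(sample)
    print(ids)
    print(pages)
```

For a name like Jane Austen, the second dictionary would hold one entry per language version found in Wikidata.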
Of course, this also means some changes to the RDF view of the data. We're still working on that and will post more information when we get it closer to its final form.
--Th