


A study for Europeana of datasets including Person/Organization names: http://vladimiralexiev.github.io/CH-names/README.html. Conclusions:
- The best datasets to use for name enrichment are VIAF and Wikidata
- There are few name forms in common between the "library-tradition" datasets (dominated by VIAF) and the "LOD-tradition" datasets (dominated by Wikidata)
- VIAF has more name variations and permutations, Wikidata has more translations
- VIAF is much bigger (sec 2.4.2): 35M persons/orgs. Wikidata has 2.7M persons and maybe 1M orgs
- Only 0.5M of Wikidata persons/orgs are coreferenced to VIAF, with maybe another 0.5M coreferenced to other datasets, either VIAF-constituent (eg GND) or non-constituent (eg RKDartists)
- A lot can be gained by leveraging coreferencing across VIAF and Wikidata
- Wikidata has great tools for crowd-sourced coreferencing

I'm very glad of your news above. This means the rift between Wikidata and VIAF will narrow quickly.
- presentation "Wikidata, a target for Europeana's semantic strategy?" (https://nl.wikimedia.org/wiki/GLAM-WIKI_2015/Proposals/Wikidata,_a_target_for_Europeana%E2%80%99s_semantic_strategy%3F) upcoming at GLAM-WIKI 2015
- please participate in https://www.wikidata.org/wiki/Wikidata:WikiProject_Authority_control

#coreferencing works!

Andy Mabbett

This is a great move, and I'm happy to see it happening.

However, I'm not sure it's a good idea for you to store Wikipedia links, once you have the Wikidata ID.

Wikidata IDs should never change (even if two duplicates are merged, one will redirect to the other, so still be valid).

However, if another notable person called Lorcan Dempsey emerges, say a footballer, then the existing Wikipedia page may be moved to, say:


to allow for the hypothetical


and the original article:


would become what we call a "disambiguation" page, listing the others, but not in a machine-readable format.

Unless VIAF is going to regularly scan for such changes and update its links, it may be better for people (or software) using VIAF data to fetch the Wikidata links, and then to fetch up-to-date Wikipedia links from Wikidata.

Reply: We harvest Wikipedia/Wikidata each month, so the links should stay reasonably in sync. --Th
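The two-step lookup suggested above (Wikidata ID first, then the current Wikipedia title) can be sketched roughly as follows. This is a minimal illustration, not VIAF's or anyone's actual harvesting code. It assumes the entity JSON has the shape served by Wikidata's Special:EntityData endpoint, with an "entities" object keyed by ID and a "sitelinks" object keyed by site (e.g. "enwiki"). The function name wikipedia_url_from_entity is hypothetical, and the sample JSON is trimmed by hand so that no network access is needed.

```python
import json
from urllib.parse import quote


def wikipedia_url_from_entity(entity_json, qid, site="enwiki"):
    """Return the Wikipedia article URL recorded for `qid`, or None.

    Assumption: `site` is a plain language wiki such as "enwiki" or
    "dewiki"; other site keys (e.g. "commonswiki") are not handled.
    """
    data = json.loads(entity_json)
    entity = data.get("entities", {}).get(qid, {})
    sitelink = entity.get("sitelinks", {}).get(site)
    if sitelink is None:
        # No article on that wiki, or the ID is not in this document.
        return None
    lang = site[: -len("wiki")]
    # Wikipedia URLs use underscores where titles have spaces.
    title = sitelink["title"].replace(" ", "_")
    return f"https://{lang}.wikipedia.org/wiki/{quote(title)}"


# Hand-trimmed sample in the assumed shape of the entity JSON.
SAMPLE = json.dumps(
    {
        "entities": {
            "Q36322": {
                "sitelinks": {
                    "enwiki": {"site": "enwiki", "title": "Jane Austen"}
                }
            }
        }
    }
)

print(wikipedia_url_from_entity(SAMPLE, "Q36322"))
```

Because the title is read from Wikidata at lookup time, a later page move on Wikipedia would be picked up as soon as the sitelink is updated there, without the consumer storing the article URL itself.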

John Mark Ockerbloom

I'm very excited to hear about this move, and about the increased name coverage. I'm also glad you'll be keeping the article URLs in the links file as well as the data identifiers; although one can theoretically get one from the other, it's much more convenient to have them together. I look forward to hearing when the first enhanced links file is available.

(I also maintain a set of topical subject-article correspondences, currently in GitHub rather than Wikidata, and it's not too hard to keep the article titles in sync after each monthly English Wikipedia dump. I'd imagine it's not too hard for OCLC to keep on top of the article-title changes for names as well.)

Joachim Neubert

Great to hear about that move!

However, as a user who has built an application using the justlinks service, I find one sentence scary:

"Jane Austen's WKP identifier will change from WKP|Jane_Austen to WKP|Q36322"

This will cause our application to break. We will have to check when this happens, and fix it in a hurry. (Thanks for the test environment for preparing this step.)

It would be great if you could provide the Wikidata link with a new WKD tag, while leaving the Wikipedia links at WKP. After all, one could argue that these are different datasets.

Cheers, Joachim

Reply: We went back and forth about changing the abbreviation, but decided there would be less confusion (and, to be candid, fewer changes on our end) if we left it the same. Most applications will have to change to cope with the large numbers of new Wikipedia links in any case.
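For applications that need to survive the switch while the WKP tag stays the same, one possible approach is to tell the two identifier styles apart by their shape. The sketch below is only an illustration. It assumes identifiers arrive as "WKP|<local part>" strings, as in the sentence quoted above, and that anything matching Q followed by digits is a Wikidata ID. That heuristic would misclassify a Wikipedia article whose title happens to look like a Wikidata ID. The function name classify_wkp is hypothetical.

```python
import re

# Wikidata item IDs are "Q" followed by a positive integer.
_QID = re.compile(r"Q[1-9][0-9]*")


def classify_wkp(source_id):
    """Split a 'WKP|...' identifier and guess which dataset it points to.

    Returns ("wikidata", "Q36322") or ("wikipedia", "Jane_Austen").
    Raises ValueError for anything that is not a WKP identifier.
    """
    prefix, sep, local = source_id.partition("|")
    if sep != "|" or prefix != "WKP" or not local:
        raise ValueError(f"not a WKP identifier: {source_id!r}")
    if _QID.fullmatch(local):
        return ("wikidata", local)
    # Heuristic fallback: treat everything else as an article title.
    return ("wikipedia", local)


print(classify_wkp("WKP|Jane_Austen"))
print(classify_wkp("WKP|Q36322"))
```

A consumer could route the "wikidata" case to new code and keep the old path for the "wikipedia" case, so that the same build works both before and after the identifiers change.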


Fantastic to hear. Wikidata is becoming more and more the way data can be organized so linking the two should make connecting data across the world that much easier.


I didn't know about Wikidata! That is really interesting! I also don't find Wikipedia that reliable, but I still like to use it for most of the things that I need to research! Thanks! Greets!

The comments to this entry are closed.
