The FRBR Blog has a post with some interesting comments about the FRBR Implementers meeting at ALA Midwinter. Since I talked there a bit about the differences among the groupings used for WorldCat.org, xISBN, and FictionFinder, I thought it would be worth describing them here in more detail.
To compute the FRBR clusters at the work level, we first extract keys as described in our published algorithm. Part of the algorithm uses cross references derived from the NACO authority file to bring together authors and author/title pairs that might not otherwise match. These clusters, however, are not ready for xISBN, since the same ISBN can turn up in more than one group, and we want the groups to be consistent with each other. To fix this we do quite a bit of additional processing, part of which finds additional matches based on shared ISBNs; these let us relax the rather stringent author/title matching done in the first phase. We use the additional matches to create cross references, which can then be used to redo the initial clusters.
Of course, the new clusters then change the xISBN clusters, which in turn produce new cross references. We actually make three passes to arrive at a relatively stable state.
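For readers who want to see the shape of that feedback loop, here is a rough sketch in Python. It is not our production code: the record layout (id, author/title key, list of ISBNs), the function names, the rule of picking the alphabetically smallest key as the preferred one, and the sample data are all assumptions made up for illustration. It only shows how keys that share an ISBN can become cross references, and why more than one pass can be needed before the groups settle.

```python
from collections import defaultdict

def resolve(key, xrefs):
    """Follow cross references until a key with no further mapping is reached."""
    seen = set()
    while key in xrefs and key not in seen:  # 'seen' guards against cycles
        seen.add(key)
        key = xrefs[key]
    return key

def xrefs_from_isbns(records, xrefs):
    """Propose new cross references between keys that share an ISBN."""
    keys_by_isbn = defaultdict(set)
    for _rec_id, key, isbns in records:
        for isbn in isbns:
            keys_by_isbn[isbn].add(resolve(key, xrefs))
    new = {}
    for keys in keys_by_isbn.values():
        if len(keys) > 1:
            preferred = min(keys)  # arbitrary but deterministic choice (assumption)
            for k in keys:
                if k != preferred:
                    new[k] = preferred
    return new

def build_clusters(records, xrefs=None, passes=3):
    """Recluster up to `passes` times, feeding ISBN matches back as cross references."""
    xrefs = dict(xrefs or {})
    for _ in range(passes):
        new = xrefs_from_isbns(records, xrefs)
        if not new:  # nothing changed, so the clusters are stable
            break
        xrefs.update(new)
    clusters = defaultdict(set)
    for rec_id, key, _isbns in records:
        clusters[resolve(key, xrefs)].add(rec_id)
    return dict(clusters), xrefs

# Hypothetical records: (record id, author/title key, ISBNs)
records = [
    (1, "twain/huckleberry finn", ["111"]),
    (2, "twain/adventures of huckleberry finn", ["111", "222"]),
    (3, "clemens/huck finn", ["222"]),
    (4, "melville/moby dick", ["333"]),
]
clusters, xrefs = build_clusters(records)
```

In this toy data, records 1 through 3 end up in one group because each shares an ISBN with another, even though no two of their keys match exactly; record 4 stays on its own.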
Another wrinkle is that the FictionFinder project is making intensive use of the FRBR work clusters and has identified numerous titles that should be brought together but are not, mostly because of minor title variants. Diane and her group have generated tables of these, which we use as cross references in the initial processing, in addition to those derived from the authority file. In fact, our extra cross references contain hundreds of thousands of mappings and are growing rapidly. We are working on some new techniques to identify variant authors and titles, which we think will add millions of new mappings.
The tables that come out of this process are used by both FictionFinder and WorldCat.org (see update below). The cross references also feed into the Oracle database from which some tables are being distributed in pilot FRBR projects that need stable work-level identifiers.
Although we are aware of instances where too many items are drawn into groups, our current research is on the variant author/title identification mentioned earlier, which will increase the size of clusters rather than split them into smaller groups. The overly large groups often result from our reliance on uniform titles, and we have some ideas on how to address that problem too.
--Th
Update (February 2, 2007):
WorldCat.org now uses the FRBR sets as maintained in the Oracle database.
Thanks! Sorry I missed your name in my FRBR Implementer's report.
Posted by: Jonathan Rochkind | January 30, 2007 at 14:17