Melvyl Recommender Project

The FRBR Blog shows an excerpt from the Full Text Extension Supplementary Report of the Melvyl Recommender Project describing their current procedure for deciding whether two items should be considered the same work.  They calculate a score based on how well authors, titles, dates, and identifiers match.  All in all, their procedure probably does a fairly good job of bringing together similar items, but I've never been a fan of assigning scores and then adding them up.  They mention 'twiddling of knobs' to adjust the scores, and my experience is that you never finish twiddling, and that changes and additions to the scoring are very difficult to get right.

My preference is to use a decision table.  Here's one for the Melvyl matching:

Titles   E  E  E  -  P  P  P
Authors  P  E  -  E  E  P  P
Idents   -  -  E  E  -  -  P
Dates    P  -  E  E  P  E  -

Here's how to use the table.  For each row, you decide whether the two records have an Exact (E), Partial (P), or no match on that field.  The levels are ordered, so a P in the table means the field has to be at least a partial match (an exact match also satisfies it), while an E requires an exact match.  The first column then says that if you have an Exact title match, and at least a Partial author and date match, then your records match.  The hyphen in the Idents row means that for this column it doesn't matter how well the identifiers match.  The last column shows that partial matches on all but dates result in a match, whether dates match or not.  For two records to match, they have to satisfy at least one column.
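As a rough illustration, the table can be written down as data and checked mechanically.  This is only a sketch in Python, not the code we or the Melvyl project actually run: the names Match, FIELDS, COLUMNS, matching_column, and records_match are made up for the example, and it assumes the per-field match levels have already been worked out by some other step.

```python
from enum import IntEnum
from typing import Dict, Optional


class Match(IntEnum):
    """Ordered match levels: NONE < PARTIAL < EXACT."""
    NONE = 0
    PARTIAL = 1
    EXACT = 2


E = Match.EXACT
P = Match.PARTIAL
ANY = Match.NONE  # the hyphen in the table: any level satisfies it

# Row order of the table (hypothetical field names).
FIELDS = ("titles", "authors", "idents", "dates")

# One tuple per column of the table, read top to bottom, left to right.
COLUMNS = [
    (E,   P,   ANY, P),
    (E,   E,   ANY, ANY),
    (E,   ANY, E,   E),
    (ANY, E,   E,   E),
    (P,   E,   ANY, P),
    (P,   P,   ANY, E),
    (P,   P,   P,   ANY),
]


def matching_column(levels: Dict[str, Match]) -> Optional[int]:
    """Return the 1-based number of the first column satisfied, or None."""
    for number, column in enumerate(COLUMNS, start=1):
        if all(levels[field] >= required
               for field, required in zip(FIELDS, column)):
            return number
    return None


def records_match(levels: Dict[str, Match]) -> bool:
    """Two records match if they satisfy at least one column."""
    return matching_column(levels) is not None


if __name__ == "__main__":
    example = {
        "titles": Match.EXACT,
        "authors": Match.PARTIAL,
        "idents": Match.NONE,
        "dates": Match.PARTIAL,
    }
    print(matching_column(example), records_match(example))
```

Returning the column number rather than just yes or no is a deliberate choice in the sketch: when a match looks wrong, you can see which column let it through, and changing the rules means editing or adding one column rather than rebalancing weights.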

We've found this sort of table easier to understand, extend, and modify.  The Melvyl scoring fit very nicely into this because each of their criteria had three levels.  In our experience that's the right granularity.

--Th

Comments

I like the idea of the table, and of three levels of granularity. I also find it easier to understand than adding up scores, but it had never occurred to me to try it out. Much easier to debug problems with a table-based approach, and also easier to explain to people.
