
Comments

Mickey Hawk

It seems to me that the formula would ideally include some more direct measure of the general public's current interest. That could mean anything from recent search rankings to items with advance requests or holds on them. Circulation numbers for items currently out get close, but there would still be a lag before you'd see what's truly of interest to people right now.
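One way to avoid that lag is to weight each hold request by how recent it is. Below is a minimal sketch of exponential time decay, assuming hold dates are available per item; the function name `current_interest` and the 30-day half-life are hypothetical choices, not anything a catalog actually exposes.

```python
import math
from datetime import date, timedelta

def current_interest(hold_dates, today, half_life_days=30.0):
    """Score recent demand: a hold placed today counts 1.0, and its weight
    halves every half_life_days (a hypothetical tuning parameter)."""
    rate = math.log(2) / half_life_days
    score = 0.0
    for placed in hold_dates:
        age_days = (today - placed).days
        if age_days < 0:
            continue  # skip holds dated after `today`
        score += math.exp(-rate * age_days)
    return score

today = date(2018, 4, 1)
recent = [today - timedelta(days=d) for d in (0, 2, 5)]
stale = [today - timedelta(days=d) for d in (300, 320, 340, 360)]
print(current_interest(recent, today) > current_interest(stale, today))  # True
```

Three holds from the past week outscore four from most of a year ago, which is the "what's of interest right now" behavior raw circulation totals lag behind.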

K.G. Schneider

I like Mickey's points, particularly about reserve lists. I suspect that "type library" will make a difference here as well. Thom, could you provide some examples of what you consider optimal search results, with explanations of how they work (instructed liturgy, as it were)?

Kent Fitch

Amazon now provides citation counts for lots of books. Older works tend to have higher accumulated citation counts, and it is possible that combining citation counts with other factors such as circulation, sales rank and library holdings and subjective "review ratings" (both absolute number of reviews and average rating) could give useful ranking data. But ranking isn't enough on its own: all that MARC classification goodness can be used to generate clusters (and a lot "cheaper" than using Latent Semantic Analysis).
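A minimal sketch of combining those signals into one ranking score, assuming each item in a result set carries citation, circulation, holdings, sales-rank and review figures. The weights, the log scaling, and the Bayesian-average treatment of ratings are all hypothetical choices for illustration; nothing in this thread fixes them.

```python
import math

# Hypothetical weights; nobody in this thread proposed specific values.
WEIGHTS = {"citations": 0.25, "circulation": 0.25, "holdings": 0.25,
           "sales": 0.15, "reviews": 0.10}

def bayesian_rating(avg_rating, n_reviews, prior_mean=3.0, prior_weight=5):
    """Pull averages based on few reviews toward prior_mean, so one 5-star
    review doesn't beat hundreds of 4.5-star ones."""
    return (prior_mean * prior_weight + avg_rating * n_reviews) / (prior_weight + n_reviews)

def min_max(values):
    """Rescale a signal to [0, 1] across the result set; constant signals map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def combined_scores(items):
    """Blend log-scaled counts, sales rank and ratings into one score per item."""
    raw = {
        # log1p damps huge counts so one signal can't swamp the others
        "citations": [math.log1p(i["citations"]) for i in items],
        "circulation": [math.log1p(i["circulation"]) for i in items],
        "holdings": [math.log1p(i["holdings"]) for i in items],
        # sales rank 1 is best, so negate its log
        "sales": [-math.log(i["sales_rank"]) for i in items],
        "reviews": [bayesian_rating(i["avg_rating"], i["n_reviews"]) for i in items],
    }
    scaled = {name: min_max(vals) for name, vals in raw.items()}
    return [sum(WEIGHTS[name] * scaled[name][k] for name in WEIGHTS)
            for k in range(len(items))]
```

Scaling each signal within the result set keeps a book with huge holdings from automatically dominating; the weights then decide how much each signal matters.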

Jonathan

You're all talking about ways to improve a measure of 'popularity' or 'general interest level', which is what the OCLC holdings ranking is already meant to capture. There might well be a better way to measure 'public interest level'.

But the problem Thom identifies isn't about that at all. It's that the most popular item in the result set isn't necessarily the one that has anything to do with what the user wants. Should a record where the search query happens to match once in a MARC 500 (general note) field outrank items where the search query matches in an access field, merely because the first item is more popular? It's more popular, but probably not as relevant to the user's query. Fine-tuning measurements of popularity by using hold requests isn't going to help with this.
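A minimal sketch of ranking by where the query matches first, with popularity only as a damped multiplier. The per-tag weights, the `popularity_weight` parameter, and the sample records are hypothetical; the tags themselves are standard MARC (100 main entry personal name, 245 title, 600/650 subjects, 500 general note, 520 summary).

```python
import math

# Hypothetical weights per MARC tag: access points count far more than notes.
FIELD_WEIGHTS = {"100": 5.0, "245": 4.0, "600": 4.0, "650": 3.0,
                 "500": 0.2, "520": 0.2}

def field_relevance(record, term):
    """Sum the weights of every field whose text contains the term."""
    term = term.lower()
    return sum(FIELD_WEIGHTS.get(tag, 0.1)
               for tag, text in record["fields"]
               if term in text.lower())

def score(record, term, popularity_weight=0.1):
    """Relevance scaled by a log-damped holdings boost."""
    rel = field_relevance(record, term)
    return rel * (1 + popularity_weight * math.log1p(record["holdings"]))

def rank(records, term):
    return sorted(records, key=lambda r: score(r, term), reverse=True)

# Hypothetical records: a widely held book that mentions the term in a note,
# and a less widely held book with the term in its main entry.
note_only = {"id": "note", "holdings": 50000,
             "fields": [("245", "A punctuation guide"),
                        ("520", "Publisher's blurb that mentions Orwell in passing")]}
by_author = {"id": "author", "holdings": 800,
             "fields": [("100", "Orwell, George"), ("245", "Essays")]}

print([r["id"] for r in rank([note_only, by_author], "Orwell")])  # ['author', 'note']
```

Sorting on holdings alone would put the note-only record first; because the log-damped boost grows so slowly, it reorders records of similar relevance but cannot realistically lift a note match over a main-entry match.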

But yeah, the key question is identifying what we consider 'optimal' results/ordering of results. Of course, different searchers will consider different things optimal. But we still have faith that some orderings are better than others.

I still think that in a general library catalog, ranking on popularity and nothing else isn't going to cut it. WorldCat is different from a single library catalog, though: I suspect that at the moment its most frequent use is known-item searching, and for a known-item search some sort of simple popularity ranking may very well be the most useful order.

Jonathan

Okay, I went and found a concrete example for you. Do a keyword search for 'Orwell' in WorldCat. The first hit? "Eats Shoots & Leaves." That's because its record has an abstract (a publisher's blurb) that happens to mention Orwell in passing, and it's held by more libraries than any other item with Orwell in the record, including books actually by or about Orwell.

Is "Eats Shoots & Leaves" likely to be what a user searching for "Orwell" is looking for? Only a minority of the book is even about Orwell.

The 2nd and 3rd items in the list are works by Orwell. The 4th is again an item with a tenuous connection to Orwell. The 5th is "Eyewitness to History", which includes one essay by Orwell among many dozens, but it's held by lots of libraries. Items 6-8 are all actually by or primarily about Orwell. Item 9 is another collection that includes one essay by Orwell among many by other authors.

Is this really putting items most likely to be of interest to the user entering 'Orwell' as a query first? Doubtful.

Helen Anderson

This doesn't address popularity, but couldn't the presence and maybe position of the words in the subject heading help to some extent in determining relevance?
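A minimal sketch of that idea, assuming subject headings are available as strings with "--" separating subdivisions (the usual display form for LCSH). The scoring rule, full credit for a match in the first heading's main term and less for later headings or subdivisions, is a hypothetical choice, as is the name `subject_bonus`.

```python
def subject_bonus(subject_headings, term):
    """Return a 0..1 bonus: 1.0 if the term is in the first heading's main
    term, shrinking for later headings and later subdivisions."""
    term = term.lower()
    best = 0.0
    for i, heading in enumerate(subject_headings):
        parts = [p.strip().lower() for p in heading.split("--")]
        for j, part in enumerate(parts):
            if term in part:
                # earlier heading (i) and earlier subdivision (j) score higher
                best = max(best, 1.0 / (1 + i) / (1 + j))
    return best

print(subject_bonus(["Orwell, George, 1903-1950--Criticism and interpretation"], "orwell"))  # 1.0
print(subject_bonus(["English language--Punctuation"], "orwell"))  # 0.0
```

A bonus like this could be added to a field-based relevance score, so a book whose primary subject is the query term sorts ahead of one where it appears only as a subdivision.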

