Lorcan has a long posting about metasearch on his weblog, along with a comment by Judith Pearce of the National Library of Australia asking questions about the role of fielded searching. An associated question that I've seen in some of NLA's papers is the role of controlled vocabularies. Since these questions get to the heart of OCLC's metadata services, this is important to us. In addition to a few opinions, I've actually been doing some experimentation that addresses some of the issues.
First some opinions.
1. Everyone would rather not deal with fielded searching, if they can avoid it
2. You can't avoid it, at least not always
3. Controlled vocabularies help
4. Centralized systems work better
5. Retrieval systems need ranking
6. Speed counts
1) is just true, and the arguments I've heard against it have faded over the years, with only the occasional authorities librarian unconvinced. A well-designed system should work for 'typical' use without requiring decisions on what fields to search.
2) The standard example used to show the need for fielded searching is the question of whether you are looking for books by Shakespeare or about Shakespeare. Fielded searching helps with that. Just because you can make a system that works most of the time without it doesn't mean that your most vocal and intense users aren't going to need and demand it. Even Google offers fielded searching for the limited metadata tags it does have.
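To make the by/about distinction concrete, here is a minimal sketch in Python. The records are toy data with simplified headings, and the field names `author` and `subject` and both function names are my own assumptions for illustration, not anything from a real catalog system.

```python
# Toy catalog records. Field names and simplified headings are
# illustrative assumptions, not a real catalog schema.
RECORDS = [
    {"title": "Hamlet",
     "author": "Shakespeare, William",
     "subject": "Denmark -- Drama"},
    {"title": "Shakespeare: The Biography",
     "author": "Ackroyd, Peter",
     "subject": "Shakespeare, William -- Biography"},
    {"title": "Will in the World",
     "author": "Greenblatt, Stephen",
     "subject": "Shakespeare, William -- Biography"},
]


def keyword_search(term, records):
    """Match the term in any field; the user makes no field decision."""
    term = term.lower()
    return [r["title"] for r in records
            if any(term in value.lower() for value in r.values())]


def fielded_search(term, field, records):
    """Match the term only within the one named field."""
    term = term.lower()
    return [r["title"] for r in records if term in r[field].lower()]


everything = keyword_search("shakespeare", RECORDS)
by_him = fielded_search("shakespeare", "author", RECORDS)
about_him = fielded_search("shakespeare", "subject", RECORDS)
```

The unfielded search returns all three titles, which is fine for 'typical' use, while the author and subject searches separate the play from the two biographies.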
3) Controlled vocabularies (I'd group classification systems here) are extremely helpful once you find the term, but finding the term can be a real problem. The experience of RedLightGreen, which makes subject headings visible and easy to use, is that students really liked the 'librarian terms'. Controlled vocabularies bring together items that natural-language systems miss, and do it in a simple, clean, and understandable manner.
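A small sketch of the 'bringing together' effect, again in Python. The titles, the headings, and the `SEE_REFERENCES` table are all invented for illustration; a real system would draw its cross-references from an authority file rather than a hand-written dictionary.

```python
# Hypothetical "see" references: variant term -> authorized heading.
SEE_REFERENCES = {
    "cars": "Automobiles",
    "motor cars": "Automobiles",
    "autos": "Automobiles",
}

# Invented records, each carrying an assigned subject heading.
RECORDS = [
    {"title": "A History of the Motor Car", "subjects": ["Automobiles"]},
    {"title": "Autos of the Fifties", "subjects": ["Automobiles"]},
    {"title": "Railway Journeys", "subjects": ["Railroads"]},
]


def title_keyword_search(term):
    """Natural-language matching on title words only."""
    term = term.lower()
    return [r["title"] for r in RECORDS if term in r["title"].lower()]


def subject_search(term):
    """Map the user's term to its authorized heading, then match headings."""
    heading = SEE_REFERENCES.get(term.lower(), term).lower()
    return [r["title"] for r in RECORDS
            if heading in (s.lower() for s in r["subjects"])]


by_title = title_keyword_search("cars")
by_subject = subject_search("cars")
```

Searching titles for "cars" finds nothing here, because neither title uses that exact word, while the subject search finds both books through the shared heading. The hard part, as noted above, is getting the user from "cars" to the heading in the first place, which is the job the cross-reference table stands in for.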
4) Working for a centralized system, I may be prejudiced here, but it seems clear to me that the more 'magic' you want out of the systems, the more you will be pushed into centralized systems, if for no other reason than 5 & 6:
5) Ranking is the key. Of course the trick is to get a ranking that reflects your users' needs. It took the Web search engines to convince us that items have an intrinsic rank, quite apart from the vocabulary used to search them. It is hard to impossible to do ranking well in a distributed system.
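One reason often given for the difficulty, offered here as my own illustration rather than a full account, is that each remote source scores its results on its own scale, so the scores cannot simply be compared after the fact. A minimal Python sketch, with made-up sources, identifiers, and scores:

```python
def merge_by_raw_score(*result_lists):
    """Naively merge hits from several sources, sorting on each source's own score."""
    merged = [hit for results in result_lists for hit in results]
    return sorted(merged, key=lambda hit: hit[1], reverse=True)


# Hypothetical results as (record id, local score) pairs.
source_a = [("A1", 0.92), ("A2", 0.85)]  # this source scores between 0 and 1
source_b = [("B1", 37.0), ("B2", 12.5)]  # this source uses an unbounded score

merged = merge_by_raw_score(source_a, source_b)
merged_ids = [record_id for record_id, _ in merged]
```

Every hit from the second source lands above every hit from the first, regardless of how relevant any of them are. A centralized index avoids the problem by scoring all records with the same statistics.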
6) Speed. I've heard that Google claims that every tenth of a second makes a difference in the use of their system. A tenth of a second is just about what a person can discern, and it does affect use. I've done experiments over the years with systems with response times on that order, and the user experience is very different from even a 1-2 second wait. Anyone still remember the ZOG project at Carnegie Mellon? It was an early hypertext experiment that had instant links. They didn't have hypertext quite right (this was 1977) but they were right about the importance of speed. Another argument for a centralized system.
More on our experiments in avoiding/enhancing fielded searching later.
--Th
Hear, hear, on the controlled vocabulary issue! I'm delighted by the fact that we're finally getting to the point where systems (like RedLightGreen) can give users the benefits of controlled vocabularies while not requiring users to look terms up in books before searching an online system. More and more we're able to make better use of the syndetic structure of these vocabularies within our systems. We need more of that, and we need better relationships recorded within our standard vocabularies, but it's a start!
Posted by: Jenn Riley | August 31, 2005 at 09:45