Ralph LeVan, Jenny Toves and I recently published an article in D-Lib Magazine on parallel text searching. This is mostly Ralph's work, looking at how we could get fast searching (more than 100 searches per second) on our 60-million-record database using commodity hardware. As a bonus, Ralph was able to get it working using standard protocols.
The trick, of course, is dividing up the work to run across our 24-node, 48-CPU Beowulf cluster. Dividing up the searching wasn't too hard, although making sure that ranking can still be done quickly across the pieces takes some thought. It took a little more work to figure out how to divide the workload of aggregating the results of those searches (we ended up with two levels of aggregation). Another lesson was that, with care, XML can be used as an internal protocol. Part of that care was using SRU, which sends requests as plain URLs, to avoid the extra XML overhead of SRW, which wraps its messages in SOAP.
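To make the two levels of aggregation a bit more concrete, here is a minimal sketch in Java. It is not the code from the article: the names `TwoLevelMerge`, `Hit`, `mergeTopK` and `aggregate` are made up for illustration, the group size is arbitrary, and it assumes every node has already ranked its own partition and returns scores that are comparable across partitions (which is the part that "takes some thought").

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of two-level result aggregation; not the article's code.
final class TwoLevelMerge {

    /** One ranked hit. Scores are assumed to be comparable across partitions. */
    static final class Hit {
        final String recordId;
        final double score;

        Hit(String recordId, double score) {
            this.recordId = recordId;
            this.score = score;
        }
    }

    // Highest score first; ties broken by record id so the order is deterministic.
    static final Comparator<Hit> BY_RANK =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                      .thenComparing((Hit h) -> h.recordId);

    /** Merge several ranked lists and keep only the k best hits. */
    static List<Hit> mergeTopK(List<List<Hit>> rankedLists, int k) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> list : rankedLists) {
            all.addAll(list);
        }
        all.sort(BY_RANK);
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    /**
     * Level 1: merge the results of each group of groupSize nodes.
     * Level 2: merge the per-group results into the final top k.
     */
    static List<Hit> aggregate(List<List<Hit>> perNodeResults, int groupSize, int k) {
        if (groupSize <= 0) {
            throw new IllegalArgumentException("groupSize must be positive");
        }
        List<List<Hit>> groupResults = new ArrayList<>();
        for (int i = 0; i < perNodeResults.size(); i += groupSize) {
            int end = Math.min(i + groupSize, perNodeResults.size());
            groupResults.add(mergeTopK(perNodeResults.subList(i, end), k));
        }
        return mergeTopK(groupResults, k);
    }
}
```

As long as each node returns at least its own top k hits, trimming to k at the first level loses nothing: any hit in the overall top k is also in the top k of its group. In a real cluster the first-level merges would run in parallel on separate machines, so no single aggregator has to handle the output of all 24 nodes.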
This is all written in Java and available (in its current incomplete state) as open source software.
Related entry: Mapreduce and web services