As I mentioned in my last post about the DDC Browser, we've gone through a lot of versions. Partly this was to make the interface work better, but much of it was to get the server processing right. After an initial text-only prototype, all of the interfaces have used HTML (possibly generated from XML) for display. This is really the only way to go. GMail has convinced me that there is precious little that can't be done now in standard Web browsers, and there is little justification for sending out anything but XML from your HTTP servers for an application like this.
Here are the main architectural stages the interface has gone through:
- Text-only
- Simple HTML server
- HTML server using XMLHttpRequest and JavaScript
- XML server with XSLT in 4 iframes
- Pseudo-SRU server in 30+ iframes
- 3-level server (back to 4 iframes)
All of these worked (except maybe the 30+ iframe version). We like the 3-level server best though. First I'll explain what the three levels are and then why we like it so much. Harry Wagner has been doing all the Apache/Tomcat side of this.
As an aside, I've been doing all the prototyping using Python and its built-in HTTP server. There's not much to Python's default HTTP server, but once you get a skeleton class that can be modified at need, it's easy to bring up new web services, and you have complete control over what happens in that server.
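A minimal sketch of what such a skeleton class might look like, written against the `http.server` module in current Python 3 (not necessarily the code used for these prototypes). The `/echo` path and the names `handle_request`, `SkeletonHandler`, and `serve` are hypothetical, chosen only for illustration:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from xml.sax.saxutils import escape


def handle_request(path):
    """Return (status, content_type, body) for a request path.

    Hypothetical placeholder logic: swap in the real service here.
    """
    parsed = urlparse(path)
    params = parse_qs(parsed.query)
    if parsed.path == "/echo":
        term = params.get("term", [""])[0]
        body = "<echo>%s</echo>" % escape(term)
        return 200, "text/xml", body.encode("utf-8")
    return 404, "text/plain", b"not found"


class SkeletonHandler(BaseHTTPRequestHandler):
    """Thin wrapper: all service-specific work lives in handle_request."""

    def do_GET(self):
        status, content_type, body = handle_request(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the server until interrupted (port 8000 is an assumption)."""
    HTTPServer(("", port), SkeletonHandler).serve_forever()
```

Calling `serve()` starts the server. Keeping the logic in a plain function means a new web service is mostly a matter of rewriting `handle_request`.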
Here are the three levels, which I'll call stages:
- File Server
- DDC Consolidator
- SRU Server
Each of these is an HTTP server. Stage 1 is really just an HTTP server that is smart enough to send things other than simple file requests on to the next stage. You might be able to do this in the configuration files of your favorite HTTP server.
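The stage 1 decision can be sketched as a small routing function. This is only an illustration under assumptions: the address of the next stage, the list of static file extensions, and the rule "no query string plus a known extension means a file" are all hypothetical, not the rules the real server uses.

```python
import posixpath
from urllib.parse import urlsplit

NEXT_STAGE = "http://localhost:8001"  # hypothetical address of stage 2
# Assumed set of extensions that count as simple file requests.
STATIC_EXTENSIONS = {".html", ".xsl", ".css", ".js", ".gif", ".png"}


def route(path):
    """Classify a request path as a local file or a forward to stage 2.

    Returns ("file", relative_path) or ("forward", url).
    A real server would also need to guard against paths such as "../".
    """
    parts = urlsplit(path)
    ext = posixpath.splitext(parts.path)[1].lower()
    if not parts.query and ext in STATIC_EXTENSIONS:
        return ("file", parts.path.lstrip("/"))
    return ("forward", NEXT_STAGE + path)
```

For example, `route("/browser.xsl")` is treated as a file, while `route("/ddc?number=025.4")` is passed along unchanged to the next stage.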
Stage 2 does XML transformations on XML supplied by stage 3. One of the main transformations is to take 10 searches against stage 3 and group them into a single XML record. This could be done with XSLT. Jeff Young has developed the XSLTProc, which could probably do this, although the current version is done with Python XML-DOM tools.
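As a rough idea of the grouping step, here is a sketch using `xml.dom.minidom` from the Python standard library. It simply wraps the root element of each search response in one new document. The function name `consolidate` and the wrapper element name are assumptions, and the real consolidator presumably does more selective restructuring than this.

```python
from xml.dom import minidom


def consolidate(responses, root_name="consolidated"):
    """Group several XML responses under a single new root element.

    `responses` is a list of XML strings, e.g. the results of the
    searches against stage 3. `root_name` is a hypothetical element name.
    """
    out = minidom.Document()
    root = out.createElement(root_name)
    out.appendChild(root)
    for xml_text in responses:
        doc = minidom.parseString(xml_text)
        # importNode makes a deep copy owned by the new document.
        root.appendChild(out.importNode(doc.documentElement, True))
    return out
```

The result can be serialized with `consolidate(responses).toxml()` and handed back to stage 1, which sends it on to the browser for XSLT display.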
Stage 3 is just a standard SRU server with the indexes needed to do searches on DDC numbers and other terms in the records.
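Because stage 3 speaks SRU, each search from stage 2 is just an HTTP GET with the standard `searchRetrieve` parameters. A sketch of building such a URL follows. The base URL, the index name `ddc.number`, and the choice of version 1.1 are assumptions for illustration, so check them against the actual server's explain record.

```python
from urllib.parse import urlencode


def sru_url(base, cql_query, maximum_records=10, start_record=1):
    """Build an SRU searchRetrieve URL for a CQL query string."""
    params = [
        ("operation", "searchRetrieve"),
        ("version", "1.1"),  # assumed SRU version
        ("query", cql_query),
        ("startRecord", str(start_record)),
        ("maximumRecords", str(maximum_records)),
    ]
    return base + "?" + urlencode(params)


# Hypothetical server address and index name.
url = sru_url("http://localhost:8002/sru", 'ddc.number = "025.4"')
```

Stage 2 would fetch ten URLs like this one and consolidate the responses.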
That's it. Really hardly any code, either in the server or the browser. The largest file is probably the main XSLT style sheet and that is only about 200 lines.
What do we like?
You've got to like the 'hardly any code' aspect of this. Another advantage is that once I got the system broken into pieces, different people could work on the different stages, and specialized tools, like an SRU server, could replace what were previously special-purpose modules. The whole system has become much more scalable, using a standard HTTP server to serve up files, and an SRU server sitting on top of a Pears database that we know can get 50 millisecond response times on 50 million record files under load.
In the past it would have taken months to code an interface like this. Well, actually this did take a couple of months, but only part-time, and most of the time was spent trying new approaches, not just implementing a single approach. Diane Vizine-Goetz wants to do some usability tests and I'm sure that will result in interface changes. So far the testing has been mostly what Joel Spolsky calls 'hallway usability testing'. I generally try to make a change the second or third time I see a problem.
Ralph Levan is responsible for the Pears and SRU software, and he is also the one who said 'since it's all XML, couldn't we just run this against SRU?', which got us to the current 3-level architecture.
--Th
Have you guys looked into an XML DB like eXist? The latest MODS demo bundled with it has some added Ajax features, and eXist is simply excellent, with integrated support for XQuery and XSLT, a RESTful interface, etc.
http://demo.exist-db.org/exist/mods/biblio.xq
There are a few different projects working with eXist and MODS, and I'm starting to like the idea of a core set of XQuery modules (and later included SRU support) that different projects could use.
Posted by: Bruce | April 12, 2005 at 10:57