
Twisted

Although I try to do my Python programming with out-of-the-box (batteries included) Python, there are a lot of excellent Python libraries that are not included in the standard distribution.  XSLT support comes to mind as an obvious omission.  Python does come with a simple HTTP server which I've used extensively, but there is no getting around that its capabilities are fairly limited.  A couple of years ago we looked at the Twisted framework from Twisted Matrix Laboratories.  It looked good and seemed to work, but was obviously still under development and a bit hard to understand.

Well, Twisted is still under development, but it's become much easier to understand, since there is now a book (Twisted Network Programming Essentials) about it.

Twisted is more than just HTTP; it's a framework for asynchronous network programming in Python.  All I've used is the HTTP support, which is pretty good.  One of the problems with Python's BaseHTTPServer is that it isn't asynchronous.  Unless you go to the trouble of adding threads, requests are processed one at a time.  Twisted's approach to this is Deferreds.
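For comparison, here is roughly what "going to the trouble of adding threads" looks like with the standard library alone.  This is only a sketch, not the server I ported: it assumes Python 3, where BaseHTTPServer lives in http.server, and the names ThreadedHTTPServer, HelloHandler and fetch_once are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn


class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """HTTPServer that hands each request to its own thread."""
    daemon_threads = True  # open requests do not block interpreter exit


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with a fixed body."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once():
    """Start the server on a free local port, fetch one page, shut down."""
    server = ThreadedHTTPServer(("127.0.0.1", 0), HelloHandler)
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    try:
        port = server.server_address[1]
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        result = (response.status, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()      # stops serve_forever in the worker thread
        server.server_close()  # releases the listening socket
```

It works, but every slow back-end call still ties up a thread, which is the cost an event-driven framework is meant to avoid.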

Deferred objects are quite easy to use.  As an example, calling the twisted.web.client.getPage function with a URL returns a Deferred to which you can add call-back methods for success and failure.  When the requested page is ready it gets passed to the appropriate call-back for processing.  In my application I am accepting SRU searches, turning them into calls to a back-end SRU server, and then consolidating the back-end responses into my response.  On top of that, of course, it has to be a standard HTTP server responding to file requests (twisted.web has a static.File class that neatly encapsulates the expected functionality).
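To make the call-back idea concrete without requiring Twisted to be installed, here is a toy stand-in written in plain Python.  It is emphatically not Twisted's implementation: ToyDeferred, fake_get_page, run_demo and the back-end URL are all invented for illustration, and the real Deferred uses the names addCallback, addErrback and addCallbacks and wraps errors in Failure objects.  The shape of the interaction is the point: the caller gets an object back immediately, attaches handlers, and the handlers run later when the result arrives.

```python
class ToyDeferred:
    """Tiny stand-in for the Deferred idea; NOT Twisted's implementation."""

    def __init__(self):
        self._pairs = []     # (on_success, on_failure) handlers, in order
        self._fired = False
        self._result = None

    def add_callbacks(self, on_success, on_failure=None):
        self._pairs.append((on_success, on_failure))
        if self._fired:
            self._run()      # result already known: run the handler now
        return self

    def callback(self, result):
        self._fire(result)

    def errback(self, error):
        self._fire(error)

    def _fire(self, result):
        if self._fired:
            raise RuntimeError("already fired")
        self._fired = True
        self._result = result
        self._run()

    def _run(self):
        while self._pairs:
            on_success, on_failure = self._pairs.pop(0)
            failed = isinstance(self._result, Exception)
            handler = on_failure if failed else on_success
            if handler is None:
                continue     # no handler for this case: pass result along
            try:
                self._result = handler(self._result)
            except Exception as exc:
                self._result = exc  # switch to the failure path


def fake_get_page(url, pending):
    """Hypothetical fetch: records the request and returns at once."""
    d = ToyDeferred()
    pending.append((url, d))
    return d


def run_demo(succeed):
    pending, log = [], []
    d = fake_get_page("http://backend.example/sru", pending)
    d.add_callbacks(lambda page: log.append("got %d bytes" % len(page)),
                    lambda err: log.append("failed: %s" % err))
    log.append("request issued")      # handlers have not run yet
    url, waiting = pending.pop()      # later, the "network" answers
    if succeed:
        waiting.callback(b"<records/>")
    else:
        waiting.errback(IOError("timed out"))
    return log
```

Calling run_demo(True) logs the request first and the page size second; run_demo(False) takes the failure handler instead.  In the real thing the reactor, not the caller, decides when the Deferred fires.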

The twisted.web part of Twisted is being replaced by twisted.web2.  I haven't tried it out, mainly because twisted.web is more stable and is what Abe Fettig's book describes.  Twisted claims to be able to support quite high traffic loads, and nothing I've seen argues against that.  Moving to a package like this gives us lots of new features, such as a standard way of creating daemon processes, Apache-like logging and, I hope, a robustness that would take a lot of work to achieve otherwise.  It took me a couple of days to learn enough Twisted to get my 200-line server ported, and the port ended up slightly shorter with more functionality.

As expected in Python, everything worked the same under both Windows XP and Linux, although running the scripts provided is easier in Linux.  Fettig's book is very good, although the constant repetition of the headings 'How Do I Do That?' and 'How Does That Work?' gets a little wearing.  I do wish it was a little longer.  I think by the time he got to Chapter 11 (Services, Processes, and Logging) the author may have gotten a little tired and the chapter is too short.  The documentation on the Twisted site leaves much to be desired -- I was glad I had the book next to me just to get it installed on Linux (as usual, the Windows installation was more automated).

--Th

Update (2006 February 6)

Since I wrote this I ran into this review of the book by someone who really knows Twisted.  The reviewer makes it very clear that twisted.web is not the way to write Web applications in Twisted.  With that disclaimer, here's the code I wrote.  It won't run for you as it depends too much on my environment, and it's a bit of a hack, but it might be of interest to David (see comment below).

Code size

We're getting close to being able to demo 'live' search integrated with DDC categories displayed as you type.  Jenny, Ralph, and I have probably been through a dozen different ways to structure the database, but we're getting closer and should have a million-record database working with a few more days' work.  Balancing the precoordination needed to be fast enough for interactive display with build times, disk space, and main memory has been more complicated than for any project I can remember.

We've been chipping away at this for a couple of months now, but it hasn't resulted in a lot of code.  Here's a breakdown of what we've ended up with:

Language              Lines
Python run-time         200
Python build-time       400
JavaScript               70
CSS                     120
XSLT                    215
XML                      45
DB (Pears) Config       100
Total                 1,150

Some of these could certainly be slimmed down a bit, but the point is we can put together a fairly sophisticated retrieval system with only about a thousand lines of code.  That's remarkable.

Of course that thousand lines depends on a lot of other code, such as a database system that has all the features we needed (Pears), code to add DDC numbers to records that don't have them, FRBR code, and the whole Web infrastructure that makes the run time work.  But we didn't have to write any of that specifically for this; it was already there.

It seems like it's taken a long time, but over the last few years it has become much easier to build new applications because of a combination of advances in both hardware and software.  As main memory gets as cheap as disk is now, we'll see a similar increase in productivity as we spend less time worrying about managing data across that boundary.

--Th
