Update: Python 3.1 is substantially faster. See this later post.
--Th
We run a lot of Python code and have been following the Python 3.0 development with interest. 3.0 is the new, backward-incompatible version of Python. It is still in testing, with a final version expected in December 2008. It comes with a script (2to3) that can transform much of 2.x Python into 3.0 Python. I have run the script on small pieces of code, and for those it does all you could expect it to, which is a pretty good translation.
Since one of the things we use Python for is processing large amounts of XML (e.g. 120 million 3K chunks), we worry about I/O speed and XML processing speed (we use cElementTree, which comes with Python, for our XML support).
So I ran a small test. I took 10,000 XML records (79 megabytes), read them in, and parsed them using cElementTree's fromstring method.
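A test of that shape can be sketched roughly as follows. This is not the original script: the time_parse helper and the generated sample records are made up for illustration, and it imports xml.etree.ElementTree, since in later 3.x releases the C accelerator is used automatically and the separate cElementTree module no longer exists.

```python
import time
import xml.etree.ElementTree as ET  # stands in for cElementTree


def time_parse(records):
    """Parse every record with fromstring; return (count, elapsed seconds)."""
    start = time.perf_counter()
    count = 0
    for record in records:
        ET.fromstring(record)  # builds an element tree for one record
        count += 1
    return count, time.perf_counter() - start


# Hypothetical stand-in data; the real test used 10,000 records (79 MB).
sample = ['<record id="%d"><title>Item %d</title></record>' % (i, i)
          for i in range(1000)]

count, elapsed = time_parse(sample)
print("parsed %d records in %.3f seconds" % (count, elapsed))
```

The absolute numbers from a sketch like this say little on their own; what matters is running the same loop against the same data under each interpreter and each way of reading the input.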
The 2.5.1 code takes 3.8 seconds on my PC running 64-bit Vista.
In 3.0, opening the file with open('filename', encoding='utf-8') and passing the (Unicode) strings to cElementTree took 17.1 seconds (4.5 times as long as 2.5.1).
Opening the file as a binary file, and converting the input to a 3.0 string (strings are now Unicode) cut the run time from 17.1 seconds to 5.0.
Avoiding the conversion into a string by passing a bytearray to cElementTree cut the time down to 4.4 seconds.
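For reference, the three ways of reading the input look roughly like this. It is a sketch under assumptions, not the code I timed: it assumes one XML record per line, the function names are invented, it uses xml.etree.ElementTree in place of cElementTree, and the third variant passes the bytes objects that a binary read returns rather than a bytearray.

```python
import os
import tempfile
import xml.etree.ElementTree as ET  # stands in for cElementTree


def parse_text_mode(path):
    # Variant 1: open() decodes UTF-8, the parser receives str objects.
    with open(path, encoding='utf-8') as f:
        return [ET.fromstring(line) for line in f if line.strip()]


def parse_binary_then_decode(path):
    # Variant 2: read raw bytes, decode each record to str ourselves.
    with open(path, 'rb') as f:
        return [ET.fromstring(line.decode('utf-8')) for line in f if line.strip()]


def parse_bytes(path):
    # Variant 3: hand the raw bytes to the parser, never building a str.
    with open(path, 'rb') as f:
        return [ET.fromstring(line) for line in f if line.strip()]


# Hypothetical sample file: three one-line records with a non-ASCII character.
records = ['<record id="%d"><title>Caf\u00e9 %d</title></record>' % (i, i)
           for i in range(3)]
with tempfile.NamedTemporaryFile('w', encoding='utf-8', suffix='.xml',
                                 delete=False) as tmp:
    tmp.write('\n'.join(records) + '\n')
    path = tmp.name

try:
    results = [fn(path) for fn in
               (parse_text_mode, parse_binary_then_decode, parse_bytes)]
finally:
    os.remove(path)

# All three variants should yield the same parsed content.
titles = [[el.findtext('title') for el in parsed] for parsed in results]
print(titles[0] == titles[1] == titles[2])
```

The parsed trees are the same in every case; the variants differ only in where the UTF-8 decoding happens, which is the part the timings above are sensitive to.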
We were disappointed in the speed of doing things the standard way, especially the time it took to read in UTF-8 data and return Unicode. Our 2.x programs are constantly worrying about whether a string is in Unicode or not, and it would be great to just work in Unicode. But while we could probably live with slightly slower code to get that, when you have a program that runs all night, you don't want to see it take four times as long to run.
Maybe the final release will be faster.
Michael Watkins seems to have had quite a different experience. My timings on cElementTree in 3.0 show it parsing XML just slightly slower than in 2.5.1.
--Th