Last November I wrote a short post about how fast Python 3.0 reads in some XML and parses it, and found it quite a bit slower than 2.5. We haven't started switching to 3.0, partly because of that, but I noticed in some recent release notes that the I/O in 3.1 has been redone to increase its speed. So I thought it was time to redo the test.
I couldn't find the old code and test file, but the idea was pretty simple: read in chunks of UTF-8 encoded XML and have cElementTree parse them. I'm happy to report that Python 3.1.1 does this faster than Python 2.5.2 on my 64-bit Vista machine.
For testing I just read in 16 lines of XML that made up a 217 MB file. In 3.1.1 I specified the encoding as 'UTF-8' when opening the file and passed the resulting (Unicode) strings directly to cElementTree's fromstring function. In 2.5.2 the text was read in as a standard byte string and passed unchanged (still UTF-8 encoded) to fromstring.
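Since the original test code is lost, here is a rough sketch of what a test like this might look like on a current Python 3. It is not the code behind the timings in this post. The helper names (write_sample, time_parse), the sample file layout and the record contents are all made up for illustration, and it assumes each line of the file is a complete XML document. It imports xml.etree.ElementTree rather than cElementTree, because recent Python 3 versions pick up the C implementation automatically.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as ET  # C accelerator is used automatically on Python 3.3+


def write_sample(path, lines=16, records_per_line=1000):
    """Write a hypothetical test file: one complete XML document per line."""
    record = '<item id="{0}">caf\u00e9 {0}</item>'  # non-ASCII text to exercise UTF-8
    with open(path, "w", encoding="utf-8") as f:
        for _ in range(lines):
            body = "".join(record.format(i) for i in range(records_per_line))
            f.write("<root>" + body + "</root>\n")


def time_parse(path):
    """Read the file line by line as UTF-8 text and parse each line."""
    start = time.perf_counter()
    elements = 0
    with open(path, encoding="utf-8") as f:  # decoding happens in the I/O layer
        for line in f:
            root = ET.fromstring(line)  # str goes straight to the parser
            elements += len(root)       # count the direct children of <root>
    return elements, time.perf_counter() - start


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "sample.xml")
    write_sample(path)
    count, seconds = time_parse(path)
    print("parsed %d elements in %.3f s" % (count, seconds))
```

The sample file here is tiny compared to the 217 MB one I used, so bump up records_per_line if you want timings that mean anything.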
Python 3.1.1: ~15.3 seconds
Python 2.5.2: ~22.8 seconds
Of course we do a lot more with Python than just read in XML and parse it, but until we got past this basic speed issue it was hard to even think about converting to 3.X.
--Th