64-bit Python
We've been using the beta version of Python 2.5 on some new Linux boxes, compiled for 64-bit addresses. This means no more two-gigabyte limits (in practice we were seldom able to use much more than one gigabyte before). The other day we created a 4,000,000,000-byte string. The machines have six gigabytes of memory, so our attempt at creating an 8-billion-byte string (by making the string Unicode) looked like it was going to work, but it took longer than we were willing to wait. Older versions of Python could be compiled as 64-bit, but 2.5 appears to have fixed a lot of 64-bit-related bugs.
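For anyone unsure whether their interpreter is a 64-bit build, here is a minimal sketch of one way to check. The function names are ours, purely for illustration. It relies on sys.maxsize, which arrived in Python 2.6 and later; on 2.5 itself the struct-based check is the one to use.

```python
import struct
import sys


def pointer_bits():
    """Return the width of a native pointer in bits (32 or 64 on common builds)."""
    # "P" is the struct format code for a C void pointer.
    return struct.calcsize("P") * 8


def is_64bit_build():
    """Return True if container sizes can exceed the 32-bit range."""
    # sys.maxsize is the largest size a container can have;
    # it is above 2**32 only on a 64-bit build.
    return sys.maxsize > 2**32


print(pointer_bits(), is_64bit_build())
```

If both checks report a 64-bit build, strings and lists larger than two gigabytes are limited by available memory rather than by the address space.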
Being able to read larger files directly into memory is already changing some of our approaches to name-matching problems. Four million bibliographic records fit into memory with room to spare.
One of the reasons we started using Python 2.5 before its expected final release in August is that it includes SQLite. SQLite does have some limitations, but on one of our workstations we have been able to load all 60 million records of a January copy of WorldCat into a table indexed by OCLC number very quickly (about 6,600 records/second). SQLite runs as an embedded database: each application using a SQLite database opens the database files directly (most databases run as servers and incur interprocess communication overhead). This is also where we ran into one of its limitations: at least on our Linux nodes, this doesn't work if the file is NFS-mounted.
In general, though, we like it. SQLite looks like a reasonable alternative if you want an embedded database, and probably your best alternative if you are also working in Python. Don't forget to do a 'commit' before you exit!
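To make the 'commit' point concrete, here is a minimal sketch using the sqlite3 module from the standard library. The table name, column names, and helper functions are hypothetical, not our actual loader. With the module's default settings, inserts happen inside an implicit transaction, and closing the connection without a commit discards them.

```python
import sqlite3


def load_records(db_path, records):
    """Insert (ocn, record) pairs into a table keyed by OCLC number."""
    conn = sqlite3.connect(db_path)
    try:
        # Hypothetical schema: the OCLC number is the primary key, so
        # lookups by number use the table's built-in index.
        conn.execute(
            "CREATE TABLE IF NOT EXISTS worldcat "
            "(ocn INTEGER PRIMARY KEY, record TEXT)"
        )
        conn.executemany(
            "INSERT INTO worldcat (ocn, record) VALUES (?, ?)", records
        )
        # Without this commit the inserted rows are lost when the
        # connection is closed.
        conn.commit()
    finally:
        conn.close()


def fetch_record(db_path, ocn):
    """Return the record stored for an OCLC number, or None if absent."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT record FROM worldcat WHERE ocn = ?", (ocn,)
        ).fetchone()
        return row[0] if row else None
    finally:
        conn.close()
```

Calling load_records("worldcat.db", [(1, "first record")]) and then fetch_record("worldcat.db", 1) should return the stored text. Batching many rows into one transaction, as executemany does here, is generally much faster than committing after every insert.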
--Th
How long did it take you to type that string?
Posted by: | May 17, 2007 at 18:44
I view the 'commit' issue as at least a documentation bug.
Posted by: Seun Osewa | June 16, 2007 at 05:24