
Opening software up

One of the things we do here in OCLC Research is develop software.  Much of this is 'throw-away,' in that it is so specialized we would never consider sharing it, but some packages are of interest to others and we need to decide whether and how to share them.

Over the last few years, the first question has become 'will we make it open source?'  There are at least two levels that need to be considered in answering that, either one of which may trump the other.

The first level is the developer/code level.  Who is interested in using the software?  Are we interested in getting the software ready to share (it will take at least minimal documentation and packaging)?  Are the developers interested in supporting an open source project?  They need to be both able to respond to and interested in outside users' needs, suggestions, and submissions.  From a long-term maintenance point of view, does it look like having outside users and contributors will make things easier or create more work?

The next level is a more corporate view.  What does the cooperative gain by making the software available?  Will it be picked up by for-profit corporations in competition?  Will it open up new opportunities?  New partners?  Will it make it easier for us to work with others, or more difficult?  Will we be forgoing possible revenue?  Will others be profiting from our software in ways we wouldn't have, and will that bother us?  What will the benefit be for our members?  Can we put a dollar amount on the potential costs and benefits?

I've written before about some of our thoughts on software licenses.  In the interests of making our software as useful as possible, we've decided to start using the Apache License, Version 2.0.  The Apache license offers very few restrictions on what can be done with the code and is well known, so users of our software won't have to read a license specific to only our code.  We'll be re-releasing some of our older code under this new license.

A good book on the subject of software licenses is Andrew M. St. Laurent's Understanding Open Source and Free Software Licensing.  Also worth a look is a link I noticed in Ongoing about limiting the proliferation of open source licenses.

Thanks to Eric Childress for contributions and thoughts about this.

--Th

Work in progress

I've been talking about 'live' or 'quick' search for a while now.  I finally got a demonstration up (or maybe here if you have trouble accessing non-standard HTTP ports; there's another file you might recognize here).  I'll describe that a bit, but what's really more interesting is what we might do next.

What you should be able to see is an index to all the records that a large public library holds in WorldCat.  We've extracted all the 5-word phrases from the author, title, statement of responsibility, and subject fields.  It's a bit of a trick to get the right phrase from the right manifestation from the right work to display.  We get the speed by loading all the information into memory from several flat files, and generating the screens from those.
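As a rough illustration of the phrase extraction and in-memory lookup described above, here is a minimal sketch in Python.  It is not the code behind the demonstration: the record layout, the function names, and the choice to keep short fields as a single shorter phrase are all assumptions made for the example.

```python
from collections import defaultdict

def five_word_phrases(text, n=5):
    """Return every run of n consecutive words in text, lowercased.

    Assumption: a field shorter than n words is kept as one shorter phrase.
    """
    words = text.lower().split()
    if len(words) < n:
        return [" ".join(words)] if words else []
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_index(records):
    """Map each phrase to the set of record ids it came from.

    `records` is a hypothetical dict: record id -> {field name: field text}.
    """
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for value in fields.values():
            for phrase in five_word_phrases(value):
                index[phrase].add(rec_id)
    return index

def live_search(index, prefix, limit=10):
    """Return up to `limit` phrases (sorted) that start with what was typed."""
    prefix = prefix.lower()
    return sorted(p for p in index if p.startswith(prefix))[:limit]

# Hypothetical sample data, held entirely in memory.
records = {
    "r1": {"title": "The Old Man and the Sea", "author": "Hemingway, Ernest"},
    "r2": {"title": "The Old Curiosity Shop", "author": "Dickens, Charles"},
}
index = build_index(records)
matches = live_search(index, "the old")
```

A linear scan over the phrases is fine for a sketch; a sorted list with binary search, or a trie, would be the obvious next step for a large index.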

The screen shot above, though, shows a prototype we're working on that's quite different.  For one thing, most of the information is in a Pears database, although we've done a lot of precoordination to reduce retrieval time.  That's working well, even though there's a lot more going on, with SRU replies in XML being converted to HTML with XSLT.  More important, though, is the categorization we're trying to display on the left.  A couple of weeks ago Lorcan suggested to Diane and me that we should try to combine the features of the Dewey Browser and Live Search.  So, we've been thinking about it, and this is what we've come up with so far.

As we decide what citations to display, we compute the most popular DDC categories associated with all the records that match the search.  The plan is that the user can then interact with the subject categorization much like the Dewey Browser works, except that the most popular records will display as you click on the captions.
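The category computation described above can be sketched as a simple tally.  This is only an illustration under stated assumptions: each matching record is taken to be a dict with an optional "ddc" string, and categories are grouped by the first three digits of the Dewey number; neither detail comes from the prototype itself.

```python
from collections import Counter

def top_ddc_categories(matching_records, digits=3, limit=5):
    """Count DDC classes among the matching records, most common first.

    Assumption: each record is a dict with an optional "ddc" string such
    as "813.54"; grouping uses the first `digits` characters.
    """
    counts = Counter()
    for rec in matching_records:
        ddc = rec.get("ddc")
        if ddc:
            counts[ddc[:digits]] += 1
    return counts.most_common(limit)

# Hypothetical records matching a search.
matching = [
    {"title": "A", "ddc": "813.54"},
    {"title": "B", "ddc": "813.52"},
    {"title": "C", "ddc": "025.3"},
    {"title": "D"},  # no class number assigned
]
categories = top_ddc_categories(matching)
```

Raising `digits` as the user clicks deeper into a category would be one way to mimic the drill-down of the Dewey Browser.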

We'll see if we can make all this work.  In addition to unaddressed interface issues, there are some real problems scaling this to 60+ million records while keeping the interaction 'live'.  Jenny is working on this, though.  Thanks, too, to Ralph for database support.

--Th

Fall presentations

In case you missed Lorcan's post about presentations some of us in OCLC Research have done lately, I thought I'd point out a couple of mine.

In October I had the opportunity to speak to the Swedish Library Association about New Approaches to the Catalog.  Stockholm is a great city.  I got a glimpse of some of the Nobel committee at a restaurant where we ate, and had a chance to talk with the Royal Library about some of their FRBR plans.

In November I gave a talk to the FedLink Fall Members Meeting at the Library of Congress on The Future of the Library Catalog: Open, Interactive, Participatory.  There's some overlap between the presentations.  In general I tried to stay with activities that my group here is involved in, more than what the rest of the world (or even OCLC) is doing.

--Th

Virtual hosting

I wrote some time ago about the Sun Grid service.  The prices they were quoting seemed competitive with the actual use we were getting out of our Beowulf cluster.  Over the last few months, however, several groups at OCLC have been experimenting with the cluster, so the in-house cluster might now be ahead on cost, and at the time the Sun Grid was more promise than fact.  But last week I went ahead and set up a 'virtual server' on a commercial service.

I've been trying to get a 'quick' or 'live' search demonstration up for some time now, but because of various problems with our main Research Linux machine in terms of capacity and stability, it's taken longer than I wished.  Another problem is that I tend to prototype this sort of thing in Python using Python's simple built-in HTTP server.  Our security people like to know what software is exposed to the outside world, which pretty much restricts us to using Apache or Apache Tomcat for HTTP.  I see their point, but it does present a barrier for making something public quickly.
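For readers who haven't used it, this is roughly what prototyping on Python's built-in HTTP server looks like.  It is a sketch, not my actual prototype: the phrase list, the `q` parameter, and the port are made up for the example, and it uses the standard library's current `http.server` module.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical in-memory data standing in for a real index.
PHRASES = ["dewey decimal classification", "dewey browser", "live search"]

def render_results(query):
    """Return an HTML list of the phrases that start with the query."""
    q = query.lower()
    hits = [p for p in PHRASES if p.startswith(q)] if q else []
    items = "".join("<li>%s</li>" % html.escape(p) for p in hits)
    return "<ul>%s</ul>" % items

class SearchHandler(BaseHTTPRequestHandler):
    """Answer GET /?q=... with the matching phrases."""

    def do_GET(self):
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        body = render_results(query).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def run(port=8000):
    """Serve until interrupted; the port number is an arbitrary choice."""
    HTTPServer(("localhost", port), SearchHandler).serve_forever()
```

Calling `run()` and visiting `http://localhost:8000/?q=dewey` shows the list.  It is exactly this convenience, a whole server in a few dozen lines, that makes it tempting for prototypes and awkward for a security review.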

So, last week I decided to look into Python hosting services.  Some of these basically offer Zope or Plone support, but I was attracted by the more general service of renting a virtual server.  For a modest cost (looks like my costs will be in the $30-$50/month range) they give you root access to something that looks like your own server.  I can run anything I want with no fear that I'm putting OCLC's servers at risk.  Within a couple of hours of deciding to try it, I had an application up and running.  Of course, along with this freedom came the pleasure of playing sys-admin on my virtual server, but I might be able to convince someone else to do that for me.

Looking around, there are quite a few of these services (I found some links off of Python.org).  Most of them offer packages with set prices for a certain amount of disk, memory, bandwidth and CPU time, but OpenHosting.com bills on the basis of what you use.  Since my current use is heavy on memory but light on everything else, that suited me better.  And if I run out of memory on this server, I can ask for another one.  Support is pretty good here at OCLC, but they have a production mindset and like to schedule things in an orderly way.  For experimental services, renting resources as needed might be easier.  We'll see, but I call this first try a success.

--Th
