
ALA annual

I always try to spend some time in the exhibits when I go to ALA.  It's fun to see the page-turning robots and book sorters.

But what struck me this year was not what was new but what hasn't changed, and I've been going to ALA on and off since 1970.  The exhibits are larger than they used to be, Google and Microsoft show up now, and Google Book Search gets a lot of attention in the meetings, but most of the exhibits haven't changed much at all.  It's still mostly publishers, systems vendors, furniture makers, shelving systems and check-out systems.  Lots of these have some new technology associated with them, but it all seems like evolution, not revolution.  The physical is more evident than the electronic, although maybe that's the nature of such a physical exhibition.

--Th

Asymmetric advantage

Jon Udell has a post about page popularity in Web search engines.  Currently a search for 'Jon' (at least if English is your preferred language) brings him up as the second and tenth entries.  So, I tried searching for 'Thom';  I come up 23rd (OCLC) and 25th (this blog).  Not too bad, but not the first page either.

Lessons that can be drawn from this:

  1. Jon Udell is more popular than I am
  2. An unusual name or, failing that, an unusual spelling helps (I'm not in the first 300 for 'Thomas', but am the 8th 'Hickey' today)
  3. The Web is a strange place

I'm certainly not the 8th most important Hickey in history, but I've been doing things on the Web for a long time, OCLC has a decent Web presence and I have a blog that I occasionally post to.  What is asymmetrical about this is that there are lots of corporations much larger than OCLC that have important people named Hickey, but they are just about invisible.  As the Web becomes more and more the way people get information, this can't be helping those companies.

I've noticed the same phenomenon when people leave OCLC and go somewhere else.  Their Web presence completely disappears if they go to a for-profit, and often shrinks even if they go to an academic institution.

--Th (currently ranking #1 on Google for 'Outgoing')

WorldCat lists

WorldCat has a new feature that lets you create lists of bibliographic items.  This first version has a few glitches and there are lots of features that we expect to add, but that doesn't mean that people aren't doing interesting things with it.  Look for example at a list done by Daniel Cornwall on Oddly titled Government Documents.  Actually, Daniel has several lists worth looking at.

There's an internal contest at OCLC to create lists, but I'm not sure we can beat our real users.

--Th

Thanks to Stu Weibel, who suggested this was worth blogging about.

Identities beta

We have posted a page describing WorldCat Identities and the beta test of it that RLG Programs organized.  We think this is a good example of the sort of interaction we can expect between our two groups (the Office of Research and RLG Programs).

At least slightly related to this, we think we have resolved all the issues on the server that Identities lives on, so it and many of our other services (such as the NDLTD OAI catalog and info-URIs) should soon be back to normal.

--Th

Bibliographic statistics

Occasionally we extract statistics from WorldCat about the usage of MARC fields.  The earliest published version is dated 1981, but there was at least one internal version before that.  The program has undergone several translations as our computer systems have evolved.  The runtime of the latest Python version was approaching a week as the database got larger, but Jenny Toves has reconfigured it so that it now runs in parallel in less than an hour.  The final output is an Excel spreadsheet, which makes it a bit easier to look at the numbers.

Many of the figures in the spreadsheet are weighted by library holdings.  That gives one a better idea of how fields are used in a typical library than the unweighted numbers, which can be significantly different.  Record lengths were calculated as if each record were stored in MARC Communications Format using UTF-8 for the character set.
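As a rough illustration of what such a length calculation involves, here is a minimal Python sketch.  It is not the program that produced the spreadsheet.  It assumes the standard MARC 21 exchange layout (a 24-byte leader, a 12-byte directory entry per field, one-byte field and record terminators, and a one-byte delimiter plus a one-character code in front of each subfield).  The tuple representation of fields and the function names are made up for the example.

```python
LEADER_LENGTH = 24           # fixed-length leader
DIRECTORY_ENTRY_LENGTH = 12  # tag (3) + field length (4) + start position (5)
TERMINATOR_LENGTH = 1        # field terminator and record terminator are 1 byte each
DELIMITER_LENGTH = 1         # subfield delimiter


def utf8_len(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def field_length(field):
    """Bytes a field occupies in the data area, including its terminator.

    Hypothetical representation used only in this sketch:
      control field: (tag, data)
      data field:    (tag, indicators, [(subfield_code, value), ...])
    """
    if len(field) == 2:
        _tag, data = field
        body = utf8_len(data)
    else:
        _tag, indicators, subfields = field
        body = utf8_len(indicators)
        for code, value in subfields:
            body += DELIMITER_LENGTH + utf8_len(code) + utf8_len(value)
    return body + TERMINATOR_LENGTH


def record_length(fields):
    """Total length of a record: leader + directory + data area."""
    directory = DIRECTORY_ENTRY_LENGTH * len(fields) + TERMINATOR_LENGTH
    data_area = sum(field_length(f) for f in fields) + TERMINATOR_LENGTH
    return LEADER_LENGTH + directory + data_area


sample = [
    ("001", "12345"),
    ("245", "10", [("a", "Café"), ("c", "X")]),
]
print(record_length(sample))
```

Note that "Café" counts as five bytes rather than four, which is the reason the choice of UTF-8 matters for the reported lengths.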

The Excel file has a Summary table, a table for all the records, and then separate tables for nine different formats (maps, serials, etc.).  Here's a brief explanation of the column headings in the file:

  • tag -- MARC21 tag
  • occ -- number of unweighted occurrences of the field
  • prec -- percent of records that have that field
  • wtocc -- weighted occurrences
  • wprec -- weighted percentage of records with the field
  • occRec -- occurrences/record (unweighted)
  • lenOcc -- length/occurrence (unweighted)
  • sub -- MARC21 subfield code
  • subocc -- number of unweighted occurrences of the subfield
  • subwtocc -- weighted occurrences of the subfield
  • suboccRec -- subfield occurrences/record (unweighted)
  • sublenOcc -- subfield length/occurrence (unweighted)
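To make the field-level columns concrete, here is a small Python sketch that computes them for a toy set of records.  It is only an illustration under stated assumptions, not the code behind the spreadsheet: the input layout (a holdings count plus a list of (tag, length) pairs per record) is invented for the example, weighting is taken to mean counting each record once per holding library, and occRec is taken to mean occurrences per record that contains the field.

```python
from collections import defaultdict


def tag_statistics(records):
    """Compute per-tag statistics for a list of records.

    Each record is a dict (hypothetical layout for this sketch):
      {"holdings": int, "fields": [(tag, length_in_bytes), ...]}
    """
    total_records = len(records)
    total_holdings = sum(r["holdings"] for r in records)

    occ = defaultdict(int)            # unweighted occurrences
    wtocc = defaultdict(int)          # occurrences weighted by holdings
    recs_with = defaultdict(int)      # records containing the tag
    holdings_with = defaultdict(int)  # holdings of records containing the tag
    length = defaultdict(int)         # total bytes across occurrences

    for r in records:
        seen = set()
        for tag, nbytes in r["fields"]:
            occ[tag] += 1
            wtocc[tag] += r["holdings"]
            length[tag] += nbytes
            seen.add(tag)
        for tag in seen:              # count each record once per tag
            recs_with[tag] += 1
            holdings_with[tag] += r["holdings"]

    stats = {}
    for tag in sorted(occ):
        stats[tag] = {
            "occ": occ[tag],
            "prec": 100.0 * recs_with[tag] / total_records,
            "wtocc": wtocc[tag],
            # Guard against a collection with no holdings at all.
            "wprec": (100.0 * holdings_with[tag] / total_holdings
                      if total_holdings else 0.0),
            "occRec": occ[tag] / recs_with[tag],
            "lenOcc": length[tag] / occ[tag],
        }
    return stats


toy_records = [
    {"holdings": 90, "fields": [("245", 40), ("650", 20), ("650", 30)]},
    {"holdings": 10, "fields": [("245", 60)]},
]
toy_stats = tag_statistics(toy_records)
for tag, row in toy_stats.items():
    print(tag, row)
```

In this toy data the 650 field appears in only half the records (prec of 50) but in records accounting for 90 percent of the holdings (wprec of 90), which shows how far the weighted and unweighted figures can diverge.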

Of course there are lots of different statistics that can be drawn from 80+ million records.  Bill Moen analyzed a copy of WorldCat and has published some statistics and conclusions from it (see Assessing Metadata Utilization... and Examining MARC Records as Artifacts...).

--Th
