Tuesday, May 04, 2010

Notes from CNI Spring meeting 2010

I was fortunate to attend the CNI Spring 2010 Task Force Meeting in Baltimore, USA. This was my second CNI meeting, the first being in 2007. Compared to my previous experience, it struck me how policy has come to dominate the program, where it used to be technology. Maybe it’s because the direction in which we’re heading is clear - complex objects, enriched publications, open access - and the question now is how we get there.

The fragmented setup of research and academia in the US differs greatly from the situation elsewhere, so the focus on policy made the meeting more US-centric, which was a tad disappointing. However, it remains an interesting, intense pressure cooker; afterwards it’s hard to believe it lasted barely a day and a half. Worth the jetlag.

Two sessions stood out for me. The first was a presentation by Jane Mandelbaum from the Library of Congress (LoC) on a collaboration with the Stanford Institute for Computational and Mathematical Engineering (iCME) to create “Metadata remediation tools” (great name!): generating summaries, short titles and geographical data from wads of text.

iCME is located in Silicon Valley, has close ties with companies there - Google, Yahoo, and small start-ups - and deals primarily with algorithms to understand text, especially with taxonomies (which seems to be exactly what Google is trying, too, according to Steven Levy’s April 2010 article in Wired).

Interesting, as we’ve tried this in my organization, and failed miserably. Here it was made to work, though it took two years (!) to iron out the wrinkles between two very different cultures. Also, it’s not an equal partnership; most of the coding takes place in summer jobs, paid for by LoC. The main reason is the nature of LoC’s metadata: its collections differ greatly from one another but are internally consistent, which makes them good candidates to refine algorithms on.
Results for LoC: apart from the code (rough around the edges, scripts rather than applications) and the generated geographical and other metadata, insight into the usefulness and value for money of metadata.

Software via the project site: http://cads.stanford.edu/

Example of unexpected results, a visualization of keyword patterns: http://cads.stanford.edu/lcshgalaxy/more.html

The second was about an incubator approach, outside regular channels, to quickly respond to trends. This presentation struck a chord with the audience; at moments there was an audible roar of keypresses as dozens of people typed notable phrases into their Twitter clients, blogging clients or notepads. One of those moments was when a quote from The Simpsons’ Krusty the Clown came up: "It's not just good, it's good enough!"; another was the motto “there is no blame in trying something that doesn't work”.

I like the setup: a small group, consisting of staff from all departments, including circulation and rare books, who spend at most 5% of their time on it. Membership is limited to two years. The group runs 3-5 risky projects, categorized as “from trivial to easy”.

Examples: putting PD (public domain) image collections on Flickr and YouTube, POD (print-on-demand) books from those Flickr streams with Blurb, maintaining Wikipedia pages, and an iPhone app (made by a CS student). For mobile devices they use Siruna. Some projects were successful, some not. When projects finish successfully, they are transferred to the regular organization; if that doesn’t work, they are killed off rather than left to languish or peter out, as that would be discouraging.

Very pragmatic and useful - and worth copying!

Finally, the lively Twitter traffic is archived at twapperkeeper.com/hashtag/cni10s