Tuesday, May 26, 2009

OR09: NSF Datanet-curating scientific data

Open Repositories 2009, Day 1, session 3. NSF Datanet-curating scientif data, John Kunze and Sayeed Choudhury.

The first non-split plenary (why a large part of the first two days consisted of 'split plenaries' baffled me, and I was not the only one).

Two speakers, two approaches. First John Kunze from UCDL, focussing in the microlevel with a strategy of keeping it simple. "Imagining the non-repository", "avoid the deadly embrace" of tight standards: decouple by design, lower the barrier of entry.

One of the ways to accomplish this is by staying lo-tech: instead of fullblown database systems, use a plain file system and naming conventions: pairtree. I really like this approach. I've worked in large digitization projects with third parties delivering content on harddisks. They bulk at databases and complicated metadata schemes, but this might just be doable for them. Good stuff.

CDL has a whole set of curation microsystems, as they call it. I'm going to keep an eye out for this.

The second talk, by Sayeed Choudhury (Johns Hopkins), focussed on the macro level of data conservancy. This was more abstract, and he started out with the admission that "we don’t have the answers, there are unsolved unknowns - otherwise we wouldn’t have gotten that NSF grant".

Interesting: one of the partner institutions (not funded by NSF) is Zoom Intelligence – a venture capital firm, interested in creating software services on research data. First VS's bought into ILS, now they pop up here... we must be doing something right!

Otherwise, the talk was mostly abstract and longer term strategy.

No comments: