Saturday, May 30, 2009

OR09: Repository workflows: LoC's KISS approach to workflow

Open Repositories 2009, day 2, session 6b.

Leslie Johnston (Library of Congress)

My summary:

A practical approach to dealing with data from varying sources: keep it as simple as possible, but not simpler.
The ingest tools look very useful for any type of digitization project, especially when working with an external party (such as a specialized scanning company).
The inventory tool may be even more useful, as lifecycle events are generally not well covered by traditional systems, be it a CMS or an ILS.

Background

LoC acts as a durable storage deposit target for widely varying projects and institutions. Data transfers for archiving range from a USB stick in the mail to 2 TB transferred straight over the network. The answer to dealing with this: simple protocols, developed together with uc digilib (see also John Kunze).

Combined, this is not yet a full repository, but it covers many aspects of ingest and archive functionality. The rest will come. Aim: provide persistent access at file level.

Simple file format: BagIt

The submitter is asked to describe the files in BagIt format.

BagIt is a standard for packaging files; METS files will fit in there, too. However, BagIt was created because we needed something much, much, much simpler. It’s not as detailed: the description is a manifest, and it may omit relationships, individual descriptions, etc. It is very lightweight (actually too light: we’ve started creating further profiles for certain types of content).

LoC will support BagIt alongside MODS & METS, and in the same way.
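
To make the "description is a manifest" idea concrete, here is a minimal sketch of writing a bag by hand. It is not the LoC tooling; it assumes the layout from the later published BagIt specification (RFC 8493): a bagit.txt declaration, a data/ directory for the payload, and a manifest listing one checksum per file. The function name make_bag and the sample file names are made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir: Path, files: dict[str, bytes]) -> Path:
    """Write payload files under data/ plus a SHA-256 manifest (hypothetical helper)."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for rel_name, content in sorted(files.items()):
        target = data_dir / rel_name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: "<checksum>  <path relative to the bag>"
        manifest_lines.append(f"{digest}  data/{rel_name}")

    # The bag declaration; version 1.0 is assumed here, the 2009 drafts used lower numbers.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return bag_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = make_bag(Path(tmp) / "scans", {"page-001.txt": b"first page"})
        print((bag / "manifest-sha256.txt").read_text(encoding="utf-8"))
```

Note how little there is: no relationships between files and no per-file description, which is exactly the trade-off described above.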

Simple tools

Simple tools for ingest:
- parallel receiver (can handle network transactions over rsync, ftp, http, https)
- validator (checks file format)
- verifyit (checksums files)
These tools are supplied as a Java library, a Java desktop application, and the LocDrop web app (a prototype for SWORD ingest).
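
The checksum step is easy to picture. The sketch below is not the LoC verifyit code (which is Java); it only illustrates the idea under the assumption that the bag carries a manifest-sha256.txt with one "checksum path" pair per line. The names file_digest and verify_manifest are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large transfers need not fit in memory."""
    h = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(bag_dir: Path, algorithm: str = "sha256") -> list[str]:
    """Return a list of problems; an empty list means every listed file checks out."""
    problems = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        # Each line is assumed to be "<checksum> <relative path>".
        expected, rel_path = line.split(None, 1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif file_digest(target, algorithm) != expected.lower():
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

Calling verify_manifest(Path("some-bag")) right after a transfer gives the receiver a list of files that are missing or damaged before anything is recorded in the inventory.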

Integration between transfer and inventory is very important: trying to retrieve the correct information later is very hard.

After receiving, the inventory tool records lifecycle events.
Why a standardized tool: 80% of the workflow overlaps between projects.


All tools are available as open source [sourceforge]. What's currently missing will be added soon.
