Sunday, March 02, 2014

IDCC14 notes, day 1: 4c project workshop

Part 1 in a series of notes on IDCC 2014, the 9th International Digital Curation Conference, held in San Francisco, 24-27 February.

In stark contrast with the 'European' 2013 edition, held last year in my hometown Amsterdam, at this IDCC over 80% of the attendees were from the US. That’s what you get with a west coast location, and unfortunately this was not offset by more delegates from Asia and down under. However, as the conference progressed it became clear that despite the vast differences in organisation and culture, we’re all running into the same problems.

The 4C project is an EU-financed project to make an inventory of the costs of curation (archiving, stewardship, digital permanence etc.). With a two-year project span it’s relatively short. The main results will be a series of reports, a framework for analysis (based on risk management) and the ‘curation cost exchange’, a website where (anonymized) budgets can be compared.
The project held a one-day pre-conference workshop “4C - are we on the right track?” at which a roadmap and some intermediate results were presented, mixed with more interactive sessions for feedback from the participants. It didn’t always work (the schedule was tight) but still it was a day full of interesting discussions.
Neil Grindley noted that since the start of the project the goal has shifted from “just helping people calculate the cost” to a wider context. Beyond the actual cost (model) of curation, the project now also looks at the context, risk management, benefits and ROI. ROI is especially important for influencing decision makers, given the limited resources.

d3-1 - evaluation of cost models and needs gaps analysis draft

Cost models

Cost models are hard to build and difficult to compare. Top topics of interest: risks/trustworthiness, sustainability and data protection issues. Some organizations are unable or unwilling to share budgets. Special praise was given to the Dutch Royal Library (KB) for being very open about disclosing its business costs.
The exponential drop of storage costs has stopped. The rate of decline has fallen from 30-40% to at most 12%. It is impossible to calculate costs for indefinite storage. This led to a remark from the audience: “we're just as guilty as the researchers really, our horizon is the finish of the project.” We have to use different time scales: you need some short-term benefits, but you also have to keep the long term in scope.
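
To get a feel for why the slowdown matters, here is a rough back-of-the-envelope sketch. It is not from the workshop: it assumes the percentages above are annual rates, that the cost of keeping the same data falls by a fixed fraction every year, and it ignores interest, migration and staff. The function name `storage_cost_multiple` is made up for this illustration.

```python
from typing import Optional


def storage_cost_multiple(annual_decline: float, years: Optional[int] = None) -> float:
    """Total storage cost expressed as a multiple of the first year's cost.

    Assumption (not from the 4C project): the yearly cost shrinks by the
    fixed fraction `annual_decline`, so the yearly costs form a geometric
    series 1 + (1 - d) + (1 - d)^2 + ...
    """
    if not 0 < annual_decline < 1:
        raise ValueError("annual_decline must be between 0 and 1 (exclusive)")
    if years is None:
        # Infinite horizon: the geometric series converges to 1 / d.
        return 1 / annual_decline
    # Finite horizon: partial sum of the same geometric series.
    return (1 - (1 - annual_decline) ** years) / annual_decline
```

Under these assumptions, a 40% yearly decline means "forever" costs 2.5 times the first year, while a 12% decline means about 8.3 times the first year. The answer swings wildly with a rate nobody can predict, which is one way to read the remark that indefinite storage costs cannot really be calculated.
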
However, costs are much more than storage. Rule of thumb: 1/2 ingest, 1/3 storage, 1/6 access. Preservation and access are not necessarily linked. An example is the LOC Twitter archive, which they keep on tape. Once (if) the legal issues currently prohibiting opening this archive are resolved, access might be possible via Amazon’s 'open data sets', where you pay for access by using EC2. The economics work because Amazon keeps the data on non-persistent media and provides access, while LOC keeps it on persistent media but provides no access.
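
As a quick illustration of the rule of thumb, this small sketch splits a total curation budget over the three activities. Only the fractions come from the workshop; the names `RULE_OF_THUMB` and `split_budget` and the example amount are made up here.

```python
from fractions import Fraction
from typing import Dict

# Shares quoted at the workshop: 1/2 ingest, 1/3 storage, 1/6 access.
RULE_OF_THUMB = {
    "ingest": Fraction(1, 2),
    "storage": Fraction(1, 3),
    "access": Fraction(1, 6),
}


def split_budget(total: float) -> Dict[str, float]:
    """Divide a total budget according to the rule-of-thumb shares."""
    # Exact fractions make it easy to check that the shares add up to 1.
    assert sum(RULE_OF_THUMB.values()) == 1
    return {activity: float(total * share) for activity, share in RULE_OF_THUMB.items()}
```

For a hypothetical budget of 600 (in any currency), `split_budget(600)` gives 300 for ingest, 200 for storage and 100 for access. In other words, by this rule of thumb storage is only a third of the bill.
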

Other misc notes

A detailed mockup of the cost exchange website was demoed and if all the functionality can be realized, this may be a very useful resource.

The workshop included a primer on professional risk management, based on the ISO 31000 standard. “Just read this standard, it's not very boring!” Originally from engineering, risk management is now considered mature for other fields as well.

German Nestor project, really clear definitions on what a repository is, a useful resource comparable to the JISC reports:

Open Planets Foundation - great tools.

CDL DataShare is online - a really nice, clean interface.

