Library spring

IDCC14 notes, day 2: keynote Atul Butte

2014-03-06T17:52:00.002+01:00

Part 2 in a series of notes on IDCC 2014, the 9th International Digital Curation Conference, held in San Francisco, 24-27 Feb.

Day two kicked off with a fantastic keynote by Atul Butte, Associate Professor in Medicine and Pediatrics, Stanford University School of Medicine: Translating a trillion points of data into therapies, diagnostics and new insights into disease [PDF] [Video on Youtube]. This one was well worth a separate blogpost.

Butte starts his presentation with some great examples of how the availability of a wealth of open data has already radically changed biomedical research. Over one million datasets are now openly available in the GeneChip standardized format. A search for breast cancer samples in the NCBI GEO datasets database gives 40k results, more than the best lab will ever have in its stores. And PubChem has more samples than all pharma companies combined, completely open.

The availability of this data is leading to new developments. Butte cites a recent study that, by combining datasets, revealed ‘overfitting’, where everybody does an experiment in exactly the same way, leading to reproducible results that are irrelevant to the real world.

But this is tame compared to the change in the whole science ecosystem with the advent of online marketplaces. Butte goes on to show a number of thriving ecommerce sites - “add to shopping cart!” - where samples can be bought for competitive prices. Conversant Bio is a marketplace for discarded samples from hospitals, with identifiers stripped off. Hospitals have limited freezer space, have biopsy samples that can be sold, and presto. What about the ethics? "Ethics is a regional thing. They can get away with a lot of stuff in Boston we can't do in Palo Alto." Now any lab can buy research samples for a low price and develop new blood marker tests. A test for preeclampsia, the disease now best known from Downton Abbey, was recently developed this way.

Marketplaces have also sprung up for services, such as AssayDepot.com. This is a clearinghouse for medical research services, including animal tests. Thousands of companies provide these worldwide. Butte stresses that it is not just a race to the bottom and to China, but that this also creates opportunity for new specialised research niches, such as a lab specializing in mouse colonoscopies. It also makes it possible to do real double blind tests by just buying more tests from different vendors (with different certifications, just to spread). This makes it especially interesting to investigate other effects of tested and approved drugs. Which is a good thing, because the old way of research on new drugs is not sustainable when patents run out (the “pharma patent cliff of 2018”).

This new science ecosystem is built on top of the availability of open data sets, but there are questions to be solved for the sustainability. Butte sees two players here, funders and repositories themselves.

Incentives for sharing are lacking. Altmetrics are just beginning, and funders need to kick in. Secondary use grants are an interesting new development. Clinical trials may be the next big thing. They are the most expensive experiments in the world, costing $200 mln each. 50% fail and not even a paper is written about them... Butte expects funders to start requiring publications on negative trials and publishing of the raw data.

The international repositories are at the moment mostly government funded and this may run out. Butte thinks that mirroring and distributing is the future. He also stresses that repositories need to bring the cost down - outsourcing! - and show real use cases that will inspire people. The repositories that will win are the ones that yield the best research.

IDCC14 notes, day 1: 4c project workshop

2014-03-02T19:35:00.001+01:00

Part 1 in a series of notes on IDCC 2014, the 9th International Digital Curation Conference, held in San Francisco, 24-27 Feb.

In stark contrast with the 'European' 2013 edition, held last year in my hometown Amsterdam, at this IDCC over 80% of the attendees were from the US. That’s what you get with a west coast location, and unfortunately it was not made up by more delegates from Asia and down under. However as the conference progressed it became clear that despite the vast differences in organisation and culture, we’re all running into the same problems.

IDCC 2014 Day 1: pre-conference workshop on 4C project

4cproject.eu is an EU financed project to make an inventory of the costs of curation (archiving, stewardship, digital permanence etc.). With a 2 year project span it’s relatively short. The main results will be a series of reports, a framework for analysis (based on risk management) and the ‘curation cost exchange’, a website where (anonymized) budgets can be compared.
The project held a one-day pre-conference workshop “4C - are we on the right track?” at which a roadmap and some intermediate results were presented, mixed with more interactive sessions for feedback from the participants. It didn’t always work (the schedule was tight) but still it was a day full of interesting discussions.
Neil Grindley noted that since the start of the project the goal has shifted from “just helping people calculate the cost” to a wider context. Beyond the actual cost (model) of curation, this also covers the context, risk management, benefits and ROI. ROI is especially important for influencing decision makers, given the limited resources.

d3-1 - evaluation of cost models and needs gaps analysis draft

Cost models

Cost models are difficult to compare and hard to do. Top topics of interest: risks/trustworthiness, sustainability and data protection issues. Some organizations are unable or unwilling to share budgets. Special praise was given to the Dutch Royal Library (KB) for being a very open organisation for disclosing their business costs.
The exponential drop of storage costs has stopped. The rate has fallen from 30-40% to at most 12%. It is impossible to calculate costs for indefinite storage. This led to a remark from the audience: "we're just as guilty as the researchers really, our horizon is the finish of the project." We have to use different time scales - you have to have some short term benefits, but also keep the long term in scope.
However, costs are much more than storage. Rule of thumb: 1/2 ingest, 1/3 storage, 1/6 access. Preservation and access are not necessarily linked. An example is the LOC twitter archive, which they keep on tape. Once (if) legal issues currently prohibiting opening this archive are resolved, access might be possible via amazon’s 'open data sets' where you pay for access by using EC2. The economics work because amazon keeps it on non-persistent media and provides access, and LOC keeps it on persistent media but provides no access.
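To make the numbers in these notes a bit more tangible, here is a minimal Python sketch. Only the 1/2, 1/3, 1/6 split and the 12% and 30-40% decline rates come from the workshop; the budget, the first-year storage cost and the 20-year horizon are made-up figures for illustration, and the function names are my own.

```python
from fractions import Fraction

# Rule-of-thumb split quoted at the workshop.
RULE_OF_THUMB = {
    "ingest": Fraction(1, 2),
    "storage": Fraction(1, 3),
    "access": Fraction(1, 6),
}


def split_budget(total):
    """Divide a total curation budget according to the rule of thumb."""
    return {part: float(total * share) for part, share in RULE_OF_THUMB.items()}


def cumulative_storage_cost(first_year_cost, yearly_decline, years):
    """Sum yearly storage costs when the price drops by a fixed rate each year."""
    return sum(first_year_cost * (1 - yearly_decline) ** year for year in range(years))


if __name__ == "__main__":
    # Hypothetical total budget of 60,000 (any currency).
    print(split_budget(60000))
    # Hypothetical first-year storage cost of 1,000 over a 20-year horizon,
    # once with the old decline rate and once with the new one.
    for rate in (0.35, 0.12):
        print(rate, round(cumulative_storage_cost(1000, rate, 20)))
```

The second function only covers a fixed horizon on purpose: it shows how much a slower price decline adds to the bill over a project-sized period, not what indefinite storage would cost.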

Other misc notes

A detailed mockup of the cost exchange website was demoed and if all the functionality can be realized, this may be a very useful resource.

The workshop included a primer on professional risk management, based on ISO 31000 standard. “Just read this standard, it's not very boring!”. Originally from engineering, risk management is now considered mature for other fields as well.

German Nestor project, really clear definitions on what a repository is, a useful resource comparable to the JISC reports:
www.crl.edu/focus/article/394
www.langzeitarchivierung.de/Subsites/nestor/DE/Home/home_node.html

Open Planets Foundation - great tools.

CDL DataShare is online - a really nice, clean interface.

The EJME plugin: improving OJS for articles with data

2012-02-22T14:46:00.002+01:00

The EJME project has wrapped up and delivered! To quote the press release from SURFfoundation: "Enhanced Publications now possible with Open Journal Systems - Research results published within tried-and-tested system using plug-ins". That's all great, and so is the documentation, but aimed at those in the know already. A little more explanation is needed.

Who is EJME for?
Any journal that uses OJS for publishing and that wants to make it possible to have data files attached to articles (and as of December 2011, that's 11,500 Journals!).

What does it do?
Three things:

improves the standard OJS handling of article 'attachments': files are available to editors and peers during the review process, and the submission process has been made (a little) easier;
plays nicely with external data repositories: an attachment can be a link to a file residing elsewhere (but works just like an internal OJS attachment in the review and publishing stage), and an internal attachment that an author has submitted with the article can also be submitted to a data repository, creating a 'one-stop-shop' experience for the author;
on publication, it automagically creates machine-readable descriptions of an article and its data files (in tech-speak: these are small XML files, so-called Resource Maps, in the OAI-ORE standard). These can be harvested by aggregators such as the Dutch site Narcis that can then do more great and wonderful things with it, for example slick visualizations.
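For the curious, this is roughly what such a Resource Map boils down to. The sketch below is not the plug-in's code or its actual output: it is a minimal Python illustration that uses the standard OAI-ORE terms `ore:describes` and `ore:aggregates` in RDF/XML, with made-up example.org URIs and a function name of my own. A real Resource Map carries more metadata, such as creator and modification date.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("ore", ORE)


def build_resource_map(rem_uri, aggregation_uri, resource_uris):
    """Return a bare-bones RDF/XML Resource Map as a string."""
    root = ET.Element(f"{{{RDF}}}RDF")
    # The Resource Map describes one aggregation (the enhanced publication).
    rem = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": rem_uri})
    ET.SubElement(rem, f"{{{ORE}}}describes", {f"{{{RDF}}}resource": aggregation_uri})
    # The aggregation lists the article and its data files.
    agg = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": aggregation_uri})
    for uri in resource_uris:
        ET.SubElement(agg, f"{{{ORE}}}aggregates", {f"{{{RDF}}}resource": uri})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    print(build_resource_map(
        "http://example.org/article/1/rem.xml",
        "http://example.org/article/1#aggregation",
        ["http://example.org/article/1/fulltext.pdf",
         "http://example.org/article/1/dataset.csv"],
    ))
```

An aggregator can walk the `ore:aggregates` links to find out which files belong to which article.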

Great, but I only want some of that!
That's perfectly possible. If you want only the improved handling, it's included in the latest OJS version. The other two are in separate plug-ins, so install only what you need. I do recommend installing the resource map plug-in though: it won't require any work after installing.

What does it cost?
Just like OJS itself, the plug-in is open source and free of cost. Installation is as easy as most OJS plug-ins.

What does the journal have to do?
Of course, software is only a tool. The real question is deciding what to do with it. Does the journal want a mandatory Data Access policy? Is there a data repository in the field to cooperate with? Once these questions are answered, the journal policy and editorial guidelines will need to be changed to reflect them.

Why would my journal want data along with articles?
As science becomes more and more data-oriented (and that includes the humanities), publishing data along with articles becomes essential for the peer review system to function. There have been too many examples lately of data manipulation that would have been found out by reviewers if they had checked the data. And for that, they need access to the data. Reviewers of course won't change their habits suddenly once data is available to them, but it's a necessary first step.
(There are many other reasons, both carrots and sticks, for the greater good or the benefit of journal and author, but IMHO this is the pivotal point).

Q: Why name it EJME, such a silly name?
Enhanced Journals Made Easy was a little optimistic, I admit. Enhanced Journals Made (A Little) Easier would have been better. You live and learn!

Want to know more about EJME? Get started with the documentation.

OR11: Misc notes

2011-07-02T09:21:00.001+02:00

I like going to conferences alone: it’s much easier to meet new people from all over the world than when you’re with a group, as groups tend to cling together. With a multi-track conference like OR11, however, the downside is that there’s so much to miss, especially since I like to check out sessions from fields I’m not familiar with. At OR11, I wanted to take the pulse of DSpace and Eprints, and not just faithfully stick with the Fedora talks.

In this entry, I focus on bits and bobs I found noteworthy, rather than give a complete description. I skip over sessions that were excellent but have already been widely covered elsewhere (for instance at library jester), such as Clifford Lynch's closing plenary.

“Sheer Curation” of Experiments: data, process and provenance, Mark Hedges
slides [pdf]

"Sheer curation" is meant to be lightweight, with curation quietly integrated in the normal workflow. The scientific process is complex, with many intermediate steps that are discarded. The deposit-at-the-end approach misses these. The goal of this JISC project is to capture provenance and experimental structure. It follows up on Scarp (2007-2009).

I really liked the pragmatic approach (I've written this sentence often - I really like pragmatism!). As the researchers tend to work on a single machine and heavily use the file system hierarchy, they wrote a program that runs as a background process on the scientists’ computer. Quite a lot of metadata can be captured from log files, headers, filenames. Notably, it also helps that much work on metadata and vocabulary has already been done in the field in the form of limited practices and standards.

Being pragmatic also means discarding nice-to-haves such as persistent identifiers. That would require the researchers to standardise beyond their own computer and that’s asking too much.

The final lesson learned sounded familiar: it took more, much more time than anticipated to find out what it is the researchers really want.

SWORD v2

SWORD2: looks promising and useful, and actually rather simple. Keeping the S was a design constraint. Hey, otherwise we’d end up with Word, and one is more than enough!

Version 2 will do full Create/Read/Update/Delete (CRUD), though a service can always be configured to deny certain actions. It’s modelled on Google’s Gdata and makes elegant use of Resource Maps and dedicated action URLs.
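A rough idea of what CRUD over HTTP looks like, as a Python sketch. This is not the SWORD client API: the verb mapping is the usual AtomPub-style one, which I'm assuming here rather than quoting from the talk, and the URLs, the `allowed` parameter and the function name are hypothetical. The requests are only built, never sent.

```python
from urllib.request import Request

# Conventional AtomPub-style mapping of CRUD actions to HTTP methods
# (an assumption for this sketch, not taken from the talk).
CRUD_METHODS = {
    "create": "POST",
    "read": "GET",
    "update": "PUT",
    "delete": "DELETE",
}


def build_request(action, url, body=None, allowed=frozenset(CRUD_METHODS)):
    """Build (but do not send) the HTTP request for one CRUD action.

    `allowed` mimics a service that is configured to deny certain actions.
    """
    if action not in CRUD_METHODS:
        raise ValueError(f"unknown action: {action}")
    if action not in allowed:
        raise PermissionError(f"service does not allow: {action}")
    return Request(url, data=body, method=CRUD_METHODS[action])


if __name__ == "__main__":
    # Hypothetical deposit URL, for illustration only.
    req = build_request("create", "http://example.org/sword/collection", body=b"<entry/>")
    print(req.get_method(), req.full_url)
```

A deposit-only service would simply pass `allowed={"create", "read"}` and refuse the rest.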

CottageLabs, one of the design partners, made an introduction video to Sword v2 demonstrating how it works:

It looks really useful and indeed still easy (as per Einstein's famous quip, as simple as possible but not simpler). If you’re a techie, dive into SwordApp.org. If you’re not, just add Sword compliance to your project requirements!

Ethicshare & Humbox, two sessions on community building

Two examples of successful subject-oriented communities that feature a repository, each with some good ideas to nick.

Ethicshare is a community repository that aggregates social features for bioethics:

one of the project partners is a computer scientist who studies social communities. Because of this mutual interest (for the programmer it’s more than just a job) they have had the resources to fine tune the site.
the field has a strong professional society that they closely work with.
glitches at the beginning were a strong deterrent to success - so yes, release early and often, but not with crippling bugs!
the most popular feature is a folder for gathering links, and many people choose to make them public (it’s private by default).
before offering it to the whole site, new features are tried out on a small, active group of around 30 testers.
for the next grant phase they needed more users quickly, so they bought ads. $300 for Facebook ads yielded 500 clickthroughs, $2000 for Google ads yielded 5000. This (likely) contributed to the number of unique visitors rising from 4k to 20k per month. Tentative conclusion: these ads cost relatively little and are effective for such a specialized subject, as the targeting is really quite good.
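The ad figures are easier to compare as cost per clickthrough. A tiny Python sketch, using only the numbers mentioned in the session (the dictionary and function names are my own):

```python
# Spend in dollars and clickthroughs, as reported in the session.
CAMPAIGNS = {
    "Facebook": (300, 500),
    "Google": (2000, 5000),
}


def cost_per_click(spend, clicks):
    """Average cost of one clickthrough."""
    return spend / clicks


if __name__ == "__main__":
    for name, (spend, clicks) in CAMPAIGNS.items():
        print(f"{name}: ${cost_per_click(spend, clicks):.2f} per clickthrough")
```

That works out to 60 cents per clickthrough on Facebook and 40 cents on Google.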

Lessons from the UK based Humbox project:

approach: analyse what scientists were doing already in real life, in paper and file cabinets, mimic it and extend it.
"the repository is not about research papers, it is about the people who write them": the profile page is the heart, putting the user at the centre. Like Facebook’s, it has two distinct views: an outside version about you (to show off), and an internal version for you (with your interests). This reminds me of the success of the original, pre-yahoo delicious, which also cleverly put self-interest first with the social sharing as a side-effect.
Find a need that's not covered by existing systems: Humbox fills a need to share stuff, not just with students - for that the LCMS is the natural place to go to - but with colleagues, since the course-centric nature of LCMS’s tends to lock colleagues out.
Most feedback came from community workshops. Participants often became local evangelists.
Comments often were corrections. 60% of the authors changed a resource after a comment - and the 40% of comments not leading to a correction also include positives, so the attitude towards criticism was quite positive.
over 50% of users modified or augmented material from others, sometimes reuploading it to the site.
Humbox only takes Creative Commons licenses, with an educational side-effect: some users indicated they also started looking in other places (such as flickr) for cc material as a result.

The Learning Registry: “Social Networking for Metadata”
slides [google docs]

I just want to mention this for the sheer scope and size of this initiative. It’s [expletive] ambitious.

The aim is to gather all social networking metadata! To limit the scope, they won’t do normalising, or offer search or a query api; that's all left to the users of the gathered dataset. But by all, they mean everything on the net: data, metadata and paradata (by which I understand they mean the relationships with other data).

Agreements are in the works with major partners (see last slide). The big elephant in the room was Facebook (no surprise, sigh) which wasn’t mentioned at all. (as I'm writing this, Google+ has just been announced, there is some hope after all of the slightly creepy evil eventually triumphing over the even more evil).

They call their approach a do-ocracy. Very agile design principles. Real-time everything in the open: all code and specs are written directly in Google Docs (table of contents, a google spreadsheet). NoSQL master-master storage system, well thought-out architecture, production will run on ec2. Everything will be open, except data harvested from commercial partners.

Something to keep an eye on: www.learningregistry.org.

Finally...

MODS is the new DC. In recent projects, MODS seems to have replaced Dublin Core as the baseline standard for metadata exchange. Interesting development.

OR11: New in EPrints 3.3: large scale research data, and the Bazaar.

2011-06-29T13:08:00.002+02:00

As I mentioned in the overview, I was very impressed by what's happening in the Eprints community. The new features of the upcoming 3.3 are impressive as they seem to strike the right balance between pragmatism and innovation. Thanks to an outstandingly enthusiastic and open developer community, they're giving DSpace (and to a lesser extent Duraspace) a run for their money.

"Energize":
could've been the motto of the Eprints community

Support for research data repositories

The new support for large scale research data is also a hallmark of pragmatic simplicity. EPrints avoids getting very explicit about subject data classification and control, taking a generic approach that can be extended.

Research data can come in two container datatypes, ‘Dataset’ and ‘Experiment’. A Dataset is a standalone, one-off collection of data. The metadata reflects the collection. The object can contain one or more documents, and must also have a read-me file attached, which is a human-oriented manifest. Machine-oriented complex metadata is possible, but requiring it would deter actual use.

The other datatype is Experiment. This describes a structural process that may result in many datasets. The metadata reflects process and supports the Open Provenance Model.

Where the standard metadata don’t suffice, one of the data streams belonging to the object can be an xml file. If I understood correctly, xpath expressions can then be used for querying and browsing. Effectively this removes the shackles of the standard metadata definitions and creates flexibility similar to Fedora. It's very similar to what we're trying to do in the FLUOR project with a SAKAI plugin that acts as a GUI for a data repository in Fedora. Combining user-friendliness with configurable, flexible metadata schemes is a tough one to pull off, so I'll certainly keep an eye on the way EPrints accomplishes this.
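To illustrate the idea of querying an attached xml file with xpath, here is a small Python sketch. It has nothing to do with the EPrints code itself: the xml structure, element names and function name are invented for the example, and Python's ElementTree only supports a subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file attached to an Experiment object.
SAMPLE = """<experiment>
  <instrument>spectrometer</instrument>
  <run id="1"><temperature unit="K">293</temperature></run>
  <run id="2"><temperature unit="K">310</temperature></run>
</experiment>"""


def query(xml_text, path):
    """Return the text of every element matching a (limited) XPath expression."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.findall(path)]


if __name__ == "__main__":
    # All temperatures, then only the one from the second run.
    print(query(SAMPLE, "./run/temperature"))
    print(query(SAMPLE, "./run[@id='2']/temperature"))
```

The point is that the repository does not need to know the schema in advance: whoever configures the browse views only has to supply the right expressions.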

The Bazaar

The EPrints Bazaar is a plug-in management system and/or an ‘App Store’ for EPrints, inspired by Wordpress. For an administrator it's fully GUI driven, versatile and pretty fool-proof. For developers it looks pretty easy to develop for (I had no trouble following the example with my rusty coding skills).

The primary design goal was that the repository including API must always stay up. They’re clever bastards: they based the plug-in structure on the Debian package mechanism, including the tests for dependencies and conflicts, which makes it very stable. Internally, they’ve run it for six months without a single interruption. Now that’s eating your own dog food!
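The gist of such dependency and conflict tests fits in a few lines. The Python sketch below is my own illustration of the principle, not the Bazaar's or Debian's actual implementation; the catalogue, the plug-in names and the function name are all hypothetical.

```python
# Hypothetical plug-in catalogue: what each plug-in needs and what it clashes with.
CATALOGUE = {
    "core-export": {"depends": [], "conflicts": []},
    "cerif-export": {"depends": ["core-export"], "conflicts": []},
    "legacy-export": {"depends": [], "conflicts": ["cerif-export"]},
}


def check_install(package, installed, catalogue=CATALOGUE):
    """Return a list of problems; an empty list means it is safe to install."""
    meta = catalogue[package]
    problems = []
    # Every dependency must already be installed.
    for dep in meta["depends"]:
        if dep not in installed:
            problems.append(f"missing dependency: {dep}")
    # The new plug-in must not clash with anything installed...
    for other in meta["conflicts"]:
        if other in installed:
            problems.append(f"conflicts with: {other}")
    # ...and nothing installed may declare a conflict with the new plug-in.
    for other in installed:
        if package in catalogue.get(other, {}).get("conflicts", []):
            problems.append(f"{other} conflicts with: {package}")
    return problems


if __name__ == "__main__":
    print(check_install("cerif-export", set()))
    print(check_install("cerif-export", {"core-export"}))
```

Because the check runs before anything is touched, a plug-in that would break the repository never gets installed in the first place.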

Off the beaten track

EPrints as a CRIS

The third major new functionality of 3.3 is CERIF import & export. Primarily this is meant to link eprints repositories automatically to CRIS systems, but for smaller institutions that need to comply with reports in CERIF format but don’t have a system yet, using eprints itself may suffice as pretty much all the necessary metadata is in there. The big question is whether the import/export would allow a full lossless roundtrip. As I joined this session halfway (after an enthusiastic tweet prompted me to change rooms), I might've missed that.

This sounds very appealing to me. Unfortunately, the situation in the Netherlands is very different, as a CRIS has been mandatory for decades for the Dutch Universities. Right now we’re in the middle of a European tender for a new, nationwide system, and the only thing I can say is that it’s not without problems. How I’d love to experiment with this instead in my institution, but alas, that won't be possible politically.

The EPrints attitude

As Les Carr couldn’t make it stateside, he presented it from the UK. The way this was set up was typical for the can-do attitude of the eprints developers: Skypeing in to a laptop which was put before a mike, and whenever the next slide was needed Les would cheerily call out ‘next slide please!’, after which the stateside companion theatrically reached out for the spacebar of the other laptop, connected to the beamer. Avoid neat technology for technology’s sake and keep it simple and effective.

OR11: opening plenary

2011-06-22T16:49:00.000+02:00

See also: OR11 overview

The opening session by Jim Jagielski, President of the Apache Software Foundation, focussed on how to make an open source development project viable, whether it produces code or concepts. As El Reg reports today, doing open source is hard. The ASF has a unique experience in running open projects (see also is apache open by rule). Much nodding in agreement all around, as what he said made good sense, but is hard to put in practice. Some choice advice:

Communication is all-important. Despite all the new media that come and go, the mailing list still is king. Any communication that happens elsewhere - wikis, IRC, blogs, twitter, FB, etc - needs to be (re)posted to the list before it officially exists and can be considered. A mailing list is a communication channel which is asynchronous and participants can control themselves, meaning read or skip it at their time of choice, not the time mandated by the medium. A searchable archive of the list is a must.

Software development needs a meritocracy. Merit is built up over time. It’s important that merit never expires, as many open source committers are volunteers who need to be able to take time off when life gets in the way (babies, job change, etc).

You need at least three active committers. Why three? So they can take a vote without getting stuck. You also need ‘enough eyeballs’ to go over a patch or proposal. A vote at ASF needs minimally three positive votes and no negatives.
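The voting rule as described in the talk is simple enough to write down. A minimal Python sketch, assuming votes are recorded as +1, 0 or -1 (that encoding and the function name are my own choices):

```python
def vote_passes(votes):
    """Apply the rule from the talk: at least three positive votes, no negatives.

    `votes` is a list of +1 (in favour), 0 (abstain) or -1 (against).
    """
    positives = sum(1 for vote in votes if vote > 0)
    negatives = sum(1 for vote in votes if vote < 0)
    return positives >= 3 and negatives == 0


if __name__ == "__main__":
    print(vote_passes([1, 1, 1]))       # three in favour
    print(vote_passes([1, 1, 1, -1]))   # one against blocks it
    print(vote_passes([1, 1, 0]))       # not enough positives
```

It also shows why two committers are not enough: with fewer than three people able to vote in favour, nothing can ever pass.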
To create a community, you also need a ‘shepherd’, someone who is knowledgeable yet approachable by newbies. It’s vital to keep a community open, so as not to let the talent pool become too small. To stay attractive, you need to find out what’s the ‘itch’ that your audience wants to scratch.

The more 'idealistic' software licenses (GPL and all) are "a boon firstmost to lawyers", because the terms ‘share alike’ and ‘commercial use’ are not (yet) clear in juridical context. Choosing an idealistic license can limit the size of the community for projects where companies play a major role. A commenter added that this mirrors the problems of the Creative Commons licenses. In a way, the apache license mirrors CCzero, which CC created to tackle those.

Open Repositories 2011 overview

2011-06-21T21:01:00.010+02:00

Open Repositories was great this year. Good atmosphere, lots of interesting news, good fun. It's hard to make a selection from 49k of notes (in raw utf8 txt!). This post is a general overview, more details (and specific topics) will follow later.

Texas State History Museum

My key points:

1. Focus on building healthy open source communities

The keynote by Jim Jagielski, President of the Apache Software Foundation, set the tone for much of what was to come. An interesting talk on how to create viable open source projects from a real expert. The points raised in this talk came back often in panel discussions, audience questions and presentations later.
More details here.

2. The Fedora frameworks are growing up

Both Hydra and Islandora now have a growing installed base, commercial support available, and a thriving ecosystem. They've had to learn the lessons on open source building the hard way, but they have their act together. Fez and Muradora were only mentioned in the context of migrating away.
Also, several Fedora projects that don't use Hydra still use the Hydra Content Model. If this trend of standardizing on a small number of de facto standard CM's continues, that would greatly ease mixing and moving between Fedora middleware layers.

3. Eprints’ pragmatic approach: surprisingly effective and versatile

Out of curiosity I attended several EPrints sessions, and I was pleasantly surprised, if not stunned by what was shown. Especially the support for research data repositories looks to strike the right balance between supporting complex data and metadata types, while keeping it simple and very usable out-of-the box. And also the Bazaar, which tops Wordpress in ease of maintenance and installation, but on a solid engineering base that's inspired by Debian's package manager. Very impressive!
More details here.

Texans take 'em by the horns!

Misc. notes
See part #3: Misc notes

Elsewhere on the web

OR11 Conference program, presentations.
Richard Davis, ULCC: #1 overview, #2 the Developers Challenge, #3: eprints vs. dspace.
Disruptive Library Technology Jester day 1, day 2, day 3.
Leslie Johnson - a good round-up with focus on practical solutions.
#or11 Tweet archive on twapperkeeper

Photosets: bigD, keitabando, yours truly, all Flickr images tagged with or11, Adrian Stevenson (warning: FB!).

Other observations

Unlike OR09, the audience was not very international. Italians and Belgians were relatively overrepresented with three and six respectively. I spotted just one German, one Swede and one Swiss, and I was the lone Dutchman. The UK was the exception, though many were presenters of JISC funded projects, which usually have budget allocated for knowledge dissemination.

As OR alternates between Europe and the US, the ratio of participants tends to be weighted to the 'native continent' anyway. But the recession seems to be hitting travel budgets hard in Europe now.
As there were interesting presentations from Japan, Hong Kong and New Zealand, the rumour floating around that OR12 might be in Asia sounded attractive. I'd be very curious to hear more about what's going on there in repositories and open access. The location of OR12 should be announced within a month, let's see.

[updated June 27th, added more links to other writeups; updated June 28, added Hydra CM uptake]

2011-06-20T13:43:00.001+02:00

Catching up on old news, I came across an interesting presentation on CNI this spring on the Data Management Plans initiative. Abstract, recording of the presentation on youtube, slides.

DMP online is a great starting point (and one of the inspirations for CARDS) and this looks like the right group of partners to extend it into a truly generic resource. What's notable about the presentation is also the sensible reasons outlined for collaboration between this quite large group of prestigious institutions. All in all, something to keep an eye on.

Don't panic! Or, further thoughts on the mobile challenge

2010-10-05T17:16:00.000+02:00

Two weeks ago, I posted some notes on the CILIP executive briefing on 'the mobile challenge', where I presented the effort of my library, the quick-wins 'UBA Mobiel' project. Those notes concentrated on the talks on the day. Now that it's had time to simmer (and a quick autumn holiday), I want to add some reflection on the general theme.

Which basically boils down to Don't Panic (preferably in large, friendly letters on the cover).

Is there really such a thing as a 'mobile challenge' for libraries? Well, yes and no. Yes, the use of internet on mobile devices is growing fast, and is adding a new way of searching and using information for everyone, including library patrons. The potential of 'always on' is staggering. And it is a challenge.

However, it is also just another challenge. After twenty years of continuous disruption, starting with on-line databases, then web 1.0 and web 2.0, change is not new any more. Libraries are still gateways to information, rare and/or expensive (the definition of expensive and rare depending and varying on the context, also changing of course). And the potential of the paperless office may finally come to fruition with the advent of the iPad, but meanwhile printer makers are having a boom selling ever more ink at ridiculous prices.

So, what to do?

There are three ways to adapt. On one side are the forerunners, with full focus on the new and shiny. Forerunners get the spotlights, and tend to be extroverts that make good presentations. However, not everyone can be in front - it would get pretty crowded. It takes resources, both money and a special kind of staff. Two prominent examples given at several of the Cilip talks were NCSU and DOK Delft. Kudos to them, they're each doing exciting stuff, but they are also the usual suspects, and that's no coincidence.

On the other extreme, there's not changing at all. For the institution, a certain road to obsolescence. For a number of library staff the easy way to retirement. Fortunately, their number seems to be rapidly dwindling, but nevertheless, finding the right staff to fulfil the jobs at libraries or publishers when the descriptions of these jobs are in flux was a much talked about topic, both in the talks and in the breaks.

In practice, most libraries are performing a balancing act in between. And it is perfectly acceptable to be in the middle. Keep an eye on things. Stay informed. Make sure your staff gets some time to play with the toys that the customers are walking around with, and if they find out what's on offer in the library is out of sync, do something about it.

[from tuesday tech]

Which is pretty much what we did with UBA Mobiel. Nothing world-shattering, not breaking the bank. We're certainly not running in front, but we're making sure our most important content (according to the customers) is usable. This way, when the chance comes along to do Something Utterly Terrific (Birmingham) or merely a Next Step Forward (upgrading our CMS) we know what to focus on.

The response on our humble little project has been very positive. We may have hit a nerve, and I'm really glad to hear that it is inspiring others to get going. Go-Go Gadget Libraries!

Becoming upwardly mobile - a Cilip executive briefing

2010-09-17T14:43:00.001+02:00

Cilip office in Bloomsbury, London

On September 15, Cilip (the UK Chartered Institute of Library and Information Professionals) and OCLC held a meeting on the challenge that mobile technology poses for libraries, called Becoming upwardly mobile Executive Briefing.

The attendees came from the British Isles (UK and Ireland). Some of the speakers however came from elsewhere. Representing The Netherlands, I presented the UBA Mobiel project as a case study, which went well.

The mere fact that I was asked to present our small low-key project - which in the end cost less than 1100 euro and 200 hours - as a case study alongside the new public library in Birmingham with a budget of 179 million pounds sterling shows how diverse the subject 'the mobile challenge' is.

Thus the talks varied widely, and especially the panel discussion suffered from a lack of focus. It was interesting nevertheless.

Attendees were encouraged to turn their mobiles on and tweet away, and a fair number of them did. See Twitter archive for #mobexec at twapperkeeper.

1. Adam Blackwood, JISC RSC

A nice wide-ranging introduction in a pleasant presentation, using lots of lego animation. In one word: convergence. To show what a modern smartphone can do, he emptied his pockets, then went on from a big backpack, until the table in front of him was covered with equipment, a medical reference, an atlas and so on. "And one more thing…". The versatility of the devices coming at us means not only that current practices will be replaced, but also that they are going to merge in unexpected ways. Reading a textbook online is a different experience from reading it on paper, for instance. Augmented reality (in the broad sense of the word, not just the futuristic goggles) is a huge enabler that we should not block by sticking to old rules (such as asking to turn devices off in the library or during lectures).

As for the demos, it's a bit unfortunate that it always seems to be the same ones that are pointed to (NCSU, DoK), though they're still great. Using widgetbox to quickly create mobile websites was new to me, worth checking out further (the example was ad-enabled, hope they have a paid version, too).

All in all, a great rallying of the troops.

2. Brian Gambles, Birmingham

A talk about the new public library in Birmingham. An ambitious undertaking, inspired by amongst others the new Amsterdam public library. The new library should put Birmingham on the cultural map, and itself become one of the major touristic attractions for the city, opening in 2013. It's also meant to 'open up' the vast heritage collection (the largest collection of rare books and photography of any public library in Europe). And to pay for it, they'll have to monetize those as well.

A laudable goal, great looking plans, I wish them luck in these difficult times.

The library is not just the books (the new Kansas city library sends all the wrong messages). The mobile strategy follows from the general strategy: open up services and let others do the applications. Open data, etc. They are working with Apple to get on iTunesU for instance (partnership with the uni). Get inspiration from the cultural sector, many interesting & much downloaded apps have come from museums. Notable especially is the Street museum of London (flash-y-website, direct iTunes app link)

Also, can't afford to hire enough cataloguers for the special collections - open this up as well, with crowdsourcing as a helpful tool. Surprised that there are people that like to correct OCR texts, which he thinks is a dreadful chore. So let's use it.

3. Panel discussion.

This wasn't as good as it could have been unfortunately, due to the wide range of the topic. Still some interesting points:

Adrian Northover-Smith from Sony of course very much pro e-ink devices and against the iPad. It's a cultural challenge for the company that their e-reader customers are female and older, most of their wares are peddled to young males. In a way, not dissimilar to libraries adjusting to the new 'digital native' generations, especially those catering to students.

Q: mobile use for people with visual impairment? A: epub format allows for more formats, larger letters, reading aloud. In some studies (art, fashion) up to 30% of students are dyslexic, and they're helped greatly by a different presentation of the content. (DH: this is yet another field in which rights are the big hurdle, given the skirmishes over audiobook vs text-to-speech rights...).

Simon Bell from the British Library talked about the challenge of mass digitization. The definition of availability is shifting, and digital born data is especially volatile. Mobile access is just another form of presenting content, the content comes first now.

Jonathan Glasspool from Bloomsbury Academic talked about the publishing point of view. He presented a new platform for online publishing, using CC licenses to allow non-commercial use online. I'm curious how this compares to the European OApen project in which our uni participates.

In his view, the main challenge today is that the industry needs a new type of people. Bloomsbury has weekly voluntary 'elevenses' sessions, where staff can brief each other on new ideas and online uses they found, which seem to work well as a motivator.

Simon Bains and bevanpaul noted via tweets that there seems to be a big divide between those focussing on generating content versus those interested in new platforms, and I agree. You can't have one without the other, it's a chicken & egg situation. On the other hand, the reality is that the problems are so big that to get anything done, focus is needed.

Brian Gambles mentioned that railway ticket machines were recently redesigned to deal with the visually impaired, resulting in a design that's much better for everyone. Better to incorporate it from the start: "accessibility should be in the DNA of new products".

4. Jeff Penka, OCLC worldwide

As I was preparing for my own talk, only a few notes. The main point of technology is barrier elimination for the user. We tend to think in systems, in details, jargon and acronyms: ILS, OPAC, SFX. The user just thinks a button should be "Get it". See also the importance of 'one-click' shopping in the Amazon and iTunes stores: such a seemingly small step is key to dominance.

The worldcat mobile interface is very 'beta' - every 2-3 days a new release, to try things out. Expected to stabilize in spring 2011 though. An interesting remark: OCLC believes that a mobile interface should not come as an extra, at a high cost. Rimshot! Too many vendors are trying to squeeze their clients by doing exactly that.

5. Driek Heesakkers on Uba Mobiel

Download the presentation (licensed under creative commons BY-NC-SA).

Then it was my turn. I presented our small 'agile' project. See the presentation. It will be described in more detail in the upcoming book 'Catalogue 2.0' - A little ironic, as one of the themes of the day was that the catalogue is much less important to the users than it is to library professionals.

To summarize: by giving space to enthusiastic early adopters amongst staff, in the form of a low-overhead, fast-moving project that focuses on possible quick wins, a library can bridge the gap for the current transition period. In the long term, vendors will come up with solutions that present content (whether a catalogue, website or digitized objects) equally well in a mobile context as in others. This will take a while though, and in the meantime we can't afford to have services that are (nearly) unusable on a mobile device.

Basically, the message is "just do it" - it will be easier than you think!

6. Benoit Maison on Pic2shop

A highly specialized topic. The pic2shop application offers an interesting way of merging functionality that web apps can't access (in this case, barcode scanning) with regular web apps. In the case of their worldcat enabled scanner, a user can scan a book (in a bookstore I presume), the app then passes the code on to an external website which does something useful with it (looks it up in worldcat) and the app displays the result from this website inside the app interface. To the user it's transparent, for the developer it's relatively light-weight.

It's an elegant concept. Might be useful for other specific device functionality that can't be accessed via web apps as well, though there are currently no plans for that.

The day ended with a session on augmented reality by Lester Madden, who did a good job I heard. Unfortunately my flight connection was too tight to stay for this one. The flight experience was pretty bad anyway... next time Eurostar for me!

Finally, for a little balance: on the same day, Aaron Tay wrote A few heretical thoughts about library, which deals amongst other things with the relative unimportance of mobile use at the moment. To a certain extent he has a point. It's not bad to stop for a moment and check if you're just following the pack.

Notes from CNI Spring meeting 2010

2010-05-04T16:53:00.005+02:00

I was fortunate to attend the CNI Spring 2010 Task force meeting in Baltimore, USA. This was my second time at a CNI, the first one being 2007. Compared to my previous experience, it struck me how policy has come to dominate the program, where it used to be technology. Maybe it’s because the direction where we’re heading is clear - complex objects, enriched publications, open access - and the question is now how we get there.

Because the fragmented setup of research and academia in the US differs greatly from the situation elsewhere, this made the meeting more US-centric, which was a tad disappointing. However, it remains an interesting, intense pressure-cooker, of which afterwards it’s hard to believe it barely lasted a day and a half. Worth the jetlag.

Two sessions stood out for me. First one was a presentation by Jane Mandelbaum from the Library of Congress on a collaboration with Stanford Institute for Computational and Mathematical Engineering (iCME), to create “Metadata remediation tools” (great name!): generating summaries, short titles and geographical data from wads of text.

iCME is located in Silicon Valley, has close ties with companies there - Google, Yahoo, and small start-ups - and deals primarily with algorithms to understand text, especially with taxonomies (which seems to be exactly what Google is trying, too, according to Steven Levy’s April 2010 article in Wired).

Interesting, as we’ve tried this in my organization, and failed miserably. This was made to work, though it took two years (!) to iron out the wrinkles between two very different cultures. Also, it’s not an equal partnership; most of the coding takes place in summer jobs, paid for by LoC. Main reason is the nature of LoC’s metadata, in which collections exist that differ greatly but are internally consistent, which makes them good candidates to refine algorithms on.

Results for LoC: apart from the code (rough around the edges, scripts rather than applications) and the generated geographical and other metadata, insight in the usefulness and value-for-money of metadata.

Software via the projectsite: http://cads.stanford.edu/

Example of unexpected results, visualization of keyword patterns: http://cads.stanford.edu/lcshgalaxy/more.html

The other session I want to mention was on Cornell’s LoL approach. Taking the Library Outside the Library: A Light-weight Innovation Model for Heavy-weight Economic Times.

An incubator-approach, outside regular channels, to quickly respond to trends. This presentation struck a chord with the audience, at moments there was an audible roar of keypresses as dozens of people typed in notable phrases in their twitter, blogging clients or notepads. One of those was when a quote from The Simpsons’ Krusty the Clown came up: "It's not just good, it's good enough!", another was the motto “there is no blame in trying something that doesn't work”. Clearly those struck a chord.

I like the setup: a small group, consisting of staff from all departments, including circulation and rare books, that spend max 5% of their time. Membership is limited to two years. The group runs 3-5 risky projects, categorized as “from trivial to easy”.

Examples: putting PD image collections on flickr and youtube, POD books from those flickr streams with Blurb, maintaining Wikipedia pages, iPhone app (made by a CS student). For mobile devices they use Siruna. Some projects were successful, some not. When projects finish successfully, they are transferred to the regular organization; if that doesn’t work, they are killed off rather than letting them languish or peter out, as that would be discouraging.

Very pragmatic and useful - and worth copying!

Finally, the lively Twitter traffic is archived at twapperkeeper.com/hashtag/cni10s

The Red Room: workflow photo tour

2010-02-22T14:34:00.003+01:00

(part two in a short series)

In response to questions on the RFID_LIB list, I created a short photo tour of the red room, focussing on the staff side of things: the types of crate used, usability issues we encountered etc.

I've used the full range of Flickr metadata to describe the issues, unfortunately the slideshow doesn't show descriptions by default, and notes not at all. So best viewed as set: Flickr Red Room.

Alternatively, when watching the slideshow, in the options turn 'always show description' on, and watch it fullscreen (bottom right).

The red crates are made of sturdy plastic. When it became clear that custom crates were way too expensive, we settled for industry standard parts in standard sizes, and we adjusted our shelves accordingly. The same went for silkscreening the numbers, so we used industrial strength plastic numbers, which turned out very well: in half a year I haven't seen even the beginning of peeling. The lesson learned: don't try to be special, and look outside the box, err, book world.

For staff determining when to add to an existing crate, and when to pick a new one, we use these rules of thumb:

The display shows a filling % of each existing crate and the # of items inside. This is enough for staff to figure out if there's still room. If not, new crate. If there is:
in peak periods, when the number of empty crates becomes small: always add.
otherwise, it depends on the day on which the items in the existing crates were added. If the same, we add; if in the past, pick a new.

This way, we have the flexibility to deal with peak periods with slightly more than 1000 boxes; and in less busy times, we can avoid crates with content from multiple days, which makes the workflow for processing of items not picked up more complicated, or forces us to leave the whole box until all items are expired, causing delays for other patrons.
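The rules of thumb above can be written down as a small decision function. This is only a sketch: in practice staff make the call by looking at the display, and the names used here (Crate, add_to_existing, has_room, peak_period) are made up for illustration, not taken from the actual system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Crate:
    number: int
    fill_percent: int   # as shown on the display
    item_count: int     # as shown on the display
    last_filled: date   # day on which the items in the crate were added


def add_to_existing(crate: Crate, today: date, has_room: bool, peak_period: bool) -> bool:
    """Return True to add to the existing crate, False to pick a new one."""
    if not has_room:
        # Staff judge from fill % and item count that the crate is full.
        return False
    if peak_period:
        # Empty crates are scarce: always add.
        return True
    # Off-peak: only mix items that arrived on the same day.
    return crate.last_filled == today


crate = Crate(number=417, fill_percent=40, item_count=3, last_filled=date(2010, 2, 19))
same_day = add_to_existing(crate, date(2010, 2, 19), has_room=True, peak_period=False)
later_day = add_to_existing(crate, date(2010, 2, 22), has_room=True, peak_period=False)
```

Keeping has_room as an input rather than a threshold on the fill percentage reflects that this judgement is left to staff.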

The Red Room: self-service for a closed stack library

2010-02-17T15:34:00.009+01:00

Recently, the Libraries of the University of Amsterdam (UvA, not to be confused with Virginia's UVa - yet another reason to avoid small caps for abbreviations!) and the Amsterdam Polytechnic (HvA) completed the introduction of RFID technology for security and selfservice. It was an interesting project in many ways. And not just because it finished within budget!

European tendering was mandatory as the costs were well above the 200k€ limit. At first, I balked at this as a necessary bureaucratic evil. My personal opinion on this has completely reversed, however: with an unexpected outsider, Autocheck Systems, winning with a clear margin both in price and quality, this was a textbook case for the merit of the tendering process.

By clearly committing our demands to paper in a neutral way, prejudice is taken out of the equation, or at least reduced to a minor multiplier. The trick is writing good specifications.

Selfservice: for open and closed stacks

Public libraries have used RFID technology for over a decade now. This has created a mature market for open stacks. However, as an academic library, the vast majority of our circulation comes from closed stacks. Here, a different solution is needed, and when we embarked on this journey two years ago, turnkey products that are affordable for the amount of traffic did not exist.

We were hoping for a clever, high-tech solution, not limited to our own imagination. We wanted to tap the creativity of the vendors, bring on fresh ideas! But we most certainly also did not want to write a blank check.

The tender therefore was split up in lots. One for the mature technology, where the functional requirements were formulated clearly, and the scoring algorithm favoured price over extra features (to be precise, in a 7:3 ratio).

For the closed stack solution however, we described our situation, with detailed circulation figures. The nature of the solution - intelligent shelves, lockers, and so on - was left to the vendor. To judge functionality against cost, the vendor would have to supply a detailed description of number of staff still needed to run the closed stacks, and all the actions in their workflow.

Closed stack circulation: the old situation

In the old days, patrons would request materials in the online catalogue. The items would be picked up by the warehouse staff and brought to the backoffice to be checked and processed, and placed in piles on shelves behind the desk, sorted by patron name, accessible only by staff. A few hours or one day later, depending on the location of the items, the patron would come to the desk, and staff would retrieve their material.

For this system, a large number of staff was needed. Not only because the patron was serviced, but also since in the absence of a proper tracking system, the piles had to be checked time and time again, to add new requests for patrons that already had more material waiting, to remove materials that had not been picked up, and to keep everything sorted alphabetically... There was clearly room for improvement. Self-service was only one aspect of the overall workflow.

However, there was one important restriction: privacy. A patron must only be able to borrow items that he or she requested, not items requested by others; and the name of the requesting patron must never be visible to others. In other words, the system must be fully anonymous. We've had run-ins in the past with professors that were spying on each other's requested items...

To cut a long story short, we're very pleased with the end result of this project, for both the open and closed stack solutions. In the remainder of this post, I'll concentrate on the Red Room, the closed stack.

The red room

Autocheck Systems supplied the RFID technology and innovative workflow systems. The eye-catching design of the room is by Bureau Ira Koers and Roelof Mulder.

Winner of the Great Indoors 2009 award - Trendhunter: showy red reading rooms - ArchDaily - Abitare (Italy) - ...

Some have called it the most beautiful circulation desk in the world... people love it or hate it, but it leaves hardly anyone untouched. Which is precisely what a library needs in these dark days, isn't it?

Patron's point of view
From the patron point of view, it works like this:

Patron requests an item in the online catalogue;
when the item is ready to be picked up in the Red Room, the patron receives an email with box number(s);
the patron can also check the box numbers where requested items may lie by scanning the library card at an 'infostation' outside the Red Room;
patron picks up the objects from the box, checks them out using selfservice machine inside the room and leaves (and when checking out is forgotten, the alarm at the entrance/exit of the room will go off).

The first weeks, we scheduled staff to stand by, but we stopped this when it became clear that patrons were just 'getting it' very well by themselves. Of course, staff is still available at the information desk nearby, which deals with various oddities that can come up, as well as patrons that need hand-holding.

As for the design of the room that leaves nobody untouched, it fulfills its purpose well: it looks quite glamorous, which invites the patrons to treat it with care; and it does not invite lingering, which is good as patrons are supposed to get their items, check them out and then leave the room, either to the many study places in the building or outside.

On the whole, patrons have quickly adopted the new system, and reacted very positively. Requested items can now be picked up during opening hours, seven days a week, every day except Sunday til midnight. The original manned desk was open during office hours, two brief evening windows and Saturday morning.

To our surprise, what our patrons liked even more was the email service announcing the availability of the item. Because this email comes not from the ILS, but from the system that handles the boxes (more about that later), it is sent out the very moment the material is there. In the early days of the system, a glitch caused the mails to be sent out a few minutes early - which led to angry patrons at the information desk asking why their box was empty...!

There is occasional dismay on the loss of face-to-face interaction. That was to be expected. There are however still plenty of opportunities for human interaction, both on-topic and off-topic. For the former, our information desk is doing brisk business. As for the latter... coffee can be had literally around the corner.

Also for special materials
The Red Room has security gates, forcing a patron to always check their materials with the selfservice machines inside. The security gates of the Red Room check the EAS bit, the regular gates at the main entrance check the AFI. This strategy with two separate zones enables us to also use the Red Room for materials that patrons can only use inside the building, but are not permitted to take elsewhere. For these materials, the selfservice machine leaves the security bit for the outer zone protected.

For this to work however, the ILS needs to send the object handling status to the machine. This is not a part of the SIP2 protocol, as implemented by Aleph! Luckily for us, for materials which may not be taken home, the return date in the SIP2 string is always set to today (it's implemented in the system as 1-day loan). By checking this date, we can work around this limitation.
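As a rough illustration of this workaround, the sketch below assumes the due date has already been parsed out of the SIP2 response (the parsing itself depends on how the ILS formats the date, so it is left out). The function names and the dictionary keys are hypothetical, not those of the actual selfservice software.

```python
from datetime import date


def is_in_house_only(due_date: date, today: date) -> bool:
    # Materials that may not leave the building are implemented as a
    # 1-day loan, so their return date is always today.
    return due_date == today


def security_after_checkout(due_date: date, today: date) -> dict:
    """Decide which security zone stays armed after a checkout in the Red Room."""
    in_house = is_in_house_only(due_date, today)
    return {
        # Inner zone (Red Room gates): always released on checkout.
        "red_room_gate_armed": False,
        # Outer zone (main entrance gates): stays armed for in-house materials.
        "main_entrance_gate_armed": in_house,
    }


today = date(2010, 2, 17)
reading_room_item = security_after_checkout(date(2010, 2, 17), today)
take_home_item = security_after_checkout(date(2010, 3, 17), today)
```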

As any professional knows, RFID cannot provide 100% security. The truly rare material - 100+ years old, collectors items and such - is therefore not handled by the Red Room, but sent to the Rare Books department, where the security measures border on paranoia, and rightly so.

This also relieves us from nasty dilemmas, such as where on this Blaeu Atlas would I glue this tag... with an estimated lifespan of one or two decades?

Internal workflow
Behind the scenes, the system is, as per Einstein's quip, as simple as possible, but not simpler.

Tagging happens on a per-need basis - with 3+ mln items on closed stacks, it would be impractical and needlessly expensive to tag all items in advance.

Requested items come in from the stacks. If not yet tagged, staff add and program an RFID tag first. On rare occasions, they add a barcode if the item doesn't yet have it. Even though barcodes have been added to all requested materials since 1985, still relatively frequently items pass by that apparently have not been requested in 25 years... another reason to tag only when needed.

After tagging (as well as several other checks), incoming items are sorted by patron name. This is printed on the slip that starts the process inside the warehouse, no interaction with the ILS is needed.

Finally the items are paired with one or more boxes. This is an RFID-supported process that takes place at one of two specialized workstations with a touchscreen and large RFID antenna.
The staff member takes a pile of items and puts it on the RFID antenna of the station. The item ID's are read from the tags. From the item ID's, the user that has made the request is queried from the ILS (unfortunately this is not part of SIP2 or other standards, so a custom webservice needs to be set up. A big thank you to Leiden University for sharing their code!).
The station then first checks whether all items on the pile are requested by one patron. Some names are common, and where people work, mistakes inevitably are made. If not, the staff needs to take away items until only items from one patron remain.
Then, the system checks whether this patron already has materials ready in the Red Room. In that case, the staff is presented with a list of boxes. Staff can choose to add items to an existing box, in which case a little slip is printed with book title and ID, and box number to add it to.
Finally, if the item(s) are to be put in a new box, the staff takes an empty box. The boxes are visibly numbered, but also have a tag, so the box only needs to be put on the RFID reader, and the link is made.
The box, filled with the items, is put on a trolley. The request slips are removed - important as the patron's name is printed on them.
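The checks the station performs can be sketched roughly as follows. This is an illustration only: the two dictionaries stand in for the custom ILS webservice and the box administration, and all names (pairing_step, requester_of, boxes_by_patron) are invented for the example.

```python
def pairing_step(item_ids, requester_of, boxes_by_patron):
    """Return (action, detail) for a pile of items read from the RFID antenna.

    requester_of:     item ID -> patron ID (stand-in for the ILS webservice)
    boxes_by_patron:  patron ID -> box numbers already waiting in the Red Room
    """
    if not item_ids:
        raise ValueError("no items on the antenna")

    patrons = {requester_of[item_id] for item_id in item_ids}
    if len(patrons) != 1:
        # Staff must remove items until only one patron's items remain.
        return ("mixed_pile", None)

    patron = patrons.pop()
    existing = boxes_by_patron.get(patron, [])
    if existing:
        # Staff choose between adding to one of these boxes or taking a new one.
        return ("existing_or_new", existing)
    return ("new_box", None)


requester_of = {"b1": "p1", "b2": "p1", "b3": "p2"}
boxes_by_patron = {"p1": [112, 340]}
```

Note that the patron ID never needs to be shown to anyone: the station only reports box numbers, which fits the privacy requirement.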

After some time, or when it is full, it's driven to the Red Room and the boxes are put in place. On returning, the staff member sends off the patron emails.

In the old days, ~20% of the requested items were never picked up. Retrieving these overdue items was a laborious process that the new system has greatly simplified. At off-peak times, staff print a list of boxes that can be emptied.

Conclusion: happy with lo-tech
In practice, after some initial quirks, the system has been working remarkably well. It's fast, and does not get in the way; it is indeed simple. In the end, lo-tech proved the way to go.

Unfortunately, this summer also saw the migration from our old ILS to Aleph. This makes it hard to calculate the actual staff saving, since the entire workflow has changed in many ways. Current estimates are however that the business case is sound.

Our tender documents are available on request (and have already been used by one other institution), and I'll be happy to answer any further questions.

OR 09: three more neat Fedora implementations

2009-06-15T17:45:00.002+02:00

Open Repositories 2009, Day 4

Three more notable sessions on implementing Fedora. Hopefully, the penultimate post before a final round-up. What a frantic infodump this conference was...

Enhanced Content Models for Fedora - Asger Blekinge-Rasmussen (State and University Library Denmark)

A hardcore technical talk, though impressive in the elegance of the two points shown: bringing the OO model to Fedora object creation, and a DB style ‘view’ for easily creating search and browse UIs.

The first is created as an extension of Fedora 3’s standard Content Models, yet backward-compatible, which is a feat. Notable extras: declares allowed relations (in OWL lite), schema for xml datastreams. Includes validator service (which is planned as disseminator, too). Open source [sourceforge].

Fedora objects can be manipulated at quite high level using API, but population needs to be done at much lower level. Thus most systems roll their own. Our solution: templates, data objects created as instances of CM’s, not unlike OO programming. Makes default values very easy. No need for handcoded foxml anymore, halleluja! Create, discover, clone templates using template web service.

Then there are repository views, which bundle atomic objects into logical records. Search engine record might be made up of bundle of Fedora objects.
Defined by annotated relations; view angles to create different logical records.
‘view = none’: then omitted from results (useful for small particles you don’t want to have show up in queries, for instance separate slides).

These simple API additions make it easy to create elaborate, simple GUI’s. Which includes the first one I’ve seen that comes close to a workable interface for relationship management - not quite a full drag’n drop, but getting there.

Beyond the Tutorial: Complex Content Models in Fedora 3 - Peter Gorman, Scott Prater (University of Wisconsin Digital Collections Center)
[presentation]

Summary: A hands-on walk through of the Wisconsin DIY approach. Also, an excellent example of what a well-done Prezi presentation can look like: literally zooming in on details then zooming out on the global context was really helpful to see the forest for the trees.

The outset: migrating >1million complex, heterogeneous digital objects into Fedora. Use abstract CM’s, atomistic, gracefully absorb new kinds and new combinations of content. Philosophy: 'fit the model to the content, not the content to the model'.
(Not in production yet, prototype app; keep eye out for 'Uni Wisconsin digital collections')

Prater starts out with the note that it’s humbling to see that the Hydra and escidoc people have been working on the same problem. However IMHO there’s no reason for embarrassment, as their basic solution is very elegant.

Using MODS for toplevel datastream (similar approach to Hydra). STRUCT datastream: a valid METS document, tying objects to hierarchy. Important point: CM’s don’t define structure, that’s for STRUCT and RELS-EXT.

Every object starts with a FirstClassObject, which points to 0-n child objects of arbitrary types. If zero it’s a citation. To deal with sibling relationships (ie 2 pages in specific order), an umbrella element is put on top with a METS resource map. This allows full METS functionality. Linking using simple STRUCT and RELS-EXT. Advantage over doing everything in RELS-EXT: that doesn’t allow expressing sequencing.

Now, to tie this ‘object soup’ together in an app (common problem for lots of objects, to turn the soup into a tree), the solution is simple: always use one monolithic disseminator, viewMETS(). This takes PID for FirstClassObject, returns valid METS doc containing object and all its (grand)children.

This is brilliant: a one-stop API to get the full object tree from a given PID, hiding the complexity of the umbrella object and the METS description involved.
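To make the idea concrete, here is a toy version of such a one-stop call. It does not use the Fedora API and does not produce METS: the repository is a plain dictionary from PID to an ordered list of child PIDs, and view_tree is a made-up name standing in for what viewMETS() does conceptually.

```python
def view_tree(pid, children_of):
    """Return the object and all its (grand)children as one nested structure.

    children_of maps a PID to the ordered list of its child PIDs;
    the list order preserves sequencing (e.g. page 1 before page 2).
    """
    return {
        "pid": pid,
        "children": [view_tree(child, children_of) for child in children_of.get(pid, [])],
    }


# Hypothetical example data: a first class object with two pages,
# the first of which has an image attached.
children_of = {
    "fco:1": ["page:1", "page:2"],
    "page:1": ["img:1"],
}
tree = view_tree("fco:1", children_of)
```

The caller only ever supplies the top-level PID; how the children are found and ordered stays hidden behind the call.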

The only part they’re not yet satisfied with is how to relate related items between FirstClassObjects and relations between two top-level logical objects (ie journal and article) that are sometimes parent/child, sometimes not.

To which Asger chimed in that his ‘angle view’, demonstrated in the talk before, would be a possible solution for this. I saw them discussing later... I love it when a plan comes together.

When Ruby Met Fedora - Matt Zumwalt (Media Shelf)

A live demonstration of ActiveFedora which made my fingers itch to start coding straight away - until I remembered Ruby’s Unicode issues, rats.

The philosophy behind: use Fedora for long-lived content, but be able to quickly create short-timed services and apps.

ActiveFedora can be used without Rails, or even without Ruby (you can call it from the shell). However, Ruby’s OO model maps very well on Fedora. The key difference with say java or C++: you don’t know what kind of object you’ll get back from a call.

The demo shows the standard rails environment, except the Model directory. There, calls to ActiveRecord are replaced with calls to ActiveFedora. AF exposes Fedora objects with multiple properties. Qualified DC is built-in, but the has_properties function allows for easy extension.

An interesting advantage of this approach is that the methods as used by developers use the same jargon as the metadata users are used to. “they communicate much better when a method’s called dc.subject.”

There’s quite a bit to do ATM. They’ve received funding to hire a student to finally write real documentation. Other extensions: built-in SOLR integration, more generators for standard situations, basic CM integration. Interesting is the approach to integrating MODS: use the existing, mature java libraries, which is easy when using JRuby as interpreter.

OR 09: eScidoc's infrastructure

2009-06-11T17:00:00.002+02:00

eSciDoc Infrastructure: a Fedora-based e-Research Framework - Frank Schwichtenberg, Matthias Razum (FIZ Karlsruhe)

I had not expected this presentation to be as good as it was - it was a real eye-opener for me. It dealt solely and bravely with the underlying structure of eScidoc, not the solutions built on top of it (such as PubMan). So, delving into the technical nitty gritty.

So far, to me eSciDoc has been an interesting promise that seemed to take forever to materialize into non-vaporware. DANS wanted to use it as the basis for the Fedora-based incarnation of their data repository EASY, a plan they had to abandon when their deadline was looming near and the eScidoc API's were still not frozen. Apart from that, the infrastructure seemed also needlessly complex - why was another content model layer necessary on top of Fedora's own?

The idea behind the eSciDoc approach is to be user-centric, and in the case of the infrastructure the user is the programmer. What would she like to see, instead of Fedora's plain datastreams?
Tentative answer: an application-oriented object view.

eScidoc takes a full atomistic approach to content modelling: an Item is mapped to a fedora object (without assumption about the metadata profile - keeping it flexible). Then, Item has Component. An Item in practice consists of two fedora objects, with a ‘hasComponent’ relation between.

Objects can be in arbitrary hierarchies, except for the top levels, which are reserved for ‘context’ and can be used for institutional hierarchies (a common approach, I can live with that). All relationships are expressed as structmaps.

So far so good, but now the really neat part.

Consequences of the atomistic content model for versioning: a change can occur in any of the underlying fedora objects of a compound object, with consequences for both.
The eScidoc API's store the Object lifecycle automatically. And when one Component changes or is added, the Item object also changes version, but not the other Components.
(the presentation slides are really instructive on this, worth checking out when they're online).
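
The versioning rule can be illustrated with a toy model. This is a minimal Python sketch, not eSciDoc's API: the names `Item` and `Component` follow the terminology of the talk, but the methods and the integer version counters are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One part of an Item; stands in for a single Fedora object."""
    name: str
    content: str
    version: int = 1


@dataclass
class Item:
    """Compound object: its version moves whenever any Component changes."""
    name: str
    components: dict = field(default_factory=dict)
    version: int = 1

    def add_component(self, component: Component) -> None:
        self.components[component.name] = component
        self.version += 1  # adding a part is a change to the compound

    def update_component(self, name: str, content: str) -> None:
        component = self.components[name]
        component.content = content
        component.version += 1  # only the changed part gets a new version
        self.version += 1       # ...and so does the Item that contains it


thesis = Item("thesis")
thesis.add_component(Component("pdf", "draft"))
thesis.add_component(Component("dataset", "raw measurements"))
thesis.update_component("pdf", "final")
```

After the update, the Item and the changed Component have both moved on a version, while the untouched Component is still at its first version.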

This also delivers truly persistent ID’s (multiple types supported: DOI, handle, etc), separate from fedora’s PID’s which are not really persistent. And every version has one - both of the compound and the separate Item objects. All changes (update/release/submit events etc.) are logged as events in a version log; if I remember correctly, this log can be used for rollback, i.e. it is a full transaction log.

This is the reason that the security model has to be in the eSciDoc layer, not Fedora's (though the same XACML policies and structures are used). This is eSciDoc's answer to the question common to many Fedora projects: how to extend Fedora's limited security? It might be best to take the whole security layer out of Fedora.

IMHO this is very exciting. This is about the last thing that a project would want to roll itself - it is incredibly complex to get working correctly and durably - and here it is, backed by a body of academic research - it is a German project after all. For me, this puts eSciDoc firmly on the shortlist of frameworks.

OR 09: blogosphere links

2009-06-10T17:37:00.003+02:00

Nearly three weeks afterwards, it's time to round up the OR 09 posts... Unfortunately, library life got in the way. Meanwhile, why not read the opinions of these honoured colleagues, who are undoubtedly better informed:

loomware.typepad.com/ (Mark Leggott)

Open Repositories 2009 - Peter Sefton's trip report (ptsefton.com)
Open Repositories 2009 – Peter Sefton's further thoughts (caulcairss.wordpress.com)

Leslie Carr (repositoryman.blogspot.com)

John Robertson (Strathclyde)

http://repositoryblog.com/archives/18

http://www.weblogs.uhi.ac.uk/sm00sm/2009/05/

http://jhulibrariestravel.blogspot.com/2009/05/open-repostories-2009.html (Elliot Metsger, Johns Hopkins)

Finally, another bunch'o'links:
http://repositorynews.wordpress.com/2009/05/28/open-repositories-2009/

OR09: Four approaches to implementing Fedora

2009-06-05T15:08:00.003+02:00

Open repositories 2009, day three, afternoon.

So far, the conference had not been disappointing, but now it got really interesting. The sessions I followed in the afternoon each highlighted a specific approach to the problem that IMHO has been standing in the way of wider Fedora acceptance: middleware.

What these four have in common is that they all leverage an existing OSS product and adapt it to use Fedora as datastore.

1. Facilitating Wiki/Repository Communication with Metadata - Laura M. Bartolo

Summary: interesting approach, a traditional Fez spiced up with MediaWiki. With minimal coding, a relatively seamless integration.
For this to work, contributors need to know MediaWiki markup, and to really integrate, must learn the fez-specific search markup. Also, I'm not sure how well this can be scaled up to true compound objects, given Fez' limitations.

Notes:
Goal: dissemination of research resources. Specific sites for specific science fields, i.e. soft matter wiki, materials failure case studies.
MatDL repository: has a repository (Fedora+Fez), wants to open up two-way communication. Example: Soft matter expert community, set up with MediaWiki. "Mediawiki hugely lowers the barrier for participating": familiarity gives low learning curve.

The question: how to integrate the repository with the wiki two-way.

Thinking from a user-centric approach. Accommodate the user; support complex objects (more useful for research & teaching), thus describe their parts as individual objects.

Components:
- Wiki2Fedora
Batch run. Finds the wiki upload file, converts the referencing wiki pages to DC metadata for ingest in the repository (wiki has comment, rights, author sections -> very doable). Manual post-processing (Fez Admin review area function).
- Search results plug-in for wiki: displays repository results in wiki search. Adds to MediaWiki markup, to enable writing standard Fez queries in the content.

Sites: Repository - Wiki

2. Fedora and Django for an image repository: a new front-end - Peter Herndon (Memorial Sloan-Kettering Cancer Center)

Summary: using Django as a CMS, internally developed adapters to Fedora 3.1.

My gut feeling: A specific use case, images only, so rather limited in scope. Despite choosing the 'hook up with mainstream package' strategy, effectively still a NIH-based rolling their own. That makes the issues even more instructive.

Notes:
Adapting a CMS that expects SQL underneath is challenging - the plugin needs to be a full object-to-relational database mapper.
Also, Fedora 'delete' caused 'undesired results', 'inactive' should be used.
Further, some more unexpected oddities: had to write their own LDAP plugin to make it work, django has tagging but again plugin was needed to limit this to controlled vocabularies. Performance was not a problem.
Interesting: repository for images only, so exif and the like can be used - tags added using Adobe Bridge! The tested, successful strategy: make use of what is already familiar.
In the Q&A the question came up: why use Fedora in this case anyway? Indeed the only reason would be preservation, otherwise it would have saved a lot of trouble to use Django Blobstore.

The django-fedora plugins are available at bitbucket.org.

3. Islandora: a Drupal/Fedora Repository System - Mark A Leggott (University of PEI)

Summary:
Islandora looks *very* promising. I noted before (UPEI's Drupal VRE strategy) that UPEI is a place to watch - they are making radical choices with impressive outcomes.

Notes:
UPEI's culture is opensource friendly. They use Moodle and Evergreen (apparently, they were the first Evergreen site in production).

Rationale: opensourcing an in-house system reinforces good behaviour: full documentation, quality code.

As noted before, UPEI's repositories are hidden behind VRE (see [link]). VRE's are geared towards the researchers. Example of approach: the first thing people do when they set up a VRE is create a webpage. That's what a project needs, and so it's used as a hook to reel people in, they're up and running within a few hours.

The VRE is Drupal; Fedora is for data assets, metadata, policies.
Base Islandora consists of three plugins: Drupal-Fedora connection plugin, xacml filter, rule engine for searches.

This 'rule engine' is indeed very cool.
In a later private conversation with Mark Leggott, he clarified that Islandora indeed uses an atomistic complex object model for research data; the rule engine declares how these can be searched from within Drupal. Example: a dataset consisting of a number of measuring points, each with a set of instruments, stored atomistically in Fedora, can be queried as 'all the results from a specific measuring point', 'all the results from instrument x', 'instrument x in a specific period' etc.
We haven't reached Nirvana yet: to make deconstructing the data objects possible, they have to adhere to a specific format (xml). But it's impressive nevertheless.
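
To make the example queries concrete, here is a minimal Python sketch of querying an atomistic dataset. It is not Islandora's rule engine: the `Reading` record, the `select` function, and the sample values are all hypothetical, and the data lives in a plain list rather than in Fedora.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Reading:
    """One atomistic data object: a single instrument reading."""
    point: str
    instrument: str
    taken_on: date
    value: float


def select(readings, point=None, instrument=None, start=None, end=None):
    """Return the readings matching every criterion that is not None."""
    return [
        r for r in readings
        if (point is None or r.point == point)
        and (instrument is None or r.instrument == instrument)
        and (start is None or r.taken_on >= start)
        and (end is None or r.taken_on <= end)
    ]


# Made-up sample data: two measuring points, two instruments.
readings = [
    Reading("bay-1", "thermometer", date(2009, 3, 1), 4.2),
    Reading("bay-1", "salinity", date(2009, 3, 1), 31.0),
    Reading("bay-2", "thermometer", date(2009, 4, 15), 6.8),
]

from_bay_1 = select(readings, point="bay-1")
thermo_in_april = select(readings, instrument="thermometer", start=date(2009, 4, 1))
```

Because every reading is its own object, each of the three query styles is just a different combination of filters over the same parts.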

Other Drupal plugins add functionality for specific data. Impressive example: Drupal FCK editor used as TEI editor; after editing, it automatically adds a version to the datastream. Very cool and 'Just Works' (cheery tweet).

Marine Natural Products Lab: best example of the setup for VRE which includes extensive repository (searchable within the critter xml).

Previous versions used drupal 5/fedora 2, not maintained; currently drupal 6/fedora 3.1

Q: did you replace the drupal storage layer, or do you sync?
A: sometimes it’s saved in the drupal layer, when it doesn’t need to go into fedora (temporary data, while we build the content model). Drupal filesystem is a potential bottleneck with large data blobs.

Q: are you bound to content models?
A: standard fedora cm’s, you can build them yourself or change the delivered one. The models are exposed, you can see how it works. We first installed Fez to see how Fedora worked.

4. Project Hydra: Designing & Building a Reusable Framework for Multipurpose, Multifunction, Multi-institutional Repository-Powered Solutions - Tom Cramer (Stanford University), Richard Green (University of Hull), Bess Sadler (University of Virginia) et al.

Summary:
I'm even more excited about Hydra than about Islandora. Different approach: create "A lego set of services". In other words, a toolkit for the common parts of applications.
It all looks really good. Two gotchas though. Firstly, it is still a work in progress. Can we afford to wait? Secondly, there are issues with the Unicode support of Ruby on Rails.

For more info: D-Lib.

Notes:
Modelled after the current 12+ use cases of repositories in use at partner institutions, both institutional and personal.
It needs generic templates - which sometimes may do the job - otherwise it won’t come off the ground.
Hydra will have common content models and datastream names. But ultimately they want Hydra to be able to cope with almost anything. A MODS datastream will always have to be there, but not necessarily as primary (so can be done via disseminator).

Four multifunctional sections:

Deposit
manage (edit objects, set access)
search & browse
deliver
plus plumbing: authentication, authorization, complex workflow

Using Rails with ActiveFedora. Turns out Rails lives up to its reputation: they are way ahead of their initial roadmap, now expect full production app by fall.

Specs 3/4 ready, coding 1,5/4.
Demo: http://hydra-dev.stanford.edu/etds

Presentation builds on top of the Blacklight OPAC. Virginia already has a beta version of their catalogue up using Blacklight.

OR09: On the new DuraSpace Foundation, and Fedora in particular

2009-06-01T19:53:00.002+02:00

Open Repositories 2009, day 3, morning: three sessions on Fedora.

The morning started with a joint presentation by Sandy Payette (Fedora Commons) and Michele Kimpton (DSpace Foundation), focussing on strategy and organisation; after caffeine break, Fedora+DSpace tech overview by Brad McLean; finally, developers' open house.

I'll cover it in one blog post (this or09 series is getting a bit long in the tooth, isn't it?). For the actual info on DuraSpace and all, see the DuraSpace website. The tech issues were covered more in depth in further sessions.

The merger is by now almost old news, though the incorporation still lies in the future: Fedora Commons and the DSpace user group will become DuraSpace. The 'cloud' product, that originally had the same name, is renamed DuraCloud.

Not the easiest of presentations, as there is a good deal of scepticism around the merger, and not just on the twitter #or09 channel. Payette and Kimpton handled it very professionally, dare I say gracefully. Both standing on the floor, in front of the audience, talking in turns (did I imagine it, or did I really hear them taking over a sentence, in Huey & Dewey style?), while an assistant standing behind the laptop was going back and forth through the slides in perfect timing.

All in all, they pulled it off to come across as a seamless team. That bodes well.

Also welcome was the frankness in the Q&A (as well as later in the developers open house). After noting some difficulties in finding the right strategy for open source development: "we do not aim to mold DSpace's opensource structure to the Fedora core committer, on the contrary".

"We have to ask ourself: are we really community driven in the Fedora project? We've been closed in the past, we're opening up." Fedora has started using a new tracker, actually modelled on DSpace's model; "please use it, our tracker is our new inbox."

On the state of Fedora - many and diverse new users.

Escidoc is now deployable.

WGBH OpenVault - including annotated video

Forced Migration Online

Jewish Women Archive - runs in EC2, first of a new wave of smaller archives now coming online using limited resources.

Notably missing from a slide listing 'major contributors' (Mediashelf, Sun, and Microsoft Research): VTLS. Possibly a sponsoring issue? It was more than a bit odd, given their standing in the past.

Q: "How do you see the future of DSpace vs. Fedora - do they compete?"

A: "Fedora’s architecture is great, but we also need ‘service bundles’. CMS style on top for instance. The architecture will stay open for any kind of app on top. DSpace is going the other direction. Opportunity is to make sure we're not doing identical things with different frameworks."

It is *so* easy to read this as 'the products will meet in the middle', but this was carefully avoided. However, in the tech talk later it was mentioned that Fedora-DSpace replication back and forth experiments are actively worked on.

I think I'm not alone in thinking that the products will merge eventually. It will take some time, but they will.

Q: (cites another software company merger, IIRC Oracle and Peoplesoft) – merger brings great unrest in communities, which one is going to die? Are F&D moving together? Technical and cultural changes for both communities? etc.

A: Payette: any kind of software eventually becomes obsolete. We are determined not to let that happen, and for that it needs to be modular and organic. Side by side, cause they both do things well. When overlap starts to happen, that may change, but by the module.

Peter Sefton chimed in: very positive. Right decision at the right time. Focus on cloud computing is essential, feels that this is what we’re moving towards, and our current monolithic repositories need to adapt to that.

Some DSpace 1.x upcoming features: statistics, embargo, batch editing. I don't know that much about DSpace, and it shows: I was surprised that these weren't covered yet. Esp. batch editing and embargo, pretty basic features. I know too little of DSpace to judge the announced 2.0 features, apart from the DuraCloud integration using Akubra.

Fedora 3.2 highlights:

SWORD API 1.3. Of course. Nice though
new web admin client. Not all of the features are implemented, so the java client hasn't been deprecated - it will be in future. This is a big deal, as the client is also useful for metadata editing staff.
akubra: store files by ID, pluggable, stackable, multiplexing (ie on multiple storage environments that to the API look as one big one). Experimental, meaning included but not turned on by default.
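
The multiplexing idea can be sketched in a few lines. The Python below is an illustration of the concept only, not Akubra's actual (Java) API: `DictStore`, `MultiplexingStore`, and routing on an ID prefix are assumptions made for the example.

```python
class DictStore:
    """Hypothetical in-memory blob store: files are stored and fetched by ID."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob_id, data):
        self._blobs[blob_id] = data

    def get(self, blob_id):
        return self._blobs[blob_id]

    def __contains__(self, blob_id):
        return blob_id in self._blobs


class MultiplexingStore:
    """Presents several backing stores as one; routing rule is an assumption."""

    def __init__(self, routes, default):
        self._routes = routes    # mapping: ID prefix -> backing store
        self._default = default  # used when no prefix matches

    def _store_for(self, blob_id):
        for prefix, store in self._routes.items():
            if blob_id.startswith(prefix):
                return store
        return self._default

    def put(self, blob_id, data):
        self._store_for(blob_id).put(blob_id, data)

    def get(self, blob_id):
        return self._store_for(blob_id).get(blob_id)


fast_disk, archive = DictStore(), DictStore()
store = MultiplexingStore({"img:": fast_disk}, default=archive)
store.put("img:42", b"...jpeg bytes...")
store.put("doc:7", b"...pdf bytes...")
```

The caller only ever talks to `store`; which storage environment a blob ends up in is decided behind the interface, which is also what makes such stores stackable.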

Finally, the Fedora developer open house was like getting the pulse of the developer community. Summary: there are pains, communication has been problematic, with a gap between the committers and the community. My impression is that it is finally being talked about, and the core developers in the panel admitting that a change is needed. A constructive and open approach.

OR09: Repository workflows: LoC's KISS approach to workflow

2009-05-30T16:24:00.003+02:00

Open Repositories 2009, day 2, session 6b.

Transfer and Inventory Components of Developing Repository Services

Leslie Johnston (Library of Congress)

My summary:

A practical approach to dealing with data from varying sources, keep it as simple as possible, but not simpler.

The ingest tools look very useful for any type of digitization project, especially when working with an external party (such as a specialized scanning company).

The inventory tool may be even more useful, as lifecycle events are generally not well covered by traditional systems, be it CMS or ILS.

Background

LoC acts as durable storage deposit target for widely varying projects and institutions. Data transfers for archiving range from a USB stick in the mail to 2 TB transferred straight over the network. The answer to dealing with this: simple protocols, developed together with uc digilib (see also John Kunze).

Combined, this is not yet a full repository, but it covers many aspects of ingest and archive functionality. Rest will come. Aim: provide persistent access at file level.

Simple file format: BagIt

The submitter is asked to describe the files in BagIt format.

BagIt is a standard for packaging files; METS files will fit in there, too. However, BagIt was created because we needed something much, much, much simpler. It’s not as detailed; description is a manifest, it may omit relationships, individual descriptions, etc. It is very lightweight (actually too light: we’ve started creating further profiles for certain types of content).

LoC will support Bagit similarly and simultaneously to MODS & METS.

Simple tools

Simple tools for ingest:

- parallel receiver (can handle network transactions over rsync, ftp, http, https)

- validator (checks file format)

- verifyit (checksums files)

These tools are supplied as java lib, java desktop application, and LocDrop webapp (prototype for SWORD ingest).
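
How light a manifest-based package is shows in how little code a checksum round trip takes. The Python sketch below is not LoC's tooling: `write_manifest` and `verify_manifest` are hypothetical helpers. The `data/` payload directory and the `manifest-md5.txt` file with one `checksum path` pair per line follow the BagIt layout; the `bagit.txt` declaration file that a complete bag also carries is left out for brevity.

```python
import hashlib
import tempfile
from pathlib import Path


def write_manifest(bag_dir: Path) -> Path:
    """Write manifest-md5.txt listing a checksum for every file under data/."""
    lines = []
    for path in sorted((bag_dir / "data").rglob("*")):
        if path.is_file():
            digest = hashlib.md5(path.read_bytes()).hexdigest()
            lines.append(f"{digest}  {path.relative_to(bag_dir).as_posix()}")
    manifest = bag_dir / "manifest-md5.txt"
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest


def verify_manifest(bag_dir: Path) -> list:
    """Return the paths that are missing or whose checksum no longer matches."""
    failures = []
    text = (bag_dir / "manifest-md5.txt").read_text(encoding="utf-8")
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append(rel_path)
        elif hashlib.md5(target.read_bytes()).hexdigest() != expected:
            failures.append(rel_path)
    return failures


with tempfile.TemporaryDirectory() as tmp:
    bag = Path(tmp) / "mybag"
    (bag / "data").mkdir(parents=True)
    (bag / "data" / "page1.txt").write_text("scan of page 1", encoding="utf-8")
    write_manifest(bag)
    intact = verify_manifest(bag)    # nothing has changed yet
    (bag / "data" / "page1.txt").write_text("corrupted", encoding="utf-8")
    damaged = verify_manifest(bag)   # the altered file is reported
```

The sender runs the first function before shipping, the receiver runs the second on arrival; no database or metadata schema is involved.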

Integration between transfer and inventory is very important: trying to retrieve the correct information later is very hard.

After receiving, inventory tool records lifecycle events.

Why a standardized tool: 80% of workflow overlap between projects.

All tools available open source [sourceforge]. What's currently missing will be added soon.

OR09: Repository workflows: ICE-TheOREM, semantic infra for theses

2009-05-30T15:54:00.003+02:00

Open Repositories 2009, day 2, session 6b.

ICE-TheOREM - End to End Semantically Aware eResearch Infrastructure for Theses

Jim Downing (University of Cambridge), Peter Sefton (University of Southern Queensland)

Summary: great concept, convincing demonstration. Excellent stuff.

Part of the ICE project, a JISC funded experiment with ORE.

[paper] (seems stuck behind login?)

Importance of ORE: “ORE is a really important protocol – it has been missing for the web for most of its life so far.” (DH: Amen!)

Motivations for TheOREM: check ORE – is it applicable and useful? What are different ways of using? How do SWORD and ORE combine?

Practically: improving theses visibility, embargoes as enabler.

Interesting: in the whole repository system, the management of embargoes is separated from the repository by design. A special system serves resourcemaps for the unembargoed, IR polls these regularly. Interesting: this reflects the real-world political issues, and makes it easier to bring quite radical changes.

Demonstrator (with the Fascinator) with one thesis, with reference to data object: molecule description in chemical markup language (actual data).

Simple authoring environment in openoffice Writer (Word is also supported), stylesheet + convention based approach. When uploaded, the doc is taken apart to atomistic xml objects in Fedora. The chemical element is a separate object with relation to the doc, versioning etc.

Embargo metadata is written as text in the doc (on title page; date noted using convention, KISS approach), and a style (p-meta-date-embargo) is applied. The thesis is again ingested - and voila, the part of the thesis with embargo is now hidden.

This simple system also allows dialogue between student and tutor - remarks on the text - to be embedded in the document itself (and hidden to the outside by default). It looks deceptively like Word's own comments, which I imagine will ease the uptake.

Sidenote: policy in this project is that only submitter can ever change embargo data. So it is recommended to use openID rather than institutional logins, as PhD graduates tend to move on, and then nobody can change it anymore.

Q (from Les Carr): supervisors won’t like to have their interaction with students complicated by tech. What is their benefit?

A: automatic backing up is a big benefit, also of the workflow (ie. the comments in the document text). We *know* students appreciate it. Supers may not like it but everyone else will, and then they’ll have to.

(note DH: this is of course in the sciences, it will be an interesting challenge to get the humanities to adhere to stylesheet and microformatting conventions)

Q: can this workflow also generate the ‘authentic and blessed copy’ of the final thesis?

A: Not in project scope, we still produce the pdf for that. In theory this might be a more authentic copy, but they might scream at the sight of this tech.

OR09: Social marketing and success factors of IR’s.

2009-05-30T15:21:00.004+02:00

Open Repositories 2009, day 2, session 5b.

Social marketing and success factors of IR’s: two thorough but not very exciting sessions. Though the lack of excitement is maybe also because the message is quite sobering: we already know what needs to be done, but it is very hard to change the (institutional) processes involved.

Social marketing approach to IR, a Canadian perspective.

(where social marketing doesn’t stand for web2.0 goodness, but for marketing with the aim of changing social behaviour, using the tools of commercial marketing).

Generally, face to face contact works best - on faculty scale, or in smaller institution like UPEI.

One observation that stuck with me is that the mere word repository is passive, where we want to emphasize exposure. This is precisely our problem as a whole in moving the repository into an active part at the center of the academic research workflow, instead of a passive end point.

Finally, the list of good examples started out with Cream of science! We tend to take it for granted here in the Netherlands, and focus on where we're stuck; it’s good to be reminded how well that has worked and still does.

Secrets of success - identifying success factors in IR's.

Interim news from uMich Miracle project (Making Institutional Repositories A Collaborative Learning Environment).

Not very exciting yet, might change when they’ve accumulated more data (it’s a work in progress, five case studies of larger US institutions, widely varying in policy, age, technology).

Focus on “outcome instead of output”.

Focus on external measurements of success, instead of internal (ie number of objects etc). Harder to enumerate, less easy, but gets more honest results.

OR09: Keynote by John Wilbanks

2009-05-27T17:23:00.003+02:00

Open Repositories 2009, day 1, keynote.

Locks and Gears: Digital Repositories and the Digital Commons - John Wilbanks, Vice President of Science, Creative Commons

Great presentation - in content as well as in format. Worth looking at the slides [slideshare - of a similar presentation two weeks earlier]. [Which was good, because it was awkwardly scheduled at the end of the afternoon, which is great with fresh jetlag, straight after the previous panel session without so much as a toilet break.]

The unfortunately familiar story of journals on the internet, scholars' rights eroding, which causes interlocking problems that prevent the network effect.

Choice quotes:
“20 years ago, we would have rather believed there be a worldwide web of free research knowledge, than Wikipedia.”
"The great irony is that the web was designed for scientific data, and now it works really well for porn and shoes."

The CC licenses are a way of making it happen with journals. However, for data even CC-BY is making it hard to do useful integration of different datasets. Survey of 1000 bio databases: >250 different licenses! Opposite law of open source software: the most conservative license wins.

Example of what can happen if data is set free: Proteomecommons.org: BitTorrent for genomes. Thanks to CC Zero.

What can we do?
Solve locally, share globally.
Use standards. And don’t fork them.
Lead by example.

Q: opinion on Wolfram Alpha? Or Google Squared?
A: pretty cool, doubts about scaling. It may be this or something else, rather open source than ‘magic technology’. But it’s a sign that the web is about to crack.
“The only thing that’s proven to scale is distributed networks.”

(my comment - with an estimated 500.000 servers, that is precisely what Google is...)

OR09: Panel session - Insights from Leaders of Open Source Repository Organizations

2009-05-27T16:42:00.003+02:00

Open repositories 2009, day 1, session 4.

A panel with the big three open source players (DSpace’s Michele Kimpton and Fedora Commons’ Sandy Payette, freshly merged into DuraSpace, ePrints’ Les Carr) and Lee Dirks from Microsoft. Zentity (no, not Zentity - 1.0 was officially announced at this conference) brings up lots of good questions. Unfortunately it didn’t get to an interesting exchange of ideas.

I’ll concentrate on Microsoft, as they were the elephant in the room. Warning: opinions ahead.

Microsoft is walking a thin line, their stance has been very defensive. Dirks started out quipping that “We wanted to announce Microsoft merging with ePrints, we got together yesterday, but we couldn’t agree on who was going to take over who.”

He went on stressing that this is Microsoft Research and they're not required to make a profit. Putting on a philanthropist guise, he went on that their goal is to offer an open source repository solution to organizations that already have campus licenses. “How can we help you use software that you already paid for but maybe don’t use?”. They claim they don't want to pull people away from open source solutions.

The most interesting parts were what he was *not* saying. Which open source does MS not want to pull us away from - Java? MySQL? Eclipse? Or did he only mean open source repository packages?
Yeah right… getting visual studio, IIS, SQL server and the most dangerous of all, Sharepoint a foot in the door.

An audience question that nailed the central issue: "The question will be lock-in. commitment in other parts of the lifecycle are therefore more important. Zentity hooks you up everywhere in the MS stack."
Dirks responded with "Everything we’ve done, is built on open API’s, be it Sharepoint or Office or whatever. You could reconstruct it all yourself."

Well, with all respect to the Mono and Wine efforts, I wouldn't call Sharepoint and Office API's you could easily replace. The data will still be in a black box. Especially if you want to make any use of the collaboration facilities. Having open API's on the outside is fine and dandy, but one thing we've learned so far with repositories is that it is hard to create an exchange (metadata) format that is neither too limited nor so complicated it hinders adoption.

Asked by the audience about his stance on data preservation, Dirks initially replied that ODF would solve this, including provenance metadata. No mention of the controversy around this file format - what use is an xml format that cannot be understood? - or of filetypes outside the Office Universe.

When this debate stranded, Sandy Payette turned the mood around by mentioning that MS has contributed much to interoperability issues. It is indeed good to keep in mind that MS is not just big and bad - they aren't. A company that employs Accordionguy can't be all that bad. The trouble is, you have to stay aware and awake, for they aren't all that good, either. Imagine an Office-style lock-in for collaboratories.

OR09: NSF Datanet-curating scientific data

2009-05-26T14:29:00.003+02:00

Open Repositories 2009, Day 1, session 3. NSF Datanet-curating scientific data, John Kunze and Sayeed Choudhury.

The first non-split plenary (why a large part of the first two days consisted of 'split plenaries' baffled me, and I was not the only one).

Two speakers, two approaches. First John Kunze from UCDL, focussing on the micro level with a strategy of keeping it simple. "Imagining the non-repository", "avoid the deadly embrace" of tight standards: decouple by design, lower the barrier of entry.

One of the ways to accomplish this is by staying lo-tech: instead of fullblown database systems, use a plain file system and naming conventions: pairtree. I really like this approach. I've worked in large digitization projects with third parties delivering content on harddisks. They balk at databases and complicated metadata schemes, but this might just be doable for them. Good stuff.
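
To show how little is needed, here is a minimal Python sketch of the pairtree naming convention: the identifier is made filesystem-safe and then cut into two-character directory names. The function name `pairtree_path` is my own, and the sketch is simplified: it only applies the three single-character substitutions (`/` to `=`, `:` to `+`, `.` to `,`) and skips the hex-escaping of other awkward characters that the full convention also defines.

```python
def pairtree_path(identifier: str, root: str = "pairtree_root") -> str:
    """Map an identifier to a directory path made of two-character segments."""
    # Simplified cleaning step: swap the characters that are awkward in paths.
    cleaned = identifier.translate(str.maketrans({"/": "=", ":": "+", ".": ","}))
    # Cut into pairs; an odd-length identifier ends in a one-character segment.
    pairs = [cleaned[i:i + 2] for i in range(0, len(cleaned), 2)]
    return "/".join([root, *pairs])


path = pairtree_path("ark:/13030/xt12t3")
# path is "pairtree_root/ar/k+/=1/30/30/=x/t1/2t/3"
```

The object's files then simply live in that last directory, so anyone with a file browser can find them without a database lookup.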

CDL has a whole set of curation microsystems, as they call it. I'm going to keep an eye out for this.

The second talk, by Sayeed Choudhury (Johns Hopkins), focussed on the macro level of data conservancy. This was more abstract, and he started out with the admission that "we don’t have the answers, there are unsolved unknowns - otherwise we wouldn’t have gotten that NSF grant".

Interesting: one of the partner institutions (not funded by NSF) is Zoom Intelligence – a venture capital firm, interested in creating software services on research data. First VC's bought into ILS, now they pop up here... we must be doing something right!

Otherwise, the talk was mostly abstract and longer term strategy.

OR09: Institutional Repositories: Contributing to Institutional Knowledge Management and the Global Research Commons

2009-05-25T17:47:00.004+02:00

Day 1, session 2b.

Institutional Repositories: Contributing to Institutional Knowledge Management and the Global Research Commons - Wendy White (University of Southampton)

Insightful, passionate kick-ass presentation, with some excellent diagrams in the slides (alas I found no link yet), especially one that puts the repository in the middle of the scientific workflow. The message was clear: tough times ahead for repositories – we have to be an active part of the flow, otherwise we may not survive.

Current improvements (see slides: linking into HR instead of LDAP to follow history of deployment, lightbox for presentation of nontext material) are strategy-driven, which is a step forward from tech-driven, but still piecemeal.

Predicts grants for large scale collaboration processes could be tipping point for changing lone researcher paradigm.

(in my opinion, this may well be true for some fields, even in the humanities, but not for all. Interesting that for instance The Fascinator Desktop aims to serve those ‘loners’).

Stress that Open access is not just idealism, it can also benefit in highly competitive fields – cites a research group that got a contract because the company contacted them after they could see what their researchers were doing.

“build on success stories: symbols and mythology”.
“Repository managers have fingers in lots of pies, we are in a very good position to take on the key bridging role.”
It will however require a culture change, also in the management sphere. In the Q&A she noted that Southampton is lucky to have been through that process already.

All in all, a good strategic longer term overview, and quite urgent.