Friday, June 05, 2009

OR09: Four approaches to implementing Fedora

Open repositories 2009, day three, afternoon.

So far, the conference had not been disappointing, but now it got really interesting. The sessions I followed in the afternoon each highlighted a specific approach of the problem that IMHO has been standing in the way of wider Fedora acceptance: middleware.

What these four have in common, is that they all take leverage an existing OSS product and adapt it to use Fedora as datastore.

1. Facilitating Wiki/Repository Communication with Metadata - Laura M. Bartolo

Summary: interesting approach, a traditional Fez spiced up with Mediawiki. With minimal coding a relative seamless integration.
For this to work, contributors need to know MediaWiki markup, and to really integrate, must learn the fez-specific search markup. Also, I'm not sure how well this can be scaled up to true compound objects, given Fez' limitations.

Goal: disseminating of research resources. Specific sites for specific science fields, ie soft matter wiki, materials failure case studies.
MatDL repository: has a repository (Fedora+Fez), want to open up two-way communicating. Example: Soft matter expert community, set up with MediaWiki. "Mediawiki hugely lowers the barrier for participating": familiarity gives low learning curve.

The question: how to integrate the repository with the wiki two-way.

Thinking from user-centric approach. Accommodate user; support complex objects (more useful for research & teaching) thus describe them parts as individual objects.

- Wiki2Fedora
Batch run. Finds wiki upload file, converts referencing wiki pages to DC metadata for ingest in rep. (wiki has comment, rights, author sections -> very doable) Manual post-processing (Fez Admin review area function)
-Search results plug-in for wiki: display repository results in wiki search. Adds to mediawiki markup, to enable writing standard fez queries in the content.

Sites: Repository - Wiki

2. Fedora and Django for an image repository: a new front-end - Peter Herndon (Memorial Sloan-Kettering Cancer Center)

Summary: using Django as a CMS, internally developed adapters to Fedora 3.1.

My gut feeling: A specific use case, images only, so rather limited in scope. Despite choosing the 'hook up with mainstream package' strategy, effectively still a NIH-based rolling their own. That makes the issues even more instructive.

Adapting a CMS that expects SQL underneath is challenging - the plugin needs to be a full object-to-relational database mapper.
Also, Fedora 'delete' caused 'undesired results', 'inactive' should be used.
Further, some more unexpected oddities: had to write their own LDAP plugin to make it work, django has tagging but again plugin was needed to limit this to controlled vocabularies. Performance was not a problem.
Interesting: repository for images only, so exif and the like can be used - tags added using Adobe Bridge! The tested, successful strategy: make use what is already familiar.
In the Q&A the question came up: why use Fedora in this case anyway? Indeed the only reason would be preservation, otherwise it would have saved a lot of trouble to use Django Blobstore.

The django-fedora plugins are available at

3. Islandora: a Drupal/Fedora Repository System - Mark A Leggott (University of PEI)

Islandora looks *very* promising. I noted before (UPEI's Drupal VRE strategy) that UPEI is a place to watch - they are making radical choices with impressive outcomes.

UPEI's culture is opensource friendly. They use Moodle and Evergreen (apparently, they were the first Evergreen site in production).

Rationale: opensourcing an in-house system reinforces good behaviour: full documentation, quality code.

As noted before, UPEI's repositories are hidden behind VRE (see [link]). VRE's are geared towards the researchers. Example of approach: the first thing people do when they set up a VRE is create a webpage. That's what a project needs, and so it's used as a hook to reel people in, they're up and running within a few hours.

The VRE is Drupal; Fedora is for data assets, metadata, policies.
Base Islandora consists of three plugins: Drupal-Fedora connection plugin, xacml filter, rule engine for searches.

This 'rule engine' is indeed very cool.
In a later private conversation with Mark Leggott, he clarified that Islandora indeed uses an atomistic complex object model for research data; the rule engine declares how these can be searched from within Drupal. Example, a dataset consisting of a number of measuring points, each with a set of instruments, atomistically in Fedora; can be queried as 'all the results from specific measure point', 'all the result from instrument x', 'instrument x in specific period' etc.
We haven't reached Nirvana yet, to make the deconstructing of the data objects possible, they have to adhere to specific format (xml). But it's impressive nevertheless.

Other Drupal plugins add functionality for specific data. Impressive example: Drupal FCK editor used as TEI editor, after editing, automatically ads version to datastream. Very cool and 'Just Works' (cheery tweet).

Marine Natural Products Lab: best example of the setup for VRE which includes extensive repository (searchable within the critter xml).

Previous versions used drupal 5/fedora 2, not maintained; currently drupal 6/fedora 3.1

Q: did you replace the drupal storage layer, or do you sync?
A: sometimes it’s saved in the drupal layer, when it doesn’t need to go into fedora (temporary data, while we build the content model). Drupal filesystem is a potential bottleneck when large datablobs

Q: are you bound to content models?
A: standard fedora cm’s, you can build them yourself or change the delivered one. The models are exposed, you can see how it works. We first installed Fez to see how Fedora worked.

4. Project Hydra: Designing & Building a Reusable Framework for Multipurpose, Multifunction, Multi-institutional Repository-Powered Solutions - Tom Cramer (Stanford University), Richard Green (University of Hull), Bess Sadler (University of Virginia) et al.

I'm even more excited about Hydra than about Islandora. Different approach: create "A lego set of services". In other words, a toolkit for the common parts of applications.
It all looks really good. Two gotchas though. Firstly, it is still a work in progress. Can we afford to wait? Secondly, there are issues with the Unicode support of Ruby on Rails.

For more info: D-Lib.

Modelled after the current 12+ use cases of repositories in use at partner institutions, both institutional and personal.
It needs generic templates - which sometimes may do the job - otherwise it won’t come off the ground.
Hydra will have common content models and datastream names. But ultimately they want Hydra to be able to cope with almost anything. A MODS datastream will always have to be there, but not necessarily as primary (so can be done via dissemminator).

Four multifunctional sections:
  • Deposit
  • manage (edit objects, set access)
  • search & browse
  • deliver
  • plus plumbing: authent, author, complex workflow
Using Rails with ActiveFedora. Turns out Rails lives up to its reputation: they are way ahead of their initial roadmap, now expect full production app by fall.

Specs 3/4 ready, coding 1,5/4.

Presentation builds on top of blacklight OPAC. Virgina already has a beta version of their catalogue up using blacklight.

No comments: