Monday, June 15, 2009

OR 09: three more neat Fedora implementations

Open Repositories 2009, Day 4

Three more notable sessions on implementing Fedora. Hopefully, the penultimate post before a final round-up. What a frantic infodump this conference was...


Enhanced Content Models for Fedora - Asger Blekinge-Rasmussen (State and University Library Denmark)

A hardcore technical talk, though impressive in the elegance of the two points shown: bringing the OO model to Fedora object creation, and a DB-style ‘view’ for easily creating search and browse UIs.

The first is created as an extension of Fedora 3’s standard Content Models, yet is backward-compatible, which is a feat. Notable extras: it declares allowed relations (in OWL Lite) and schemas for XML datastreams. It includes a validator service (which is planned as a disseminator, too). Open source [sourceforge].

Fedora objects can be manipulated at quite a high level using the API, but population needs to be done at a much lower level. Thus most systems roll their own. Their solution: templates, with data objects created as instances of CMs, not unlike OO programming. This makes default values very easy. No need for hand-coded FOXML anymore, hallelujah! Create, discover and clone templates using the template web service.
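To make the OO analogy concrete for myself, here is a minimal sketch in Python. It is not their web service or any Fedora API: `Template`, `DataObject`, `instantiate` and the PIDs are all names I made up, and the datastreams are plain dictionaries. It only illustrates the idea of a template that carries defaults and hands out independent instances.

```python
from copy import deepcopy
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Stand-in for a data object: a PID, a content model, some datastreams."""
    pid: str
    content_model: str
    datastreams: dict = field(default_factory=dict)


@dataclass
class Template:
    """Hypothetical template: a content model plus default datastream values."""
    content_model: str
    defaults: dict = field(default_factory=dict)

    def instantiate(self, pid, **overrides):
        # Deep-copy the defaults so instances never share mutable state,
        # then apply whatever the caller wants to override.
        streams = deepcopy(self.defaults)
        streams.update(overrides)
        return DataObject(pid=pid, content_model=self.content_model, datastreams=streams)


# Assumed example values; none of these identifiers come from the talk.
slide_template = Template(
    content_model="demo:CM-Slide",
    defaults={"DC": {"language": "da", "rights": "unknown"}, "NOTES": ""},
)

slide = slide_template.instantiate("demo:slide-1", NOTES="First slide")
```

The new object gets the default DC values without anyone writing them out by hand, which is the point of the exercise.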

Then there are repository views, which bundle atomic objects into logical records. A search engine record might be made up of a bundle of Fedora objects.
Views are defined by annotated relations; different view angles create different logical records.
‘view = none’: the object is omitted from results (useful for small particles you don’t want to show up in queries, for instance separate slides).
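As I understood it, a view boils down to something like the following sketch. Everything here is my own assumption for illustration: the relation names, the `VIEW_RELATIONS` and `ENTRY_OBJECTS` tables and the `bundle` function are hypothetical, not the actual annotation syntax or API shown in the talk.

```python
# Relations between atomic objects: pid -> list of (relation, target pid).
RELATIONS = {
    "demo:lecture": [
        ("hasPart", "demo:slide-1"),
        ("hasPart", "demo:slide-2"),
        ("heldAt", "demo:venue"),
    ],
    "demo:slide-1": [],
    "demo:slide-2": [],
    "demo:venue": [],
}

# Which relations each view angle follows (the "annotation" on a relation).
VIEW_RELATIONS = {"search": {"hasPart"}}

# Objects that start a record in a given angle. Slides are left out,
# which plays the role of 'view = none': they never show up on their own.
ENTRY_OBJECTS = {"search": ["demo:lecture", "demo:venue"]}


def bundle(pid, angle):
    """Collect the PIDs that form one logical record for this view angle."""
    followed = VIEW_RELATIONS[angle]
    record, stack, seen = [], [pid], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        record.append(current)
        children = [t for rel, t in RELATIONS.get(current, []) if rel in followed]
        # Reverse so the first child is popped (and listed) first.
        stack.extend(reversed(children))
    return record


def records(angle):
    """One logical record per entry object of the angle."""
    return {pid: bundle(pid, angle) for pid in ENTRY_OBJECTS[angle]}


search_records = records("search")
```

In this toy setup the lecture record swallows its slides, the venue is a record by itself, and the slides never appear as separate hits.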

These simple API additions make it easy to create elaborate yet easy-to-use GUIs. That includes the first one I’ve seen that comes close to a workable interface for relationship management - not quite full drag’n’drop, but getting there.


Beyond the Tutorial: Complex Content Models in Fedora 3 - Peter Gorman, Scott Prater (University of Wisconsin Digital Collections Center)
[presentation]

Summary: A hands-on walk-through of the Wisconsin DIY approach. Also, an excellent example of what a well-done Prezi presentation can look like: literally zooming in on details, then zooming out to the global context, really helped to see the forest for the trees.

The outset: migrating >1 million complex, heterogeneous digital objects into Fedora. Use abstract, atomistic CMs that gracefully absorb new kinds and new combinations of content. Philosophy: 'fit the model to the content, not the content to the model'.
(Not in production yet, it's a prototype app; keep an eye out for 'Uni Wisconsin digital collections')

Prater starts out with the note that it’s humbling to see that the Hydra and eSciDoc people have been working on the same problem. However, IMHO there’s no reason for embarrassment, as their basic solution is very elegant.

Using MODS for the top-level datastream (similar approach to Hydra). STRUCT datastream: a valid METS document, tying objects to the hierarchy. Important point: CMs don’t define structure, that’s for STRUCT and RELS-EXT.

Every object starts with a FirstClassObject, which points to 0-n child objects of arbitrary types. If there are zero children, it’s a citation. To deal with sibling relationships (i.e. 2 pages in a specific order), an umbrella element is put on top with a METS resource map. This allows full METS functionality. Linking is done using simple STRUCT and RELS-EXT. The advantage over doing everything in RELS-EXT: RELS-EXT alone cannot express sequencing.

Now, to tie this ‘object soup’ together in an app (a common problem with lots of objects: how to turn the soup into a tree), the solution is simple: always use one monolithic disseminator, viewMETS(). This takes the PID of a FirstClassObject and returns a valid METS doc containing the object and all its (grand)children.

This is brilliant: a one-stop API to get the full object tree from a given PID, hiding the complexity of the umbrella object and the METS description involved.
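A rough sketch of the idea, as I picture it: membership links say who belongs to whom, a separate ordering says in which sequence, and a single call walks both to return the whole tree. This is emphatically not their disseminator. `PARENT_OF`, `ORDER`, `children_of` and `view_tree` are names I invented, and the result is a nested dictionary rather than a valid METS document.

```python
# Child pid -> parent pid: the kind of membership link RELS-EXT can express.
PARENT_OF = {
    "demo:page-2": "demo:book",
    "demo:page-1": "demo:book",
    "demo:image-1": "demo:page-1",
}

# Parent pid -> children in reading order: the sequencing that needs
# something like the STRUCT datastream, since membership alone has no order.
ORDER = {"demo:book": ["demo:page-1", "demo:page-2"]}


def children_of(pid):
    """Children of pid, sorted by the declared sequence where there is one."""
    kids = [child for child, parent in PARENT_OF.items() if parent == pid]
    order = ORDER.get(pid, [])
    rank = {child: i for i, child in enumerate(order)}
    # Children without a declared position go last, sorted by PID.
    return sorted(kids, key=lambda child: (rank.get(child, len(order)), child))


def view_tree(pid):
    """One call that returns the object and all its (grand)children."""
    return {"pid": pid, "children": [view_tree(child) for child in children_of(pid)]}


book = view_tree("demo:book")
```

The caller only ever needs the top-level PID; how the ordering is stored stays hidden behind the one function.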

The only part they’re not yet satisfied with is how to relate items between FirstClassObjects, and how to handle relations between two top-level logical objects (e.g. journal and article) that are sometimes parent/child, sometimes not.

To which Asger chimed in that his ‘angle view’, demonstrated in the talk before, would be a possible solution for this. I saw them discussing later... I love it when a plan comes together.


When Ruby Met Fedora - Matt Zumwalt (Media Shelf)

A live demonstration of ActiveFedora which made my fingers itch to start coding straight away - until I remembered Ruby’s Unicode issues, rats.

The philosophy behind it: use Fedora for long-lived content, but be able to quickly create short-lived services and apps.

ActiveFedora can be used without Rails, or even without Ruby (you can call it from the shell). However, Ruby’s OO model maps very well onto Fedora. The key difference with, say, Java or C++: you don’t know what kind of object you’ll get back from a call.

The demo shows the standard Rails environment, except for the Model directory. There, calls to ActiveRecord are replaced with calls to ActiveFedora. AF exposes Fedora objects with multiple properties. Qualified DC is built in, but the has_properties function allows for easy extension.

An interesting advantage of this approach is that the methods developers use carry the same jargon that the metadata people are used to. “They communicate much better when a method’s called dc.subject.”

There’s quite a bit left to do ATM. They’ve received funding to hire a student to finally write real documentation. Other planned extensions: built-in Solr integration, more generators for standard situations, basic CM integration. Interesting is the approach to integrating MODS: use the existing, mature Java libraries, which is easy when using JRuby as the interpreter.
