Monday, May 25, 2009

OR09: Institutional Repositories: Contributing to Institutional Knowledge Management and the Global Research Commons

Day 1, session 2b.

Institutional Repositories: Contributing to Institutional Knowledge Management and the Global Research Commons - Wendy White (University of Southampton)

Insightful, passionate, kick-ass presentation, with some excellent diagrams in the slides (alas, I haven't found a link yet), especially one that puts the repository in the middle of the scientific workflow. The message was clear: tough times ahead for repositories – we have to be an active part of the flow, otherwise we may not survive.

Current improvements (see slides: linking into HR systems instead of LDAP to follow employment history, a lightbox for presenting non-text material) are strategy-driven, which is a step forward from tech-driven, but still piecemeal.

She predicts that grants for large-scale collaboration processes could be the tipping point for changing the lone-researcher paradigm.

(In my opinion, this may well be true for some fields, even in the humanities, but not for all. Interesting that, for instance, The Fascinator Desktop aims to serve those ‘loners’.)

She stressed that open access is not just idealism; it can also pay off in highly competitive fields – she cited a research group that landed a contract because a company, having seen what the group's researchers were doing, contacted them.

“build on success stories: symbols and mythology”.
“Repository managers have fingers in lots of pies, we are in a very good position to take on the key bridging role.”
It will however require a culture change, also in the management sphere. In the Q&A she noted that Southampton is lucky to have been through that process already.

All in all, a good strategic longer-term overview, and quite urgent.

Sunday, May 24, 2009

OR09: PEI's Drupal strategy for VREs and repositories

OR09, day 1, session 2a. Research 2.0: Evolving Support for the Research Landscape by Mark Leggott (University of PEI) - [slides here]  - [blog here]

A small province in Canada, middle of nowhere, pop. 140k, with the only university on the island. UPEI is doing some very good stuff and has made some radical choices. They fundamentally transformed the library from traditional staff to techies. The number of staff didn't change (25), but the number of techs increased from 1 to 5, plus a pool of freelancers.

VREs using Drupal

Strong push for VREs, using Drupal as the platform. Low entry barrier: any researcher can request one! As a rule, all customisations are kept generic rather than project-specific, so all users benefit in the end. If a researcher brings additional funding, contract developers are hired to speed up the process.

Some clients have developed rich Drupal plugins themselves (depends on a willing postgrad :-)

Currently 50+ VREs. Example of a globe-spanning VRE: Advancing Interdisciplinary Research in Singing.

But the same environment is also used for local history projects with social elements (“tag this image”).

Why go open source? It improves code and documentation quality through the embarrassment factor: “Going open source is like running through the hotel at night naked – you want to be at least presentable”.

Repository: Drupal+Fedora=Islandora

UPEI developed Islandora as a front end for the Fedora repository. From the user's point of view, however, it is completely hidden: they log in to the VRE, which silently handles depositing into the repository.

Both Drupal and Fedora are ‘strong systems’ with a lot of capabilities. However, by definition all data and metadata go into Fedora, to separate the data from the application layer and make migration possible. This needs to be strongly enforced, as some things are easier to do in Drupal.
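As a rough illustration of that rule (a conceptual sketch of my own - the function and field names are hypothetical, not Islandora's actual API): the VRE-facing deposit path always hands content and metadata to Fedora, and the Drupal side keeps nothing but a reference.

```python
import uuid

# Conceptual sketch only; names are hypothetical, not Islandora's real API.
# The enforced rule: every deposit puts data and metadata into the Fedora layer,
# while the Drupal/VRE side stores nothing but a reference (the Fedora PID).

def fedora_ingest(content: bytes, metadata: dict) -> str:
    """Stand-in for the repository ingest call; returns a PID-like identifier."""
    return f"demo:{uuid.uuid4().hex[:8]}"

def vre_deposit(vre_node: dict, content: bytes, metadata: dict) -> str:
    pid = fedora_ingest(content, metadata)  # all data and metadata go to Fedora
    vre_node["fedora_pid"] = pid            # Drupal only keeps the pointer
    return pid

node = {"title": "Sample dataset page"}
vre_deposit(node, b"raw bytes", {"title": "Sample dataset", "creator": "A. Researcher"})
print(node)
```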

Very neat integration between data objects in the repository and the VRE: researchers can search specifically within the objects, as in “search for data sets in which field X has a value between 7 and 8”. This is done by mapping the data to an XML format, then mapping the XML fields to search parameters. For research fields where XML data formats are available and commonly used, this is a real boon (the example given was marine biology).
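To make the mapping idea concrete, here is a minimal sketch (my own illustration, not UPEI's code; the file layout and field names are assumptions) of answering a range query such as “field X between 7 and 8” once datasets are available as XML and their fields are mapped to named search parameters:

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical mapping from a search-parameter name to an element path in the XML.
FIELD_MAP = {
    "salinity": ".//measurement/salinity",
}

def find_datasets(directory: str, field: str, low: float, high: float) -> list:
    """Return dataset XML files whose mapped field has a value in [low, high]."""
    hits = []
    for xml_file in Path(directory).glob("*.xml"):
        node = ET.parse(xml_file).getroot().find(FIELD_MAP[field])
        if node is not None and low <= float(node.text) <= high:
            hits.append(xml_file)
    return hits

# "Search for data sets in which salinity has a value between 7 and 8":
print(find_datasets("datasets/", "salinity", 7, 8))
```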

Great stuff altogether. The small size may give them an advantage, they operate like a startup, listen to their users, pool resources effectively and are not afraid to make radical choices.

BTW, fifteen minutes into the talk I connected the acronym PEI with the name Prince Edward Island. PEI must be so famous in the repository world that it either needn't be explained at all, or it was mentioned so briefly that it slipped by me...

OR09: Purdue's investigation into Research Data Repositories

OR09 day 1, session 2a: Michael Witt (Purdue University), "Eliciting Faculty Requirements for Research Data Repositories"

Preliminary results of an investigation into what researchers want regarding data (and data repositories). Some good stuff. I hope the slides will be published soon - or the report, for that matter.

See Sean's weblog for the ten primary questions, which are good for self-evaluation too. Mark Leggott then quickly added an 11th question to his slideshow - how much is in your wallet...

Method: interviews and a follow-up survey with twenty scientists, transcribed (using NVivo). “It was like drinking from a firehose.” For each scientist, a “data curation profile” was created, with example data and a description. Will be interesting when it comes out.

OR09: on subject-based repositories

Open Repositories 2009, day one, session 1b.

Phew! OR09 is over, and so, almost, is my jetlag. An intense conference that was certainly worth it; the content was generally interesting and well presented. I'll be posting my conference notes here over the coming few days.

The first session on Monday morning consisted of two talks on subject-based repositories. The planned third one, on a Japanese repository, was cancelled - unfortunate, as I know very little of what's happening there regarding OA.

First came Julie Ann Kelly (University of Minnesota) on AgEcon, a repository for Agricultural Economics, a field with a strong working paper tradition. It was set up in the gopher days (not so surprising, as the critter originated in Minnesota).

Interesting was the reason: in this field, working papers are citable, but the reference format was a mess.

Even more interesting: because of this, it also became the de facto place for depositing appendices to articles - datasets! The repository accepts them, and they get the same citation format. There is a lesson here... solve a real problem, and content will come.

Usage statistics: only 53% of downloads come from people; 43.6% come from Googlebot (the rest from other spiders). 66% of visitors come through Google straight to the results, not through the front end anymore. Another 19% arrive via other search engines, which leaves 14% coming in through the front end.

Further notes:

Why is life easier in a subject repository?

  • A focussed topic makes metadata easier; common vocabularies exist, etc.
  • Recruitment (of other institutions) is easier: specialists in one discipline tend to meet frequently, so recruiting can piggyback on conferences, etc.

And why is it harder?

  • Organising the community is hard work - 170 institutions, each with between 1 and 300 submitters, generate a lot of traffic on quality issues. They frequently hire students to do the corrections.

Minnesota is consolidating its repositories from 5-6 different systems to Islandora. AgEcon will be one of them.

They also want to use this Drupal-based system to add social networking, akin to EthicShare. EthicShare is interesting: a social citation manager (à la CiteULike/Bibsonomy) plus repository plus social network plus calendar and then some, for a specific field of study, in this case ethics research. Commoditisation coming?

The second talk was on the subject repository Economists Online, presented by Vanessa Proudman of Tilburg University. Interesting to see that this is in many ways the opposite approach: EO is a big European project that works top-down, tries to get the big players aboard first as an incentive for the others, and emphasizes quality above all, whereas AgEcon was a grassroots, bottom-up model that empowered small institutions.

It's a work in progress, only mockups shown. These look slick, with a well thought-out UI. Interesting: with every object in the result list, statistics will be shown inline (ajax), and can be downloaded in multiple formats.

Small pilot with 10 datasets per participating institution, DDI format, Dataverse as the preferred solution. Provenance of datasets is very complicated: there are many contributors across the data life cycle - dataset owners, sources, providers - all of whom must be credited.

Like AgEcon, EO stresses that subject-based repositories have different characteristics. They will organize a dedicated conference on subject repositories in January 2010 in London, noting that the topic rarely comes up at general repository conferences.

Interest in attending: mail subjectrep_parts@lists.uvt.nl

Friday, October 03, 2008

RFID for libraries: HF or UHF? (2)

Finally! With the European tender wrapped up and Autocheck Systems chosen as our partner, our RFID project is on its way. Now I can share a little more about the technology choice, following up on a post from, *cough*, one year ago.

When we started preparing, it looked like UHF had great potential to overcome some of the shortcomings of HF. To check whether this would work in practice, we organized a test in our stacks with UHF gear, together with one of the major vendors. The test looked specifically at the speed and reliability of inventory with a hand-held device. Unfortunately, the test gave a muddled answer: UHF showed great potential indeed, but needed more fine-tuning to get consistent results. Meanwhile, other HF vendors were showing that they were still able to tweak their systems further to reach speeds that, although not as high as UHF's, were closing the gap to the point where reading with a hand-held was becoming notably faster than checking by eye.

Because of this, we decided not to specify HF or UHF in the tender. Instead, we asked vendors to specify the performance of their system in terms of speed and accuracy for three scenarios. The faster the reading and the higher the accuracy, the more points could be earned, calculated on a logarithmic scale: starting from zero at 98% accuracy and a scenario-specific number of seconds, via hundreds or so of points at expected HF speeds, up to thousands at UHF speeds.

To prevent a vendor bluffing, these numbers would need to be proven in a trial setup, failing which would lead to automatic exclusion. You could say that we tested the trust the vendors had in their systems.
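For the curious, here is a toy version of what such a points scale might look like - purely my own illustration, since the real tender formula isn't reproduced here; the 98% threshold, the per-scenario baseline time and the logarithmic scale come from the description above, everything else (names, the exact curve) is assumed:

```python
import math

def scenario_points(seconds: float, accuracy: float, baseline_seconds: float,
                    scale: float = 1000.0) -> float:
    """Hypothetical per-scenario score; the shape and parameters are assumptions."""
    if accuracy < 0.98 or seconds >= baseline_seconds:
        return 0.0                                       # at or below the threshold: no points
    speed_gain = math.log10(baseline_seconds / seconds)  # faster reads earn more, logarithmically
    accuracy_gain = 1.0 + (accuracy - 0.98) / 0.02       # up to 2x at 100% accuracy
    return scale * speed_gain * accuracy_gain

# A system reading three times faster than the baseline, at 99.5% accuracy:
print(round(scenario_points(seconds=20, accuracy=0.995, baseline_seconds=60)))  # hundreds of points
```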

Now I obviously can't give out details of the bids, but here's the general outcome. UHF vendors scored well, but at a relatively high price. And some - though not all! - HF vendors wrote in with performance higher than expected for HF, though still below the UHF figures. The clear winner, Autocheck, was one of these high-performing HF vendors. They scored best on the combination of high HF performance and a very decent price tag (needless to say, they were able to prove their performance figures).

So, an interesting outcome, not quite what we expected. A side effect is that it changed my opinion on the European tendering process. Yes, it is tedious, bureaucratic and can lead to unexpected results. But by tendering for functional requirements, rather than for a specific technology, we actually ended up with a good deal. That it was not what we expected is all the better. The trick is to properly investigate what you want and specify that, rather than how you want it done.

Tuesday, September 23, 2008

Self-portrait with legal document

Finally, we're almost there with our BDP. I hope that I can 'open up the kimono' (*) on our RFID plans (and their imminent execution) in just a few more days...

(*) Steve Jobs' words on being allowed to visit Xerox PARC.



Thursday, February 21, 2008

We have a tender!

Quite a long time ago I started what I hoped would be a series on UHF vs HF RFID. That proved most optimistic, and the real world got in the way.

However, after a *lot* of work over the past weeks and months, which I could not write about - oh, the horror - I can safely announce that we have sent our Tender Request off to TED (Tenders Electronic Daily). This is the follow-up to an earlier pre-announcement we made on TED - which is handy, because having pre-announced, we can shave some time off the procedure that follows. Unfortunately, there is not much time to write about it now - but if you're interested, keep an eye on TED for the publication.

And what came of the HF vs UHF question? In the end, we decided to specify functionality rather than choose a technology ourselves, leaving it to the companies to propose the best system for our wishes. In our quality criteria, we value speed and reliability with a formula that multiplies the squared values of the two. This makes for an interesting challenge, we hope - a functioning UHF system that lives up to expectations could earn a *lot* of points, but it would need to be both fast and reliable for that. And the values need to be proven in a proof of concept.
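As a minimal sketch of that idea (the normalisation of the two factors to sub-scores between 0 and 1 is my assumption, not the tender text): squaring both before multiplying means a system has to do well on speed *and* reliability to earn a high mark.

```python
def quality_score(speed_score: float, reliability_score: float) -> float:
    """Toy version of the 'product of squares' idea; inputs assumed normalised to [0, 1]."""
    return (speed_score ** 2) * (reliability_score ** 2)

# Fast but unreliable loses out to a system that is merely decent at both:
print(quality_score(0.9, 0.5))  # 0.2025
print(quality_score(0.8, 0.8))  # 0.4096
```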

More later... interesting times!

Thursday, January 10, 2008

Chişinău (Moldova) - National Library

With all the innovation we're working on, it's hard to remember that the library world has many facets. This is a blast from the past. Not just the picture; check out the description on the Flickr page of the bureaucracy around obtaining a book. Library 1.0 - it seems so long ago.

Sunday, January 06, 2008

A virtual library in Urbino, Italy

I've been having some wonderful time off in Italy with my wife (flickr set) and, as a sign that I am about to return to my working life, the television news today made a big deal of a new Virtual Library in Urbino. I've looked for a website but found none; the library may very well have one, but my Italian is still pretty rudimentary. Luckily, this blog posting has some pictures.

There was a big event in the village, the ciaspolada, and the family made sure not to miss a single bit of coverage, hence I saw an unusual amount of TV news (of dubious quality) today. They all made a big deal of the interface, which is indeed interesting: fully gesture-based. You either sit or stand in front of a wall onto which an image of an ancient library (leather-bound volumes) is projected, and by gesturing wildly with your arms you can take a book from a shelf and then flip through it. I didn't see any library staff interviewed, apart from the interaction designer, whose name I sadly can't recall now.

All old manuscripts, of course, at least for the demo. As a library professional I would be interested in the approach to the digitization of these works, as from this coverage I have the impression that they were scanned cover to cover. We know that costs an arm and a leg, so which selection criteria were applied? Also, there was no mention of how to find a book in this library apart from the pretty but pretty useless picture of the bookshelves, or of how many calories browsing takes with all those wide arm movements.

But still, way to go, Urbino! And if someone could point me to a website I'd be most grateful.

Arrivederci,
Driek

Tuesday, November 06, 2007

Yet another danger of the current state of copyright...

TechCrunch reports on the launch of Attributor, a startup that monitors copyright infringement on the web. The timing is interesting: right now there is a mild uproar in some blog circles in the Netherlands about Cozzmozz. Where Attributor only monitors, and leaves it to the copyright holders to decide whether and what action to take, this company goes a significant step further: they also take legal action on behalf of the authors (in exchange for a nice cut, of course).

This blog quoted a short article in full. Dutch copyright law allows for a kind of 'fair use', though it is vaguely formulated, which makes this a borderline case. Quoting the full article is a no-no, but it was so short that it would have been difficult to leave anything out in the context of the argument being made against the stance taken in the interview. This happened a while ago. Then, out of the blue, several days ago Cozzmozz threatened to sue but offered a deal - first 240, then 160 euro instead of 600 - which the blogger in question chose to pay. Though offers of support were flooding in, as an ME patient her energy is limited, and she decided she could not afford a drawn-out affair.

The real danger here is that when copyright holders transfer their rights to outfits that exploit them in exchange for a cut, the grey area between the legal and the moral right disappears. Would the freelance journalist who wrote the article have chosen to sue this blogger herself?

What's legally right is not always what's morally right. Yet another reason to clear up copyright laws.

(Edited 6/12, 20.20 - made the difference between Attributor and Cozzmozz clearer.)