Thursday, September 28, 2006

Too old for that

A team of students is investigating the use of social bookmarking in academic education as a placement project [1]. I had coffee with two of them yesterday. Of course, the conversation turned to the social web, and came round to the MySpace phenomenon (which in Holland means Hyves):

Student #1: "Well, I used to be on Hyves... but I'm too old for that now."

Student #2: "Yeah, now you've got a girlfriend."

Mind you, these guys are barely twenty.

The technology gap is no longer one wide crack between generations. We're dealing with technology craquelée! [2] Not that this makes the gap any easier to bridge, but at least the burden is distributed more evenly.

Back to work - preparing for a presentation of Ex Libris' Primo tomorrow. Eager!

[1] The results will be published in the open, of course.
[2] Technology craquelée: it sounds like the perfect name for a blog.

Monday, September 18, 2006

boasting a boost

Lists are lists; that is all they should be. Any ranking is a model, and a model is only as good as the imagination of its creator.

With that out of the way, let me boast for a moment! Webometrics ranks my uni 21st in Europe for its commitment to open access (July count). That's not bad, and we're the highest-ranked in the Netherlands.

There is so much work to do in changing the research workflow that it's hard to imagine how we're ever going to get there. So this is a nice little boost in the meantime. Onwards!

Thursday, September 14, 2006

The perceived value of recommendations

On the TidBITS Talk list, a discussion was started on the quality of recommender systems. A rather laughable suggestion in an email from Amazon prompted the original poster to ask why these systems are still mediocre at best.

It reminded me of the session on the TechLens project at the CNI Fall 2005 Task Force meeting. There were some interesting observations on when recommenders work and when they don't (mostly in the Q&A, so not covered in the abstract).

There are two problems for such systems: the quality of the underlying data, and the problem of the desired neighbourhood. I'll start with the latter.

How widely do you, as a user, want the recommendations to vary? When you are new to a subject, you want the defining standard works: a narrow view. As you become more versed in the subject, you no longer want those predictable results, because you are already familiar with them. Without surprise, a recommendation has no value. Different users want different results.

Research on users' expectations showed that users were most content with a recommender service that gave five suggestions (in an unobtrusive interface), as long as one or two of those five were 'interesting'. Keep in mind that this was research on users in one strictly defined research field, so it can't be translated directly to other fields; but it gives an indication, and at least it is real, non-anecdotal data.

How does this translate to Amazon? Like the original poster, I get the occasional Amazon suggestion by email, most of which I delete instantly. Only rarely is one actually interesting. As a result, I find them annoying or amusing, depending on the actual suggestion, and they irritate me almost as much as spam.

However, when I browse Amazon, the recommendations are much less obtrusive: I glance at them when I want, and sometimes I do find something interesting there. And I find myself agreeing with the outcome of the TechLens research: my Amazon miss:hit ratio is 25:1. I would like more hits, but it needn't be 1:1.
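To put the two experiences side by side, here is the arithmetic as a small Python sketch. The function names are made up for illustration; the numbers are the ones above: one or two 'interesting' out of five suggestions versus my 25:1.

```python
from fractions import Fraction


def miss_hit_ratio(suggestions: int, hits: int) -> Fraction:
    """Misses per hit in a batch of suggestions (e.g. 4 means 4:1)."""
    if not 0 < hits <= suggestions:
        raise ValueError("hits must be between 1 and the number of suggestions")
    return Fraction(suggestions - hits, hits)


def hit_rate(misses: int, hits: int) -> float:
    """Fraction of all suggestions that turned out to be interesting."""
    return hits / (misses + hits)


# TechLens-style: one or two interesting out of five suggestions.
print(miss_hit_ratio(5, 1))       # 4    (i.e. 4:1)
print(miss_hit_ratio(5, 2))       # 3/2  (i.e. 3:2)
# My Amazon experience: 25 misses for every hit.
print(round(hit_rate(25, 1), 3))  # 0.038
```

So the TechLens users were content with somewhere between 3:2 and 4:1, while my Amazon ratio works out to a hit rate of under 4%, which is tolerable only because the browsing interface lets me ignore the misses.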

Now the data. The suggestions can only be as good as the underlying data. The ACM TechLens research used citations to see which objects were linked, which provides high-quality information on the relationships between objects.

Amazon, however, has to rely on more primitive metadata, such as the author, and refines this with buying and browsing patterns. It is actually surprisingly good at this, but as with all 'social sites' (of which Amazon is arguably the granddaddy), it needs a critical mass to become reliable. In the dustier corners of the inventory, you get oddball results.
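To make the critical-mass point concrete, here is a toy item-to-item recommender based on co-purchases, in Python. It is a generic co-occurrence sketch, not Amazon's actual algorithm (which I don't know); the baskets and the `min_support` threshold are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(baskets):
    """Count how often each pair of items was bought together."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, item, k=5, min_support=2):
    """Top-k items bought with `item`, ignoring pairs seen fewer than min_support times."""
    ranked = [(other, n) for other, n in co[item].most_common() if n >= min_support]
    return [other for other, _ in ranked[:k]]


# Hypothetical purchase data: a popular corner and a dusty corner.
baskets = [
    ["popular-novel", "sequel", "cookbook"],
    ["popular-novel", "sequel"],
    ["popular-novel", "sequel", "travel-guide"],
    ["rare-monograph", "garden-gnome-catalogue"],
]
co = build_cooccurrence(baskets)
print(recommend(co, "popular-novel"))                  # ['sequel']
print(recommend(co, "rare-monograph"))                 # []
print(recommend(co, "rare-monograph", min_support=1))  # ['garden-gnome-catalogue']
```

With a threshold, the dusty corners simply get no recommendations; drop the threshold to 1 and a single odd purchase becomes a 'recommendation', which is exactly the oddball effect.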

(Nothing new here, by the way: until recently, in our rare books department, the quality or even the availability of indexes to specialized collections depended entirely on the personal interest of the specialist...)

A good recommender system will always give you some surprising suggestions. It may not always be the surprise you wanted, but if it were predictable, it would be of no value at all! So by definition, there is a high miss-to-hit ratio. The key is that the system must be unobtrusive enough that the misses can be ignored.


PS: In the long run, this will all change, once systems are able to parse the actual objects and build relationships based on their content. There is a lot of research in this area, largely spin-offs of 'Homeland Security' projects. But it is still years away.

Wednesday, September 06, 2006

SFX integrated in search results

Yes, another TICER-inspired post! FastSearch's Bjørn Olstad made a few interesting remarks in his talk. One of them I heartily agree with: search results should be rich enough that the user won't have to open each line to see whether it is actually what he was looking for.

This, now, is where OpenURL resolvers such as SFX can shine. As currently implemented, SFX is Yet Another Button, a separate action, and thus a burden to the user. I don't have the usage statistics for UvA-Linker handy (UvA-Linker is the name we gave our SFX; by the way, why does everybody have to give SFX its own name? It makes spreading the word to users very awkward. But I digress.) I know we had to upgrade our server hardware, so it is used; however, I am positive it is still not used as much as it could be, and that's a bloody shame, for it has so much potential.

And then, once the button is clicked, the services offered are pretty boring. Useful, sometimes, but not inspiring. Why offer links to Google and Amazon searches? They are a dime a dozen; you get those with your cereal. And a student today has probably googled already before ever looking at the library's search service.

It could be so much better! SFX should instead work as a web service that can be integrated wherever objects are displayed. Imagine a search results page (for instance in MetaLib or an OPAC) where the results are enriched with SFX services! A little AJAX magic will do to insert a few lines: a direct link to the full text; the first lines of the abstract, with a little button to display the remainder inline (again with AJAX); if available, the cover image from Amazon or a dozen other sources; cited-by links; citation ranking; for chemistry articles, graphical molecule displays. And who knows what other services the future holds.
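The AJAX call itself would run in the browser, but the server side of the idea is simple: for each result, ask the resolver which services apply and attach the interesting ones. A minimal Python sketch, assuming a hypothetical `resolve` function standing in for the resolver (this is not an SFX API) and a made-up set of service names:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Result:
    title: str
    doi: Optional[str] = None
    services: Dict[str, str] = field(default_factory=dict)


def teaser(text: str, limit: int = 80) -> str:
    """First words of an abstract, cut at the last word boundary within `limit`."""
    if len(text) <= limit:
        return text
    cut = text[: limit + 1]
    head = cut.rsplit(" ", 1)[0] if " " in cut else text[:limit]
    return head.rstrip() + " ..."


def enrich(results: List[Result],
           resolve: Callable[[Result], Dict[str, str]],
           wanted=("fulltext", "abstract", "cover", "cited_by")) -> List[Result]:
    """Attach the wanted resolver services to each result, inline."""
    for r in results:
        available = resolve(r)  # hypothetical resolver lookup
        r.services = {name: available[name] for name in wanted if name in available}
        if "abstract" in r.services:
            # Short teaser for the result list; the full abstract stays for the inline expand.
            r.services["abstract_teaser"] = teaser(r.services["abstract"])
    return results


def fake_resolve(result: Result) -> Dict[str, str]:
    """Stand-in for a real resolver: canned services for anything with a DOI."""
    if result.doi is None:
        return {}
    return {"fulltext": f"https://doi.org/{result.doi}",
            "abstract": "A fairly long abstract that would normally run on for several lines."}


hits = enrich([Result("Paper A", doi="10.1000/example"), Result("Paper B")], fake_resolve)
```

Keeping the resolver behind one function is the point of the design: the same enrichment could then feed a results page, an OPAC display, or an AJAX snippet.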

Not two clicks and a long wait away (not to mention all those new windows, ugh!), but instantly. It shouldn't be that hard for the current generation of OpenURL resolvers to add a web service interface (provided they *cough* have a simple and straightforward architecture on the publishing side).
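What would such a web service be asked? The request side already exists: an OpenURL 1.0 (Z39.88-2004) key/encoded-value query describing the article. Here is a Python sketch of building one; the resolver base URL and the article details are placeholders, and how a resolver would answer in machine-readable form is exactly the open question, so that part is left out.

```python
from urllib.parse import parse_qs, urlencode, urlparse

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # placeholder, not a real resolver


def openurl_for_article(atitle, jtitle, volume=None, issue=None,
                        spage=None, date=None, doi=None):
    """Build an OpenURL 1.0 KEV query for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
    }
    optional = {"rft.volume": volume, "rft.issue": issue,
                "rft.spage": spage, "rft.date": date}
    params.update({key: value for key, value in optional.items() if value is not None})
    if doi is not None:
        params["rft_id"] = f"info:doi/{doi}"
    return f"{RESOLVER_BASE}?{urlencode(params)}"


url = openurl_for_article("An Example Article", "Journal of Examples",
                          volume="10", issue="9", date="2004")
# Round-trip check: the resolver would see the same keys and values.
query = parse_qs(urlparse(url).query)
print(query["rft.jtitle"])  # ['Journal of Examples']
```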

I'm very curious what the SFX community thinks of this idea. The SFX conference is actually happening this week in Stockholm, so I'm a little late for that. When our SFX people are back from Sweden though, I'll see if I can convince them...

PS: I'm aware I am using the terms OpenURL resolver and SFX interchangeably in this post, which strictly speaking is not correct. I must confess that I don't know what other players there are in this market besides Ex Libris. Note to self: check this out.

Monday, September 04, 2006

Thinking about the future research workflow - A TICER 06 post

The final talk of day 1 of TICER 2006 was by Herbert van de Sompel (abstract, PowerPoint). In short, it was an elaboration of the article Rethinking Scholarly Communication (D-Lib, vol 10, no 9) and of the April 2006 meeting on Augmenting Interoperability across Scholarly Repositories.

It's always a pleasure to hear Herbert talk. He's an inspiring presenter, which unfortunately is not too common among conference speakers. When he speaks, he makes clear what can be hard to grasp in his writing, whose information density is hard to follow for mere mortals. When I read the D-Lib article a while ago, I did not fully grasp its depth and vision; with this talk, the penny dropped.

What really made this a great finish to a day that was pretty good already was that it was about thinking further into the future. The other talks were about the next step, current dilemmas, what to do this year, maybe the next: using instant messaging and blogs to communicate with clients, or improving your OPAC's search results. All fine and dandy, and I certainly got some ideas; but the scope of this talk was a much wider vision, one that might take a decade to come to fruition.

We've made the first transition, from writing articles on paper, to writing articles electronically. But the idiom of the research workflow, the way scientists and scholars cite sources, has not changed yet. We've merely swapped one medium for another. With this transition well on its way, it's time to re-think the whole scholarly workflow.

And the first step is to build a uniform mechanism for referencing objects. This is necessary for machines to follow the research workflow from one source to another, which will make it possible to build all kinds of services on top of the objects. Think recommenders based on recent citations (recent as in: without the publication lag, which can run up to two years for slow-moving journals!). Think overlay journals that work. Think archiving services (LOCKSS) that function automatically. It could work! And if anyone could pull it off, it's van de Sompel, who brought us the OAI and OpenURL architectures (*).
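A toy illustration of why uniform references matter, in Python. It assumes, hypothetically, that citation events are harvested as soon as a citing paper is deposited, and that references are DOIs written in various styles; normalizing them to one `info:doi/` URI lets a machine count them as the same object. This is my own sketch, not the mechanism proposed in the D-Lib article.

```python
from collections import Counter
from datetime import date, timedelta

# Spellings of a DOI reference seen in the wild, mapped to one form.
_PREFIXES = ("info:doi/", "https://doi.org/", "http://dx.doi.org/", "doi:")


def normalize_ref(ref: str) -> str:
    """Turn any of the DOI spellings above into an info:doi URI.

    DOIs are case-insensitive, so the result is lowercased.
    """
    ref = ref.strip()
    for prefix in _PREFIXES:
        if ref.lower().startswith(prefix):
            ref = ref[len(prefix):]
            break
    return "info:doi/" + ref.lower()


def trending(citations, today, window_days=90, k=3):
    """Most-cited objects among (cited_ref, cited_on) events in the window."""
    since = today - timedelta(days=window_days)
    counts = Counter(normalize_ref(ref) for ref, when in citations if when >= since)
    return [ref for ref, _ in counts.most_common(k)]


# Hypothetical citation events: (reference as written by the citing author, date).
events = [
    ("doi:10.1000/ABC", date(2006, 8, 1)),
    ("https://doi.org/10.1000/abc", date(2006, 8, 20)),
    ("10.1000/def", date(2006, 7, 15)),
    ("10.1000/old", date(2004, 1, 1)),
]
print(trending(events, today=date(2006, 9, 4)))
# ['info:doi/10.1000/abc', 'info:doi/10.1000/def']
```

Without the normalization step, the two citations of the same article would count as two different objects with one citation each.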

But even if this particular direction turns out not to be the way, it is refreshing to think beyond the normal event horizon. The digitization of the research workflow can mean so much more than the way the system is growing now, which, in Herbert's words, is nothing more than scanning in printed journals with the paper left out. The revolution has only just begun.

(*) He's not alone, of course, he's surrounded by some really bright people, and LANL is the perfect place for this type of research. But he's the essential hub.

[Note: what I find very interesting about this proposal is that it is a step on the road towards the original vision of Ted Nelson's Project Xanadu: a hypertext system where source and target do not just link blindly, but are aware of each other's existence. This goes way beyond our limited corner of the web, of course, but still.]
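The 'aware of each other' idea is easy to show in miniature: record every link at both ends. A toy in-memory Python sketch; the class and method names are made up, and Xanadu itself was far more ambitious, with transclusion and versioning.

```python
from collections import defaultdict


class LinkRegistry:
    """Toy two-way link store: each link is recorded at source and target."""

    def __init__(self):
        self._outgoing = defaultdict(set)
        self._incoming = defaultdict(set)

    def link(self, source: str, target: str) -> None:
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def unlink(self, source: str, target: str) -> None:
        self._outgoing[source].discard(target)
        self._incoming[target].discard(source)

    def links_from(self, source: str) -> list:
        return sorted(self._outgoing.get(source, ()))

    def links_to(self, target: str) -> list:
        """What the web lacks: the target can ask who points at it."""
        return sorted(self._incoming.get(target, ()))


registry = LinkRegistry()
registry.link("preprint-A", "article-B")
registry.link("blogpost-C", "article-B")
print(registry.links_to("article-B"))  # ['blogpost-C', 'preprint-A']
```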

Friday, September 01, 2006

On mixing work and play

Hello readers from Library Stuff. I peeked at the stats in a moment of curiosity and was pleasantly surprised. Thanks, Steven! And a nice blog to boot; it's listed.

Commenting on my opening post, Library Stuff makes a point about blurring the lines between work and play. Funnily enough, I'm writing this at nine on a Friday evening, just before I go dancing.

Yes, I want to show my true self here, my honest and personal self; otherwise, what's the point? That does not mean a full mix of work and play in one place, though. After all, I do not live in my library. There's a time for working and a time for dancing. I will do the occasional bit of work at home, and the occasional dance at work (yes, true story!), but each has its own place.

Same "blog voice" talking, just a different focus. Though the occasional vintage photograph may seep through.