Thursday, March 06, 2014

IDCC14 notes, day 2: keynote Atul Butte

Part 2 in a series of notes on IDCC 2014, the 9th International Digital Curation Conference, held in San Francisco, 24-27 February.

Day two kicked off with a fantastic keynote by Atul Butte, Associate Professor in Medicine and Pediatrics, Stanford University School of Medicine: Translating a trillion points of data into therapies, diagnostics and new insights into disease [PDF] [Video on YouTube]. This one was well worth a separate blog post.

Butte starts his presentation with some great examples of how the availability of a wealth of open data has already radically changed biomedical research. Over one million datasets are now openly available in the standardized GeneChip format. A search for breast cancer samples in the NCBI GEO DataSets database gives 40k results, more than even the best lab will ever have in its stores. And PubChem has more samples than all pharma companies combined, completely open.

The availability of this data is leading to new developments. Butte cites a recent study that, by combining datasets, revealed ‘overfitting’: everybody does an experiment in exactly the same way, which leads to reproducible results that are irrelevant to the real world.

But this is tame compared to the change in the whole science ecosystem brought about by the advent of online marketplaces. Butte goes on to show a number of thriving ecommerce sites - “add to shopping cart!” - where samples can be bought at competitive prices. Conversant Bio is a marketplace for discarded samples from hospitals, with identifiers stripped off. Hospitals have limited freezer space and biopsy samples that can be sold, and presto. What about the ethics? "Ethics is a regional thing. They can get away with a lot of stuff in Boston we can't do in Palo Alto." Now any lab can buy research samples for a low price and develop new blood marker tests. In this way a test was recently developed for preeclampsia, the disease now best known from Downton Abbey.

Marketplaces have also sprung up for services. One is a clearinghouse for medical research services, including animal tests. Thousands of companies provide these worldwide. Butte stresses that this is not just a race to the bottom and to China, but that it also creates opportunities for new specialised research niches, such as a lab specializing in mouse colonoscopies. It also makes it possible to do real double-blind tests by simply buying more tests from different vendors (with different certifications, to get a good spread). This makes it especially interesting to investigate other effects of tested and approved drugs. Which is a good thing, because the old way of researching new drugs is not sustainable when patents run out (the “pharma patent cliff of 2018”).

This new science ecosystem is built on top of the availability of open data sets, but there are open questions about its sustainability. Butte sees two players here: funders and the repositories themselves.
Incentives for sharing are lacking. Altmetrics are just beginning, and funders need to kick in. Secondary use grants are an interesting new development. Clinical trials may be the next big thing. They are the most expensive experiments in the world, costing $200 million each. 50% fail and not even a paper is written about them... Butte expects funders to start requiring publications on negative trials and publishing of the raw data.
The international repositories are at the moment mostly government funded, and this funding may run out. Butte thinks that mirroring and distributing is the future. He also stresses that repositories need to bring the cost down - outsourcing! - and show real use cases that will inspire people. The repositories that will win are the ones that yield the best research.
