Archive for category bioinformatics
A new book has been released on the Semantic Web and its applications in the life sciences, intuitively entitled Semantic Web: Revolutionizing Knowledge Discovery in the Life Sciences. I have purchased a similar book before, entitled Ontologies for Bioinformatics, and was not overly impressed with its content. This book, however, as pointed out by Duncan on Nodalpoint, is written by people who are actually using the technology. I have met several of the people involved in authoring some of the chapters, and I think I will definitely be purchasing a copy, or rather the lab will be purchasing a copy.
Open Data is a concept I came across while attending the 2nd International Digital Curation Conference, and I felt it deserved a post in its own right rather than being subsumed by the conference report. I am an advocate of Open Access and feel that Open Data must be a part of this process. What is the point of being able to freely read a publication if you can't freely access the data the publication refers to?
I was introduced to the concept of Open Data by Peter Murray-Rust at the DCC conference; he regularly blogs about the subject. There is a Wikipedia entry which defines the concept and a mailing list which promotes discussion of Open Data.
Steps are underway in the bio-science domain to define minimum reporting requirements for data in repositories and publications. Two prominent examples are the MGED community with MIAME and the proteomics community with MIAPE. Within these noble efforts there is no mention of Open Data, although it seems the next logical step in the data curation pipeline:
- record the defined minimum information and metadata.
- structure and present the data.
- allow access to the data.
Maybe when the effort is made to properly record, structure and describe the data, as these minimum reporting requirements advocate, scientists and journals will be only too happy to take the next step and declare it Open Data, for the sake of scientific knowledge and progress.
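The three steps above can be sketched as a minimal record structure. This is purely illustrative: the field names below are my own invention and are not drawn from the actual MIAME or MIAPE checklists.

```python
import json

# A minimal, hypothetical "minimum information" record for one experiment.
# Field names are illustrative only, not taken from MIAME or MIAPE.
record = {
    "experiment_id": "EXP-001",
    "metadata": {                           # step 1: record minimum information
        "organism": "Mus musculus",
        "platform": "two-colour microarray",
        "protocol": "standard hybridisation, overnight",
    },
    "data_files": ["raw_intensities.csv"],  # step 2: structure the data
    "access": {                             # step 3: allow access to the data
        "licence": "open",
        "url": "http://example.org/data/EXP-001",
    },
}

# A structured, machine-readable presentation of the record.
print(json.dumps(record, indent=2))
```

The point is that once the first two steps are done properly, declaring the record Open Data is little more than setting the access terms.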
The Digital Curation Centre (DCC) has just held its 2nd International Digital Curation Conference in Glasgow. The official DCC blog for the conference tracks the thoughts and discourse over the two days, and the full programme can be found here. As the conference name suggests, the meeting had a particular focus on different aspects of the digital curation life-cycle, including managing repositories, educating data scientists and understanding the role of policy and strategy.
One talk I was particularly interested in was “The Roles of Shared Data Collections in Neuroscience”, presented by a social scientist reporting the results of communications with neuroscientists. Ironically, the shared data collection was called the “NeuroAnatomical Cell Repository”, a pseudonym used to “protect the confidentiality of the participants”; so much for the “shared” component! The general conclusions re-iterated what is already known in the bio-sciences: more experiments are producing large volumes of heterogeneous data that need to be stored, preserved and presented in a manner that allows efficient use and re-use of the data. There was particular mention that neuroscience doesn't have any data reporting standards, a buzz-topic in the biological sciences.
As a result of this talk, the issue of how we publish this data was again raised, with a provocative statement from the floor: “we should bypass the traditional journals and publish the data ourselves” (a paraphrase, not an actual quote). This is an issue I have been hearing more and more at recent conferences and in general discussions, a topic that appears to be gathering momentum. Some discourse on these issues has already appeared within this blog.
The open panel session on day two generated some interesting discussion, and I heard a term I had never heard before, “Open Data”, put forward by Peter Murray-Rust (University of Cambridge). We have all heard of Open Access publishing (and should not be publishing any other way), but to date this means open access to the journal publication, not to the data that the publication refers to. For something as simple as a graph in a journal publication, the underlying numbers generally have to be reconstructed from a print-out with a ruler. It would be so much easier (and more logical) for re-use, analysis or even review if the presented image were accompanied by the data (even if only in an Excel spreadsheet).
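Publishing the numbers behind a figure need not be complicated; a plain CSV file alongside the image would do. A minimal sketch, using entirely made-up data points standing in for a published graph:

```python
import csv

# Hypothetical data points behind a published graph: time course of
# some measured value. These numbers are invented for illustration.
points = [(0, 1.0), (2, 1.8), (4, 3.1), (6, 5.9), (8, 11.2)]

# Alongside the figure, publish the raw values in a plain CSV file that
# any reader (or spreadsheet program) can open. No ruler required.
with open("figure1_data.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["time_h", "value"])
    writer.writerows(points)
```

Any reviewer could then re-plot or re-analyse the data directly, rather than measuring pixels off a print-out.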
So in summation, the conference presented numerous issues for consideration by a “data scientist” (this may well be the new name for bioinformaticians). The concept of digital data curation is something that is becoming more prevalent in the life-sciences both at the level of the bench scientist (generating metadata) and the analysis, presentation and preservation of the resulting data. No doubt conferences like the DCC will continue to grow in stature and the issues will be further presented in their newly launched International Journal of Digital Curation.
Today is my first day in my new job as an RA on the CARMEN project, which stands for Code, Analysis, Repository and Modelling for e-Neuroscience. My role in this exciting project is as a metadata researcher for experiment context, which should involve ontology development and data representation in neuroscience.
Just as a matter of interest I thought I would compare bioinformatics and neuroinformatics on Google Trends. Looks as if I have plenty of work to do!
I can't lay claim to discovering this all on my own, but as mentioned on Dan Swan's blog, PubMed can be added as a search engine in Firefox 2.0. I checked this out for HubMed too, a much nicer interface to PubMed with an RSS feed for search queries, and it works very well.
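The same searches can of course be scripted outside the browser via NCBI's E-utilities. A minimal sketch that just builds an ESearch query URL (the `db`, `term` and `retmax` parameters are part of the public E-utilities interface; the search term itself is only an example):

```python
from urllib.parse import urlencode

# Build an NCBI E-utilities ESearch query for PubMed.
# The base URL and parameter names come from the E-utilities API;
# the search term is an arbitrary example.
base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
params = {"db": "pubmed", "term": "semantic web bioinformatics", "retmax": 20}
query_url = base + "?" + urlencode(params)

print(query_url)
```

Fetching that URL returns an XML list of matching PubMed IDs, which is essentially what the Firefox search plugin and HubMed's RSS feeds are doing behind the scenes.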
"Called EXPO, it can be used to translate scientific experiments into a format that can be interpreted by a computer."
Wow! Translate experiments? That's impressive. I would love to find out how the scrawled handwriting contained inside the standard lab notebook, dog-eared and drenched in all manner of reagents, gets translated into an ontology; that's more impressive than the ontology itself.
On a more serious note, however, I agree with the concept that something like this should exist. It would definitely help in the dissemination and analysis of data (provided the data is freely available and in a standard structure or format). But this kind of process, an ontology for all of science, would require major open development across all scientific disciplines, as FUGO (Functional Genomics Investigation Ontology) is doing, in order to be agreed upon and to support the claim that it represents all scientific experiments.
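To give a rough sense of what "a computer-interpretable experiment" might look like, here is a generic, discipline-neutral record. To be clear, the class and field names below are my own invention for illustration, not EXPO's actual classes:

```python
from dataclasses import dataclass, field

# A hypothetical, discipline-neutral experiment description.
# None of these class or field names are taken from EXPO itself.
@dataclass
class Experiment:
    hypothesis: str
    domain: str                     # e.g. "particle physics", "evolutionary biology"
    method: str
    observations: list = field(default_factory=list)
    conclusion: str = ""

exp = Experiment(
    hypothesis="Gene X is up-regulated under heat stress",
    domain="functional genomics",
    method="two-colour microarray, control vs heat-shocked samples",
    observations=["fold change 4.2", "p < 0.01"],
    conclusion="hypothesis supported",
)
```

The hard part, as argued above, is not writing such a structure down but getting every scientific discipline to agree on one.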
Going by the article, and from snooping around the EXPO site, a cross-discipline development process doesn't appear to have taken place, or to be planned for the future (my apologies if this is not the case). According to the article it has been tested on two use-cases, one in particle physics and one in evolutionary biology; apparently the only wet-lab science that exists these days. It would be interesting to see whether the large-scale collaborative effort of FUGO agrees with, or already has, a similar structure to EXPO.
A quote at the bottom of the article says "Software to speed up this process could be a big boost ", I think he meant to say, "This would be impossible without software being available from the wet-lab bench, to data storage, to the publication process".
There is an interview in Wired magazine with Harold Varmus, the Nobel laureate and former director of the National Institutes of Health. The premise of the article is not his prize-winning exploits but rather his involvement in setting up PLoS (the Public Library of Science), an open access scientific publishing platform. Many comments and opinions exist on PLoS and its impact, but I think the most interesting section of the article is on the new PLoS journal due to be launched in the summer, called PLoS ONE.
"And this summer, Varmus and his colleagues will launch PLoS One, a paperless journal that will publish online any paper that evaluators deem “scientifically legitimate.” Each article will generate a thread for comment and review. Great papers will be recognized by the discussion they generate, and bad ones will fade away. "Our mission is to transform how science publishing is done,” Varmus says. “We aren’t trying to torpedo the industry. But we are definitely going to change it.” "
This effectively destroys the current peer review system and allows for continuous peer review of an article as more knowledge and opinions develop over the years. It will also permit comment by “experts” who would not have been involved in the traditional peer-review process.
There could also be a dark side to the comments, however: “my paper is better than your paper” exchanges, or biased comments where a competing interest is evident.
All in all, however, I think this is a massive step forward and a major shake-up of the scientific publication process. I hope there is a strong emphasis on providing all the available data in the appropriate standard formats for the domain; if that is the case, then I am all for it.