Archive for category open data
I described in an earlier post that data sharing in Neuroscience is relatively non-existent. Some commentary on the subject has appeared since then via the 2007 SfN Satellite Symposium on Data Sharing, entitled Value Added by Data Sharing: Long-Term Potentiation of Neuroscience Research, published in Neuroinformatics. I was also excited to see an article published last week, Data Sharing for Computational Neuroscience, also in Neuroinformatics. However, there is a caveat or two. Apart from ignoring all the data representation issues already addressed in other domains such as bioinformatics, the re-use of data models such as FuGE, or contribution to ontology efforts such as OBI, none of these articles are open access! How ironic, or should that be how embarrassing? Phil also covers this issue in his blog.
Oh well, it looks as if there is still a challenge in the domain of Neuroscience for access to valuable insights into information flow in the brain. Who wants to know how the brain works anyway? You can always pay $32 to Springer if you want to find out.
When finished, I would have liked it to be published somewhere like Nature Precedings, however they only accept proprietary Microsoft files and PDFs rather than XML documents. I also thought of creating a Google Code project for it, but that seems quite elaborate for something nobody else would be contributing to and that, once completed, would be rather static. Any suggestions are very welcome.
The full complement of recommended reporting guidelines from the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI), known as the Minimum Information About a Proteomics Experiment (MIAPE), is now available for community review on the Nature Biotechnology website.
The manuscripts range from the definition of the MIAPE concept to the individual guidelines themselves, which cover Mass Spectrometry, Mass Spectrometry Informatics, Gel Electrophoresis and Molecular Interaction experiments. A further paper on the PSI protein modification ontology (PSI-MOD) is also listed.
Several other minimum reporting requirements from other domains are also listed, such as those for genome sequences (MIGS) and for In Situ Hybridization and Immunohistochemistry experiments (MISFISHIE).
There is also a paper on the Functional Genomics Experiment Model (FuGE), which is an “extensible framework” or data model for standards in functional genomics, although it is equally extensible to most scientific experiments.
As this process of community review, hosted on the Nature Biotech website, is relatively new and open to anyone, whether identified or anonymous, I would encourage anybody with the relevant knowledge to comment on the papers. A greater response from the community ultimately means the guidelines will be more representative of the domain and technology they describe.
Open data is a concept I came across while attending the 2nd International Digital Curation Conference, and I felt it deserved a post in its own right rather than being subsumed by the conference report. I am an advocate of Open Access and feel that Open Data must be a part of this process. What is the point of being able to freely read the publication if you can't freely access the data the publication refers to?
The concept of Open Data was presented to me at the DCC conference by Peter Murray-Rust, who regularly blogs about the subject. There is a Wikipedia entry which defines the concept and a mailing list which promotes discussion of Open Data.
Steps are underway in the bio-science domain to define minimum reporting requirements for data in repositories and publications. Two prominent examples are the MGED community with MIAME and the proteomics community with MIAPE. Within these noble efforts there is no mention of Open Data, although it seems the next logical step in the data curation pipeline:
- record the defined minimum information and metadata.
- structure and present the data.
- allow access to the data.
Maybe when the effort is made to properly record, structure and describe the data, as these minimum reporting requirements advocate, scientists and journals will be only too happy to take the next step and declare it Open Data, for the sake of scientific knowledge and progress.
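The three steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names in `REQUIRED_FIELDS`, the `publish` helper and the default licence string are all hypothetical, and they are not the actual MIAME or MIAPE checklists.

```python
import json

# Hypothetical required fields, for illustration only.
# This is NOT the real MIAME or MIAPE checklist.
REQUIRED_FIELDS = ("experiment_title", "organism", "instrument", "protocol", "contact")


def missing_fields(record):
    """Return the required fields that are absent or blank in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def publish(record, licence="CC0-1.0"):
    """Check, structure and openly license a metadata record.

    Step 1: refuse records that lack the minimum information.
    Step 2: structure the record as JSON.
    Step 3: attach an explicit licence so readers know they may reuse it.
    """
    missing = missing_fields(record)
    if missing:
        raise ValueError("missing minimum information: " + ", ".join(missing))
    return json.dumps({"metadata": record, "licence": licence}, indent=2, sort_keys=True)
```

The point of the sketch is that once a record has passed the first two steps, declaring it open costs one extra field.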
The Digital Curation Centre (DCC) has just held its 2nd International Digital Curation Conference in Glasgow. The official DCC blog for the conference tracks the thoughts and discourse over the two days, and the full program can be found here. As the conference name suggests, the meeting has a particular focus on different aspects of the digital curation life-cycle, including managing repositories, educating data scientists and understanding the role of policy and strategy.
One particular talk I was interested in was “The Roles of Shared Data Collections in Neuroscience”. This was presented by a social scientist, as the results of communications with Neuroscientists. Ironically, the shared data collection was called the “NeuroAnatomical Cell Repository”, a pseudonym used to “protect the confidentiality of the participants”. So much for the “shared” component! The general conclusions re-iterated what is already known in the bio-sciences: that more experiments are producing large volumes of heterogeneous data that need to be stored, preserved and presented in a manner that allows the efficient use and re-use of the data. There was particular mention that Neuroscience doesn’t have any data reporting standards, a particular buzz-topic in the biological sciences.
As a result of this talk, the issue of how we publish this data was again raised, with the provocative statement from the floor, “we should bypass the traditional journals and publish the data ourselves” (a summation of the statement, not an actual quote). This is an issue I have been hearing about more and more at recent conferences and in general discussions, and a topic that appears to be gathering momentum. Some discourse has already been presented within this blog on some of these issues.
The open panel session on day two prompted some interesting discussion, and I heard a term which I had never heard before, “Open Data”, put forward by Peter Murray-Rust (University of Cambridge). We have all heard of Open Access publishing (and should not be publishing any other way), but to date this means open access to the journal publication and not to the data that the publication refers to. For something as simple as a graph in a journal publication, the underlying numbers generally have to be recovered from a print-out with a ruler. It would be so much easier (and more logical) for re-use, analysis or even review if the presented image was accompanied by the data (even if this was in an Excel spreadsheet).
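As a rough illustration of how little it takes to ship the numbers with the picture, here is a minimal Python sketch that writes the points behind a plotted series as CSV text and reads them back. The function names and the column labels in the usage note are my own assumptions, not part of any journal's submission process.

```python
import csv
import io


def series_to_csv(x_label, y_label, points):
    """Serialise the (x, y) pairs behind a plotted series as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([x_label, y_label])  # header row names the two axes
    writer.writerows(points)             # one row per plotted point
    return buffer.getvalue()


def csv_to_series(text):
    """Recover the axis labels and numeric points from the CSV text."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    points = [(float(x), float(y)) for x, y in reader]
    return header, points
```

Calling `series_to_csv("time_ms", "voltage_mV", points)` on the list of points used to draw a figure gives text that can be saved next to the image, so a reader gets the exact values rather than estimates taken from the axes.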
So in summation, the conference presented numerous issues for consideration by a “data scientist” (this may well be the new name for bioinformaticians). The concept of digital data curation is something that is becoming more prevalent in the life-sciences both at the level of the bench scientist (generating metadata) and the analysis, presentation and preservation of the resulting data. No doubt conferences like the DCC will continue to grow in stature and the issues will be further presented in their newly launched International Journal of Digital Curation.