Do scientists really believe in open science?

I am writing this post to collect the current status of, and opinions on, “Open Science”. The main reason is that I have a new audience: I am now working for the CARMEN e-Neuroscience project. This has exposed me, first hand, to a domain of the life sciences in which data sharing and publicly exposing methodologies have not been readily adopted, largely, it is claimed, because of the size of the data in question and sensitive privacy issues.

Ascoli (2006) also endorses this view of neuroscience and offers some further reasons why this is the case. He uses the example of exposing neuronal morphological data to argue the benefits of sharing and to counter the reticence towards sharing this type of data.

Hopefully, as the motivation for the CARMEN project is to store, share and facilitate the analysis of neuronal activity data, some of these issues can be overcome.

With this in mind, I want this post to provide a collection of specific blogs, journal articles, relevant links and opinions, which will hopefully be a jumping-off point for understanding the concept of Open Science and embracing the future methodologies for pushing the boundaries of scientific knowledge.

What is Open Science?

There is no hard and fast definition, although according to the Wikipedia entry:

“Open Science is a general term representing the application of various Open approaches (Open Source, Open Access, Open Data) to scientific endeavour. It can be partially represented by the Mertonian view of Science but more recently there are nuances of the Gift economy as in Open source culture applied to science. The term is in intermittent and somewhat variable use.”

“Open Science” encompasses the ideals of transparent working practices across all of the life-science domains, in order to share and further scientific knowledge. It can also be thought of as including complete and persistent access to the original data from which knowledge and conclusions have been extracted, from the initial observations recorded in a lab-book to the peer-reviewed conclusions of a journal article.

The most comprehensive overview is presented by Bill Hooker over at 3 Quarks Daily. He has written three sections under the title “The Future of Science is open”.

In part 1, as the title suggests, Bill presents an overview of open access publishing and how this can lead to open science (part 2). He suggests that:

“For what I am calling Open Science to work, there are (I think) at least two further requirements: open standards, and open licensing.”

I don’t want to repeat the content already contained in these reviews, although I agree with Bill’s statement here. There is no point in having an open science philosophy if the data in question is not described or structured in a form that facilitates exchange, dissemination and evaluation of the data, hence the requirement of standards.

I am unaware of any community-endorsed standard reporting formats within neuroscience. However, standards have proliferated in biology and bioinformatics to such an extent that standards development is fast becoming a niche domain in its own right. So much so that there now exists a registry for Minimum Information reporting guidelines, following the examples of MIAME and MIAPE. This registry is called MIBBI (Minimum Information for Biological and Biomedical Investigations) and aims to act as a “one-stop-shop” for existing life-science standards. MIBBI also provides a foundry where best practice for standards design can be fostered, and where disparate domains can integrate and agree on common representations of reporting guidelines for common technologies.

Complementary to standard data structures and minimum reporting requirements is the terminology used to describe the data: the metadata. Efforts are under way to standardise the terminology that describes experiments, which is essential in an open environment, or simply in a collaboration. This is the goal of the Ontology for Biomedical Investigations (OBI) project, which is developing “an integrated ontology for the description of biological and medical experiments and investigations. This includes a set of ‘universal’ terms, that are applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain”. OBI is already gaining momentum and currently supports diverse communities, from crop science to neuroscience.

Open licensing of data may address the common arguments I hear for not releasing data: that “somebody might use it”, or the point-blank refusal of “not until I publish my paper”. This is an unfortunate side effect of the “publish or perish” system, as commented on at bbgm and by Seringhaus and Gerstein (2007), and really comes down to due credit. In most cases it prevents real-time assessment of research, complementary analysis or cross-comparison with other data sets from occurring alongside the generation of the data, all of which would no doubt reinforce the validity of the research. Assigning computationally amenable licences to data, such as those proposed by Science Commons, may be one way of ensuring that re-use of the data is always credited to the laboratory that generated it. It is possible that “data accreditation impact factors” could exist, analogous to the impact factors of traditional peer-reviewed journals.

Open science may not just be about releasing the data associated with a peer-reviewed journal article; rather, it starts with exposing the daily recordings and observations of an investigation, as contained in the lab-book. One aspect of the “Open Data” movement is “Open Notebook Science”, a movement pioneered by Jean-Claude Bradley and the Useful Chemistry group, whose lab-book is open and accessible online. The open notebook method was further discussed in a recent Nature editorial outlining the benefits of this approach. Exposing your lab-book could allow you to link to it from the materials and methods section of your publication, proving you actually did the work and making it more feasible for other researchers to repeat your groundbreaking experiments.

Many funders are already considering data management or data sharing policies to be applied to future research proposals. The BBSRC has recently released its data sharing policy, which states that “all research proposals submitted to BBSRC from 26th April 2007 must now include a statement on data sharing. This should include concise plans for data management and sharing or provide explicit reasons why data sharing is not possible or appropriate”. With these types of policies becoming a requirement of research funding, the “future of science is open”.

The “Open Science” philosophy appears to be gaining some momentum and is actively being discussed within the scientific blogosphere. This should not really come as a great surprise, as science blogging can be seen as part of the “Open Science” movement: openly sharing opinions and discourse. Some of the more prominent science blogs focusing on the open science ideal are Open Access News, Michael Eisen’s Open Science Blog, Research Remix, Science Commons and Peter Murray-Rust’s blog.

There are of course a lot more blogs discussing the issue. Performing an “open science” search on Postgenomic (RSS feed on search terms please, Postgenomic) produces an up-to-the-minute list of the open science discourse. Although it is early days, maybe even the “open science” group on Scintilla (I am still undecided on Scintilla) will in future be the place for fostering the open science community.

According to Bowker’s description of the traditional model of scientific publishing, the journal article “forms the archive of scientific knowledge”, and therefore there has been no need to hold on to the data after it has been “transformed” into a paper. This, combined with the ingrained social fear, a result of “publish or perish”, of letting somebody see the experimental data before the peer-reviewed publication appears, will cripple the open science movement and slow down knowledge discovery. Computationally amenable licences may go some way towards solving this. But raising awareness, together with a clear memorandum from the major journal publishers that exposing real-time science and publishing data will not prevent publication in a peer-reviewed journal, can only help.

In summary, I will quote Bill again, as I think he presents a summary better than I could:

“My working hypothesis is that open, collaborative models should out-produce the current standard model of research, which involves a great deal of inefficiency in the form of secrecy and mistrust. Open science barely exists at the moment — infancy would be an overly optimistic term for its developmental state. Right now, one of the most important things open science advocates can do is find and support each other (and remember, openness is inclusive of a range of practices — there’s no purity test; we share a hypothesis not an ideology). “

This entry was posted on June 26, 2007, 9:39 am and is filed under bioinformatics, CARMEN, data standards, Journals Publishing, MIBBI, neuroinformatics, ontology, open data, open science, Social Media.

#1 by Jean-Claude Bradley on June 26, 2007 - 8:09 pm

Nice review of Open Science and good luck with CARMEN!

#2 by James Henderson on June 27, 2007 - 12:40 pm

Frank, your review of Open Science is of great interest to me and raises many issues that need to be addressed, sooner rather than later.

I think that there is a great deal of paranoia and mistrust in the world of ‘science’, some of which is justified. In fact, a colleague of mine was recently omitted from the author list of a paper despite having contributed significantly to the work. To add insult to injury, my colleague found that he had been mentioned in the acknowledgements, confirming that this wasn’t a case of ‘I forgot to put you on’, but something altogether more sinister. When my colleague challenged the lead author (his one-time-mentor) he was told that a decision had been made that having 4 names rather than 5 in the author list would be better for the paper (it’s a good job this chap hasn’t been involved in any genome mapping projects!). When I suggested to my colleague that he should contact the journal directly his response was ‘what’s the point, it has already been published’ and he’s right. I can only speak for myself, but I think situations of this nature would be eradicated if an ‘open’ approach was to be universally adopted.

I am in the process of publishing some interesting data myself, yet I find the turnaround time between submission and publication frustratingly lengthy. My data is already out of date before it’s even published! Open science would allow me to present my findings immediately and, at the same time, would offer an arena in which data can be discussed, challenged and, most importantly, shared.

The only way in which ‘Open Science’ can become a reality would be for the entire community to embrace it wholeheartedly; I won’t be holding my breath.

Keep up the good work.

#3 by Bill on June 27, 2007 - 7:13 pm

Frank: it’s always good to meet another Open Science advocate! This sort of conversation is exactly what I was hoping the 3Quarks columns would stimulate.

James: I think your colleague was wrong, and you were right. He would have done better to contact the journal in question and ask that an addendum be published, adding him as an author. As long as we continue to shrug and say “what’s the point”, parasites like the lead author in question will continue to ride free on our backs.

In re: your own data, perhaps you could consider using Nature Precedings as a way of both establishing “ownership” and rapidly disseminating the information.

#4 by Jean-Claude Bradley on June 30, 2007 - 10:42 am

Bill is right – tools like Nature Precedings should be very effective in clarifying everyone’s contribution.

#5 by Bertalan Meskó on July 5, 2007 - 8:36 am

Great review! Kudos to you!

#6 by Antryg on January 3, 2008 - 2:57 pm

So, what happens when Open Science competes against Cathedral/Closed Science so much that it /threatens/ Cathedral/Closed Science’s Authority?

You think the violence committed by Microsoft against other companies is significant ( & what they did to Stac Tech e.g. was violence, the judge made that clear )?

Science-*establishment* involves governmental bodies, national budgets, military-backing, etc. . .

I find it naive to think that it’ll be simply unfolding pervasiveness for OS…

Will OS subsume all in the end? Probably: nature kills-off closed “evolution” very consistently. . .

Knowing that science is doing what gnu/linux did, though, at least one knows it takes years, it’s gonna have casualties, and it’s going to get “legal” in the nature of its conflict ( though I don’t yet know /how/ it’s going to do that ), and there are going to be at least 2 significant phases: up-hill & flowing-everywhere.

Yes, grant-giving bodies want OS, but the journals are not legal in public libraries, anymore: only paying/licensed subscribers may discover current knowledge through the electronic publications ( PLoS excluded, o’course ).

How is /that/ fight going to play-out? ( & it’s only one of many )