Archive for category bioinformatics

CFP: Bio-Ontologies 2008: Knowledge in Biology

Call for Papers for Bio-Ontologies 2008. Submissions are now invited for Bio-Ontologies 2008: Knowledge in Biology, a SIG at Intelligent Systems for Molecular Biology 2008.

Key Dates to remember:

  • Submission due: Friday 2nd May
  • Notifications: Friday 23rd May
  • Final Version Due: Friday 30th May
  • Workshop: Sunday 20th July


Bio-Ontologies has existed as a SIG at ISMB for more than a decade, making it one of the longest running. Throughout this time, Bio-Ontologies has provided a forum for discussion of the latest and most cutting-edge research on ontologies. In this decade, the use of ontologies has matured, moving from niche to mainstream usage within bioinformatics. Following on from last year’s reflective look, this year we are broadening the scope of the SIG; we are interested in any formal or informal approach to organising, presenting and disseminating knowledge in biology.

So, for example:

  • Semantic and/or Scientific wikis.
  • Multimedia blogs
  • Folksonomies
  • Tag Clouds
  • Collaborative Curation Platforms
  • Collaborative Ontology Authoring and Peer-Review Mechanisms

are topics which will be of relevance to the SIG, in addition to the more traditional areas for bio-ontologies:

  • Biological Applications of Ontologies
  • Reports on Newly Developed or Existing Bio-Ontologies
  • Tools for Developing Ontologies
  • Use of Ontologies in Data Communication Standards
  • Use of Semantic Web technologies in Bioinformatics
  • Implications of Bio-Ontologies or the Semantic Web for drug discovery
  • Current Research In Ontology Languages and its implication for Bio-Ontologies

Please note that this year ISCB have put together an innovative schedule, holding some of the SIGs DURING ISMB. Bio-Ontologies is on the Sunday, in parallel with the main conference.


Submissions are now open and can be made through EasyChair.

Instructions to Authors

We are inviting two types of submissions.

  • Short papers, up to 4 pages.
  • Poster abstracts, up to 1/2 page.

Following review, successful papers will be presented at the Bio-Ontologies SIG. Authors of poster abstracts will be given poster space, and time will be allocated during the day for at least one poster session. Unsuccessful papers will automatically be considered for poster presentation; there is no need to submit both on the same topic.


Organisers

  • Phillip Lord, Newcastle University
  • Susanna-Assunta Sansone, EBI
  • Nigam Shah, Stanford
  • Matt Cockerill, BioMedCentral

Programme Committee

The programme committee, listed alphabetically, is:

  • Mike Bada, University of Colorado
  • Judith Blake, Jackson Laboratory
  • Frank Gibson, Newcastle University
  • Cliff Joslyn, Pacific Northwest National Laboratory
  • Wacek Kusnierczyk, Norwegian University of Science and Technology
  • Robin MacEntire, GSK
  • Helen Parkinson, EBI
  • Daniel Rubin, Stanford University
  • Alan Ruttenberg, Science Commons
  • Robert Stevens, University of Manchester
  • and the conference organisers.


Submission templates are available from the Bio-Ontologies website.



Zotero library re-visioned

I have wanted to use Zotero for my reference library for a while now, but could never work out how to back up the library using Subversion. My life is contained within Subversion, and I do not know how I could possibly have survived before it: all my work (code, presentations, papers, images, not to mention my thesis) is backed up, re-visioned and floating happily in the cloud, available to me from any machine. Zotero installs itself inside the Firefox profile, which makes it difficult to revision within the “C:\my-subversion” folder. What I decided to do was to create a new Firefox profile (instructions here) within the my-subversion folder and then install Zotero, creating:


I then added only the Zotero folder to my Subversion repository. You could always revision your whole Firefox profile, but I decided not to. Now, every time I add a new item to Zotero, the my-subversion folder indicates that there has been a change and requires a commit. Obviously, every time you add a PDF file to the library you will have to “SVN add” the file itself. This is not a problem for me, as I try to keep my library light and not store too many PDFs.
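The “SVN add” step for newly attached files could be scripted. What follows is only a minimal sketch, assuming the `svn` command-line client is installed and that the Zotero folder is already part of a working copy; the function names and the folder path passed in are hypothetical and not part of Zotero or Subversion. It relies on the fact that `svn status` marks items that are not yet under version control with a `?` in the first column.

```python
import subprocess
from pathlib import Path


def unversioned_paths(status_output: str) -> list[str]:
    """Return the paths that `svn status` flags with '?' (not under version control)."""
    paths = []
    for line in status_output.splitlines():
        # The first column of `svn status` holds the item state; '?' means unversioned.
        if line.startswith("?"):
            paths.append(line[1:].strip())
    return paths


def add_new_files(working_copy: Path) -> list[str]:
    """Run `svn status` on the folder and `svn add` anything unversioned.

    `working_copy` is a hypothetical path to the Zotero folder inside the
    Subversion working copy; nothing is committed here, only scheduled for addition.
    """
    result = subprocess.run(
        ["svn", "status", str(working_copy)],
        capture_output=True,
        text=True,
        check=True,
    )
    new_paths = unversioned_paths(result.stdout)
    if new_paths:
        subprocess.run(["svn", "add", *new_paths], check=True)
    return new_paths
```

Calling `add_new_files(Path("path/to/zotero"))` before a commit would schedule any newly attached PDFs for addition; the commit itself is left as a deliberate manual step.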

I am also going to try to use Zotero as an interface to my Subversion repository, describing and tagging the documents and code that I write, and more specifically presentations. So no more trying to work out what is contained in “Presentation1.ppt”, or what file name I gave to that talk on data standards which I have to give tomorrow!

I am tagging my hard drive via Zotero; it’s just one big cloud.


Sshh! Don’t tell anyone about Data Sharing for Computational Neuroscience

I described in an earlier post that data sharing in Neuroscience is relatively non-existent. Some commentary on the subject has appeared since then via the 2007 SfN Satellite Symposium on Data Sharing, entitled Value Added by Data Sharing: Long-Term Potentiation of Neuroscience Research, published in Neuroinformatics. I was also excited to see an article published last week, Data Sharing for Computational Neuroscience, also in Neuroinformatics. However, there is a caveat or two. Apart from ignoring all the data representation issues raised in other domains such as bioinformatics, the re-use of data models such as FuGE, and contribution to ontology efforts such as OBI, none of these articles is open access! How ironic, or should that be how embarrassing. Phil also covers this issue in his blog.

Oh well, it looks as if there is still a challenge in the domain of Neuroscience for access to valuable insights into information flow in the brain. Who wants to know how the brain works anyway? You can always pay $32 to Springer if you want to find out.


Standard Open(ed-up) Science

OK, so it is not quite up-to-the-minute, as-you-do-it-you-publish-it open science. However, I plan to make the data that I generated during my PhD (just finished) open and available, and in writing this post I am making somewhat of a public commitment to do so. One difference, though, from some of the open science efforts that I have seen so far is that I will be publishing my data in conformance with the Proteomics Standards Initiative (PSI) MIAPE guidelines for gel electrophoresis (MIAPE GE.pdf), the recommended reporting requirements. The data itself will be represented in XML using the PSI-recommended Gel electrophoresis Markup Language (GelML), and using terminology from sepCV and OBI should mean the data set is computationally amenable. I was involved in the development of these specifications, so I suppose I should lead by example and be the first to publish a complete gel electrophoresis proteomics dataset.

When finished, I would have liked it to be published somewhere like Nature Precedings; however, they only accept proprietary Microsoft files and PDFs rather than XML documents. I also thought of creating a Google Code project for it, but that seems quite elaborate for something nobody else would be contributing to and which, once completed, would be rather static. Any suggestions are very welcome.



In catching up with my reading lists of 2007 I was alerted to gopubmed via Deepak’s post. Gopubmed describes itself as an ontology-based literature search making use of both the Gene Ontology and MeSH terms. There is also the ability to provide feedback, or rather to act as a curator for the search results. I have already noticed some mismatches in author details. In general, though, the interface is a vast improvement on PubMed’s tiered interface, and the ability to refine searches looks interesting. I have added gopubmed to my search engines within Firefox and will have a play to see if it is any good. RSS feeds on search terms would be top of the wish list. Anybody used it in anger?


e-Science blog

A new blog, entitled e-science ramblings, has appeared over the last month. The blog is written by Hugo Hiden, the technical director of the North East Regional e-Science Centre, which is based at Newcastle University.

As described in his first post:

The reason for this blog is, primarily, to document my experiences with writing a prototype e-Science research platform using Microsoft tools instead of the more traditional approach of fighting with Open Source. This way is easier, supposedly. The task I have set myself is to recreate, at a basic level, the software being developed by the CARMEN project.

I think this should be an interesting read, both on the technical aspects and on the usability of Microsoft products compared with open source software for e-science.


I see science

It is interesting to see new developments in the dissemination of scientific discourse, such as scientific blogging and paradigms such as open science, come on-line with developments in web-based social media. The latest medium to receive the Science 2.0 treatment (poor pun on applying Web 2.0 technologies for science) is the video.

YouTube is probably the granddaddy, or at least the most prominent, of the video upload and broadcast services. Although YouTube does not have a defined science category, it is easy to find science-related videos and lectures mixed in with the general population. However, several specialist sites have appeared dealing specifically with science research, and all have been labelled the “YouTube for science”. The most recent site is SciVee, which is a collaboration between the Public Library of Science (PLoS), the National Science Foundation (NSF) and the San Diego Supercomputer Center (SDSC). The fact that a publishing house (PLoS) has got involved in this effort is encouraging, and maybe an admission that a paper, in isolation from public commentary, the data used to produce the paper, and a sensible presentation mechanism, is no longer sufficient in the web-based publishing era (or maybe I have got over-excited and read too much into it). The most interesting feature of SciVee, and probably the most powerful compared with some of the other broadcasters, is the ability to link a video to an Open Access publication, setting the context and relevance of the video. The flip side could also be true, where a video provides evidence of the experiment, such as the methods or a display of the results.

Another scientific video broadcaster is the Journal of Visualized Experiments (JOVE). As the name suggests, JOVE focuses on capturing the experiment as performed within the laboratory, rather than a presentation or general scientific discourse. As a result, JOVE can be thought of as a visual protocol or methods journal. It is stylised as a traditional journal and is already on Issue 6, which focuses on Neuroscience.

If JOVE is a visual journal for life-science experiments, then Bioscreencast could be thought of as a visual journal of Bioinformatics. Bioscreencast focusses on screencasts of software, providing a visual “How-to” on scientific software, presentations and demonstrations.

No doubt these three will not be the last scientific video publishers, but they have an opportunity to become well established ahead of the others. Now, where is my webcam? I need to video myself writing code and submit it to JOVE, then produce a demo and submit it to Bioscreencast. Then I have to write a paper on it, submit the pre-print to Nature Precedings, get it published in an Open Access journal, video myself giving a presentation on the paper, submit that to SciVee and link them all together.


RSS readers

I outlined in an earlier post my growing tendency to hand applications over to the “Internet cloud”.

I prefer using web-based applications because I tend to jump between several machines throughout the day at work and then use a different machine at home. Having applications floating in the ether cloud makes moving around considerably easier. I have been using bloglines for quite a while now for my RSS feeds. I did have an early look at Google reader when it first launched, but I felt then that it was not quite what I wanted and definitely not as good as bloglines at the time. However, re-visiting Google reader over the last week or so has dramatically changed my perception. It has been re-vamped with a new interface (similar to bloglines), which has made reading posts a lot easier. All the posts from your subscribed feeds are saved by default and don’t disappear once read, unlike in bloglines (unless you check the “keep new” box). I think the biggest feature for me is the ability to tag posts; combined with the saved posts facility, this should prove to be a very handy source of reference rather than just a “recent-post viewer”.

A new feature that has just been added to Google reader is the offline mode. Working in conjunction with Google Gears, this provides the ability to read the 2000 most recent items offline, a feature I am looking forward to testing during the flight to ISMB in a few weeks.

If you use another RSS reader or have an opinion on Google reader then let me know.

With using gmail and calendar, trying out Google reader (and probably switching from bloglines), and using google docs and spreadsheets more every day, there is every danger that my cloud is going to be raining google. With the added prospect of Google presentations round the corner, will it be long before I am floating off to the Google OS cloud?


Do scientists really believe in open science?

I am writing this post as a collection of the current status of, and opinions on, “Open Science”. The main reason is that I have a new audience: I am working for the CARMEN e-Neuroscience project. This has exposed me, first hand, to a domain of the life-sciences in which data sharing and publicly exposing methodologies have not been readily adopted, largely, it is claimed, due to the size of the data in question and sensitive privacy issues.

Ascoli, 2006 also endorses this view of neuroscience and offers some further reasons why this is the case. He also includes the example of exposing neuronal morphological data, arguing the benefits of sharing this type of data and countering the reticence to do so.

Hopefully, as the motivation for the CARMEN project is to store, share and facilitate the analysis of neuronal activity data, some of these issues can be overcome.

With this in mind I want to create this post to provide a collection of specific blogs, journal articles, relevant links and opinions which hopefully will be a jumping-off point to understanding the concept of Open Science and embracing the future methodologies in pushing the boundaries of scientific knowledge.

What is Open Science?

There is no hard and fast definition, although according to the Wikipedia entry:

“Open Science is a general term representing the application of various Open approaches (Open Source, Open Access, Open Data) to scientific endeavour. It can be partially represented by the Mertonian view of Science but more recently there are nuances of the Gift economy as in Open source culture applied to science. The term is in intermittent and somewhat variable use.”

“Open Science” encompasses the ideals of transparent working practices across all of the life-science domains, to share and further scientific knowledge. It can also be thought of as including complete and persistent access to the original data from which knowledge and conclusions have been extracted, from the initial observations recorded in a lab-book to the peer-reviewed conclusions of a journal article.

The most comprehensive overview is presented by Bill Hooker over at 3quarks daily. He has written three sections under the title “The Future of Science is open”:

  1. Open Access
  2. Open Science
  3. An Open Science World

In part 1, as the title suggests, Bill presents an overview of open access publishing and how this can lead to open science (part 2). He suggests that:

“For what I am calling Open Science to work, there are (I think) at least two further requirements: open standards, and open licensing.”

I don’t want to repeat the content already contained in these reviews, although I agree with Bill’s statement here. There is no point in having an open science philosophy if the data in question is not described or structured in a form that facilitates exchange, dissemination and evaluation of the data, hence the requirement of standards.

I am unaware of any community-endorsed standard reporting formats within Neuroscience. However, the proliferation of standards in Biology and Bioinformatics is such that it is fast becoming a niche domain in its own right. So much so that there now exists a registry for Minimum Information reporting guidelines, following the format of MIAME and MIAPE. This registry is called MIBBI (Minimum Information for Biological and Biomedical Investigations) and aims to act as a “one-stop-shop” for existing life-science standards. MIBBI also provides a foundry where best practice for standards design can be fostered, and where disparate domains can integrate and agree on common representations of reporting guidelines for common technologies.

Complementary to standard data structures and minimum reporting requirements is the terminology used to describe the data: the metadata. Efforts are under way to standardise the terminology which describes experiments, which is essential in an open environment, or simply in a collaboration. This is the goal of the Ontology for Biomedical Investigations (OBI) project, which is developing “an integrated ontology for the description of biological and medical experiments and investigations. This includes a set of ‘universal’ terms, that are applicable across various biological and technological domains, and domain-specific terms relevant only to a given domain”. Already OBI is gaining momentum and currently supports diverse communities, from Crop science to Neuroscience.

Open licensing of data may address the common arguments I hear for not releasing data: that “somebody might use it”, or the point-blank refusal of “not until I publish my paper”. This is an unfortunate side effect of the “publish or perish” system, as commented on by bbgm and by Seringhaus and Gerstein, 2007, and really comes down to due credit. In most cases it prevents real-time assessment of research, complementary analysis or cross-comparison with other data sets from occurring alongside the generation of the data, which would no doubt reinforce the validity of the research. Assigning computationally amenable licenses to data, such as those proposed by Science Commons, may be one way of ensuring that re-use of the data is always credited to the laboratory that generated it. It is possible that “Data accreditation impact factors” could exist, analogous to the impact factors of traditional peer-reviewed journals.

Open science may not just be about releasing the data associated with a peer-reviewed journal article; rather, it starts from exposing the daily recordings and observations of an investigation, contained in the lab-book. One aspect of the “Open data” movement is “Open Notebook Science”, a movement pioneered by Jean-Claude Bradley and the Useful Chemistry group, whose lab-book is open and accessible on-line. This open notebook method was further discussed in a recent Nature editorial outlining the benefits of the approach. Exposing your lab-book could allow you to link to it from the materials and methods section of your publication, proving you actually did the work and making it more likely that other researchers will be able to repeat your ground-breaking experiments.

Already many funders are considering data management or data sharing policies to be applied to future research proposals. The BBSRC have recently released their data sharing policy, which states that “all research proposals submitted to BBSRC from 26th April 2007 must now include a statement on data sharing. This should include concise plans for data management and sharing or provide explicit reasons why data sharing is not possible or appropriate”. With these types of policies becoming a requirement of research funding, the “future of science is open”.

The “Open Science” philosophy appears to be gaining some momentum and is actively being discussed within the scientific blogosphere. This should not really come as a great surprise, as science blogging can be seen as part of the “Open science” movement, openly sharing opinions and discourse. Some of the more prominent science blogs focusing on the open science ideal are Open Access News, Michael Eisen’s Open Science Blog, Research Remix, Science Commons and Peter Murray-Rust.

There are of course a lot more blogs discussing the issue. Performing an “open science search” on Postgenomic (RSS feed on search terms please, Postgenomic) produces an up-to-the-minute list of the open science discourse. Although it is early days, maybe even the “open science” group on Scintilla (I am still undecided on Scintilla) will be the place in the future for fostering the open science community.

According to Bowker’s description of the traditional model of scientific publishing, the journal article “forms the archive of scientific knowledge” and therefore there has been no need to hold on to the data after it has been “transformed” into a paper. This, combined with the ingrained social habit, born of “publish or perish”, of not letting anybody see the experimental data until the peer-reviewed publication is out, will cripple the open science movement and slow down knowledge discovery. Computationally amenable licences may go some way to solving this. But raising awareness, together with a clear statement from the major journal publishers that exposing real-time science and publishing data will not prevent publication in a peer-reviewed journal, can only help.

In synopsis I will quote Bill again, as I think he presents a summary better than I could:

My working hypothesis is that open, collaborative models should out-produce the current standard model of research, which involves a great deal of inefficiency in the form of secrecy and mistrust. Open science barely exists at the moment — infancy would be an overly optimistic term for its developmental state. Right now, one of the most important things open science advocates can do is find and support each other (and remember, openness is inclusive of a range of practices — there’s no purity test; we share a hypothesis not an ideology).


Life-science data standards

The full complement of the Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI) Minimum Information About a Proteomics Experiment (MIAPE) recommended reporting guidelines is now available for community review on the Nature Biotechnology website.

The manuscripts range from the definition of the MIAPE concept to the individual guidelines themselves, which cover Mass Spectrometry, Mass Spectrometry Informatics, Gel Electrophoresis and Molecular Interaction experiments. A further paper on the PSI protein modification ontology (PSI-MOD) is also listed.

Several other minimum reporting requirements are also listed from other domains, such as genome sequences (MIGS) and In Situ Hybridization and Immunohistochemistry (MISFISHIE).

There is also a paper on the Functional Genomics Experiment Model (FuGE), which is an “extensible framework” or data model for standards in functional genomics, although equally extensible to most scientific experiments.

As this process of community review, hosted on the Nature Biotech website, is relatively new and open to anyone, whether identified or anonymous, I would encourage anybody with the relevant knowledge to comment on the papers. A greater response from the community ultimately means the guidelines are more representative of the domain and technology they describe.
