Archive for March, 2008

A data model for life-science experiments: FuGE

This post may be one in a series of responses to Cameron’s post on “Proposing a data model for Open Notebooks”. When I originally read his post, I commented that a data model for experiments already exists and that he might get some mileage out of it rather than starting from scratch and reinventing the wheel. Several discussions have followed on from this original post, and Neil has picked up on it as well, with sentiments that I agree with.

I think a large part of this discussion confuses and conflates three issues that I believe to be separate:

  1. the representation of experiments – the data model
  2. the presentation or level of abstraction to the user (probably somewhat dependent on 3)
  3. the implementation of the data model

With these three issues in mind, I will start by going back to the original post and responding to some of its points.

What I’m suggesting is a standard format to describe experiments;…

A “standard” in the true sense of the word (established by consensus and approved by a recognized body) already exists to describe life-science experiments. It is called FuGE (the Functional Genomics Experiment model), a data model represented in UML.

…..a default format for online notebooks. The object is to do a number of things. Firstly identify the page(s) as being an online laboratory notebook so that they can be aggregated or auto-processed as appropriate.

I see these as two separate things: the data model that represents experiments, and the presentation of that model to the user, in this case described as an online notebook. Page numbers are an arbitrary visual aid; they are not integral to modelling experiments.

…Secondly to make rich metadata available in a human readable and machine processable form making mashups and other things possible using tools such as Yahoo! Pipes, Dapper, and the growing range of other interesting tools, but not to impose any unnecessary limitations on what that metadata might look like. ..

I am not going to deal with metadata here, as the post will be long enough already. However, metadata (controlled vocabularies (CVs) and ontologies) have traditionally been used to add specificity or meaning to structured data. The choice of which metadata to use (or build) will depend on the application.

Another issue is the tables. My original thinking was that if we had a data model for tables then most of our problems would go away.

I am not sure I agree here. What is a table? I see it as a particular visual display mechanism that you have chosen to represent your results. The results can be modelled more accurately within the data model, for example: chemical has_measurement, and measurement has_numerical_value and has_unit. I believe this statement confuses the visual presentation of data with the structuring of the data.
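To make the distinction between structure and presentation concrete, here is a minimal Python sketch of the has_measurement / has_numerical_value / has_unit relations. The class names `Chemical` and `Measurement`, the helper `as_table_rows`, and the NaCl example are hypothetical and are not part of FuGE; the point is only that a table is one way of rendering the structured data, not the data itself.

```python
from dataclasses import dataclass, field


@dataclass
class Measurement:
    # measurement has_numerical_value and has_unit
    value: float
    unit: str


@dataclass
class Chemical:
    name: str
    # chemical has_measurement (zero or more measurements)
    measurements: list[Measurement] = field(default_factory=list)


def as_table_rows(chemicals: list[Chemical]) -> list[tuple[str, float, str]]:
    """One possible presentation: flatten the structured data into table rows."""
    return [(c.name, m.value, m.unit) for c in chemicals for m in c.measurements]


# Usage with a hypothetical example record.
nacl = Chemical("NaCl", [Measurement(0.15, "mol/L")])
rows = as_table_rows([nacl])
print(rows)  # [('NaCl', 0.15, 'mol/L')]
```

The same `Chemical` objects could just as easily be rendered as a list, a graph or an XML document; only `as_table_rows` knows anything about tables.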
However the argument against still stands. Anything that requires a fixed vocabulary is going to break

Well, anything that requires a fixed vocabulary is less flexible; breaking is something different. If it breaks while doing the job it was designed to do, then that is a problem. If it breaks when applied to a different application, then, well, it was not designed for that application in the first place. FuGE is designed to provide a generic structure that can then be described or further specialised by the user or application, either by extending the model itself or by using CVs/ontologies or free text. This provides the flexibility and, in theory, future-proofing.
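As an illustration of a generic structure specialised by annotation rather than by a fixed vocabulary, the following Python sketch attaches either ontology terms or free text to a generic material. The classes `OntologyTerm` and `GenericMaterial`, the accession `EX:0000001` and the source name `ExampleOntology` are all hypothetical placeholders, not FuGE classes or real ontology identifiers.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class OntologyTerm:
    # A term drawn from a CV or ontology (hypothetical structure).
    label: str
    accession: str
    source: str


# A characteristic is either an ontology term or free text.
Characteristic = Union[OntologyTerm, str]


@dataclass
class GenericMaterial:
    # The schema stays fixed; specialisation comes from the annotations.
    name: str
    characteristics: list[Characteristic] = field(default_factory=list)

    def ontology_terms(self) -> list[OntologyTerm]:
        """Return only the annotations that come from a CV/ontology."""
        return [c for c in self.characteristics if isinstance(c, OntologyTerm)]


# Usage: a new kind of material needs no change to the schema itself.
sample = GenericMaterial(
    "hippocampal slice",
    [OntologyTerm("hippocampus", "EX:0000001", "ExampleOntology"), "prepared on ice"],
)
print([t.label for t in sample.ontology_terms()])  # ['hippocampus']
```

Free text keeps the model usable when no suitable term exists, while ontology terms give machines something precise to query.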

Overall an experiment has inputs and outputs. These may be data or material objects. Procedures take inputs and generate outputs.[..] Broadly speaking there seem to be three types of item; material objects, data, and procedures (possibly also comments). For each of these we require a provenance (author), and a date

I would agree with your assessment of what classes are needed. This corresponds to what FuGE contains, as illustrated in the FuGE class diagram.
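A rough Python sketch of the three types of item described in the quote above: material objects and data as inputs and outputs, procedures linking them, and a provenance (author and date) on each. These class names are hypothetical simplifications chosen for illustration; they are not FuGE's actual classes, and the author name and file URI are made up.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Union


@dataclass
class Provenance:
    author: str
    created: date


@dataclass
class MaterialItem:
    name: str
    provenance: Provenance


@dataclass
class DataItem:
    uri: str
    provenance: Provenance


# Inputs and outputs may be either material objects or data.
Item = Union[MaterialItem, DataItem]


@dataclass
class Procedure:
    # A procedure takes inputs and generates outputs.
    description: str
    provenance: Provenance
    inputs: list[Item] = field(default_factory=list)
    outputs: list[Item] = field(default_factory=list)


def upstream(item: Item, procedures: list[Procedure]) -> list[Procedure]:
    """Return the procedures that produced the given item (compared by identity)."""
    return [p for p in procedures if any(o is item for o in p.outputs)]


# Usage with hypothetical values.
who = Provenance("A. Researcher", date(2008, 3, 1))
tissue = MaterialItem("tissue sample", who)
recording = DataItem("file:///recordings/run1.dat", who)
record = Procedure("extracellular recording", who, inputs=[tissue], outputs=[recording])
print([p.description for p in upstream(recording, [record])])  # ['extracellular recording']
```

Because every item carries a provenance and procedures point at their inputs and outputs, the lineage of any result can be traced back through the experiment.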

Summary

In summary, the position I want to present is that FuGE is a data model for representing scientific experiments. Several domains, from traditional biology and molecular biology to neurophysiology, are using it to represent their experiments. I believe FuGE could form the underlying model for a “notebook”, with an abstraction/presentation layer on top for the user. As to how it should be implemented (blog, wiki, database, LaTeX, XML, RDF, OWL), I am not going to speculate. However, a database implementation of the FuGE schema, called SyMBA, is already in development. It hides the model from the user behind simple web forms that are used to fill out the XML, which is then stored in a relational database.


Minimum Information about a Neuroscience Investigation (MINI)

The idea behind the CARMEN project is that we provide a system to store electrophysiology data and analysis services so that data can be shared and analysed in the “Neuro-cloud”. An important factor in realising this system is that the stored data and the services have to be described in a way that is amenable to both humans and computers. The first stage of this is agreeing what information should actually be ascribed to the data. In other words, we need to strike a balance between what experimentalists want to say about their data and what informaticians need to know about a particular data set in order to perform their analysis. To this end we have defined what we believe to be the minimum information that must be ascribed to an electrophysiology experiment for submission to the CARMEN system. It follows the now well-established format of the MIAME and MIAPE minimum reporting requirements. In the first instance the document only represents consensus within the CARMEN consortium. However, it could form the basis of a community reporting standard for electrophysiology experiments. The document is available on Nature Precedings at the following URL, and comments and opinions are encouraged: http://precedings.nature.com/documents/1720/version/1
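One way a minimum-information checklist can be applied at submission time is as a set of required fields that every record must fill in. The Python sketch below assumes that approach; the field names in `REQUIRED_FIELDS` are hypothetical placeholders and are not the actual contents of the MINI document.

```python
# Hypothetical field names for illustration only; NOT the actual MINI checklist.
REQUIRED_FIELDS = ("species", "preparation", "recording_type", "sampling_rate_hz")


def missing_fields(record: dict[str, object], required: tuple[str, ...] = REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent or empty in a submission record."""
    return [f for f in required if record.get(f) in (None, "")]


# Usage with a hypothetical, incomplete submission.
submission = {
    "species": "Rattus norvegicus",
    "preparation": "acute slice",
    "sampling_rate_hz": 25000,
}
print(missing_fields(submission))  # ['recording_type']
```

A system such as CARMEN could reject or flag records for which this list is non-empty, while leaving experimentalists free to add further, optional detail.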


CFP: Bio-Ontologies 2008: Knowledge in Biology

Call for Papers for Bio-Ontologies 2008. Submissions are now invited for Bio-Ontologies 2008: Knowledge in Biology, a SIG at Intelligent Systems for Molecular Biology (ISMB) 2008.

Key Dates to remember:

  • Submission due: Friday 2nd May
  • Notifications: Friday 23rd May
  • Final Version Due: Friday 30th May
  • Workshop: Sunday 20th July

Introduction

Bio-Ontologies has existed as a SIG at ISMB for more than a decade, making it one of the longest-running SIGs. Throughout this time, Bio-Ontologies has provided a forum for discussion of the latest, cutting-edge research on ontologies. Over this decade, the use of ontologies has matured, moving from niche to mainstream usage within bioinformatics. Following on from last year’s reflective look, this year we are broadening the scope of the SIG; we are interested in any formal or informal approach to organising, presenting and disseminating knowledge in biology.

So, for example:

  • Semantic and/or Scientific wikis.
  • Multimedia blogs
  • Folksonomies
  • Tag Clouds
  • Collaborative Curation Platforms
  • Collaborative Ontology Authoring and Peer-Review Mechanisms

are topics that will be of relevance to the SIG, in addition to the more traditional bio-ontology topics:

  • Biological Applications of Ontologies
  • Reports on Newly Developed or Existing Bio-Ontologies
  • Tools for Developing Ontologies
  • Use of Ontologies in Data Communication Standards
  • Use of Semantic Web technologies in Bioinformatics
  • Implications of Bio-Ontologies or the Semantic Web for drug discovery
  • Current Research In Ontology Languages and its implication for Bio-Ontologies

Please note that this year ISCB has adopted an innovative schedule, holding some of the SIGs DURING ISMB. Bio-Ontologies will run on the Sunday, in parallel with the main conference.

Submissions

Submissions are now open and can be made through EasyChair.

Instructions to Authors

We are inviting two types of submissions.

Short papers, up to 4 pages.
Poster abstracts, up to 1/2 page.

Following review, successful papers will be presented at the Bio-Ontologies SIG. Accepted poster abstracts will be given poster space, and time will be allocated during the day for at least one poster session. Unsuccessful papers will automatically be considered for poster presentation; there is no need to submit both on the same topic.

Organisers

  • Phillip Lord, Newcastle University
  • Susanna-Assunta Sansone, EBI
  • Nigam Shah, Stanford
  • Matt Cockerill, BioMedCentral

Programme Committee

The programme committee, in alphabetical order, is:

  • Mike Bada, University of Colorado
  • Judith Blake, Jackson Laboratory
  • Frank Gibson, Newcastle University
  • Cliff Joslyn, Pacific Northwest National Laboratory
  • Wacek Kusnierczyk, Norwegian University of Science and Technology
  • Robin MacEntire, GSK
  • Helen Parkinson, EBI
  • Daniel Rubin, Stanford University
  • Alan Ruttenberg, Science Commons
  • Robert Stevens, University of Manchester
  • and the conference organisers.

Templates

Submission templates are available from the Bio-Ontologies website.
