Sshh! Don't tell anyone about Data Sharing for Computational Neuroscience

I described in an earlier post how data sharing in neuroscience is relatively non-existent. Some commentary on the subject has appeared since then via the 2007 SfN Satellite Symposium on Data Sharing, entitled "Value Added by Data Sharing: Long-Term Potentiation of Neuroscience Research", published in Neuroinformatics. I was also excited to see an article published last week, "Data Sharing for Computational Neuroscience", also in Neuroinformatics. However, there is a caveat or two. Apart from ignoring the data representation issues already raised in other domains such as bioinformatics, the re-use of data models such as FuGE, and contributions to ontology efforts such as OBI, none of these articles is open access! How ironic, or should that be how embarrassing. Phil also covers this issue in his blog.

Oh well, it looks as if there is still a challenge in the domain of neuroscience for access to valuable insights into information flow in the brain. Who wants to know how the brain works anyway? You can always pay $32 to Springer if you want to find out.


Standard Open(ed-up) Science

OK, so it is not quite up-to-the-minute, as-you-do-it-you-publish-it open science. However, I plan to make the data I generated during my PhD (just finished) open and available, and in writing this post I am making something of a public commitment to do so. One difference, though, from some of the open science efforts that I have seen so far, is that I will be publishing my data in conformance with the recommended reporting requirements of the Proteomics Standards Initiative (PSI) MIAPE guidelines for gel electrophoresis (MIAPE GE.pdf). The data itself will be represented in XML using the PSI-recommended Gel electrophoresis Markup Language (GelML), and using terminology from sepCV and OBI should make the data set computationally amenable. I was involved in the development of these specifications, so I suppose I should be leading by example and be the first to publish a complete gel electrophoresis proteomics dataset.

When it is finished, I would have liked to publish it somewhere like Nature Precedings; however, they only accept proprietary Microsoft files and PDFs rather than XML documents. I also thought of creating a Google Code project for it, but that seems quite elaborate for something nobody else would be contributing to and which, once completed, would be rather static. Any suggestions are very welcome.


First Wikipedia edit

I have just made my first edit in Wikipedia. The page in question describes the now proliferating set of life science reporting recommendations and can be found here. I added the details for the MIAPE GE recommendations and intend to add the neuroscience recommendations that I have been working on shortly.


goPubmed

In catching up with my reading lists of 2007 I was alerted to GoPubMed via Deepak's post. GoPubMed describes itself as an ontology-based literature search, making use of both the Gene Ontology and MeSH terms. There is also the ability to provide feedback, or rather to act as a curator for the search results. I have already noticed some mismatches in author details. In general, though, the interface is a vast improvement on PubMed's tiered interface, and the ability to refine searches looks interesting. I have added GoPubMed to my search engines within Firefox and will have a play to see if it is any good. RSS feeds on search terms would be top of my wish list. Has anybody used it in anger?


Updating blog roll

In an effort to catch up with the blog posts from 2007 before starting the ones for 2008, I have just gone through my Google Reader subscriptions. As a result I thought I would update my blogroll with the blogs I have enjoyed reading over the last year. The list is over on the right; the new additions are: bio::blogs, the monthly bioinformatics blog carnival; Bioinformatics Zen from Michael Barton; the CARMEN project blog (which I contribute to); Hugo Hiden, the technical director of the e-science institute here in Newcastle; Public Rambling by Pedro; Savas Parastatidis' personal blog; and What you're doing is rather desperate.


FuGE users workshop


I am at the FuGE users workshop at the moment, which is being held in Manchester. The idea behind the workshop is that people who have used FuGE, or extended FuGE for a specific use, get together and share their experiences. I have used FuGE to develop GelML for the Proteomics Standards Initiative. GelML is one of the specifications produced by the Gel working group of the PSI. It is a data model to capture the use of gel electrophoresis in a proteomics experiment, but in theory it should be able to account for the use of gel electrophoresis independent of the application or domain. I presented GelML on the first day, along with some discussion points I would like to see addressed over the two days. We have a small but varied set of users at the workshop, covering the domains of flow cytometry, RNAi, proteomics and systems biology, along with tool developers (SyMBA).

As a result of the workshop we hope to produce a set of recommendations or best practice guidelines to promote uptake and consistency in the use of FuGE, which is an honourable goal. In this regard, one interesting discussion concerned providing a library or a set of design templates for common constructs, such as instrument settings or the protocol for creating buffers and solutions.

The workshop has been very productive so far, and it has been very interesting to see how other domains have extended FuGE.


Scooped!!

I have been scooped on my own work! Deepak has managed to beat me to blogging about my own screencast on the CARMEN project! Seriously though, thanks to Deepak for mentioning the project. I will have to blog harder and faster. As he highlights, there are some challenges that the project hopes to address. Apart from the sheer demand for disk space, one of the major challenges is to provide metadata describing not only what an item of data on the system is, but also how it was generated in the lab. In addition, we need a provenance trail describing what has been done to the data, such as the types of analysis applied, down to the level of who authored the code, which version it was, where it was run and so on.

For the lab generation metadata we are trying to bring what we have learnt within the biology/bioinformatics community and take it a step further. We are currently assessing whether the FuGE data model can be applied to electrophysiology. This may be achieved by creating an appropriate CARMEN ontology, which we intend to align with OBI. It would be interesting to know whether this seems like a sensible approach, and of any alternatives we could employ.

Some introductory slides about the project and a demo of the current functionality of the CARMEN system can be viewed in a screencast hosted on Bioscreencast, a site I like and one I am beginning to use more often.

I can't seem to get it to embed in WordPress, so here is the link.


Latest bio::blogs

The 16th edition of Bio::blogs has been released on Freelancing Science. I enjoy reading Bio::blogs for two reasons:

  1. If I know it all already, then I have been doing a relatively good job of keeping up to date with what has been talked about, and I can convince myself that my knowledge is on the cutting edge of science (this is rarely the case though).
  2. I always find a new blog or a new post that is interesting.

This month I noticed a screencast by Konrad on open science and transparency, which I thought I would download to watch on my travels to San Diego. When I got there, however, I realized I had already seen it via Nature Precedings. Oh well, I must be on the cutting edge 🙂


CARMEN at SfN – Demonstration advert

Several members of the CARMEN project, including myself, will be travelling to the Society for Neuroscience Annual Meeting next week in San Diego. We will be presenting the current status and future plans of the project on Monday 5 November from 1.30pm – 4.30pm in the Exhibition Hall at the INCF (International Neuroinformatics Coordinating Facility) Booth (4924).

Come along and say hello if you are about. I hope to put the slides and the demo up as a screencast on Bioscreencast sometime next week.


Ontology crowdsourcing

I have the unenviable task of developing an ontology for the CARMEN project which will allow the process of electrophysiology experiments, the generated data, the analysis of the data and the services that perform the analysis to be described, and in addition to be computationally amenable. Collecting the words required to describe these tasks is relatively trivial. However, getting the scientists to realise they have assigned numerous meanings to the same word or term requires a little more patience on my part.

It also requires me to educate the scientists, in that building an ontology for electrophysiology is a little more complicated than putting some “words” in a text file.

The words in an ontology have to be explicitly defined so as to be completely unambiguous, both to the scientists who generate the data and to the informaticians who want to analyse it, either immediately or several years down the line. The data should be described to an agreed level of detail such that the informatician no longer needs to pick up the phone and politely ask "how did you generate this piece of data?".

The first point I am trying to get across to the scientists is that although they use the same "words", they often use them to describe different things in different contexts. This is generally less important in a journal publication, but it presents issues when the words are used to annotate data and infer knowledge.

I have been trying to work out the best way to get this message across and to develop a methodology for collecting agreed definitions for words. I could always have put up a wiki or an issue tracker to do this, but that doesn't guarantee contribution. I feel the process needs to be mediated to turn the natural language definitions into more explicit, normalised ontological definitions. Taking this into account, I have decided to apply crowdsourcing to ontology development.

Put simply, this means sending out an email entitled "Metadata term of the week". This process was suggested to me by my boss, Phil Lord. In this email I pick a word and attempt to define it. If I get it right, there is no need to respond. If you disagree with the definition, you have to respond with an alternative; a discussion then ensues and ends with an agreed definition. With this process the scientists get to see that other scientists within the project define or describe words differently enough that they are no longer talking about the same thing.

The first Metadata term of the week was "spike sorting", and we received the following definitions:

  1. Spike sorting is a process of assigning data spikes to sets, where each set is identified with a single neuron
  2. Spike sorting is a process aiming at separating spikes generated by different cells based on shape discrimination algorithms
  3. Spike sorting is a technique used in single-cell neural recordings which assigns particular spike shapes to individual neurons
  4. Spike sorting is a classification procedure. We can think about a forest (time series) where M animals of K different types live (M spikes of K different neurons). All animals are different, but say two rabbits are a little bit more similar than a rabbit and a fox. So, we need to classify all M animals and say, for each, to which of the K classes it belongs.
  5. Spike sorting is the process of identifying the waveforms associated with action potentials of an individual neuron within time series data.

All are trying to say the same thing, although when read precisely they start to "mean" different things. This led us to define three more terms in order to answer the original question:
a) An action potential is a sudden depolarization of the membrane potential of a cell. [synonym: spike]

b) Spike detection is a data extraction process that classifies the waveforms associated with action potentials and identifies the time point of when the spike event initiates. The input to this process is a continuous waveform. The output is a single sequence of spike event times.

c) Spike sorting is a data extraction process that assigns detected spike event times to individual neurons. The input of this process can be a continuous waveform or a sequence of spike event times. The outputs of this process are sets (or categories) of spikes. Each set is assumed to correspond to a single neuron.
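To make the input/output distinction in definitions (b) and (c) concrete, here is a minimal Python sketch. It is hypothetical and not CARMEN code: the function names `detect_spikes` and `sort_spikes` are illustrative, detection is assumed to be a simple upward threshold crossing, and sorting is assumed to be a one-dimensional k-means on peak amplitude. Real spike sorters use richer waveform features, but the shape of the data follows the definitions: a continuous waveform goes in, detection returns a single sequence of spike event times, and sorting returns sets of those times, each assumed to correspond to a single neuron.

```python
from typing import Dict, List, Sequence, Set


def detect_spikes(waveform: Sequence[float], threshold: float) -> List[int]:
    """Spike detection: continuous waveform in, one sequence of spike event times out.

    Assumption: a spike event initiates at the first sample where the signal
    crosses `threshold` from below. Times are returned as sample indices.
    """
    times = []
    for i in range(1, len(waveform)):
        if waveform[i - 1] < threshold <= waveform[i]:
            times.append(i)
    return times


def sort_spikes(
    waveform: Sequence[float],
    spike_times: Sequence[int],
    window: int = 3,
    n_units: int = 2,
    iterations: int = 10,
) -> Dict[int, Set[int]]:
    """Spike sorting: assign detected spike event times to sets, one per putative neuron.

    Assumption: each spike is summarised by its peak amplitude in a short window
    after the event time, and spikes are grouped by 1-D k-means on that feature.
    """
    units: Dict[int, Set[int]] = {k: set() for k in range(n_units)}
    if not spike_times:
        return units

    # One feature per detected spike: the peak value just after the event time.
    features = [max(waveform[t:t + window]) for t in spike_times]

    # Start the centroids evenly spaced between the smallest and largest peaks.
    lo, hi = min(features), max(features)
    centroids = [lo + (hi - lo) * k / max(n_units - 1, 1) for k in range(n_units)]

    labels = [0] * len(features)
    for _ in range(iterations):
        # Assign each spike to the nearest centroid.
        labels = [min(range(n_units), key=lambda k: abs(f - centroids[k])) for f in features]
        # Move each centroid to the mean of the spikes assigned to it.
        for k in range(n_units):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)

    for t, lab in zip(spike_times, labels):
        units[lab].add(t)
    return units
```

With a toy waveform such as `[0, 0, 5, 1, 0, 0, 10, 2, 0, 0, 5, 1, 0, 0, 10, 1, 0]` and a threshold of 3, detection yields the event times `[2, 6, 10, 14]`, and sorting splits them into `{2, 10}` (the smaller spikes) and `{6, 14}` (the larger ones). Keeping the two steps as separate functions mirrors the agreed definitions, where sorting can start from either a continuous waveform or an existing sequence of spike event times.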

The whole peer-production process took approximately four days to conclude, and I think it has succeeded in addressing three issues:

  1. Highlighting the ambiguity in the use of terms, even within a small, enclosed group of scientists within a single project.
  2. The peer-production of ontology terms and definitions.
  3. The engagement of the community within the project.

I would love to know people's comments on this process, or any alternative suggestions. Feel free to comment.