Archive for category neuroinformatics
Paul Watson presents a talk on CARMEN at the Google Seattle Conference on Scalability.
We have gone through a bit of professional branding, with a shiny new logo and some publicity material. The website has also been redesigned. The original Drupal site was replaced with a Plone site, a decision I was not involved with. I am not sure I am a big fan of the big mug-shot which spills over the website template, and the fact that you have to scroll to the bottom of the front page to find out what the project is about is less than ideal. Feel free to comment.
MIBBI is a registry of scientific experiment reporting guidelines. The idea is to foster a foundry of best practice that encourages the modular development and re-use of reporting guidelines. The first workshop is being held at the EBI on the 2nd – 3rd April 2008 and is a relatively closed workshop, limited to the developers and guidelines that are registered on the site. The schedule for day one is a whistle-stop tour consisting of 5 min talks (adjusting for an academic's interpretation of what 5 minutes means) covering all the guidelines that exist, their scope and the people behind them. Because of this I am not going to comment on individual talks. I presented two talks during the day: one on CARMEN and the development of the MINI: Electrophysiology reporting guidelines, and one on FuGE, standing in for Andy Jones.
I tried sharing these slides via Google presentation, and they looked quite nice. However, WordPress does not seem to allow them to be embedded, so I put them on SlideShare instead. These set the tone for the discussions for the afternoon and tomorrow.
The idea behind the CARMEN project is that we provide a system to store electrophysiology data and analysis services so that data can be shared and analysed in the “Neuro-cloud”. An important factor in realising this system is that the stored data and the services have to be described in a way that is both human-readable and computationally amenable. The first stage of this is agreeing what information should actually be ascribed to the data. In other words, finding the balance between what the experimentalists want to say about their data and what informaticians need to know about a particular data set in order to perform their analysis. To this end we have defined what we believe to be the minimum information that must be ascribed to an electrophysiology experiment for submission to the CARMEN system. It follows the now well-practised format of the MIAME and MIAPE minimum reporting requirements. In the first instance the document only represents consensus within the CARMEN consortium. However, it could form the basis of a community reporting standard for electrophysiology experiments. The document is available on Nature Precedings at the following URL, and comments and opinions are encouraged. http://precedings.nature.com/documents/1720/version/1
I described in an earlier post that data sharing in neuroscience is relatively non-existent. Some commentary on the subject has appeared since then via the 2007 SfN Satellite Symposium on Data Sharing, entitled Value Added by Data Sharing: Long-Term Potentiation of Neuroscience Research, published in Neuroinformatics. I was also excited to see an article published last week, Data Sharing for Computational Neuroscience, also in Neuroinformatics. However, there is a caveat or two. Apart from ignoring all the data representation issues already addressed in other domains such as bioinformatics, the re-use of data models such as FuGE, or contribution to ontology efforts such as OBI, none of these articles is open access! How ironic, or should that be how embarrassing. Phil also covers this issue in his blog.
Oh well, it looks as if there is still a challenge in the domain of neuroscience for access to valuable insights into information flow in the brain. Who wants to know how the brain works anyway? You can always pay $32 to Springer if you want to find out.
I have the unenviable task of developing an ontology for the CARMEN project. It should allow the process of electrophysiology experiments, the generated data, the analysis of the data and the services that perform the analysis to be described in a way that is also computationally amenable. Collecting the words that are required to describe these tasks is relatively trivial. However, getting the scientists to realise they have assigned numerous meanings to the same word or term requires a little bit more patience on my part.
It also requires me to educate the scientists, in that building an ontology for electrophysiology is a little more complicated than putting some “words” in a text file.
The words in an ontology have to be explicitly defined so as to be completely unambiguous both to the scientists, who generate the data, and to the informaticians who want to analyse the data, either immediately or several years down the line. The data should be described to an agreed level of detail, such that the informatician no longer needs to pick up the phone and politely ask “how did you generate this piece of data?”.
The first message I am trying to relay to the scientists is that although you use the same “words”, you often use the words to describe different things in different contexts. This situation is generally less important in a journal publication, but it presents issues when you use the words to annotate data and infer knowledge.
I have been trying to work out the best way to get this message across and to develop a methodology for collecting agreed definitions for words. I could always have put up a wiki or an issue tracker to do this, but that does not guarantee contribution. I feel the process needs to be mediated in order to turn the natural language definitions into more explicit, normalised ontological definitions. Taking this into account, I have decided to apply crowdsourcing to ontology development.
Simply, this means sending out an email entitled “Metadata term of the week”. This process was suggested to me by my boss Phil Lord. In this email I pick a word and attempt to define it. If I get it right then there is no need to respond. If you disagree with the definition then you have to respond with an alternative, so a discussion ensues and ends with an agreed definition. With this process the scientists get to see that other scientists within the project define or describe words differently enough that they are no longer talking about the same thing.
The first Metadata term of the week was “spike sorting” and we received the following definitions:
- Spike sorting is a process of assigning data spikes to sets, where each set is identified with a single neuron
- Spike sorting is a process aiming at separating spikes generated by different cells based on shape discrimination algorithms
- Spike sorting is a technique used in single-cell neural recordings which assigns particular spike shapes to individual neurons
- Spike sorting is a classification procedure. We can think about a forest (time series) where M animals of K different types live (M spikes of K different neurons). All animals are different but say two rabbits are a little bit more similar than the rabbit and fox. So, we need classify all M animals and to say about each to what particular class among K classes this animal belongs.
- Spike sorting is the process of identifying the waveforms associated with action potentials of an individual neuron within time series data.
All are trying to say the same thing, although when taken explicitly they start to “mean” different things. This led us to define three more terms in order to answer the original question:
a) An action potential is a sudden depolarization of the membrane potential of a cell. [synonym: spike]
b) Spike detection is a data extraction process that classifies the waveforms associated with action potentials and identifies the time point at which each spike event initiates. The input to this process is a continuous waveform. The output is a single sequence of spike event times.
c) Spike sorting is a data extraction process that assigns detected spike event times to individual neurons. The input of this process can be a continuous waveform or a sequence of spike event times. The output of this process is a collection of sets (or categories) of spikes. Each set is assumed to correspond to a single neuron.
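To make the inputs and outputs in definitions (b) and (c) concrete, here is a minimal Python sketch. It is not how CARMEN or any of the scientists actually detect or sort spikes. The threshold crossing, the amplitude boundary, the two unit names and the toy waveform are all assumptions I have made up purely for illustration. The only things taken from the definitions are the shapes of the data: a continuous waveform goes in, a single sequence of spike event times comes out of detection, and sets of spikes (one per assumed neuron) come out of sorting.

```python
from typing import Dict, List, Sequence


def detect_spikes(waveform: Sequence[float], threshold: float) -> List[int]:
    """Spike detection: continuous waveform in, one sequence of event times out.

    Assumption (illustrative only): a spike event initiates at the first
    sample where the signal rises above `threshold`.
    """
    times: List[int] = []
    above = False
    for i, value in enumerate(waveform):
        if value > threshold and not above:
            times.append(i)  # record the sample index where the event starts
        above = value > threshold
    return times


def sort_spikes(
    waveform: Sequence[float],
    spike_times: Sequence[int],
    amplitude_boundary: float,
    window: int = 3,
) -> Dict[str, List[int]]:
    """Spike sorting: assign each detected event to a set, one set per neuron.

    Assumption (illustrative only): there are two neurons and they can be
    separated by peak amplitude alone. Real sorters use waveform shape.
    """
    units: Dict[str, List[int]] = {"unit_a": [], "unit_b": []}
    for t in spike_times:
        peak = max(waveform[t:t + window])  # peak within a short window
        label = "unit_a" if peak < amplitude_boundary else "unit_b"
        units[label].append(t)
    return units


# Hypothetical toy recording: two small spikes and one large spike.
waveform = [0.0, 0.1, 1.2, 0.3, 0.0, 0.1, 2.5, 2.8, 0.2, 0.0, 1.1, 0.1]
spike_times = detect_spikes(waveform, threshold=1.0)
units = sort_spikes(waveform, spike_times, amplitude_boundary=2.0)
print(spike_times)
print(units)
```

Note how the sketch keeps the two processes separate, which is exactly the distinction the original five definitions blurred: detection knows nothing about neurons, and sorting only ever sees events that detection has already found.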
This peer-production process took approximately 4 days to conclude and I think it has succeeded in addressing three issues:
- Highlighting the ambiguity in the use of terms, even within a small and enclosed group of scientists within a single project.
- The peer-production of ontology terms and definitions.
- The engagement of the community within the project.
I would love to know people's comments on this process or any alternative suggestions. Feel free to comment.
Today is my first day in my new job as an RA for the CARMEN project, which stands for Code, analysis, repository and modelling for e-Neuroscience. My role in this exciting project is as a Metadata researcher for experiment context, which should involve ontology development and data representation in neuroscience.
Just as a matter of interest I thought I would compare bioinformatics and neuroinformatics on Google Trends. It looks as if I have plenty of work to do!