The Semantic Web of Life Science

This summary grew out of a question that started on Twitter and percolated to FriendFeed: “Who is using RDF and integrating other resources at the minute, and what are those resources?” Several resources were highlighted in response.

UniProt. This comprehensive resource of protein information is available as an RDF distribution, and each protein record has a corresponding RDF download option.

Phil pointed out Semantic Systems Biology. As systems biology is largely concerned with representing networks and interactions at a systems level, a language like RDF would seem an obvious choice to represent this type of knowledge and to aid semantic description and data integration.
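As a rough illustration of why a triple-based model suits interaction networks, the sketch below stores statements as (subject, predicate, object) tuples in plain Python, queries them by pattern, and integrates two sources by taking their union. All identifiers (`ex:ProteinA`, `ex:interactsWith` and so on) are made up for the example and do not come from UniProt or any published vocabulary; a real project would more likely use an RDF library and proper URIs.

```python
# Hypothetical statements; every name here is invented for illustration.
TRIPLES = [
    ("ex:ProteinA", "rdf:type", "ex:Protein"),
    ("ex:ProteinB", "rdf:type", "ex:Protein"),
    ("ex:ProteinA", "ex:interactsWith", "ex:ProteinB"),
    ("ex:ProteinB", "ex:participatesIn", "ex:PathwayX"),
]


def match(triples, s=None, p=None, o=None):
    """Yield triples matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def merge(*graphs):
    """Integrate graphs as the union of their triples, keeping first-seen order."""
    seen = set()
    merged = []
    for graph in graphs:
        for t in graph:
            if t not in seen:
                seen.add(t)
                merged.append(t)
    return merged


# Which things does ProteinA interact with?
partners = [o for _, _, o in match(TRIPLES, s="ex:ProteinA", p="ex:interactsWith")]
print(partners)  # ['ex:ProteinB']
```

Because every statement has the same three-part shape, combining two sources needs no schema negotiation, which is a large part of the appeal for data integration.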

Melanie pointed out several resources, including Bio2RDF. This project aims to RDF-ize numerous public life-science resources using a three-step approach that its developers have devised. The following image illustrates some of the resources that are included in Bio2RDF.

Bio2RDF Cloud

The NeuroCommons project seeks to make all scientific research materials – research articles, annotations, data, physical materials – as available and as usable as they can be. As a result they have an RDF triple store which they encourage you to either contribute to or download and use.

For a more general overview of resources that exist as an RDF implementation, the Linked Open Data cloud provides a graphical summary of the resources that exist and the relationships between them.

If you know of any more life-science resources or projects using RDF, then please do comment below. Egon has indicated he is working on RDF-ing the NMRShiftDB and ChEMBL’s Starlight, and Andrew Clegg is considering a project proposal involving RDF. As a result, a very interesting discussion ensued on FF.

HUPO PSI-PAR: standard format for protein affinity reagents

HUPO PSI-PAR: standard format for protein affinity reagents is now available for Public Comment on the PSI Web site for the next 30 days.

The public comment period enables the wider community to provide feedback on a proposed standard before it is formally accepted, and thus is an important step in the standardisation process.

This message is to encourage you to contribute to the standards development activity by commenting on the material that is available online. We invite both positive and negative comments. If negative comments are being made, these could be on the relevance, clarity, correctness, appropriateness, etc., of the proposal as a whole or of specific parts of the proposal.

If you do not feel well placed to comment on this document, but know someone who may be, please consider forwarding this request. There is no requirement that people commenting should have had any prior contact with the PSI. If you have comments that you would like to make but would prefer not to make public, please email the PSI-Editor directly.

The BioSysBio conference 2009

The premise of the BioSysBio conference is to

bring together the best young researchers working in Synthetic Biology, Systems Biology and Bioinformatics, providing a platform to hear and discuss the most recent and scientific advances and applications in these fascinating fields.

This year's BioSysBio 09 has just taken place in Cambridge, UK. The program was slanted more towards synthetic biology than traditional systems biology, which I think reflects the growing momentum that synthetic biology has gained in the past year. I think this is good progress, and I was secretly glad as I did not want to spend 3 days looking at massive network diagrams squashed onto PowerPoint slides.

This was the first conference I had been to where the organisers actually requested that we use the BioSysBio FriendFeed room and Twitter to communicate, so I did. Halfway through the first day the organisers demonstrated the FF room, which seemed to consist solely of Allyson’s posts, and questions were asked about whether she was a blogging bot. When we did confirm there was actually a female at an engineering conference, she was thereafter known as the BioSysBio poster girl.

As ever, Ally was monumental in her blogging during the conference and all her posts can be found here. At one stage Simon did try to blog her talk to the same detail and speed, but he just kept coming up with excuses about the wifi being slow. Eventually he got there.

This was the first time I attended BioSysBio and I thoroughly enjoyed the experience. In general all of the talks were of a high standard. Most notable for me were Allyson Lister’s talk on Saint: a lightweight SBML annotation integration environment, Christina Smolke on Programming RNA Devices to Control Cellular Information Processing, Piers Millet on Why Secure Synthetic Biology? and Drew Endy on Building a new Biology. It was also good to hear from Randy Rettberg about improvements to the Registry of Standard Biological Parts and the wiki-style community building of the product catalogue, or data sheet, for each part.

There is no point in me re-posting coverage that has already been documented, so if you would like to follow what happened you can follow the #biosysbio Twitter stream, the BioSysBio FriendFeed room, or if you want a more comprehensive overview, Ally’s blog.

This was also the first time I had used Twitter (via TweetDeck) instead of FriendFeed to microblog a conference. This approach certainly generated a lot of noise and random soundbites, and was probably a fast way to make notes. However, although everything is grouped under the #biosysbio tag, the posts are not grouped around a particular talk or discussion thread. I can’t help thinking that microblogging via FriendFeed would be organised around specific talks and provide a more focused discussion, as opposed to just covering what was happening second by second.

PSI AnalysisXML Enters Public Comment

The HUPO Proteomics Standards Initiative aims to develop community data standards for proteomics that are developed, accepted and implemented by the proteomics community. To this end, the “AnalysisXML: exchange format for peptides and proteins identified from mass spectra” is now available for Public Comment on the PSI Web site.

The public comment period enables the wider community to provide feedback on a proposed standard before it is formally accepted, and thus is an important step in the standardisation process.

This message is to encourage you to contribute to the standards development activity by commenting on the material that is available online. We invite both positive and negative comments. If negative comments are being made, these could be on the relevance, clarity, correctness, appropriateness, etc., of the proposal as a whole or of specific parts of the proposal.

If you do not feel well placed to comment on this document, but know someone who may be, please consider forwarding this request. There is no requirement that people commenting should have had any prior contact with the PSI.

Announcement via the PSI editor, Norman Paton

The OBI winter workshop 2009

The OBI winter workshop 2009 has just been held in Vancouver. This was the first of the 2009 bi-annual face-to-face workshops concerned with the development of the Ontology for Biomedical Investigations (OBI). The first session, on the first day, covered some administration and developmental policy issues:

Namespace – should other communities who wish to integrate with OBI retain their namespace within OBI, or lose it and gain an OBI one instead? The general consensus was that there should be only one namespace within OBI and that the source of the term/community should be described in the annotation property of each class.

Defined classes – a general development policy is that classes should not be asserted under defined classes (classes with necessary and sufficient conditions). This is to avoid multiple inheritance and to allow the reasoner to infer the hierarchy.

Quick ID – I presented the current status of the Quick ID policy to allow rapid term submission to the ontology. The documentation, available from here, is still at a draft stage and will change.

Quick term – similar to the Quick ID process, but concerns submission of classes which are composites (intersections) of classes that exist across multiple resources.
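The defined-classes policy can be pictured with a toy classifier. This is only a sketch of the idea, not how OBI or an OWL reasoner works internally: each defined class is reduced to a set of required features (standing in for necessary and sufficient conditions), each ordinary class has a single asserted parent, and membership of defined classes is inferred rather than asserted. All class and feature names are hypothetical.

```python
# Asserted hierarchy: each ordinary class has exactly one asserted parent.
ASSERTED_PARENT = {
    "mass spectrometer": "instrument",
    "centrifuge": "instrument",
}

# Features stated for each asserted class (hypothetical).
FEATURES = {
    "mass spectrometer": {"measures mass", "ionises sample"},
    "centrifuge": {"separates by density"},
}

# Defined classes: anything with ALL of the listed features belongs.
DEFINED = {
    "measurement device": {"measures mass"},
    "separation device": {"separates by density"},
}


def infer_parents(cls):
    """Return the asserted parent plus every defined class whose conditions are met."""
    parents = [ASSERTED_PARENT[cls]]
    for name, required in sorted(DEFINED.items()):
        if required <= FEATURES[cls]:  # subset test: all conditions satisfied
            parents.append(name)
    return parents


print(infer_parents("mass spectrometer"))  # ['instrument', 'measurement device']
```

Nobody asserted that a mass spectrometer is a measurement device; the second parent falls out of the definitions, which is the behaviour the policy is meant to protect.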

Use-cases.

Several people presented use-cases on how they are either using or intend to make use of OBI. James presented a talk on how the GenePattern software would like to use OBI. Bjoern presented how he would like to use OBI for the Immune Epitope Database (IEDB) and Jennifer outlined its potential for the CEBS. This was very interesting, and it was encouraging to see that there is definitely a demand for OBI and pressure to release something.

The remaining sessions were concerned with specific branch development, describing important classes that exist and any outstanding issues that required broader input from the consortium in order to reach a conclusion. I presented status reports on both the instrument branch and the relations branch.

On the final day we dealt with the OBI manuscript, incorporating the use-cases that were presented and assigning sections to individuals for completion, and with the different flavours of OBI releases that our users would require. We finished with presentations from NCBO Bioportal 2.0, demonstrating new features and soliciting feedback, and from Larisa on the robot scientist, which uses its own ontology for experiments to drive the robot.

All in all it was a very productive but tiring workshop, which provided the groundwork for further OBI development and work on the manuscript between now and the next workshop: the Summer 2009 OBI workshop (back-to-back with the 2nd OBO Foundry workshop), EBI, Cambridge, UK, June 2-6, 2009 (OBO Foundry 7-8; OBI 9-12).

For a full outline of the agenda (now incorporated with action items) see here.

All change in 2009, no more academia

Last week I completed what was my first week of working in academia in 2009 and also my last for the foreseeable future. I have decided to leave academia, as well as Newcastle, to take a position in a biotech company in Cambridge.

I have certainly enjoyed just over 5 years working in Newcastle encompassing my PhD and a postdoc. Within that time my research interests have developed from representing scientific data, to data standards to ontology development. I am still maintaining some of my research interests in my new position. I will be working for an antibody company called Abcam, and as part of my role I will be investigating how to represent their product catalogue as an ontology and how users can have a more direct interaction with it.

I am really looking forward to this fresh challenge. I don’t know how this change of position will affect the content of this blog, although I expect it to still largely reflect my interests in the representation of scientific data and open science. I think it’s probably safe to assume there might be the odd post about antibodies as well.

Dopplr and my first trip

I have just signed up to Dopplr to track my travel and to keep a record of my trips. I am not sure how much use I will get out of it or how much of a benefit it will be, however I thought I would give it a go.

I’m taking a trip to Vancouver between January 31st and February 8th for the OBI developers workshop.

Melting away misconceptions: The structure of the mitotic chromosome

I am sure many of you remember sitting in a science class as a child, or an early undergraduate course, being taught about cell replication: how DNA is passed from one cell to the next via either mitosis or meiosis in order to effect DNA replication and gene expression, so that the genetic information content of the DNA can be passed from one generation to the next.

DNA can be organised inside packages within cells. This packaging is called chromatin, which is found inside the nuclei of eukaryotic cells and the nucleoid of prokaryotic cells. Chromatin [1] is a complex combination of DNA, RNA and protein that forms a chromosome.

To date, the commonly accepted hypothesis is that chromatin can take the following three organisational forms:

  1. DNA wrapping around nucleosomes – The “beads on a string” structure.
  2. A 30 nm condensed chromatin fiber consisting of nucleosome arrays in their most compact form.
  3. Higher level DNA packaging into the metaphase chromosome

A paper published by Eltsov M, MacLellan KM, et al., 2008 [2] disproves the theory of the existence of a 30 nm condensed chromatin fiber by providing high-resolution in situ images (structures of less than 0.7 nm can be resolved) that have not been technologically possible until now. They use a new technique called cryo-electron microscopy of vitreous sections (CEMOVIS) [3], which allows them to observe the cell's structures in a close-to-native state. CEMOVIS involves vitrifying a sample by high-pressure freezing, slicing it into 20-100 nm sections (thin sectioning) and then imaging it in a cryo-electron microscope without any further chemical treatment or staining, which ensures immobilization of all of the macromolecules in the specimen in a close-to-native state [3]. Eltsov, MacLellan, et al. hypothesise that the harsh nonphysiological treatments that chromatin samples have been exposed to in other investigations, such as hypotonic buffers, chemical fixation, alcohol dehydration and embedding in plastic, might have generated artificial de novo folding of chromatin into 30 nm fibres. Their results show no evidence of 30 nm fibres. Rather, they observe a highly disordered and interdigitated state, which they call a chromatin melt.

This paper truly challenges the fundamental beliefs of what we understand of DNA replication. It does so not with more theories or hypotheses, but with the actual visualisation of an unordered structure, using an advanced technique which they claim allows biological phenomena to be observed in a close-to-native state. Can you believe your eyes? As the CEMOVIS technique matures, what other closely held hypotheses will be challenged with visual evidence?

 

References

 

1. Maeshima K, Eltsov M. Packaging the genome: the structure of mitotic chromosomes. J Biochem. 2008 Feb;143(2):145-53. Epub 2007 Nov 2. [PMID: 17981824]

2. Eltsov M, MacLellan KM, Maeshima K, Frangakis AS, Dubochet J. Analysis of cryo-electron microscopy images does not support the existence of 30-nm chromatin fibers in mitotic chromosomes in situ. PNAS. December 8, 2008. DOI: 10.1073/pnas.0810057105

3. Al-Amoudi A, Chang J, Leforestier A, McDowall A, Salamin LM, Norlén LPO, Richter K, Blanc NS, Studer D, Dubochet J. Cryo-electron microscopy of vitreous sections. EMBO J. 23:3583–3588. [PMID: 15318169]

e-Science pollution project makes local news

A colleague here in Newcastle, who sits about 10 meters away from me (not that I am making any claim whatsoever that I ever influenced her or her work), has appeared on the local news for her e-science project. Apart from the fact that the web site tells me I should be using IE, and that I cannot embed the video in my blog, you can watch her interview and learn about Lakshmi’s project on a novel way to measure pollution.

The Triumvirate of Scientific Data

In a recent Nature editorial entitled Standardizing data, several projects were highlighted that are forfeiting their chances of winning a Nobel prize (according to Quackenbush) and championing the blue collar science of data standardization in the life-sciences.

I wanted to take the article a step further and highlight three significant properties of scientific data that I believe to be fundamental in considering how to curate, standardize or simply represent scientific data, from primary data, to lab books, to publication. These significant properties of scientific data are the content, syntax, and semantics, or more simply put: What do we want to say? How do we say it? What does it all mean? These three significant properties of data are what I refer to as the Triumvirate of scientific data.

Content: What do we want to say?

Data content is defined as the items, topics or information that is “contained in” or represented by a data object: what is, should or must be said. Generic data content standards exist, such as Dublin Core, as well as more focused or domain-specific standards. Most aspects of the research life-cycle have a content standard. When submitting a manuscript to a scientific publisher, for example, you are required to conform to a content standard for that journal. PLoS ONE calls its content standard Criteria for Publication and lists seven points to conform to.
The “Minimum Information about [insert favourite technology]” guidelines are efforts by the relevant communities to define content standards for their experiments. These do (should) not define how the content is represented (in a database or file format); rather, they state what information is required to describe an experiment. Collecting and defining content standards for the life-sciences is the premise of the MIBBI project.
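A content standard of this kind can be thought of as a checklist that is independent of any file format. The sketch below shows that idea in Python: the checklist says what must be present, and says nothing about how the record is stored. The item names are invented for illustration and are not taken from any real Minimum Information guideline.

```python
# Hypothetical checklist; real "Minimum Information" guidelines define their own items.
REQUIRED_ITEMS = {"sample_description", "protocol", "instrument", "contact"}


def missing_items(record, required=REQUIRED_ITEMS):
    """Return the required items that are absent or empty in the record, sorted by name."""
    return sorted(item for item in required if not record.get(item))


report = {
    "sample_description": "mouse liver lysate",
    "protocol": "2D gel electrophoresis",
    "instrument": "",  # present but empty, so it still counts as missing
}

print(missing_items(report))  # ['contact', 'instrument']
```

The same check would apply whether the record came from a database row, an XML file or a wiki page, which is the sense in which content is separate from syntax.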

Syntax: How do we say it?

The content of data is independent of any structure, language implementation or semantics. For example, when viewing a journal article on BioMed Central you typically have the option to view or download the “Full Text”, which is often represented in HTML, or you have the option of viewing the PDF file or the XML. Each representation has the same scientific content to a human but is structured and then rendered (or “presented”) to the user in three different syntaxes.
The structural or syntactic representation of scientific data is largely database-centric. However, alternative methods can be identified, such as wikis (OpenWetWare, UsefulChem), blogs (LaBLog), XML (GelML) and RDF (UniProt export), or the data can be described as a data model (FuGE) which can be realised in multiple syntaxes.
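To make the separation of content and syntax concrete, the following sketch writes one small record out as JSON and as XML using only the Python standard library, then reads both back and checks that the content is identical. The record and its field names are invented for the example, and the flat element layout is an assumption made for brevity rather than any published schema.

```python
import json
import xml.etree.ElementTree as ET

# One piece of content (hypothetical fields, all values kept as strings).
record = {"title": "An example article", "journal": "Example Journal", "year": "2009"}


def to_json(content):
    """Serialise the content as a JSON object."""
    return json.dumps(content, sort_keys=True)


def to_xml(content):
    """Serialise the content as a flat XML document, one child element per field."""
    root = ET.Element("article")
    for key in sorted(content):
        ET.SubElement(root, key).text = content[key]
    return ET.tostring(root, encoding="unicode")


def from_json(text):
    return json.loads(text)


def from_xml(text):
    return {child.tag: child.text for child in ET.fromstring(text)}


# Two syntaxes, one content: both round-trip to the same record.
assert from_json(to_json(record)) == from_xml(to_xml(record)) == record
```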

Semantics: What do we mean?

The explicit meaning of data is very difficult to get right and is a difficult problem in the life-sciences. One word can have many meanings and one meaning can be described by many words. A good example of a failure to correctly determine the semantics of data is described in the paper by Zeeberg et al. (2004), which describes the misinterpretation of the semantics of gene names. Excel interpreted some gene names as dates and irreversibly converted them to date format, and the resulting errors percolated through to the curated LocusLink public repository.
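A simple guard against this kind of failure can be sketched in a few lines. The pattern below flags symbols made of a month name or abbreviation followed by a day number, which is the shape that spreadsheet software tends to reinterpret as a date. The month list and the sample symbols are illustrative assumptions, and the exact conversion behaviour depends on the spreadsheet version and locale, so this is a heuristic rather than a reproduction of what Excel does.

```python
import re

# Month names and abbreviations that commonly trigger date parsing (assumed list).
MONTHS = "JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC"

# A month token, an optional hyphen, then a day number from 1 to 31.
DATE_LIKE = re.compile(rf"^({MONTHS})-?([1-9]|[12][0-9]|3[01])$", re.IGNORECASE)


def date_like(symbols):
    """Return the symbols that look like dates, in their original order."""
    return [s for s in symbols if DATE_LIKE.match(s)]


symbols = ["DEC1", "SEPT2", "TP53", "MARCH1", "BRCA1"]
print(date_like(symbols))  # ['DEC1', 'SEPT2', 'MARCH1']
```

Running a check like this before a gene list is opened in a spreadsheet at least tells you which identifiers need to be protected, for instance by importing the column as text.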
Within the life-sciences the issue of semantics is being addressed via the use of controlled vocabularies and ontologies.
According to the NeuroCommons definition, a controlled vocabulary is an association between formal names (identifiers) and their definitions. An ontology is a controlled vocabulary augmented with logical constraints that describe their interrelationships. Not only do we need semantics for data, we need shared semantics, so that we are able to describe data consistently within laboratories, across collaborations and transcending scientific domains. The OBO Foundry is one of the projects tasked with fostering the orthogonal development of ontologies (one term appears in only one ontology and is referenced by others) with the goal of shared semantics.

Summary

When considering how to curate, standardize or represent scientific data, either internally within laboratories, or externally for publication, the three significant properties of content, syntax and semantics should be considered carefully for the specific data. Consistent representation of data conforming to the Triumvirate of scientific data will provide a platform for the dissemination, interpretation, evaluation and advancement of scientific knowledge.

Acknowledgments

Thanks to Phil Lord for helpful discussions on the Triumvirate of data.

Conflict of interest

I am involved in the MIBBI project, the development of GelML and a member of the OBO Foundry via the OBI project.
