DITA – A framework for scientific publishing?

There are two industry-recognised standards for XML-based documentation: DocBook and DITA (Darwin Information Typing Architecture).

DocBook is the older of the two specifications and was created specifically for technical documentation. DITA is a younger specification that grew out of IBM; it is described as an architecture in its own right and was designed to give structure to more than just a book. Both specifications are OASIS standards.

A DITA topic

As with XML schemas in general, both specifications can be extended to include bespoke features. However, DocBook is based more on a book structure with sections and subsections, whereas DITA is built around topics that can be assembled in any arrangement based on a document map. A DITA topic is itself open to specialisation; however, a minimal topic has only three core parts:

  1. An id attribute
  2. A title
  3. A body
A topic also has numerous optional elements, many of which use HTML-like syntax, e.g. p, ul and ol for paragraphs and lists.
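
As a minimal sketch, the three parts listed above, plus one optional p element, can be assembled with Python's standard xml.etree.ElementTree module. The helper name build_topic, the id value, the title and the paragraph text are all illustrative assumptions, and the output is not validated against the DITA DTD.

```python
import xml.etree.ElementTree as ET


def build_topic(topic_id, title_text, paragraphs):
    """Build a bare-bones DITA-style topic: id attribute, title, body."""
    # The id attribute identifies the topic so that it can be referenced.
    topic = ET.Element("topic", {"id": topic_id})

    # The title element.
    title = ET.SubElement(topic, "title")
    title.text = title_text

    # The body holds the content, here as HTML-like <p> elements.
    body = ET.SubElement(topic, "body")
    for text in paragraphs:
        p = ET.SubElement(body, "p")
        p.text = text
    return topic


# Hypothetical content, chosen only for illustration.
topic = build_topic("abstract", "Abstract", ["A short summary of the article."])
xml_text = ET.tostring(topic, encoding="unicode")
print(xml_text)
```

A real topic file would also carry an XML declaration and a DOCTYPE pointing at the DITA DTD, which this sketch leaves out.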




A topic can exist as a single XML file, and topics can be composed into any arrangement for publication through the use of a document map. A DITA structure would therefore offer a more flexible architecture, in which the same “topic” (for example a journal article section such as an abstract, materials and methods, or results) could easily be included in more than one publication and correctly referenced. In this respect DITA is more like an object-oriented document schema, and its structure can be more easily repurposed for any output format (e.g. PDF or HTML). Equally, DocBook can be configured, with some work, to behave on a more topic-by-topic basis, and DITA can support a book-based methodology. They are, after all, both XML schemas and are equally extensible or open to specialisation.
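
To illustrate the reuse described above, here is a rough sketch in Python (standard library only) of two document maps that both point at the same topic file. The file names, map titles and the helpers build_map and hrefs are hypothetical; the map and topicref element names and the href attribute follow DITA, but nothing here is validated against the DITA DTD.

```python
import xml.etree.ElementTree as ET


def build_map(title_text, topic_files):
    """Build a DITA-style map that lists topic files in publication order."""
    ditamap = ET.Element("map")
    title = ET.SubElement(ditamap, "title")
    title.text = title_text
    for href in topic_files:
        # Each topicref points at a standalone topic file.
        ET.SubElement(ditamap, "topicref", {"href": href})
    return ditamap


def hrefs(ditamap):
    """Return the topic files referenced by a map, in document order."""
    return [ref.get("href") for ref in ditamap.iter("topicref")]


# Two hypothetical publications assembled from the same pool of topics.
article = build_map("Journal article",
                    ["abstract.dita", "methods.dita", "results.dita"])
digest = build_map("Abstracts digest", ["abstract.dita"])

# The abstract is written once and referenced from both maps.
shared = sorted(set(hrefs(article)) & set(hrefs(digest)))
print(shared)
```

The point of the design is that the topic files themselves never change; only the maps decide which topics appear in which publication and in what order.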

As it is a standard, whole ecosystems have emerged that make use of the DITA architecture. For example, DITA for Publishers provides libraries to convert DITA markup into HTML, PDF, EPUB, and Kindle formats. This allows content structured in DITA to be repurposed for different audiences or different devices with relative ease.

I have recently started using DITA as an architecture to represent content primarily designed for books. However, with new demands appearing for different ways of delivering the traditional textbook, such as Web delivery and ebooks, DITA is proving to be immensely powerful for delivering the same content through different media with relative ease and speed. Having used it, it seems obvious to me that a DITA architecture would benefit the representation of content within a journal article, allowing referencing, re-purposing and multi-format delivery. Maybe a topic for discussion through the Beyond the PDF forum.

In the end, it’s just XML, so I won’t repeat the virtues of content markup through XML. However, for me its main advantage is the object-oriented-like topic structure as a working architecture.


Peanutbutter has been spread for the last time. Long live jam


Looking at the date on my last post, I realised, shamefully, that I have not posted for over a year. Most, well all, of my activity has been via Twitter and FriendFeed. The reason for the lack of posts is also related to why I started blogging in the first place. I created this blog when I was a PhD student and maintained it while I was a postdoc to discuss work, ideas and problems. With my move to the biotech industry I was not able to post so freely on work-related issues.

I have now moved from the biotech industry to the publishing industry (see About page) and I think I am in a position to blog a bit more freely than I have been previously. So with this in mind I have been doing some housekeeping. I have updated my About page accordingly, and I have updated my publication page to reference my Mendeley public profile rather than my CiteULike profile.

However, the main change is that I have retired the “peanutbutter” meme (hence the title of this post) and replaced it with fgibson.com (the jam). Hopefully WordPress will work its magic and maintain the mapping for all the subscriptions but, just in case, please update your feed subscriptions to http://www.fgibson.com if you want to.

Maybe my top referring search term, “is peanutbutter good for you health”, will start to be replaced with more informatics-relevant topics.


What does swine flu look like?

If you have been following the major news reports, such as those on the BBC, then you will probably have been bombarded with images of a spherical virus with lots of spikes, presented as what swine flu looks like. This is not entirely correct. The first high-resolution electron microscopy images of the swine flu virus have been released, and they show that the virus is not spherical but rather oblong in shape, as shown in the image below.



Developing ontologies in decentralised settings

I have placed an e-print of a manuscript on Nature Precedings, which I have been working on in collaboration with the authors listed on the manuscript. It presents a review of the available published ontology engineering methodologies and then assesses their suitability when applied to community ontology development (the decentralised setting).

It is a lengthy document. Here is the abstract:

This paper addresses two research questions: “How should a well-engineered methodology facilitate the development of ontologies within communities of practice?” and “What methodology should be used?” If ontologies are to be developed by communities then the ontology development life cycle should be better understood within this context. This paper presents the Melting Point (MP), a proposed new methodology for developing ontologies within decentralized settings. It describes how MP was developed by taking best practices from other methodologies, provides details on recommended steps and recommended processes, and compares MP with alternatives. The methodology presented here is the product of direct first-hand experience and observation of biological communities of practice in which some of the authors have been involved. The Melting Point is a methodology engineered for decentralised communities of practice for which the designers of technology and the users may be the same group. As such, MP provides a potential foundation for the establishment of standard practices for ontology engineering.


Content, Syntax and Semantics

These are the slides I presented at a DCC workshop entitled “Digital curation 101”, which aimed to give an overview of what to consider regarding data curation and management in the context of applying for research funding. The presentation starts with definitions of content, syntax and semantics, and gives an example of how these concepts are being applied in the life sciences, specifically proteomics.


A trip to Cambridge in June

I’m taking a trip to Cambridge between June 7th and June 12th.


A trip to London in May

I’m taking a trip to London on May 14th.

I will be at the Molecular Regulation of Cardiac Disease Symposium May 14-15 2009, London, UK. Hosted by Abcam http://www.abcam.com/index.html?pageconfig=resource&rid=11503&sc_ql=1595&intGoUser=15


A trip to Manchester in May

I’m taking a trip to Manchester between May 11th and May 13th.
