BioHackathon series in 2013 and 2014: Improvements of semantic interoperability in life science data and services

Abstract

Publishing databases in the Resource Description Framework (RDF) model is becoming widely accepted to maximize the syntactic and semantic interoperability of open data in life sciences. Here we report advancements made in the 6th and 7th annual BioHackathons which were held in Tokyo and Miyagi respectively. This review consists of two major sections covering: 1) improvement and utilization of RDF data in various domains of the life sciences and 2) meta-data about these RDF data, the resources that store them, and the service quality of SPARQL Protocol and RDF Query Language (SPARQL) endpoints. The first section describes how we developed RDF data, ontologies and tools in genomics, proteomics, metabolomics, glycomics and by literature text mining. The second section describes how we defined descriptions of datasets, the provenance of data, and quality assessment of services and service discovery. By enhancing the harmonization of these two layers of machine-readable data and knowledge, we improve the way community wide resources are developed and published. Moreover, we outline best practices for the future, and prepare ourselves for an exciting and unanticipatable variety of real world applications in coming years.

Publication
F1000Research