Understanding how genetic differences between individuals impact the regulation, expression, and ultimately function of proteins is an important step toward realizing the promise of personal medicine. There are several technical barriers hindering the transition of biological knowledge into the applications relevant to precision medicine. One important challenge for data integration is that new biological sequences (proteins, DNA) have multiple issues related to interoperability potentially creating a quagmire in the published data, especially when different data sources do not appear to be in agreement. Thus, there is an urgent need for systems and methodologies to facilitate the integration of information in a uniform manner to allow seamless querying of multiple data types which can illuminate, for example, the relationships between protein modifications and causative genomic variants. Our work demonstrates for the first time how semantic technologies can be used to address these challenges using the nanopublication model applied to the neXtProt data set, a curated knowledgebase of information about human proteins. We have applied the nanopublication model to demonstrate querying over several named graphs, including the provenance information associated with the curated scientific assertions from neXtProt. We show by way of use cases using sequence variations, post-translational modifications (PTMs) and tissue expression, that querying the neXtProt nanopublication implementation is a credible approach for expanding biological insight.
Christine Chichester, Pascale Gaudet, Oliver Karch, Paul Groth, Lydie Lane, Amos Bairoch, Barend Mons, Antonis Loizou