The increasing and unprecedented publication rate in the biomedical field is a major bottleneck for discovery in the Life Sciences. Although the scientific community is limited by an inability to manually curate facts from published papers, recent approaches enable the automatic, scalable and reliable extraction of assertions from the scientific literature. While the publication of assertions on the Semantic Web is gaining traction, it also creates new challenges to ensure proper provenance, such as versioning to support link generation that is sensitive to dataset changes. Here, we address these issues and describe our efforts to represent the DisGeNET database of human gene-disease associations as permanent, immutable, and provenance-rich digital objects called nanopublications. This is the first Linked Dataset that ensures stable interlinking to the assertion and its metadata by means of trusty URIs. As DisGeNET integrates expert-curated and text-mined data of different origins, the semantic description of the evidence for each assertion is provided to confer trust and allow evidence-based hypothesis generation. We describe our steps to ensure high quality and demonstrate the utility of linking our dataset to others on the emerging Semantic Web.
Núria Queralt-Rosinach, Tobias Kuhn, Christine Chichester, Michel Dumontier, Ferran Sanz, Laura I Furlong
Full publication: Semantic Web, 2015