DISGENET plus unlocks the potential of genomics
for drug R&D and health care
DISGENET plus brings together innovative text mining technologies, agile information update, and data mining expertise resulting in a comprehensive knowledge platform on disease genomics.
Genomic medicine revolution
Nearly all human diseases have a genetic component. Individual susceptibility to most human diseases is influenced by genetic variation, even in the case of infectious diseases. During the last 25 years, we have witnessed unprecedented progress in the discovery of genetic variation associated with diseases, thanks to the advent of high-throughput sequencing technologies and the availability of reference genomes from diverse populations. This has opened the possibility of using this information to improve disease diagnosis, to develop new treatments, or to predict individual susceptibility to diseases.
Characterizing the relationship between genomic variants and diseases provides a powerful tool for identifying the processes involved in disease pathogenesis and, in turn, for pinpointing novel strategies for treatment and prevention.
Current challenges in exploiting biomedical data
Despite the impressive progress of disease genomics in recent years, several issues hamper the translation of the accumulated knowledge into real applications with clinical impact.
More than 100,000 associations between genomic variants and common diseases have been discovered thanks to the novel technologies that followed the Human Genome Project. The exploration of the genetic architecture of common human diseases has shown that they are driven by a combination of a large number of variants distributed throughout the genome. For instance, a genome of 3.3 billion nucleotides may contain 6 million variants, of which 2,800 may affect the function of proteins. Determining which of these variants leads to the disease phenotype is a cumbersome and time-consuming process involving many manual steps. Bioinformatic resources are key to enabling, accelerating and guiding this analysis.
Several databases and repositories have been developed to store the information on genes and variants associated with diseases. These databases are often focused on specific disease families such as rare diseases, Mendelian diseases, or cancer. As a consequence, the available information on disease genetics is scattered across numerous repositories that do not communicate with one another.
Scientists refer to proteins and genes by different names, which makes the proper identification of these entities a cumbersome task. Although community-driven terminologies and ontologies have emerged to standardize the way genes, proteins, diseases and phenotypes are named, a wide array of standards is currently in use. There is no consensus on which standards to use: each database or resource annotates its data with different ones. This makes data integration very challenging. Integrating information from different sources therefore requires both standards and mappings that link them to one another.
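As an illustration of the naming problem, the following minimal Python sketch maps different spellings of a gene name to one canonical symbol. It is not how DISGENET plus works internally. The small synonym table, the function names `build_index` and `normalize_gene`, and the choice of case-insensitive matching are assumptions made for this example. A real pipeline would draw on a maintained nomenclature resource rather than a hand-written dictionary.

```python
from typing import Dict, Optional, Set

# Toy synonym table (illustrative only): canonical symbol -> a few known aliases.
SYNONYMS: Dict[str, Set[str]] = {
    "TP53": {"p53", "LFS1"},
    "BRCA1": {"RNF53"},
}


def build_index(synonyms: Dict[str, Set[str]]) -> Dict[str, str]:
    """Build a case-insensitive lookup from every known name to its canonical symbol."""
    index: Dict[str, str] = {}
    for symbol, aliases in synonyms.items():
        for name in {symbol, *aliases}:
            index[name.casefold()] = symbol
    return index


ALIAS_INDEX = build_index(SYNONYMS)


def normalize_gene(name: str) -> Optional[str]:
    """Return the canonical symbol for a gene name, or None if it is unknown."""
    return ALIAS_INDEX.get(name.strip().casefold())


if __name__ == "__main__":
    for raw in ["P53", " brca1 ", "unknown-gene"]:
        print(repr(raw), "->", normalize_gene(raw))
```

Returning `None` for unknown names, rather than guessing, keeps unmapped entities visible so that they can be reviewed instead of being merged silently.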
Innovation locked in publications
New findings are reported in scientific publications. More than 95,000 scientific works are published each year on disease genomics (more than 8,000 publications/month), leading to information overload. It is practically impossible for a scientist to keep pace with the most recent findings in the area. As a consequence, innovation is locked in publications, and this information cannot be extracted in a timely manner for further analysis. Text mining tools are required to automatically extract this information, standardize it, and incorporate it into databases or process it with analysis pipelines.
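To make the idea of automatic extraction concrete, here is a deliberately simple Python sketch of dictionary-based co-occurrence mining: it reports a gene and a disease as a candidate pair whenever both are mentioned in the same sentence. This is a toy baseline chosen for illustration, not the text mining technology used by DISGENET plus. The term lists, the function names and the naive sentence splitter are all assumptions of the example.

```python
import re
from itertools import product
from typing import Set, Tuple

# Illustrative dictionaries; a real system would use large curated vocabularies.
GENE_TERMS = {"BRCA1", "TP53"}
DISEASE_TERMS = {"breast cancer", "Li-Fraumeni syndrome"}


def find_terms(sentence: str, terms: Set[str]) -> Set[str]:
    """Return the dictionary terms that occur as whole words in the sentence."""
    found = set()
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(term)
    return found


def extract_cooccurrences(text: str) -> Set[Tuple[str, str]]:
    """Collect (gene, disease) pairs mentioned together in the same sentence."""
    pairs: Set[Tuple[str, str]] = set()
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        genes = find_terms(sentence, GENE_TERMS)
        diseases = find_terms(sentence, DISEASE_TERMS)
        pairs.update(product(genes, diseases))
    return pairs


if __name__ == "__main__":
    abstract = (
        "Mutations in BRCA1 increase the risk of breast cancer. "
        "TP53 is frequently altered in Li-Fraumeni syndrome."
    )
    for gene, disease in sorted(extract_cooccurrences(abstract)):
        print(gene, "<->", disease)
```

Co-occurrence alone cannot tell a reported association from a negated or speculative one, which is one reason why extracted results still need to be standardized and scored before they are used.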
To overcome these challenges, we have developed DISGENET plus, building on more than 10 years of experience in knowledge management and data mining in disease genomics, and on the community-recognized open platform DisGeNET. DISGENET plus brings together innovative text mining technologies, agile information update, and data mining expertise in a comprehensive knowledge platform on disease genomics.
DISGENET plus contains one of the largest catalogs of genes and variants associated with human diseases, traits and phenotypes. This catalog is regularly updated to give users the latest advances in biomedical genomics.
Our current knowledge on disease genes and variants is gathered in a condensed and harmonized manner, with detailed information on its provenance. To this end, DISGENET plus aggregates data from authoritative databases and enriches it with the latest information extracted from new studies by means of innovative text mining approaches.
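The idea of aggregating associations while keeping their provenance can be sketched in a few lines of Python. This is only a conceptual illustration under stated assumptions: the record layout `(gene, disease, source)`, the source labels `curated_db` and `text_mining`, and the function name `aggregate` are hypothetical and do not describe the DISGENET plus data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (gene, disease, source); hypothetical layout


def aggregate(records: Iterable[Record]) -> Dict[Tuple[str, str], List[str]]:
    """Merge duplicate gene-disease pairs and keep the list of sources for each."""
    merged = defaultdict(set)
    for gene, disease, source in records:
        merged[(gene, disease)].add(source)
    # Sort the sources so that the output is deterministic.
    return {pair: sorted(sources) for pair, sources in merged.items()}


if __name__ == "__main__":
    records = [
        ("BRCA1", "breast cancer", "curated_db"),
        ("BRCA1", "breast cancer", "text_mining"),
        ("TP53", "Li-Fraumeni syndrome", "curated_db"),
    ]
    for pair, sources in aggregate(records).items():
        print(pair, "supported by", sources)
```

Keeping the sources next to each merged association lets a user see at a glance whether a finding comes from a single resource or is supported by several independent ones.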