Data-sharing on the variant level to improve genetic diagnosis and discovery

Jul 11, 2022

Franklin has collaborated on data-sharing initiatives with additional databases that provide variant-level information with phenotypic features, in order to improve the variant classification process and facilitate the discovery of novel disease-causing variants.

Since the initial stages of genomic research, the field has prided itself on a robust culture of data sharing. In 2001, when the first version of the human genome was published, the wealth of data was shared with the entire healthcare community. Since then, publicly available genetic data has massively accelerated discoveries in the prevention, diagnosis, and treatment of a wide range of diseases, from rare Mendelian disease to common polygenic conditions and cancer.

As for the discovery of novel disease genes associated with rare conditions, the global research community has been working tirelessly since 2009 to find and analyze DNA variants detected by whole-exome sequencing (WES) and whole-genome sequencing (WGS), two of the most prominent tools for discovering genes that cause diseases with rare phenotypes. However, more than a decade of scientific progress has identified causal variants in only 4,588 genes, or about 22% of the protein-coding genes in the genome according to OMIM, leaving close to 80% yet to be associated with disease. Moreover, the overwhelming burden of variants of uncertain significance (VUSs) is still one of the main challenges facing genetic testing, resulting in a diagnostic yield of no more than 30% on average. VUSs place clinicians in a difficult position, making clinical management recommendations more complex while potentially creating anxiety or misunderstanding among patients.

The challenges

One of the challenges the genetics community faces today is the data bottleneck that results from inefficient, unstandardized gathering of individual laboratory findings, with most information siloed in restricted databases. Even with the rise in WES and WGS accessibility over the past few years and the consequent exponential increase in genomic data, experts agree that without appropriate tools to safely and accurately share phenotypic and genotypic information among the community, the interpretation and classification of novel variants simply cannot advance at the required speed.

General population databases, such as 1000 Genomes, gnomAD, or ExAC, have facilitated the identification and classification of rare pathogenic variants by publicly sharing aggregated data from mostly healthy individuals in an accessible way. Nevertheless, these databases lack phenotypic information and, consequently, are not optimized for interpreting variants associated with rare phenotypes, particularly in more complex scenarios such as mild presentation, incomplete penetrance, mosaicism, or late onset.

Overcoming the limitations with collaboration

To overcome these limitations, databases that focus on sharing real-world evidence from cases with rare phenotypes are gaining ground in the professional community. Following the example of MatchMaker Exchange, which aims to connect exome, genome, and phenotype databases at the gene level, Franklin has joined DECIPHER, MyGene2/Geno2MP, and VariantMatcher in an initiative to facilitate collaboration and data-sharing for genomic variants.

These tools have created a public way to share genomic and phenotypic data from individuals with rare phenotypes, making the data easily accessible to researchers, clinicians, and healthcare providers, with the goal of ultimately impacting patients’ lives. Now, the consortia aim to connect these databases and others in a federated network using the GA4GH Data Connect standard, with two main goals:

Accelerating the identification of novel disease-causing variants
Providing a more precise classification of variants of uncertain significance (VUSs)
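To make the idea of variant-level (rather than gene-level) matching concrete, here is a minimal sketch in Python. It is not how Franklin, the other databases, or GA4GH Data Connect actually implement matching; the record layout, the source names `db_a`, `db_b`, and `db_c`, and the example values are all hypothetical. It simply groups records by chromosome, position, reference allele, and alternate allele, and keeps the variants reported by more than one source.

```python
from collections import defaultdict
from typing import NamedTuple


class VariantRecord(NamedTuple):
    """Hypothetical, de-identified record shared by one database."""
    source: str        # name of the contributing database (made up here)
    chrom: str         # chromosome, with or without a "chr" prefix
    pos: int           # 1-based position
    ref: str           # reference allele
    alt: str           # alternate allele
    phenotypes: tuple  # HPO-style term IDs (illustrative values only)


def variant_key(rec: VariantRecord) -> tuple:
    """Build a comparison key so that 'chr1' and '1' are treated alike."""
    chrom = rec.chrom[3:] if rec.chrom.lower().startswith("chr") else rec.chrom
    return (chrom.upper(), rec.pos, rec.ref.upper(), rec.alt.upper())


def match_across_sources(records):
    """Return the variants reported by at least two different sources."""
    by_key = defaultdict(list)
    for rec in records:
        by_key[variant_key(rec)].append(rec)
    return {
        key: recs
        for key, recs in by_key.items()
        if len({r.source for r in recs}) >= 2
    }


# Made-up example data: two sources report the same variant.
records = [
    VariantRecord("db_a", "chr1", 12345, "A", "G", ("HP:0001250",)),
    VariantRecord("db_b", "1", 12345, "a", "g", ("HP:0001263",)),
    VariantRecord("db_c", "2", 500, "C", "T", ("HP:0001250",)),
]

matches = match_across_sources(records)
for key, recs in matches.items():
    print(key, sorted(r.source for r in recs))
```

In practice, matching also depends on all sources using the same genome build and the same variant normalization, which this sketch assumes rather than enforces.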

Franklin’s impact

Franklin is no stranger to the huge impact data-sharing can have on the genetics field. The development of tools to facilitate responsible, practical, and comprehensive sharing of genomic data among experts is one of the key pillars of the platform, and a mission we take very seriously at Genoox. Supporting multiple genetic applications such as rare diseases, oncology, and carrier screening, Franklin not only collects in-depth variant evidence and automates ACMG-based classification for several variant types, but also provides dedicated features that allow users to share de-identified evidence and insights with the rest of the community.

One of Franklin’s core principles for data-sharing is that it should not add any effort to users’ day-to-day tasks. To overcome one of the main bottlenecks for sharing, Franklin builds the process into the variant interpretation workflow, so that users can contribute to the real-world database with just one click.

Franklin’s additional data layer has already led to the reclassification of many VUSs to either pathogenic or benign, thereby assisting in the discovery of rare causal variants. All users are able to start discussions on variants or cases and collaborate among experts to resolve uncertainties, building networks with professionals from other organizations who have experience with similar cases. What is more, Franklin collects and aggregates all variants detected in genetic cases to build up the Community Frequency database, helping to identify common variants in global or specific populations.
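As a rough illustration of how aggregating variants across cases can flag common ones, the sketch below computes the fraction of cases that carry each variant. This is an assumption-laden toy, not Franklin's Community Frequency implementation: the function names, the case IDs, the variant strings, and the 0.5 threshold are all hypothetical, and a real frequency database would work with allele counts and population labels rather than a simple per-case fraction.

```python
from collections import Counter


def community_frequency(cases):
    """Fraction of cases carrying each variant.

    `cases` maps a case ID to the collection of variant keys seen in it.
    """
    total = len(cases)
    if total == 0:
        return {}
    counts = Counter()
    for variants in cases.values():
        counts.update(set(variants))  # count each variant once per case
    return {variant: n / total for variant, n in counts.items()}


def common_variants(freqs, threshold):
    """Variants whose case fraction is at or above the chosen threshold."""
    return sorted(v for v, f in freqs.items() if f >= threshold)


# Made-up example: four de-identified cases and the variants seen in each.
cases = {
    "case1": {"1-100-A-G", "2-200-C-T"},
    "case2": {"1-100-A-G"},
    "case3": {"1-100-A-G"},
    "case4": {"3-300-G-A"},
}

freqs = community_frequency(cases)
print(freqs["1-100-A-G"])            # 3 of 4 cases
print(common_variants(freqs, 0.5))   # variants seen in at least half the cases
```

A variant that shows up in a large share of cases is unlikely to cause a rare disease, which is why this kind of aggregate can support reclassification.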

Franklin’s users can contact other users who have seen the same variant in a patient and ask about their case, evidence for interpretation, and classification. This is where we see the Franklin Community affect real lives, by connecting experts and building a knowledge base of real-time evidence from all around the globe. Like Erez, we believe that sharing is caring.

With Franklin, the variant interpretation process goes hand-in-hand with community data-sharing. By taking advantage of data from more than 20,000 professionals from 2,000 organizations all over the world, experts are able to gain access to unique insights that can make a difference in terms of genetic diagnosis.

Through this data-sharing initiative, Franklin has joined forces with other tools to work towards one common goal: accelerating research into the remaining 80% of protein-coding genes, with the aim of matching genomic regions and variants to disease phenotypes and ultimately giving clinicians greater insight and confidence when treating their patients.

Written by Franklin by Genoox