ClinVar / GTR Research Questions

ClinVar and GTR: discussion

This data analysis experiment works with a mashup of two datasets: ClinVar and the Genetic Testing Registry (GTR). Please see the linked blog posts for a detailed introduction to both datasets, their technical details, and links to formal documentation.

Gene (Symbol) based analysis

Is the research represented in ClinVar, as indicated by the HUGO gene symbols assigned to individual variant accessions, related to how frequently genetic tests for those genes appear in the Genetic Testing Registry?

Which gene tests in GTR are backed by the most ClinVar submissions?

Are there genes well-represented in terms of ClinVar submissions that are not well represented in the GTR database in terms of gene panel coverage? Or are these two distributions fairly well aligned?

Condition (concept) based analysis

Graph the frequency of conditions (represented by regularized MedGen concept codes aka CUIs) cited in ClinVar versus the frequency of conditions tested for in GTR.

What is the apparent coverage for condition-based testing (GTR) in terms of numbers of accessioned variants for those conditions (ClinVar)?
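One way to make "apparent coverage" concrete is to count, for every condition tested in GTR, how many distinct ClinVar variants cite the same condition. The sketch below assumes both tables have already been mapped to MedGen CUIs and that the CUI sits in a field named `cui`; that field name is hypothetical and does not appear in the data dictionary, and the function names are illustrative only.

```python
from collections import defaultdict


def variants_per_condition(clinvar_rows):
    """Count distinct VariationIDs per condition CUI."""
    by_cui = defaultdict(set)
    for row in clinvar_rows:
        if row.get("cui"):
            by_cui[row["cui"]].add(row["VariationID"])
    return {cui: len(ids) for cui, ids in by_cui.items()}


def condition_coverage(clinvar_rows, gtr_rows):
    """For each GTR-tested condition, report its ClinVar variant count.

    Returns (per_condition, fraction), where fraction is the share of
    GTR-tested conditions that have at least one ClinVar variant.
    """
    variant_counts = variants_per_condition(clinvar_rows)
    tested = {
        row["cui"]
        for row in gtr_rows
        if row.get("concept_type") == "condition" and row.get("cui")
    }
    per_condition = {cui: variant_counts.get(cui, 0) for cui in tested}
    covered = sum(1 for n in per_condition.values() if n > 0)
    fraction = covered / len(tested) if tested else 0.0
    return per_condition, fraction
```

The per-condition counts are what would be graphed against GTR test frequency; the single fraction is only a coarse summary of how many tested conditions have any accessioned variants at all.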

Further analysis

How does the data landscape change when GTR test_type is restricted to “Clinical”?

Is there a correlation between the number of PubMed citations for a particular variant and the number of GTR tests for the gene in which that variant is found?


Hypotheses

  1. Genetic testing (as represented in GTR) follows a gene distribution pattern similar to the distribution of ClinVar submissions.
  2. The greater the number of PubMed citations for variants within particular genes, the greater the number of genetic tests for those genes.

Links to Relevant Research

The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency

Database resources of the National Center for Biotechnology Information

Evaluating the NIH’s New Genetic Testing Registry

Free the Data: The End of Genetic Data as Trade Secrets

A general framework for estimating the relative pathogenicity of human genetic variants

ClinVitae: a unified database of clinically-observed genetic variants aggregated from public sources

ClinVar: public archive of relationships among sequence variation and human phenotype

In Tackling the VUS Challenge, Are Public Databases the Solution or a Liability for Labs?


Data fields

  • clinvar.variant_summary.RCVaccession — character, list of RCV accessions that report this variant
  • clinvar.variant_summary.VariationID — integer, unique Variation ID assigned to each variant
  • clinvar.variant_summary.GeneID — integer, GeneID in NCBI’s Gene database
  • clinvar.variant_summary.Symbol — character, comma-separated list of gene symbols overlapping the variation (NULL or ‘-’ if not named)
  • clinvar.variant_summary.HGVS_c — character, RefSeq cDNA-based HGVS expression
  • clinvar.variant_summary.NumberSubmitters — integer, number of submitters describing this variant
  • clinvar.variant_summary.ClinicalSignificance — character, comma-separated list of values of clinical significance reported for this variation
  • clinvar.var_citations.VariationID — integer, corresponds to VariationID in variant_summary
  • clinvar.var_citations.citation_source — character, name of citation index to which citation_id belongs
  • clinvar.var_citations.citation_id — integer, unique ID within citation_source index for this article (citation)
  • GTR.test_condition_gene.GTR_identifier — character, unique ID for each record in this table
  • GTR.test_condition_gene.concept_type — character, “condition” or “gene” — (If concept_type is “condition”, the Symbol field will be NULL or empty.)
  • GTR.test_condition_gene.Symbol — character, the HUGO gene name for the gene region(s) being tested (or NULL or empty).
  • GTR.test_condition_gene.test_type — character, “Research” or “Clinical” — whether the test results are intended for use by doctors in patient care (Clinical) or for scientific research purposes only (Research).
  • GTR.test_condition_gene.MIM_number — integer, relates to OMIM concept for phenotype indication for test
  • GTR.test_condition_gene.gene_or_SNOMED_CT_ID — integer, relates to either Gene (NCBI) or SNOMED concept for tested gene.