ClinVar and GTR: discussion
The subject of this data analysis experiment is a mashup of two datasets, ClinVar and the Genetic Testing Registry (GTR). See the linked blog posts for a detailed introduction to these datasets, their technical details, and links to formal documentation.
Gene (Symbol)-based analysis
Does the research represented in ClinVar, as indicated by the HUGO gene symbols assigned to individual variant accessions, show a relationship with the frequency distribution of genetic tests for these genes in the Genetic Testing Registry?
Are there genes that are well represented among ClinVar submissions but poorly represented in GTR gene panel coverage? Or are the two distributions fairly well aligned?
Condition (concept)-based analysis
Graph the frequency of conditions (represented by regularized MedGen concept unique identifiers, or CUIs) cited in ClinVar against the frequency of conditions tested for in GTR.
What is the apparent coverage of condition-based testing (GTR) relative to the number of accessioned variants for those conditions (ClinVar)?
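The condition-level comparison can be sketched as two frequency tables keyed by CUI, paired for plotting, plus a single coverage figure: the share of ClinVar variant-condition records whose CUI has at least one GTR test. This is a minimal sketch under an explicit assumption: the data fields listed for this analysis do not include a CUI column, so the code assumes CUIs have already been extracted from each source as plain lists of strings. The `CUI_*` values are made-up placeholders, not real MedGen identifiers.

```python
from collections import Counter


def condition_counts(cuis):
    # Frequency of each CUI; blank values are skipped.
    return Counter(c for c in cuis if c)


def paired_counts(clinvar_counts, gtr_counts):
    # One (ClinVar count, GTR count) pair per CUI found in either source, ready to plot.
    return {
        cui: (clinvar_counts.get(cui, 0), gtr_counts.get(cui, 0))
        for cui in sorted(set(clinvar_counts) | set(gtr_counts))
    }


def variant_coverage(clinvar_counts, gtr_counts):
    # Share of ClinVar variant-condition records whose CUI has at least one GTR test.
    total = sum(clinvar_counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for cui, n in clinvar_counts.items() if gtr_counts.get(cui, 0) > 0)
    return covered / total


# Made-up placeholder CUIs for illustration only.
clinvar = condition_counts(["CUI_A", "CUI_A", "CUI_B", "CUI_C"])
gtr = condition_counts(["CUI_A", "CUI_C", "CUI_C", "CUI_D"])
pairs = paired_counts(clinvar, gtr)
print(pairs)  # {'CUI_A': (2, 1), 'CUI_B': (1, 0), 'CUI_C': (1, 2), 'CUI_D': (0, 1)}
print(variant_coverage(clinvar, gtr))  # 0.75
```

Each pair in `pairs` is one point on a ClinVar-versus-GTR scatter plot; points on the x axis (GTR count of zero) are conditions with variants but no registered test.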
How does the data landscape change when GTR test_type is restricted to “Clinical”?
Is there a correlation between the frequency of PubMed citations for a particular variant and the number of GTR tests for the gene in which that variant is found?
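Answering this question requires joining `var_citations` to `variant_summary` on `VariationID` and then looking up each variant's gene in a per-gene GTR test count. The sketch below assumes dict rows with the field names from the data dictionary, assumes PubMed rows carry `citation_source == "PubMed"`, and uses made-up toy data (including a made-up non-PubMed source label). It counts distinct `citation_id` values per variant so that a repeated citation row is counted once. Variants overlapping several genes would need their `Symbol` split before the lookup; that step is omitted here.

```python
from collections import Counter


def variant_citations_vs_gene_tests(variants, citations, tests_per_gene, source="PubMed"):
    # Distinct (VariationID, citation_id) pairs from the chosen citation index.
    distinct = {
        (c["VariationID"], c["citation_id"])
        for c in citations
        if c["citation_source"] == source
    }
    cites_per_variant = Counter(vid for vid, _ in distinct)
    # One row per variant: (VariationID, citation count, GTR test count for its gene).
    return sorted(
        (
            v["VariationID"],
            cites_per_variant.get(v["VariationID"], 0),
            tests_per_gene.get(v["Symbol"], 0),
        )
        for v in variants
    )


# Made-up toy data for illustration only.
variants = [
    {"VariationID": 1, "Symbol": "GENE_A"},
    {"VariationID": 2, "Symbol": "GENE_B"},
    {"VariationID": 3, "Symbol": "GENE_C"},
]
citations = [
    {"VariationID": 1, "citation_source": "PubMed", "citation_id": 1001},
    {"VariationID": 1, "citation_source": "PubMed", "citation_id": 1002},
    {"VariationID": 1, "citation_source": "PubMed", "citation_id": 1002},
    {"VariationID": 2, "citation_source": "PubMed", "citation_id": 1003},
    {"VariationID": 3, "citation_source": "OtherIndex", "citation_id": 2001},
]
gtr_tests = {"GENE_A": 5, "GENE_C": 2}
print(variant_citations_vs_gene_tests(variants, citations, gtr_tests))
# [(1, 2, 5), (2, 1, 0), (3, 0, 2)]
```

The second and third columns of the result are the two series whose correlation the question asks about.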
- Genetic testing (as represented in GTR) follows a gene distribution pattern similar to the distribution of ClinVar submissions.
- The greater the number of PubMed citations for variants within particular genes, the greater the number of genetic tests for those genes.
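Both hypotheses predict a monotonic relationship (more ClinVar submissions or citations, more GTR tests) rather than a strictly linear one, so Spearman's rank correlation is a natural first check. The following is a minimal stdlib sketch, not the method used in this analysis: it computes rho as the Pearson correlation of average ranks, so tied counts (common with small integers) are handled. The per-gene counts in the usage lines are made up. `scipy.stats.spearmanr` also reports a p-value, and Python 3.12+ offers `statistics.correlation(x, y, method="ranked")`.

```python
def average_ranks(values):
    # 1-based ranks; tied values share the mean of the positions they span.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    # Spearman's rho as the Pearson correlation of the two rank vectors.
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one series is constant
    return cov / (sx * sy)


# Made-up per-gene counts; genes missing from one source count as 0 there.
variant_counts = {"GENE_A": 12, "GENE_B": 5, "GENE_C": 1}
test_counts = {"GENE_A": 30, "GENE_B": 4}
genes = sorted(set(variant_counts) | set(test_counts))
rho = spearman(
    [variant_counts.get(g, 0) for g in genes],
    [test_counts.get(g, 0) for g in genes],
)
print(round(rho, 6))  # 1.0
```

A rho near +1 would be consistent with either hypothesis; it measures association only and says nothing about whether research drives testing or the reverse.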
Links to Relevant Research
- clinvar.variant_summary.RCVaccession — character, list of RCV accessions that report this variant
- clinvar.variant_summary.VariationID — integer, unique Variation ID assigned to each variant
- clinvar.variant_summary.GeneID — integer, GeneID in NCBI’s Gene database
- clinvar.variant_summary.Symbol — character, comma-separated list of gene symbols overlapping the variation (NULL or ‘-’ if not named)
- clinvar.variant_summary.HGVS_c — character, RefSeq cDNA-based HGVS expression
- clinvar.variant_summary.NumberSubmitters — integer, number of submitters describing this variant
- clinvar.variant_summary.ClinicalSignificance — character, comma-separated list of values of clinical significance reported for this variation
- clinvar.var_citations.VariationID — integer, corresponds to VariationID in variant_summary
- clinvar.var_citations.citation_source — character, name of citation index to which citation_id belongs
- clinvar.var_citations.citation_id — integer, unique ID within citation_source index for this article (citation)
- GTR.test_condition_gene.GTR_identifier — character, unique ID for each record in this table
- GTR.test_condition_gene.concept_type — character, “condition” or “gene” — (If concept_type is “condition”, the Symbol field will be NULL or empty.)
- GTR.test_condition_gene.Symbol — character, the HUGO gene name for the gene region(s) being tested (or NULL or empty).
- GTR.test_condition_gene.test_type — character, “Research” or “Clinical” — whether test results are intended for use by clinicians in patient care (Clinical) or for research purposes only (Research).
- GTR.test_condition_gene.MIM_number — integer, relates to OMIM concept for phenotype indication for test
- GTR.test_condition_gene.gene_or_SNOMED_CT_ID — integer, relates to either the NCBI Gene concept (for a tested gene) or the SNOMED CT concept (for a tested condition).
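Before any of the analyses above, the tables have to be read and the `Symbol` field parsed. The sketch below assumes the tables are tab-delimited text with a header row using the field names listed above; a leading `#` on the header line is stripped in case the raw download uses one. `csv.QUOTE_NONE` keeps stray quote characters in free-text columns from being treated as field delimiters. The sample rows are made up.

```python
import csv
import io


def read_tsv(handle):
    # Yield one dict per data row; a leading '#' on the header line is dropped.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    if header and header[0].startswith("#"):
        header[0] = header[0][1:]
    for row in reader:
        yield dict(zip(header, row))


def parse_symbols(value):
    # Split the comma-separated Symbol field; '-' or an empty value means no named gene.
    return [s.strip() for s in value.split(",") if s.strip() not in ("", "-")]


# Made-up sample in the assumed layout.
sample = io.StringIO(
    "#VariationID\tSymbol\tNumberSubmitters\n"
    "1\tGENE_A\t3\n"
    "2\t-\t1\n"
    "3\tGENE_B,GENE_C\t2\n"
)
rows = list(read_tsv(sample))
for r in rows:
    print(int(r["VariationID"]), parse_symbols(r["Symbol"]), int(r["NumberSubmitters"]))
# 1 ['GENE_A'] 3
# 2 [] 1
# 3 ['GENE_B', 'GENE_C'] 2
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` (or `gzip.open(path, "rt", newline="")` for a compressed download) and pass the handle to `read_tsv`. All values arrive as strings, so integer fields such as `VariationID` need explicit conversion.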