ClinVar / GTR Conclusion

Conclusion

The analysis of ClinVar and the Genetic Testing Registry (GTR) by gene Symbol, number of ClinVar Submissions, and number of unique tests in GTR shows a positive linear relationship between the number of submissions to ClinVar for a gene and the number of clinical and/or research tests registered for that gene in GTR.

While the vast majority of genes reported in ClinVar and tests in GTR follow this positive linear relationship, as the graph below shows, certain notable outliers emerge from the data:

  • The LMNA gene has a high number of GTR tests (272) but relatively few ClinVar submissions (515) compared with other genes.
  • BRCA2 ranks high in both number of GTR tests (174) and number of ClinVar submissions (7584). Its nearest neighbor in terms of submissions and unique_tests is BRCA1. Together these are the most tested and best-researched cancer-associated genes.
  • The TTN gene (which encodes the protein titin) sits well above most genes with a ClinVar submission count of 4609, while showing only 6 tests in GTR.

Positive Linear Relationship


This linear regression scatterplot demonstrates a positive linear relationship between ClinVar Submissions (the x axis) and number of unique tests in GTR (the y axis).

Scatterplot for x=Submissions (ClinVar) per gene and y=unique_tests per gene
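
For readers who want to reproduce a fit like this, here is a minimal sketch of an ordinary least-squares line and Pearson's r in plain Python. It is not the code behind the plot above: the function names are hypothetical, and the numbers in the demo are placeholders rather than ClinVar or GTR data.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r is undefined when y does not vary, so report NaN in that case.
    r = sxy / math.sqrt(sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r


def residuals(xs, ys, slope, intercept):
    """Vertical distance of each point from the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Placeholder numbers only, NOT ClinVar/GTR data.
demo_x = [1, 2, 3, 4]  # stands in for Submissions per gene
demo_y = [3, 5, 7, 9]  # stands in for unique_tests per gene
slope, intercept, r = fit_line(demo_x, demo_y)
print(slope, intercept, r)
```

With real per-gene counts, genes such as LMNA and TTN would be expected to show the largest residuals, positive and negative respectively.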

ClinVar Submissions: univariate distribution

The following right-skewed distribution of Submissions per gene Symbol shows that most genes have between 1 and 1000 ClinVar submissions, while some heavily researched genes such as TTN, BRCA1, and BRCA2 have many thousands.

The number of distinct genes in ClinVar is roughly 26,000. Because most of these genes have relatively low Submission counts, the values in the distribution have been log10 normalized (each count is replaced by its base-10 logarithm) to make the graph more readable.

Log10-normalized distribution of ClinVar Submissions per gene.
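
To illustrate what the log10 step does to the counts, the sketch below groups genes into power-of-ten bins. It is an assumption about how such a histogram could be built, not the plotting code used for the graph above; the function name is hypothetical, and the demo uses only the three submission counts quoted earlier in this post.

```python
import math
from collections import Counter


def log10_decade_counts(submission_counts):
    """Count genes per power-of-ten bin of their submission count.

    Bin k holds genes with 10**k <= count < 10**(k + 1).
    Counts of zero or below are skipped because log10 is undefined for them.
    """
    bins = Counter()
    for count in submission_counts:
        if count <= 0:
            continue
        bins[math.floor(math.log10(count))] += 1
    return dict(sorted(bins.items()))


# Submission counts quoted in this post for LMNA, TTN, and BRCA2.
print(log10_decade_counts([515, 4609, 7584]))
```

On the log scale, 515 falls in bin 2 (hundreds) while 4609 and 7584 both fall in bin 3 (thousands), so the long right tail is compressed into a few adjacent bins.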

GTR unique_tests: univariate distribution

The following right-skewed distribution of unique_tests per gene Symbol shows that most genes have few tests (under 25), while some heavily researched genes such as BRCA1 and LMNA have far more registered genetic tests (over 200).

GTR: distribution of unique_tests per gene Symbol (skewed right)

A log10 normalization across the same data produces this graph:

GTR: distribution of unique_tests per gene Symbol, log10 normalized.

The graph above does not contain “NA” values; that is, genes recorded in ClinVar that have no tests in GTR cannot be shown on it.

We might be able to explain the outliers by looking at how ClinicalSignificance is assigned to the variants in each gene across ClinVar Submissions. For example, we might expect to see a very low rate of “pathogenic” calls on variants within the TTN gene, or a very high rate of “pathogenic” calls on variants in LMNA. (An exercise for another day.)
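
As a starting point for the per-gene rate of “pathogenic” calls suggested above, here is a rough sketch. It assumes the submissions are available as (Symbol, ClinicalSignificance) pairs and counts only exact “pathogenic” labels, so values such as “Likely pathogenic” are deliberately excluded; both the input shape and the function name are assumptions, and the demo rows are made up.

```python
from collections import defaultdict


def pathogenic_rate_by_gene(records):
    """Return {Symbol: fraction of submissions labeled exactly 'pathogenic'}.

    `records` is an iterable of (Symbol, ClinicalSignificance) pairs.
    Matching ignores case and surrounding whitespace.
    """
    totals = defaultdict(int)
    pathogenic = defaultdict(int)
    for symbol, significance in records:
        totals[symbol] += 1
        if significance.strip().lower() == "pathogenic":
            pathogenic[symbol] += 1
    return {symbol: pathogenic[symbol] / totals[symbol] for symbol in totals}


# Made-up rows for illustration only, NOT real ClinVar submissions.
demo_records = [
    ("GENE_A", "Pathogenic"),
    ("GENE_A", "Benign"),
    ("GENE_B", "Likely pathogenic"),
    ("GENE_B", "Uncertain significance"),
]
print(pathogenic_rate_by_gene(demo_records))
```

Comparing these rates for TTN and LMNA against the rest of the genes would show whether the pattern of calls lines up with the outliers.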
