
Estimating diagnostic noise in panel-based genomic analysis

  • Robin N. Beaumont
    Institute of Biomedical and Clinical Science, College of Medicine and Health, University of Exeter Medical School, University of Exeter, Exeter, United Kingdom
  • Caroline F. Wright
    Institute of Biomedical and Clinical Science, College of Medicine and Health, University of Exeter Medical School, University of Exeter, Exeter, United Kingdom
    Correspondence and requests for materials should be addressed to Caroline F. Wright, Institute of Biomedical and Clinical Science, College of Medicine and Health, University of Exeter Medical School, University of Exeter, Exeter EX1 2LU, United Kingdom
Open Access. Published: August 03, 2022. DOI: https://doi.org/10.1016/j.gim.2022.06.008

      ABSTRACT

      Purpose

      Gene panels with a series of strict variant filtering rules are often used for clinical analysis of exomes and genomes. Panel sizes vary, affecting the test’s sensitivity and specificity. We investigated the background rate of candidate variants in a population setting using gene panels developed to diagnose a range of heterogeneous monogenic diseases.

      Methods

      We used the Gene2Phenotype database with the Variant Effect Predictor plugin to identify rare nonsynonymous variants in exome sequence data from 200,643 individuals in UK Biobank. We evaluated 5 clinically curated gene panels of varying sizes (50-1700 genes).

      Results

      Bigger gene panels resulted in more prioritized variants, varying from an average of approximately 0.3 to 3.5 variants per person. The number of individuals with prioritized variants varied linearly with coding sequence length for monoallelic genes (∼300 individuals per 1000 base pairs) and quadratically for biallelic genes, with notable outliers.

      Conclusion

      Although large gene panels may be the best strategy to maximize diagnostic yield in genetically heterogeneous diseases, they frequently prioritize likely benign variants requiring follow-up. Most individuals have ≥1 rare nonsynonymous variant in panels containing >500 disease genes. Extreme caution should be applied when interpreting candidate variants, particularly in the absence of relevant phenotypes.


      Introduction

      Clinical analysis of genetically heterogeneous diseases often uses large gene panels with a series of strict variant filtering rules to identify likely disease-causing variants in monogenic disease genes [Martin A.R., Williams E., Foulger R.E., et al. PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels; Beauchamp K.A., Muzzey D., Wong K.K., et al. Systematic design and comparison of expanded carrier screening panels; Mendeliome Group Saudi. Comprehensive gene panels provide advantages over clinical exome sequencing for Mendelian diseases]. These panels may be targeted either in the laboratory or informatically after exome or genome sequencing. Numerous gene curation initiatives exist to evaluate gene–disease relationships [DiStefano M.T., Goehringer S., Babb L., et al. The Gene Curation Coalition: a global effort to harmonize gene-disease evidence resources], and lists of approved genes may be combined to create panels for comprehensive diagnostic testing of individuals presenting with appropriate clinical phenotypes. Panels can vary enormously in size, from several to several thousand genes, depending on the heterogeneity of the clinical presentation of the disease and the evidence thresholds required for inclusion. The size and gene context of a panel affect the sensitivity and specificity of the test, but these standard test performance characteristics are rarely investigated. Although studies have shown that bigger gene panels are not necessarily better [Alfares A.A., Kelly M.A., McDermott G., et al. Results of clinical genetic testing of 2,912 probands with hypertrophic cardiomyopathy: expanded panels offer limited additional sensitivity; Hosseini S.M., Kim R., Udupa S., et al. Reappraisal of reported genes for sudden arrhythmic death: evidence-based evaluation of gene validity for Brugada syndrome; Chambers C., Jansen L.A., Dhamija R. Review of commercially available epilepsy genetic panels], a focus on increasing diagnostic yield can nonetheless drive a tendency toward ever larger panels.
      We sought to investigate how gene panel size affects the background rate of candidate variants prioritized for classification by applying standard rare disease variant filtering pipelines in a population setting, in which most variants will be benign, ie, not highly penetrant causes of monogenic disease. Using exome sequencing data from UK Biobank (UKB), which is known to have a strong ascertainment bias toward healthy adults [Fry A., Littlejohns T.J., Sudlow C., et al. Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population; Batty G.D., Gale C.R., Kivimäki M., Deary I.J., Bell S. Comparison of risk factor associations in UK Biobank against representative, general population based studies with conventional response rates: prospective cohort study and individual participant meta-analysis], we tested 5 clinically curated gene panels that were developed to diagnose a range of heterogeneous monogenic diseases. We compared a subset of the results with data from the Deciphering Developmental Disorders (DDD) study, which includes families affected by severe undiagnosed developmental disorders (DDs) [Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders; Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders].

      Materials and Methods

      The Variant Effect Predictor (VEP) plugin Gene2Phenotype (G2P) [Thormann A., Halachev M., McLaren W., et al. Flexible and scalable diagnostic filtering of genomic variants using G2P with Ensembl VEP] aims to identify likely disease-causing variants in a standard variant call format (VCF; https://samtools.github.io/hts-specs/) file on the basis of a list of candidate genes, their allelic requirement for disease, and predicted variant consequences. The plugin takes a list of input genes, each annotated as monoallelic or biallelic. It uses the variant consequences computed by VEP [McLaren W., Gil L., Hunt S.E., et al. The Ensembl Variant Effect Predictor] for an individual VCF file to identify nonsynonymous variants within the input genes with Genome Aggregation Database (gnomAD) allele frequencies [Karczewski K.J., Francioli L.C., Tiao G., et al. The mutational constraint spectrum quantified from variation in 141,456 humans] of <0.0001 for variants in monoallelic disease genes or <0.005 for variants in biallelic disease genes. Biallelic disease genes also require an individual to have 2 or more such variants within the gene (regardless of phase) to be identified as potentially pathogenic. We used G2P-VEP to identify rare nonsynonymous variants in exome sequence data from 200,643 individuals in UKB [Van Hout C.V., Tachmazidou I., Backman J.D., et al. Exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank].
      We limited our analysis to single nucleotide variants and small insertions/deletions on the autosomes. Ancestry was determined genetically using principal components analysis. Individuals were clustered using k-means clustering, and those clustering with individuals from the “EUR” group in 1000 Genomes were defined as European. South Asian and African genetic ancestry individuals were defined analogously as those clustering with individuals of the “SAS” and “AFR” populations from 1000 Genomes, respectively. We investigated 5 clinically curated panels of known monogenic disease-causing genes: DDs, cancer, cardiac, skin, and eye (downloaded from https://www.ebi.ac.uk/gene2phenotype on June 26, 2021; cardiac panel, personal communication, July 23, 2021; Supplemental Tables 1-5).
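      The prioritization rules described above (an allele frequency threshold that depends on allelic requirement, a nonsynonymous consequence, and at least 2 qualifying variants for biallelic genes regardless of phase) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the plugin's implementation: the `Variant` record, the `prioritize` function, the gene names, and the set of consequence terms treated as nonsynonymous are all hypothetical; each gene is given a single allelic requirement; and homozygous genotypes are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical stand-in for "nonsynonymous"; the real plugin's list of
# consequence terms may differ.
NONSYNONYMOUS = {
    "missense_variant", "stop_gained", "frameshift_variant",
    "splice_acceptor_variant", "splice_donor_variant",
    "inframe_insertion", "inframe_deletion", "start_lost", "stop_lost",
}

# gnomAD allele frequency thresholds quoted in the text (strictly below).
AF_THRESHOLD = {"monoallelic": 0.0001, "biallelic": 0.005}


@dataclass
class Variant:
    gene: str
    consequence: str
    gnomad_af: float  # assumption: 0.0 when the variant is absent from gnomAD


def prioritize(variants, panel):
    """Return {gene: [qualifying variants]} for one individual.

    `panel` maps gene symbol -> "monoallelic" or "biallelic" (hypothetical
    input format; one allelic requirement per gene in this sketch).
    """
    qualifying = defaultdict(list)
    for v in variants:
        mode = panel.get(v.gene)
        if mode is None or v.consequence not in NONSYNONYMOUS:
            continue  # gene not on the panel, or consequence not of interest
        if v.gnomad_af < AF_THRESHOLD[mode]:
            qualifying[v.gene].append(v)
    # Monoallelic genes need 1 qualifying variant; biallelic genes need >= 2,
    # with phase ignored (so variants in cis are also prioritized).
    return {
        gene: vs
        for gene, vs in qualifying.items()
        if panel[gene] == "monoallelic" or len(vs) >= 2
    }


if __name__ == "__main__":
    panel = {"GENE_A": "monoallelic", "GENE_B": "biallelic"}  # hypothetical
    calls = [
        Variant("GENE_A", "missense_variant", 0.00002),
        Variant("GENE_B", "missense_variant", 0.001),
        Variant("GENE_B", "synonymous_variant", 0.0),
    ]
    print(sorted(prioritize(calls, panel)))
```

      With these invented calls only GENE_A is returned: GENE_B has a single qualifying variant, and the synonymous call is ignored.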
      For each individual in UKB, we examined the number of variants in genes within each panel that were prioritized by the plugin. For each gene in the panel, we also examined the number of individuals who had prioritized variants in that gene. We estimated the expected number of individuals with variants identified within a given gene on the basis of the length of the coding sequence of the Ensembl canonical transcript of the gene. For monoallelic disease genes, we fitted a simple linear regression line:
      $$\text{individuals} \sim \text{gene length}$$


      For biallelic disease genes, in which 2 rare nonsynonymous variants are required in each individual, we fitted the data to a quadratic model:
      $$\text{individuals} \sim \text{gene length} + \text{gene length}^2$$
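      Both models can be fitted by ordinary least squares. The sketch below assumes NumPy and uses `numpy.polyfit` with degree 1 (linear, monoallelic) or 2 (quadratic, biallelic) on per-gene coding sequence lengths and counts of individuals; the function name and the toy numbers are invented for illustration and are not UKB data.

```python
import numpy as np


def fit_gene_length_model(length_bp, n_individuals, biallelic=False):
    """Least-squares fit of individuals ~ gene length (+ gene length^2).

    Returns coefficients in numpy.polyfit order (highest power first):
    [slope, intercept] for the linear model, or
    [quadratic, linear, intercept] for the quadratic model.
    """
    x = np.asarray(length_bp, dtype=float)
    y = np.asarray(n_individuals, dtype=float)
    degree = 2 if biallelic else 1
    return np.polyfit(x, y, deg=degree)


# Toy monoallelic genes lying exactly on 0.3 individuals per bp plus 10.
lengths = [1000, 2000, 3000, 4000]
counts = [310, 610, 910, 1210]
slope, intercept = fit_gene_length_model(lengths, counts)
print(f"{slope * 1000:.1f} additional individuals per 1000 bp")
```

      Reporting the slope per 1000 bp mirrors how the linear trend is summarized in the Results.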


      To assess the effect of overlapping phenotypes in common diseases, a small subset of which might be caused by monogenic disease variants, we used UKB cancer registry data to compare variants prioritized by the cancer gene panel in cancer cases vs controls, testing for a difference in the distribution of variants identified within each group.
      Given the widespread use of pathogenicity predictors and clinical databases in diagnostic variant classification [Richards S., Aziz N., Bale S., et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology], we further assessed the effect of using rare exome variant ensemble learner (REVEL) scores [Ioannidis N.M., Rothstein J.H., Pejaver V., et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants] and ClinVar [Landrum M.J., Lee J.M., Riley G.R., et al. ClinVar: public archive of relationships among sequence variation and human phenotype] variant classifications, both provided in the G2P-VEP plugin output files. Quintiles of REVEL score were calculated on the basis of all variants prioritized by G2P-VEP, separately within each gene panel. For monoallelic genes, the variants in each gene for each individual were classified on the basis of the highest REVEL score or ClinVar classification among all prioritized variants. For biallelic genes, the variants in each gene for each individual were classified on the basis of the second highest (ie, lower) REVEL score or ClinVar classification of the 2 variants with the most severe predicted consequences.
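      The REVEL-based classification described above can be sketched as follows, assuming NumPy. Quintile cut points are computed once per panel from all prioritized variants; each individual's variants in a gene are then reduced to a single score (the highest for monoallelic genes, the lower of the 2 most severe variants for biallelic genes). The (severity rank, REVEL) tuple encoding and all function names are hypothetical, variants without a REVEL score are not handled, and NumPy's default linear interpolation for quantiles is an assumption about how the cut points were computed.

```python
import numpy as np


def revel_quintile_edges(panel_scores):
    """Cut points between REVEL quintiles, computed over all variants
    prioritized within one gene panel."""
    return np.quantile(np.asarray(panel_scores, dtype=float), [0.2, 0.4, 0.6, 0.8])


def revel_quintile(score, edges):
    """Map a REVEL score to quintile 1 (lowest) to 5 (highest)."""
    return int(np.digitize(score, edges)) + 1


def gene_level_score(variants, allelic_requirement):
    """Score used to classify one individual's prioritized variants in one gene.

    `variants` is a list of (severity_rank, revel) pairs, where a lower rank
    means a more severe predicted consequence (hypothetical encoding).
    """
    if allelic_requirement == "monoallelic":
        # Highest REVEL score among all prioritized variants in the gene.
        return max(revel for _, revel in variants)
    # Biallelic: keep the 2 most severe variants, then take the lower of
    # their REVEL scores (ie, the second highest of the pair).
    top_two = sorted(variants, key=lambda v: v[0])[:2]
    return min(revel for _, revel in top_two)


# Invented scores for one panel and one biallelic gene in one individual.
scores = [0.05, 0.12, 0.2, 0.33, 0.41, 0.5, 0.62, 0.7, 0.85, 0.93]
edges = revel_quintile_edges(scores)
het_variants = [(1, 0.2), (2, 0.9), (3, 0.95)]
print(revel_quintile(gene_level_score(het_variants, "biallelic"), edges))
```

      Here the third variant has the highest REVEL score but the least severe consequence, so it does not contribute to the biallelic classification.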
      To assess the diagnostic benefits of using the largest gene panel in a disease cohort, as well as the benefit of having haplotype data, we tested the number of variants prioritized using the DD gene panel in exome sequence data from 9859 proband–parent trios in the DDD study [Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders; Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders]. Ancestry was determined genetically using principal components as described previously [Martin H.C., Jones W.D., McIntyre R., et al. Quantifying the contribution of recessive coding variation to developmental disorders].
      For individuals with at least 2 heterozygous variants prioritized in biallelic genes, we examined parental genotypes to determine allelic transmission. Individuals with at least 1 variant unambiguously inherited from each parent were classified as having variants in trans, ie, true compound heterozygotes. Where all variants prioritized for the individual within the gene were unambiguously inherited from a single parent, they were classified as being in cis. Cases that could not be classified by either of these rules were classified as having unknown transmission.
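      The transmission rules above reduce to a small decision function. In this plain-Python sketch, each prioritized variant is first mapped to its unambiguous parent of origin by a hypothetical `parental_origin` helper, which treats a variant carried by both parents or by neither (for example, a de novo variant or missing parental data) as ambiguous; how the study handled those cases is an assumption here.

```python
def parental_origin(in_mother, in_father):
    """Unambiguous parent of origin for one heterozygous proband variant.

    Returns "maternal" or "paternal" only when exactly one parent carries the
    variant; otherwise None (both parents, or neither, eg, de novo).
    """
    if in_mother and not in_father:
        return "maternal"
    if in_father and not in_mother:
        return "paternal"
    return None


def classify_transmission(origins):
    """Apply the cis/trans rules to all prioritized variants in one gene.

    `origins` holds one parental_origin() result per prioritized variant.
    """
    if "maternal" in origins and "paternal" in origins:
        return "trans"  # at least 1 variant unambiguously from each parent
    if origins and None not in origins and len(set(origins)) == 1:
        return "cis"  # every variant unambiguously from the same parent
    return "unknown"


# Invented trio genotypes: variant 1 only in the mother, variant 2 only in
# the father, so the proband is a compound heterozygote.
origins = [parental_origin(True, False), parental_origin(False, True)]
print(classify_transmission(origins))
```

      Note that a single ambiguous variant does not block a trans call when the other variants already show inheritance from both parents, matching the "at least 1 variant from each parent" wording.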

      Results

      The number of genes in each panel is shown in Table 1 along with the total length of the genes in each list. Notably, the largest panel includes >7.3% of the exome. Supplemental Figure 1 shows an UpSet plot of the overlap between genes on each panel. Figure 1 shows the number of variants prioritized in each individual in UKB compared with the number of genes in the gene panel. As anticipated, panels containing more genes resulted in more variants per individual being prioritized. The mean number of variants prioritized per individual ranged from 0.3 in the cancer and cardiac panels to 3.5 in the DD panel (Table 1).
      Table 1. Number of genes in each gene panel and mean and range of variants prioritized per person for each gene list
      Panel | Number of Genes | Biallelic Genes | Monoallelic Genes | Total Length (bp) | Proportion of Exome (%) | Mean (Minimum-Maximum) Number of Rare Variants Prioritized in UKB | Number of Genes With Zero Individuals Prioritized in UKB
      Cancer | 91 | 37 | 62 | 238,608 | 0.4 | 0.3 (0-8) | 0
      Cardiac | 49 | 7 | 43 | 229,764 | 0.4 | 0.3 (0-8) | 0
      DD | 1708 | 1130 | 644 | 4,323,149 | 7.3 | 3.5 (0-37) | 57
      Eye | 536 | 408 | 178 | 1,343,235 | 2.3 | 1.3 (0-24) | 16
      Skin | 293 | 189 | 128 | 750,673 | 1.3 | 0.9 (0-14) | 11
      bp, base pair; DD, developmental disorders; UKB, UK Biobank.
      Figure 1. Number of rare nonsynonymous variants prioritized by the Variant Effect Predictor Gene2Phenotype plugin in each person in UK Biobank for each gene panel vs number of genes in panel. DD, developmental disorder.
      Supplemental Figure 2 shows histograms of the number of individuals who had variants prioritized within each gene for each panel. The distribution appears similar across all gene panels tested, suggesting that most variants prioritized represent random noise rather than pathogenic variants. To examine the effect of overlapping common disease within the UKB, we compared these distributions for the cancer panel within cancer cases and controls separately (Supplemental Figure 3). We found no evidence for a difference in distribution between cases and controls (P = .903), presumably because of the predominance of sporadic disease in this cohort.
      For monoallelic disease genes across all panels, the number of individuals with prioritized variants in each gene increased linearly with the length of the canonical transcript of the gene (Figure 2 and Supplemental Figure 4); each additional 1000 base pairs was associated with an average of 295.8 additional individuals with prioritized variants. For biallelic disease genes, the number of individuals with prioritized variants scaled quadratically with the coding sequence length (Figure 2). For both monoallelic and biallelic genes, there were notable outliers, defined as genes >3 SDs from the regression line; we also highlighted well-known large genes that resulted in a disproportionately large number of prioritized variants (such as NEB, SYNE1, and TTN; Table 2 and Supplemental Table 6).
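      The outlier definition (>3 SDs from the regression line) can be sketched with NumPy as below. The SD here is the sample SD of the residuals from the same fit, which an outlier itself inflates; the text does not say exactly how the SD was computed, so treat that choice, the function name, and the invented data as assumptions.

```python
import numpy as np


def flag_outlier_genes(length_bp, n_individuals, degree=1, n_sd=3.0):
    """Boolean mask of genes lying > n_sd SDs from the fitted trend line.

    degree=1 gives the linear (monoallelic) fit and degree=2 the quadratic
    (biallelic) fit; the SD is the sample SD of the fit's residuals.
    """
    x = np.asarray(length_bp, dtype=float)
    y = np.asarray(n_individuals, dtype=float)
    coeffs = np.polyfit(x, y, deg=degree)
    residuals = y - np.polyval(coeffs, x)
    return np.abs(residuals) > n_sd * residuals.std(ddof=1)


# Invented example: 21 genes exactly on a 0.3 individuals/bp trend, except
# the middle gene, which has 2000 extra individuals.
lengths = np.arange(1000, 22000, 1000)
counts = 0.3 * lengths
counts[10] += 2000
print(np.flatnonzero(flag_outlier_genes(lengths, counts)))
```

      Only the middle gene is flagged: its residual is about 4.4 residual SDs, whereas every other gene sits about 0.2 SDs below the refitted line.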
      Figure 2. Coding sequence length of gene vs number of individuals with at least 1 variant (monoallelic) or 2 variants (biallelic) in the gene. Lines of best fit are linear for monoallelic and quadratic for biallelic genes. Genes >3 SDs from the regression line are highlighted in blue. The x-axis for monoallelic genes has been truncated to increase resolution. The full figure can be found in .
      Table 2. Notable genes across all panels
      Gene | Individuals | Length (bp) | Panel(s) | Type | Reason
      RP1L1 | 7974 | 7203 | Eye | Monoallelic | Outlier (high)
      PLEC | 16227 | 13644 | Skin | Monoallelic | Outlier (high)
      COL7A1 | 6554 | 8835 | Skin | Monoallelic | Outlier (high)
      HYDIN | 2656 | 15366 | DD | Biallelic | Outlier (high)
      HSPG2 | 2191 | 13176 | DD | Biallelic | Outlier (high)
      PIEZO1 | 2653 | 7566 | DD | Biallelic | Outlier (high)
      MTHFR | 1199 | 1971 | DD | Biallelic | Outlier (high)
      ERCC6 | 1253 | 4482 | DD, eye, skin | Biallelic | Outlier (high)
      HR | 1327 | 3570 | DD, skin | Biallelic | Outlier (high)
      HSPG2 | 2191 | 13176 | Eye | Biallelic | Outlier (high)
      ZNF469 | 2246 | 11862 | Eye | Biallelic | Outlier (high)
      RP1L1 | 1519 | 7203 | Eye | Biallelic | Outlier (high)
      BFSP1 | 1524 | 1998 | Eye | Biallelic | Outlier (high)
      PLEC | 4422 | 13644 | Skin | Biallelic | Outlier (high)
      KIAA1109 | 216 | 15282 | DD | Biallelic | Outlier (low)
      HERC1 | 198 | 14586 | DD | Biallelic | Outlier (low)
      TTN | 30308 | 107976 | Cardiac | Monoallelic | Gene size
      MACF1 | 5823 | 22668 | DD | Monoallelic | Gene size
      SYNE1 | 2970 | 26394 | DD | Biallelic | Gene size
      NEB | 2731 | 25578 | DD | Biallelic | Gene size
      Notable genes across all panels, highlighted either because they lie >3 SDs outside of the trendline or are very large (coding length of canonical transcript >20,000 bp) resulting in large numbers of variants being prioritized.
      bp, base pair; DD, developmental disorders.
      Comparing individuals of European ancestry with those of non-European ancestry within UKB, we found that individuals of non-European ancestry had more variants prioritized on average than individuals of European ancestry (5.4 vs 3.3). Most variants prioritized by G2P-VEP were missense variants, with an average of 3.4 missense variants prioritized per individual compared with just 0.1 predicted loss-of-function (LoF) variants (including stop gain, splice acceptor, splice donor, and frameshift variants; Table 3). We further evaluated the effect of using population-specific allele frequencies (European, African, and South Asian) from gnomAD for variant prioritization. Although applying population-specific frequencies in addition to total allele frequency inevitably reduced the number of prioritized variants, individuals of African ancestry still had significantly more prioritized variants than individuals of South Asian ancestry, who in turn had more prioritized variants than individuals of European ancestry (Supplemental Figure 5).
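      Applying population-specific frequencies in addition to the total frequency can be read as requiring a variant to fall below the threshold both overall and in each relevant population. The sketch below assumes that interpretation; the population labels, the dictionary input, and the function name are hypothetical and are not gnomAD field names or the study's code.

```python
def passes_frequency_filter(total_af, population_afs, populations, threshold):
    """Keep a variant only if it is rare overall and in every listed population.

    `population_afs` maps a population label (eg, "EUR", "AFR", "SAS") to an
    allele frequency. Frequencies of None (variant absent from gnomAD) are
    treated as 0.0, which is an assumption of this sketch.
    """
    frequencies = [total_af] + [population_afs.get(p) for p in populations]
    return all((af or 0.0) < threshold for af in frequencies)


# Invented frequencies: rare overall but common in the matched population,
# so the variant is filtered out under the monoallelic threshold.
print(passes_frequency_filter(0.00005, {"AFR": 0.002}, ["AFR"], 0.0001))
```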
      Table 3. Mean, SD, median, and range (minimum-maximum) of the number of variants prioritized per individual using the DD panel, for subsets of the DDD cohort compared with UKB
      Cohort | Mean | SD | Median | Minimum-Maximum | N
      UKB | 3.5 | 2.3 | 3 | (0-37) | 200643
      UKB EUR | 3.3 | 2.1 | 3 | (0-16) | 184477
      UKB non-EUR | 5.4 | 3.5 | 5 | (0-37) | 16155
      UKB LoF variants | 0.1 | 0.3 | 0 | (0-5) | 200643
      UKB missense variants | 3.4 | 2.3 | 3 | (0-36) | 200643
      UKB monoallelic genes | 2.6 | 1.7 | 2 | (0-21) | 200643
      UKB biallelic genes | 0.9 | 1.5 | 0 | (0-26) | 200643
      DDD parents (unaffected) | 4.9 | 3.7 | 4 | (0-163) | 17077
      DDD probands | 5.5 | 3.5 | 5 | (0-104) | 9859
      DDD probands EUR | 5.0 | 3.0 | 5 | (0-104) | 8522
      DDD probands non-EUR | 8.6 | 4.5 | 8 | (0-39) | 1337
      DDD probands LoF variants | 0.4 | 0.8 | 0 | (0-9) | 9859
      DDD probands missense variants | 4.5 | 3.2 | 4 | (0-102) | 9859
      DDD probands monoallelic genes | 3.5 | 2.2 | 3 | (0-78) | 9859
      DDD probands biallelic genes | 2.1 | 2.5 | 2 | (0-29) | 9859
      DDD (variants in cis in biallelic genes) | 1.1 | 1.4 | 0 | (0-13) | 9859
      DDD (variants in trans in biallelic genes) | 0.6 | 1.1 | 0 | (0-7) | 9859
      DD, developmental disorders; DDD, Deciphering Developmental Disorders; EUR, European; LoF, loss of function; UKB, UK Biobank.
      To investigate the potential for using additional evidence to reduce the number of prioritized missense variants, we separated the prioritized variants on the basis of REVEL score quintile or ClinVar pathogenicity classification (Figure 3). Substantially more variants were prioritized with REVEL scores in the lowest quintile than in the highest quintile, suggesting that the number of variants highlighted by the pipeline can be reduced by using REVEL or other appropriate pathogenicity predictors [Gunning A.C., Fryer V., Fasham J., et al. Assessing performance of pathogenicity predictors using clinically relevant variant datasets]. Examining the ClinVar classifications of the prioritized variants showed that more were classified in ClinVar as benign/likely benign than as pathogenic/likely pathogenic, highlighting the benefit of using clinical variant databases in variant prioritization.
      Figure 3. Scatter plot of length of gene vs number of individuals with variants identified in that gene, split by REVEL quintile (left) or ClinVar classification (right). Individuals are classified according to the REVEL or ClinVar classification of the variant with the most severe consequence (monoallelic, top) or the less severe consequence of the 2 highest consequence variants (biallelic, bottom). The x-axis for monoallelic genes has been truncated to increase resolution. REVEL, rare exome variant ensemble learner.
      Using just the largest panel (DDs), substantially more variants were prioritized per individual for probands and parents in the DDD cohort than for individuals in UKB (Table 3). This difference is likely to be due to the different study designs of UKB (a cross-sectional population cohort) and the DDD study (a disease cohort), which result in opposite recruitment biases, although it may also be due in part to technical factors, such as differences in the exome sequencing pipelines or reference genome builds between the 2 cohorts. Within the DDD cohort, probands had more variants on average (5.5) than the unaffected parents (4.9); the former likely reflects the high rate of pathogenic de novo variants in the cohort, and the latter likely reflects incomplete penetrance in dominant conditions. As in UKB, individuals with genetically determined non-European ancestry had more prioritized variants than those with European ancestry (Table 3). Using parental information to determine allelic transmission for biallelic genes in the DDD cohort (ie, to separate individuals in whom the prioritized variants within a gene all appeared in cis from those in whom at least 2 variants were in trans), we found that there were on average 0.3 biallelic disease genes per individual with variants in trans compared with 0.5 with variants in cis (Supplemental Figure 6).

      Discussion

      This study aimed to estimate the diagnostic noise in panel-based genomic analysis by quantifying the normal background rate of rare nonsynonymous variants in monogenic disease genes as a function of the genomic footprint of the panel. We used G2P-VEP to estimate the number of potentially pathogenic variants prioritized in a clinically unselected population of individuals from UKB. We found that, on average, for every 100 genes in a gene panel, approximately 0.2 rare predicted LoF and missense variants were prioritized in each individual, increasing linearly with the number of genes in the panel. This finding suggests that for larger gene panels (>500 genes), almost every individual tested will have at least 1 rare nonsynonymous candidate diagnostic variant that is likely to be benign, ie, a variant that does not cause disease in that individual. Restricting missense variants to just ClinVar pathogenic/likely pathogenic or a REVEL score of >0.7 (in keeping with variant interpretation guidelines from the UK Association for Clinical Genomic Science [Ellard S, Baple EL, Callaway A, et al. ACGS best practice guidelines for variant classification in rare disease 2020. Published February 4, 2020. https://www.acgs.uk.com/media/11631/uk-practice-guidelines-for-variant-classification-v4-01-2020.pdf]) substantially reduces the number of background candidate variants but still results in approximately 0.003 variants prioritized per individual per 100 genes in the panel. We make no recommendation about the effectiveness of such a strategy in a diagnostic setting, because this approach is likely to filter out some genuinely pathogenic variants; nonetheless, even using these very conservative thresholds, we estimate that panels including all monogenic disease genes (∼5000 genes; the so-called Mendeliome [Pengelly R.J., Ward D., Hunt D., Mattocks C., Ennis S. Comparison of Mendeliome exome capture kits for use in clinical diagnostics]) would prioritize a benign variant in around 1 in every 6 individuals tested.
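      The extrapolations in this paragraph are proportional arithmetic. The sketch below reproduces them from the Table 1 panel sizes and mean numbers of prioritized variants, assuming, as the text does, that the background rate scales linearly with the number of genes; the function names are hypothetical. The per-panel ratios computed this way are not identical (roughly 0.2 to 0.6 per 100 genes), so the single 0.2 figure should be treated as approximate, and 0.15 expected variants per individual corresponds to roughly 1 in 6 to 7 individuals.

```python
# (number of genes, mean prioritized variants per UKB participant), Table 1.
TABLE_1 = {
    "Cancer": (91, 0.3),
    "Cardiac": (49, 0.3),
    "DD": (1708, 3.5),
    "Eye": (536, 1.3),
    "Skin": (293, 0.9),
}


def rate_per_100_genes(n_genes, mean_variants):
    """Mean prioritized variants per individual per 100 panel genes."""
    return 100 * mean_variants / n_genes


def expected_per_individual(n_genes, rate_per_100):
    """Expected background variants per individual for a panel of n_genes,
    assuming the rate scales linearly with the number of genes."""
    return rate_per_100 * n_genes / 100


for panel, (genes, mean) in TABLE_1.items():
    print(f"{panel}: {rate_per_100_genes(genes, mean):.2f} per 100 genes")

# ~0.2 per 100 genes on a 500-gene panel, and the conservative
# ~0.003 per 100 genes on a ~5000-gene Mendeliome.
print(expected_per_individual(500, 0.2))
mendeliome = expected_per_individual(5000, 0.003)
print(f"{mendeliome:.2f} per individual, about 1 in {1 / mendeliome:.1f}")
```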
      Comparing the number of individuals with variants prioritized within each gene across all panels revealed that this number increased linearly with coding sequence length for monoallelic genes and quadratically for biallelic genes. We identified a number of genes that fell outside these trends (Figure 2 and Table 2). Outlying genes with more variants prioritized than expected on the basis of their size included highly variable genes (eg, PLEC) and genes with highly homologous regions that are known to be difficult to align with short-read sequencing data (eg, HYDIN and RP1L1). Two highly constrained biallelic genes (KIAA1109 and HERC1) had substantially fewer variants than expected. Candidate variants in genes with more identified individuals than expected warrant caution, even when found in affected individuals, because these genes harbor a large proportion of benign variants. Conversely, biallelic variants in KIAA1109 or HERC1 may be more likely to be true diagnoses in affected individuals because they are found at a lower rate in the general population.
      We believe our findings are highly relevant to clinical variant filtering after targeted panel, exome, or genome sequencing for heterogeneous monogenic diseases and before diagnostic variant classification [Richards S., Aziz N., Bale S., et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology; Ellard S, Baple EL, Callaway A, et al. ACGS best practice guidelines for variant classification in rare disease 2020. Published February 4, 2020. https://www.acgs.uk.com/media/11631/uk-practice-guidelines-for-variant-classification-v4-01-2020.pdf]. Although the exact details of variant filtering for rare disease diagnosis differ between laboratories, some standard practices are universally accepted and shared by most pipelines, as well as by the G2P-VEP plugin. First, individual variants must be rare, with allele frequency thresholds that depend on the mechanism of disease; the thresholds used in the G2P-VEP plugin (<0.0001 for monoallelic and <0.005 for biallelic genes) are very conservative because they were developed for use in DDs, in which causal variants are likely to be depleted in the population. Although variants present at high frequency in subpopulations of reference data sets may be further filtered out, every individual in gnomAD currently has >200 rare (minor allele frequency <0.1%) coding variants [Gudmundsson S, Singer-Berk M, Watts NA, et al. Variant interpretation using population databases: lessons from gnomAD. Hum Mutat. 2021:10.1002/humu.24309. doi:10.1002/humu.24309], and many of the variants prioritized in this study were not present in reference data sets. Second, variants must have an interpretable functional effect, which in practice usually means that nonsynonymous coding variants are selected for inclusion. Third, although a single variant is sufficient to cause monoallelic disease, 2 variants in trans in the same person are required to cause biallelic disease; in the absence of inheritance or long-read sequence data, the phase of candidate biallelic variants is usually unknown, and double heterozygous carriers are therefore prioritized. These rules are all requirements of the G2P-VEP plugin. In addition, many pipelines use pathogenicity prediction scores (such as REVEL) or presence in clinical databases (such as ClinVar) as a further filtering step, which we have also replicated in this study. The strengths of our analyses include the size of UKB and the population-based nature of the cohort: the majority of participants are healthy and unaffected by the diseases represented by the panels tested here. A sensitivity analysis comparing cancer cases with controls suggests that the presence of undiagnosed cases in the cohort is unlikely to have a large effect on the results presented in this article, although the presence of some relevant monogenic disease cases in the cohort [de Marvao A., McGurk K.A., Zheng S.L., et al. Phenotypic expression and outcomes in individuals with rare genetic variants of hypertrophic cardiomyopathy; Kingdom R., Tuke M., Wood A., et al. Rare genetic variants in genes and loci linked to dominant monogenic developmental disorders cause milder related phenotypes in the general population] remains a limitation of our approach.
      In a clinical testing setting, genes and gene panels would ideally be selected by clinicians with a deep knowledge of genotype–phenotype correlations, thus ensuring the appropriateness of the test for individual patients. Moreover, beyond automated pipelines, prioritized variants will be reviewed by variant interpretation scientists, who will decide which variants to report. Finally, multidisciplinary team meetings may be used to discuss candidate variants, potentially gather additional evidence (such as phase, segregation, or functional data), and reach a final variant classification. Therefore, many of the variants prioritized in our analysis would be excluded from the final report. However, the more variants that are prioritized, whether through the choice of panel test ordered or through the specific variant filters applied, the more variants will require evaluation by these specialists, and potentially the more variants will be classified as being of uncertain significance, which has major resource implications for diagnostic services. In many cases, this is unavoidable because of the highly heterogeneous nature of some diseases (such as DDs), making it imperative that clinicians and clinical scientists are aware of the magnitude of the problem.
      The growing interest in newborn sequencing, together with differences in the genes included in the panels used (DeCristo et al., 2021), means that estimating the background distribution of likely benign variants identified by panel-based genomic analysis is essential both for appropriate gene selection and for interpretation of the results. Although panel-based analysis is an efficient way of identifying possibly pathogenic variants in known monogenic disease genes, many of the variants prioritized by such approaches will not be pathogenic, and as the genomic footprint of the panel increases, so too will the number of prioritized variants. As with any test, there is a trade-off between sensitivity and specificity: in the case of rare diseases, including more genes on a panel, or using less strict variant filtering rules, will result in more diagnoses (higher sensitivity) but will also prioritize more benign variants (lower specificity) and produce a greater variant classification burden. Owing to the bias toward Europeans in existing cohorts, these issues are more pronounced for individuals of non-European genetic ancestry, in whom a greater number of variants will be prioritized owing to inaccurate allele frequency annotations. This highlights the need for greater ethnic diversity among sequenced populations to improve diagnostic accuracy (Popejoy et al., 2018).
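      The ancestry effect can be illustrated with a toy rare-variant filter. Everything in this sketch is hypothetical: the rarity threshold of 0.0001, the population labels, the variant names, and the allele frequencies are invented for illustration and are not the filtering rules applied by the Gene2Phenotype VEP plugin in this study. It contrasts filtering on the frequency from a single, European-dominated reference population with filtering on the highest frequency observed across all populations with data (often called "popmax").

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical rarity threshold, for illustration only.
MAX_AF = 1e-4


@dataclass
class Variant:
    name: str
    # Allele frequency per reference population; a missing key means no data.
    af_by_population: dict[str, float] = field(default_factory=dict)


def passes_filter_single_pop(v: Variant, pop: str = "EUR") -> bool:
    """Keep the variant if it is rare (or unobserved) in one reference population."""
    return v.af_by_population.get(pop, 0.0) <= MAX_AF


def passes_filter_popmax(v: Variant) -> bool:
    """Keep the variant only if it is rare in every population that has data."""
    popmax = max(v.af_by_population.values(), default=0.0)
    return popmax <= MAX_AF


variants = [
    Variant("var_A", {"EUR": 0.0, "AFR": 0.02}),         # common in AFR, unobserved in EUR
    Variant("var_B", {"EUR": 0.00005, "AFR": 0.00002}),  # rare everywhere
    Variant("var_C", {"EUR": 0.01}),                     # common in EUR
]

for v in variants:
    print(v.name, passes_filter_single_pop(v), passes_filter_popmax(v))

# Prints:
# var_A True False
# var_B True True
# var_C False False
```

      The single-population filter wrongly prioritizes var_A. The popmax filter can exclude it only because a frequency for the relevant population exists; if that population had not been sequenced at all, both filters would keep the variant, which is why more diverse reference data, rather than a different filter alone, is needed.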

      Conclusion

      Although large gene panels may be the best strategy to maximize diagnostic yield in genetically heterogeneous diseases, they will frequently prioritize benign candidate variants that may require additional clinical follow-up. Our findings highlight the value of careful phenotypic selection of individuals for panel testing to improve clinical specificity, and support the need for sequence data from more ethnically diverse groups to improve variant filtering in non-European individuals. Overall, we suggest that extreme caution should be applied when interpreting candidate pathogenic variants from large gene panel analyses, particularly in the absence of relevant phenotypes.
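      One way to see why phenotypic selection matters is Bayes' theorem for the positive predictive value (PPV). The minimal sketch below uses a deliberately coarse person-level model in which a "positive" result means at least one prioritized variant. The prior is the probability that a tested individual actually has a disease covered by the panel; the sensitivity and false-positive rate of the prioritization step are hypothetical values chosen for illustration, not estimates from this study.

```python
def positive_predictive_value(prior: float, sensitivity: float, false_positive_rate: float) -> float:
    """P(affected | >=1 prioritized variant), by Bayes' theorem."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Hypothetical values: the same test characteristics are applied to both groups.
SENSITIVITY = 0.4          # P(>=1 prioritized variant | affected)
FALSE_POSITIVE_RATE = 0.5  # P(>=1 prioritized variant | unaffected)

for label, prior in [("unselected population", 0.001), ("phenotype-matched patients", 0.3)]:
    ppv = positive_predictive_value(prior, SENSITIVITY, FALSE_POSITIVE_RATE)
    print(f"{label}: PPV = {ppv:.4f}")

# Prints:
# unselected population: PPV = 0.0008
# phenotype-matched patients: PPV = 0.2553
```

      With identical test characteristics, only the prior changes, yet the PPV rises by more than two orders of magnitude. This is consistent with the recommendation to interpret candidate variants with particular caution in the absence of relevant phenotypes.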

      Data Availability

      The UK Biobank data are available from https://www.ukbiobank.ac.uk on application. The Deciphering Developmental Disorders data are available on application through the European Genome-Phenome Archive, see https://ega-archive.org/studies/EGAS00001000775. Gene panels can be downloaded from https://www.ebi.ac.uk/gene2phenotype/.

      Conflict of Interest

      The authors have no conflicts of interest.

      Acknowledgments

      The authors wish to thank Andrew Wood for assistance with the UK Biobank exome sequencing data, James Ware for providing the cardiac gene panel, and Helen Firth for helpful suggestions. This work was supported by the Medical Research Council (MR/T00200X/1) and the Wellcome Trust (200990/A/16/Z) and was conducted using the UK Biobank resource under application number 49847. The Deciphering Developmental Disorders study presents independent research commissioned by the Health Innovation Challenge Fund (grant number HICF-1009-003), a parallel funding partnership between the Wellcome Trust and the Department of Health, and the Wellcome Trust Sanger Institute (grant number WT098051); see Deciphering Developmental Disorders Study (2015) or www.ddduk.org/access.html for full acknowledgment. The authors would like to acknowledge the use of the University of Exeter High-Performance Computing (HPC) facility in carrying out this work. The views expressed in this work are those of the authors and not necessarily those of the funders. For the purpose of open access, the author has applied a CC BY public copyright license to any author accepted manuscript version arising from this submission.

      Author Information

      Conceptualization: C.F.W.; Data Curation: R.N.B.; Formal Analysis: R.N.B.; Software: R.N.B.; Visualization: R.N.B.; Investigation: R.N.B.; Methodology: R.N.B., C.F.W.; Project Administration: C.F.W.; Supervision: C.F.W.; Funding Acquisition: C.F.W.; Writing-original draft: R.N.B., C.F.W.; Writing-review and editing: R.N.B., C.F.W.

      Ethics Declaration

      The UK Biobank has Research Tissue Bank approval from the North West Multi-centre Research Ethics Committee (21/NW/0157). The Deciphering Developmental Disorders study has UK Research Ethics Committee approval (10/H0305/83, granted by Cambridge South Research Ethics Committee, and GEN/284/12, granted by the Republic of Ireland Research Ethics Committee). All individuals included in this study gave appropriate consent.

      References

        • Martin A.R.
        • Williams E.
        • Foulger R.E.
        • et al.
        PanelApp crowdsources expert knowledge to establish consensus diagnostic gene panels.
        Nat Genet. 2019; 51: 1560-1565. https://doi.org/10.1038/s41588-019-0528-2
        • Beauchamp K.A.
        • Muzzey D.
        • Wong K.K.
        • et al.
        Systematic design and comparison of expanded carrier screening panels.
        Genet Med. 2018; 20: 55-63. https://doi.org/10.1038/gim.2017.69
        • Saudi Mendeliome Group
        Comprehensive gene panels provide advantages over clinical exome sequencing for Mendelian diseases.
        Genome Biol. 2015; 16: 134. (Published correction appears in Genome Biol. 2015;16:226.)
        • DiStefano M.T.
        • Goehringer S.
        • Babb L.
        • et al.
        The Gene Curation Coalition: a global effort to harmonize gene-disease evidence resources.
        Genet Med. 2022. https://doi.org/10.1016/j.gim.2022.04.017
        • Alfares A.A.
        • Kelly M.A.
        • McDermott G.
        • et al.
        Results of clinical genetic testing of 2,912 probands with hypertrophic cardiomyopathy: expanded panels offer limited additional sensitivity.
        Genet Med. 2015; 17: 880-888. (Published correction appears in Genet Med. 2015;17(4):319.)
        • Hosseini S.M.
        • Kim R.
        • Udupa S.
        • et al.
        Reappraisal of reported genes for sudden arrhythmic death: evidence-based evaluation of gene validity for Brugada syndrome.
        Circulation. 2018; 138: 1195-1205. https://doi.org/10.1161/CIRCULATIONAHA.118.035070
        • Chambers C.
        • Jansen L.A.
        • Dhamija R.
        Review of commercially available epilepsy genetic panels.
        J Genet Couns. 2016; 25: 213-217. https://doi.org/10.1007/s10897-015-9906-9
        • Fry A.
        • Littlejohns T.J.
        • Sudlow C.
        • et al.
        Comparison of sociodemographic and health-related characteristics of UK Biobank participants with those of the general population.
        Am J Epidemiol. 2017; 186: 1026-1034. https://doi.org/10.1093/aje/kwx246
        • Batty G.D.
        • Gale C.R.
        • Kivimäki M.
        • Deary I.J.
        • Bell S.
        Comparison of risk factor associations in UK Biobank against representative, general population based studies with conventional response rates: prospective cohort study and individual participant meta-analysis.
        BMJ. 2020; 368: m131. https://doi.org/10.1136/bmj.m131
        • Deciphering Developmental Disorders Study
        Prevalence and architecture of de novo mutations in developmental disorders.
        Nature. 2017; 542: 433-438. https://doi.org/10.1038/nature21062
        • Deciphering Developmental Disorders Study
        Large-scale discovery of novel genetic causes of developmental disorders.
        Nature. 2015; 519: 223-228. https://doi.org/10.1038/nature14135
        • Thormann A.
        • Halachev M.
        • McLaren W.
        • et al.
        Flexible and scalable diagnostic filtering of genomic variants using G2P with Ensembl VEP.
        Nat Commun. 2019; 10: 2373. https://doi.org/10.1038/s41467-019-10016-3
        • McLaren W.
        • Gil L.
        • Hunt S.E.
        • et al.
        The Ensembl Variant Effect Predictor.
        Genome Biol. 2016; 17: 122. https://doi.org/10.1186/s13059-016-0974-4
        • Karczewski K.J.
        • Francioli L.C.
        • Tiao G.
        • et al.
        The mutational constraint spectrum quantified from variation in 141,456 humans.
        Nature. 2020; 581: 434-443. (Published corrections appear in Nature. 2021;590(7846):E53 and Nature. 2021;597(7874):E3–E4.)
        • Van Hout C.V.
        • Tachmazidou I.
        • Backman J.D.
        • et al.
        Exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank.
        Nature. 2020; 586: 749-756. https://doi.org/10.1038/s41586-020-2853-0
        • Richards S.
        • Aziz N.
        • Bale S.
        • et al.
        Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.
        Genet Med. 2015; 17: 405-424. https://doi.org/10.1038/gim.2015.30
        • Ioannidis N.M.
        • Rothstein J.H.
        • Pejaver V.
        • et al.
        REVEL: an ensemble method for predicting the pathogenicity of rare missense variants.
        Am J Hum Genet. 2016; 99: 877-885. https://doi.org/10.1016/j.ajhg.2016.08.016
        • Landrum M.J.
        • Lee J.M.
        • Riley G.R.
        • et al.
        ClinVar: public archive of relationships among sequence variation and human phenotype.
        Nucleic Acids Res. 2014; 42: D980-D985. https://doi.org/10.1093/nar/gkt1113
        • Martin H.C.
        • Jones W.D.
        • McIntyre R.
        • et al.
        Quantifying the contribution of recessive coding variation to developmental disorders.
        Science. 2018; 362: 1161-1164. https://doi.org/10.1126/science.aar6731
        • Gunning A.C.
        • Fryer V.
        • Fasham J.
        • et al.
        Assessing performance of pathogenicity predictors using clinically relevant variant datasets.
        J Med Genet. 2021; 58: 547-555. https://doi.org/10.1136/jmedgenet-2020-107003
        • Ellard S.
        • Baple E.L.
        • Callaway A.
        • et al.
        ACGS best practice guidelines for variant classification in rare disease 2020.
        Published February 4, 2020. https://www.acgs.uk.com/media/11631/uk-practice-guidelines-for-variant-classification-v4-01-2020.pdf
        • Pengelly R.J.
        • Ward D.
        • Hunt D.
        • Mattocks C.
        • Ennis S.
        Comparison of Mendeliome exome capture kits for use in clinical diagnostics.
        Sci Rep. 2020; 10: 3235. https://doi.org/10.1038/s41598-020-60215-y
        • Gudmundsson S.
        • Singer-Berk M.
        • Watts N.A.
        • et al.
        Variant interpretation using population databases: lessons from gnomAD.
        Hum Mutat. 2021. https://doi.org/10.1002/humu.24309
        • de Marvao A.
        • McGurk K.A.
        • Zheng S.L.
        • et al.
        Phenotypic expression and outcomes in individuals with rare genetic variants of hypertrophic cardiomyopathy.
        J Am Coll Cardiol. 2021; 78: 1097-1110. https://doi.org/10.1016/j.jacc.2021.07.017
        • Kingdom R.
        • Tuke M.
        • Wood A.
        • et al.
        Rare genetic variants in genes and loci linked to dominant monogenic developmental disorders cause milder related phenotypes in the general population.
        Am J Hum Genet. 2022. https://doi.org/10.1016/j.ajhg.2022.05.011
        • DeCristo D.M.
        • Milko L.V.
        • O’Daniel J.M.
        • et al.
        Actionability of commercial laboratory sequencing panels for newborn screening and the importance of transparency for parental decision-making.
        Genome Med. 2021; 13: 50. https://doi.org/10.1186/s13073-021-00867-1
        • Popejoy A.B.
        • Ritter D.I.
        • Crooks K.
        • et al.
        The clinical imperative for inclusivity: race, ethnicity, and ancestry (REA) in genomics.
        Hum Mutat. 2018; 39: 1713-1720. https://doi.org/10.1002/humu.23644