Division of Infection, Immunity and Respiratory Medicine, University of Manchester, Manchester, United Kingdom; Prevention and Early Detection Theme, NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom
Prevention and Early Detection Theme, NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom; Division of Evolution, Infection and Genomics, The University of Manchester, Manchester, United Kingdom
Prevention and Early Detection Theme, NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom; Division of Cancer Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom; Department of Obstetrics & Gynaecology, St Mary’s Hospital, Manchester University NHS Foundation Trust, Manchester, United Kingdom
Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, Faculty of Biology, Medicine and Health, Manchester Academic Health Science Centre, The University of Manchester, Manchester, United Kingdom; NIHR Manchester Musculoskeletal Biomedical Research Unit, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, United Kingdom
Prevention and Early Detection Theme, NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom; Division of Evolution, Infection and Genomics, The University of Manchester, Manchester, United Kingdom
Correspondence and requests for materials should be addressed to Philip A.J. Crosbie, Manchester Thoracic Oncology Centre, North West Lung Centre, Manchester University NHS Foundation Trust, Southmoor Rd, Wythenshawe, Manchester M23 9LT, United Kingdom.
Division of Infection, Immunity and Respiratory Medicine, University of Manchester, Manchester, United Kingdom; Prevention and Early Detection Theme, NIHR Manchester Biomedical Research Centre, Manchester, United Kingdom; Manchester Thoracic Oncology Centre, Wythenshawe Hospital, Manchester University NHS Foundation Trust, Manchester Academic Health Science Centre, Manchester, United Kingdom
Screening with low-dose computed tomography reduces lung cancer (LC) mortality. Risk prediction models used for screening selection do not include genetic variables. Here, we investigated the performance of previously published polygenic risk scores (PRSs) for LC, considering their potential to improve screening selection.
Methods
We validated 9 PRSs in a high-risk case-control cohort comprising genotype data from 652 surgical patients with LC and 550 cancer-free, high-risk (PLCOM2012 score ≥ 1.51%) participants of the Manchester Lung Health Check, a community-based LC screening program. Discrimination (area under the curve [AUC]) between cases and controls was assessed for each PRS independently and alongside clinical risk factors.
Results
Median age was 67 years, 53% were female, 46% were current smokers, and 76% were National Lung Screening Trial eligible. Median PLCOM2012 score among controls was 3.4%, and 80% of cases were early stage. All PRSs significantly improved discrimination: AUC increased by between +0.002 (P = .02) and +0.015 (P < .0001) compared with clinical risk factors alone. The best-performing PRS had an independent AUC of 0.59. Two novel loci, in the DAPK1 and MAGI2 genes, were significantly associated with LC risk.
Conclusion
PRSs may improve LC risk prediction and screening selection. Further research, particularly examining clinical utility and cost-effectiveness, is required.
Early-stage LC is largely asymptomatic, leading to late-stage clinical presentation and poor survival. Two large, randomized controlled trials (RCTs) demonstrated that low-dose computed tomography (LDCT) screening significantly reduces LC mortality.
Variables considered by risk prediction models (RPMs) always include age and smoking exposure, as well as a selection of subsidiary risk factors. National Health Service (NHS) England’s Targeted Lung Health Check (LHC) program mandates the use of 2 validated RPMs, PLCOM2012 and LLPV2, for screening selection.
In the past decade, several large genome-wide association studies (GWAS) have implicated dozens of single-nucleotide variations (SNVs; formerly single-nucleotide polymorphisms [SNPs]) in LC susceptibility.
These low-penetrance SNVs can be integrated into a polygenic risk score (PRS), a tool that predicts an individual’s genetic risk of developing LC. Although several PRSs for LC have been developed, measures of genetic risk are not used clinically.
Research is needed to investigate whether PRSs improve risk prediction and screening selection in high-risk populations that are likely to be targeted for LC screening.
This study aimed to validate several previously published LC PRS tools in a high-risk case-control cohort. Case samples were sourced from patients who were undergoing surgery for histologically confirmed non–small cell lung cancer (NSCLC). Control samples were sourced from high-risk attendees of the Manchester LHC pilot, a community-based LC screening program.
Previous studies often developed and validated PRSs in biobank or RCT settings, which may not be wholly representative of real-world screening populations.
In contrast, our study cohort was highly representative of community-based LC screening participants.
Materials and Methods
Controls were attendees of the second round of the Manchester LHC pilot who were subsequently screen negative. The LHC pilot was an LC screening program that took place in 3 socioeconomically deprived areas of Manchester (Harpurhey, Gorton, and Wythenshawe) in 2016 to 2017.
In the pilot, ever-smokers aged 55 to 74 with a PLCOM2012 risk score of ≥1.51% were offered 2 rounds of annual LDCT screening. Extensive clinical and demographic data were available for all LHC participants. Cases had histologically confirmed NSCLC treated with surgical resection between 2010 and 2018, sourced from the Manchester Cancer Research Centre (MCRC) Biobank. The MCRC Biobank collects samples from 4 NHS trusts, all of which are in the Greater Manchester area.
All LC surgery for patients in Greater Manchester takes place at the Thoracic Surgery Centre at Wythenshawe Hospital. Consequently, there is broad geographical overlap between the case and control cohorts. Clinical data available for cases included age, sex, smoking history, cancer histology, tumor stage, date of surgery, and spirometry. Some variables necessary for calculating PLCOM2012 risk scores were not available in the case cohort. These included previous cancer diagnosis, family history of LC, diagnosis of chronic obstructive pulmonary disease (COPD), ethnicity, and educational qualifications. All blood samples were transferred to –80 °C storage within 48 hours of venesection. Written, informed consent was provided by all participants (REC Ref: 18/NW/0092).
QIAGEN Gentra, PureGene, and FlexiGene kits were used for case DNA extraction, and the QIAGEN QIAamp DNA Blood Midi Kit was used for control DNA extraction. Detailed protocols are available.
Thermo Fisher NanoDrop and Qubit were used for DNA quality control (QC) and quantification. Genotyping was performed on the Illumina iScan with the Infinium OncoArray-500K for high-throughput screening.
Y chromosome and mitochondrial SNVs were excluded, as were symmetric SNVs and those with a missingness rate of >0.02, a minor allele frequency of <0.01, or deviation from Hardy-Weinberg equilibrium (P < 1 × 10⁻⁴). Strand alignment was confirmed by comparison with the haplotype reference panel and 1000 Genomes data sets. Samples with a call rate of <98%, divergent heterozygosity (>3 SDs), or nonconcordant sex were excluded. KING (v.1.9) software was used to assess identity by descent. Unexpected duplicates were excluded, as was 1 member of each pair of first- or second-degree relatives.
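The variant-level exclusions described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: the record layout (`SnvStats`) and function name are hypothetical, the Hardy-Weinberg threshold is read as a P value cutoff, and "symmetric" is taken to mean strand-ambiguous A/T and C/G variants.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; the HWE value is assumed to be a P value cutoff.
MAX_MISSINGNESS = 0.02
MIN_MAF = 0.01
HWE_P_THRESHOLD = 1e-4

# Allele pairs that read the same on both strands (assumed meaning of "symmetric").
SYMMETRIC_PAIRS = {frozenset("AT"), frozenset("CG")}


@dataclass
class SnvStats:
    """Per-variant summary statistics (hypothetical record layout)."""
    snv_id: str
    chrom: str
    allele1: str
    allele2: str
    missingness: float
    maf: float
    hwe_p: float


def passes_qc(snv: SnvStats) -> bool:
    """Return True if the variant survives every exclusion rule."""
    if snv.chrom in {"Y", "MT"}:
        return False  # Y chromosome and mitochondrial variants
    if snv.missingness > MAX_MISSINGNESS:
        return False  # too many missing genotype calls
    if snv.maf < MIN_MAF:
        return False  # minor allele too rare
    if snv.hwe_p < HWE_P_THRESHOLD:
        return False  # deviates from Hardy-Weinberg equilibrium
    if frozenset((snv.allele1, snv.allele2)) in SYMMETRIC_PAIRS:
        return False  # strand-ambiguous variant
    return True
```

In practice these filters are usually applied with dedicated genetics software rather than hand-written code; the sketch only makes the decision rule explicit.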
tool for principal component (PC) analysis (PCA) was used to ascertain genetic ancestry and genetic variation in the cohort. Samples that deviated from the European cluster when compared with the HapMap3 reference data set were excluded (Supplemental Figure 1). Samples from within the European ancestry subgroup were also excluded if they had outlying genetic variation based on a scatter plot of PC1 and PC2 values. The Aberrant (v.1.0)
Haplotype reference panel r.1.1 2016 (GRCh37/hg19) was used as the reference panel, and Eagle v.2.4 was used for phasing. The imputed data set was filtered for duplicate SNVs and SNVs with low imputation confidence (r² < 0.5).
PubMed and The Polygenic Score Catalog were searched for previously developed LC PRSs. Effect alleles listed in the literature were matched with the Manchester cohort by comparing minor allele frequency. The LDproxy tool was used to identify proxy SNVs when data were not available in the study cohort. If no proxy with an R² of >0.5 was available, the SNV was excluded from the PRS. Genetic load was calculated using the --score function in PLINK (v.1.9). Allelic weight was represented by the natural logarithm of the published odds ratios (ORs). Scores were centered on the mean to facilitate comparison between PRSs.
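The scoring scheme described above (effect-allele dosage weighted by the natural logarithm of the published OR, then centered on the cohort mean) can be sketched as follows. This is an illustration only and not PLINK's implementation; the function names, SNV identifiers, and data layout are hypothetical, and the sketch reports a plain weighted sum rather than a per-allele average.

```python
import math


def polygenic_score(dosages, odds_ratios):
    """Weighted sum of effect-allele dosages (0, 1, or 2); weights are ln(OR).

    `dosages` and `odds_ratios` are dicts keyed by SNV identifier.
    """
    return sum(dosages[snv] * math.log(odds_ratio)
               for snv, odds_ratio in odds_ratios.items())


def centered_scores(cohort, odds_ratios):
    """Score every participant, then subtract the cohort mean."""
    raw = {pid: polygenic_score(dosages, odds_ratios)
           for pid, dosages in cohort.items()}
    mean = sum(raw.values()) / len(raw)
    return {pid: score - mean for pid, score in raw.items()}


# Toy example with made-up SNVs and odds ratios.
example_weights = {"snv_a": 1.3, "snv_b": 0.8}
example_cohort = {
    "participant_1": {"snv_a": 2, "snv_b": 0},
    "participant_2": {"snv_a": 0, "snv_b": 1},
}
example_centered = centered_scores(example_cohort, example_weights)
```

After centering, the scores sum to zero across the cohort, so a positive value means above-average genetic load for that PRS.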
R package was used to fit a logistic regression model using the demographic and clinical data available in both cases and controls (“base clinical model”). Variables included were age, sex, body mass index (BMI), smoking status, and forced expiratory volume in 1 second (FEV1)/forced vital capacity (FVC) ratio. Each of the PRSs was added to the base clinical model to examine improvements in area under the curve (AUC) derived from genetic factors. A likelihood-ratio test was used to establish statistical significance. Data were presented visually using ggplot2 (v.3.3.5).
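The two evaluation steps described above, discrimination by AUC and a likelihood-ratio test for adding one PRS term to the base model, can be sketched as follows. This is a minimal illustration rather than the R workflow used in the study: the function names are hypothetical, model fitting is not shown, and the log-likelihoods are assumed to come from already fitted nested logistic models that differ by a single parameter.

```python
import math


def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen case (label 1)
    has a higher score than a randomly chosen control (label 0); ties count 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def likelihood_ratio_p(loglik_base, loglik_full):
    """P value for one added parameter (chi-square distribution, 1 df).

    For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_base)
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2.0))
```

The pairwise AUC loop is quadratic in cohort size, which is acceptable for roughly 1,200 participants but would be replaced by a rank-sum formula for larger data sets.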
To investigate novel SNVs in our cohort, we performed an association analysis using SNPTEST (v.2.5.2; https://www.well.ox.ac.uk/~gav/snptest/). The PCs generated during PCA, as well as sex, age, smoking status, and BMI, were used as covariates. The output was uploaded to the Functional Mapping and Annotation of Genome-Wide Association Studies (FUMA GWAS v.1.3.6) platform for GWAS data visualization and candidate gene exploration. Maximum P value for independently significant SNVs was set at 1 × 10⁻⁵.
Results
Cohort characteristics
In total, 706 participants of the Manchester LHC pilot provided blood samples, 60% of the total eligible cohort (scan-negative participants). The participating control group was highly representative of the total screen-negative participants (Supplemental Table 1). The participating control group was approximately split between males and females and current and former smokers. Median age was 65 years, median pack-year history was 45, 67.4% had no educational qualification, and 30.6% had a self-reported COPD diagnosis. Median PLCOM2012 risk score was 3.5% (Supplemental Table 1). After processing and QC, 550 control samples (77.9%) were eligible for inclusion in the analysis. The only variable with a statistically significant difference between the final control data set and the samples that were not successfully processed was median FEV1/FVC ratio (70.2 vs 71.6; P = .05) (Supplemental Table 2). FEV1/FVC ratio is a measure of airflow obstruction (<70 indicates an obstructive lung condition).
In total, 701 case samples were obtained; 55% of cases were women, 59% former smokers, and median age was 69. Tumor pathological subtypes included adenocarcinomas (64%), squamous cell carcinomas (34%), and large cell carcinomas (2%). Of the 580 (83%) with stage information available, 80% were early stage (stage I = 376; stage II = 86), 17% (n = 100) were stage III, and 3% (n = 18) were stage IV (Supplemental Table 3). After sample processing and QC, 652 case samples (93%) were eligible for inclusion in the analysis. There were no significant differences in the demographic or clinical variables between the cases in the final data set and those that were not successfully processed (Supplemental Table 3).
The case cohort had a significantly higher proportion of former smokers (58.1% vs 48.8%; P = .001), higher average age (median: 69 vs 65 years; P < .001), and lower BMI (median 26 vs 28.6; P < .001) than controls. A similar proportion of cases and controls (76.2% vs 76%; P = .93) would have been eligible for screening according to the generalized eligibility criteria used in the National Lung Screening Trial RCT (age 55-74 with at least a 30 pack-year smoking history, and either a current smoker or having quit within the past 15 years).
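The generalized National Lung Screening Trial criteria quoted above can be written as a simple rule. This is a sketch under one stated reading: "smoker within 15 years" is interpreted as a current smoker or a former smoker who quit no more than 15 years ago. The function and parameter names are hypothetical.

```python
def nlst_eligible(age, pack_years, current_smoker, years_since_quit=None):
    """Return True if a participant meets the generalized NLST criteria."""
    if not 55 <= age <= 74:
        return False  # outside the eligible age range
    if pack_years < 30:
        return False  # insufficient smoking exposure
    if current_smoker:
        return True
    # Former smokers qualify only if they quit within the past 15 years.
    return years_since_quit is not None and years_since_quit <= 15
```

For example, a 65-year-old current smoker with a 45 pack-year history is eligible, whereas a 60-year-old who quit 20 years ago is not, regardless of pack-years.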
70.2 ± 13 | 65.5 ± 13 | <.001
% NLST eligible (n) | 76 (418) | 76.2 (497) | .93
BMI, body mass index; FEV1, forced expiratory volume in one second; FVC, forced vital capacity; IQR, interquartile range; NLST, National Lung Screening Trial.
a Imputed values do not match direct values as imputations were calculated separately in smokers and former smokers within the case cohort.
We constructed a multivariable model using the clinical factors available in both the case and control data sets (base clinical model). This model had an AUC of 0.723 (95% CI 0.695-0.751). As would be expected, higher age, lower BMI, and lower FEV1/FVC ratio were significantly associated with increased likelihood of LC. Female sex was also associated with increased LC risk, although only with borderline statistical significance (Table 2). Although not statistically significant, it should be noted that there was an inverse relationship between smoking status and LC risk, with current smoker status being more prevalent in the controls than the cases (Table 2). This highlights the uniquely high-risk nature of the control cohort, as it is well established that being a current smoker increases LC risk. When all clinical factors were considered independently, age had the highest discriminatory ability with an AUC of 0.682 (0.653-0.712), and sex had the lowest with an AUC of 0.523 (0.495-0.552).
Table 2. Demographic and clinical factors included in the base clinical model
Factor | OR, Likelihood to Be Case (95% CI) | P Value | AUC (95% CI)
Older age | 1.12 (1.09-1.14) | <.001 | 0.682 (0.653-0.712)
Female sex | 1.27 (1-1.63) | .055 | 0.523 (0.495-0.552)
Current smoker status | 0.79 (0.61-1.02) | .07 | 0.546 (0.518-0.574)
Higher BMI | 0.91 (0.89-0.94) | <.001 | 0.629 (0.598-0.66)
Higher FEV1/FVC ratio | 0.99 (0.97-1) | .02 | 0.581 (0.548-0.613)
AUC, area under the curve; BMI, body mass index; FEV1, forced expiratory volume in 1 second; FVC, forced vital capacity; OR, odds ratio.
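Nine recently published PRSs for LC that were developed based on large GWAS data sets and externally validated (Supplemental Table 4) were independently validated here. These were as follows: the 109 SNV PRS published by Graff et al, the Hung-35 and Hung-128 PRSs, the Fritsche-14 and Fritsche-19 PRSs, the 19 SNV PRS published by Jia et al, the 6 SNV PRS published by Shi, and the Dai and Zhang PRSs.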
Performance metrics of all the PRSs validated in our cohort are reported in Table 3. All of the PRSs provided a statistically significant improvement in discrimination over the base clinical model (Figure 1) and had a clear divergence in score distribution between cases and controls (Figure 2).
Table 3. Summary results table of PRS performance
First Author (Year Published) | Number of SNVs in PRS (SNVs Available in the Manchester Data Set) | Published Raw AUC for PRS (95% CI) | AUC in Manchester Cohort (95% CI) | AUC Clinical Model + PRS (95% CI) | Additional AUC Over Clinical Model (P Value, Clinical Model vs Clinical Model + PRS)
Figure 1. PRS-derived AUC improvement over base clinical model for each of the PRS tools validated. AUC, area under the curve; PRS, polygenic risk score.
Figure 2. PRS score density for cases and controls for each PRS tool validated. Mean score for overall cohort is normalized to 0. Dashed lines = mean scores for case or control subgroups. PRS, polygenic risk score.
The best-performing tool was the 19 SNV PRS published by Jia et al, with an independent AUC of 0.588. It added 0.015 AUC to the base clinical model (P < .0001), increasing the overall AUC from 0.723 to 0.738 (0.71-0.766). More than 67% of the top quintile of PRS scores were cases, compared with 40% of the bottom quintile (P < .0001) (Table 4). Five SNVs (26%) reached a P value threshold of <.05 in our association analysis (Supplemental Data).
Table 4. Case proportion of each PRS quintile for all PRSs validated
The 19 SNV Fritsche PRS and the 35 SNV Hung PRS also performed relatively well. Both added approximately 0.01 AUC to the base clinical model, increasing overall AUC from 0.723 to 0.733 and 0.734 respectively (P < .0001). Hung-35 had an independent AUC of 0.575, slightly higher than Fritsche-19 at 0.569; 65% of the top quintile of Fritsche-19 scores and 61% of the top quintile of Hung-35 scores were cases, compared with 44% and 49% of the bottom quintiles (P = .002 and P = .0006) (Table 4). The 2 iterations of the Hung PRS were the only ones among all of those that were validated to have a consistent increase of case proportion across all genomic risk quintiles (Table 4). Four SNVs (21%) of the Fritsche-19 PRS were statistically significant in our data set (P < .05). Twelve SNVs are shared between all 3 of the best-performing PRSs, with Hung-35, Fritsche-19, and Jia containing only 20, 5, and 3 unique SNVs, respectively (Supplemental Data).
Considering its small size, the 6 SNV PRS published by Shi also performed relatively well. It had an independent AUC of 0.56, adding 0.009 to the base clinical model and increasing overall AUC from 0.723 to 0.732 (P < .0001). Five of the 6 (83%) SNVs in this PRS reached statistical significance (P < .05) in our data set. However, one of these significant SNVs, rs6495309, was in the incorrect effect direction in our results (published risk allele: ‘T’ OR: 1.3; Our data: ‘T’ OR: 0.69), indicating that C should be considered the risk allele. Other studies confirm that C is the risk allele for this SNV.
The reason for this discrepancy is unclear. Consequently, we retested this PRS, substituting in C as the risk allele but maintaining the published allelic weighting. The updated PRS had an independent AUC of 0.569 (95% CI 0.536-0.601), 0.009 higher than the original Shi PRS. It added 0.012 of AUC to the base clinical model, increasing overall AUC from 0.723 to 0.735 (95% CI 0.707-0.762; P < .0001), roughly comparable to Fritsche-19 and Hung-35, despite including significantly fewer SNVs.
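One plausible reading of "substituting in C as the risk allele but maintaining the published allelic weighting" is sketched below: for a biallelic SNV, the dosage of the alternative allele is 2 minus the dosage of the published allele, and the published weight ln(1.3) is then applied to the flipped dosage. The function names are hypothetical and this is not necessarily how the retest was implemented.

```python
import math


def flip_effect_allele(dosage):
    """Dosage of the other allele at a biallelic SNV (0 <-> 2, 1 stays 1)."""
    return 2.0 - dosage


def snv_contribution(dosage_published_allele, published_or, flip=False):
    """Contribution of one SNV to the PRS, optionally counting the other allele."""
    dosage = (flip_effect_allele(dosage_published_allele)
              if flip else dosage_published_allele)
    return dosage * math.log(published_or)


# An OR of 0.69 for the T allele corresponds to an OR of 1 / 0.69 for the C allele.
implied_or_for_c = 1 / 0.69  # about 1.45
```

Under this reading, a participant homozygous for T contributes 2 × ln(1.3) to the original score and 0 to the updated score, and the reverse holds for a participant homozygous for C.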
The remaining PRSs, Graff, Hung-128, Fritsche-14, Dai, and Zhang, all displayed modest discrimination. Considering that the Dai PRS was developed in a Chinese population and that this cohort is European, it is notable that it still improved AUC above the base clinical model with statistical significance. On the other hand, somewhat surprisingly, the tool with the most limited performance in this cohort was the Zhang PRS, which was prospectively validated in a UK Biobank cohort of 345,794 European subjects and is the most recently published of all the PRSs tested.
PRS scores in the control cohort
Among the control cohort (for which comprehensive clinical data were available), there was no significant association between any of the PRS scores and PLCOM2012 or LLPV2 scores (RPMs used for LC risk prediction in NHS England’s Targeted LHC program). There was no significant association between any PRS score and family history of LC, previous COPD diagnosis, or LDCT detected coronary artery calcification.
Strikingly, in binary logistic regression analysis, previous cancer diagnosis was significantly associated with higher Graff (P = .006), Hung-35 (P = .012), and Hung-128 (P = .023) scores. In total, there were 70 participants with previous cancer diagnoses, including 40% who reported either previous breast cancer (n = 14) or skin cancer (n = 15). Discrimination analysis with previous cancer diagnosis used as the outcome resulted in the following AUCs: Hung-35: 0.6 (0.529-0.67), Graff: 0.589 (0.517-0.661), and Hung-128: 0.565 (0.491-0.639).
Novel SNVs in the Manchester cohort
Seventeen risk loci associated with LC case status, comprising 206 candidate SNVs, were identified through association analysis. Ten of the identified loci had multiple SNVs present, whereas 7 were lone SNVs (Supplemental Table 5). In the Manhattan plot, there was a particularly pronounced peak at chromosome 7, with the lead SNV (rs17389497; NC_000007.13:g.78466130C>G) reaching a P value of 1.0048 × 10⁻⁷ (Figure 3); this SNV maps to the MAGI2 gene on chromosome 7. The second most significant intronic SNV was rs4878090 (NC_000009.11:g.90143928A>G), which maps to the DAPK1 gene on chromosome 9 (Supplemental Figure 2). SNVs from these genes are not included in any of the published PRSs validated in this study.
Figure 3. Manhattan plot of GWAS-style association analysis performed in the Manchester case-control data set. Dashed line = threshold for genome-wide significance. GWAS, genome-wide association study.
In this study, we validated several previously published PRS tools for LC risk prediction in a Manchester-based case-control cohort. Most of these tools were originally developed and validated in large RCT or biobank-based data sets, which often include lower-risk participants with less smoke exposure and fewer risk factors.
Real-world LC screening programs seek to increase efficiency by initially inviting participants with significant smoking history, even before performing individualized risk assessment (which further increases the risk profile of the screening cohort). SNV-level and PRS-level results may differ in segments of the population with varying smoke exposure.
Consequently, validation of PRS efficacy in a high-risk cohort, representative of a screening population, is an important step in assessing the potential benefit of using genetic risk factors in actual, real-world LC screening selection.
We showed that all 9 of the PRSs tested were predictive of LC risk in the Manchester cohort. When applied in conjunction with a base model comprising several nongenetic risk factors, discrimination was significantly improved by the addition of each of these 9 PRSs, albeit by varying magnitudes. Of the 257 unique SNVs tested across all the PRSs, 38 (15%) reached nominal statistical significance (P < .05) in the Manchester data set. We also observed a general trend toward an increasing proportion of cases across quintiles of increasing PRS risk for most of the tools tested. Fundamentally, these results demonstrate that the inclusion of robust and validated measures of genetic risk could improve LC risk prediction.
The best-performing PRS (Jia) had an independent AUC of 0.59 and added approximately 0.015 AUC to the base clinical model. The Fritsche-19 and Hung-35 PRSs also added more than 0.01 AUC to the base clinical model. To our knowledge, none of these PRSs had been previously validated in a cohort including individuals recruited from an actual LC screening program. Although the AUCs of these PRSs may seem relatively modest, the benefits of slight improvement in model discrimination and risk prediction may aggregate when used in the risk stratification of a large population in the context of a screening program.
For example, studies have demonstrated that even with limited AUC improvement, the Jia and Hung PRSs could be used to effectively modulate the screening commencement age for individual smokers in a population.
By highlighting segments of the screening cohort at even higher disease risk because of their genetic profile, PRSs could also be used to target the provision of chemopreventive therapeutics, assist in the triage of indeterminate screening results, or regulate the frequency of screening rounds.
Our analysis examining the distribution of cases across PRS quintiles provides further illustration as to the potential utility of PRSs, even with seemingly modest AUC improvements. Overall, 54% of the total cohort were cases. However, cases only made up 40% of subjects with scores in the bottom quintile of Jia PRS scores, compared with almost 70% in the top quintile. All PRSs except Zhang and Fritsche-19 had the lowest case proportion in the bottom PRS quintile, and all PRSs without exception had the highest case proportion in the top PRS quintile. Notably, the Hung PRSs demonstrated a consistent increase in case proportion across all PRS quintiles. This may indicate that they add predictive value across the whole cohort, as opposed to only discriminating between those at the highest and lowest genetic risk, and could provide tentative evidence for PRS calibration, although further study is needed to confirm this.
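The quintile comparison discussed above can be reproduced in outline with a short sketch. It assumes equal-sized quintiles formed by ranking the whole cohort on PRS score; the function name and input layout are hypothetical, and the sketch requires at least 5 participants.

```python
def case_proportion_by_quintile(scores, is_case):
    """Return the proportion of cases in each PRS quintile, lowest scores first."""
    ranked = sorted(zip(scores, is_case), key=lambda pair: pair[0])
    n = len(ranked)
    proportions = []
    for q in range(5):
        # Integer boundaries keep quintile sizes within one participant of each other.
        start = q * n // 5
        end = (q + 1) * n // 5
        group = ranked[start:end]
        proportions.append(sum(1 for _, case in group if case) / len(group))
    return proportions
```

A steadily increasing list indicates that case proportion rises across the whole score range, which is the pattern reported for the Hung PRSs; a list that differs only at its two ends indicates discrimination confined to the extremes.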
We found a significant association between 3 of the PRS tools (Graff, Hung-35, and Hung-128) and previous cancer diagnosis. The Graff PRS, which had the second largest cross-cancer association after Hung-35 in our study, has previously been shown to demonstrate pleiotropy, with very high cross-cancer association in 2 large biobank cohorts.
The inclusion of SNVs from TERT-CLPTM1L and HLA in the PRS is likely to have contributed to this association, because both are well-established general cancer susceptibility loci.
To our knowledge, the Hung-35 PRS (which showed the highest discrimination for previous cancer diagnosis in our study) has not had its cross-cancer predictiveness tested previously. SNV pleiotropy may be useful in developing a cross-cancer PRS for risk prediction; this could facilitate the integration of additional diagnostic and preventative services into an LC screening program, increasing benefit to participants and overall program cost-effectiveness. On the other hand, it may also lead to the selection of screening participants at higher risk of other cancers, which may reduce life expectancy and benefit to be gained from LC screening.
An additional benefit of using genetic factors in LC risk prediction is that, unlike traditional risk factors, such as age and smoking history, they are often independent of risk factor–associated comorbidities, such as COPD and cardiovascular disease. Standard risk prediction criteria favor older and more comorbid attendees for screening, who are at the greatest risk of LC but may have limited benefit to be gained from screening. This phenomenon was observed in the Manchester LHC pilot, with multiple measures of respiratory and cardiovascular comorbidity associated with calculated LC risk and screen-detected LC.
One approach to mitigating this limitation is to use benefit-based selection, such as the LYFS-CT (Life Years Gained from Screening-CT) model, which considers life expectancy when determining screening eligibility.
An alternative approach may be to integrate additional non–comorbidity-linked factors, such as polygenic risk, which stay constant through an individual’s life and are not necessarily linked to increased risk of other diseases.
These approaches are not mutually exclusive, as LYFS-CT relies on LC risk prediction in its determination of the estimated benefit an individual will receive from screening; improved risk prediction therefore enhances both benefit-based and risk-based screening selection.
In the independent association analysis in our data set, 2 promising genomic loci emerged. One cluster of SNVs was located on the MAGI2 gene. Previous studies have demonstrated that MAGI2 regulates the activity of PTEN, a tumor suppressor.
There is evidence of MAGI2 variants in several cancer types; some studies have proposed MAGI2 as a biomarker that could be used to predict prostate cancer aggressiveness and recurrence.
The second locus of interest was in the DAPK1 gene on chromosome 9. This gene is involved in the modulation of cell apoptosis and autophagy and functions as a tumor suppressor. DAPK1 underexpression has been reported in several cancer types.
There is some evidence linking DAPK1 function and LC. One study of 135 patients with stage I NSCLC found that 44% had hypermethylation of the gene; those with the altered gene expression had significantly poorer survival.
We did not perform an external validation of our findings, which severely limits the conclusions we can draw. With further research, however, these loci may offer additional SNVs for inclusion in PRSs for LC.
It is important to note that a statistically significant improvement in risk prediction does not always indicate a clinically significant improvement. The routine adoption of a PRS tool within an LC screening program will depend on its clinical impact and cost-effectiveness. It may not be cost-effective to include an auxiliary biomarker to risk prediction tools when the additional predictiveness it confers is marginal compared with traditional risk factors, such as age and smoking history.
On the other hand, an effective PRS might reduce the total number of people eligible for screening or reduce the frequency of screening, increasing program efficiency. It might also favor the selection of those with a lower smoking exposure, and therefore a lower burden of comorbidity, who have “more to gain” from screening.
To reduce costs, the PRS could be targeted to those close to the risk threshold rather than being used more broadly. Ongoing research may also identify specific subgroups that will gain particular benefit from PRS testing. It is also expected that genetic testing will continue to reduce in price as it becomes more clinically ubiquitous.
Once a well-optimized PRS tool, applicable to a range of populations and a variety of LC types, becomes available, formal cost-effectiveness analysis will be required to determine the best approach to application, as has been performed in other disease areas.
A limitation of this study is that data for the case cohort were too limited to facilitate the calculation of PLCOM2012 risk scores. Missing variables included the following: previous cancer diagnosis, family history of LC, COPD diagnosis, ethnicity, and educational qualifications. Had PLCOM2012 scores been available for the whole cohort, it would have been possible to evaluate whether the PRSs validated would have augmented risk prediction over and above the actual methods being used for selection in screening programs in the United Kingdom.
Some of the variables included in RPMs, such as family history and smoke exposure, might already be accounting for a portion of the risk impact conferred by genetic variants. Considering genetic risk factors in combination with demographic and lifestyle risk factors and testing them in actual screening populations ensures that personal risk is not overestimated and that the genetic component of the RPM has independent utility in a screening selection context. The demographic variables that were available facilitated the construction of a base clinical model that was used as a substitute for an RPM in our analysis. Age was the main contributing factor to discrimination, which would be expected from RPM-based risk prediction. However, the AUC derived from this model was lower than that of RPMs observed in comparative studies.
The limited size and selective nature of this validation cohort are also important to consider; some lack of replication is always expected when validating a PRS tool in a new population, but firm conclusions regarding the efficacy of the tools tested must only be drawn after obtaining further validation in additional data sets that represent a diverse range of populations and settings.
The case cohort in this study was derived from a biobank rather than a screening program setting, and the cancers were likely to have been diagnosed in clinic rather than through screening. Although the 80% early-stage distribution of the cancers in the case cohort is a fair representation of the expected stage distribution in UK-based screening programs, there may be differences in the clinical and genetic characteristics of these cases compared with screen-detected cases. Because the cases in this study were sourced from a hospital biobank, we would not expect to see the “healthy volunteer bias” often reported as a limitation in studies that validate PRSs in the UK Biobank.
It is notable that many of the PRSs we tested were previously validated in the same UK Biobank data set. Although we do not have individual-level deprivation data for the cases, the MCRC Biobank collects samples from patients who are located in the same socioeconomically disadvantaged areas of Manchester from which the control cohort was sourced. This is a significant strength of this study. Going forward, studies in which both cases and controls are derived from the same screening program, with uniform demographic data available, will provide more robust evidence of the impact of PRS inclusion on risk prediction.
This study was restricted to individuals of European descent, a common feature of many PRS studies for LC. The lack of non-European GWAS and Biobank data sets is a significant challenge in the development of PRSs for diverse populations; correcting this imbalance is crucial in ensuring that the use of PRS in screening selection does not exacerbate health inequalities.
Despite these limitations, this study contributes important evidence supporting the hypothesis that LC risk prediction in ever-smokers can be improved by considering genetic risk factors. We demonstrated that PRS tools predominantly developed in RCT or biobank populations functioned successfully in a case-control cohort highly representative of a target population for LC screening. Consequently, there is potential for a PRS to be integrated into RPMs used for LC risk prediction, improving model discrimination, and thereby refining screening selection in community-based screening programs. The exceptionally high-risk nature of the control cohort in this study adds robustness to this conclusion; this cohort would not be expected to be protected from LC because of limited exposure to other risk factors. Similarly, novel SNVs that emerged in this cohort, although requiring external validation, are derived from a cohort highly representative of Manchester-based screening attendees and could potentially be included in future PRS development.
Data Availability
The data sets used and/or analyzed during the current study are available from the corresponding author on request.
Conflict of Interest
The authors declare no conflicts of interest.
Acknowledgments
The authors thank Deborah Burt and Katherine Sadler for their assistance.
Funding
M.B.L., E.J.C., D.G.E., J.B., M.J.S., and P.A.J.C. are supported by the NIHR Manchester Biomedical Research Centre (IS-BRC-1215-20007).
Author Information
Conceptualization and Methodology: M.B.L., M.J.S., E.J.C., J.B., D.G.E., P.A.J.C.; Data Curation and Investigation: M.B.L., H.J.B.; Analysis: M.B.L., M.J.S., J.B.; Software: M.B.L., J.B.; Writing-review and editing: all authors.
Ethics Declaration
Ethics approval and informed consent were obtained for all aspects of this study: The Community Lung Health Study – Cambridge East Research Ethics Committee – 17/EE/0092.
Manchester LHC pilot research database – North West Greater Manchester West Research Ethics Committee – 16/NW/0013.
The Article Publishing Charge (APC) for this article was paid by Gold Open Access Transformative Agreement between Elsevier and the University of Manchester.