Detection and characterization of male sex chromosome abnormalities in the UK Biobank study



Introduction
The most common sex chromosome aneuploidies in men are 47,XXY (Klinefelter syndrome [KS]) and 47,XYY, with population prevalence estimates of 100 per 100,000 men and 18 to 100 per 100,000 men, 1,2 respectively. Men with KS typically present during adolescence with delayed puberty or as adults with infertility. Other recognized features include tall adult stature; high body fat percentage; 3 poor muscle tone; low bone mineral density (BMD); and increased risks of neurocognitive disability, psychoses, and disorders of personality. 4 KS has also been associated with higher risks of type 2 diabetes and venous thromboembolism. 5,6 By contrast, 47,XYY is less well characterized because many of these individuals may not present to health services and thus are unaware of their karyotype. Reported features associated with 47,XYY may therefore be affected by sampling bias. These include tall stature; scoliosis; learning difficulties; 7 poor muscle tone; 8 increased central fat; and increased risks of seizures, asthma, and emotional and behavioral problems (eg, autism and attention deficit disorder). 9 Although infertility has been reported in some men with 47,XYY, most studies report normal sexual development and fertility. 10
Previous studies identified men with KS or 47,XYY from medical records, and therefore case ascertainment was based on recognition of their typical phenotypic features. Consequently, the reported penetrance of these features may have been biased and the full spectrum of clinical features may have been overlooked. A more robust alternative approach is to identify such individuals from large population-based studies using systematic measurements to produce unbiased estimates of the effects of sex chromosome aneuploidy on unselected diseases. We recently used this approach to show that mosaic X-chromosome aneuploidy in women (mosaic Turner syndrome, 45,X) conferred a lower penetrance of infertility than that reported by earlier clinic-based studies. 11
In this study, we analyzed single-nucleotide variation (SNV) array genotype data in 207,067 men of European ancestry aged 40 to 70 years from the UK Biobank. We identified 213 men with sex chromosome aneuploidy indicative of KS and 143 men with 47,XYY and related these karyotypes to extensive study data and medical records to understand the penetrance of male sex chromosome aneuploidy on typical reproductive outcomes and its wider clinical impacts.

Study population
The UK Biobank is a large prospective cohort that recruited approximately 500,000 participants aged 40 to 70 years across the island of Great Britain. A broad range of phenotypic and health-related information was collected from each participant, including physical measurements, lifestyle indicators, biomarkers in blood and urine, imaging, and routine health record data. 12 In the UK Biobank, 488,377 participants had DNA samples assayed using 1 of 2 genotyping arrays: UK Biobank Lung Exome Variant Evaluation (UK BiLEVE study, N = 49,950) and Affymetrix Axiom UK Biobank array (UK Biobank Axiom [Affymetrix], N = 438,427). These 2 arrays tested 807,411 and 825,927 SNVs, respectively, with 95% overlap between arrays. 12
We restricted our analysis to men of White European genetic ancestry as classified by the approach previously described by Thompson et al. 13 In brief, this approach uses k-means clustering to group individuals by the first 4 genetic principal components. In addition, we excluded individuals who were classified as White European by our k-means approach but self-identified as being of ancestry other than White European. 13 We further excluded individuals whose samples failed UK Biobank genotyping quality control (QC) parameters and those who withdrew consent. Accordingly, 207,067 men were included in all association testing analyses. We were unable to incorporate non-European individuals when modeling the relationship between abnormal karyotypes and phenotypes outlined in this manuscript because we identified only 16 non-White European males with abnormal karyotypes: 9 with 47,XXY and 7 with 47,XYY.

Identification of male sex chromosome aneuploidy heterozygotes from SNV array data
To identify men with sex chromosome aneuploidy, we downloaded genotyping fluorescence signal intensity (log2 ratios, LRR) and QC information for all SNVs on the X (chrX) and Y (chrY) chromosomes from the UK Biobank data showcase (https://biobank.ndph.ox.ac.uk/showcase/field.cgi?id=22431 and https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=1955). We excluded SNVs that (1) were located within pseudoautosomal regions (PAR), 14 (2) did not have a calculable LRR on both arrays, (3) did not pass QC in all 106 batches, or (4) were flagged as failing QC by UK Biobank. After these steps, 16,599 chrX SNVs and 579 chrY SNVs remained. We then calculated the median LRR across all remaining SNVs on chrX and chrY to generate the values mLRR-X and mLRR-Y, respectively. These values represent the median fluorescence signal intensities across the entire X or Y chromosome. 15 Using the thresholds described by Bycroft et al, 12 men with -1 ≤ mLRR-Y < 0.23 and mLRR-X > -0.2 were categorized as having 47,XXY (KS) and men with mLRR-Y ≥ 0.23 and mLRR-X < -0.2 as 47,XYY (men with mLRR-Y ≥ 0.23 and mLRR-X > -0.2 were categorized as 48,XXYY and were not included in further analyses).
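The threshold rules above can be illustrated with a minimal Python sketch. The function and constant names are hypothetical and are not taken from the study's code; only the numeric thresholds come from the text. The label returned for men meeting none of the three rules is likewise an assumption made for illustration.

```python
from statistics import median

# Thresholds quoted in the text (attributed there to Bycroft et al).
MLRR_Y_EXTRA = 0.23   # mLRR-Y at or above this value suggests an extra Y
MLRR_Y_FLOOR = -1.0   # lower bound on mLRR-Y used in the 47,XXY rule
MLRR_X_CUTOFF = -0.2  # mLRR-X above this value suggests an extra X


def median_lrr(lrr_values):
    """Median log2 ratio across the retained SNVs of one chromosome."""
    return median(lrr_values)


def classify_karyotype(mlrr_x, mlrr_y):
    """Apply the three array-based rules; other men are left uncalled."""
    if mlrr_y >= MLRR_Y_EXTRA and mlrr_x > MLRR_X_CUTOFF:
        return "48,XXYY"
    if mlrr_y >= MLRR_Y_EXTRA and mlrr_x < MLRR_X_CUTOFF:
        return "47,XYY"
    if MLRR_Y_FLOOR <= mlrr_y < MLRR_Y_EXTRA and mlrr_x > MLRR_X_CUTOFF:
        return "47,XXY"
    # Hypothetical label: the text defines no rule for the remaining men.
    return "no aneuploidy call"


if __name__ == "__main__":
    # Illustrative values only, not real participant data.
    print(classify_karyotype(mlrr_x=0.05, mlrr_y=0.01))
    print(classify_karyotype(mlrr_x=-0.45, mlrr_y=0.40))
```

In this sketch a man whose mLRR-X equals exactly -0.2 receives no call, because the quoted rules use strict inequalities on mLRR-X.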

Confirmation of male sex chromosome aneuploidy heterozygotes from exome sequencing data
To confirm sex chromosome aneuploidy status using an orthogonal approach, we used exome sequencing data available for 83,104 White European men in UK Biobank. 16,17 To estimate sex chromosome dosage, we calculated the average read depth of 3 target regions: (1) non-PAR regions on chrX, (2) X-degenerate regions (XDRs) on chrY as defined by Skov et al, 18 and (3) autosomes. First, we used SAMtools 19 (version: 1.9) to convert the provided CRAM 20 files for each participant to Binary Alignment Map files on the basis of the GRCh38 reference sequence (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa). UK Biobank provided the GRCh38 coordinates of the targeted regions for its exome sequencing design with a Browser Extensible Data file (https://biobank.ndph.ox.ac.uk/showcase/ukb/auxdata/xgen_plus_spikein.GRCh38.bed). We created 3 subsets of this Browser Extensible Data file by extracting the overlap between the target regions and non-PAR regions on chrX, XDRs on chrY, and autosomes according to their GRCh38 coordinates. These subsets were then converted to Picard interval lists using the Picard (version: 2.21.6-SNAPSHOT) function BedToIntervalList, on the basis of the same reference sequence. Using these Picard interval lists, the Binary Alignment Map file of each participant was used as input to calculate the average coverages of non-PAR regions on chrX, XDRs on chrY, and autosomes using the Picard 21 function CollectHsMetrics. The relative read depths of non-PAR regions on chrX and XDRs on chrY were defined as the average coverage in each of these regions divided by the average coverage across the autosomes. The relative read depths of non-PAR regions on chrX and XDRs on chrY multiplied by 2 were used as proxies of chrX dosage and chrY dosage, respectively. Men with chrX dosage > 1.2 were categorized as having 47,XXY (KS) and men with chrY dosage > 1.5 as 47,XYY. Men with both chrX dosage > 1.2 and chrY dosage > 1.5 were categorized as 48,XXYY.
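As a worked illustration of the dosage calculation, the sketch below takes three mean coverage values (such as those reported per interval list by a coverage tool) and applies the thresholds stated above. The function names and the example coverage values are hypothetical; the parsing of actual Picard output is deliberately not shown.

```python
CHRX_DOSAGE_CUTOFF = 1.2  # from the text: chrX dosage above this suggests an extra X
CHRY_DOSAGE_CUTOFF = 1.5  # from the text: chrY dosage above this suggests an extra Y


def chromosome_dosage(region_coverage, autosome_coverage):
    """Relative read depth (region / autosomes) multiplied by 2."""
    if autosome_coverage <= 0:
        raise ValueError("autosomal coverage must be positive")
    return 2.0 * region_coverage / autosome_coverage


def classify_from_exome(chrx_coverage, chry_coverage, autosome_coverage):
    """Classify one man from mean coverage of non-PAR chrX, chrY XDRs, and autosomes."""
    extra_x = chromosome_dosage(chrx_coverage, autosome_coverage) > CHRX_DOSAGE_CUTOFF
    extra_y = chromosome_dosage(chry_coverage, autosome_coverage) > CHRY_DOSAGE_CUTOFF
    if extra_x and extra_y:
        return "48,XXYY"
    if extra_x:
        return "47,XXY"
    if extra_y:
        return "47,XYY"
    # Hypothetical label for men meeting neither threshold.
    return "no aneuploidy call"


if __name__ == "__main__":
    # Illustrative coverages: a 46,XY man is expected to show a dosage near 1
    # for both sex chromosomes, ie, about half the autosomal coverage.
    print(classify_from_exome(20.0, 20.0, 40.0))
    print(classify_from_exome(40.0, 20.0, 40.0))
```

Dividing by autosomal coverage normalizes for sequencing depth, so the same thresholds can be applied to samples sequenced to different depths.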

Disease association testing
To test for the disease burden associated with male sex chromosome aneuploidies, we fitted logistic regression models with KS or 47,XYY (coded 1) compared with the normal male karyotype 46,XY (coded 0) as the exposure. Outcomes comprised 875 International Classification of Diseases (ICD)-10 coded diseases amalgamated from death registries, hospital episode statistics, primary care records (in a subset, n = 94,959), and self-reported conditions (from the first occurrence of disease data set released by UK Biobank). The data set contains a further 19 case definitions from dedicated working groups that used multiple sources for case identification, such as for chronic obstructive pulmonary disease (COPD) or end stage renal disease. For each participant, all events from all sources were mapped to an ICD-10 code and the date of the first disease occurrence from any source was taken as the event date. From this data set, we filtered out likely erroneous disease events if the disease occurrence date (1) was unknown or missing, (2) matched or preceded the date or year of birth, or (3) occurred after the data set release date. We fitted the logistic regression models in R (version: 3.6.0) among unrelated men of White European genetic ancestry (maximum N = 162,322) and adjusted for age at study baseline, test center, and the first 10 genetically derived principal components. Resulting odds ratios were converted to risk ratios (RR) using the formula described by Zhang and Yu. 22 We applied a stringent Bonferroni corrected P value threshold of P < .05/875 = 5.7 × 10⁻⁵ to define statistical significance (Supplemental Table 1 and Figure 1).
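For readers unfamiliar with the two calculations in this paragraph, the following sketch shows the Zhang and Yu conversion, RR = OR / ((1 - P0) + P0 × OR), where P0 is the outcome incidence in the unexposed group (here, men with 46,XY), alongside the Bonferroni threshold. The function names and example inputs are illustrative assumptions, not the study's code or results.

```python
def odds_ratio_to_risk_ratio(odds_ratio, baseline_risk):
    """Zhang and Yu approximation: RR = OR / ((1 - P0) + P0 * OR).

    baseline_risk (P0) is the outcome incidence among the unexposed.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline risk must lie in [0, 1]")
    return odds_ratio / ((1.0 - baseline_risk) + baseline_risk * odds_ratio)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Threshold used in the text: .05 / 875 outcomes.
    print(f"{bonferroni_threshold(0.05, 875):.1e}")
    # Illustrative inputs only: the rarer the outcome in the unexposed group,
    # the closer the risk ratio is to the odds ratio.
    print(odds_ratio_to_risk_ratio(2.0, 0.10))
    print(odds_ratio_to_risk_ratio(2.0, 0.001))
```

The conversion matters mainly for common outcomes; for rare diseases the odds ratio and risk ratio are nearly identical.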

Study phenotype association testing
To test the association of male sex chromosome aneuploidy status against selected anthropometric, reproductive, metabolic, cardiovascular, learning/memory, and behavioral study-measured traits (Supplemental Table 2), we used a linear mixed model implemented in BOLT-LMM (version: 2.3.2). 23 The outcome childlessness was derived from the response 0 to the question "How many children have you fathered?" among men aged 55 and older. The Townsend deprivation index was used as an indicator of socioeconomic status, on the basis of participants' home postcodes. The 2 binary exposure variables described earlier were converted to BGEN file format using plink2 24 (version: 2.00-alpha) and passed to BOLT-LMM via the bgenFile flag. A genetic relationship matrix was generated on the basis of all autosomal variants that had minor allele frequency of >1%, passed QC in all 106 batches, and were present on both genotyping arrays. Genotyping chip, age at baseline, and the first 10 genetically derived principal components were included as covariates. For binary outcomes, we also performed logistic regression and calculated the RR from the odds ratio as described earlier.

Nuclear magnetic resonance metabolic biomarkers association testing
We analyzed 168 circulating metabolic traits measured by proton nuclear magnetic resonance spectroscopy (Nightingale Health Plc) in nonfasting plasma samples in UK Biobank men with 46,XY (n = 49,806), KS (n = 48), or 47,XYY (n = 38). For each metabolic trait, we first performed adjustment for technical variations using the R package ukbnmr (unpublished data: Ritchie SC et al 2021. https://www.medrxiv.org/content/10.1101/2021.09.24.21264079v2), then performed inverse rank normalization, and then further adjusted for sex, age at the first study visit, body mass index (BMI), and the first 10 genetically derived principal components. Associations between abnormal karyotype and each metabolic trait were tested in separate linear regression models (Supplemental Figure 1).
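The inverse rank normalization step can be sketched as follows. The text does not state which rank offset was used, so the Blom offset (c = 3/8) below is an assumption, as are the function name and the choice to assign tied values their average rank.

```python
from statistics import NormalDist


def inverse_rank_normalize(values, c=3.0 / 8.0):
    """Rank-based inverse normal transform.

    Each value is replaced by the standard normal quantile of
    (rank - c) / (n - 2c + 1). Tied values share their average rank.
    The offset c = 3/8 (Blom) is an assumption, not taken from the text.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Find the end of the current run of tied values.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


if __name__ == "__main__":
    # Illustrative skewed values, not real biomarker data.
    print(inverse_rank_normalize([0.8, 1.1, 0.9, 5.4, 1.0]))
```

The transform preserves the ordering of participants while forcing an approximately normal distribution, which limits the influence of extreme biomarker values on the subsequent linear regression.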

Prevalence of male sex chromosome aneuploidy in a population scale biobank
Using genotyping array data on 207,067 men of European ancestry, we identified 213 men with 47,XXY (KS, prevalence 103/100,000 men) and 143 with 47,XYY (69/100,000 men; Figure 2A). Among those cases who also had exome sequencing data, aneuploidy status was confirmed in 100% (62/62 men with KS and 54/54 men with 47,XYY) (Figure 2B and C).
Only 49 of 213 (23.0%) men with KS and 1 of 143 (0.7%) with 47,XYY had a diagnosis of sex chromosome abnormality on routine medical records or self-reported data (ICD-10: Q98 other sex chromosome abnormalities, male phenotype, not elsewhere classified). Similar proportions were found in the subsample of men who had primary care data: only 24 of 89 (27.0%) with KS and 1 of 76 (1.3%) with 47,XYY had known sex chromosome abnormality. Conversely, of the men with a diagnosis of sex chromosome abnormality on their health record, from our analysis we classified 4 as 46,XX (mLRR-Y < -1) and a further 8 as having a normal male karyotype.
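The prevalence and diagnosed-proportion figures in this section are simple ratios of the counts given in the text, and can be reproduced with the short sketch below. The function names are hypothetical.

```python
def prevalence_per_100k(cases, population):
    """Cases per 100,000 individuals."""
    return 100_000 * cases / population


def percentage(numerator, denominator):
    """Proportion expressed as a percentage."""
    return 100.0 * numerator / denominator


if __name__ == "__main__":
    men = 207_067  # men of European ancestry included in the analysis

    print(round(prevalence_per_100k(213, men)))  # 47,XXY (KS)
    print(round(prevalence_per_100k(143, men)))  # 47,XYY

    # Proportion with a recorded diagnosis of sex chromosome abnormality.
    print(round(percentage(49, 213), 1))  # KS, all records
    print(round(percentage(1, 143), 1))   # 47,XYY, all records
    print(round(percentage(24, 89), 1))   # KS, primary care subsample
    print(round(percentage(1, 76), 1))    # 47,XYY, primary care subsample
```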

Discussion
Using systematic case ascertainment in a large, unselected population of men of European ancestry aged 40 to 70 years, we report the prevalence of KS (103/100,000 men) and 47,XYY (69/100,000 men). Notably, only a small minority of these men had a diagnosis of sex chromosome abnormality on their medical records or by self-report (23% of KS and 0.7% of 47,XYY) and yet these conditions conferred substantially increased risks for multiple, potentially preventable diseases. The underdiagnosis of KS and 47,XYY has been previously indirectly quantified in other settings, on the basis of the differences in their clinical prevalence compared with estimates from population-based cytogenetic surveys in newborn infants. Such studies estimated that only between 7% (in the United Kingdom) and 57% (in Australia) of expected KS cases were diagnosed on the basis of clinical presentation. For 47,XYY, only between 3% (in the United Kingdom) and 18% (in Denmark) of expected cases were diagnosed. 2 Our prevalence estimates in an adult study population are somewhat lower than those reported in those newborn infants (KS: 152/100,000 males and 47,XYY: 98/100,000 males). 2 Although this could be interpreted as indicating higher mortality rates, it is recognized that UK Biobank comprises a more educated and healthier sample than the general population, likely owing to healthy volunteer bias. 25 Similarly, the prevalence of other adverse genetic conditions is reportedly lower in UK Biobank than in other more representative studies. 26
Previous studies have highlighted higher disease risks in men with KS. Bojesen et al 27 identified 832 men with KS from hospital records in Denmark and reported higher risks for venous thrombosis (hazard ratio = 5.3, 95% CI = 3.3-8.5), pulmonary embolism (hazard ratio = 3.6, 95% CI = 1.9-6.7), COPD (hazard ratio = 3.9, 95% CI = 2.5-6.1), type 2 diabetes (hazard ratio = 3.7, 95% CI = 2.1-6.4), and atherosclerosis (hazard ratio = 4.5, 95% CI = 2.8-7.1). Swerdlow et al 6 accessed data on 3518 UK patients with KS diagnosed since 1959 and followed up until mid-2003 and reported higher mortality from diabetes mellitus (standardized mortality ratio = 5.8, 95% CI = 3.4-9.3), pulmonary embolism (standardized mortality ratio = 5.7, 95% CI = 2.5-11.3), and chronic lower respiratory disease (standardized mortality ratio = 2.1, 95% CI = 1.4-3.0). Zöller et al 5 reported a higher risk for venous thromboembolism (incidence rate ratio [IRR] = 6.4, 95% CI = 5.1-7.9) in 1085 men diagnosed with KS between 1969 and 2010 in Sweden. Our findings confirm these strong disease associations and also the reported higher risks of psychiatric illness and osteoporosis.
We observed some notable differences between KS and 47,XYY. KS is a well-recognized cause of reproductive dysfunction and this was reflected in our data by their older age at puberty, lower testosterone levels, and high risk of being childless. Reproductive dysfunction likely also contributes to their tall stature (due to later pubertal growth completion), lower bone density and muscle strength, and greater adiposity. By contrast, men with 47,XYY appeared to have normal reproductive function, with no alteration in their puberty timing or testosterone levels and a more modestly higher risk of being childless, which could be explained by their similarly higher chance of living without a partner.
Despite these marked differences in reproductive function, both KS and 47,XYY showed striking similarities in conferring substantially higher risks for many diseases in common: type 2 diabetes, atherosclerosis, venous thrombosis, pulmonary embolism, and COPD. These risks persisted after adjustments for several lifestyle and behavior-related traits (BMI, smoking, deprivation), and it is unclear why they are shared. Higher risks of type 2 diabetes, atherosclerosis, and microalbuminuria, with lower HDL cholesterol and higher adiposity, together indicate higher insulin resistance in both KS and 47,XYY men. Both conditions confer a triple dose of the PAR, containing the growth-promoting SHOX gene, which likely partially contributes to their tall stature, and there are case reports of insulin resistance in other conditions characterized by SHOX excess. 29,30 However, the underlying mechanisms are as yet unknown.
Similarly, it is unclear why risks for venous thrombosis and pulmonary embolism are raised in both KS and 47,XYY to a similar substantial degree, around 6- to 7-fold higher risk for venous thrombosis. This is similar to or even higher than the risk conferred by factor V Leiden, a genetic variant carried by around 5% of White individuals of European descent. 31 Hence, adding sex chromosome aneuploidy to the screening for genetic causes of thrombophilia might be considered. Furthermore, because KS and 47,XYY confer higher risks for multiple potentially preventable diseases, future studies should explore the potential benefits of wider testing.
Strengths of our study include the use of systematic case ascertainment and 100% confirmation of genome-wide association study (GWAS) array-based categorization using exome sequencing data in a large subsample. Furthermore, our numbers of individuals with a male sex chromosome aneuploidy are similar to those reported by UK Biobank, which found 355 such individuals (99.2% of our 358 cases). 12 GWAS array genotyping and exome sequencing are increasingly performed in clinical settings; however, male sex chromosome aneuploidy status is not routinely derived. In addition, the wide range of traits and diseases available in UK Biobank allowed us to systematically quantify the disease and phenotypic effects of male sex chromosome aneuploidy.
Limitations include the healthy volunteer bias of the UK Biobank sample and the as yet incomplete linkage to primary care health data. Hence, it is likely that the true disease risks associated with KS and 47,XYY are even higher than the substantial estimates that we observed.
In conclusion, our findings show that male sex chromosome aneuploidy can be reliably detected using GWAS or exome sequencing data. KS and 47,XYY were mostly unrecognized but conferred substantially higher risks of diverse potentially preventable diseases, including metabolic and vascular diseases, which were only partially explained by higher levels of BMI, deprivation, and smoking. Future studies should consider the utility of deriving male sex chromosome aneuploidy status when genetic testing is undertaken for existing clinical indications, eg, for thrombosis risk. Furthermore, our findings add significantly to ongoing debates regarding the potential benefits of wider population genetic screening.

Figure 1
Circos plot summarizing phenome-wide disease association tests for KS and 47,XYY compared with 46,XY. Each segment represents each International Classification of Diseases (ICD)-10 chapter in lexicographical order. P values (on a negative logarithmic scale) were from logistic regression models for KS (outer circle) and 47,XYY (inner circle) with each of 875 ICD-10 coded disease outcomes, adjusted for age and 10 principal genetic components. Outcomes reaching the multiple testing corrected statistical significance threshold (P < .05/875 = 5.7 × 10⁻⁵; dashed line) are indicated by large circles (for positive associations) and diamonds (for negative associations). AIODCE, arthropathies in other diseases classified elsewhere; CAFAC, cutaneous abscess, furuncle and carbuncle; DOAAACIDCE, disorders of arteries, arterioles and capillaries in diseases classified elsewhere; MABDDTUOT, mental and behavioral disorders due to use of tobacco; NEC, not elsewhere classified; OBAATCODCTOC, other bacterial agents as the cause of diseases classified to other chapters; OCOPD, other chronic obstructive pulmonary disease; OEAMD, other extrapyramidal and movement disorders; ONDOLVALN, other noninfective disorders of lymphatic vessels and lymph nodes; ONGAC, other noninfective gastroenteritis and colitis; OLIOSAST, other local infections of skin and subcutaneous tissue; OSCAMPN, other sex chromosome abnormalities, male phenotype, NEC; OWPF, osteoporosis without pathological fracture; PIDCE, polyneuropathy in diseases classified elsewhere; SASATCODCTOC, streptococcus and staphylococcus as the cause of diseases classified to other chapters; SDDOSS, specific developmental disorders of scholastic skills; UALRI, unspecified acute lower respiratory infection.

Table 1
Typical features of Klinefelter syndrome and 47,XYY compared with men with normal (46,XY) karyotypes

Table 2
Anthropometric characteristics of Klinefelter syndrome and 47,XYY compared with men with normal (46,XY) karyotypes
Beta, regression coefficient from linear regression models; BMI, body mass index; KS, Klinefelter syndrome; RR, relative risk from logistic regression models.
Y. Zhao et al.