Delineation of a KDM2B-related neurodevelopmental disorder and its associated DNA methylation signature

Purpose: Pathogenic variants in genes involved in the epigenetic machinery are an emerging cause of neurodevelopmental disorders (NDDs). Lysine-demethylase 2B (KDM2B) encodes an epigenetic regulator, and mouse models suggest an important role during development. We set out to determine whether KDM2B variants are associated with NDD.
Methods: Through international collaborations, we collected data on individuals with heterozygous KDM2B variants. We applied methylation arrays on peripheral blood DNA samples to determine a KDM2B-associated epigenetic signature.
Results: We recruited a total of 27 individuals with heterozygous variants in KDM2B. We present evidence, including a shared epigenetic signature, to support a pathogenic classification of 15 KDM2B variants and identify the CxxC domain as a mutational hotspot. Both loss-of-function and CxxC-domain missense variants present with a specific subepisignature. Moreover, the KDM2B episignature was identified in the context of a dual molecular diagnosis in multiple individuals. Our efforts resulted in a cohort of 21 individuals with heterozygous (likely) pathogenic variants. Individuals in this cohort present with developmental delay and/or intellectual disability; autism; attention deficit disorder/attention deficit hyperactivity disorder; congenital organ anomalies mainly of the heart, eyes, and urogenital system; and subtle facial dysmorphism.
Conclusion: Pathogenic heterozygous variants in KDM2B are associated with NDD and a specific epigenetic signature detectable in peripheral blood.


Introduction
Genes encoding epigenetic regulators are an emerging class of monogenic disease genes associated with neurodevelopmental disorders (NDDs). This group of disorders, collectively referred to as "Mendelian Disorders of the Epigenetic Machinery" (MDEMs), presents with neurodevelopmental delay, congenital malformations, and/or growth abnormalities. 1 For an increasing number of MDEMs, distinct genome-wide methylation signatures (episignatures) have been identified. 2 These signatures are valuable in clinical practice because they are unique for each disorder and can be detected in peripheral blood samples, providing a robust and easily accessible diagnostic tool that enables the diagnosis of uncharacterized individuals as well as the identification of novel pathogenic variants through pinpointing the causal gene. 3 The KDM2B gene (lysine-demethylase 2B, also called FBXL10, NDY1, CXXC2, and JHDM1B; OMIM 609078) encodes a well-studied component of the epigenetic machinery. The canonical, full-length KDM2B protein acts by demethylating lysine residues K4, K36, and K79 of histone H3. [4][5][6][7][8][9][10][11] This catalytic activity is provided by the JmjC domain, which is conserved from yeast to human. 4,9 The CxxC domain directs KDM2B to promoter regions by binding unmethylated CpG dinucleotides. [12][13][14] This DNA-binding capacity has been linked to the recruitment of the polycomb repressive complex 1 to developmental genes. 10,15,16 KDM2B has been implicated in many biological processes, including cell cycle regulation, metabolic regulation, and DNA-damage repair. 11,13,17,18 Moreover, in line with a central role in epigenetic and transcriptional regulation, KDM2B is essential for organism development and regulates cellular differentiation. 12 KDM2B, together with SETD1B, has been implicated as a disease-causing gene in the rare 12q24.31 microdeletion syndrome. 
[19][20][21] In addition, 2 sporadic patients and 1 family with heterozygous KDM2B missense variants have been described. [22][23][24] A monogenic KDM2B-related human disorder has, however, not been delineated, and the significance of the reported variants remains uncertain.

Inclusion criteria and data collection
Individuals were included based on the identification of a heterozygous KDM2B variant suspected to be pathogenic on the basis of in silico predictions and/or inheritance. Individuals carrying biallelic variants of uncertain significance (VUS) were not considered for this study.
Individual 1 was identified as the index patient after a KDM2B variant was annotated to be of interest after diagnostic trio exome sequencing. Families 2 to 5 were included after local, in-house database searches. All remaining individuals were included after personal communication, literature search, or from searches using the GeneMatcher platform. 25 For the published cases, we contacted the original authors for updated clinical information. For all individuals, clinical and genetic data were collected through a standardized spreadsheet, which was completed by the respective physicians and/or researchers.

Genetic variant detection
Variants in individuals/families 24, 25, 29, 30, and 34 were identified as described before. 19,20,22,24 The variant in individual 4.3 was identified through targeted Sanger sequencing. All other variants were detected through clinical and/or research-based exome sequencing.

Analysis and classification of KDM2B variants
Structural analysis of variants was performed using PyMOL (The PyMOL Molecular Graphics System, Version 2.5, Schrödinger, LLC). All figures were generated using PyMOL. Four in silico prediction algorithms were consulted: Sorting Intolerant from Tolerant, Metadome, MutationTaster, and Polymorphism Phenotyping v2. [26][27][28][29] All variants were manually analyzed using Alamut Visual v2.15 (Sophia Genetics). Variants were classified according to the 2015 American College of Medical Genetics and Genomics/Association for Molecular Pathology guidelines. 30 Episign results were used to support classification according to criterion Pathogenic Strong 3 (PS3), and Pathogenic Moderate 1 (PM1) was applied for variants in the CxxC domain.
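How criteria such as PS3 and PM1 combine into a final class can be illustrated with a minimal sketch of the pathogenic-side combining rules of the 2015 guidelines. The function name and the prefix-based parsing of criterion codes are assumptions made for illustration; benign criteria and strength modifications are ignored, and the sketch does not reproduce the classification workflow used in this study.

```python
from collections import Counter

def classify_pathogenic_side(criteria):
    """Apply the pathogenic-side combining rules of the 2015 ACMG/AMP guidelines.

    `criteria` is a list of codes such as ["PS3", "PM1", "PM2"].
    Benign criteria are deliberately not handled in this sketch.
    """
    strength = Counter()
    for code in criteria:
        if code.startswith("PVS"):
            strength["very_strong"] += 1
        elif code.startswith("PS"):
            strength["strong"] += 1
        elif code.startswith("PM"):
            strength["moderate"] += 1
        elif code.startswith("PP"):
            strength["supporting"] += 1
        else:
            raise ValueError(f"unsupported criterion: {code}")
    vs, s, m, p = (strength[k] for k in
                   ("very_strong", "strong", "moderate", "supporting"))

    # Rules for "pathogenic".
    pathogenic = (
        (vs >= 1 and (s >= 1 or m >= 2 or (m >= 1 and p >= 1) or p >= 2))
        or s >= 2
        or (s == 1 and (m >= 3 or (m == 2 and p >= 2) or (m == 1 and p >= 4)))
    )
    if pathogenic:
        return "pathogenic"

    # Rules for "likely pathogenic".
    likely = (
        (vs >= 1 and m == 1)
        or (s == 1 and 1 <= m <= 2)
        or (s == 1 and p >= 2)
        or m >= 3
        or (m == 2 and p >= 2)
        or (m == 1 and p >= 4)
    )
    return "likely pathogenic" if likely else "uncertain significance"

# Hypothetical combination, not taken from Supplemental Table 1.
print(classify_pathogenic_side(["PS3", "PM1", "PM2"]))
```

The criteria actually applied to each variant are listed in Supplemental Table 1; the combination shown in the last line is hypothetical.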

Development of episignature
Materials and methods associated with episignature development are provided as Supplemental Methods in the supplementary material.

Results
We initiated this study after the identification of a de novo c.1912G>A (p.Gly638Ser) variant in KDM2B (NM_032590.4; Table 1; Supplemental Table 1) through diagnostic trio exome sequencing in the index patient (individual 1), who was diagnosed with speech delay, autism spectrum disorder (ASD), and a congenital heart defect (CHD) (Table 2; Supplemental Table 2). The identified variant was absent from the Genome Aggregation Database (gnomAD), 31 was predicted to be damaging by multiple algorithms (Supplemental Table 1), and affects a well-conserved residue (Supplemental Figure 1A) located within the CxxC domain (Figure 1A and B). KDM2B presented as an outstanding candidate disease gene because the gene is intolerant to both putative loss-of-function (LoF) (observed/expected = 0.09 [0.05-0.18]) and missense (z-score = 3.44) variants in the general population. 31 In addition, several animal models presented with severe congenital defects. 12,[32][33][34][35] We therefore aimed to study additional individuals with heterozygous KDM2B variants and formed this cohort after online matchmaking using the GeneMatcher platform 25 (Tables 1 and 2). Our cohort encompassed 3 putative LoF variants, 13 missense variants, 1 in-frame deletion, and three 12q24.31 microdeletions. A total of 17 variants were confirmed to be de novo. Of note, one of these variants was identified twice and thus occurred de novo on 2 independent occasions (c.1847G>A; individuals 18 and 31). Four variants were inherited (families 3, 4, 5, and 25; Figure 1C and Supplemental Figure 1B). Four individuals with microdeletions have been reported previously: family 25 with an inherited 12q deletion 19 and individuals 29 and 30 with a de novo 12q deletion including KDM2B and SETD1B. 20 We observed a remarkable clustering of variants (8 missense and 1 in-frame deletion) in the DNA-binding CxxC domain (Figure 1A and B), of which 7 missense variants were predicted damaging by all algorithms. 
The only exception was p.Ile652Val, which was the only inherited variant among these and is reported twice in gnomAD. We performed structural modeling, which showed that this residue is located toward the surface and that the substitution is not expected to influence local structure. All other variants affecting the CxxC domain, however, are expected to affect protein function by either interfering with the binding of Zn2+ ions (p.Cys616, p.Cys627, and p.Cys630) or affecting the local structure (p.Gly638, p.Asp632, and p.Lys635; Figure 1B). The p.Val316Ile variant is likely located at the active site of the JmjC domain and is therefore expected to affect its catalytic function (Figure 1D).
Because of the role of KDM2B in the epigenetic machinery, we hypothesized that impaired KDM2B function leads to genome-wide changes in DNA methylation, an effect that has been observed in >50 other genetic syndromes. 2,3,36 These methylation changes present as disease-specific episignatures, which are detectable in peripheral blood. As such, episignatures provide not only fundamental insights into the molecular consequences of genetic variants but also easily accessible diagnostic tools to identify syndromes or reclassify VUS. 3 We generated genome-wide methylation array data for 21 individuals (Table 1; Supplemental Table 1) according to previously established protocols. 37 We excluded 2 samples because of technical errors (samples 2 and 4.2; Supplemental Figure 2D). Another 3 samples failed to group with case samples after cross validations and were excluded from the establishment of the episignature (samples 5.1, 14, and 19; Supplemental Figure 2B, D, and E). The variants in these 3 samples were previously classified as VUS on the basis of their inheritance and/or presence in gnomAD (Table 1; Supplemental Table 1).
van Jaarsveld et al. Page 4 Genet Med. Author manuscript; available in PMC 2023 January 08.
The remaining 15 samples, representing 13 variants, were used to establish a KDM2B episignature (Figure 2). Methylation patterns were assessed for sample quality, degree of methylation change, and statistical robustness of observed changes at each probe, allowing for effective modeling of the methylation differences observed between case samples and matched controls (see Materials and Methods). Comparisons were performed against age- and sex-matched controls, leading to the identification of 156 significantly differentially methylated probes (Figure 2A). Hierarchical clustering based on this probe set showed distinct clustering of case samples away from controls, with all case samples showing methylation profiles more similar to one another than to those of the matched controls (Figure 2B and C). Cross validation assays, based on the removal of each single sample from the probe selection training process, confirmed that the probe set was able to effectively identify KDM2B variants because all case samples remained grouped together in each iteration (Supplemental Figure 3B). In conclusion, we have established an episignature that is able to discriminate KDM2B variants from controls and classified these 13 variants as pathogenic (Table 1). Interestingly, the KDM2B-associated episignature mainly consisted of hypermethylated probes (Figure 2A).
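The leave-one-out logic of the cross validation described above can be sketched in a few lines. The probe selection rule (absolute mean difference above a threshold), the nearest-centroid grouping, and all names and toy values below are assumptions for illustration only; the actual procedure is described in the Supplemental Methods.

```python
from statistics import mean

def select_probes(cases, controls, min_diff=0.10):
    """Return indices of probes whose mean beta value differs by >= min_diff."""
    selected = []
    for i in range(len(cases[0])):
        diff = mean(s[i] for s in cases) - mean(s[i] for s in controls)
        if abs(diff) >= min_diff:
            selected.append(i)
    return selected

def groups_with_cases(sample, cases, controls, probes):
    """True if `sample` is closer to the case centroid than to the control centroid."""
    def distance(group):
        return sum((sample[i] - mean(s[i] for s in group)) ** 2 for i in probes)
    return distance(cases) < distance(controls)

def leave_one_out(cases, controls, min_diff=0.10):
    """Hold out each case in turn, reselect probes, and test the held-out sample."""
    results = []
    for k, held_out in enumerate(cases):
        training = cases[:k] + cases[k + 1:]
        probes = select_probes(training, controls, min_diff)
        results.append(
            bool(probes)
            and groups_with_cases(held_out, training, controls, probes)
        )
    return results

# Toy beta values (rows = samples, columns = probes); not study data.
toy_cases = [[0.80, 0.50, 0.75], [0.78, 0.52, 0.70], [0.82, 0.49, 0.77]]
toy_controls = [[0.60, 0.50, 0.55], [0.62, 0.51, 0.58], [0.59, 0.48, 0.56]]
print(leave_one_out(toy_cases, toy_controls))
```

Reselecting the probes inside each iteration is the point of the design: the held-out sample never contributes to the probe set against which it is evaluated.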
We next tested the sensitivity and specificity of the episignature using a support vector machine. For each sample, we determined a methylation variant pathogenicity (MVP) score between 0 and 1 on the basis of matching the KDM2B episignature. All KDM2B samples included in the training set received scores of >0.8, whereas control samples remained near 0, indicating high sensitivity for the detection of the KDM2B episignature (Supplemental Figure 3C). Specificity was tested using a similar classifier that was instead trained against a large number of samples with confirmed diagnoses of non-KDM2B-related disorders from our Episign knowledge database. In total, 75% of both case and control samples were used for training the classifier, with the remaining 25% reserved for testing (Figure 2D). Case samples again scored high (>0.85), whereas the remaining samples scored low (<0.5), with few exceptions. The most notable exceptions were cerebellar ataxia, deafness, and narcolepsy (ADCADN; OMIM 604121), Hunter-McAlpine syndrome (HMA; OMIM 601379), and dystonia 28, childhood-onset (DYT28; OMIM 617284). One other sample among the control samples scored remarkably high for the KDM2B signature (Figure 2D, red arrowhead). This patient was previously diagnosed with intellectual developmental disorder with seizures and language delay (IDDSELD; OMIM 619000), a disorder caused by SETD1B variants. Upon closer investigation, we identified this sample to originate from a case with a 12q24.31 microdeletion including SETD1B but not KDM2B. 20,38 We hypothesize that the deletion might have affected KDM2B regulatory regions. For reference, we have included the clinical description of this individual (individual 33, Supplemental Table 2).
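The 75%/25% partitioning of case and control samples, and the reading of MVP scores as sensitivity and specificity, can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names, the random seed, the 0.5 decision threshold, and the toy scores are assumptions and do not reproduce the support vector machine used in the study.

```python
import random

def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices so each class contributes train_fraction to training."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)

def sensitivity_specificity(scores, labels, threshold=0.5):
    """Treat scores above `threshold` as positive calls; labels are 'case'/'control'."""
    pairs = list(zip(scores, labels))
    tp = sum(s > threshold and lab == "case" for s, lab in pairs)
    fn = sum(s <= threshold and lab == "case" for s, lab in pairs)
    tn = sum(s <= threshold and lab == "control" for s, lab in pairs)
    fp = sum(s > threshold and lab == "control" for s, lab in pairs)
    return tp / (tp + fn), tn / (tn + fp)

# Toy labels and scores (hypothetical, not study data).
labels = ["case"] * 8 + ["control"] * 12
train_idx, test_idx = stratified_split(labels)
scores = [0.90, 0.95, 0.88, 0.92, 0.97, 0.86, 0.91, 0.93] + [0.05] * 11 + [0.70]
print(len(train_idx), len(test_idx))
print(sensitivity_specificity(scores, labels))
```

Splitting within each class keeps the case-to-control ratio the same in the training and testing sets, which matters when cases are far fewer than controls.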
A total of 4 KDM2B variants within our cohort tested negative for the signature. Among the negative samples are 3 missense variants for which pathogenicity was doubtful based on the a priori predictions and/or gnomAD data. The other negative sample in our cohort was that of the only splice-site variant, indicating that the predicted splice effects do not occur or at least not to a level that interferes with gene functionality. We classified these 4 variants as VUS because a negative episignature result does not suffice to infer an absence of functional effects (Table 1).
Of note, 2 individuals in our cohort with microdeletions encompassing both KDM2B and SETD1B (individuals 29 and 30) were previously shown to have the SETD1B-associated episignature. 20

[...] was from a girl who was previously diagnosed with Phelan-McDermid syndrome (OMIM 606232) due to a 22q13 deletion. These results indicate that multiple episignatures can coexist in a single individual and that the method is able to correctly identify 2 syndromes independently.
The clustering of missense variants in the CxxC domain suggests that these variants affect KDM2B function differently from LoF variants. We therefore trained a separate classifier using 5 CxxC domain missense samples (Figure 3; Supplemental Table 1). Interestingly, the resulting probeset not only correctly differentiated all KDM2B variants (ie, including the LoF variants) from controls but also was able to distinguish CxxC and LoF variants (Figure 3B and C). Of note, 106 of the 107 significant probes selected for the CxxC-trained episignature were hypermethylated, with an average methylation increase even exceeding that of the hypermethylated probes of the pan-KDM2B probeset (mean methylation difference of all hypermethylated probes: 16.56% ± 4.21% vs 10.38% ± 3.68%; Figure 2A and Figure 3A). Importantly, a probeset trained using LoF samples was able to discriminate CxxC from LoF variants as well (Supplemental Figure 4). In conclusion, CxxC missense variants cause a distinct episignature that is associated with increased hypermethylation levels, suggesting an additional effect of these variants on KDM2B functioning.
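The summary statistic quoted above (mean methylation difference ± SD across hypermethylated probes) can be computed as in the following sketch. The function name and the toy beta values are assumptions for illustration; the text does not state whether a sample or population standard deviation was reported, so the sample standard deviation is assumed here.

```python
from statistics import mean, stdev

def hypermethylation_summary(case_means, control_means):
    """Return (n_hyper, n_total, mean_diff_pct, sd_diff_pct) for hypermethylated probes.

    Inputs are per-probe mean beta values (0-1) for cases and controls.
    """
    diffs = [100 * (c - k) for c, k in zip(case_means, control_means)]
    hyper = [d for d in diffs if d > 0]  # probes with higher methylation in cases
    return len(hyper), len(diffs), mean(hyper), stdev(hyper)

# Toy per-probe means (hypothetical, not study data).
case_means = [0.70, 0.65, 0.80, 0.40]
control_means = [0.55, 0.45, 0.68, 0.50]
n_hyper, n_total, avg, sd = hypermethylation_summary(case_means, control_means)
print(f"{n_hyper}/{n_total} hypermethylated; {avg:.2f}% ± {sd:.2f}%")
```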
Next, we used the American College of Medical Genetics and Genomics/Association for Molecular Pathology guidelines 30 to reclassify 4 variants that were not tested for the episignature (Supplemental Table 1). Importantly, we considered the CxxC domain as an established hotspot for pathogenic variation in KDM2B (criterion PM1). We reclassified 2 variants as pathogenic, 1 variant as likely pathogenic, and 1 as VUS (Table 1; Supplemental Table 1). In summary, we classified 15 variants as pathogenic and 1 variant as likely pathogenic, and 6 variants remained VUS (Table 1; Supplemental Table 1, Supplemental Figure 5).
Clinical data of all individuals are systematically presented in Table 2. A more detailed description of each individual and their clinical history is provided as supplemental material. Data from previously described individuals are included in Supplemental Table 2 as well. Of the 21 individuals with a (likely) pathogenic variant, 4 had a microdeletion and 2 had an additional genetic diagnosis. The remaining 15 individuals all presented with speech delay, developmental delay (DD), learning difficulties, and/or intellectual disability (ID). Behavioral concerns such as ASD and attention deficit hyperactivity disorder were common (9/15). Growth parameters were within the normal range for most patients. We observed several congenital defects, including heart defects (7/15), unilateral kidney agenesis (4/15), and ophthalmological anomalies (6/14). Two patients had cryptorchidism and 2 had epilepsy. We collected facial photographs of 12 of the 21 individuals (Figure 4). Facial features noted in several individuals with CxxC-domain variants were a broad nasal tip, large ear lobes, and exaggerated Cupid's bow. Interestingly, in the individuals with LoF variants the nose was often more prominent, with a narrow nasal ridge and malar flattening, with the exception of individual 10 who also had a diagnosis of Noonan syndrome. Despite these subtle features, no consistent facial gestalt could be identified. In summary, we have delineated a novel syndrome that is caused by heterozygous KDM2B variants and characterized clinically by DD/ID; behavioral challenges including autism and attention deficit hyperactivity disorder; congenital anomalies mainly of the heart, urogenital system, and eyes; and variable facial dysmorphism.
Because our methylation analysis revealed differences between CxxC and LoF variants, we studied potential genotype-phenotype relationships for these subgroups. Unilateral kidney agenesis and eye anomalies were only reported in CxxC cases. In addition, CHDs were present in 6 individuals with a CxxC variant (6/9) and only in 2 individuals with a LoF variant (2/8). Of note, the latter were 1 individual with Noonan syndrome and 1 with Phelan-McDermid syndrome, both of which are associated with CHD. We thus note that congenital organ anomalies might be overrepresented in individuals with CxxC variants; however, the current limited number of available cases precludes drawing any conclusions. Epilepsy did not occur in association with the CxxC-domain variants but occurred in 1 patient with a JmjC-domain variant, 1 patient with a frame-shift variant, and 2 patients with a 12q24.31 microdeletion.
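As an illustration of why such small counts preclude firm conclusions, the CHD frequencies given above (6/9 with a CxxC variant vs 2/8 with a LoF variant) can be put through a two-sided Fisher exact test. This test was not part of the reported analysis; the function below is a minimal standard-library sketch with an assumed name.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(k):
        # Number of tables with k in the top-left cell and the same margins.
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    # Sum over all tables that are no more likely than the observed one.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(row1 + row2, col1)

# CHD present/absent: CxxC variants 6/3, LoF variants 2/6 (counts from the text).
p_value = fisher_exact_two_sided(6, 3, 2, 6)
print(f"P = {p_value:.3f}")
```

For these counts the sketch gives P of approximately .15, in keeping with the statement that the limited number of cases precludes conclusions.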

Discussion
We describe a novel NDD caused by heterozygous pathogenic variants in KDM2B and present an episignature associated with the disorder. We identify the CxxC domain as a mutational hotspot, and variants affecting this domain are associated with a specific episignature. In line with other MDEMs, individuals with pathogenic variants in KDM2B present with variable phenotypic expression, including DD/ID, congenital organ anomalies, and/or facial dysmorphisms. Most pathogenic variants are of de novo origin; however, in 3 families, the variant was found to be inherited. Interestingly, individual 25.2 is an adult male with a 12q24.31 microdeletion, who did not seem to be clinically affected even though the KDM2B episignature was present in his sample (data not shown). These observations might reflect the variable expression of this disorder, although we also consider alternative hypotheses. Because we were unable to test the grandparents, we cannot exclude germline mosaicism. Alternatively, all inherited variants originated from the father, possibly indicating that males are less severely affected. In mice, Kdm2b has been shown to be involved in X-chromosome silencing, 39 and as such, a different clinical expression in males vs females seems plausible. Larger cohorts are needed to fully encompass the phenotypes and penetrance associated with the disorder and to refine its (domain-)specific episignature(s).
For the KDM2B episignature, we noticed elevated MVP scores for 3 other disorders (Figure 2D). The first was ADCADN, caused by pathogenic variants in DNMT1, a methyltransferase known as the central player in the maintenance of CpG methylation. 2,40 Interestingly, DNMT1 has been suggested to regulate H3K4 methylation, providing a direct mechanistic link with KDM2B. 41 The second was HMA, a syndrome associated with duplication of 5q35. 42,43 This region includes NSD1, which encodes a lysine methyltransferase known to methylate H3K36, 44 [...] Hypermethylation has previously been linked to KDM2B because loss of Kdm2b in mouse embryonic stem cells leads to genome-wide hypermethylation of promoters associated with polycomb repressive complexes, implying that Kdm2b protects against de novo DNA methylation during mouse development. 12 Interestingly, re-expression of a short Kdm2b isoform, lacking the catalytic JmjC domain, resets the methylation of CpG islands to baseline levels and rescues embryonic lethality. 12 This short isoform is highly expressed in mouse embryonic stem cells, 34 suggesting important functions of Kdm2b besides lysine demethylation activity.
The clustering of variants in the CxxC domain, an excess of congenital defects, and a distinct episignature with even further elevated levels of hypermethylation associated with CxxC-domain variants are suggestive of a (partial) dominant-negative effect of these variants. Interestingly, the CxxC domain has been specifically implicated in the developmental functions of KDM2B because CxxC-domain mutants fail to rescue cellular differentiation induced by Kdm2b depletion in mouse embryonic stem cells, 10 and specific deletion of the CxxC domain induces developmental defects in the heterozygous state, whereas heterozygous knock-outs appear healthy. 32,33 Recently, a heterozygous conditional mutant mouse with loss of the CxxC domain in the developing brain was shown to exhibit impaired memory and ASD-like behaviors. 49 These animal models therefore share many features with the human syndrome and also point to the CxxC domain harboring important developmental functions. Further studies are necessary to fully understand the broad effect of KDM2B on human development.
The KDM2B-associated NDD represents a novel addition to the emerging group of MDEMs. 1 Its associated episignature can aid in the reclassification of VUS and in the detection of variants missed during routine diagnostic testing, eg, variants outside coding regions or inherited variants missed by standard trio filtering.
Table 1 Overview of KDM2B variants in the cohort
Table 2 An overview of the phenotypes associated with