Delineating the molecular and phenotypic spectrum of the SETD1B-related syndrome

Purpose Pathogenic variants in SETD1B have been associated with a syndromic neurodevelopmental disorder including intellectual disability, language delay, and seizures. To date, clinical features have been described for 11 patients with (likely) pathogenic SETD1B sequence variants. This study aims to further delineate the spectrum of the SETD1B-related syndrome based on characterizing an expanded patient cohort. Methods We perform an in-depth clinical characterization of a cohort of 36 unpublished individuals with SETD1B sequence variants, describing their molecular and phenotypic spectrum. Selected variants were functionally tested using in vitro and genome-wide methylation assays. Results Our data present evidence for a loss-of-function mechanism of SETD1B variants, resulting in a core clinical phenotype of global developmental delay, language delay including regression, intellectual disability, autism and other behavioral issues, and variable epilepsy phenotypes. Developmental delay appeared to precede seizure onset, suggesting SETD1B dysfunction impacts physiological neurodevelopment even in the absence of epileptic activity. Males are significantly overrepresented and more severely affected, and we speculate that sex-linked traits could affect susceptibility to penetrance and the clinical spectrum of SETD1B variants. Conclusion Insights from this extensive cohort will facilitate the counseling regarding the molecular and phenotypic landscape of newly diagnosed patients with the SETD1B-related syndrome.


Next generation sequencing analysis and recruitment of individuals
The results of Sanger sequencing confirmation are shown in Supplementary Figure S1 for the nine individuals for whom this was available.

Individual 1 and 36:
Exome sequencing of DNA extracted from leukocytes was carried out for the proband and their parents using Human Core Exome (Twist Bioscience) capture followed by 2*150bp Illumina sequencing. Variant calling was performed with GATK 3.8 and variants were annotated using Alamut-Batch (v1.11). De novo, X-linked recessive, homozygous and compound heterozygous variants inherited in trans were identified for analysis in the proband using a gene-agnostic trio bioinformatics pipeline. Orthogonal validation was performed by targeted Sanger sequencing of SETD1B in all three family members of individual 1.

Individual 2 and individual 6:
This study was approved by local institutional IRB/ethical review boards (Project ID: 07/N018, REC Ref: 07/Q0512/26), and written informed consent was obtained prior to genetic testing from the families involved. Clinical details were obtained through medical file review and clinical examination.
Genomic DNA was extracted from peripheral blood samples according to standard procedures of phenol chloroform extraction. WES on each proband was performed as described elsewhere 1 in Macrogen, Korea. Briefly, target enrichment was performed with 2 μg genomic DNA using the SureSelectXT Human All Exon Kit version 6 (Agilent Technologies, Santa Clara, CA, USA) to generate barcoded whole-exome sequencing libraries. Libraries  Centre.

Individual 3:
The study was approved by the ethics committees of the Hospital District of Helsinki and Uusimaa and the Institutional review board of Columbia University, New York (IRB-AAAS3433).
For the family of Individual 3, DNA samples from the affected male individual and both parents underwent exome sequencing. Exomic libraries were prepared using the SureSelect Human All Exon V6 kit (60.46 Mb target region) and paired-end sequencing was performed on a HiSeq2500/4000 instrument (Illumina Inc, San Diego, CA, USA), with an average sequencing depth on target regions of 68x. Low-quality reads were removed and the filtered reads were aligned to the human reference genome (GRCh37/hg19) using Burrows-Wheeler Aligner-MEM (BWA) 2 . Duplicate removal, insertion/deletion (InDel) realignment and base quality score recalibration were performed with Picard-tools and the Genome Analysis Toolkit (GATK). Single nucleotide variants (SNVs) and InDels were called by the GATK HaplotypeCaller 3 . Copy number variants (CNVs) were called in the exome data from the affected individual using CONiFER (v0.2.2) 4 . As part of the quality control, family relations and sex were confirmed using VCFtools and plink 5,6 . SNV/InDel variant annotation and filtering were performed using ANNOVAR 7 and custom scripts. Variants were filtered by first retaining exonic and splice region variants and then by variant segregation (e.g. autosomal recessive, de novo, X-linked). Next, variants with a predicted effect on protein function or pre-mRNA splicing (missense, frameshift, nonsense, start-loss, splicing, etc.) and a population-specific minor allele frequency (MAF) of <0.005 (for AR) or <0.0005 (for AD) in all populations of the Genome Aggregation Database (gnomAD) 8 were retained.
Last, bioinformatic prediction scores were annotated from dbnsfp35a and dbscSNV1.1 to evaluate missense and splice site variants respectively 9,10 . For CNVs, gene annotation was done using the BioMart Database 11 and variant frequency was assessed using the Database of Genomic Variants 12 and gnomAD 8 using the same frequency cut-offs as above for SNV/InDels. SNV/InDel variants were confirmed using Sanger sequencing using an ABI3130XL Genetic Analyzer. No other candidate variants than the SETD1B variant were identified in the analysis.
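
The frequency- and consequence-based filtering step described above can be illustrated with a minimal Python sketch. It is not the custom scripts used in the study: the `Variant` class, the `popmax_maf` field and the consequence labels are hypothetical names introduced for illustration. Only the two MAF cut-offs (<0.005 for AR, <0.0005 for AD) are taken from the text.

```python
from dataclasses import dataclass

# MAF cut-offs from the text: <0.005 for autosomal recessive (AR) models,
# <0.0005 for autosomal dominant (AD) models.
MAF_CUTOFF = {"AR": 0.005, "AD": 0.0005}

# Hypothetical consequence labels; real annotation tools use their own terms.
PREDICTED_EFFECT = {"missense", "frameshift", "nonsense", "start-loss", "splicing"}


@dataclass
class Variant:
    gene: str
    consequence: str
    popmax_maf: float  # assumed field: highest MAF across gnomAD populations


def passes_filter(variant: Variant, model: str) -> bool:
    """Keep variants with a predicted effect whose MAF is below the model cut-off."""
    return (
        variant.consequence in PREDICTED_EFFECT
        and variant.popmax_maf < MAF_CUTOFF[model]
    )


def filter_variants(variants, model):
    """Return the variants that pass the filter, preserving input order."""
    return [v for v in variants if passes_filter(v, model)]
```

Using the highest MAF across populations is one way to express "below the cut-off in all populations"; a pipeline that stores per-population frequencies would compare each one separately.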

Individual 4:
Diagnostic trio whole exome sequencing was performed using the Agilent SureSelect v5 capture kit followed by sequencing on an Illumina Hiseq2500 platform (outsourced to GenomeScan, Leiden, The Netherlands). Analysis was performed in the LUMC's clinical genetic laboratory using a GATK-based pipeline and in-house developed analysis software (LOVDplus). No other candidate variants than the SETD1B variant were identified in the analysis.

Individuals 5, 7 and 33:
Diagnostic trio whole exome sequencing was done as previously described 13 .

Individual 8 and 9:
Genomic DNA was isolated from peripheral blood leukocytes. Library preparation was using the Filtering of variants was done using Alissa Interpret (Agilent Technologies). Variants with < 5 reads, a frequency of more than 1% in public (ESP, dbSNP, 1KG) and/or in house databases were excluded. De novo, homozygous or compound heterozygous variants present in exons or within +/-6 nt in the intron were evaluated.
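
As an illustration only, the exclusion criteria above can be written as a single predicate. This sketch is not the Alissa Interpret configuration. The function name, its arguments and the representation of intronic position as a signed offset from the nearest exon boundary (0 for exonic variants) are assumptions, as is the reading of "< 5 reads" as the read count supporting the variant.

```python
def keep_variant(read_count: int, max_db_frequency: float, intronic_offset: int) -> bool:
    """Apply the read-count, frequency and position criteria described in the text.

    read_count: reads supporting the variant (assumed interpretation).
    max_db_frequency: highest frequency across public and in-house databases.
    intronic_offset: 0 for exonic variants, otherwise the signed distance in
        nucleotides from the nearest exon boundary (assumed representation).
    """
    if read_count < 5:
        return False  # variants with fewer than 5 reads are excluded
    if max_db_frequency > 0.01:
        return False  # frequency of more than 1% is excluded
    return abs(intronic_offset) <= 6  # exonic or within +/-6 nt in the intron
```

The inheritance criterion (de novo, homozygous or compound heterozygous) would be applied separately on the trio genotypes and is not modeled here.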

Individual 22:
The study was approved by the Pediatric Ethics Committee of the Tuscany Region, in the context of the DESIRE project (Seventh Framework Programme FP7; grant agreement no. 602531). We performed trio-exome sequencing as previously reported 19 (Vetro et al, 2020). For the annotation and filtering of exonic/splice-site single-nucleotide variants (SNVs) and coding InDels we used commercially available software (VarSeq, Golden Helix, Inc v1.4.6), focusing on nonsynonymous/splice site variants with minor allele frequency (MAF) lower than 0.01 in the gnomAD database (http://gnomad.broadinstitute.org/). We further excluded population-specific variants by interrogating our internal database (WES data from over 900 patients with DEE and 200 healthy parents) and evaluated the potential functional impact of SNVs and InDels by the pre-computed genomic variants score from dbNSFP 21 , which was integrated in the annotation pipeline. We also manually interrogated in-silico prediction tools [22][23][24] , as well as evolutionary conservation scores 25,26 .
For selected variants, we visually inspected the quality of read alignment using the Integrative Genomics Viewer 27 and then proceeded to validation by Sanger sequencing (primers and conditions are available upon request).

Individual 23:
Written informed consent was obtained for all participants in this study under a research protocol approved by the Institutional Review Board at Nationwide Children's Hospital (IRB18-00662, "Gene Discovery in Clinical Genomic Patients"). Paired-end genome sequencing libraries were constructed for DNA from the proband, mother, and father using NEBNext Ultra II FS DNA Library Prep Kit (New England BioLabs). Whole-genome sequencing was performed on an Illumina NovaSeq6000 instrument according to manufacturer protocols. Reads were mapped to the GRCh37 reference sequence and secondary data analysis was performed using Churchill 28 . The average sequence depth achieved per sample was ~33x. Our general approach to variant annotation and prioritization has already been described 29 ; for this case we prioritized rare nonsynonymous coding variants under several possible inheritance models: dominant (de novo), recessive (homozygous or compound heterozygous), and X-linked (hemizygous). We identified two de novo coding variants in the proband: hg19:chr12-122261055-C-T, NM_001353345.2:c.4570C>T SETD1B:(p.Arg1524Ter) and hg19:chr1-179314193-T-C, NM_003101.6:c.1099T>C: SOAT1:(p.Phe367Leu).

Individual 26 and 28:
Diagnostic exome sequencing was done at the Departments of Human Genetics of the Radboud University Medical Center Nijmegen, The Netherlands, and performed essentially as described previously 30 . This study was approved by the institutional review board 'Commissie Mensgebonden Onderzoek Regio Arnhem-Nijmegen' under number 2011/188.

Individual 31:
Trio-exome sequencing of individual 31 was performed as previously described 30 .

Individual 32:
DNA was extracted from peripheral blood using the Promega Maxwell RSC DNA Extraction Kit. The Clinical Exome Sequencing (CES) library was generated using the Agilent SureSelect Human All Exon V6 plus a custom mitochondrial genome capture kit. Captured DNA fragments were then sequenced using the Illumina Nextseq 500 or HiSeq 4000 sequencing system, with 2x100 basepair (bp) paired-end reads. Single nucleotide variants (SNVs) and small insertions and deletions (<10 bp) were detected by mapping and comparing the DNA sequences with the human reference genome (GRCh37/hg19).

Variant confirmation by Sanger sequencing is performed for all insertions and deletions as well as substitutions that do not meet the laboratory's coverage and quality score thresholds.

The beads were washed 3x with the lysis buffer and SETD1B constructs were eluted in lysis buffer supplemented with 20mM L-glutathione after 10min incubation at RT.

Thermal shift assay
Thermal shift assay was performed according to a previously published protocol 33

Severity scoring
For each individual, a phenotype severity score (Supplementary

Genome-wide methylation profiles and data analysis
Genome-wide methylation profiles were obtained using the Infinium MethylationEPIC BeadChip array (Illumina) 34 . The SETD1B EpiSignature has been implemented in the clinical genome-wide DNA methylation assay, "EpiSign", and methylation profiles were analysed using the Multiclass Classification Algorithm of EpiSign v2 35 .

After an uneventful pregnancy and home delivery, a boy was born at gestational age of 41+1 weeks with a birth weight of 4.5 kg. On day 1 after birth he experienced episodes of apnea with desaturation and tonic seizures. A cerebral ultrasound and MRI were performed, showing cystic encephalomalacia at the right side of the brain with bilateral ventriculomegaly and displacement of the brainstem.
Fenestration of the cyst occurred and an Ommaya shunt was inserted. Seizures were treated with phenobarbitone for 2 months. At the age of 11 months focal motor seizures reoccurred, and antiepileptic drugs were restarted (valproic acid and clobazam). In the year thereafter, valproic acid was switched to oxcarbazepine with good effect. Because of exotropia, he had strabismus surgery at the age of 6 years. He developed some form of speech (copying words and singing short lines) from the age of 2 years, but lost this ability once he started walking. His motor development was severely delayed (ambulation at 4 years). Currently, at the age of 11, he produces sounds but no words.
Furthermore, he has pronounced joint hypermobility and severe constipation, for which a high dose macrogol is required. He has autistiform behavior but does not meet diagnostic criteria. Behavioral and sleeping problems seem to increase throughout the winter period. Of note, the patient is also status post normal chromosome microarray, fragile X testing, MPS screening, inborn errors of metabolism screening, and brain MRI. Most recent thyroid ultrasound was normal.

Individual 16 is currently a 1-year-old female, born as the first child to non-consanguineous Chinese parents at 38+1 weeks of gestation, with a birth weight of 2.26 kg and a good start (APGAR 10/10/10).
Delivery was by cesarean section due to the mother's preeclampsia. After birth, she was hospitalized for treatment because of neonatal pneumonia, hyperglycemia, conjunctivitis, low birth weight and patent arterial duct. One day before she was admitted to hospital at the age of 3 months and 12 days, she had a high fever (39.6 °C) of unknown cause. Her parents tried to physically lower her body temperature by rubbing with alcohol and treated her with antibiotics, and her body temperature gradually decreased to normal. Twelve hours before admission, she developed sighing breathing lasting 5-10 minutes, followed by seizures with loss of consciousness, with right upper limb convulsion and tonic-clonic seizures of the left upper limb and both lower limbs, while her head moved leftwards and both eyes gazed leftwards, with a pale face, purple lips, and a tightly closed mouth. The seizure lasted about 10 minutes, and her body temperature was 37.9 °C at that time. She was unconscious on admission to the hospital. Lung CT showed signs of a bilobular pneumonia and blood work suggested mild anemia and septicemia. She could raise her head steadily by 3 months and is able to smile, but is not very active and does not follow light or moving objects well. Her physical examination showed hypertonia, with high deep tendon reflexes (knee and

c.5820_5826delCTATGAC, p.Tyr1941fs variant. He presented at 1 1/2 years of age with medically intractable Lennox-Gastaut syndrome. He was noted to be globally developmentally delayed since infancy and is affected with intellectual impairment and autistic features. Currently he is becoming less verbal and there is a concern for a regression of expressive language skills. He has fuller cheeks, tapered fingers, thoracolumbar scoliosis and pes planus.

Individual 35 has a history of early developmental delay, medically refractory absence epilepsy with a myoclonic component, and autism spectrum disorder. He was delivered at term with a birth weight of 7 pounds 11 ounces.
He experienced early GE-reflux and underwent a frenectomy as an infant. He had early developmental issues and did not walk until starting early intervention therapies at 14 months of age. He also had a significant language delay. His parents began to notice brief staring spells sometimes with a larger jerk in his infancy. This was subsequently diagnosed as epilepsy around the age of 3 years.
He was found on EEG to have generalized polyspike spike-wave discharges. Since then he has had medically refractory absence epilepsy with a myoclonic component but also has had generalized tonic-clonic seizures. He was thought initially not to have autism, but as he got older his processing issues and perseverative behaviors became more evident and he was diagnosed with autism spectrum disorder at the age of 4. He has never had regression.