## Abstract

### Purpose

Fetal fraction (FF) is the percent of cell-free DNA (cfDNA) in the mother’s peripheral blood that is of fetal origin, which plays a pivotal role in noninvasive prenatal screening (NIPS). We present a method that can reliably estimate FFs by examining autosome single-nucleotide polymorphisms (SNPs).

### Methods

Even at a very low sequencing depth, there are plenty of SNPs covered by more than one read. At those SNPs, we define read heterozygosity and demonstrate that the percent of read heterozygosity is a function of FF, which allows FF to be inferred.

### Results

We first demonstrated the effectiveness of our method in inferring FF. Then we used the inferred FF as an informative alternative prior to compute Bayes factors to test for aneuploidy, and observed better power than the *Z*-test. In analysis of clinical samples, we were able to identify female–male twins thanks to the accurate FF inference.

### Conclusion

Knowing FF improves efficacy of NIPS. It brings a powerful Bayesian method, allows “no call” for samples with small FFs, renders screening for XXY syndrome simpler, and permits an adaptive design to sequence at a higher depth for samples with small FFs.

## INTRODUCTION

Fetal fraction (FF) is the percent of cell-free DNA (cfDNA) in maternal peripheral blood that comes from the placenta of the fetus. FF plays a pivotal role in noninvasive prenatal screening (NIPS), which aims to examine whether a given chromosome is trisomic in the fetus. The American College of Medical Genetics and Genomics (ACMG) recommended in its position statement that all laboratories should include a clearly visible FF on the NIPS report, and that all laboratories should establish and monitor analytical and clinical validations for FF.^{1} The lower limit of FF for maintaining a reliable result is approximately 4%, and a low FF in maternal circulation was associated with an increased risk of fetal aneuploidies.^{2} Thus, ACMG recommends that a no call due to low FF be specified in the NIPS report.^{1}

Given the importance of FF in NIPS, many methods have been proposed to estimate it. Early methods examined DNA sequences from the Y chromosome, using either polymerase chain reaction (PCR)^{3} or massively parallel sequencing.^{4} Since methods based on the Y chromosome can only be applied to male fetuses, researchers focused on developing methods that apply to both male and female fetuses. Some used the fragment size of cfDNA, where fetal cfDNA is generally shorter than maternal cfDNA;^{5} some explored the methylation differences between fetal and maternal cfDNA.^{6,7} However, these methods were not yet accurate enough for practical use. There are two very promising and elegant approaches that require no additional data. One explores the difference between DNA digestion of fetal and maternal cfDNA.^{8} The other explores the fact that fetal cfDNA sequences are not uniformly distributed along the genome, presumably because actively expressed genomic regions were digested faster than dormant regions. But these two methods present difficulties for smaller FFs.

The most successful methods are those that utilize inheritance patterns in single-nucleotide polymorphism (SNP) markers. Early methods assumed that both maternal and paternal genotypes are known; at a set of loci where the father is AA and the mother is BB, one can deduce FF by counting reads carrying As and Bs at those loci in sequencing data of maternal cfDNA.^{10,11} A recent method only genotyped the mother. It first identified maternal homozygous loci, tallied and computed nonmaternal allele fractions at these loci from low-coverage sequencing data of maternal plasma, and used these to train a linear model to predict FF.^{12} High-depth targeted sequencing of maternal plasma was also successfully used to determine FF.^{13} Because of the high sequencing depth, minor allele frequencies can be reliably estimated, the informative maternal–fetal joint genotypes (one homozygous, one heterozygous) can be inferred, and from these FF can be determined. These methods, however, are not completely satisfactory, as they are either not accurate enough, not cost-effective, or too laborious.

In this paper we introduce a statistical method that can infer FF using low-depth sequencing of maternal cfDNA. The precision of the FF inference can be greatly improved if we also sequence maternal white blood cell DNA (wbcDNA) at a low depth. Our method is based on read heterozygosity (RHet). A SNP locus is called RHet if it is covered by at least two reads carrying different alleles. RHet is determined mainly by the unobserved maternal–fetal joint genotypes and the FF. Other contributing factors include the inbreeding coefficient of the mother, the inbreeding coefficient of the fetus, the sequencing error rate, and SNP allele frequencies of the reference population. Sequencing of maternal wbcDNA contributes to the FF inference in two ways: it can be used to infer the maternal inbreeding coefficient, and reads from the maternal cfDNA and wbcDNA sequencing can be combined to infer a diluted FF from a higher-coverage data set, which is then scaled back.

The traditional *Z*-test method for NIPS first estimates chromosomal dosages for a sample (assuming a euploid mother carrying a single fetus), then calculates the deviation of the chromosomal dosages from the mean dosage of a set of euploid controls (euploid mothers each carrying a euploid fetus), and lastly normalizes the deviation by the sample standard deviation of the euploid controls to obtain a *Z*-score. Samples with *Z* > 3 are declared trisomy positive. The cutoff of 3 is chosen such that the false positive rate is about 0.001 if the *Z*-score is normally distributed. If the fetus is trisomic, then the deviation is expected to be the FF. Thus, the higher the FF, the higher the power to detect true trisomy.

When FF is known (denoted by *h*), a more powerful test can be developed. The *Z*-test only compares the deviation of the centered chromosomal dosage from 0. Since for a trisomy sample the chromosomal dosage is expected to be *h* above the mean dosage of euploid controls, we can also compare how close the centered chromosomal dosage is to *h*. In addition to increased power to screen for trisomy, knowing FF brings several other benefits. First, if FF is too small, which is a major source of false negatives, we can declare “no call.” Second, testing aneuploidy of sex chromosomes, such as Klinefelter syndrome (47, XXY), Turner syndrome (45, X), and XYY syndrome, becomes a much simpler problem. Third, it allows us to develop an adaptive design to sequence at a higher depth for samples with small FFs.

## MATERIALS AND METHODS

### NIPS samples

Beginning 15 March 2016, we enrolled pregnant women who were undergoing routine obstetrical care at the Beijing Hospital. The institutional review board of the Beijing Hospital approved the study. All experiments were performed in accordance with relevant guidelines and regulations. Written informed consent was obtained from all patients. To be eligible for the study, pregnant women had to be at least 18 years of age and had to be carrying a fetus with a gestational age of at least 8 weeks. More information is in the Supplementary Material and Methods.

### Naive model

In the ideal scenario there are no sequencing errors, and both the mother and the fetus have an inbreeding coefficient of 0. Denote FF by *h*, and at an arbitrary biallelic SNP (alleles A and B) denote the frequency of allele A as *p*. Assuming no inbreeding, Hardy–Weinberg equilibrium holds for the genotype of each individual, such that ${f}_{AA}={p}^{2}$, ${f}_{AB}=2p\left(1-p\right)$, and ${f}_{BB}={\left(1-p\right)}^{2}$. The distribution of the joint maternal–fetal genotypes and the A allele frequency of the mixture can then be derived, as shown in Table 1.

Table 1. Allele frequency in the mixture

MM-FF | Prob | $f_{A}$ |
---|---|---|
AA-AA | ${p}^{3}$ | $1$ |
AB-AB | $p\left(1-p\right)$ | $1/2$ |
BB-BB | ${\left(1-p\right)}^{3}$ | $0$ |
AA-AB | ${p}^{2}\left(1-p\right)$ | $1-h/2$ |
BB-AB | $p{\left(1-p\right)}^{2}$ | $h/2$ |
AB-AA | ${p}^{2}\left(1-p\right)$ | $1/2+h/2$ |
AB-BB | $p{\left(1-p\right)}^{2}$ | $1/2-h/2$ |

MM-FF denotes the maternal–fetal joint genotype, with the maternal genotype listed first. Note that joint genotypes AA-BB and BB-AA are Mendelian incompatible and excluded from the table.

SNPs covered by only one read are ignored because their likelihood does not involve *h*. Suppose a SNP has coverage of 2, so that the counts of the two alleles are (2, 0), (1, 1), or (0, 2). To evaluate the likelihood of each count, we condition on the joint genotype and obtain a weighted sum of binomial likelihoods:

$\begin{aligned}\Pr\left(\left(2,0\right)\mid p,h\right)&=\tfrac{1}{2}p\left(1-p\right)\left(1-h+{h}^{2}\right)+{p}^{2}\\ \Pr\left(\left(1,1\right)\mid p,h\right)&=p\left(1-p\right)\left(1+h-{h}^{2}\right)\\ \Pr\left(\left(0,2\right)\mid p,h\right)&=\tfrac{1}{2}p\left(1-p\right)\left(1-h+{h}^{2}\right)+{\left(1-p\right)}^{2}\end{aligned}\qquad (1)$
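Eq. 1 can be checked by enumerating the rows of Table 1 and summing the genotype-weighted binomial probabilities. The following is a minimal sketch of that check, not part of the released software; the function names and the test values of *p* and *h* are illustrative assumptions. Exact rational arithmetic is used so that the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction


def naive_joint_genotypes(p, h):
    """Rows of Table 1: (label, probability, A-allele fraction in the mixture)."""
    q = 1 - p
    half = Fraction(1, 2)
    return [
        ("AA-AA", p**3, 1),
        ("AB-AB", p * q, half),
        ("BB-BB", q**3, 0),
        ("AA-AB", p**2 * q, 1 - h / 2),
        ("BB-AB", p * q**2, h / 2),
        ("AB-AA", p**2 * q, half + h / 2),
        ("AB-BB", p * q**2, half - h / 2),
    ]


def two_read_probs(p, h):
    """Pr((2,0)), Pr((1,1)), Pr((0,2)) as genotype-weighted binomial sums."""
    rows = naive_joint_genotypes(p, h)
    p20 = sum(w * f * f for _, w, f in rows)
    p11 = sum(w * 2 * f * (1 - f) for _, w, f in rows)
    p02 = sum(w * (1 - f) ** 2 for _, w, f in rows)
    return p20, p11, p02


def eq1_closed_form(p, h):
    """Right-hand sides of Eq. 1."""
    q = 1 - p
    half = Fraction(1, 2)
    return (
        half * p * q * (1 - h + h * h) + p * p,
        p * q * (1 + h - h * h),
        half * p * q * (1 - h + h * h) + q * q,
    )


# Illustrative values only: allele frequency 0.3, fetal fraction 0.1.
p, h = Fraction(3, 10), Fraction(1, 10)
assert two_read_probs(p, h) == eq1_closed_form(p, h)
assert sum(two_read_probs(p, h)) == 1
```

The second assertion confirms that the three count probabilities sum to one, as they must for a SNP covered by exactly two reads.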

The read heterozygous count (1, 1) has probability $p\left(1-p\right)\left(1+h-{h}^{2}\right)$. If we assume *p* is uniformly distributed and integrate out *p*, the expected proportion of read heterozygosity among SNPs covered by two reads is $\frac{1}{6}\left(1+h-{h}^{2}\right)$, which is a function of *h*.

### Full model

Let *p* and *q* be the frequencies of alleles A and B, with *p* + *q* = 1, and let *F* be an individual’s inbreeding coefficient. The standard model that accounts for inbreeding, as a generalization of Hardy–Weinberg equilibrium,^{15} is

$AA\sim {p}^{2}+pqF,\quad AB\sim 2pq\left(1-F\right),\quad BB\sim {q}^{2}+pqF.\qquad (2)$

With ${F}_{1}$ and ${F}_{2}$ denoting the inbreeding coefficients of the mother and the fetus, respectively, the joint distribution of mother–fetus genotypes can be computed (Table 2).

Table 2. Probability of joint genotypes

Mother (row), fetus (column) | AA | AB | BB |
---|---|---|---|
AA | $p\left(p+q{F}_{1}\right)\left(p+q{F}_{2}\right)$ | $p\left(p+q{F}_{1}\right)q\left(1-{F}_{2}\right)$ | 0 |
AB | $pq\left(1-{F}_{1}\right)\left(p+q{F}_{2}\right)$ | $pq\left(1-{F}_{1}\right)\left(1-{F}_{2}\right)$ | $pq\left(1-{F}_{1}\right)\left(q+p{F}_{2}\right)$ |
BB | 0 | $q\left(q+p{F}_{1}\right)p\left(1-{F}_{2}\right)$ | $q\left(q+p{F}_{1}\right)\left(q+p{F}_{2}\right)$ |

Each row is the genotype of the mother and each column is the genotype of the fetus.
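As a consistency check on Table 2, the joint probabilities should sum to one, and summing each row over the fetal genotypes should recover the maternal genotype frequencies of Eq. 2 with $F={F}_{1}$. A minimal sketch follows; the function names and the numerical values of *p*, ${F}_{1}$, and ${F}_{2}$ are illustrative assumptions.

```python
def genotype_probs(p, F):
    """Eq. 2: genotype frequencies for one individual with inbreeding coefficient F."""
    q = 1.0 - p
    return {
        "AA": p * p + p * q * F,
        "AB": 2.0 * p * q * (1.0 - F),
        "BB": q * q + p * q * F,
    }


def joint_genotype_probs(p, F1, F2):
    """Table 2: probabilities keyed by (maternal genotype, fetal genotype)."""
    q = 1.0 - p
    return {
        ("AA", "AA"): p * (p + q * F1) * (p + q * F2),
        ("AA", "AB"): p * (p + q * F1) * q * (1.0 - F2),
        ("AA", "BB"): 0.0,
        ("AB", "AA"): p * q * (1.0 - F1) * (p + q * F2),
        ("AB", "AB"): p * q * (1.0 - F1) * (1.0 - F2),
        ("AB", "BB"): p * q * (1.0 - F1) * (q + p * F2),
        ("BB", "AA"): 0.0,
        ("BB", "AB"): q * (q + p * F1) * p * (1.0 - F2),
        ("BB", "BB"): q * (q + p * F1) * (q + p * F2),
    }


# Illustrative values only.
p, F1, F2 = 0.3, 0.02, 0.01
joint = joint_genotype_probs(p, F1, F2)
maternal = genotype_probs(p, F1)

# The table is a proper probability distribution ...
assert abs(sum(joint.values()) - 1.0) < 1e-12
# ... whose row sums are the maternal genotype frequencies of Eq. 2.
for g in ("AA", "AB", "BB"):
    row_sum = sum(v for (mother, _), v in joint.items() if mother == g)
    assert abs(row_sum - maternal[g]) < 1e-12
```

Setting both inbreeding coefficients to 0 reduces these entries to the probabilities of the naive model in Table 1.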

The only possible errors that can occur after biallelic ascertainment are A to B or B to A, and the sequencing error rates may differ between A to B (denoted *e*_{AB}) and B to A (denoted *e*_{BA}). Accounting for sequencing errors, we can write the allele frequency of the mixture for each joint genotype (Table 3), from which the likelihood can be obtained by modeling reads covering a given locus as binomial sampling.

Table 3. Allele frequency in the mixture of maternal and fetal cell-free DNA

MM-FF | Prob | $f_{A}$ |
---|---|---|
AA-AA | $p\left(p+q{F}_{1}\right)\left(p+q{F}_{2}\right)$ | $1-{e}_{AB}$ |
AB-AB | $pq\left(1-{F}_{1}\right)\left(1-{F}_{2}\right)$ | $1/2+1/2\left({e}_{BA}-{e}_{AB}\right)$ |
BB-BB | $q\left(q+p{F}_{1}\right)\left(q+p{F}_{2}\right)$ | ${e}_{BA}$ |
AA-AB | $pq\left(p+q{F}_{1}\right)\left(1-{F}_{2}\right)$ | $\left(1-h/2\right)\left(1-{e}_{AB}\right)+\left(h/2\right){e}_{BA}$ |
BB-AB | $pq\left(q+p{F}_{1}\right)\left(1-{F}_{2}\right)$ | $\left(h/2\right)\left(1-{e}_{AB}\right)+\left(1-h/2\right){e}_{BA}$ |
AB-AA | $pq\left(1-{F}_{1}\right)\left(p+q{F}_{2}\right)$ | $\left(1/2+h/2\right)\left(1-{e}_{AB}\right)+\left(1/2-h/2\right){e}_{BA}$ |
AB-BB | $pq\left(1-{F}_{1}\right)\left(q+p{F}_{2}\right)$ | $\left(1/2-h/2\right)\left(1-{e}_{AB}\right)+\left(1/2+h/2\right){e}_{BA}$ |

Fetal fraction (FF) is denoted by *h*. The column marked “Prob” is the probability of the joint genotype in the first column. $f_{A}$ denotes the frequency of the A allele conditional on the joint genotype, taking into account sequencing error.

At SNP *i* covered by more than one read, denote by ${G}_{i}=\left({c}_{i}^{A},{c}_{i}^{B}\right)$ the counts of reference and alternative alleles. Let *x* be the frequency of A; the binomial likelihood for ${G}_{i}$ is

$B\left(x,{G}_{i}\right)=\frac{\left({c}_{i}^{A}+{c}_{i}^{B}\right)!}{{c}_{i}^{A}!\,{c}_{i}^{B}!}{x}^{{c}_{i}^{A}}{\left(1-x\right)}^{{c}_{i}^{B}}.\qquad (3)$

Denote by *G* the collection of all ${G}_{i}$ for *i* in an index set *S*. For each SNP *i*, we write out the binomial likelihood conditional on each joint genotype, weight it by the probability of that joint genotype, and sum to obtain the marginal likelihood

${M}_{i}\left(h,{F}_{1},{F}_{2},e\right)=\sum _{j\in MMFF}P\left(j\right)B\left({f}_{A}\left(j\right),{G}_{i}\right).\qquad (4)$

Multiplying the marginal likelihoods across SNPs, we obtain a composite likelihood

$L\left(h;{F}_{1},{F}_{2},e\right)={\prod }_{i\in S}{M}_{i}\left(h,{F}_{1},{F}_{2},e\right).\qquad (5)$
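Eqs. 3–5 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the released software: the function names, the default error rates, and the grid search over *h* are all hypothetical choices. Each $f_{A}$ is computed as $a\left(1-{e}_{AB}\right)+\left(1-a\right){e}_{BA}$, where $a$ is the error-free A-allele fraction of the mixture from Table 1, and the genotype weights are those of Table 2.

```python
import math


def mixture_rows(p, h, F1, F2, e_ab, e_ba):
    """Return (joint-genotype probability, f_A) pairs for one SNP."""
    q = 1.0 - p
    rows = [  # (Table 2 probability, error-free A fraction a)
        (p * (p + q * F1) * (p + q * F2), 1.0),            # AA-AA
        (p * q * (1 - F1) * (1 - F2), 0.5),                # AB-AB
        (q * (q + p * F1) * (q + p * F2), 0.0),            # BB-BB
        (p * q * (p + q * F1) * (1 - F2), 1.0 - h / 2),    # AA-AB
        (p * q * (q + p * F1) * (1 - F2), h / 2),          # BB-AB
        (p * q * (1 - F1) * (p + q * F2), 0.5 + h / 2),    # AB-AA
        (p * q * (1 - F1) * (q + p * F2), 0.5 - h / 2),    # AB-BB
    ]
    # A read shows A if it is a true A read without error or a B read with a B->A error.
    return [(w, a * (1.0 - e_ab) + (1.0 - a) * e_ba) for w, a in rows]


def marginal_likelihood(c_a, c_b, p, h, F1, F2, e_ab, e_ba):
    """Eq. 4 for one SNP with allele counts (c_a, c_b), using the binomial of Eq. 3."""
    coef = math.comb(c_a + c_b, c_a)
    return sum(w * coef * f**c_a * (1.0 - f) ** c_b
               for w, f in mixture_rows(p, h, F1, F2, e_ab, e_ba))


def composite_loglik(snps, h, F1, F2=0.0, e_ab=1e-3, e_ba=1e-3):
    """Log of Eq. 5; snps is a list of (c_a, c_b, p) tuples."""
    return sum(math.log(marginal_likelihood(c_a, c_b, p, h, F1, F2, e_ab, e_ba))
               for c_a, c_b, p in snps)


def estimate_h(snps, F1, grid_size=501, **kwargs):
    """Maximize the composite likelihood over a grid of h in [0, 0.5].

    A grid search is a simple stand-in for whatever optimizer is used in practice.
    """
    grid = [0.5 * k / (grid_size - 1) for k in range(grid_size)]
    return max(grid, key=lambda h: composite_loglik(snps, h, F1, **kwargs))
```

With zero error rates and zero inbreeding coefficients, `marginal_likelihood(1, 1, p, h, 0, 0, 0, 0)` reduces to the naive-model value $p\left(1-p\right)\left(1+h-{h}^{2}\right)$, which gives a quick sanity check of the implementation.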

Finally, from nonpolymorphic markers covered by exactly one read, we can estimate the error spectrum used in the model (*e*_{AB} and *e*_{BA} for each pair of alleles A and B among the four nucleotides). We may optimize Eq. 5 to obtain $\hat{h}$.

## RESULTS

### Read heterozygosity informs FF

The rationale for inferring FF from low-depth sequencing data rests on three observations. First, even at low coverage, a sizable number of SNPs from the 1000 Genomes Project are covered by more than one read. This can be deduced by assuming a Poisson distribution of coverage at each locus and is verified by real data (Supplementary Table S1). Second, at SNPs covered by at least two reads, we can define read heterozygosity (and similarly read homozygosity). For example, at coverage of two, we define (A, B) or (B, A) as read heterozygous, and (A, A) or (B, B) as read homozygous, where A and B are the reference and alternative alleles, respectively. Note that read heterozygosity, which involves sampling uncertainty, differs from genotype heterozygosity. For example, at coverage of two, a heterozygous genotype AB has a 50% chance of producing read heterozygosity. Third, the percent of read heterozygosity is a function of FF (denoted by *h*). Under the naive model with ideal assumptions (see “Materials and Methods”), the proportion of SNPs that are read heterozygous among SNPs covered by two reads is $\frac{1}{6}\left(1+h-{h}^{2}\right)$. Thus *h* can be inferred. Note, however, that the naive model has only theoretical value and performs poorly in real data analysis, as real data violate most of its assumptions.

### Statistical model to infer FF
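Before turning to the full model, the first and third observations of the rationale can be made concrete with a short sketch. It assumes idealized Poisson coverage with mean equal to the sequencing depth, and the naive model only; the function names are hypothetical. The inversion solves ${h}^{2}-h+\left(6r-1\right)=0$ for the root in $[0, 1/2]$, where $r$ is the proportion of read-heterozygous SNPs among those covered by two reads.

```python
import math


def frac_covered_at_least_twice(depth):
    """P(coverage >= 2) when per-site coverage is Poisson with mean `depth`."""
    return 1.0 - math.exp(-depth) * (1.0 + depth)


def naive_ff_from_rhet(r):
    """Invert r = (1 + h - h^2) / 6 for h in [0, 1/2] (naive model only)."""
    disc = 5.0 - 24.0 * r
    if r < 1.0 / 6.0 or disc < 0.0:
        raise ValueError("r must lie in [1/6, 5/24] under the naive model")
    return (1.0 - math.sqrt(disc)) / 2.0


# Fraction of sites expected to be covered by two or more reads at several depths.
for depth in (0.1, 0.2, 0.5, 1.0):
    print(f"{depth:.1f}x: {frac_covered_at_least_twice(depth):.4f}")

# Round trip: a fetal fraction of 0.10 gives r = 1.09 / 6, which inverts back to 0.10.
r = (1.0 + 0.10 - 0.10**2) / 6.0
print(round(naive_ff_from_rhet(r), 6))
```

Under this idealization roughly 9% of sites are covered at least twice at 0.5×. The narrow admissible range of $r$ (from 1/6 to 5/24) also shows why small departures from the naive assumptions translate into large errors in *h*.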

For a given biallelic SNP, its allele frequency largely determines how likely we are to observe read heterozygosity. We identified three additional factors that affect read heterozygosity: the inbreeding coefficient of the mother *F*_{1}, the inbreeding coefficient of the fetus *F*_{2}, and the sequencing error rate *e*. Because a majority of the cfDNA comes from the mother, *F*_{1} determines the baseline percent of read heterozygosity of a sample and is critical to the accuracy of FF inference. We therefore sequenced maternal wbcDNA and developed a statistical model to infer *F*_{1} (Supplementary Material and Methods). *F*_{2}, on the other hand, contributes little to the percent of read heterozygosity, particularly when the FF is small. In practice, we can safely ignore *F*_{2} by setting it to 0. Sequencing errors produce more read heterozygosity than read homozygosity. The effect can be modeled, and the sequencing error rate can be estimated for an individual sample from the nonpolymorphic part of the genome covered by a single read. At each marker covered by more than one read, we can write the single-marker likelihood, which involves parameters (*h*, *F*_{1}, *F*_{2}, and *e*) and data (counts of reference and alternative alleles and population allele frequencies). Multiplying all the single-marker likelihoods, we obtain the composite likelihood, which is a function of *h* (with *F*_{1} and *e* inferred and plugged in and *F*_{2} set to 0). Maximizing the composite likelihood, we obtain the maximum likelihood estimate of FF (see “Materials and Methods”).

### Numerical results on FF inference

To demonstrate the effectiveness of our statistical method in inferring FF, we performed (in silico) simulations to mix reads from sequences of a mother and her son, conducted (in vitro) laboratory experiments to mix DNA of mothers and their male children, and reanalyzed real data from clinical NIPS samples with putative male fetus (in natura). These study designs allow us to compare fetal fractions estimated by our method against those estimated from the sex chromosomes. The details of FF estimates from sex chromosomes can be found in Supplementary Material and Methods, Supplementary Figs. S1 and S2, and reference.

In the in silico experiments, we first mixed reads from mothers and their children at different FFs (Supplementary Material and Methods) to examine how *F*_{1} and *F*_{2} impact the inference of FFs. Figure 1a, b demonstrates that *F*_{1} has a large effect on FF and *F*_{2} has a negligible effect. We then focused on the variation of the FF estimates due to sampling. The data were simulated by mixing reads of two samples (a mother–child pair) in the 1000 Genomes Project (Supplementary Material and Methods). Figure 1c plots the mean (of 100 replicates), with bars showing the sample standard deviation, of the inferred FF (*y*-axis) versus the true FF (*x*-axis). The sample standard deviation is 0.006 for FF = 0.02 and 0.008 for FF = 0.20. Lastly, we investigated how different sequencing depths affect the uncertainty of the FF estimates. Here we assume maternal wbcDNA was sequenced at the same depth as the cfDNA. We sampled and mixed reads to simulate 100 NIPS samples with FF = 0.10 at different sequencing depths. Violin plots in Fig. 1d confirm that the higher the sequencing depth, the smaller the variation of the FF estimates, and it appears that 0.5× provides a good balance between sequencing depth and accuracy. The same simulations were done for FF = 0.04 and FF = 0.06, and similar patterns of variation were observed (Supplementary Fig. S3).

In the in vitro experiments, we used DNA from 11 mother–son pairs, mixed the DNA at different fetal proportions, sequenced the mixture at 0.5×, and inferred the FFs separately using sex chromosome dosages and read heterozygosity. Figure 2a compares the inferred FFs. The two sets of inferences are in high concordance with each other, with a coefficient of determination of ${R}^{2}=0.987$ and a maximum absolute deviation of 0.014. Comparing both sets of inferences against the truth, however, showed that both estimates have slight upward biases and larger variations for large FFs (Fig. 2b, c). Experimental variation in DNA quantity measurements in the mixing experiment is the likely explanation (Supplementary Material and Methods).

In the in natura experiments, we used samples from patients who consented to participate in an ongoing study to improve methods of NIPS (Supplementary Material and Methods). The study was approved by the Institutional Review Board of Beijing Hospital and the DNA samples were de-identified. We first retrospectively selected 69 clinical samples from women carrying a putative male fetus, with FFs (obtained from sex chromosome dosages) ranging from 0.03 to 0.15, intentionally including more samples with small FFs. We resequenced their cfDNA and wbcDNA at 0.5× and inferred FFs. Figure 3a compares FFs inferred from sex chromosomes with those inferred from read heterozygosity. The overall pattern of this plot is highly similar to that of Fig. 2a, with the exception of two outliers whose FFs inferred by read heterozygosity are about twice as large as those inferred by sex chromosomes (Fig. 3b). We hypothesized that these two samples are female–male twins. Following the IRB protocol, we obtained anonymized patient data and confirmed that those two samples are indeed female–male twins and that both pregnancies were results of in vitro fertilization (IVF). For samples with a single male fetus, FFs inferred by our method are in high concordance with FFs inferred from sex chromosomes (coefficient of determination ${R}^{2}=0.972$), and the largest three absolute deviations are 0.017, 0.016, and 0.015.
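The reasoning behind the twin hypothesis can be expressed as a simple rule on the ratio of the two FF estimates: in a female–male twin pregnancy only the male fetus contributes to the sex-chromosome estimate, so the autosomal (read heterozygosity) estimate is expected to be about twice as large if both fetuses contribute similar amounts of cfDNA. The sketch below is illustrative only; the function name, the labels, and all thresholds are hypothetical and were not used in the study.

```python
def classify_pregnancy(ff_rhet, ff_sex, ratio_tol=0.25, min_ff_sex=0.01):
    """Label a sample from two FF estimates.

    ff_rhet: FF inferred from read heterozygosity (autosomes).
    ff_sex:  FF inferred from sex chromosome dosage.
    ratio_tol and min_ff_sex are illustrative thresholds, not validated cutoffs.
    """
    if ff_sex < min_ff_sex:
        # Essentially no male signal: consistent with female fetus(es).
        return "no male signal"
    ratio = ff_rhet / ff_sex
    if abs(ratio - 1.0) <= ratio_tol:
        # The two estimates agree: consistent with male fetus(es) only.
        return "concordant male"
    if abs(ratio - 2.0) <= 2.0 * ratio_tol:
        # Autosomal FF about twice the sex-chromosome FF.
        return "possible female-male twins"
    return "unclassified"


# Hypothetical values for illustration.
print(classify_pregnancy(0.10, 0.10))   # concordant male
print(classify_pregnancy(0.12, 0.06))   # possible female-male twins
print(classify_pregnancy(0.08, 0.00))   # no male signal
```

A flag of this kind only suggests a hypothesis; in the study the twin status was confirmed from anonymized patient data.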

To further evaluate our method on real data and examine the prevalence of IVF in our clinical samples, we randomly chose 443 clinical samples and performed the same data collection and statistical analysis as for the 69 retrospective samples. Figure 4 demonstrates that our method works well. The 23 red dots (about 5% of 443) in Fig. 4 are IVF pregnancies with two embryos implanted. All dots along the blue line (*y* = 2*x*) are red, indicating female–male twin pregnancies. The vertical clusters of dots along the *y*-axis are female fetuses, and the dots along the diagonal line are male fetuses. For samples with a single male fetus, FFs inferred by our method are in high concordance with FFs inferred from sex chromosomes (${R}^{2}=0.971$), and the largest three absolute deviations are 0.027, 0.019, and 0.013.

### Incorporating priors on FFs to test for aneuploidy

The *Z*-test compares chromosomal dosages of a sample against a set of euploid controls to detect fetal trisomy.^{17} Because chromosomal dosage estimates have a heavier tail than the normal distribution (Supplementary Fig. S4), the *Z*-test tends to produce more false positives at a fixed threshold.^{16} By incorporating an informative null prior to account for the heavy tail, our Bayesian method can reduce false positives.^{16} With the knowledge of the FF, we can also incorporate an *informative* alternative prior to compute a Bayes factor to test for aneuploidy (Supplementary Material and Methods). Intuitively, the *Z*-test only examines how far a chromosomal dosage is from the euploid dosage; with the knowledge of the FF, we can also examine how close a chromosomal dosage is to the putative dosage assuming fetal trisomy. We may use a normal prior for FF under the alternative hypothesis to capture the uncertainty of the FF estimates, which critically depends on the sequencing depth (Supplementary Tables S2 and S3). Note that the ability to incorporate informative priors (both under the null and under the alternative) is a deciding advantage of the Bayesian approach over the *Z*-test method, which is oblivious to the alternative hypothesis by design.

To compare power between the *Z*-test and our Bayesian method that incorporates informative priors, we simulated slightly overdispersed chromosomal dosages under the null, and used these to decide the cutoff values for both the *Z* and Bayesian methods (Supplementary Material and Methods). Then for each target FF, denoted by Θ, we simulated chromosomal dosages from *N*(Θ, σ) and computed *Z*-scores and Bayes factors, where σ depends on sequencing depth and different chromosomes or regions have different σs (Supplementary Tables S2 and S3). The power can be estimated by the percent of test statistics surpassing their respective cutoff values. Table 4 demonstrates that the Bayesian method outperforms the *Z*-test, particularly in more difficult situations (smaller sequencing depths and smaller FFs). More power simulation results using different priors can be found in Supplementary Tables S4 and S5.

Table 4. Power comparison between the Bayesian method and the *Z*-test method

FF^{a} | Method | 0.1× | 0.2× | 0.5× | 1× | 2× | 5× |
---|---|---|---|---|---|---|---|
0.04 | BF | 0.8306 | 0.9845 | 0.9998 | 1.0000 | 1.0000 | 1.0000 |
0.04 | Z | 0.8125 | 0.9779 | 0.9993 | 1.0000 | 1.0000 | 1.0000 |
0.06 | BF | 0.9963 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
0.06 | Z | 0.9930 | 0.9999 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
0.08 | BF | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
0.08 | Z | 0.9999 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
0.12 | BF | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
0.12 | Z | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
FF^{b} | Method | 0.1× | 0.2× | 0.5× | 1× | 2× | 5× |
0.04 | BF | 0.0602 | 0.0800 | 0.3620 | 0.6710 | 0.8830 | 0.9914 |
0.04 | Z | 0.0396 | 0.0529 | 0.3500 | 0.6550 | 0.8436 | 0.9832 |
0.06 | BF | 0.1797 | 0.2365 | 0.7398 | 0.9575 | 0.9973 | 1.0000 |
0.06 | Z | 0.1521 | 0.2240 | 0.7039 | 0.9321 | 0.9932 | 1.0000 |
0.08 | BF | 0.3622 | 0.4539 | 0.9391 | 0.9981 | 1.0000 | 1.0000 |
0.08 | Z | 0.3540 | 0.4535 | 0.9163 | 0.9952 | 0.9999 | 1.0000 |
0.12 | BF | 0.7092 | 0.8348 | 0.9994 | 1.0000 | 1.0000 | 1.0000 |
0.12 | Z | 0.6767 | 0.7973 | 0.9984 | 1.0000 | 1.0000 | 1.0000 |

The comparisons were done at different FFs (rows) and different coverages (columns). The reported power is at a type I error of 0.001. *BF*, Bayesian method; *Z*, *Z*-test.

^{a}Simulated based on empirical data of chromosomes 13, 18, and 21.

^{b}Based on three 5-Mb regions each drawn from the three chromosomes.
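The mechanics of the power comparison can be illustrated with a deliberately simplified sketch. It is not the procedure behind Table 4: here the null is taken to be exactly normal (no overdispersion and no informative null prior), the prior on FF under the alternative is a normal distribution centered at the true value, and the values of σ and τ are hypothetical. Under these assumptions a centered dosage $x$ is $N(0, \sigma)$ under the null and marginally $N(\hat{h}, \sqrt{\sigma^2+\tau^2})$ under the alternative, and both test statistics are calibrated to the same type I error on simulated null data.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def z_score(x, sigma):
    """Centered chromosomal dosage scaled by the control standard deviation."""
    return x / sigma


def log_bayes_factor(x, sigma, h_hat, tau):
    """Log Bayes factor (trisomy vs. euploid) under the simplified normal model.

    Null: x ~ N(0, sigma). Alternative: x ~ N(h, sigma) with prior h ~ N(h_hat, tau).
    """
    alt_sd = math.sqrt(sigma * sigma + tau * tau)
    return log_normal_pdf(x, h_hat, alt_sd) - log_normal_pdf(x, 0.0, sigma)


def empirical_cutoff(null_stats, alpha):
    """Value exceeded by roughly a fraction alpha of the simulated null statistics."""
    ordered = sorted(null_stats)
    return ordered[min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))]


def estimate_power(theta, sigma, tau, n=20000, alpha=0.001, seed=1):
    """Monte Carlo power of both statistics at a null-calibrated type I error."""
    rng = random.Random(seed)
    null_x = [rng.gauss(0.0, sigma) for _ in range(n)]
    alt_x = [rng.gauss(theta, sigma) for _ in range(n)]
    stats = {
        "Z": lambda x: z_score(x, sigma),
        "BF": lambda x: log_bayes_factor(x, sigma, theta, tau),
    }
    power = {}
    for name, stat in stats.items():
        cut = empirical_cutoff([stat(x) for x in null_x], alpha)
        power[name] = sum(stat(x) > cut for x in alt_x) / n
    return power


# Hypothetical sigma and tau; theta plays the role of the target FF.
print(estimate_power(theta=0.04, sigma=0.01, tau=0.005))
```

With an exactly normal null, the two statistics rank samples almost identically, so this sketch should not be expected to reproduce the gains in Table 4, which arise from the informative null prior that accounts for heavy-tailed dosages.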

## DISCUSSION

We developed a statistical method to infer FF in NIPS, extensively studied its performance, and demonstrated that incorporating the knowledge of FF improves the statistical power of NIPS. Our method makes use of read heterozygosity of SNP markers on autosomes and can be applied to samples with either female or male fetuses. The use of read heterozygosity, however, makes our method sensitive to the maternal inbreeding coefficient. We therefore propose to sequence maternal wbcDNA in addition to cfDNA. Sequencing maternal wbcDNA brings several benefits. First, it allows us to infer the maternal inbreeding coefficient to better estimate FF. Second, one can mix sequencing reads from wbcDNA and cfDNA to infer a diluted FF at a higher sequencing depth, and the diluted FF can be used to estimate FF after appropriate scaling. Third, although we did not pursue it here, maternal wbcDNA sequencing can improve NIPS by providing an individual-specific reference.

Sequencing wbcDNA in addition to cfDNA incurs extra cost. Our method, however, can make do without sequencing wbcDNA. One approach is to plug in the population average inbreeding coefficient and fit the full model to obtain the fetal fraction (the plug-in method). The other is to jointly fit the full model to infer both the maternal inbreeding coefficient *F*_{1} and the fetal fraction *h*. This can be done efficiently with an iterative method that fixes *F*_{1} to update *h* and then fixes *h* to update *F*_{1} until both converge (the iterative method). The plug-in method worked well for samples with modest inbreeding coefficients (say, between −0.03 and 0.03), but suffered significant bias for samples with extreme inbreeding coefficients. Such an example can be found in Supplementary Fig. S5 (left), in which the outlying sample has an inbreeding coefficient of 0.085. The iterative method, on the other hand, worked well for samples with extreme inbreeding coefficients, but produced a larger variation for samples with modest coefficients (Supplementary Fig. S5, middle). Naturally, we combined the plug-in method and the iterative method by thresholding *F*_{1} (estimated from the iterative method), such that if $\left|{F}_{1}\right|>0.03$ we used the iterative method to estimate *h*, and otherwise we used the plug-in method. Supplementary Fig. S5 (right) demonstrates the strength of the combined approach. More extensive numerical studies are warranted.

Our method is designed for nonadmixed samples, but it can be extended to work with admixed samples such as African Americans and Mexicans. The trick is to select a subset of SNPs to make the inference. The natural weight that goes into the likelihood calculation for each SNP is *p*(1 − *p*), where *p* is the reference allele frequency. Taking African Americans as an example, we defined $\mathrm{logr}={\mathrm{log}}_{2}\frac{{p}_{1}\left(1-{p}_{1}\right)}{{p}_{2}\left(1-{p}_{2}\right)}$ for each SNP, where *p*_{1} and *p*_{2} are the reference allele frequencies of the SNP in European and African populations, respectively. We select SNPs whose logr falls in a small range, e.g., (−1, 1), to make inferences. Since these SNPs are less ancestry informative, our Hardy–Weinberg assumption is arguably applicable to them. Supplementary Fig. S6 shows that such SNPs are plentiful (more than 60%) among SNPs whose minor allele frequencies are >0.01, for any pair of ancestral populations. We simulated cfDNA from African American samples with different fetal fractions (Supplementary Material and Methods), and used the selected SNPs to infer their fetal fractions, using either European or African allele frequencies. Supplementary Fig. S7 shows that this approach works well, particularly when the two sets of estimates are averaged.

The knowledge of FF is critical to increasing the efficacy of NIPS. One study suggests that samples with smaller FFs may have an increased risk of fetal aneuploidy.^{2} Reporting a “no call” and referring the subject to an invasive test is an effective way to reduce false negatives among these samples. Alternatively, we can perform additional sequencing to increase the sequencing depth to a level that has sufficient power for a given FF. A similar adaptive design can be applied to screening for microduplications and microdeletions, where the power tends to be much smaller than in whole-chromosome trisomy screening (Table 4).
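One way to make the adaptive design concrete is to look up, for an inferred FF, the smallest simulated depth in Table 4 at which the power reaches a chosen target. The sketch below uses the Bayesian-method rows for 5-Mb regions (panel b of Table 4); the function name, the target power, and the rule of falling back to the largest tabulated FF not exceeding the inferred FF (a conservative choice, since power increases with FF) are assumptions for illustration.

```python
# Sequencing depths (x) and Bayesian-method power for 5-Mb regions, copied from Table 4 (panel b).
DEPTHS = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0]
POWER_BF_REGION = {
    0.04: [0.0602, 0.0800, 0.3620, 0.6710, 0.8830, 0.9914],
    0.06: [0.1797, 0.2365, 0.7398, 0.9575, 0.9973, 1.0000],
    0.08: [0.3622, 0.4539, 0.9391, 0.9981, 1.0000, 1.0000],
    0.12: [0.7092, 0.8348, 0.9994, 1.0000, 1.0000, 1.0000],
}


def min_depth_for_power(ff, target_power, table=POWER_BF_REGION, depths=DEPTHS):
    """Smallest tabulated depth whose power reaches target_power.

    Uses the row for the largest tabulated FF not exceeding `ff`.
    Returns None when FF is below the tabulated range or no depth suffices,
    which could be treated as a "no call".
    """
    usable = [f for f in table if f <= ff]
    if not usable:
        return None
    row = table[max(usable)]
    for depth, power in zip(depths, row):
        if power >= target_power:
            return depth
    return None


print(min_depth_for_power(0.06, 0.95))  # 1.0
print(min_depth_for_power(0.04, 0.95))  # 5.0
print(min_depth_for_power(0.03, 0.95))  # None
```

In practice the lookup table would be replaced by power simulations at the laboratory's own depths and variance estimates.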

Because of the widespread practice of drastically cutting costs, NIPS is usually done at a raw sequencing depth of 0.1× in China. When the FF is as large as 6%, 0.1× appears to have sufficient power to screen for trisomy of whole chromosomes. When the FF is 4%, however, the power is 83% at a type I error of 0.001, which is unsatisfactory given the high social and economic cost of false negatives (Table 4). The situation is much worse when NIPS is used to screen for microdeletions and microduplications (NIPS+). Such screenings are done at a raw sequencing depth of 0.5× in China. At this depth, our power simulation suggests that for a 5-Mb region, the power is merely 36% at an FF of 4%, and 74% at an FF of 6% (Table 4). On the other hand, our data suggest that 2.2% of clinical samples have an FF <4% and 12.9% have an FF <6% (Supplementary Fig. S8). Therefore, it is imperative to increase sequencing depths for both NIPS and NIPS+ to guarantee the efficacy of the screening.

## Code Availability

Software to infer fetal fraction and the source code can be downloaded from https://haplotype.org/download/hetFF.tar.gz. It is free for academic use and can be licensed for commercial use.

## Ethics declarations

### Disclosure

Yongtao Guan is a consultant for Beijing USCI Medical Laboratory, serving as its Chief Scientific Officer, and owns its stocks and options. Beijing USCI Medical Laboratory contributed funding to the study, but played no role in designing the experiments, analyzing the data, or interpreting the results.

## Acknowledgements

This work was supported by the National Key Research and Development Program of China under contract number 2016YFC100707 awarded to USCI Medical Diagnostic Laboratory Inc. The majority of the work was done when Y.G. was a faculty member at Baylor College of Medicine and was supported in part by the US Department of Agriculture/Agriculture Research Service under contract number 6250-51000-057.

### Author Contributions

These authors contributed equally: Minghao Dang, Hanli Xu

### Additional information

**Publisher’s note:** Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## References

1. Noninvasive prenatal screening for fetal aneuploidy, 2016 update: a position statement of the American College of Medical Genetics and Genomics. *Genet Med.* 2016; 18: 1056-1065. doi:10.1038/gim.2016.97
2. Cell-free DNA analysis for noninvasive examination of trisomy. *N Engl J Med.* 2015; 372: 1589-1597. doi:10.1056/NEJMoa1407349
3. Quantitative analysis of fetal DNA in maternal plasma and serum: implications for noninvasive prenatal diagnosis. *Am J Hum Genet.* 1998; 62: 768-775. doi:10.1086/301800
4. Non-invasive prenatal assessment of trisomy 21 by multiplexed maternal plasma DNA sequencing: large scale validity study. *BMJ.* 2011; 342. doi:10.1136/bmj.c7401
5. Size-based molecular diagnostics using plasma DNA for noninvasive prenatal testing. *Proc Natl Acad Sci U S A.* 2014; 111: 8583-8588. doi:10.1073/pnas.1406103111
6. Hypermethylated RASSF1A in maternal plasma: a universal fetal DNA marker that improves the reliability of noninvasive prenatal diagnosis. *Clin Chem.* 2006; 52: 2211-2218. doi:10.1373/clinchem.2006.074997
7. Quantification of fetal DNA by use of methylation-based DNA discrimination. *Clin Chem.* 2010; 56: 1627-1635. doi:10.1373/clinchem.2010.146290
8. Calculating the fetal fraction for noninvasive prenatal testing based on genome-wide nucleosome profiles. *Prenat Diagn.* 2016; 36: 614-621. doi:10.1002/pd.4816
9. Determination of fetal DNA fraction from the plasma of pregnant women using sequence read counts. *Prenat Diagn.* 2015; 35: 810-815. doi:10.1002/pd.4615
10. Maternal plasma DNA sequencing reveals the genome-wide genetic and mutational profile of the fetus. *Sci Transl Med.* 2010; 2. doi:10.1126/scitranslmed.3001720
11. Targeted massively parallel sequencing of maternal plasma DNA permits efficient and unbiased detection of fetal alleles. *Clin Chem.* 2011; 57: 92-101. doi:10.1373/clinchem.2010.154336
12. FetalQuant(SD): accurate quantification of fetal DNA fraction by shallow-depth sequencing of maternal plasma DNA. *NPJ Genom Med.* 2016; 1: 16013. doi:10.1038/npjgenmed.2016.13
13. A novel approach toward the challenge of accurately quantifying fetal DNA in maternal plasma. *Prenat Diagn.* 2010; 30: 1226-1229. doi:10.1002/pd.2656
14. Non-invasive prenatal sequencing for multiple Mendelian monogenic disorders using circulating cell-free fetal DNA. *Nat Med.* 2019; 25: 439-447. doi:10.1038/s41591-018-0334-x
15. Estimating inbreeding coefficients from NGS data: impact on genotype calling and allele frequency estimation. *Genome Res.* 2013; 23: 1852-1861. doi:10.1101/gr.157388.113
16. Informative priors on fetal fraction increase power of the noninvasive prenatal screen. *Genet Med.* 2018; 20: 817-824. doi:10.1038/gim.2017.186
17. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. *Proc Natl Acad Sci U S A.* 2008; 105: 20458-20463. doi:10.1073/pnas.0810641105

## Article info

### Publication history

Accepted: August 1, 2019

Received: May 15, 2019

After further review of the manuscript, the author decided to withdraw the NIH grant acknowledgment as the manuscript was not directly related to the specific aims of the grant. The PDF and HTML versions of the Article have now been modified accordingly.

### Identification

### Copyright

© 2020, The Author(s), under exclusive licence to the American College of Medical Genetics and Genomics

### User license

Creative Commons Attribution – NonCommercial – NoDerivs (CC BY-NC-ND 4.0)

## Permitted

### For non-commercial purposes:

- Read, print & download
- Redistribute or republish the final article
- Text & data mine
- Translate the article (private use only, not for distribution)
- Reuse portions or extracts from the article in other works

## Not Permitted

- Sell or re-use for commercial purposes
- Distribute translations or adaptations of the article

## Linked Article

- Correction: Inferring fetal fractions from read heterozygosity empowers the noninvasive prenatal screening. *Genetics in Medicine*, Vol. 22, Issue 2 (Open Access)