1. What is the per-cycle fecundability for heterosexual couples?
2. What % of couples will eventually be able to have a child, with and without ART?
3. What % of couples will ever have infertility?
4. What % of infertility comes from male vs female factors?
5. How much of TFR reduction is driven by a delay in childbearing, as opposed to other factors, such as reduced desired number of offspring?
6. Why did I ignore same-sex couples?
1. What % of men will ever be infertile?
2. What % of male infertility is treatable by current ART/IVF techniques?
1. What % of women will ever be infertile?
Infertility over time / environment
Birth control and Partner Preference
Molecular mechanisms of female reproductive aging
Diagnosing female infertility
Predicting female infertility
Potential projects in reproductive aging
Insurance coverage for infertility treatment
A reason for future optimism
In-vitro maturation in clinical application
Mitochondrial replacement therapy
Embryonic stem cell nuclear transfer/ embryo editing
The road to causally sound embryo selection
Research directions to improving embryo selection
Incorporating rare variants into embryo selection
What is the risk of death in pregnancy?
Child/Infant Mortality Trends
2. Improving childcare tech
3. ⭐⭐🚀Pro-natalism is underfunded relative to other causes
4. Fertility education programs
1. ⭐⭐🚀Exposome add-on to NIH AllofUs biobank:
1. Improving hormone assays for monitoring IVF cycles and predicting ovarian reserve
Improving IVF research quality and methodology
Regulating IVF clinic add-ons more stringently
⭐ ⭐Funding pivotal IVF replication and methodologists
Pressure fertility journals to raise rigor
Large-scale embryo culture medium trial
Naming and shaming IVF add-ons with poor evidence
Restricting insurance coverage to IVF add-ons with evidence
⭐ Reproducibility project for fertility
1. ⭐Better prediction through improving biobanks
2. A project focused on predicting embryo implantation success.
a. Better prediction of embryo quality or uterine receptivity.
i. Predicting embryo quality through pooling data, genetic data, and ML video analysis of embryos in culture
b. Understanding uterine receptivity
Pressuring aging researchers to add reproductive measures as outcomes in trials
1. IVM in animals as a ceiling
2. ⭐🚀 Funding better surgical tools for follicular aspiration
1. Prizes for successful human IVM from primordial follicle
2. In-depth investigation of ovarian cryopreservation and autotransplantation
1. Legal Report on IVG in different jurisdictions
2. ⭐🚀🚀Ask experts for public goods ideas
3. Single-cell seq atlas for reproductive cells
4. ⭐🚀🚀 Making a well-characterized IPSC cell bank for research.
5. ⭐🚀Non-destructive ways to image and screen embryos and gametes.
6. ⭐⭐Lowering sequencing costs
1. Reproducibility Project for sperm selection
2. ⭐⭐Clarifying between-sperm correlation with individual differences
3. ⭐Raman microscopy combined with machine learning vs sperm sequencing
1. Obtaining regulatory clarity (overlaps with an idea from the IVG section)
Embryo Selection on polygenic traits
1. Improving computational methods
2. ⭐⭐🚀Funding more within-family studies in biobanks to improve between-embryo prediction and better understand causality
3. Improving biobank phenotyping
4. ⭐⭐Increasing diversity in biobanks
5. Incorporating rare variants into embryo selection
1. Identifying and characterizing better animal models for human pregnancy
2. Fund organoid development for better research models of key questions
3. Examining the COVID lockdown –> reduction in prematurity
1. ⭐🚀High-quality polling on public attitudes toward ART
2. ⭐Fund an animal breeding expert to answer “what low-hanging fruit are we missing in humans?”
3. Ongoing “replication fund”
Trying to articulate why fertility is important feels a bit like arguing that “suffering is bad”, or something similarly self-evident, but the following stylized facts may convince some readers:
This will result in offspring that have substantially lower genetic risk for most diseases, improving health in a durable fashion. Some diseases already have effective prevention and/or treatment. However, for diseases that have no effective treatment or prevention and dim prospects for short-term success (eg, Alzheimer’s disease, or other neurodegenerative conditions such as Huntington’s disease), this approach, which is largely agnostic to the underlying (extraordinarily complicated) molecular biology, may be our only short- to medium-term hope.
As a way to organize this document, I’ve decided to proceed in a mostly chronological order, from the factors that influence reproductive choice (demography) to producing gametes and embryos, relevant technological interventions (IVF, IVM, IVG, and more), choosing between embryos (embryo selection), and pregnancy.
In the final section, Potential Opportunities, I gather all the funding opportunities I identified in my research; they also appear throughout the document in their relevant sections.
The following topics are covered in each section:
The TL;DR is that global fertility rates are converging.
The US, Europe, much of Latin America, and most of Asia are below replacement, though TFR varies substantially among them. Parts of Africa, the Middle East, and Central Asia are still substantially above replacement but will probably converge soon.
A condensed summary of the decline in fertility rates and its causes, from Our World in Data: women’s empowerment, economic development, declines in religiosity, access to contraception, elite and media driven change in norms.
Figure 1.
The demographic transition is the shift from a regime of high infant mortality and many children per woman to one of low infant mortality and few children per woman. During the intermediate phase, as infant mortality falls but the number of children born is still high, population growth rates are very high. Birth rates have continued to fall and, in some countries, are low enough (in combination with lower or negative population momentum) to cause population decline. Strong economic growth increases fertility rates somewhat, with the post-WWII baby boom in the US as the prototypical example (though aided by higher religiosity).
According to Empty Planet, the demographic transition is robust to differences in contraception technology/access, religion[2], and ethnicity.
An important concept to understand in addition to total fertility rate is population momentum. For a period of time after TFR falls below replacement, a population can still grow because of the relatively young age structure of the population.
Another way to understand this: TFR is a per-woman measure, so the same TFR has different consequences depending on the share of a population currently of reproductive age. If 10% of one population and 20% of another are of reproductive age, the same TFR of 2 will produce different population growth rates. The share of a population of reproductive age is shaped by past birth and death rates. An especially young population with a TFR of 2 will grow for a time; an especially old population with a TFR of 2 will shrink. Thus, young populations “lock in” some growth even with below-replacement TFR, and conversely, an old population will experience population decline even with TFR at replacement. Populations eventually stabilize at a new equilibrium if replacement-level fertility is maintained long enough.
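A toy cohort model in Python can make the momentum argument concrete. It is a sketch under deliberately simple assumptions (spelled out in the comments), not the UN's projection method, and the three starting age structures are hypothetical. Because nobody dies before 75 in this model, replacement TFR here is exactly 2.0 rather than the real-world value of roughly 2.1.

```python
# Toy cohort model of population momentum. All assumptions are illustrative:
# - five 15-year age classes: 0-14, 15-29, 30-44, 45-59, 60-74
# - nobody dies before age 75, and everybody dies at 75
# - half of each age class is female
# - each woman has `tfr` children, split evenly across the two reproductive classes
# Under these assumptions, replacement TFR is exactly 2.0.


def project(population, tfr, steps):
    """Return the age distribution after each 15-year step, starting with the input."""
    current = list(population)
    history = [current]
    for _ in range(steps):
        women_of_reproductive_age = 0.5 * (current[1] + current[2])
        # Each woman has tfr / 2 children during each of her two reproductive classes.
        births = women_of_reproductive_age * tfr / 2
        # Everyone moves up one age class; the oldest class dies off.
        current = [births] + current[:-1]
        history.append(current)
    return history


def totals(history):
    """Total population at each step."""
    return [sum(step) for step in history]


young = [40, 30, 20, 10, 0]   # hypothetical young age structure (total 100)
old = [10, 20, 20, 25, 25]    # hypothetical old age structure (total 100)
flat = [20, 20, 20, 20, 20]   # stationary age structure (total 100)

for label, pop in [("young", young), ("old", old), ("flat", flat)]:
    print(label, totals(project(pop, tfr=2.0, steps=4)))
# young [100, 125.0, 150.0, 162.5, 162.5]
# old [100, 95.0, 85.0, 80.0, 77.5]
# flat [100, 100.0, 100.0, 100.0, 100.0]
```

At the same replacement TFR, the young population grows from 100 to 162.5 within four steps (60 years) while the old one keeps shrinking; running `project(young, tfr=1.6, steps=2)` shows growth continuing even below replacement.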
There are a variety of UN population projections available here with different rates of birth rate convergence, life expectancies, and other changes in parameters.
Another important wrinkle in the demographic transition is that its pace has changed over time. While Europe underwent the demographic transition over many decades (starting in the 1800’s, with France as the earliest) and multiple generations, Iran and China halved their fertility rates in just 10 years. Per Empty Planet[3], early 2000’s UN population projections, in the "medium" scenario, assumed that countries would follow timing similar to earlier countries' transitions and therefore overestimated future fertility. Empty Planet also argued that Chinese TFR may be lower than reported. Large Chinese urban centers, like Shanghai and Beijing, have fertility rates of 1-1.3, and the reported ideal family size is around 1. The latter may understate true preferences, since social desirability bias may push respondents to state lower fertility preferences than they actually hold, but it is in any case much lower than the approximate ideal of 2 in Europe.
Figure 2.
Differences in how quickly different ethnic or religious groups transitioned from growing to stable population sizes have played a large role in ethnic conflict (eg, Catholics in Ireland), as Morland reports in Demographic Engineering[4]. Morland also states that ethnic conflict was more common in the 20th century than in previous centuries (I am unsure how this was operationalized, so I’m not very confident in this claim).
Immigration can make up for population shortfalls for some time, but since even high-fertility countries are generally converging to replacement fertility, this cannot continue indefinitely.
In addition, immigration restrictionist politics in some regions may prevent the high levels of immigration that would be needed to offset projected population decline. East Asian countries (eg, South Korea, Japan, China) have, so far, not accepted immigrants in large enough numbers to offset their expected fertility decline. Eastern Europe seems to be following a similar path. Depending on the electoral success and subsequent policies of anti-immigration parties in other parts of Europe and the US, immigration may slow or expand in those areas. Guest worker programs, used in parts of the Middle East (like Dubai), are another possibility.
The goal of this section is to give a quantitative sketch of infertility.
For healthy young heterosexual couples, their chance of achieving pregnancy (not a live birth) after 1 year of trying without ART is probably about 85% and about 93% after 2 years.
Per-cycle fecundability is the probability of successfully achieving conception (pregnancy), not live birth, in a given menstrual cycle. A classic study found that in a cohort of couples trying to conceive without assisted reproductive technology (ART), the probability was 29% in the first menstrual cycle of trying; per-cycle fecundability was 29% in the 2nd cycle, 16.8% in the 3rd, and lower in subsequent cycles. Over a whole year, the cumulative probability of achieving pregnancy was 82%.
Live birth rates per cycle are probably somewhat lower: other studies suggest that about 15% of pregnancies end in miscarriage in the 1st trimester, and more pregnancy losses occur later on. The overall proportion of conceptions that result in a live birth is not known with precision; here is a study examining these issues in more depth.
Ignoring those complications, and using “exposure to unprotected intercourse over time” instead of per-cycle fecundability, we get the following data from the textbook Speroff's Clinical Gynecologic Endocrinology and Infertility[5]:
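Per-cycle fecundability compounds into a cumulative probability: the chance of conceiving by cycle n is one minus the product of the per-cycle chances of not conceiving. The Python sketch below applies that to the first three cycles quoted above, then contrasts it with a hypothetical constant 29% rate over 12 cycles. The gap between that result and the observed 82% is consistent with per-cycle rates falling over time, plausibly because the most fecund couples conceive first and leave less fecund couples behind.

```python
from math import prod


def cumulative_pregnancy_probability(per_cycle):
    """Probability of conceiving by the end of the listed cycles.

    per_cycle[i] is the probability of conceiving in cycle i + 1, conditional on
    not having conceived earlier (i.e., per-cycle fecundability).
    """
    return 1 - prod(1 - p for p in per_cycle)


# First three cycles as reported in the study above.
print(round(cumulative_pregnancy_probability([0.29, 0.29, 0.168]), 3))  # 0.581

# Hypothetical: if fecundability stayed at 29% for all 12 cycles.
print(round(cumulative_pregnancy_probability([0.29] * 12), 3))  # 0.984
```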
Figure 3.
These rates are approximately the same as the fecundability data above, though live birth rates are expected to be somewhat lower. My best guess is that these inconsistencies are driven by differences in study population, like age, fertility, etc.
The proportion of couples who achieve at least one live birth (a higher bar than a pregnancy) depends critically on the age at which they start attempting to conceive. Thus, as people delay childbirth, it is likely that the proportion of couples who will succeed in achieving pregnancy without ART will decline.
This simulation, using data derived from France before the Demographic Transition, implies that if couples begin trying to conceive at a young age (between ages 20-24), about 96% can expect to have at least one child:
Women who married at age 20–24 years between 1670 and 1789 had 7.0 children on average and 3.7% remained childless. Women who married at age 25–29 years had a mean of 5.7 children and 5.0% remained childless. Women who married at 30–34 years had a mean of 4.0 children and 8.2% remained childless.
I have not found a simulation addressing the specific question of “how much does increasing the average age at which couples start trying to conceive by x years affect per-couple fecundability”.
This simulation does address some of this question. Some of the assumptions:
The results without ART:
final proportions of women who deliver a live baby reach 94% for women starting at age 30 years, 86% for those starting age 35 years and 65% for those starting at age 40 years....
While with ART:
In both cases, ART only partly reduces the gap. If a woman postpones an attempt to become pregnant by 5 years, from age 30 to 35 years, her chances of conceiving will be reduced by 9% (91–82%) and ART will make up for only 4%. If she postpones from age 35 to 40 years, the chances will be reduced by a further 25% (82–57%) and ART will make up for only 7%. In other words, ART makes up for only half of the births lost by postponing an attempt to become pregnant from 30 to 35 years (4.2/8.5), and <30% of the births lost by postponing from 35 to 40 years (7.1/25.2)....More optimistic results might be reached by encouraging women aged 35–40 years to turn to ART faster than assumed in the model, after 3 and 2 years respectively. Note, however, that this delay includes the time to decide to visit a doctor plus the time to make the necessary medical investigations, plus the time to start ART. It does not mean that the woman is not doing anything before 2 or 3 years.
That simulation assumes a relatively long interval between the start of unsuccessful attempts to conceive and turning to ART; a longer delay reduces the number of children born through ART[6].
The net impact of ART, per those simulations, is that “the chance of giving birth to a live baby decreases between ages 30 and 35 years, and even more so between ages 35 and 40 years,” and that ART closes only part of that gap: about half of the lost births for a postponement from 30 to 35, and under 30% for a postponement from 35 to 40.
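The ratios in the quoted passage can be checked directly. The snippet takes the four figures from the quote as inputs; the 8.5 and 25.2 losses appear to be unrounded versions of the 9% and 25% declines it also mentions.

```python
def share_recovered(lost_without_art, recovered_by_art):
    """Fraction of births lost to postponement that ART makes up for."""
    return recovered_by_art / lost_without_art


# Percentage-point figures quoted from the simulation above.
scenarios = {
    "postpone 30 -> 35": (8.5, 4.2),
    "postpone 35 -> 40": (25.2, 7.1),
}

for name, (lost, recovered) in scenarios.items():
    print(f"{name}: ART makes up for {share_recovered(lost, recovered):.0%} of lost births")
# postpone 30 -> 35: ART makes up for 49% of lost births
# postpone 35 -> 40: ART makes up for 28% of lost births
```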
It is important to point out that in some cases of infertility, such as tubal infertility or many cases of male infertility, IVF (with or without ICSI) can effectively turn sterile couples into normal-fertility (for their age) couples. In the simulations above, the decline in fertility is driven largely by female age, which is only somewhat amenable to IVF.
The CDC estimates that, in 2015-2019, about 12% of women aged 15-49 had ever used infertility services. Other sources, like this simulation, have found the following proportions of heterosexual couples who are unable to conceive[7] (conceiving is a lower bar than having a live birth), ranging from 1% at age 25 to 5% at age 35 to 54% at age 45:
Figure 4.
The numbers above, since they derive from inability to achieve conception as opposed to achieving a live birth, are likely a conservative lower bound on infertility at different ages.
The data here are messy, but Speroff estimates that male factors account for about 20% of infertility and play a role in another 20-40%, with estimates deriving from this study.
A simulation study that tried to keep other factors constant found the following for six European countries:
Our results suggest that by delay of first motherhood, the incidence of permanent involuntary childlessness rose from 2 to 3% in 1970/1985 to 6 to 7% in 2007 for the countries studied (Fig. 1). In other words, 3–4% of the population of women who wanted to have at least one child did not succeed in fulfilling this wish because they had postponed too long...In spite of the massive delay of parenthood, TFRs recovered in almost all European countries since the 1980s and the 1990s (Goldstein et al., 2009). This trend is also obvious for the six countries studied where recoveries varied from 0.08 in Austria to 0.41 in Sweden (Table I). These recoveries are mainly due to the fact that after a period of marked postponement during which temporarily less children were born and consequently TFRs dropped, many couples still tried to realize the number of children they had previously envisaged, after years of delay. Most of them succeeded in doing so but some waited too long as the data of Fig. 1 demonstrate. Apart from this so-called tempo or timing effect (Bongaarts and Feeney, 1998), part of the recovery is also explained by more structural determinants such as the level of economic stability and unemployment, the cultural background and also by policy measures aiming to have a more woman- and child-friendly society (Goldstein et al., 2009; ESHRE Capri Workshop Group, 2010; Mills et al., 2011). Apparently, the positive effect of TFR recoveries is much larger than the negative effect of postponement (Table I).
This effect works out to a TFR reduction of between 0.03 and 0.06.
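The source does not show how the 0.03 to 0.06 range was derived. One back-of-envelope reconstruction that reproduces it multiplies the 3-4% of women made involuntarily childless by the number of children each would otherwise have had; the 1.0 to 1.5 children-per-woman range below is a hypothetical assumption, not a figure from the study.

```python
def tfr_reduction(extra_childless_share, children_forgone_per_woman):
    """Back-of-envelope TFR loss: newly childless share times births each woman forgoes."""
    return extra_childless_share * children_forgone_per_woman


# 3-4% extra involuntary childlessness is from the quoted study.
# HYPOTHETICAL: each affected woman would otherwise have had 1.0-1.5 children.
low = tfr_reduction(0.03, 1.0)
high = tfr_reduction(0.04, 1.5)
print(f"TFR reduction between {low:.2f} and {high:.2f}")  # TFR reduction between 0.03 and 0.06
```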
An important caveat is that the data examined here are derived from heterosexual couples. I chose to focus on heterosexual couples due to time constraints and because same-sex couples make up a relatively small portion of total parents.
However, same-sex couples do use ART at relatively high rates, and some of the technology profiled later on seems especially attractive for them, such as in-vitro gametogenesis (which would enable cross-sex gamete production) and artificial wombs.
Per this paper, about 5% of men are subfertile or infertile. Some possible causes are shown below, from Speroff:
A careful reader will note that the male infertility prevalence presented above exceeds the prevalence of couple infertility presented earlier, which seems logically impossible. This is because “infertility” and “subfertility” are defined differently depending on the context. The couple figures would more precisely be called “couple sterility”, while “male infertility” here includes any delay in achieving a pregnancy.
As a broad generalization, mild cases of male subfertility characterized by low sperm counts are straightforwardly treatable with current techniques, ranging from gonadotropins for some hormonal causes of low sperm counts to surgery to repair varicoceles, and more. About 10-20% of infertile men have azoospermia (no sperm in the ejaculate), which is generally considered the most severe form of male infertility.
Within this category, andrologists clinically distinguish between obstructive and non-obstructive azoospermia. The former includes conditions that disrupt sperm transport and/or ejaculation, like cystic fibrosis, congenital absence of the vas deferens, or nerve damage that prevents ejaculation. Men with obstructive azoospermia can usually achieve fertilization with a variety of techniques that retrieve sperm directly, from either the epididymis or the testicles. With ICSI, which injects a single sperm directly into an oocyte, even very small numbers of sperm (in one case, a single sperm) can be used successfully. In addition, men with a variety of sperm abnormalities that impair motility can still achieve fertilization with ICSI.
Men with non-obstructive azoospermia (NOA), who likely make up about 1% of the male population, can be infertile for a variety of reasons: cryptorchidism, mutations, chromosomal abnormalities, trauma or illness, radiation, chemotherapy, disorders of sexual development, and more (~50% have no identifiable cause[8]). In this group, the problem is disrupted sperm development. These men have the worst outcomes: a cohort study found a success rate of 13.4%[9] (where outcome = live delivery) for men undergoing testicular sperm extraction followed by IVF and ICSI.
Putting all that together, perhaps 87% of men with NOA (about 1% of the male population) will not be treatable with current ART, giving a final figure of roughly 0.87% of all men.
In addition, sperm donation is much cheaper than egg donation, and sperm freezing to preserve fertility is substantially cheaper and usually much less invasive than egg freezing[10]. Since male factor infertility is also a smaller fraction of total infertility cases, it may be a less impactful focus overall.
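The arithmetic behind the 0.87% figure, as a short Python sketch. Like the paragraph, it treats the 13.4% live-delivery rate from the cohort study as the share of men with NOA who can be helped, which is a simplifying assumption.

```python
noa_prevalence = 0.01    # share of all men with non-obstructive azoospermia (estimate above)
live_birth_rate = 0.134  # TESE + IVF/ICSI live-delivery rate from the cohort study above

untreatable_share_of_noa = 1 - live_birth_rate
untreatable_share_of_men = noa_prevalence * untreatable_share_of_noa
print(f"{untreatable_share_of_noa:.1%} of men with NOA, {untreatable_share_of_men:.2%} of all men")
# 86.6% of men with NOA, 0.87% of all men
```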
Note: some of this content overlaps with content in the female reproductive aging and diagnosing female infertility section.
There are several ways to answer this question. Using data from women in rural France who married between 1670 and 1789, assumed to be naturally fertile[11], this simulation showed that 3.7% of women married at ages 20-24 remained childless, as did 5% of women married at 25-29 and 8.2% of women married at 30-34. Women with access to modern medical technology, all other factors being equal, should have substantially lower infertility rates, since tubal and STD-related infertility are now easily treatable, as are a significant proportion of ovulatory disorders like hyperprolactinemia and PCOS.
If we want to put an upper bound on how important environmental causes of infertility could be, we need an estimate of how prevalent environmentally induced infertility is and a sense of how it has changed over time. To clarify, “infertility” as used in this section refers to an inability to achieve a live birth when one is desired. This is less precise than the medical definition, which specifies a time period.
More bluntly (h/t Milan): how much of the problem of people having fewer kids than they want is because of infertility?
A speculative possibility is that circadian rhythm disruption from less sun exposure and widespread artificial lighting may influence fertility. There is some evidence that the pineal gland affects fertility, and humans exhibit seasonal variation in conception rates. However, since infertility rates overall have not increased substantially despite large disruptions to circadian rhythms over the last century, this is unlikely to play a large role in overall fertility rates.
Another concerning trend is the change in pubertal timing. The age at which girls begin puberty has been decreasing since the beginning of the 20th century, and the precise cause(s) are not well understood. So far, the consensus points to better nutritional status, higher rates of obesity, lower levels of physical activity (since high levels of physical activity can delay puberty), and perhaps endocrine disruptors. As far as I know, there is no evidence that this trend has caused infertility[12]; it should instead be viewed as evidence that we don't understand puberty and fertility very well, which should perhaps make us more uncertain overall.
On the other hand, increases in obesity have probably reduced fertility somewhat. An ASRM practice bulletin summarizing the effects of obesity on reproduction focused on the relationship between female obesity and PCOS (which often causes anovulation), female obesity and pregnancy complications, and male obesity and sperm function. I have not seen quantitative estimates of the effect of obesity on infertility overall, eg, what % of infertility globally or nationally is caused by obesity.
Advances in fertility preservation have somewhat reduced the burden of infertility caused by cancer (in both men and women), though increased survival has very likely increased the proportion of the population with cancer-related infertility.
My best guess is that infertility rates, once adjusted for delays in reproduction, have not substantially increased and may have decreased. From 1982 to 2002, infertility rates appear to have declined in the US, a trend that continued into 2015 (Speroff cites CDC data on this). Globally, infertility appears to have been stable, with some decline in low-income countries (primarily Sub-Saharan Africa and South Asia). A caveat with the above data is that they use as their denominator the “proportion of women of reproductive age (20–44 y) who are exposed to the risk of pregnancy...desire a child”. If infertile women also don’t desire a child, they could be systematically undercounted in those surveys, causing infertility rates to be underestimated. However, I did not look very deeply into these studies, so this may not be an important risk.
Age-adjusted infertility rates may have decreased somewhat since the early 20th century, primarily due to better treatment of STDs[13] and post-birth complications, and advances in medical care for infertility. As an example of the potential impact of STIs on infertility, consider the “infertility belt” in Central Africa, which suffers from poor treatment of STIs, as well as of post-birth complications (which can sometimes cause infertility).
Both STIs and post-birth complications are treated better in high-income countries than they were historically, implying a reduction in infertility, as long as there hasn’t been a large rise in STI prevalence that might cancel out better treatment. However, I have not been able to find a review trying to answer this question, so I’m very unsure about this conclusion. A recent CDC study found that PID rates decreased from 2006 to 2017, but I’m unsure of the long-term trajectory, eg, what PID rates were in the 1900’s.
A decline in smoking in the US has probably slightly reduced infertility as well, since it has a consistent correlation with infertility that seems at least partially causal. Speroff cites this study to argue that “up to 12% of female infertility could be related to smoking”.
A much-publicized meta-analysis from 2017 found a 59.3% decline in sperm counts among men from Western countries since 1973. One of the same authors has published papers arguing that animal data and some human epidemiology suggest that phthalates, a common class of ingredients in plastics, have anti-androgenic effects.
The opposite side of this debate, summarized in a NYTimes article, argues that this apparent decline may be an artifact of changes in measurement technique, or not very important even if real. Since there is not a strong relationship between sperm count and fertility above a certain threshold, a moderate decline in sperm counts is unlikely to substantially increase male infertility rates. A simulation paper by a respected French demographer came to a somewhat similar conclusion, stating:
A decline in fecundability by 15% implied a decrease in fertility by 4%, and an increase in the proportion of couples eligible for infertility treatments by 73%. An increase in the mean age at initiation of first pregnancy attempt by 2.5 years from 25 years entailed a decrease by 5% in fertility and an increase by 32% in the proportion of couples eligible for infertility treatments...A relatively important decrease in fecundability and an increase by 2.5 years in age at first pregnancy attempt are likely to have only a limited impact on fertility. However, they may have a large impact on the proportion of involuntarily infertile couples, likely to resort to assisted reproduction techniques.
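Why can a modest fall in fecundability matter much more for treatment eligibility than for overall fertility? In a deliberately simplified model where every couple has the same constant per-cycle fecundability p, the share still not pregnant after 12 cycles is (1 - p)^12, and that tail is sensitive to p. The 0.20 baseline is hypothetical, and this toy ignores the heterogeneity and sterility that the quoted simulation models, so it illustrates the direction of the effect rather than reproducing the 73% figure.

```python
def share_not_pregnant(fecundability, cycles=12):
    """Share of couples with constant per-cycle fecundability still not pregnant after `cycles` cycles."""
    return (1 - fecundability) ** cycles


baseline = 0.20            # HYPOTHETICAL per-cycle fecundability
reduced = baseline * 0.85  # the 15% decline used in the quoted simulation

before = share_not_pregnant(baseline)
after = share_not_pregnant(reduced)
print(f"not pregnant after 12 cycles: {before:.1%} -> {after:.1%}")
print(f"relative increase: {after / before - 1:.0%}")
```

With these inputs, the share still not pregnant after a year rises from about 7% to about 11%, a relative increase of more than 50%, even though fecundability fell by only 15%.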
A more high-level reason to be skeptical that biological infertility per se is currently a large constraint on TFR is the following:
Isabel Juniewicz has written a more in-depth blog post on this topic and reaches a similar conclusion: biological infertility per se is not yet an important factor in declining TFR.
I spoke to Daniel Goodwin, who is working on a project for managing small molecule pollution. He argued convincingly that, on a societal level, we take too long to recognize the harmful effects of novel chemicals, but I was not convinced that biological infertility per se is significantly affected by pollution. He pointed to evidence that testosterone levels and sperm counts are dropping. I am less skeptical that pollution may be having subtle effects on behavior, which may in turn be reducing fertility, but this seems especially difficult to study in humans; mice, of course, could be fed a diet rich in pollutants and checked for behavioral dysfunction.
Another approach is to look for multiple markers of environmental disruption, instead of a single measure like sperm count. This paper does that, and finds multiple examples of markers of sexual development dysfunction are all moving in the same direction, which makes me (and some of the people I spoke to) somewhat more willing to believe this idea than before.
While I am skeptical that environmental pollutants have a large impact on infertility, there are other benefits of better pollution management that may make it a smart idea overall. With that in mind, some infertility add-ons to a pollution-focused project may be wise. I would defer to Daniel Goodwin on project ideas for this. Some ideas:
My colleague Mackenzie Dion has a more extensive discussion of birth control following this section, so I will only sketch my impressions of the science here, summarized from Speroff.
There are many variations on hormonal birth control which vary in dosage, timing, and method of delivery. There are some known risks of hormonal birth control, such as increased clotting risk, but it seems generally safe, and likely has some positive health benefits over the longer term related to reduced risk of ovarian cancer. In my view, the aspect of hormonal birth control most relevant to this whitepaper is that there is some contradictory research on its effect on libido and sexual/partner preferences. Mackenzie disagreed with this take, citing research linking hormonal birth control to autoimmune disease, some changes in brain activity, increased antidepressant use, and vitamin/mineral deficiency.
Of course, hormonal birth control relates to TFR in a more prosaic way: by reducing unwanted births, it should, all else being equal, reduce fertility. This probably has some effect, but there are enough ways to control fertility that even countries with less access to birth control have undergone the demographic transition. At the extreme end, France underwent the demographic transition in the 1700’s, well before reliable contraceptive technology was available.
Conversely, widespread availability of contraceptives probably speeds up the demographic transition, at least per Empty Planet, and may reduce abortions (since they are sometimes used as a form of birth control), but likely does not have a strong effect on fertility overall.
The widespread availability of LARCs (long-acting reversible contraception), and their promotion in the US beginning in the 1990’s, may have contributed to changes in fertility timing by reducing teen pregnancies; this study of Colorado attributes a 5% reduction in teen pregnancies specifically to LARCs. The reduction in teen pregnancies may have reduced fertility overall, or simply pushed some teen births into women's 20s and 30s, changing fertility timing.
A minor positive effect that hormonal birth control may have on fertility is that it may reduce rates of STDs. STDs can cause infertility (primarily in women), and in some regions are a leading cause of infertility (eg, the “infertility belt” in Africa). Through this mechanism, hormonal birth control might reduce infertility.
For the reasons described above, it seems that the most plausible path through which hormonal birth control could affect TFR would be through changed behavior. For that reason, my colleague (Mackenzie Dion) has focused on the possible effects of birth control on partner preferences.
There has been persistent speculation that hormonal birth control use may affect factors related to fertility, such as partner preference, which could have social implications, for example by contributing to increased divorce rates. The animal literature has found some evidence that MHC similarity affects health outcomes.
In a CDC survey of women aged 15-49 conducted in 2017-2019, 65% were using some form of contraception: 14% were taking the oral contraceptive pill, 10.4% were using long-acting reversible contraceptives (ie, IUDs and arm implants), and 3.1% were using Depo-Provera, the contraceptive ring, or the patch. Together, this amounts to about 27% of US women in this age range using some form of hormonal contraception. Because the data did not distinguish between hormonal and non-hormonal IUDs, this figure is likely somewhat of an overestimate.
Much of the literature on hormonal birth control and partner preferences asks whether taking hormonal birth control changes partner preferences. The mechanism often cited is that people are attracted to potential partners who have differing major histocompatibility complex (MHC) alleles, and that hormonal contraception use is associated with preferring MHC-similar partners. MHC genes code for cell-surface proteins that bind peptide fragments, including those derived from pathogens, and present them for T cells to recognize.
Research on the association between MHC similarity and partner choice is conflicting. A 2020 meta-analysis found no significant effect of MHC similarity on mate selection, whereas a 2017 meta-analysis did. A recent genetic analysis of 3,691 couples found that MHC similarity between partners did not differ from chance, and that hormonal contraception use at the start of the relationship also had no effect. A study that asked women to smell t-shirts worn by men found a significant preference shift toward MHC-similar men after the women started taking the pill, a shift not seen in the control group. A preference for MHC-dissimilar mates has been found in mice.
It seems possible that the effect of hormonal birth control on MHC preference shifts can be detected in a controlled research setting, but given the complex nature of human partner selection, this effect is then swamped by other variables in uncontrolled environments.
There appear to be reproductive advantages to MHC heterozygosity in mice. For example, MHC-heterozygous mice were found to have higher rates of reproductive success than MHC-homozygous mice.
There is some suggestive evidence that MHC heterozygosity may produce immune benefits in mice and humans: for example, HIV-infected people with MHC heterozygosity had less viral replication than HIV-infected people with MHC homozygosity, and MHC-heterozygous mice had higher survival rates and higher body weights than MHC-homozygous mice when infected with multiple strains of Salmonella and Listeria.
There have not been any studies on whether administering hormonal contraception to mice causes a preference shift from MHC-dissimilar mates to MHC-similar mates. Given the conflicting evidence in humans and the demonstration of MHC-dissimilar mate preference in mice, studies along these lines may further elucidate the nature of this effect.
Although the results are mixed and often contradictory, hormonal contraception may have effects on female sexuality beyond MHC preference, including on sexual function and desire. Many hormonal contraceptives, namely the combination pill (and, inconsistently, the progestogen-only “minipill”), the patch, the NuvaRing, and hormonal IUDs (although mostly just during the first year), suppress ovulation, which may affect sexual behavior and self-perception. Women feel more attractive and desirable when ovulating, and men find their female partners more attractive and themselves when their partners are ovulating.
While there appears to be weak evidence that hormonal contraception affects self-perceived attractiveness and desirability, the direct link to fertility is not clear. One possible path from perceived attractiveness to fertility might be frequency of intercourse, or different levels of interest in having children. Hormonal birth control may therefore subtly affect fertility, but given the various confounding factors and the weak effect sizes seen so far, further research will likely not be high-impact.
At various times, different countries have tried different pronatalist policies. According to Lyman Stone of the Institute for Family Studies and Demographic Intelligence, whom I spoke with on this topic, a change in fertility rates of 0.05 to 0.2 (where replacement TFR = 2.1) is about what fiscal pronatalism can realistically achieve within the range of policy interventions that have actually been tried.
His guess for the most cost-effective fiscal intervention is a single large cash payment, like a “baby bonus”, that front-loads the incentive. He estimated that incentivizing an additional US birth costs roughly $100k-$400k, which is much cheaper than the statistical value of life used by US government agencies[14]. A meeting with Dean Spears and his team, who are starting an interdisciplinary economics and demography group at UT Austin, generally corroborated these claims.
Lyman emphasized that these estimates are derived from policies within the Overton Window (ideas considered acceptable by the mainstream population): eg, nobody has ever tried paying women to have children at rates comparable to a regular job.
Lyman was pessimistic about new fertility technology having large effects on TFR. He estimated that in high-income countries, ~6-7% of all births involved ART (assisted reproductive technology), and that we are not close to the limits of natural fertility, even for older women. That is, biological infertility per se is not the main constraint behind below-replacement TFR. He also thought that if ART could fix reproductive aging, this might not boost TFR all that much, since it might just push childrearing later in life. Finally, he pointed to a paper on the “child penalty” to mothers' wages, which suggests that the work of parenting, not pregnancy per se, is the main “cost” of having a child. To the degree that childcare, and not pregnancy and childbirth, is the main cost of having a child, new ART would not radically change the current decision calculus.
Lyman thought changes in culture/religion/norms could have much more powerful effects, though effecting cultural/norm change is easier said than done. He has a working paper (not yet published) arguing that a change in the Georgian Orthodox Church that raised the status of parenting boosted TFR from 1.5 to 2.2 without a substantial change in government spending. Another neat example along cultural lines: “inviting the Pope to do a speaking tour to all the Catholic churches in your country...”, presumably referring to this paper on Brazil[15].
He also pointed to data showing that secularization in France caused a decline in fertility, and that reforms of censorship laws in the UK had similar effects, as evidence that values drive fertility. As evidence that fertility preferences are important drivers, he pointed to findings that stated fertility preferences at 18 predict completed fertility at 40, and that such preferences are higher in high-fertility groups.
Along similar cultural lines, Lyman argued that some of the intense focus among development experts on population control and contraception was driven by “developmental idealism”. Basically, instead of focusing on institutions that are hard to export or copy, like rule of law and private property, development experts emphasized the demographic transition more than they should have, under the mistaken assumption that declining family size per se had a large causal influence on economic development[16]. He argues that this focus on exporting small family sizes may be partly responsible for low TFR in some countries, but it is unclear to me how much he attributes to this. For further reading, he recommended work by Arland Thornton and William Easterly, as well as The Anti-Politics Machine and Legacies of Despotism and Development.
Lyman also used the example of breastfeeding rates rising over time as an example of values driving behavior more than fiscal incentives:
A hugely time intensive element for moms is breastfeeding, and yet breastfeeding rates have RISEN dramatically even as women's wages have risen! Breastfeeding time has risen even as pumps have become more common! Why??? Simple: because the last few decades have seen a change in how parents conceptualize health, chemicals, nature, and children, such that today parents see formula as inferior and breastfeeding as what "good parents" do. This, despite the fact that formula has gotten tons better over time, the health benefits of long-term breastfeeding are empirically shaky, and the opportunity cost of breastfeeding has risen dramatically! It's values all the way down. Values, values, values.
Another topic that Lyman brought up was the cost of childcare. High-fertility groups rely on surplus labor from grandparents and older siblings, which lowers the cost of childcare. He recommended I speak with Samuel Hammond of the Niskanen Center and Patrick Brown at EPPC about childcare, its effects on fertility, ways to reduce its cost, etc. Another point he raised along these lines was that Utah’s laws on children are the most “free-range” of any state, and Utah also has the highest TFR.
This modeling paper on the effects of IVF on TFR, given certain reasonable assumptions, largely agrees with Lyman’s pessimism regarding IVF boosting national fertility. The assumptions:
We assume that all couples want two children. Thus, all couples who have achieved a first child try for a second one, except those who have two or three children from the first LBD...IVF delivery gives on average 1.26 children. The twin and triplet rates in natural pregnancies add to an average of 1.01 children per delivery...assume that after 1 year of non-conception, a diagnostic fertility work up is performed, by which couples with an absolute or severe cause of infertility, such as two-sided tubal blockage or very poor semen quality, are identified and treated by IVF without delay. [they also assume 100% uptake and access to IVF services]
The authors then model two different scenarios: requiring 1 year or 3 years of waiting before IVF services are offered (the latter of which was a reasonable stand-in for European healthcare provision of IVF). The results:
Figure 5.
That is, under the unrealistically optimistic assumption of 100% IVF uptake among women having trouble conceiving, and assuming every couple wants 2 children, moving from no IVF access to IVF access after 1 year of trouble conceiving would boost TFR by 0.11. A more realistic IVF uptake of ~50% would roughly halve that difference, and further adjustments, such as some couples desiring only 1 child, would reduce it further. In addition, part of the advantage is driven by the higher average number of children per IVF delivery, which has likely converged toward natural pregnancy rates as single-embryo transfer (described later in this document) has become the norm.
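To make the 1-year and 3-year waiting periods more concrete, here is a minimal sketch (not the paper's model) of how many couples would still be trying at each threshold. It assumes each couple has a constant per-cycle fecundability p, with independent cycles, so the chance of no conception after n cycles is (1 - p)^n. The three-group population mixture is entirely hypothetical.

```python
# Share of couples still not pregnant after a given number of cycles,
# assuming a constant per-cycle fecundability p and independent cycles.
# The population mixture below is HYPOTHETICAL, for illustration only.


def prob_not_conceived(p: float, cycles: int) -> float:
    """Probability of no conception in `cycles` cycles with per-cycle probability p."""
    return (1.0 - p) ** cycles


def share_still_waiting(mixture: list[tuple[float, float]], cycles: int) -> float:
    """Weighted share of couples not yet pregnant.

    `mixture` is a list of (population_share, fecundability) pairs whose shares sum to 1.
    """
    return sum(share * prob_not_conceived(p, cycles) for share, p in mixture)


# Hypothetical population: mostly fertile couples, some subfertile, a few sterile.
MIXTURE = [(0.85, 0.20), (0.10, 0.05), (0.05, 0.0)]

for years in (1, 3):
    cycles = 12 * years  # roughly one ovulatory cycle per month
    print(f"{years} year(s): {share_still_waiting(MIXTURE, cycles):.1%} still trying")
```

With these made-up numbers, roughly 16% of couples would still be trying after 1 year and roughly 7% after 3 years. The sterile group sets a floor that no amount of waiting removes, so a longer wait mainly shrinks the pool referred to IVF by letting subfertile couples conceive on their own, at the cost of two extra years of reproductive aging for the couples who never will.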
Dean Spears and his team proposed some project ideas:
General source when something is unsourced:
Hormones are the signaling molecules used to coordinate biological activity on a large scale. In fertility, the relevant hormones are mostly steroid hormones and peptide hormones. Steroid hormones share a core of three six-carbon rings joined to one five-carbon ring, and are sorted by total carbon count into 21-carbon (progestins and corticosteroids), 19-carbon (androgens), and 18-carbon (estrogens) groups, with varying functional groups making up the rest of the variation. They are all derived from cholesterol.
Because steroid hormones are poorly water-soluble, most of the steroid hormone in blood is carried by proteins. Sex steroid hormones are carried mainly by sex-hormone-binding globulin (SHBG), which is mostly made in the liver, and more loosely by albumin. However, the free fraction of a hormone, which is not bound to carrier proteins, is the biologically active component.
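The free-fraction idea can be made concrete with a mass-action binding model. The sketch below is not from this whitepaper: it follows the widely cited Vermeulen et al. (1999) approach for calculating free testosterone from total testosterone, SHBG, and albumin, using the association constants commonly quoted for that method (about 1e9 L/mol for SHBG and 3.6e4 L/mol for albumin) and an assumed albumin molar mass of ~69,000 g/mol. Treat the constants and the example inputs as assumptions; clinical labs differ in method.

```python
import math

# Association constants commonly quoted for the Vermeulen (1999) method (assumed).
K_SHBG = 1e9               # L/mol, testosterone-SHBG binding (high affinity)
K_ALB = 3.6e4              # L/mol, testosterone-albumin binding (low affinity)
ALB_MOLAR_MASS = 69_000.0  # g/mol, approximate molar mass of albumin (assumed)


def free_testosterone(total_t_nmol: float, shbg_nmol: float, albumin_g_dl: float = 4.3) -> float:
    """Return free testosterone in nmol/L from a mass-action binding model.

    Solves N*K_SHBG*FT^2 + (N + K_SHBG*(SHBG - TT))*FT - TT = 0 for FT,
    where N = 1 + K_ALB*[albumin] lumps in weak, non-saturable albumin binding.
    """
    tt = total_t_nmol * 1e-9                          # nmol/L -> mol/L
    shbg = shbg_nmol * 1e-9                           # nmol/L -> mol/L
    albumin = albumin_g_dl * 10.0 / ALB_MOLAR_MASS    # g/dL -> g/L -> mol/L
    n = 1.0 + K_ALB * albumin
    a = n * K_SHBG
    b = n + K_SHBG * (shbg - tt)
    c = -tt
    ft = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)  # the positive root
    return ft * 1e9                                   # mol/L -> nmol/L


# Hypothetical example: total testosterone 15 nmol/L, SHBG 40 nmol/L.
ft = free_testosterone(15.0, 40.0)
print(f"free T ~ {ft * 1000:.0f} pmol/L ({ft / 15.0:.1%} of total)")
```

In this model only a couple of percent of the hormone is free, and raising SHBG (for example to 80 nmol/L) lowers the free fraction, which is one reason two people with the same total hormone level can have different biologically active levels.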
Figure 6.
A normal human ovary can produce all three sex steroid classes: estrogens, progestins, androgens.
My general impression is that the main actions of sex steroids are well understood: their structure, their receptor transduction pathways, their degradation pathways, etc. This knowledge has translated into a variety of synthetic hormone analogs/drugs with varied effects, eg, tamoxifen, which has estrogenic effects on some tissues (endometrium, bone) and anti-estrogenic effects on breast tissue. There are estrogen analogs, SERMs, anti-estrogens, aromatase inhibitors, anti-progestins, and the equivalents for androgens (though SARMs are not clinically used). There is a comparable level of knowledge for the trophic hormones, which regulate the actions of other hormone-producing tissues (eg, hypothalamic GnRH, pituitary FSH and LH, and placental hCG), and there are synthetic equivalents for all of them as well.
Measuring hormone levels is routine, though the currently used methods are not perfect. Problems include autoantibodies causing hormone clumping, and slightly different hormone isoforms that have substantively different biological activity but show up as the same on immunoassays. This is likely responsible for some diagnostic “fuzziness” and heterogeneity. A possible takeaway is that better hormone measurement techniques may yield unexpected fruit by improving diagnostic precision. Because anti-Müllerian hormone (AMH) levels are, aside from age, the best predictor of ovarian reserve, better assays might improve ovarian reserve prediction if AMH is currently measured imprecisely; this is something I’m somewhat interested in.
Endocrinology (the field of medicine focused on hormones) is sufficiently well understood that people with practically all varieties of hormone deficiency can be adequately, though not perfectly, sustained with synthetic hormones[17]. Some secondary effects of hormones are less well understood. For example, vasopressin, whose main effect is regulating kidney function and blood pressure, may have some important CNS/behavioral effects. Similar caveats apply to the CNS effects of androgens and estrogens/progestins, which are real but not well understood in humans. The onset/timing of puberty is also not well understood, though leptin and kisspeptin likely play a role, and rising obesity rates (obesity increases leptin levels) likely contribute to secular changes in pubertal timing.
Hormone effects can vary substantially with the timing and duration of dosing. The best example is GnRH, which, given in a pulsatile fashion, initiates puberty, stimulates sex hormone production and release, and triggers ovulation, but, if given continuously, has the opposite effect: it delays puberty, shuts down sex hormone production, etc. My impression is that precise hormone timing is slightly less well understood, but it is understood well enough to induce ovulation, provide safe and effective birth control, and increase uterine receptivity to implantation.
Gonads are the organs that produce germ cells (gametes) and sex hormones. Knowledge of how they develop embryonically comes in large part from various disorders of sexual development (DSDs)[18]. DSDs are relatively rare, with an estimated prevalence of ~1/4,500, though a much more expansive definition of genital anomalies (including cryptorchidism and hypospadias) yields an estimate of 1/200. If the definition is widened further to include late-onset congenital adrenal hyperplasia (most of whose carriers may be completely asymptomatic), Turner syndrome, and Klinefelter syndrome, then estimates of up to 1.7% can be obtained, though the majority of those affected individuals are not at all sexually ambiguous.
Depending on the exact diagnosis, current assisted reproductive technology (ART) can sometimes help people with DSDs. My sense is that the “long tail” of specific reproductive disorders in both men and women will be very difficult to address without a technology that sidesteps or fixes gametogenesis wholesale, like in-vitro gametogenesis (IVG). This is because these disorders are very heterogeneous, both in outcomes and in causes, so a specific treatment would likely address only a small subset of fertility issues. Current ART can effectively address inability to carry a pregnancy (surrogacy) and hormonal issues that make pregnancy difficult (ovulation induction), and can somewhat address moderately low quantities of viable gametes (IVF + ICSI). Individuals who cannot make any viable gametes would also be helped by IVG.
An important distinction between male and female fertility is that newborn females start off with about 500 thousand to 2 million germ cells, which are constantly undergoing follicular atresia (a form of programmed cell death). From puberty onward, some undergo ovulation (~400-500 mature over a lifetime). There has been some debate about stem cells possibly generating new oocytes after birth, but my impression is that this didn’t pan out as a research direction. Males, by contrast, continuously produce new gametes beginning at puberty, though de-novo mutations increase with paternal age, and sperm counts and male fertility do decline somewhat with age.
Brief review of natural (in-vivo) gamete formation (gametogenesis), derived from a mix of work in mice and human cells[19], paraphrased/copied from here:
The most fertility-relevant hormones are: GnRH, FSH, LH, HCG, prolactin, estrogen, progesterone, testosterone, Anti-Mullerian Hormone, activin, and inhibin.
Ovulation is a tightly organized hormonal sequence whose fundamentals (FSH/LH surge, rise of estrogen, etc.) are well understood, as they form the basis for the medical induction of ovulation, as well as for hormonal birth control. There are likely still some possible improvements to the precise timing of some medications (GnRH, FSH, LH), since there is substantial interindividual variability in timing and some changes in timing occur with age. A review of tailoring FSH dose for IVF based on biomarkers did not show any benefit for live birth rate, though it might reduce the incidence of ovarian hyperstimulation.
Because FSH and LH have different glycoforms with different levels of biological activity, and the timing of the FSH and LH surge matters for IVF, better hormone assays might improve our understanding of ovulation. Two schematics of the key hormonal cycle are shown below; the difference between them stems from the moment that estrogen switches from inhibiting to stimulating FSH/LH production (the “FSH/LH surge”).
Schematic of pituitary and sex hormones before FSH/LH surge
Schematic of pituitary and sex hormones after FSH/LH surge
An important takeaway is that natural ovulation in humans generally results in a single dominant follicle, with the others undergoing atresia. Crudely put, IVF consists of giving enough hormonal support that more than one follicle becomes “dominant”, which can then be extracted in the egg retrieval procedure.
FSH increases the number of LH receptors and itself prepares follicles for further maturation. Follicles consist of a single oocyte and support (“granulosa”) cells. Progesterone augments pituitary secretion of LH and is responsible for the FSH response to GnRH. As progesterone levels keep increasing, this eventually feeds back and inhibits GnRH secretion. The number of follicles that grow each cycle depends on the “residual pool of inactive primordial follicles”.
The follicle that eventually gets recruited to undergo maturation/growth actually began recruitment ~85 days earlier. Follicles in the growing cohort undergo atresia unless they become the dominant follicle. Having high levels of FSH receptors allows the dominant follicle to survive the later drop in FSH. AMH inhibits primordial follicle growth but is also associated with higher ovarian reserve.
Lots of other local factors are involved in follicle maturation and survival, some of which are used in in-vitro maturation (IVM) experiments, like BMP, NGF, BDNF, NT-3/4/5, inhibin, activin, and IGF-1. Follicle maturation requires angiogenesis around the dominant follicle. Some of these factors may not be strictly necessary, since women with Laron syndrome, who don’t produce IGF-1, are still fertile. Oocytes depend on neighboring granulosa cells for many basic metabolic functions, such as supplying pyruvate and synthesizing cholesterol, and receive these products through gap junctions. Genetic mutations in the growth factors, the gap junction proteins, etc. can all cause varying degrees/kinds of ovarian failure/infertility. The heterogeneity of possible genetic causes of female infertility represents a situation where a “side-stepping” approach like in-vitro gametogenesis seems likely to fix many causes at once.
In young women, each recruited cohort consists of ~3-11 follicles per ovary. With high FSH levels, estrogen becomes the dominant substance in follicular fluid, which the follicles require. Steroid hormone levels in follicular fluid are orders of magnitude higher than in blood, so administering estrogen into the blood would not influence local concentrations much. LH is important for the final stage of maturation because it speeds up androgen production in the dominant follicle (which can then be converted to estrogens), while also speeding up regression of the other follicles.
The key to selecting 1 dominant follicle is the combination of high FSH sensitivity within the dominant follicle (local estrogen causes more FSH receptor production) and negative feedback from high systemic estrogen, which lowers FSH levels so that all other follicles lose gonadotropin support. The decline in FSH causes a decline in FSH-dependent aromatase activity, which leads to a decline in estrogen, which swings the androgen-estrogen balance towards androgens, which leads to atresia. FSH seems much more important to follicle maturation than LH, since you can effectively eliminate LH activity in primates and use FSH alone to stimulate ovulation; the same has been done in gonadotropin-deficient women. For more on ovulation, pages 363-367 of the Speroff textbook have a clearer and more detailed explanation.
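The selection logic described above can be illustrated with a deliberately crude toy model in the spirit of the textbook “FSH threshold” idea. Every number and rule here is a hypothetical assumption chosen for readability, not a physiological parameterization: each follicle in a cohort has an FSH threshold; growing follicles secrete estrogen, which lowers pituitary FSH; a follicle becomes more FSH-sensitive as it grows (standing in for local estrogen upregulating FSH receptors); and a follicle whose FSH supply drops below its threshold undergoes irreversible atresia. Adding a constant exogenous FSH dose mimics IVF stimulation.

```python
def simulate_cohort(thresholds, days=16, fsh_start=100, feedback=3,
                    sensitization=4, exogenous_fsh=0):
    """Toy FSH-threshold model of dominant follicle selection (hypothetical units).

    Returns the indices of follicles still growing at the end of the simulation.
    """
    sizes = [0] * len(thresholds)
    alive = [True] * len(thresholds)
    for _ in range(days):
        # Systemic estrogen is proportional to the total size of growing follicles.
        estrogen = sum(size for size, is_alive in zip(sizes, alive) if is_alive)
        # Estrogen negative feedback lowers endogenous FSH; exogenous FSH adds on top.
        fsh = max(0, fsh_start - feedback * estrogen) + exogenous_fsh
        for i, base_threshold in enumerate(thresholds):
            if not alive[i]:
                continue
            # Larger follicles need less FSH (more FSH receptors).
            effective_threshold = base_threshold - sensitization * sizes[i]
            if fsh >= effective_threshold:
                sizes[i] += 1
            else:
                alive[i] = False  # atresia is irreversible
    return [i for i, is_alive in enumerate(alive) if is_alive]


cohort = [70, 75, 80, 85, 90]  # hypothetical FSH thresholds, most sensitive first
print("natural cycle survivors:", simulate_cohort(cohort))
print("stimulated cycle survivors:", simulate_cohort(cohort, exogenous_fsh=40))
```

With these made-up numbers, the natural run ends with a single growing follicle (the most FSH-sensitive one), while the added FSH keeps several follicles above threshold, which is the crude logic behind ovarian stimulation for IVF. Note that the dominant follicle only survives because its sensitization (4 units per day) outpaces the feedback its own estrogen produces (3 units per day), mirroring the point that FSH-receptor upregulation is what lets it survive falling FSH.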
Female infertility is more often[20] the rate-limiting step in couple fertility than male infertility. Of all the causes of female infertility, reproductive aging is the most common and has the fewest available treatments. In addition, as people in high-income countries continue to delay childrearing, it will likely become more important going forward. For all those reasons, reproductive aging seems like an especially high-impact area to focus on. On the other hand, if interventions for reproductive aging were substantially more successful, they might lead to a compensatory rise in delayed childrearing, which might reduce the net benefit in fertility terms.
I present a quantitative summary of age-related declines in female fertility below. Some important takeaways:
This high-quality source on the decline of female fertility with age is drawn from a dataset of natural fertility historical populations, who do not restrict their fertility. The graph below illustrates the ALB (age at last birth) for women in these populations, the age at which a woman is recorded to have had her last birth.
Figure 9.
There is substantial individual variation in this pattern and it is substantially heritable, with a moderate correlation between menopausal age (which follows after ALB by a few years) of mothers, daughters, and sisters. Though there have been some substantial changes in the environment that might affect ALB, such as better nutritional status and likely lower rates (and/or better treatment) of STI-related infertility, these data mostly match modern data well.
There are other lines of data showing that fertility declines with age in women, even after controlling for factors like reduced intercourse frequency and increased male partner age. These include fertility data on extant modern populations that avoid birth control (Hutterites), women trying to conceive with donor sperm (which eliminates the older male partner effect), and rates of egg retrieval and success with IVF cycles.
The pathophysiology of female reproductive aging is an active area of research, but likely involves several mechanisms. This article provides an overview:
Apart from follicle counts that decrease with age, a key observation that these mechanisms must explain is the rising rate of aneuploidy with age, which likely accounts for higher rates of miscarriage in older women, as well as the higher rates of Trisomy 21 in children of older mothers.
These mechanisms are all associated with aging in general, and many of the proposed treatments, like rapamycin and dasatinib/quercetin, are being investigated for general anti-aging purposes. There is some promising animal data showing that rapamycin can extend reproductive lifespan in mice, but there is no human data on most of these interventions, with coenzyme Q10 as a minor exception. Many of these treatments, like rapamycin, would have to be trialed before conception, since they likely have some harmful effects on fetal development.
One potentially promising intervention is NAD+ repletion using NMN (nicotinamide mononucleotide). There is very promising mouse data showing this can rescue female fertility in aged mice. However, there are no ongoing clinical trials on reproductive aging using NMN.
There are hormone changes that occur with age that are likely not causally linked to lower fertility, such as a rise in FSH and a decline in inhibins. A rise in FSH partially compensates for reduced FSH sensitivity.
Some more notes on timing: at the onset of puberty, of the 300k-500k remaining follicles, 400-500 end up undergoing ovulation. Follicular depletion speeds up with time. FSH rises, and inhibin B, IGF-1, and AMH all decrease. The increase in FSH causes follicular growth to begin earlier, during the late luteal phase, and later on, anovulation becomes more common.
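A quick calculation with the ranges quoted above shows how small the ovulated fraction of the pubertal follicle pool is. It uses only the figures in this paragraph and pairs the extremes to get a lower and upper bound.

```python
# Rough share of the follicle pool present at puberty that ever ovulates,
# using the ranges quoted in the text (300k-500k follicles, 400-500 ovulations).
pool_low, pool_high = 300_000, 500_000
ovulations_low, ovulations_high = 400, 500

min_share = ovulations_low / pool_high  # fewest ovulations drawn from the largest pool
max_share = ovulations_high / pool_low  # most ovulations drawn from the smallest pool
print(f"{min_share:.2%} to {max_share:.2%} of follicles ovulate; the rest undergo atresia")
```

Even at the upper bound, fewer than 2 in 1,000 follicles present at puberty are ever ovulated.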
The number of follicles that mature depends on FSH levels and on sensitivity to FSH. The control of ovum maturation is very complex; per Speroff:
“Events that yield an ovum for fertilization....are the products of essentially every regulating mechanism in human biology...classic endocrine signals, autocrine and paracrine/intracrine regulation, neuronal input, and immune system contributions.”
Though the increase is much less dramatic compared to the increase in de-novo mutations with paternal age, oocytes from older mothers probably have more de-novo mutations on average, which likely has a very small negative effect on offspring. This is in addition to the large increase in chromosomal abnormalities seen in oocytes from older women.
Current approaches to treating reproductive aging that are in active clinical use do not address the underlying pathologies and instead focus on “increasing the density of gametes”, eg, using IVF to increase the number of oocytes and sperm that meet, or using donor eggs. The latter approach is very effective. IVF does improve pregnancy rates in older women compared to natural reproduction or other ART (like IUI), but the cost is high and the outcomes are still far from ideal. In addition, after the early 40s, many IVF centers will not offer IVF at all, since outcomes become even worse. Because the number of healthy follicles is the rate-limiting step, simply increasing the dose of IVF hormones does not help with diminishing fertility, and causes higher rates of side effects.
As part of the normal variation in reproductive aging, some women have substantially lower fertility even by their mid 30s. At the extreme, if a woman undergoes menopause before the age of 40, which occurs with a prevalence of ~1%, this is termed primary ovarian insufficiency. About 10% of women are menopausal by 45, and menopause tends to follow the onset of reduced fertility by about 13 years, so these women may already be subfertile in their early 30s. These women have an especially hard time achieving pregnancy without donor eggs.
One possible solution to the problem of reproductive aging is increasing the proportion of women who use egg retrieval and cryopreservation earlier on in life, but this is very limited by:
How reliable are our methods for determining a woman’s ovarian reserve, and hence, her likely fertility? The high-level summary is that doctors have a variety of biochemical tests, imaging modalities, and genetic testing that can accurately diagnose specific causes of female infertility or subfertility. I will cover a few below.
However, our methods for accurately determining a woman’s ovarian reserve are much more crude.
A brief note regarding sensitivity/specificity: any imperfect test will incorrectly call some normal people “abnormal” and incorrectly call some abnormal people “normal”. The same test will perform differently depending on the population it is used in: in a population with a high prevalence of a disorder, a larger share of its “abnormal” calls will be correct (a higher positive predictive value). Since ovarian reserve and fertility diminish with age, the accuracy of predictions of those two traits will change with age.
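The prevalence effect follows directly from Bayes' rule. This is a minimal sketch; the 80% sensitivity, 90% specificity, and the two prevalence values are hypothetical and do not describe any real ovarian reserve test.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Share of 'abnormal' test results that are truly abnormal (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test (80% sensitivity, 90% specificity) applied to a population
# where low ovarian reserve is rare vs. one where it is common.
for prevalence in (0.05, 0.30):
    ppv = positive_predictive_value(0.80, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

Under these assumed numbers, the same test's “abnormal” call is right about 30% of the time when the condition is rare and about 77% of the time when it is common. Since low ovarian reserve becomes more common with age, a fixed test cutoff would be expected to behave similarly differently across age groups.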
Tests are either biochemical or imaging-based. The important biochemical tests are FSH and AMH; the important imaging test is the antral follicle count (AFC).
Other more complex tests of ovarian function have been tried but the results have been similar to antral follicle count alone.
Better prediction of which women will have earlier-onset subfertility would be useful for advising earlier pregnancy in those women or offering fertility preservation. One approach that seems somewhat promising is developing polygenic risk scores for early menopause and related phenotypes. Some work on this has been done already.
This recent study developed a polygenic risk score (PRS) for primary ovarian insufficiency (POI) that, in the top 1% of women, confers a ~4.5x risk, comparable to canonical monogenic causes of POI like FMR1 premutations, though FMR1 premutations are about 2.5x rarer (prevalence ~1/250). However, many of these women likely have a family history of early menopause, so it is unclear to me how much extra utility current polygenic scores add relative to family history alone. Larger sample sizes, more diverse cohorts, and deeper phenotyping (if possible; it's unclear how realistic obtaining antral follicle counts for a few thousand women in a biobank would be) would all likely improve these scores.
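A back-of-envelope calculation helps show why a high relative risk in the top 1% may still add limited screening value. This sketch assumes (1) a POI prevalence of about 1%, the figure mentioned earlier in this document, and (2) that the 4.5x figure is the relative risk of the top 1% versus the remaining 99%; the study may define its comparison group differently.

```python
def top_tail_summary(prevalence: float, tail_share: float, relative_risk: float):
    """Absolute risk in the top tail of a score, and the share of all cases it contains.

    Assumes `relative_risk` compares the tail with everyone else (an assumption;
    studies sometimes compare against the population average or the median).
    """
    # prevalence = tail_share * rr * r0 + (1 - tail_share) * r0; solve for baseline risk r0.
    baseline_risk = prevalence / (tail_share * relative_risk + (1.0 - tail_share))
    tail_risk = relative_risk * baseline_risk
    share_of_cases = tail_share * tail_risk / prevalence
    return tail_risk, share_of_cases


risk, share = top_tail_summary(prevalence=0.01, tail_share=0.01, relative_risk=4.5)
print(f"absolute POI risk in top 1%: {risk:.1%}; share of all POI cases in top 1%: {share:.1%}")
```

Under these assumptions, roughly 96% of women in the top 1% would not develop POI, and the top 1% would contain only about 4% of all cases, which is consistent with the caution above about the added value of current scores.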
There are also numerous specific conditions that can cause female infertility. The TL;DR is that the standard workup for female infertility will identify women with hormonal causes of infertility, such as hyperprolactinemia, hypothyroidism, and PCOS, and structural causes of infertility, such as an obstructed fallopian tube. In most cases, these can be treated relatively easily, a demonstration of the maturity and utility of current ART.
Figure 10.
When the cause is localized to the uterus, gestational surrogacy (and now, increasingly, uterine transplantation) is an expensive last-resort option. Some structural issues can be surgically repaired or bypassed with IVF. PCOS can be treated with ovulation induction (though women with PCOS run a higher risk of ovarian hyperstimulation syndrome), while other hormonal issues (like hyperprolactinemia) are addressed differently but, as explained in the endocrinology section, can generally be treated well.
By contrast, Premature Ovarian Failure[22], whether due to genetic causes (eg, FMR1), radiation, or chemotherapy exposure, does not have good treatment options besides using donor eggs. Women without ovaries or without ovarian follicles are in roughly the same situation, though the former also require hormonal support during pregnancy and for general health. It is possible that the residual follicles found in women with premature ovarian failure (from whatever cause) might eventually be useful with some future form of in-vitro maturation (IVM) technology.
There is at least one successful case study involving in-vitro maturation of immature oocytes, and another in a woman with POF in which intentional fragmentation of some ovarian follicles combined with drug treatment (Akt stimulation) resulted in a live birth. While I think IVM might be somewhat promising for patients with some residual follicles, it seems very unlikely to work for patients who have already undergone menopause.
Another category of infertility is “unexplained infertility”, which overlaps with age-related infertility. There are good diagnostic tests available for ovulatory function, ovarian reserve, uterine function, and tubal patency. However, we lack good predictors of gamete function and implantation ability, so problems with these likely explain a big chunk of unexplained infertility. Possible causes then include recurring genetic defects in gametes and abnormalities of endometrial function.
There are some uterine abnormalities that reduce fertility somewhat, and there are now low-risk surgeries (hysteroscopic surgeries) that can fix those problems, so fixing them is recommended. The evidence is stronger for myomas (fibroids) than for anatomic uterine differences, which may simply be normal variation in uterine shape. From Speroff:
In sum, the accumulated body of evidence indicates that submucous myomas reduce IVF success rates by approximately 70% and intramural myomas by approximately 20–40%, and subserosal myomas have no adverse impact on outcomes. Submucous myomas increase risk for miscarriage after successful IVF at least threefold and intramural myomas by more than half.
Younger women who want more kids are better candidates for surgical treatments of uterine issues since those surgeries are a 1-time cost and IVF is a per-cycle cost.
The treatment for unexplained infertility is similar to that for age-related infertility: “increase gamete density”, that is, bring together more eggs and sperm and hope that a healthy embryo will eventually implant. Per Speroff, the breakdown of infertility causes is:
The major causes of infertility include ovulatory dysfunction (20–40%), tubal and peritoneal pathology (30–40%), and male factors (30–40%); uterine pathology is relatively uncommon, and the remainder is largely unexplained.
Males experience reproductive aging. This consists mostly of decreasing sperm counts with age and increasing rates of de novo mutations, though this is somewhat complicated by heterogeneity among older men. Sperm quality as measured by motility and other phenotypes also falls with age.
Rates of live birth do decrease somewhat with paternal age, even after controlling for maternal age, but the effect is small. Thus, male fertility does decline with age, but in contrast to female fertility, which faces a hard limit years before menopause, men have fathered biological offspring well into their 80s.
The number of de novo mutations in sperm and offspring increases with paternal age, which likely has consequences for offspring phenotype. There is some evidence that men differ substantially in this trait, with some men producing sperm carrying many more de novo mutations than other men of similar age.
In this section on IVF and IVG, I’ve deviated from a pure chronological sequence, since explaining IVG is best done with the context of IVF and IVM.
In-Vitro Fertilization is the fertilization of an oocyte outside of the body, as opposed to natural fertilization. The first step is retrieval of one or more mature oocytes (“eggs”) from the woman, followed by fertilization, culture in a laboratory setting, and subsequent implantation.
As additional context, a successful live birth necessitates:
IVF intervenes at steps #2, #3, and #4 by increasing maturation and decreasing destruction of already extant gametes in females, physically retrieving the resulting oocytes, bringing them into direct contact with sperm, and then transferring the resulting embryo into a uterus.
A brief note on outcomes– the desired end goal of couples undergoing ART is a live birth of a healthy child. Because tracking live births involves around 9 months of waiting after an ART intervention, some ART studies do not report live birth rates. Instead, they may report related outcomes, such as pregnancy. Depending on when pregnancy is measured, this has a moderate or strong relationship to a live birth. Early on, pregnancies have a high rate of failure– pregnancies closer to delivery, however, are more likely to result in a live birth. I will make a note of which outcome is being reported when appropriate.
Egg retrieval takes place shortly before ovulation and is performed via transvaginal aspiration[23] with ultrasound guidance. Most IVF cycles involve exogenous hormone administration, termed “ovarian stimulation”, which substantially increases the number of oocytes available for retrieval but carries some risks, namely ovarian hyperstimulation syndrome and much higher rates of twin (or higher-order) pregnancies; the latter can be largely prevented with single-embryo transfer. Other cycles are natural IVF cycles (not to be confused with in-vitro maturation, discussed below), which use no exogenous ovarian stimulation but have a lower per-cycle success rate, mostly because far fewer oocytes are retrieved per cycle.
The IVF process may confer additional risk of preterm birth and some other perinatal conditions, though the data are not totally clear on how much of the higher risk in IVF offspring is a selection effect (couples who use IVF would have had worse perinatal outcomes regardless) versus a treatment effect. Studies using sibling controls still show a mildly harmful effect of IVF, and this finding seems solid.
Oocytes can be cryopreserved for later use, fertilized immediately and transferred “fresh”, or fertilized and the subsequent embryos frozen for later use. Fertilization can occur through either incubating the oocyte(s) with many sperm, and letting a more natural fertilization process occur, or using intracytoplasmic sperm injection, which can effectively treat many types of male infertility. Embryos are generally grown for either 3 (cleavage stage) or 5 (blastocyst stage) days before being transferred back to the woman for implantation.
A schematic of the IVF process is shown below.
Figure 11.
There are a number of ARTs available before full-fledged IVF. These involve optimizing the frequency and timing of insemination, exogenously stimulating ovulation, or inserting sperm directly into the uterus, as well as a variety of surgeries that are performed infrequently.
For in-vivo fertilization to occur, sperm must contact oocytes within a certain time frame. Since sperm live about 3-5 days and unfertilized oocytes about 12-24 hours, insemination must occur between 3-5 days before and about 1 day after ovulation, when the oocyte is released from the ovary. As Speroff notes, different methods of estimating ovulation timing will yield different results. Timing intercourse is a low-cost intervention, but a Cochrane Review meta-analysis found meager benefit to doing so.
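The window arithmetic itself is simple once ovulation day is known; estimating that day is the hard part, and this sketch simply takes it as given. It uses the upper end of the quoted sperm lifespan and rounds the oocyte lifespan up to one day; the example date is arbitrary.

```python
from datetime import date, timedelta

SPERM_LIFESPAN_DAYS = 5   # upper end of the ~3-5 day range in the text
OOCYTE_LIFESPAN_DAYS = 1  # ~12-24 hours, rounded up to one day


def fertile_window(ovulation_day: date) -> tuple[date, date]:
    """First and last day on which insemination could plausibly lead to
    fertilization, given the approximate gamete lifespans above."""
    start = ovulation_day - timedelta(days=SPERM_LIFESPAN_DAYS)
    end = ovulation_day + timedelta(days=OOCYTE_LIFESPAN_DAYS)
    return start, end


start, end = fertile_window(date(2024, 3, 14))
print(start, end)  # 2024-03-09 2024-03-15
```

Any error in the estimated ovulation day shifts the whole window, which is one reason the choice of ovulation-timing method matters.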
For women whose sole fertility problem is irregular or absent ovulation, inducing ovulation is reasonably effective. Precisely quantifying the benefit of inducing ovulation over no treatment in anovulatory women is difficult, since ovulation induction is now considered standard of care. However, Speroff states that per-cycle fecundability rates of about 15-22% can be achieved with clomiphene, which is close to that achieved by normal fertile couples.
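To see what a 15-22% per-cycle fecundability means over several cycles, here is a minimal sketch that assumes the same probability every cycle and independence between cycles. Both are simplifications: in a real, heterogeneous group of couples, those with lower fecundability make up a growing share of those still trying, so observed cumulative rates tend to fall below this curve.

```python
def cumulative_conception(fecundability: float, cycles: int) -> float:
    """Probability of conceiving within `cycles` cycles, assuming a constant
    per-cycle probability and independence between cycles."""
    if not 0 <= fecundability <= 1:
        raise ValueError("fecundability must be a probability")
    return 1 - (1 - fecundability) ** cycles


for f in (0.15, 0.22):  # per-cycle range quoted from Speroff for clomiphene
    row = [round(cumulative_conception(f, n), 2) for n in (1, 3, 6)]
    print(f"fecundability {f:.0%}: after 1, 3, 6 cycles -> {row}")
```

Under these assumptions, six cycles give roughly a 62% to 77% chance of conception.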
The rates of twin births, which are riskier for both mother and babies, are substantially higher in clomiphene-induced births than natural births, at rates of about 7-10% and 1.25% (though note the regional variation in twinning rates), respectively. Higher-order births are also more common, though they are still rare in absolute terms.
Several medications can be used to induce ovulation. Clomiphene is often used in combination with HCG. Letrozole, an aromatase inhibitor, is also used, particularly in women with PCOS, in whom it may result in higher live birth rates. FSH and LH, and some analogs with different pharmacokinetics, are also available. Some protocols also use GnRH agonists or antagonists to suppress endogenous gonadotropin production.
Compared to IVF, the major downside of inducing ovulation alone is the high risk of multiple pregnancies. This can be mitigated (not eliminated[24]) with single embryo transfer in IVF, in which only one embryo at a time is transferred for implantation.
Intrauterine insemination is the placement of sperm directly in the uterus and may be the oldest form of ART:
Before this, in 1770, John Hunter described the first case of human intravaginal insemination because of severe hypospadias. In the mid-1800s J. Marion Sims reported on 55 intravaginal inseminations. Only one pregnancy occurred.
Because of its low cost, it may also be the most widely used, particularly in lower-income countries. Per the WHO report above, the evidence for IUI’s superiority over natural insemination is low-quality, but IUI is a convenient option for same-sex female couples or single women using donor sperm. It is used for unexplained infertility and for some mild cases of male infertility.
Success rates for IVF (as measured by pregnancy rates) were around 6% per cycle in the early 1980s (the first IVF cycles did not use any hormone stimulation) and reached about 30% by 1983 as hormone stimulation became routine and more oocytes were retrieved per cycle. Beyond refinement of the hormone protocol, advances in embryo cryopreservation, which improved embryo survival after thawing, likely helped, as did better understanding of appropriate embryo culture media and perhaps the move toward transferring embryos on day 5 instead of day 3.
The less invasive transvaginal ultrasound method of egg retrieval, as opposed to the laparoscopic approach, made IVF an outpatient procedure that could be performed in an office setting in about 15 minutes of procedure time, instead of a 1-2 hour hospital operation requiring anesthesia. A more recent change has been a move toward single embryo transfer, in which only one embryo is transferred for implantation at a time; this brings the risk of twin (or higher-order) pregnancies close to natural conception levels. At least in the US, single embryo transfer has become the norm, rising from 18% in 2010 to 77% in 2019, per CDC reporting (derived from data reported by all IVF clinics in the US):
Figure 12.
IVF success rates globally, as measured by live births per cycle, appear to have peaked around 2009 at 30% and declined to about 22% by 2016. There are several likely possibilities:
Figure 13.
On the other hand, data from Sweden show an improvement in cumulative live birth rate per oocyte[25] retrieved from 2007 to 2017, coinciding with increasing use of newer methods of embryo freezing (vitrification instead of slow freezing) and prolonged embryo culture methods. CDC (USA) data appear to show an improvement in IVF success rates from 2010 to 2019, as measured by the % of ART cycles that result in live-birth deliveries:
Figure 14.
Some US data do appear consistent with a decline in IVF success rates: in a CDC report (Figure 6), live-birth deliveries have not increased as much as the number of ART cycles:
Figure 15.
On the other hand, this might be better explained by more banking cycles which have not yet translated into live births.
I have not investigated this question in enough depth to be confident, but my guess is that if the decrease in IVF success is real, which I’m not sure about, changes in patient population are the most important factor, followed by widespread use of PGT-A. There may be better quality data that can firmly answer this question, but I could not find a definitive answer.
Regarding ART in general, use of infertility services among women aged 15-44 rose from 9% in 1982 to 15% in 1995, declined to 12% in 2002, rose to 16.8% in 2010, and was 14.3% in 2019. This was largely because of delays in childbearing and increased availability of ART services. However, these figures include all ART services, a broader category than IVF. In addition, from 2000 to 2014, the mean age of first-time mothers increased by 1.4 years, from 24.9 to 26.3.
For IVF in particular, infants born through IVF in the US accounted for about 1.5% of all infants born in 2010, with considerable between-state variation, from “0.2% in Puerto Rico to 4.7% in Massachusetts”. By 2019 the nationwide average had reached 2.1%, ranging from 0.5% in Puerto Rico to 5.5% in Massachusetts. Europe in 2010 generally had higher rates than the US, ranging from 0.6% in Moldova to 5.9% in Denmark. An update for 2017 (the latest available) found similar rates in Europe, though in Spain 7.9% of all infants born that year were born through IVF.
From at least 1997 until 2010, the number of infants in Europe born through IVF increased year over year. In Israel, infants born through IVF went from 2.5% of all infants in 1997 to 4.1% in 2010.
I am very unsure what the long-term proportion of infants born through IVF will be. Spain’s rate of ~8% in 2017 may have been the highest globally (though Denmark reached 10% per a 2018 news article), and may have increased since, though more recent systematic data do not appear to be available. Several European countries are only 1-2% from Spain’s rate.
The most important factor that will likely increase the proportion of infants born through IVF is age at first birth continuing to increase. In the US, if more states adopt generous insurance coverage policies for IVF, that would also likely increase IVF rates.
All things being equal, substantial advances in IVF success rates and reductions in cost would also increase IVF uptake. More speculatively, some IVF add-ons[26], like embryo selection against polygenic diseases, may make IVF a more attractive option than natural conception even for couples without infertility issues. For example, if IVF plus embryo selection can reduce the risk of certain currently unpreventable diseases, such as Alzheimer’s disease or schizophrenia, some parents may choose to undergo IVF for the purpose of using embryo selection.
A drug that could reduce the burden of reproductive aging, particularly for women, might reduce IVF use, as fewer women would undergo IVF for age-related infertility. A technology like in-vitro gametogenesis, if it were cost-competitive with IVF, would also likely reduce IVF use, as it would avoid the risks of the exogenous hormone administration involved in IVF.
One factor that likely accounts for large cross-national differences in proportions of infants born through IVF is insurance coverage and affordability.
About 11 percent of women and 9 percent of men experience difficulty with fertility[27]. It’s estimated that 85 percent of IVF expenses are paid out of pocket. Only 17 states legally require insurers to cover or offer coverage for infertility diagnosis and treatment, and to varying degrees. According to proprietary data from FertilityIQ, a digital database for information about fertility benefits and treatments, most patients spend $40,000-60,000 on IVF, the most common assisted reproductive technology (ART), and 56 percent of IVF patients have no insurance coverage for their treatment.
Given the high costs, if more states were to require insurers to cover fertility treatments, it seems plausible that their use would increase somewhat. Massachusetts requires insurers to cover ART and has the highest percentage of babies born from ART in the US, reaching levels similar to Denmark’s, though other states also have coverage mandates, and I have not seen data systematically comparing coverage to IVF uptake rates.
A reason for optimism regarding further improvements to the IVF process comes from the following observation: IVF research is generally underpowered for the effects it purports to detect. It is likely that some improvements to the IVF protocol with small or medium effect sizes have still not been identified. Some current add-ons to the IVF process, which add to the cost, are likely superfluous or possibly even mildly harmful, so removing them could reduce IVF costs somewhat.
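To see why small trials struggle, here is a minimal sketch of the standard normal-approximation sample-size formula for comparing two proportions (two-sided test, equal allocation). The 30% and 33% live birth rates are hypothetical, chosen only to represent the kind of small absolute improvement a protocol tweak or add-on might plausibly offer.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control: float, p_treatment: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm to detect p_control vs p_treatment
    with a two-sided test (normal approximation, equal allocation)."""
    if p_control == p_treatment:
        raise ValueError("the two proportions must differ")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2)


# Hypothetical add-on raising live birth per cycle from 30% to 33%.
print(n_per_arm(0.30, 0.33))
```

Under these assumptions the answer is roughly 3,760 patients per arm, so a trial enrolling a few hundred patients per arm has little chance of reliably detecting an effect of this size.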
In conversation with Steve Hsu, he noted that many trials that clinicians use to justify add-ons are quite small and/or low-quality and recommended a “reproducibility center” that would focus on reliably improving per-cycle success rates. He is also optimistic that improvements in embryo screening can improve per-embryo implantation rates, since current aneuploidy screening methods are very crude and have high rates of technical failure (which clinical labs report as “aneuploidy”).
I spoke with Jack Wilkinson, a statistician who works on ART methodology, who echoed these concerns. He also raised other issues:
A lucid summary of these methodological concerns, as well as possible solutions, can be found here. Apart from the usual reasons for a “replication crisis” in a scientific field, Jack attributed this to a few things:
Over email and in conversation, Jack and I touched on ways to address these problems. All credit to him (and blame to me):
Education
Regulatory
Funding
Journals
Misc.
Steve Hsu proposed a project coordinating many IVF centers to try different tweaks to the IVF protocol. Jack raised a similar idea focused on embryo culture media, which vary between centers and have not been rigorously evaluated. Specifically, he proposed cluster RCTs that randomize different centers to different culture media, which reduces the administrative burden of running trials for clinicians.
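Cluster randomization trades statistical efficiency for simplicity: because outcomes within one clinic are correlated, a cluster trial needs more patients than an individually randomized one, inflated by the standard design effect 1 + (m - 1) * ICC, where m is patients per cluster and ICC is the intracluster correlation. The sketch below shows both the allocation step and that inflation. The clinic IDs, media names, cycles per clinic, and ICC value are all hypothetical, and this is not a description of the specific design Jack proposed.

```python
import random


def design_effect(patients_per_center: float, icc: float) -> float:
    """Variance inflation from randomizing whole centers rather than
    individual patients: DEFF = 1 + (m - 1) * ICC."""
    return 1 + (patients_per_center - 1) * icc


def assign_media(centers: list[str], media: list[str], seed: int = 0) -> dict[str, str]:
    """Randomly assign each center to one culture medium, keeping arm sizes
    as equal as possible (round-robin over a shuffled center list)."""
    rng = random.Random(seed)
    shuffled = list(centers)
    rng.shuffle(shuffled)
    return {center: media[i % len(media)] for i, center in enumerate(shuffled)}


# Hypothetical trial: 12 clinics, 3 culture media, ~200 cycles per clinic,
# and an assumed intracluster correlation of 0.02 for live birth.
centers = [f"clinic_{i:02d}" for i in range(12)]
media = ["medium_A", "medium_B", "medium_C"]
allocation = assign_media(centers, media)
for center in sorted(allocation):
    print(center, allocation[center])

deff = design_effect(patients_per_center=200, icc=0.02)
print(f"design effect: {deff:.2f}")  # ~5x the patients of an individually randomized trial
```

Even a small ICC matters when clinics are large, which is why the number of participating centers, not just the number of cycles, drives the power of such a trial.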
Some more speculative ideas I’m interested in:
A more incremental type of in-vitro maturation is already in clinical use, though less so in America. It is not capable of maturing implantation-competent oocytes from primordial follicles (eg, from slices of ovarian tissue), but it can take immature oocytes that have not been primed by exposure to high-dose exogenous LH or HCG and successfully produce live births. In practice, the current IVM protocol usually involves some exposure to either HCG or FSH, but only once, resulting in less exposure to exogenous hormones.
This is a useful modality for women who are either more likely to face side effects from traditional IVF cycles (eg, women with PCOS, who have high rates of ovarian hyperstimulation syndrome) or who require fertility preservation very urgently (women with some cancers) and can’t undergo a full-length IVF cycle. It is also more affordable per cycle since it uses less medication. However, it results in fewer embryos per cycle, which makes downstream ART that relies on embryo numbers (embryo selection and editing) less effective. It has a slightly lower or similar implantation rate and a higher miscarriage rate. For children with cancer who have not begun puberty, IVF is not an option. IVM offers the possibility of fertility for them as well, though this work is very preliminary, and as of 2020, no patients with pediatric cancers had had live offspring through this method. This appears to be the result of lower oocyte quality in pediatric ovaries.
Figure 16.
A recent cost-effectiveness analysis of IVM vs IVF found “IVM is more cost-effective than IVF at a willingness-to-pay up to €18000 for an additional child. Above €18000 IVF became more cost-effective”, a finding driven by the lower cost of IVM but the higher effectiveness of IVF, as well as a lower rate of side effects for IVM.
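The threshold logic in that finding is the usual incremental cost-effectiveness ratio (ICER): IVF costs more but produces more children, so it is preferred only when its extra cost per extra child is within what a payer is willing to pay. The numbers below are invented purely to illustrate the mechanics and are not the study's inputs; in the study itself, the crossover fell at €18,000.

```python
def icer(cost_new: float, effect_new: float, cost_ref: float, effect_ref: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra unit of
    effect (here, per additional child) of the new strategy vs the reference."""
    extra_effect = effect_new - effect_ref
    if extra_effect <= 0:
        raise ValueError("the new strategy must be more effective for this comparison")
    return (cost_new - cost_ref) / extra_effect


def preferred_strategy(willingness_to_pay: float, ivf: tuple[float, float],
                       ivm: tuple[float, float]) -> str:
    """IVF wins when its extra cost per extra child is within the payer's
    willingness to pay for an additional child; otherwise IVM wins."""
    ratio = icer(ivf[0], ivf[1], ivm[0], ivm[1])
    return "IVF" if ratio <= willingness_to_pay else "IVM"


# Invented numbers per 100 patients: (total cost in EUR, children born).
ivm = (400_000, 25)
ivf = (480_000, 30)

print(icer(*ivf, *ivm))  # 16000.0
for wtp in (10_000, 20_000):
    print(wtp, preferred_strategy(wtp, ivf, ivm))
```

With these made-up inputs, IVM is preferred at a willingness-to-pay of €10,000 per additional child and IVF at €20,000, mirroring the shape of the published result.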
A recent non-inferiority randomized trial comparing IVM to IVF in a select patient population[30] found a lower cumulative pregnancy rate at 12 months for IVM than for IVF, driven by the lower number of embryos obtained with IVM. Multiple rounds of IVM would of course increase the number of embryos obtained, and likely the cumulative pregnancy rate, but would erode the cost-effectiveness and convenience (fewer hormone injections) of IVM relative to IVF.
Figure 17.
Cumulative Pregnancy rate since randomization, IVM vs IVF, from here.
In conversation, an expert in this field (Dr. Robert Gilchrist) noted that substantially fewer resources have been invested in IVM than in traditional IVF, making naive comparisons unfair: IVF is a more mature technology that has been optimized far more than IVM. Hypothetically, if IVM’s success rates were equivalent to IVF’s, the lower cost, reduced rate of side effects, and reduced number of injections would make it clearly superior. I did not investigate IVM in sufficient depth to be confident in this argument, but it does seem plausible. Before supporting IVM research, I would recommend looking into IVM success rates in animal breeding to see whether it has achieved success rates comparable to IVF in that setting.
Since competence in IVM is effectively required for IVG, I think better basic science understanding of IVM-relevant topics would help IVG as well.
A more speculative application of IVM is maturing oocytes derived from slices of ovarian tissue, a technique also referred to as “ovarian tissue oocyte IVM” (OTO-IVM) or “in-vitro culture”. This ties into the topic of ovarian transplantation, which I cover below with the assistance of my colleague Mackenzie Dion.
To summarize, one advantage of OTO-IVM is that it avoids the possibility of transplanting back ovarian tissue that may harbor cancer[31]. In addition, ovarian transplantation is far from a routine procedure. Other theoretical advantages of IVM over IVF include:
With reliable OTO in-vitro maturation, an oophorectomy (a simple and low-risk procedure) could be performed immediately, the ovary cryopreserved, and then oocytes matured in-vitro. With reliable autotransplantation, IVM would not be required.
This kind of IVM is explored in this review. Per this review, the number of successful live births from OTO-IVM documented in the literature is eight. Overall the authors view the results as favorable:
“3% live birth rate per oocyte. This is a promising figure when compared to 4.5–6.7% LBR per vitrified oocyte reported in oocyte donation programmes”
By contrast, the number of live births from ovarian tissue transplantation was at least 130 as of 2019. OTO-IVM is a very early-stage ART with currently very niche applications. I did not investigate the cost of OTO-IVM, but given its experimental nature, it likely requires expertise found only in a handful of fertility centers, limiting its short-term spread, and is probably very expensive. I lack enough wet-lab expertise to have a strong sense of how promising this line of research is overall, but especially given its overlap with IVG, I think it should be funded. A PhD student in a relevant discipline (working on IVG) agreed that progress in IVM would help IVG as well. He also thought that the difficulty of obtaining human tissue for experimentation, relative to mouse tissue, was the biggest barrier to faster progress in IVM, followed by the faster development timeline in mice.
Ovarian Cryopreservation
At least 75% of follicles are lost during ovarian tissue autotransplantation, likely due to lack of oxygen (“ischemia”) following the procedure, as the graft revascularizes (forms blood vessels) and regains homeostasis. Improving post-transplantation procedures might therefore improve follicle preservation rates and reduce the amount of tissue lost during autotransplantation. Some pharmaceutical treatments might also reduce ischemic damage: a 2021 study administered N-acetylcysteine (NAC) after transplanting human ovarian tissue into immunodeficient mice and found better outcomes relative to controls.
Another potential area of improvement is the cryopreservation method. Slow freezing is currently the dominant method of ovarian tissue cryopreservation. A meta-analysis compared vitrification (ice-free cryopreservation, also sometimes called “glassification”) and slow freezing, the two main methods of ovary cryopreservation, and did not find significant differences in follicle preservation. The meta-analysis did report that vitrified tissue had less DNA damage and better preserved stromal cells. The authors suggest that this may indicate that vitrification is a better method for preserving fertility, but that their findings need to be validated in studies “with healthy live births as the primary endpoint”, instead of laboratory-measured endpoints.
As briefly mentioned in the reproductive aging section, one hypothesized mechanism for the decline in oocyte quality with age is mitochondrial dysfunction driven by mutations in mitochondrial genomes. This article makes the argument at length; the key points of evidence in favor are:
The authors subjected these ideas to testing, and found that, as expected, there was strong purifying selection against mutations in mitochondria beginning with fertilization.
An intriguing possibility is treating mitochondrial dysfunction with mitochondrial replacement therapy (MRT), in which nuclear DNA from either an embryo or an egg is extracted and placed into a donor cytoplast (an oocyte that has had its nucleus removed) containing wild-type mitochondria.
This method has been used in some jurisdictions for treating mitochondrial disease. While legal in the UK and several other jurisdictions for this purpose, the FDA is currently barred from considering applications for MRT, so it is effectively illegal in the US. In 2019, the Senate came close to allowing the FDA to consider applications for MRT, but reversed course at the last minute, so it is still effectively illegal to run clinical trials involving MRT in embryos in the US. This Vox article provides an accessible summary of the regulatory issues up to 2018.
In the late 1990s, a related technique involving a small-volume injection of donor cytoplasm into patient oocytes was trialed on patients who had experienced repeated implantation failures. None of the 7 patients had a live birth, though 4/30 embryos resulted in successful implantation (with later miscarriage). The authors frame this as a preliminary sign of success. However, the small sample size and lack of any live birth do not strike me as especially promising evidence. There were a few successes with a similar approach (1, 2), with later work in the 2010s using more sophisticated methods.
A promising case study along these lines, published in 2016, treated a woman who had previously had two IVF cycles in which all of her embryos arrested at an early (two-cell) stage. In the third IVF cycle, her embryos’ pronuclei were transferred to enucleated donor oocytes, producing 5 apparently healthy embryos for transfer and resulting in a pregnancy, though tragically, none of the three embryos that successfully implanted produced a live birth. The mitochondrial DNA of the embryos matched the donor mitochondrial DNA, implying absent or low levels of parental mitochondria. A later case study, published in 2017 by the lead author of the previous paper (Dr. John Zhang), used MRT to prevent the transmission of a mitochondrial disease, resulting in the birth of an apparently healthy boy. A Ukrainian clinic that made headlines in 2018 for producing a 3-parent baby later presented data from 30 women showing that MRT did not improve fertility in older women, with the caveat that the study was small-scale and tried 5 different methods of MRT, implying that an optimal technique has not yet been developed.
Researchers at OHSU’s Center for Embryonic Cell and Gene Therapy, led by Dr. Shoukhrat Mitalipov and Dr. Paula Amato, have recently published promising work in rhesus macaques (nonhuman primates) demonstrating that their MRT technique appears effective and safe. The same center is also pursuing an IVG method that induces haploidy in somatic cells by transplanting somatic nuclei into mature oocytes.
With the caveat that I did not explore this issue in depth, my overall impression is that significant uncertainties regarding the contribution of mitochondria to aging remain unresolved, but that this area of research seems promising. My key uncertainties, in order of importance:
Figure 18.
From here.
In-vitro gametogenesis is the production of gametes from somatic cells through laboratory techniques instead of natural developmental processes. Such a technology would address many different causes of infertility at once and would also synergize extremely well with other reproductive technologies like embryo editing and embryo selection. It would also enable cross-sex gamete production for same-sex couples. IVG has been achieved in mice and has resulted in live, apparently healthy offspring that can themselves reproduce naturally. Several academic labs are focused on achieving human IVG, as are at least 3 startups as of 2022 (Conception, Gameto, and IvyNatal), and another, Renewal Bio, appears to be aiming at a similar goal.
The idea behind IVG is to take a somatic cell, transform it into an induced pluripotent stem cell (iPSC), transform that into a primordial germ cell-like cell (PGCLC) and then differentiate it in-vitro into the desired germ cell. IVG is usually divided into the production of PGCs and the subsequent differentiation of PGCs into sex-specific gametes (oocytes and spermatozoa). In mice, oocyte IVG is currently more advanced than sperm IVG, since the latter still seems to require an in-vivo step, such as transplantation into a mouse testis, for full maturation.
A reason to be optimistic that IVG will eventually succeed is that at least two promising techniques are in use: one involves genetic manipulation of transcription factors; the other uses only signaling factors and chemical inhibitors added to a culture medium (eg, this paper generating functional oocytes from adult granulosa cells). IVG should be distinguished from in-vitro maturation, which is concerned with maturing primordial follicles from ovaries into fertilization-competent oocytes.
In addition to the human ART applications, IVG would accelerate animal breeding efforts, especially if full iterated embryo selection could be achieved in-vitro. It would also aid conservation efforts for endangered animals and, more speculatively, de-extinction efforts. Thus, it seems likely that even if human IVG is more difficult than expected, such that current efforts fail, there will be substantial scientific and economic interest in IVG technology as a whole.
IVG in mice has resulted in the production of healthy fertile offspring, who have themselves had healthy offspring. Both oocytes and sperm have been successfully produced from somatic cells, though sperm maturation (as far as I know) required transplantation into an in-vivo testis. Mouse oocytes have been successfully produced in combination with fetal ovarian somatic cells, which are required for proper maturation of oocytes; recently, however, mouse fetal ovarian somatic cells have been successfully generated from pluripotent stem cells, theoretically removing the need for any fetal ovarian tissue at all for mouse IVG. A recent preprint accomplished something similar in human cells, producing granulosa-like cells (which surround oocytes in-vivo) from human iPSCs, though this technique did not succeed in advancing primordial germ cells into later stages of maturation. Another recent preprint by the same group[32] developed a faster method to produce human oocyte-like/oogonia-like (pre-meiotic) cells from iPSCs and related cells. There are also some related reproductive “tricks” that have succeeded recently, such as inducing parthenogenesis in a mammal.
Brief review of IVG in mice, paraphrased/copied from here:
Figure 19.
Work in the 2010s used embryonic ovarian somatic cells to transform PGCLCs (primordial germ cell-like cells) into oocytes, which resulted in apparently healthy offspring in mice. An important advance, published in 2021, was the generation of fetal ovarian somatic cells from embryonic stem cells, potentially eliminating the need for fetal ovarian somatic tissue. The advantage is that the primordial germ cells generated no longer have to be placed within embryonic mouse tissue to differentiate properly: the equivalent gonadal somatic tissue, which is needed to stimulate proper differentiation of primordial germ cells, can be generated from pluripotent stem cells as fetal ovarian somatic cell-like cells (FOSCLs).
Figure 20.
From here.
An important caveat is the low efficiency of IVG techniques so far. From the same paper as above:
We then used mature COCs from rOvarioids for in vitro fertilization (IVF) using wild-type sperm from ICR mice. In IVF followed by in vitro culture, oocytes were fertilized, and 30.2% (301/996) of oocytes used in the IVF became two-cell embryos (Fig. 4D and table S2). Then, 25.8% (24/93) of the two-cell embryos developed to blastocysts (Fig. 4D and table S3). This developmental rate from two-cell embryos to blastocysts was comparable to that observed in embryos derived from reaggregates using E12.5 gonadal somatic cells in our previous report (2) (31.8%, 44/138; P = 0.397 by Pearson’s chi-square test). When the two-cell embryos were transferred into pseudopregnant females, 5.2% (11/212) of the embryos gave rise to offspring and all of them developed to adult mice.
This method resulted in a ~5% rate of live births per embryo transferred at the cleavage stage. For comparison, IVF cycles using donor eggs[33] had a live birth rate per embryo transferred (per Table 2 of this paper) of 50-70%, depending on whether PGS was used. For a more realistic comparison, per Table 2 of a different paper, the pregnancy rate[34] per embryo transfer for women >40 was ~20%. Even assuming a miscarriage rate of 30%, that would imply a live birth rate for women >40 of ~14%, far better than the 5% achieved above.
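The efficiency figures quoted above can be chained to estimate how many IVG-derived oocytes this mouse protocol needed per live pup, next to the back-of-envelope estimate for women over 40. The sketch below uses only the numbers from the quoted paper and the paragraph above, and it assumes each step is independent of the others.

```python
# Fractions reported in the quoted mouse rOvarioid IVF experiment.
two_cell_per_oocyte = 301 / 996         # oocytes that became two-cell embryos
pups_per_two_cell_transfer = 11 / 212   # transferred two-cell embryos that became pups

# Chain the steps (assumes independence between them).
live_births_per_oocyte = two_cell_per_oocyte * pups_per_two_cell_transfer
oocytes_per_live_birth = 1 / live_births_per_oocyte

# Rough human estimate for women over 40: ~20% pregnancy rate per transfer,
# discounted by an assumed 30% miscarriage rate.
human_over_40_live_birth_per_transfer = 0.20 * (1 - 0.30)

print(f"mouse IVG live births per oocyte:   {live_births_per_oocyte:.4f}")
print(f"mouse IVG oocytes per live birth:   {oocytes_per_live_birth:.0f}")
print(f"human >40 live births per transfer: {human_over_40_live_birth_per_transfer:.2f}")
```

At these rates, only about 1.6% of IVG-derived oocytes became pups.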
However, the above comparison may not be fair, since the mouse embryos transferred were two-cell embryos, not blastocysts. Using data from this paper:
Figure 21.
A 5% implantation rate for two-cell (cleavage stage) embryos is comparable to the per embryo implantation rate for women over 41. Presumably transfer of later-stage embryos would increase the per-embryo success rate, and reduce the number of failed transfers. Regardless, the overall point is that if IVG methods produce embryos with very low implantation rates, they will need to produce them in large quantities, and relatively cheaply, for it to replace IVF for most people. Some customers may not have other alternatives, such as same-sex couples or women with certain ovarian issues, so a 5% success rate may be acceptable for them.
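The point about needing embryos in large quantities can be made concrete. The sketch below assumes every transfer succeeds independently with the same probability p, which is a simplification because real per-embryo probabilities vary. Under that assumption, the number of transfers needed to reach a target cumulative success probability is the smallest n with 1 − (1 − p)^n ≥ target. The 5% and 14% inputs come from the text; the 80% target is arbitrary.

```python
import math

def transfers_needed(p_success: float, target: float) -> int:
    """Smallest n such that 1 - (1 - p)^n >= target, assuming independent,
    identically likely transfers (a simplifying assumption)."""
    if not 0 < p_success <= 1 or not 0 < target < 1:
        raise ValueError("need 0 < p_success <= 1 and 0 < target < 1")
    if p_success == 1:
        return 1
    return math.ceil(math.log(1 - target) / math.log(1 - p_success))

# 5% (mouse IVG two-cell transfers), 14% (rough >40 estimate), 50% for contrast.
for p in (0.05, 0.14, 0.50):
    print(f"p={p:.2f}: {transfers_needed(p, 0.8)} transfers for an 80% chance")
```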
IVG has also produced fertile offspring in rats, though it required substantial changes in the process.
Apart from low efficiency, clinical use of IVG in humans faces three other important challenges:
As part of germ cell development, gametes undergo genome-wide epigenetic reprogramming. If this process does not occur correctly, offspring can be born with imprinting defects. I am unsure how powerful our diagnostic methods are for detecting epigenetic abnormalities, and thus how well such defects could be detected prior to clinical trials. While some sequencing methods can track methylation patterns (eg, MethylSeq), they do not appear to be in use in preimplantation genetic testing. There is a case report on using PGT to prevent an imprinting disorder, but it does not appear to have used methylation sequencing. A reason for optimism about epigenetic reprogramming is that a recent study inducing parthenogenesis in a mammal, which produced viable offspring from female gametes alone, was accomplished through targeted DNA methylation editing.
The second challenge is somatic mutations. Every organism carries de-novo mutations that arose from errors in DNA replication in its parents’ germlines. Estimates of how many de-novo mutations germ cells carry relative to the parents’ genomes, per generation, vary, and there are also likely individual differences in germline mutation rates. After fertilization, an organism’s cells also accumulate somatic mutations, and somatic cells carry substantially more mutations than the organism’s germline.
There are no firm estimates of how much disease burden de novo mutations are responsible for, but there is substantial evidence that they play a role in many cases of intellectual disability, sudden infant death, and other genetic disorders. This is corroborated by studies finding higher rates of autism and other disorders in offspring of older parents, as well as whole-exome trio studies on children with unexplained disorders and their parents.
Thus, a large increase in the number of mutations an organism is expected to have is a cause for concern.
Compared to nearly all tissues, germline cells have a much lower mutation rate per year. From this review, table 1:
Figure 22.
Assuming a 35 year-old patient using IVG and a similarly aged partner using their own sperm, and the following values for germline mutation rates:
This would result in:
Under these assumptions, an embryo generated from IVG would have approximately 400 more de-novo mutations in their germline than an embryo generated naturally from equivalent aged parents. If a more mutation-prone tissue than a skeletal muscle satellite cell is used, the difference would be even larger. I am very unsure what impact 400 extra germline mutations would have, on average, but my initial guess would be substantially higher rates of disorders that correlate with higher parental age.
Somatic mutations do not arise completely at random in the genome and are also subject to natural selection[35]. De-novo germline mutations also have a bias[36] towards certain mutations. Thus, the estimated 400 extra mutations that an IVG-generated embryo would carry would likely be substantially different, on average, from 400 extra de-novo mutations generated through the non-IVG (natural) process. One person I consulted on this question thought the mutations present in somatic cells would be more likely to be damaging than those in germline cells, under the following reasoning, paraphrased:
The problem of embryos with high rates of de-novo mutations could, theoretically, be addressed through a combination of embryo selection and editing. Embryo editing could fix mutations directly, while embryo selection could favor embryos with fewer mutations and/or mutations predicted to be less damaging or neutral.
Gene editing efficiency with current technology is moderate, so not every edited embryo will have the desired edit and some will have off-target edits (low accuracy). Given that, multiple embryos will have to be generated, and subsequently edited, to produce an embryo with the desired changes. Assuming that cost scales with the number of embryos produced, this will raise costs. Embryo selection, as well as confirmation of desired edits, would presumably require embryo sequencing, which is currently performed through trophectoderm biopsies on embryos that are 4-6 days old. Each editing step, as well as each sequencing step, adds to IVG costs.
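Editing efficiency drives cost in a simple way. Suppose each embryo independently ends up with all k desired edits (each succeeding with probability e) and carries no off-target edit with probability (1 − o). Then the number of embryos that must be produced, edited and sequenced before one is correct follows a geometric distribution, with mean 1/p. The sketch below ignores mosaicism, and every probability and unit cost in it is a hypothetical placeholder rather than an estimate from the text.

```python
def p_correct_embryo(edit_efficiency: float, n_edits: int,
                     p_off_target: float) -> float:
    """Chance that one edited embryo carries all n_edits desired edits and no
    off-target edit, treating every event as independent (an assumption;
    mosaicism is ignored)."""
    return edit_efficiency ** n_edits * (1 - p_off_target)

def expected_attempts(p: float) -> float:
    """Mean of a geometric distribution: embryos processed until the first success."""
    if not 0 < p <= 1:
        raise ValueError("success probability must be in (0, 1]")
    return 1 / p

def expected_cost(p: float, per_embryo: float, per_edit_round: float,
                  per_sequencing: float) -> float:
    """Each attempt pays to create, edit and sequence one embryo."""
    return expected_attempts(p) * (per_embryo + per_edit_round + per_sequencing)

# Hypothetical inputs: 50% efficiency per edit, two edits, 10% off-target risk.
p = p_correct_embryo(0.5, 2, 0.10)
print(f"P(correct embryo) = {p:.3f}")
print(f"expected embryos  = {expected_attempts(p):.1f}")
print(f"expected cost     = {expected_cost(p, 1_000, 500, 1_000):,.0f}")
```

Because the number of required embryos scales as 1/e^k, each additional edit multiplies the expected cost rather than adding to it.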
Theoretically, IVG-generated embryos, even with higher mutation rates than naturally generated embryos, might still develop into apparently healthy offspring, since IVG-generated mice and rats are apparently healthy[37]. Another reassuring data point is that cloned polo horses (apparently cloned from a skin sample, and so presumably carrying higher rates of somatic mutations, as IVG-derived embryos would) can compete at very high levels, which implies impressive physical and mental performance. It thus seems very unlikely to me that the number of mutations IVG-derived embryos carry would preclude healthy development, with the caveat that cloning tends to have a low success rate, likely implying a high rate of attrition of embryos carrying especially damaging mutations. My uninformed guess is that regulators might demand approximately similar numbers of mutations in IVG-generated and naturally generated embryos, or strong evidence that the mutations they carry are likely to be low-risk.
While current IVG methods in mice do not appear to cause chromosomal instability, one potential problem with embryo editing (which might be required to fix somatic mutations) is that it requires[38] the culture of embryonic stem cells for prolonged periods of time, especially in the case of multiple edits. This prolonged culture seems to cause chromosomal instability through large-scale rearrangements. Since large-scale rearrangements are likely incompatible with embryo implantation, this is an important obstacle. However, there is some recent work by the Serrano lab (and likely other groups I’m not aware of) that shows proof-of-concept that human naive pluripotent stem cells can be cultured for a prolonged period of time while preserving genomic stability.
A point made to me repeatedly by two subject-matter experts in the IVG space was that there is a diversity of approaches to IVG, which raises the probability that at least one succeeds: broadly, one is a chemical reprogramming approach and the other a genetic reprogramming approach. They also saw the success of IVG in multiple animal species as another reason for optimism.
Jeff Hsu, CEO of Ivynatal, identified the following problems as the most central to clinical use of IVG in humans:
A PhD student working with Gameto, Merrick Smela, identified the following as problems slowing human IVG research:
One challenge that seems important, though I have not raised it with subject-matter experts and it may have an easy solution, is where to obtain a Y chromosome for females who wish to generate sperm, and how to transplant it into their cells; the latter seems like the more difficult technical challenge. There are men with the XX karyotype (typically found in females) who carry the SRY region (usually found on the Y chromosome) and are phenotypically male. This might suggest that merely editing in an SRY copy would go a long way towards producing sperm, but in fact several regions of the Y chromosome are important for sperm production. It seems likely that a full Y chromosome would be required for sperm production.
Using a generic Y chromosome might not be hugely concerning, because there aren’t many genes on it outside of the sexual differentiation region, but I’m not sure how well-studied it is. My understanding is that standard GWAS often excludes the sex chromosomes, so our understanding of them (X and Y chromosomes) lags substantially. Thus, we may be somewhat underestimating how important they are for traits.
When I spoke with Prof. Haiqi Chen, who studies spermatogenesis in his lab, he had longer timelines for IVG (10 years for in-lab success in sperm development, 20 years for clinical trials in humans) than Matt Krisiloff of Conception or Jeff Hsu of IvyNatal. A recent paper he published:
Dissecting Mammalian Spermatogenesis through spatial transcriptomics
He generally thought we need a much better understanding of gametogenesis before attempting IVG in humans. He pointed to evidence of higher rates of imprinting disorders in offspring born from IVF as a sign that we need better understanding before moving forward. A project idea he was excited about was scaling up the sperm atlas work his lab had done.
IVG would enable several unique applications:
If IVG could be done on a large-scale, it would make embryo selection substantially more effective, as is outlined in this article and in the academic literature. Embryo editing, to the degree it is limited by the number of embryos available[39], would become more practical.
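One way to quantify why more embryos help is to look at the expected best score. Assume a polygenic score's values among sibling embryos are roughly normally distributed. Then the expected score of the best of N embryos, measured in within-family standard deviations, is the expected maximum of N standard normal draws. That quantity grows only roughly like √(2 ln N), so gains are large at first and then diminish.

The sketch below is minimal: normality is an assumption, and converting the result into trait units would also require the score's within-family accuracy, which is not modeled here.

```python
import math

def normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def normal_cdf(x: float) -> float:
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def expected_max_of_normals(n: int, lo: float = -10.0, hi: float = 10.0,
                            steps: int = 20_000) -> float:
    """E[max of n iid N(0,1)] = integral of x * n * pdf(x) * cdf(x)^(n-1) dx,
    evaluated numerically with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * normal_pdf(x) * normal_cdf(x) ** (n - 1)
    return total * h

for n in (1, 2, 5, 10, 100, 1000):
    print(f"best of {n:>4} embryos: +{expected_max_of_normals(n):.2f} SD")
```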
The most impactful, but also most speculative, application of IVG would be iterated embryo selection (IES). IES would require the generation of gametes in-vitro, fertilization in-vitro, and then the production of gametes from the resulting embryos. This would enable multiple generations of selection to occur in-vitro. A sketch of such a scenario can be found here, and a detailed exploration here. The TL;DR is that IES, in combination with even mediocre genotype-phenotype prediction methods, would enable very large changes in traits, equivalent to many generations of selective breeding. The large numbers of embryos and the general positive manifold between socially desired traits would reduce the need for substantial tradeoffs between traits.
There are some important caveats to IES. First, over time recombination would break up the associations with the tagging SNPs that current PRSs are based on, though this could be remedied through more fine-grained GWASs. Second, the genetic variance available to select on would eventually be exhausted. Third, there are the unknown unknowns of in-vitro IES.
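The toy simulation below makes the IES logic and its variance-exhaustion caveat concrete. It is not a model of any real protocol. Genomes are diploid with many additive loci, gametes are made by free recombination, and embryos are formed from pairs of selected parents. Selection acts on a noisy score whose squared correlation with true genetic value is set by `score_r2`. All parameters are hypothetical, and the toy ignores new mutation, linkage and the tag-SNP decay problem.

Across generations, mean genetic value rises while genetic variance shrinks, which is the exhaustion caveat above in miniature.

```python
import random
import statistics

def founder(rng, n_loci, freq=0.5):
    """Diploid genome: two haplotypes of 0/1 alleles at each locus."""
    return ([int(rng.random() < freq) for _ in range(n_loci)],
            [int(rng.random() < freq) for _ in range(n_loci)])

def make_gamete(rng, genome):
    """Free recombination: each locus comes from either haplotype at random."""
    h1, h2 = genome
    return [a if rng.random() < 0.5 else b for a, b in zip(h1, h2)]

def genetic_value(genome, effects):
    h1, h2 = genome
    return sum(e * (a + b) for e, a, b in zip(effects, h1, h2))

def iterated_selection(n_loci=200, n_parents=10, n_embryos=100,
                       generations=10, score_r2=0.5, seed=0):
    """Return (mean, variance) of true genetic value for each embryo generation."""
    if not 0 < score_r2 <= 1:
        raise ValueError("score_r2 must be in (0, 1]")
    rng = random.Random(seed)
    effects = [rng.gauss(0, 1) for _ in range(n_loci)]
    parents = [founder(rng, n_loci) for _ in range(n_parents)]
    history = []
    for _ in range(generations):
        embryos = []
        for _ in range(n_embryos):
            mom, dad = rng.sample(parents, 2)          # two distinct selected parents
            embryos.append((make_gamete(rng, mom), make_gamete(rng, dad)))
        values = [genetic_value(e, effects) for e in embryos]
        var = statistics.pvariance(values)
        history.append((statistics.fmean(values), var))
        # Noisy predictor: noise variance chosen so corr(score, value)^2 is about score_r2.
        noise_sd = (var * (1 - score_r2) / score_r2) ** 0.5
        scores = [v + rng.gauss(0, noise_sd) for v in values]
        best = sorted(range(n_embryos), key=scores.__getitem__, reverse=True)
        parents = [embryos[i] for i in best[:n_parents]]   # their gametes seed the next round
    return history

for gen, (mean, var) in enumerate(iterated_selection()):
    print(f"generation {gen}: mean={mean:8.2f}  variance={var:8.2f}")
```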
Practically, IES would require achieving IVG and fixing the problems associated with culturing embryonic stem cells for prolonged periods of time. The costs associated with multiple generations of in-vitro culturing are likely to be very substantial relative to one cycle of IVG, and one natural way to reduce the per-unit cost of IES for gamete generation is to create a “stock” of optimized embryos to generate gametes from, instead of generating gametes from each customer. However, this comes with the same downside of using donor eggs/sperm, eg, reduced relatedness.
Even if clinical use of IVG in humans takes much longer than anticipated, being able to generate human oocyte-like cells in-vitro would be an important research advance. Per 3 of the subject-matter experts I spoke to on IVG, limited availability and high costs of obtaining human fetal tissue and human oocytes slow fertility research. Being able to generate oocytes and human fetal tissue more cheaply, and without the ethical issues that some view as accompanying naturally derived oocytes and embryos, would accelerate research downstream of that input. Depending on how well they function, these oocyte-like cells, even if they are incapable of fertilization on their own, might be suitable material for somatic cell nuclear transfer, which would be a substantial advance by itself.
There are cases of male infertility that cannot be bypassed through existing methods. Primarily these are genetic cases where sperm production does not occur at all or stops at a very early stage, such that these immature sperm cannot successfully fertilize an egg. Because the causes of these sperm developmental failures are heterogeneous, only an intervention like IVG, which sidesteps this stage, seems likely to fix all these issues at once. It would also help same-sex female couples, who currently must use sperm donors.
Caveat: The IVG space is a rapidly advancing field and I don’t think I achieved sufficient subject-matter knowledge to give confident recommendations on rate-limiting steps. I think a person with more reproductive biology knowledge could come up with better recommendations. That said, the ideas below seem sensible and/or came from people directly involved in the field.
Here are a few important recent papers on human IVG, courtesy of Jeff Hsu (CEO of Ivynatal), that were especially helpful in this section, and could be useful further reading:
Men usually produce tens to hundreds of millions of sperm per ejaculate, only one of which fertilizes an oocyte. Fertility physicians have long been interested in choosing the “best” sperm for fertilization when using methods like IVF and ICSI that remove the need for natural fertilization.
Sperm can be measured at scale on a variety of traits, like motility and shape (“morphology”). If there is even a minor correlation between an easily measured sperm phenotype and offspring outcomes, the large number of sperm could allow significant improvements in offspring quality. Gwern outlines this quantitatively here. If sperm selection could also reduce miscarriage rates or fertilization failure, and increase pregnancy and live birth rates, that would be an additional incentive for its routine use in IVF.
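The following is an independent back-of-envelope sketch, not Gwern's model. Assume a measured sperm phenotype and an offspring outcome are bivariate normal with correlation r. Choosing the sperm with the highest phenotype out of N then gives an expected outcome of r × E[max of N standard normals]. The sketch approximates that expected maximum with Blom's formula, and the r = 0.05 used below is purely hypothetical.

```python
from statistics import NormalDist

def expected_max_blom(n: int) -> float:
    """Blom's approximation to E[max of n iid N(0,1)]."""
    alpha = 0.375
    return NormalDist().inv_cdf((n - alpha) / (n - 2 * alpha + 1))

def selection_gain(r: float, n: int) -> float:
    """Expected offspring outcome (in SD units) from choosing the best of n
    sperm on a phenotype correlated r with that outcome, assuming bivariate
    normality, so that E[outcome | chosen] = r * E[max phenotype]."""
    return r * expected_max_blom(n)

for n in (10, 10**3, 10**6, 10**8):
    print(f"best of {n:>11,} sperm, r=0.05: +{selection_gain(0.05, n):.3f} SD")
```

Because the expected maximum grows so slowly with N, the number of sperm that can actually be measured individually matters far less than the strength of the correlation r.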
Several sperm characteristics have been investigated:
Many of these sperm selection methods seem to be evaluated on the basis of their effects on surrogate measures instead of results like live birth rates, measures of offspring health, or miscarriage rates.
One surrogate measure that is examined frequently is sperm DNA fragmentation (SDF). There are numerous ways to test for SDF, such as the sperm chromatin structure assay (SCSA), Acridine orange test, Sperm Chromatin Dispersion (SCD) Assay, Aniline blue staining, Terminal deoxynucleotidyl transferase dUTP nick end labeling (TUNEL), and more. Advocates acknowledge that SDF has a mixed evidence base, though it seems to be useful in predicting some cases of unexplained infertility and identifying men who may be good candidates for some surgeries. The 2021 WHO laboratory manual on sperm examination classifies SDF as “not necessary for routine semen analysis but may be useful in certain circumstances for diagnostic or research purposes”.
Overall, however, many of the surrogate measures that sperm selection has so far been evaluated on are not reliably tied to clinically important outcomes like live birth rates, miscarriage rates, etc. A recent proposal to apply machine learning to sperm selection, though an approach I am generally excited about, relies heavily on these surrogate measures, with only a brief mention of measuring the relevant clinical outcomes.
There is some evidence from animal studies that non-destructive sperm selection based on sperm phenotype may have beneficial effects on offspring. A series of experiments in zebrafish showed that selecting for sperm longevity (how long after activation sperm can fertilize) can change the phenotype and genotypes of offspring. Similar work has been done in other areas of animal breeding, though I did not investigate it in-depth. However, my overall impression of the literature in humans, which was confirmed by people in the ART field but not in sperm biology specifically, is that there is no obvious best way to select sperm, and the quality of the evidence is low.
A Cochrane review of the evidence in 2019 came to a similar overall conclusion:
“The current evidence suggests that advanced sperm selection strategies in assisted reproductive technologist (ART) may not result in an increase in the likelihood of live birth. The only sperm selection technique that potentially increases live birth and clinical pregnancy rates is Zeta sperm selection, yet these results were of very low quality and derived from a single study, therefore we are uncertain of the effect...evidence gathered was of very low to low quality. The main limitations were imprecision associated with low numbers of participants or events”
A similar conclusion was reached in a 2020 Cochrane review on IMSI (intracytoplasmic morphologically selected sperm injection), a modification of the original ICSI technique in which much higher magnification (6000x) is used to select sperm, instead of the 200-400x magnification used in ICSI. The higher magnification enables a more fine-grained analysis of sperm morphology than the usual ICSI methodology.
If sperm could be non-destructively sequenced, that would enable direct selection along similar lines as embryo selection. One speculative possibility is capturing spermatogonia before they undergo meiosis (or making spermatogonia with IVG), destructively sequencing 3 of the 4 sibling gametes produced by a single meiosis, and then inferring the genetics of the remaining gamete. I am unsure how feasible this is, but I suspect it would require substantial advances in sperm maturation and culture methods to keep spermatogonia alive in culture, as well as advances in microfluidics to capture sperm. It would also involve substantial sequencing costs. However, as Gwern outlines in his piece, gamete selection can result in larger gains than embryo selection, and combining the two is even more powerful.
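The inference step works because of how alleles are distributed among the four products of one meiosis. At any site where the parent is heterozygous, those four products carry each allele exactly twice, regardless of crossovers. So the missing product's allele is whichever one appears only once among the three sequenced siblings, and at homozygous sites all four products agree.

The sketch below works over aligned allele strings, a representation that is hypothetical. It ignores gene conversion, new mutations and sequencing errors, any of which would break the 2:2 rule.

```python
from collections import Counter
from typing import Sequence

def infer_fourth_gamete(siblings: Sequence[str]) -> str:
    """Infer the unsequenced product of one meiosis from its three sequenced
    siblings (aligned allele strings, one character per site)."""
    if len(siblings) != 3 or len({len(s) for s in siblings}) != 1:
        raise ValueError("need three equal-length sibling sequences")
    fourth = []
    for alleles in zip(*siblings):
        counts = Counter(alleles)
        if len(counts) == 1:        # all three agree: a homozygous site
            fourth.append(alleles[0])
        elif len(counts) == 2:      # 2:1 split: the missing copy is the minority allele
            fourth.append(min(counts, key=counts.get))
        else:
            raise ValueError("three different alleles at one site is inconsistent")
    return "".join(fourth)

print(infer_fourth_gamete(["ACGT", "ACGA", "TCGA"]))
```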
A technology that might enable some degree of embryo editing, whether for disease prevention or for other traits, is “engineered embryonic stem cell nuclear transfer”– using an edited embryonic cell as the nuclear donor in nuclear transfer. This method was described to me by Max Berry, a bioengineer. It is in contrast to He Jiankui’s method, which relied on a single application of CRISPR editing to an early-stage embryo via microinjection, resulting in substantial mosaicism and potential off-target mutations.
Here is his sketch of this idea:
He proposed extraction of cells from an early-stage embryo (potentially, one pre-selected from a batch of embryos using PGT), and growing those ESCs in vitro. Extensive editing can be performed on cells in a dish, which can be kept stable and growing in tissue culture for months. After modification, they can be seeded monoclonally and expanded so that several hundred ‘colonies’ of modified cells are derived, each from a different ‘parent’ cell. Genome sequencing a portion of each colony will confirm the correct edits and lack of off-target effects for all cells in the colony, as they are all genetically identical.
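The colony-screening step described above amounts to a filter over variant calls. A colony passes if it carries every intended edit, carries no variant that is neither inherited nor intended, and has lost none of the parental line's variants (a crude proxy for deletions).

The sketch below is minimal, and its data model is hypothetical. Real pipelines would also have to handle sequencing error, coverage and structural rearrangements, which a simple set comparison cannot capture.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = Tuple[str, int, str, str]

@dataclass
class Colony:
    colony_id: str
    variants: Set[Variant]  # variants called in this clonal colony vs. the reference

def passing_colonies(colonies: List[Colony], parental: Set[Variant],
                     intended: Set[Variant]) -> List[str]:
    """IDs of colonies with every intended edit, no unexpected new variant,
    and no parental variant missing."""
    passed = []
    for colony in colonies:
        has_all_edits = intended <= colony.variants
        unexpected = colony.variants - parental - intended   # e.g. off-target edits
        lost = parental - colony.variants                    # e.g. deletions, LOH
        if has_all_edits and not unexpected and not lost:
            passed.append(colony.colony_id)
    return passed

parental = {("chr1", 1_000, "C", "T")}
intended = {("chr2", 5_000, "A", "G")}
colonies = [
    Colony("c1", parental | intended),                             # correct
    Colony("c2", set(parental)),                                   # edit missing
    Colony("c3", parental | intended | {("chr5", 42, "G", "A")}),  # off-target hit
]
print(passing_colonies(colonies, parental, intended))
```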
When a colony of cells possessing all the correct edits and no genetic damage is identified, one cell from the colony has its nucleus transplanted into an enucleated egg cell. This procedure is identical in principle to somatic cell nuclear transfer, the difference being that in SCNT the egg host must reprogram a terminally differentiated nucleus 100% correctly. In this technique, by contrast, the embryonic stem cell donor nucleus is already 99% of the way to having a correct epigenome for becoming an embryo. Thus the extremely low efficiency of SCNT is bypassed.
The last hurdle to implementing this technique was extended in vitro ESC culture, specifically the maintenance of epigenetic imprinting fidelity. This was recently overcome (see this paper from the Serrano lab), meaning that there are no major technical breakthroughs required for this technique to produce viable modified human embryos. In addition, recent advancements in de novo embryogenesis (seen here) may mean that nuclear transfer can itself be skipped, and the modified ESCs can be cultured to form an entire viable embryo on their own.
This technique would potentially be far superior to CRISPR microinjection, albeit with somewhat more lab work involved. However, at scale the expense should not be wildly more than that of traditional IVF, especially considering that microinjection or PGT also require the expense of IVF regardless.
There are two main benefits over microinjection:
[author’s note: everything from “He proposed....creating humans” is Max’s]
To contextualize/caveat the above, Merrick estimated it would cost about $10,000 to edit a single specific variant into a stem cell line via HDR (homology-directed repair), including verification of edits with whole-genome sequencing and labor costs. He thinks single-base modifications, as opposed to HDR, would be cheaper. IVF is probably about $20,000, with about $7,000 per additional cycle. At scale, and with substantial capital spending on automation, costs would likely fall substantially below $10,000 per edit. However, the $10,000 estimate does not account for the extra stringency that regulatory oversight (eg, FDA review or CLIA laboratory requirements) would bring. It is therefore more of a rough guess for embryo editing done without FDA supervision (in other words, illegally in the US, or abroad in jurisdictions that are friendlier to germline editing). Note that sequencing does not need to be performed after every single edit, so multiple modifications could theoretically be made in parallel.
Also relevant: a recent paper in mice placed nuclei from somatic cells into oocytes in metaphase II and succeeded in inducing haploidization (generating a haploid genome, like that contained in gametes, from a diploid genome). This produced an oocyte that could be fertilized with sperm and yield live offspring. While donor human oocytes are expensive, using artificial oocytes produced from IVG might reduce the cost, as the authors note. It is unclear to me whether chromosomal crossing over and recombination occurred in this process.
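The figures quoted in this paragraph can be combined into a rough total: about $20,000 for IVF, about $7,000 per additional cycle, and about $10,000 per HDR edit. The text notes that sequencing need not follow every edit, so a flat per-edit cost is likely an upper-end simplification. The function name and its linear form are assumptions made for illustration, not Merrick's model.

```python
def rough_edited_ivf_cost(n_edits: int, ivf_cycles: int = 1,
                          first_cycle: float = 20_000, extra_cycle: float = 7_000,
                          per_edit: float = 10_000) -> float:
    """Back-of-envelope total in USD from the figures quoted above. Ignores
    automation savings, regulatory costs and sharing sequencing across edits."""
    if ivf_cycles < 1 or n_edits < 0:
        raise ValueError("need at least one IVF cycle and a non-negative edit count")
    return first_cycle + extra_cycle * (ivf_cycles - 1) + per_edit * n_edits

print(rough_edited_ivf_cost(n_edits=1))                 # one HDR edit, one IVF cycle
print(rough_edited_ivf_cost(n_edits=5, ivf_cycles=2))   # five edits, two IVF cycles
```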
Critics have argued that PGT can prevent practically any genetic disease from being transmitted and thus that heritable germline editing for genetic diseases is not necessary. There are specific cases where PGT does not work, such as when one parent is homozygous for a dominant disease allele, or in older women with few embryos. Still, it is true that heritable germline editing would only be truly necessary for disease prevention in a relatively small number of patients.
Clinical (heritable) human embryo editing is currently effectively illegal in the US, since FDA approval would be required to perform a clinical trial, and Congress bars the FDA from considering any clinical trial application that proposes it. In June 2019 Congress again voted to bar the FDA from considering any heritable germline editing applications, though some Democratic House members had urged Congress to instruct the FDA to consider the issue instead of banning it outright. Obtaining regulatory clarity on heritable germline editing, ideally for severe genetic diseases with no alternative treatment, could in theory allow it to proceed. I am not sure an advocacy campaign centered on this would work, given the potential for backlash if the issue became highly salient.
There is some polling on these and related issues, eg, Pew polling from 2021 on heritable gene editing to reduce disease risk. It shows a somewhat favorable public with many respondents uncertain, though the proportions differ with different wordings. However, I am uncertain how reliable issue polling is (see David Shor on problems with issue polling in general). My overall guess is that the more heritable/germline embryo editing resembles prevention or treatment of disease, rather than human enhancement, the more the public will favor it. Avoiding a high rate of discarded embryos also seems important for alleviating abortion-related concerns among some religious groups.
An anonymous colleague who is an early-career human geneticist (henceforth “Hayt”) has written a section (“the road to causally sound embryo selection”) focusing on the limitations and challenges of current methods of embryo selection on complex traits with polygenic scores, and outlines a number of ideas to improve those scores. I have also edited that section to reflect feedback from relevant subject-matter experts; in cases of controversy, assume the more sensible opinion is Hayt’s, while the errors are mine. While embryo selection with polygenic scores has only recently entered clinical practice (Genomic Prediction was founded in 2017, while Orchid was founded in 2019), selection of embryos based on monogenic diseases (preimplantation genetic testing for monogenic disorders, PGT-M) has been part of clinical practice for more than 30 years. I will briefly describe this method, drawing from this book on PGT-M.
Preimplantation genetic testing for monogenic diseases was first performed in humans in 1990, by selecting female embryos for a couple with an X-linked disease. Screening for autosomal diseases became possible in the mid-1990s, and current PGT-M techniques can detect a variety of single-gene variants and chromosomal rearrangements.
While PGT-M was originally performed only for highly penetrant and deleterious diseases, its use has expanded to variants that convey increased but not guaranteed risk of disease, and it has been performed in 100k+ cycles globally. In one major center, about 13% of all PGT-M cases were for variants that confer increased risk of cancer. Newer PGT-M techniques can screen for multiple single-gene disorders at a time, which is useful for families or populations that carry multiple disorders. In one large center, PGT-M has been used to screen for 45 different inherited cancer syndromes (pg 126 of this book), such as BRCA1, BRCA2, Li-Fraumeni syndrome, and Familial Adenomatous Polyposis. These syndromes carry cancer risks ranging from nearly guaranteed (lifetime risk > 90% for Familial Adenomatous Polyposis) to merely very high (~40-60% lifetime risk of breast cancer for BRCA2).
The lifetime costs of monogenic diseases are very high, and it is likely that offering IVF + PGT-M for free to prevent the transmission of monogenic diseases is cost-effective for many diseases: eg, for BRCA1/2 and for sickle-cell disease.
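A crude expected-value check makes the cost-effectiveness logic explicit. In expectation, offering IVF + PGT-M saves money when transmission risk × penetrance × lifetime disease cost exceeds the cost of IVF + PGT-M. Transmission risks of 50% (autosomal dominant, one affected parent) and 25% (autosomal recessive, two carrier parents) are standard Mendelian values.

The check assumes PGT-M is perfectly accurate and that the couple would otherwise conceive naturally. It also ignores discounting and quality-of-life effects. The cost figures used below are placeholders.

```python
def expected_net_savings(lifetime_cost: float, transmission_risk: float,
                         ivf_pgtm_cost: float, penetrance: float = 1.0) -> float:
    """Expected disease cost averted minus the cost of IVF + PGT-M."""
    return transmission_risk * penetrance * lifetime_cost - ivf_pgtm_cost

def break_even_lifetime_cost(transmission_risk: float, ivf_pgtm_cost: float,
                             penetrance: float = 1.0) -> float:
    """Lifetime disease cost above which offering IVF + PGT-M saves money."""
    return ivf_pgtm_cost / (transmission_risk * penetrance)

# Placeholder IVF + PGT-M cost of 25,000.
print(break_even_lifetime_cost(0.25, 25_000))                   # recessive, fully penetrant
print(break_even_lifetime_cost(0.50, 25_000, penetrance=0.5))   # dominant, ~50% penetrance
```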
Another use case for preimplantation genetic testing is aneuploidy testing. Apart from embryos with Trisomy 21 or Turner Syndrome (45,X, missing an X chromosome), which often survive pregnancy (though at lower rates than chromosomally normal embryos), the vast majority of embryos with aneuploidy either do not implant successfully or result in miscarriage. To spare parents the disappointment and trauma of miscarriage, and to increase the success rate per embryo transfer, IVF clinicians introduced preimplantation genetic testing for aneuploidy (PGT-A).
Likely due to technical limitations of the most commonly used PGT-A methods, as well as the possibility of embryo mosaicism, PGT-A does not perfectly predict aneuploidy status, and hence implantation status. Supporting this limitation, a recent trial comparing IVF with PGT-A against conventional IVF without PGT-A found conventional IVF to be non-inferior. In fact, the conventional IVF group, which did not use PGT-A, had a higher cumulative live birth rate.
Even if PGT-A correctly prioritizes the embryos with the highest chance of implantation success, embryos called “aneuploid” by current methods, and especially those called mosaic, still have a chance of implantation and live birth. Better prioritization of embryos likely does reduce implantation failures, but since it doesn’t increase the number of embryos available, it can’t increase the cumulative live birth rate. This popular press article does a good job summarizing the controversy. There are reasons to think[40] that more sophisticated genetic testing may do a better job of correctly calling ploidy status, which would reduce false calls of “aneuploidy”.
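The argument that prioritization cannot raise the cumulative live birth rate can be checked with a small model. Assume single-embryo transfers, independent per-embryo success probabilities, and that every embryo is eventually transferred. Under those assumptions, the cumulative rate does not depend on transfer order, while the expected number of transfers does. Discarding embryos, for example ones miscalled as aneuploid, can only lower the cumulative rate. The probabilities below are hypothetical.

```python
import math

def cumulative_live_birth(probs):
    """P(at least one live birth) if every embryo is eventually transferred,
    one at a time, with independent per-embryo success probabilities."""
    return 1 - math.prod(1 - p for p in probs)

def expected_transfers(probs_in_order):
    """Expected number of transfers when embryos are used in this order
    and transfers stop at the first live birth."""
    expected, p_still_trying = 0.0, 1.0
    for p in probs_in_order:
        expected += p_still_trying   # this transfer happens only if all earlier ones failed
        p_still_trying *= 1 - p
    return expected

embryos = [0.5, 0.3, 0.1, 0.05]       # hypothetical per-embryo live-birth probabilities
best_first, worst_first = sorted(embryos, reverse=True), sorted(embryos)
print(cumulative_live_birth(best_first), cumulative_live_birth(worst_first))
print(expected_transfers(best_first), expected_transfers(worst_first))
print(cumulative_live_birth(embryos[:2]))   # two lowest-probability embryos discarded
```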
Editor’s note: mostly written by Hayt, subsequently edited by Willy with feedback from various other experts
Human embryo selection is a promising direction for improving the next generation’s health and well-being. At present, much research associating genetic variants with predisposition to complex disorders is observational and cannot support firm causal conclusions. Here, I outline the basic principles and methodology underlying current embryo selection approaches and argue that in a number of ways they are lacking with regard to predictive power, accuracy, and causal claims. I then propose future research directions that will pave the way to a more scientifically rigorous and effective approach to embryo selection based on principles of improved biological modeling and causal learning.
Evolution, genes, and reproduction
Evolution (on single genetic variants) operates via selection. Negative selection removes damaging mutations from the population (e.g. lethal monogenic disorders), and positive selection increases the frequency of favorable variants (e.g. lactase persistence in European populations). Balancing selection maintains multiple alleles in the population, e.g. the recessive sickle-cell mutation, which causes sickle cell anemia in homozygotes but confers malaria resistance in heterozygotes. Under more complex scenarios where multiple genetic variants impact a trait under selection, other evolutionary dynamics emerge, such as stabilizing selection, whereby genetic variants that alter a trait are balanced in the population to achieve an optimal trait value.
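The single-variant cases above follow from the textbook one-locus selection recurrence under random mating. With heterozygote advantage (genotype AA pays a cost s, SS pays a cost t), the frequency of S converges to s / (s + t). A recessive lethal with no heterozygote benefit instead declines toward zero as q/(1 + q) per generation.

The selection coefficients below are hypothetical illustrations, not measured malaria or anemia fitness values, and stabilizing selection on polygenic traits is not modeled.

```python
def next_freq(q: float, w_aa: float, w_as: float, w_ss: float) -> float:
    """One generation of viability selection at a biallelic locus under random
    mating. q is the frequency of allele S; w_* are genotype fitnesses."""
    p = 1 - q
    mean_fitness = p * p * w_aa + 2 * p * q * w_as + q * q * w_ss
    return (p * q * w_as + q * q * w_ss) / mean_fitness

def simulate_selection(q0: float, generations: int, **fitness: float) -> float:
    q = q0
    for _ in range(generations):
        q = next_freq(q, **fitness)
    return q

# Balancing selection via heterozygote advantage: AA pays s, SS pays t.
s, t = 0.1, 0.8   # hypothetical selection coefficients
q_eq = simulate_selection(0.01, 5000, w_aa=1 - s, w_as=1.0, w_ss=1 - t)
print(q_eq, s / (s + t))   # simulated vs. analytic equilibrium

# Negative selection on a recessive lethal: S declines slowly toward zero.
print(simulate_selection(0.1, 200, w_aa=1.0, w_as=1.0, w_ss=0.0))
```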
Selection shapes the mutational landscape of our genome, but there are also stochastic mechanisms that introduce phenotypically neutral mutations into a population’s gene pool. These mutations are not undergoing selection but are rather silently tagging along through random chance. These neutral variants become more or less common through the process of genetic drift (which also affects non-neutral variants). Changes in the environment change the fitness consequences of variants.
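The difference between selection and drift can be made concrete with a toy Wright-Fisher simulation. This is a minimal sketch under textbook assumptions (one biallelic locus, a fixed population of N diploid individuals, random mating, non-overlapping generations, and genic selection in which each copy of the allele has relative fitness 1 + s); it is not a model of any particular human variant.

```python
import random

def wright_fisher(pop_size: int, p0: float, s: float, generations: int,
                  seed: int = 0) -> list[float]:
    """Allele-frequency trajectory under genic selection plus genetic drift.

    pop_size: number of diploid individuals N (so 2N gene copies per generation).
    p0: starting frequency of the focal allele.
    s: selection coefficient (0 = neutral, negative = deleterious).
    """
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Selection: deterministic shift in frequency from relative fitness 1 + s
        p_selected = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: the next generation is a random binomial sample of 2N gene copies
        copies = sum(rng.random() < p_selected for _ in range(2 * pop_size))
        p = copies / (2 * pop_size)
        trajectory.append(p)
    return trajectory

# Example: a neutral allele in a small vs a larger population
small = wright_fisher(pop_size=50, p0=0.5, s=0.0, generations=100)
large = wright_fisher(pop_size=2_000, p0=0.5, s=0.0, generations=100)
```

With s = 0 the allele wanders purely by chance, and the wandering is larger in the smaller population; with a negative s, a deleterious allele is usually purged, although drift can still carry weakly deleterious alleles to appreciable frequency when N is small.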
Current approaches to embryo selection and their limitations
Scientific advances are making the genetic optimization of a child's health possible. It is already common practice to select embryos based on monogenic disorders and chromosomal abnormalities, which provide no advantage, cause substantial harm, and are reasonably well-understood. But many traits do not operate via a single gene. So how can we select on these complex, polygenic traits? Currently, statistical geneticists are using quite simple approaches with various degrees of success. The procedure they use to derive genetic predisposition is generally based on the following with minor variation:
This procedure is a useful simplification, but does not perfectly model biology. A GWAS measures marginal effects of variants independently, but there are two potentially important caveats:
PRSs today show predictive power in population-level data, but our confidence in individual predictions should be lower. In addition, polygenic scores for the same trait that perform similarly on an aggregate sample, but are constructed in slightly different ways, can give different predictions for the same individual. For height, the most well-studied polygenic trait, a recent paper argued that some assumptions of the PRS model were significantly violated, which led to some systematic (though minor) errors in estimation; per a subject-matter expert, these errors could be fixed by a monotonic transformation and would not affect embryo ranks.
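The core arithmetic of a polygenic score, and the point that a monotonic transformation leaves embryo ranks unchanged, fit in a few lines. This is a minimal sketch: the effect sizes and genotypes are made up, and real pipelines add steps (quality control, LD-aware clumping or shrinkage of effect sizes) that are omitted here.

```python
import math

def polygenic_score(dosages: list[int], effect_sizes: list[float]) -> float:
    """Weighted sum of allele dosages (0, 1, or 2 copies) times per-variant GWAS effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size per variant is required")
    return sum(d * b for d, b in zip(dosages, effect_sizes))

def rank_embryos(scores: dict[str, float]) -> list[str]:
    """Embryo IDs ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical effect sizes for four variants and genotypes of three sibling embryos
betas = [0.12, -0.05, 0.30, 0.08]
embryos = {
    "embryo_A": [2, 1, 0, 1],
    "embryo_B": [0, 0, 1, 2],
    "embryo_C": [1, 2, 1, 0],
}
raw = {eid: polygenic_score(g, betas) for eid, g in embryos.items()}

# An order-preserving (monotonic) recalibration, here a logistic curve chosen for illustration
recalibrated = {eid: 1 / (1 + math.exp(-3 * s)) for eid, s in raw.items()}
```

Because the recalibration is strictly increasing, `rank_embryos(raw)` and `rank_embryos(recalibrated)` return the same order; such a fix changes absolute predictions but not which embryo is ranked first.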
In addition, and perhaps most importantly, the explanatory power for most traits is lackluster for individual-level predictions: under certain assumptions, the best current polygenic screening technologies increase the mean IQ of a selected embryo by an average of 2.5 points and height by under an inch. Worryingly, genetic prediction of educational attainment, the best-powered cognitive trait studied, has proven to be significantly confounded by associations other than direct genetic effects, suggesting that selection on educational attainment using current technologies would be less than half as powerful as one might naively predict, as discussed here[41]. On the other hand, other cognitive traits, such as IQ, may display less confounding of this type than educational attainment.
Figure 23. Meta-analysis estimates of direct and population effects of PGIs.
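Estimates like the 2.5 IQ points come from a simple model of picking the top-scoring embryo among siblings. The sketch below implements one common version of that calculation under loud assumptions: the score is fully causal (direct effects only, so the confounding above is ignored), every embryo is viable, and the within-family variance of the score is half of its population variance. It is not the calculation from any specific paper, and the inputs you pass in are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, sd 1

def expected_max_std_normal(n: int, steps: int = 20_000) -> float:
    """E[max of n independent N(0, 1) draws].

    Integrates x times the density of the maximum, n * phi(x) * Phi(x)**(n - 1),
    with the trapezoid rule over [-10, 10].
    """
    lo, hi = -10.0, 10.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * x * n * _STD_NORMAL.pdf(x) * _STD_NORMAL.cdf(x) ** (n - 1)
    return total * h

def expected_gain(n_embryos: int, r2: float, trait_sd: float) -> float:
    """Expected trait gain (in trait units) from picking the top-scoring embryo.

    Assumes the score explains r2 of trait variance in the population and that
    its within-family SD is sqrt(r2 / 2) trait SDs (half the population variance).
    """
    within_family_sd = math.sqrt(r2 / 2)
    return expected_max_std_normal(n_embryos) * within_family_sd * trait_sd
```

For example, `expected_gain(10, 0.05, 15.0)` asks what picking the best of 10 embryos would yield for a score explaining 5% of IQ variance (IQ is conventionally scaled to an SD of 15); the 5% is a placeholder, not an estimate. Confounding of the kind shown in Figure 23 effectively lowers the usable r², and losing embryos to implantation failure lowers n.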
Confounding
GWAS is observational in nature, which leads to confounding that is difficult to control with current approaches. There are four (potentially overlapping) sources of confounding, briefly summarized below:
A classic thought experiment is the “chopstick gene”: imagine you want to find the variants that make someone better at using chopsticks. You take a random sample of people across the world and conduct a GWAS. You would find dozens of strong associations, but did you recover anything biologically or causally meaningful? No; you just found genetic variants whose frequencies differ between East Asians and the rest of the world due to random genetic drift between geographically separated populations. Clearly, we are not interested in these spurious correlations, yet they are pervasive in naively performed GWAS, though the field is well aware of these problems. On the other hand, per RM, ancestry confounding can be well corrected for in a GWAS by including principal components as covariates. Indirect/parental genetic effects are still picked up in a regular GWAS, but these can be teased apart with family-based GWAS, in which multiple siblings and/or parents are examined.
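The chopstick example can be reproduced in a small simulation: two populations whose allele frequencies differ only by drift, and a trait that depends only on which population someone belongs to. This is a toy sketch under stated assumptions (two discrete populations, no true genetic effects, a single principal component as the only covariate, and the genome-wide threshold p < 5e-8 approximated as |t| > 5.45); real GWAS pipelines use many PCs or mixed models.

```python
import numpy as np

GENOME_WIDE_Z = 5.45  # approximate |z| for a two-sided p < 5e-8

def simulate_chopstick_gwas(n_per_pop=1000, n_snps=500, seed=0):
    rng = np.random.default_rng(seed)
    # Two populations whose allele frequencies have drifted apart
    base = rng.uniform(0.1, 0.9, n_snps)
    freqs = np.clip(base + rng.normal(0.0, 0.1, (2, n_snps)), 0.01, 0.99)
    pop = np.repeat([0, 1], n_per_pop)
    genotypes = rng.binomial(2, freqs[pop])  # (individuals, SNPs), values 0/1/2
    # "Chopstick skill" depends only on population (culture), never on genotype
    skill = 1.0 * pop + rng.normal(0.0, 1.0, pop.size)
    return genotypes, skill

def top_pcs(genotypes, k=1):
    """Top-k principal component scores of the standardized genotype matrix."""
    z = (genotypes - genotypes.mean(axis=0)) / (genotypes.std(axis=0) + 1e-12)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]

def snp_t_stats(genotypes, y, covariates=None):
    """OLS t-statistic for each SNP, optionally adjusting for covariates."""
    n = y.size
    stats = []
    for j in range(genotypes.shape[1]):
        cols = [np.ones(n), genotypes[:, j]]
        if covariates is not None:
            cols.extend(covariates.T)
        X = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
        stats.append(beta[1] / se)
    return np.array(stats)

genotypes, skill = simulate_chopstick_gwas()
naive = snp_t_stats(genotypes, skill)
adjusted = snp_t_stats(genotypes, skill, covariates=top_pcs(genotypes, k=1))
naive_hits = int(np.sum(np.abs(naive) > GENOME_WIDE_Z))
adjusted_hits = int(np.sum(np.abs(adjusted) > GENOME_WIDE_Z))
```

Under these assumptions the naive scan flags many SNPs even though none affects the trait, while adjusting for the top principal component removes essentially all of them. This sketch does not touch the separate problem of indirect/parental effects, which PCs cannot remove.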
This gets at an additional issue of polygenic score transferability. Per a subject-matter expert:
issues with polygenic score transferability stem primarily from differences in linkage disequilibrium and allele frequencies. Other forms of confounding likely play a role in educational attainment and related phenotypes. There are many people working on methods to ameliorate these issues too. It could be worth mentioning that as well: it's not a completely intractable problem.
To control for the confounding described above, researchers select “genetically homogeneous” groups of people to include in their GWAS. These have been overwhelmingly individuals of white European ancestry. Polygenic scores have reduced utility in individuals whose ancestry differs from that of the GWAS sample, with the reduction increasing with genetic distance from the sample in which the score was developed. In one instance, a polygenic score for schizophrenia trained in Europeans correlated much more strongly with ancestry than with the condition itself when applied to other groups. While polygenic scores trained in one ancestry have some degree of transferability to others, the reduced overall predictive power, and the unintended consequence of selecting against certain ancestries in the selected embryo[43], complicate current approaches.
Putting aside the limitations of GWAS and transferability, another important consideration in selecting an embryo’s traits is pleiotropy: not only do variants not act independently, but the same variant may affect multiple traits in different directions. Sometimes a variant increases predisposition to multiple desirable traits, but in some cases a variant that increases predisposition to one favorable trait decreases it for another. Pleiotropy is not well understood, and some notable and worrying examples emerge upon investigation. For example, a single variant in the metal transporter gene SLC39A8 confers decreased risk for hypertension and Parkinson’s disease, but increased risk for schizophrenia and Crohn’s disease, and is also associated with cognitive performance. Searching variants in the FinnGen browser (https://r6.finngen.fi/gene/) illustrates the pervasiveness of pleiotropy and reveals thousands of examples of single variants with discordant effects on disease risks. Work is being done to quantitatively describe pleiotropy in these cohorts, and the results will shed more light on this issue for PRSs.

Another striking example is bipolar disorder: current GWAS results suggest that, in aggregate across the genome, variants that decrease risk for bipolar disorder also decrease predisposition to higher educational attainment. One possibility raised by a subject-matter expert is that many of these results are inflated or spurious due to assortative mating and population structure. His best guess was that genome-wide genetic correlations between most diseases are low and positive; in other words, most of the time, a given genetic variant does not have discordant effects.
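One standard way to handle discordant effects during selection, borrowed from animal and plant breeding, is a linear selection index that weights each trait's score by how much it matters. The sketch below assumes two standardized polygenic scores per embryo, drawn independently for each embryo as bivariate normal with a positive correlation between a score for an undesirable trait A and a desirable trait B (the bipolar/educational attainment pattern above). The correlation of 0.2 and the equal weights are hypothetical, and scores are assumed to translate directly into traits.

```python
import numpy as np

def pick_embryo(score_a, score_b, weight_a: float = 1.0, weight_b: float = 1.0) -> int:
    """Index of the embryo maximizing weight_b * B - weight_a * A.

    A scores an undesirable trait and B a desirable one; the weights are
    hypothetical stand-ins for how much each trait matters to the parents.
    """
    index = (weight_b * np.asarray(score_b, dtype=float)
             - weight_a * np.asarray(score_a, dtype=float))
    return int(np.argmax(index))

def simulate_selection(genetic_corr: float = 0.2, n_embryos: int = 5,
                       n_families: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Mean A and B scores of the chosen embryo across simulated families."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, genetic_corr], [genetic_corr, 1.0]]
    scores = rng.multivariate_normal([0.0, 0.0], cov, size=(n_families, n_embryos))
    a, b = scores[..., 0], scores[..., 1]
    best = np.argmax(b - a, axis=1)  # equal weights: vectorized form of pick_embryo
    rows = np.arange(n_families)
    return float(a[rows, best].mean()), float(b[rows, best].mean())

mean_a, mean_b = simulate_selection()
```

Under these assumptions the chosen embryos are, on average, below the mean on A and above it on B, illustrating that a positive correlation between two scores does not by itself prevent selecting on both at once. Choosing the weights is the hard part in practice.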
In addition, current in vitro fertilization (IVF) techniques carry risks, which must be weighed against the disease risk reduced by embryo selection. As it stands, the absolute risk reduction is low and current embryo selection approaches would yield many false positives. As an example, imagine you select against an embryo in the top 10% of genetic risk for schizophrenia. With the current best PRS, and assuming PRSs are completely causal predictors, that individual would have had a 5% chance of developing schizophrenia, compared to ~1% for an average embryo in the population (which is most likely an underestimate for parents who would produce an embryo in the top 10% in the first place). Contrast this with an important risk of IVF, ovarian hyperstimulation syndrome (OHSS). Exact numbers are difficult to find, but probably about 3-6% of women experience moderate OHSS and 0.1-2.0% experience severe OHSS, which can in rare cases be fatal.
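Absolute risks like the 5% figure are usually derived with the liability-threshold model, in which disease occurs when an underlying normally distributed liability crosses a threshold set by the population prevalence. A minimal sketch follows. The PRS's share of liability variance (`r2_liability`) is an input you must supply; the text does not say which value underlies its 5% figure, so this is not a reproduction of that number.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def top_fraction_risk(prevalence: float, r2_liability: float,
                      top_fraction: float = 0.1, steps: int = 20_000) -> float:
    """Average disease risk among people whose PRS is in the top `top_fraction`.

    Liability-threshold model: liability = sqrt(r2) * PRS + sqrt(1 - r2) * noise,
    with PRS and noise standard normal; disease occurs when liability exceeds the
    threshold that yields the given population prevalence. Requires 0 <= r2 < 1.
    """
    threshold = _STD_NORMAL.inv_cdf(1 - prevalence)
    prs_cut = _STD_NORMAL.inv_cdf(1 - top_fraction)
    r = r2_liability ** 0.5
    resid_sd = (1 - r2_liability) ** 0.5
    hi = 10.0
    h = (hi - prs_cut) / steps
    total = 0.0
    for i in range(steps + 1):
        s = prs_cut + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        risk_given_s = 1 - _STD_NORMAL.cdf((threshold - r * s) / resid_sd)
        total += weight * risk_given_s * _STD_NORMAL.pdf(s)
    # Dividing by the tail's probability mass turns the integral into a conditional average
    return total * h / top_fraction
```

The absolute risk reduction from avoiding such an embryo, relative to an average one, is then roughly `top_fraction_risk(0.01, r2) - 0.01`, which is the quantity to set against the OHSS rates above.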
In short, we are 1) not modeling biology correctly (though a subject-matter expert countered that selection in agriculture has done very well without much understanding of mechanisms) and 2) relying on confounded observational data to make causal claims. We need to innovate our approach such that we can confidently make concrete claims regarding genetic causality, model genetic interactions more realistically, and avoid selecting embryos without taking pleiotropy into account.
We need to approach this problem by taking principles of causality and corroborating evidence into account, using 1) new computational approaches, 2) family study designs, 3) deep phenotyping, and 4) diverse population cohorts.
On a similar note, genetic studies need to be communicated well to the public, and efforts should go into public information campaigns to increase both participation in and acceptance of such studies. Many individuals are worried about eugenics and about social preferences against their own characteristics. As an example, an unnamed cohort that would have been one of the largest autism cohorts in the world was halted due to severe backlash from the autism community, which claimed that the researchers were eugenicists trying to eradicate people with autism (an unfounded claim). Society and researchers will benefit from better education on these topics.
Importantly, there is an entire class of genetic variation that is left out by current PRS methods (which rely on genotyping) and that can be much more readily applied to embryo screening: rare coding variants. It is rare variants that typically have the largest per-variant effects on traits. We already understand certain types of rare variants (especially those in parts of the genome that code for proteins), and we can start exploiting this understanding in the context of embryo selection, which has not been done except for some monogenic disease-causing genes. A simple measure of rare variant burden, which is biologically interpretable and among the most convincingly causal signals available, has been shown to be associated with reduced fertility, lower cognitive abilities, and other undesirable traits, with no positive associations with desirable traits. Some of these harmful variants are de novo mutations, arising in the most recent generation (i.e. not present in the parents), which potentially makes them a great target for embryo screening, since pre-screening the parents cannot detect them. Unfortunately, there are practical issues with screening for de novo variants specifically (as opposed to inherited rare variants): most whole-genome sequencing technologies cannot currently detect de novo variants reliably[44], as they cannot be distinguished from sequencing errors, with the exception of germline mosaicism that has recurred in siblings.
Selection on rare variants may be the closest we can currently come to selecting embryos in a biologically informed manner. Computational tools to predict the effects of rare variants can and should be improved, but papers like this one, which rank genes according to how “tolerant” they are of inactivation, are a good start. Increasing sample size would help as well: recent work from the UK Biobank that obtained whole exomes of 500k individuals predicts that with sample sizes in the several millions, loss-of-function variants will be found in nearly all genes.
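A rare-variant burden score of the kind described above can be sketched as a count of predicted loss-of-function alleles in genes ranked as intolerant of inactivation. This is a minimal illustration, not a validated method: the gene names, constraint scores, and 0.9 cutoff are hypothetical placeholders (a real pipeline would take variant consequences from an annotation tool and constraint scores from a published resource), and it does nothing about the de novo detection problem noted above.

```python
from dataclasses import dataclass

# Sequence Ontology consequence terms treated here as predicted loss of function
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_donor_variant",
    "splice_acceptor_variant",
}

@dataclass
class Variant:
    gene: str
    consequence: str   # e.g. "stop_gained", "missense_variant"
    allele_count: int  # copies carried by the embryo (1 = heterozygous, 2 = homozygous)

def lof_burden(variants, gene_constraint, constraint_cutoff=0.9):
    """Count predicted loss-of-function alleles in genes intolerant of inactivation.

    gene_constraint maps gene -> score in [0, 1], higher meaning less tolerant;
    genes missing from the map are treated as tolerant (score 0).
    """
    return sum(
        v.allele_count
        for v in variants
        if v.consequence in LOF_CONSEQUENCES
        and gene_constraint.get(v.gene, 0.0) >= constraint_cutoff
    )

# Hypothetical genes, constraint scores, and variant calls for one embryo
example_constraint = {"GENE_A": 0.99, "GENE_B": 0.10, "GENE_C": 0.95}
example_variants = [
    Variant("GENE_A", "stop_gained", 1),
    Variant("GENE_B", "frameshift_variant", 1),  # tolerant gene: not counted
    Variant("GENE_C", "missense_variant", 1),    # not loss of function: not counted
    Variant("GENE_C", "splice_donor_variant", 2),
]
burden = lof_burden(example_variants, example_constraint)
```

Embryos could then be compared on this count, lower being better, alongside any polygenic scores.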
It is an exciting time in human genetics, and we must start learning causally from the data so that we can improve the health and well-being of future generations and society as a whole.
From an anonymous researcher & very early-stage (not public) start-up founder in this space
Some comments:
As you pointed out, epistasis is empirically irrelevant for the diseases and traits we care about. It is exponentially harder for evolution to select for a combination of variants together rather than a single variant.
Stratification can be tested after the fact: Hsu's height predictor was tested for stratification bias, and none was found. Most GWASs test for 10+ stratification dimensions (PCs) in addition to only using a homogeneous sample. A PGS would not be computed for a GWAS that didn't properly address population stratification.
The 2.5 points estimate is essentially a lower bound due to the assumptions made. Check out Gwern's estimate on this topic. It should also be noted that for all polygenic scores, a poor average benefit can still come with a very large outlier-detection benefit. An r^2 of 0.1 is sufficient for very good outlier detection (e.g., top 10% and bottom 10%). For example, even if your neuroticism predictor can only move neuroticism by 1/20th of a standard deviation on average with embryo selection, it can still be very good at detecting embryos that are extreme outliers in neuroticism.
It is commonly said that pleiotropy is not well understood; this is true if we're talking about the biological pathways resulting from specific alleles, but I personally would say that pleiotropy is very well understood. There are many hundreds of genetic correlations published in the literature, including those for important traits and diseases. If we were to select for one trait or disease, we generally know the pleiotropic effects it will have.
Issues with polygenic score transferability arise from differences in linkage disequilibrium if you're talking about transferability to other ethnicities or to CRISPR. You can safely ignore LD for cohorts/individuals with similar ancestry to the training population. LD is accounted for both on the GWAS level and the PGS calculation level.
Causal associations are very overrated, as our goal is prediction, not editing specific variants. You can tell that a car has wheels even if you see only the top half of the car. A variant which wasn't very correlated with the true causal variant wouldn't show up as significant in the GWAS. [editor’s note: I largely agree with this point for polygenic embryo selection, but getting causal variants would presumably help a lot with translating PRS to different groups, where LD decay is a problem. ]
Pleiotropy is not that big of a problem. Let's say there is a positive genetic correlation between bad trait A and good trait B. This occasionally occurs, though the correlation is very weak. You can easily find an embryo with low A and high B even though the two are statistically correlated; the correlation doesn't mean that B always increases A. You can select for both at the same time. It should be noted that pleiotropy usually causes good things to be correlated with other good things (see Okbay et al. 2022 for example), so pleiotropy is usually good and causes a synergistic effect.
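One way to check the outlier-detection claim above is to ask how enriched the top-scoring tail is for true trait outliers. The sketch below is a population-level Monte Carlo under simple assumptions (score and trait standard bivariate normal with correlation sqrt(r²)); among sibling embryos the score's spread is smaller, so within-family enrichment would be lower than what this returns.

```python
import numpy as np

def top_tail_enrichment(r2: float, top_fraction: float = 0.1,
                        n: int = 1_000_000, seed: int = 0) -> float:
    """How many times more often score-flagged individuals are true trait outliers.

    "Flagged" means a score in the top `top_fraction`; an "outlier" is a trait
    value in the top `top_fraction`. Returns P(outlier | flagged) / top_fraction,
    so 1.0 means no better than chance.
    """
    rng = np.random.default_rng(seed)
    score = rng.standard_normal(n)
    trait = np.sqrt(r2) * score + np.sqrt(1 - r2) * rng.standard_normal(n)
    flagged = score >= np.quantile(score, 1 - top_fraction)
    is_outlier = trait >= np.quantile(trait, 1 - top_fraction)
    return float(is_outlier[flagged].mean() / top_fraction)

for r2 in (0.0, 0.1):
    print(r2, round(top_tail_enrichment(r2), 2))
```

Lowering `top_fraction` (say to 0.01) shows how the enrichment behaves for more extreme outliers, which is the regime the comment has in mind.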
The uterus is the organ where the embryo implants and later grows. It is a hollow muscular organ that can enlarge substantially during pregnancy. From inside to outside, the uterus has three layers: endometrium, myometrium, and perimetrium.
Figure 24.
The endometrium is the site of implantation of the blastocyst. It is a very dynamic tissue that changes during the menstrual cycle depending on hormone levels, growing and then shedding. Abnormalities in the endometrium, such as Asherman syndrome, can cause infertility, as can large benign growths of the uterus. There is some variation in the shape of uteri, and uterine malformations likely play a role in some cases of infertility: uterine abnormalities occur in 7-10% of women; 25% of women with uterine abnormalities have poor pregnancy outcomes; and major anomalies are 3x more common in women with recurrent miscarriages. Though gestational surrogates are expensive, they do provide a workaround for uterine causes of infertility.
For women who are determined to carry a baby to term themselves but have uterine issues, there are surgeries that can fix some problems and as a last resort, uterine transplantation.
The cervix, the lowest part of the uterus that connects it to the vagina, is important for fertility because cervical abnormalities can threaten pregnancies. Weakness of the cervix can cause miscarriages or preterm births. Cervical cerclage can address some of these problems, as can exogenous progesterone administration and close monitoring of cervical length during pregnancy in individuals with a history of cervical insufficiency.
The endometrium must be decidualized (“endometrial fibroblasts transforming into specialized secretory decidual cells”) for a successful pregnancy, a process controlled by progesterone. In most animals, decidualization is controlled mostly by the fetus; in humans, there is more maternal control.
Most blastocysts do not implant: even in optimal conditions, the chance of successful implantation is about 40%. The current understanding of failed implantation attributes it partly to “uterine factors” and partly to fetal abnormalities. Researchers have defined “recurrent implantation failure” in a variety of ways, but the basic findings on predicting implantation failure (from here) are as follows:
Distinguishing between fetal and uterine causes of failed implantation is important, since it can guide treatment. Unfortunately, a review of treatments for women with repeated implantation failure noted the generally poor quality of evidence on many treatments, stating “we witnessed the emergence of a number of RIF treatment options of simple execution but characterized by weak rational bases....their introduction into current clinical practice occurred rapidly without waiting for adequate evidence of efficacy and safety”. Many of these treatments with uncertain evidence have already been introduced into clinical practice, a recurring problem with IVF treatment add-ons.
The best-controlled research on risk factors for failed implantation comes from data on donor egg implantation. Donor eggs are healthy[45] and, more importantly, unrelated to the age of the recipient. There is some indication that recipient age begins to reduce success rates in the late 30s, but also some evidence that recipient age does not reduce success rates. Regardless, the effect is small compared to the effect of age on donor egg quality and number. A related development is uterine transplantation, which has been successfully carried out in a number of different centers and countries, with at least 18 live births as a result. Surrogacy, though expensive, is also an option for women with uterine factor issues. From the CDC report on IVF, illustrating the relative stability of success with donor eggs as carrier/parent age increases, versus the clear decline in success with parental eggs:
Figure 25.
After a successful implantation, there is still the possibility of pregnancy loss. Recurrent pregnancy loss, defined as two or more pregnancy losses before 20 weeks, occurs in about 2-3% of couples. Risk factors for recurrent pregnancy loss are similar to those above, with higher female age as the most consistent risk factor. While many cases (perhaps up to 50%) of recurrent pregnancy loss remain unexplained after a diagnostic workup, previous unexplained pregnancy losses are a risk factor for future pregnancy losses, implying a stable underlying trait. There may be some maternal “rejection” of genetically abnormal embryos, but it is unclear if this really occurs in humans. From Speroff:
An intriguing finding from the COVID-19 pandemic was a drop in prematurity rates in Denmark during lockdown, without an increase in stillbirths. This finding was later replicated in high-income but not low-income countries. Some proportion of stillbirths and extremely premature births (which have severe health consequences for the baby[46]) are likely associated with infections, and the authors of the Denmark study viewed reduced maternal infections, as well as reduced exposure to air pollution, as possible causes of this drop.
One feared complication of pregnancy, preeclampsia, is also related to abnormal implantation:
“this invasion process is limited in pregnancies with preeclampsia, and this is the fundamental cause of the poor placental perfusion associated with preeclampsia and intrauterine growth retardation.”
One possible reason that preeclampsia is still poorly understood is that we lack a good animal model for human pregnancy. Only in great apes does the embryo completely invade the endometrium. Since the invasion process is part of what appears to go wrong in preeclampsia, and experimentation on great apes presents ethical/regulatory challenges, this may limit our understanding.
Identifying a logistically easier animal model, or an appropriate organoid model, might improve our understanding of implantation. Overall, I view improving implantation rates as an important target. However, relative to a goal like “improve our understanding of the genetics of complex traits”, which has straightforward levers (increasing GWAS sample sizes, improving phenotyping, increasing the use of exomes and whole genomes in large-scale genetic studies, and running more within-family studies), improving implantation rates is less straightforward.
As a general disclaimer for this section, here is a thread by Lyman Stone exploring how different definitions of maternal mortality in different countries can change results. For that reason I have avoided cross-country comparison. With that caveat out of the way, I will briefly address two specific questions, with the US as the focus:
In 2020 in the US, the rate of maternal death per 100,000 live births was 23.8, with race (non-Hispanic Black women have ~ 2-3x risk vs non-Hispanic white women) and age (older women have higher risk) predicting higher mortality rates. As some context, some people use micromorts (1/1,000,000 risk of death) to compare different risks of death to each other. With that metric, a maternal death rate of 23.8/100,000 is 238 micromorts, about half as dangerous as base jumping.
However, maternal mortality is not randomly distributed: some women are at predictably higher risk. The term for this in obstetrics-gynecology is “high-risk pregnancy”. I have not found risk calculators[47] (akin to the ASCVD risk calculator based on Framingham data) for estimating maternal risk, but certain conditions are known to increase risk, to varying degrees: autoimmune diseases, high blood pressure, obesity, higher maternal age, previous C-sections, hypercoagulability, and more. However, it seems very likely that the risk of maternal death for women free of most or all those conditions is substantially lower than the 23.8/100,000 estimated above.
A variety of risk prediction models for pregnancy-related severe illnesses have been developed, though some focus on predicting mortality of obstetric patients that are hospitalized and require laboratory values. For example, the CIPHER model uses: “10 predictors: maternal age, surgery in the preceding 24 hours, systolic blood pressure, Glasgow Coma Scale, serum bilirubin, activated partial thromboplastin time, serum creatinine, potassium, sodium and arterial blood gas pH”.
A CDC report examining all maternal deaths from 2011-2015 argued that about 60% of all pregnancy-related deaths were preventable. I did not investigate this question in-depth enough to guess if that estimate was reasonable or not.
The reduction in neonatal and infant mortality is responsible for a large increase in life expectancy over the 20th century. Interestingly, mortality rates for infants and children seem to have been consistently high[48] across a wide range of historical societies and hunter-gatherer groups. Thus, from a long-term perspective, the dramatic reduction in child mortality is a world-historical accomplishment. Per a 2005 WHO report, most of the (then high) mortality among young children (under 5 years) was driven by communicable diseases, which is roughly consistent with data from hunter-gatherer groups as well, with the caveat that determining the precise cause of infant deaths is still occasionally a challenge even with modern diagnostics. Congenital defects, violence against infants, accidents, and infanticide/abandonment were also important causes of historical infant mortality.
In high-income settings, preterm births make up a significant portion of infant mortality. Specifically, per the American College of Obstetricians and Gynecologists (ACOG), births before the 3rd trimester are about 0.5% of births but account for 40% of infant deaths. They also account for a significant amount of childhood morbidity through the long-term harms of preterm birth (e.g., cerebral palsy). There has been progressive improvement in the outcomes for preterm infants[49].
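Those two ACOG percentages imply a very large relative risk. Writing B for all births and D for all infant deaths, and assuming both percentages refer to the same population and period, the infant mortality rate among births before the 3rd trimester relative to all other births is:

```latex
\mathrm{RR}
  = \frac{0.40\,D / (0.005\,B)}{0.60\,D / (0.995\,B)}
  = \frac{0.40 \times 0.995}{0.60 \times 0.005}
  = \frac{0.398}{0.003}
  \approx 133
```

That is, under these assumptions, an infant born before the 3rd trimester faces roughly 130 times the mortality risk of an infant born later.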
I will very briefly outline trends in fetal viability, the physiological limits on fetal viability, and the possibility of exogenesis to improve fetal viability. As a caveat, the approximate fetal viability numbers described here are in high-income countries.
Defining fetal viability involves some degree of ambiguity and controversy. The TL;DR is that infants born as early as 22 weeks can achieve, in some highly specialized centers, survival rates of about 5%, though with very severe long-term complications and at very high hospital and long-term costs. At 26 weeks, survival rates reach about 80%, though with high rates of long-term complications. For these reasons, deciding to treat extremely preterm infants is controversial, and below a certain threshold, many centers offer palliative care only.
A 2020 review of extremely preterm infants’ prognosis provides some useful figures, reproduced below:
Figure 26.
Some important takeaways from this figure:
Extremely premature infant survival appears to still be improving, at least from the 1990s into the 2010s, as data from England from the 1990s to 2014 show.
Figure 27.
Figure 28.
It is unclear to me why the survival rates above, which reach about 35% for infants born at 22-23 weeks, are substantially higher than the 5% reported by ACOG (in a report last updated in 2021 but written in 2005). Differences in inclusion criteria and outcome measures seem like the most likely explanations, along with some improvement in prenatal care over the last decade. For this reason, I am only somewhat confident in the finding that viability appears to be improving for extremely preterm (less than 26 weeks) infants. I would caution that future in-depth research into this area should first try to obtain clarity on the following:
A recent article, as well as many people on Twitter, have raised the possibility of artificial wombs, mostly as a means of extending fetal viability and of reducing the burden of childbirth on women (in the hypothetical case that artificial wombs could completely replace natural pregnancy). The scientific paper that started the discussion is this one, in which fetal lambs were gestated in a liquid environment, with better results than previous efforts. Some of the same authors have published a recent review on the challenges of translating research on artificial wombs to humans, whose points I will briefly summarize:
The studies mentioned so far have focused on improving the survival of extremely preterm infants, not on extending the period of time from fertilization that embryos can be cultivated for. An additional challenge is that the process of transferring a fetus from the mother to an artificial womb might necessitate a C-section, which is more dangerous earlier in the pregnancy[50].
From an ethics paper by one of the authors of the lamb paper:
A maternal burden of AWT is that fetal extraction via C-section (as is currently described in all successful AWT models) entails a higher perioperative risk for maternal complications (such as bleeding, complicated extraction, higher risk of uterine rupture in future pregnancies) at earlier stages of pregnancy. Of note, C-section is currently often used as a method of delivery for extreme premature infants in distress, and AWT following vaginal delivery may become possible in the future.
There is at least one conservation biology startup, Colossal, that has floated the possibility of trying to develop artificial wombs for woolly mammoths, as part of a roadmap for de-extinction. Such efforts may advance human artificial womb research as well, though there are likely very substantial differences in biology that will make translation from animals to humans challenging. In addition, a person close to some of the team noted their research was very preliminary and doubted they would actually do this [work on artificial wombs].
Due to time constraints I did not investigate this topic in depth. Tentatively, my impression is that neonatal care for extremely premature infants is so expensive and far from optimal that even highly expensive artificial wombs could be justified on those grounds, though I have no strong sense of how likely efforts on that front are to succeed.
Doing the reverse, and attempting to culture human embryos in vitro for as long as possible, is another approach. From a regulatory perspective, this is more difficult due to the 14-day rule (since relaxed in the ISSCR guidelines), under which many countries prohibit, formally or informally, culturing human embryos beyond 14 days. This rule was developed in the late 1970s and early 1980s, perhaps in reaction to progress in in-vitro culture of embryos (IVF was first performed successfully in 1978).
With the caveat that I have little expertise in this field, and did not investigate this in-depth, my impression is that this 14-day rule was not an important barrier to research until somewhat recently, when embryo culture methods improved. Before then, culturing embryos much beyond the time they would normally implant (around 5-7 days after fertilization) was not successful.
There have been recent advances in embryo culture methods which likely prompted the International Society for Stem Cell Research to relax their guidelines, obviating the 14-day rule. The new guidelines state:
Should broad public support be achieved within a jurisdiction, and if local policies and regulations permit, a specialized scientific and ethical oversight process could weigh whether the scientific objectives necessitate and justify the time in culture beyond 14 days, ensuring that only a minimal number of embryos are used to achieve the research objectives.
Instead, they now have a tiered system of research regulation, in which research involving cultivation beyond 14 days requires review by a “specialized oversight process”, but is not blanket banned. The categories of research oversight are shown below:
I am unsure what the current state-of-the-art is capable of achieving or how quickly it will likely advance. Some notable examples:
I have marked project ideas that I find especially promising with either ⭐ or ⭐⭐. I have also marked projects that I think (about 60% confident) I could find a champion/executor for with 🚀, and close to “funder-ready” with 🚀🚀.
Overall, I have tried to prioritize projects by my guess at how positively impactful they seemed likely to be, though without any pretension at a formal calculation of expected value. My reasoning of which projects were especially promising often hinged (not exclusively) on the following considerations:
While I am skeptical that environmental pollutants have a large impact on infertility, there are ancillary benefits of better pollution management that may make it a smart idea overall. With that in mind, some infertility add-ons to a pollution-focused project may be wise. I would defer to Daniel Goodwin’s ideas on this.
Steve Hsu proposed a project focused on coordinating many IVF centers to try different tweaks to the IVF protocol. Jack raised a similar idea, focusing on embryo culture media, which vary between centers and have not been rigorously evaluated. Specifically, he proposed cluster RCTs, randomizing different centers to receive different culture media, which reduces the administrative burden of running trials for clinicians.
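The defining feature of the cluster design is that randomization happens at the level of whole centers rather than individual patients. A minimal sketch of that assignment step follows, with hypothetical center and medium names; it deliberately leaves out stratifying by center size or case mix, and the cluster-aware analysis (e.g. mixed-effects models) that a real trial would need.

```python
import random

def cluster_randomize(centers, media, seed=0):
    """Randomly assign whole IVF centers (the clusters) to culture media.

    Shuffling and then dealing centers out round-robin keeps arm sizes
    within one center of each other.
    """
    rng = random.Random(seed)
    shuffled = list(centers)
    rng.shuffle(shuffled)
    return {center: media[i % len(media)] for i, center in enumerate(shuffled)}

# Hypothetical centers and media
assignment = cluster_randomize([f"center_{i}" for i in range(7)], ["medium_X", "medium_Y"])
```

Every patient treated at a center then receives that center's assigned medium, which is what spares clinicians from per-patient randomization.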
DM me on Twitter or email me at wjchertman@gmail.com.
Thanks to Isabel Juniewicz, Lyman Stone, Mackenzie Dion, Steve Hsu, Jack Wilkinson, Daniel Goodwin, Simon Dadoun, Haiqi Chen, Robert Gilchrist, Paula Amata, Matt Krisiloff, Marco Demario, Max Berry, Jeff Hsu, Dean Spears, Aria Babu, Alexander Young, Reza Nosrati, Ruxandra Tesloianu, Noor Siddiqui and Merrick Pierson Smela for being generous with comments/feedback/interviews. To the pseudonymous contributors: thank you too!
Even bigger thanks to Milan Cvitkovic (awesome blog) for lots of feedback, continued encouragement, and prodding me to do this in the first place!
And thanks to Gwern for writing an Embryo Selection FAQ all the way back in 2016.
Summary: We spoke about the possible impact of pollutants on fertility. He pointed me to two companies in this space: Maximus and Millionmarker. Daniel is working on a whitepaper focused on small-molecule pollution. He relayed a conversation with a senior former FDA official, who said that getting a drug approved to remove toxins that the EPA has approved would be a tough sell at the FDA. He points to changes in testosterone levels and sperm counts as reasons to think pollution may be causing some infertility. He also thinks some large-scale changes in behavior might have some relationship to pollutants. I am convinced that pollution is underexplored, and impressed by his ideas on how to better understand and address it, but not convinced that pollution is causing a large increase in infertility per se. I think his pollution -> behavior -> reduced fertility idea is more plausible. My skepticism that pollution has large effects on fertility in high-income countries, other than through behavioral change, comes from the following reasoning:
Summary: The consensus of reviews in the literature is that pro-natalist policies work, but the magnitude of the effect isn’t large: e.g., a change from 1.4 to 2.2 is unlikely, while 1.6 to 1.8 may be feasible with expensive policies; policies yield roughly a 0.05-0.2 boost in TFR, at $100k-400k per additional US birth, much cheaper than the US value of a statistical life. Baby bonuses are at the edge of the Overton window but are the most cost-effective, and front-loading the money helps. Something he thinks is underrated is better messaging that names programs in a more explicitly pronatalist way, e.g., calling something a “baby bonus” directly. Also, regarding the Overton window: nobody has ever tried paying women to have children at rates so high that it becomes more profitable than having a job.
On ART and TFR: ART in total accounts for ~6-7% of births in high-income countries. In the US it’s about 4% (IVF + ART drugs). So it can’t be a huge effect, but it could be a moderately sized one (e.g., 4% of 1.7 is 0.068, which is comparable to lower estimates of the effect of pronatalist policies). However, if reproductive technology just extends reproductive lifespan, it may simply push fertility later in life, so the net effect is unclear to him. We’re also not close to the limits of natural fertility, so it’s not the rate-limiting step.
What’s a good lever to push on? There’s good data (Henrik Kleven’s work) arguing that the motherhood wage penalty comes from childcare, not pregnancy, based on comparing adoptive vs biological mothers, so childcare may be a better lever. Parenting norms are less intense in higher-fertility areas like Utah. He has unpublished data (which he blogged about here) showing that a change in Georgian Orthodox Church rhetoric/policy that raised the prestige of parenting caused a big jump in fertility (1.5 to 2.2) without any government expenditure or policy change.
On expected vs realized fertility and fertility preferences: There’s a robust gap between realized and preferred fertility. Surplus labor, like older relatives and siblings, can reduce the cost of childcare. Preferred fertility has predictive power even if people consistently undershoot it: the stated preferred number of children at 18 predicts completed fertility at 40. Higher-fertility groups have higher preferences. The cost of childcare and its opportunity cost have increased because, as income rises, the scope of leisure opportunities has grown. Different attitudes towards career satisfaction and family preferences do predict fertility somewhat.
Wider social attitudes and fertility: Changing the social script on childbearing seems important, and his research is focused on this. So many TV shows have essentially no children; nobody has any kids (e.g., The Expanse). [editor’s note: there’s some econometric evidence that exposure to Brazilian telenovelas reduced fertility, particularly among lower-SES women.]
When/why did the demographic transition happen: The best evidence is that the first fertility transitions (France, Massachusetts) came with secularization; the fertility transition happened before infant mortality rates fell; France had its fertility transition a century before Germany. Paper: birth control censorship in the UK apparently had a big effect on fertility rates. Culture is really the spark.
Exporting smaller family sizes: “Developmental idealism” (Arland Thornton) is the culprit for fertility transitions today: development experts and mass media basically propagandize for smaller family sizes, individuals associate development with small families, and countries tackle the correlates of development instead of the core drivers, like good institutions. The development industry arose out of 1940s-60s institutions and politics, which he thinks neutralized its ability to say that X institution is good, rather than just that Y correlate of development/growth is good; the World Bank and IMF are sometimes prohibited from saying “in order to get economic development you need to expand the voting franchise, have competitive elections, and have a good property rights regime”. This is tied up with anti-colonialism. Good author on this. Book recs on the British colonial legacy: James Ferguson, The Anti-Politics Machine; Matthew Lange, Lineages of Despotism and Development.
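The ART arithmetic above can be sketched as a back-of-envelope calculation. This is a minimal sketch under a stated assumption: it treats the whole ART share of births as ART's contribution to TFR, which is only an upper bound, since some of those births would have happened anyway through natural conception. The 0.05-0.2 range for pronatalist policies is the one quoted in the previous summary, and the function name is hypothetical.

```python
def art_tfr_upper_bound(tfr: float, art_share_of_births: float) -> float:
    """Upper bound on the TFR points attributable to ART.

    Assumes (hypothetically) that no ART-conceived birth would have
    occurred without ART, so the true contribution is likely smaller.
    """
    if not 0.0 <= art_share_of_births <= 1.0:
        raise ValueError("art_share_of_births must be a fraction in [0, 1]")
    return tfr * art_share_of_births


# Figures from the notes: US TFR ~1.7, ART ~4% of US births.
us_bound = art_tfr_upper_bound(1.7, 0.04)

# Range of pronatalist-policy effects quoted in the notes (TFR points).
POLICY_LOW, POLICY_HIGH = 0.05, 0.2

print(f"ART upper bound: {us_bound:.3f} TFR points")
print(f"Within the pronatalist-policy range: {POLICY_LOW <= us_bound <= POLICY_HIGH}")
```

The same function applied to the 6-7% figure for high-income countries would need each country's own TFR, which the notes don't give.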
Summary: Dean and colleagues at UT Austin are starting a group, and one of their core ideas is that increasing fertility rates, at current TFR levels, is good even for average utilitarians. This is because of positive returns to scale that seem to hold in the modern economy.
A specific project along these lines is combining the Nordhaus model of climate change with Romer’s model of endogenous growth and realistic TFR projections, to show that because of population momentum, even with a rapid TFR rebound to replacement or above, the critical question of “will climate change be a big deal?” will be baked in / dealt with (or not) by the time higher TFR increases population size. Basically, his group is arguing that even with climate change in mind, increasing TFR on the margin is a good idea. Their preliminary results are that paying up to 1 million dollars for an extra child today has positive returns.
Along with this goal, he also thinks the Second Demographic Transition, in which TFR drops below replacement, is best explained by changes in preferences and an increase in the opportunity cost of children, as opposed to constraints (à la Becker’s quantity-quality tradeoff). In other words, having a child carries a larger opportunity cost in a world with lots of entertainment that a child competes with. The case for this is that we clearly live in a much richer world than 50-100 years ago, yet fertility rates are much lower; what is the material constraint? In addition, there are places with female labor force participation of ~25% where desired family size is still around 2, so the constraint of female time can’t be the explanation there.
Some project ideas he had: a pilot project trialing very large baby bonuses (not a few hundred dollars, but something like $50k for a few years), ideally with a few different incentive sizes to get a sense of the demand curve; and a project focused on improving childcare technology, like making equivalents of the SNOO smart bassinet.
He is very skeptical that the extant range of pro-natalist policies will change things on the scale required to move TFR to 2. Sweden has generous parental leave and other family policies, but its TFR is only 1.76. He also flagged that the empirical demography field may have something of a file-drawer problem, such that positive effects are reported while negative results are not, and advised some skepticism of smaller studies. Getting TFR to replacement would require, in his view, policies that are far outside the range of current pro-natalist policies, something he seemed to agree with Lyman about.
He also emphasized that surveys on intended fertility, at least outside of East Asia, show desired family size is higher than is achieved, so we would be helping women achieve desired fertility, not burdening them with children they don’t want.
Somewhat contra Kaufmann, he thinks that religious groups retaining high fertility rates will likely not be enough to stave off below-replacement TFR. He argues that given retention rates of 50%, groups would need fertility rates above 4 to continue growing, which is quite high. Another important point he made was that even if we are somewhat convinced by Kaufmann’s argument, we should hedge our bets against the possibility of religious groups’ TFR declining by pushing for higher TFR for everybody. A working paper his group wrote finds that, under certain assumptions, heritability of fertility and the existence of high-fertility subgroups still don’t fix declining populations unless fertility is quite high relative to defection rates. This is similar to the conclusion that Isabel Juniewicz comes to in a recent blog post.
They approach fertility as a market failure/externality problem: people’s choices are shaped by preferences/incentives; it is difficult to make policy that can really change private decisions like that; fertility is a quantitatively important instance of market failure. They’re starting a new group that thinks the marginal social effect of extra people goes in the other (positive) direction, because a larger economy is better for everybody (e.g., returns to scale); a lot of this is driven by higher rates of economic growth. Their first project takes off-the-shelf components from the 2018 Nobel Prize (Nordhaus + Romer), puts them together, and sees whether climate externalities or the effects of innovation are more important; the most important story is the positive externality of more economic growth. Population momentum is part of this story: basically, the climate change story will have played out by the time fertility differences make a big difference. Their best guess is that one extra person today will be worth ~$1 million in making the world richer/better. The good news is that many women in low-fertility populations say they want more children than they’re having. The goal of their group is to fill in the details; basically, per-capita living standards still improve, which gets around the utility aggregation problem.
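The retention argument can be made concrete with a toy generational model. This is a minimal sketch under our own simplifying assumptions, not the working paper's model: no mortality before childbearing, no converts joining, half of each child cohort female (so a TFR of 2.0 is exact replacement), and a fixed share of children who remain in the group. Under those assumptions a group grows only if TFR × retention exceeds replacement, which at 50% retention means a TFR above 4. The function names are hypothetical.

```python
def generational_growth_factor(tfr: float, retention: float,
                               replacement_tfr: float = 2.0) -> float:
    """Next-generation group size divided by current group size.

    Toy assumptions: no mortality before childbearing, no converts,
    and a fixed fraction `retention` of children stays in the group.
    """
    return tfr * retention / replacement_tfr


def min_tfr_for_growth(retention: float, replacement_tfr: float = 2.0) -> float:
    """TFR at which the group exactly replaces itself each generation."""
    if not 0.0 < retention <= 1.0:
        raise ValueError("retention must be in (0, 1]")
    return replacement_tfr / retention


for r in (0.5, 0.7, 0.9):
    print(f"retention {r:.0%}: need TFR > {min_tfr_for_growth(r):.2f}")
```

Passing a replacement level of 2.1, to account for mortality and the sex ratio at birth, raises the 50%-retention threshold to 4.2, consistent with "above 4".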
Notes from the papers he recommended
Note: RM is a pseudonym
Summary: endorses more within-family studies as a way to get better within-family prediction (which is what embryo selection is); thinks that personality and facial attractiveness are understudied, and that linking genetic data with social media could fix that (lots of pictures of faces, lots of social media output that indicates personality), but is pessimistic this will happen in the near/medium term; thinks that rare variants are generally underrated, since there’s plenty of evidence that rare variants can have big effects on polygenic traits (e.g., an FBN1 variant in Peruvians that causes a substantial decrease in height, and data showing that rare variants that disrupt protein function have large, usually negative, effects on phenotype).
A challenge with understanding spermatogenesis is that it takes place in a spatially ordered manner, with different sections of the seminiferous tubule corresponding to different stages of development. I spoke to Prof Haiqi Chen, who focuses on spermatogenesis and applied Slide-seq, a transcriptomics method that preserves spatial information, to mouse and human testes. While learning more about spermatogenesis will surely improve our ability to treat male infertility, the heterogeneity of hard-to-treat male infertility makes me somewhat pessimistic that this is an especially scalable solution. However, if this method enables IVG for sperm, in a mostly disease-agnostic way, that would sidestep the heterogeneity issue.
One of his papers:
Dissecting Mammalian Spermatogenesis through spatial transcriptomics
Summary: thinks that we need a comprehensive understanding of how spermatogenesis happens before we can attempt to copy it, and that ruling out subtle problems like imprinting errors would require lots of testing. In an optimistic scenario, he guesses 10 years for in-lab human spermatogenesis and 20 years to do thorough testing. His general feeling about in-vitro fertilization is that it’s somewhat crude and its harms are understudied. We don’t understand most unexplained infertility, so in most cases of IVF that don’t involve aging, we’re not actually fixing the core issue. He thinks we need a precision medicine approach to male infertility, with different treatments for different disorders of male infertility. He thinks there are some subtle long-term issues with IVF we don’t fully understand (e.g., imprinting issues).
Summary: Merrick was optimistic that IVG could be in clinical trials in 5-10 years and that human oocyte-like cells could be achieved in vitro in 2-3 years. He is optimistic because multiple groups are working on this and it has been successfully achieved in multiple mammals. He thinks specific details of protocols (that work in animals) will have to be modified substantially. When I told him about Haiqi Chen’s work on single-cell transcriptomics that preserves spatial information, he was unsure how useful it was for IVG in oocytes, but thought it might be needed for sperm development.
He agrees that improving IVM knowledge would help with IVG research. He attributes the delay in human IVG and IVM success to human fetal tissue being much harder to obtain and mouse development timelines being much shorter. He is optimistic regarding somatic cell nuclear transfer, since the pig xenotransplants were made with that method; however, he noted this was quite expensive. He agrees that IVF research and labs are inconsistent, and is somewhat optimistic that uterine preparation could be improved.
He is less optimistic re: embryo editing and thinks there are many challenges with it. In animals, the F2 generation is usually what is used in research to get around the editing efficiency issue, which is problematic in humans; something that might work better is editing a stem cell line to make edited gametes. He doesn’t know a lot about sperm selection, but he thought that non-destructive sequencing would be quite tough, and would require something like immobilizing gametes during meiosis, sequencing some of the sibling(?) cells of a resultant gamete, and then inferring the gamete’s genotype. If there really is a reliable correlation between an easily measurable sperm phenotype (e.g., motility) and some other genetic or phenotypic trait, then that would be ideal. He recommends looking into the animal breeding literature, since he thinks they have probably explored that question in more detail.
In the scenario in which complete IVG in humans is very difficult or impossible, being able to induce meiosis in addition to somatic cell nuclear transfer would still be highly impactful, since that would permit some degree of iterated embryo selection, though it would still require oocyte acceptor cells. Oocyte acceptor cells (also known as “ovarian supporting cells”) have been generated from pluripotent stem cells in mice, and if this could be achieved in humans, it could reduce the bottleneck of requiring natural human oocytes, which are expensive and scarce.
Some project ideas he was excited about:
Summary: Matt is the CEO of Conception, which is likely the most advanced IVG company. He was optimistic that there could be IVG clinical trials in humans in 5-10 years, and recently tweeted that a human egg could be generated in labs by 2023. Our conversation generally steered clear of detailed scientific discussion. He viewed regulatory caution on the part of the FDA as a barrier. He cited the example of mitochondrial replacement therapies (MRT), which are technically under FDA jurisdiction but cannot legally proceed because Congress has prohibited the FDA from accepting clinical applications that involve making heritable genetic modifications to human embryos. Since laws relating to embryos vary across jurisdictions (e.g., the UK and Australia both allow MRT), he is optimistic that if the US initially prohibits it, other jurisdictions would allow it.
He is optimistic re: reproductive aging approaches to fertility but thinks the gains would be incremental relative to IVG. He thinks endometrial/uterine preparedness is understudied and could improve IVF outcomes, since a lot of IVF fails because of it. He agrees ART uptake right now is low, but thinks uptake could improve substantially if more convenient, cheaper, and better ART technology were available. He thinks IVM could benefit from better surgical tools. He is generally interested in artificial wombs, but unsure how realistic they are.
Summary: CEO of Ivynatal, overall very optimistic re: achieving in-lab (not necessarily in clinical use) in-vitro gametogenesis in humans in 5-10 years. Some reasons for optimism: there are multiple labs working on this, at least three different startups, and substantial commercial interest from the agriculture and animal breeding and (a more minor contribution) de-extinction world. There are multiple approaches to IVG: somatic cell nuclear transfer and reprogramming. Reprogramming can be done through specific factors added to culture medium or through genetic manipulation. There have been substantial advances recently in finer control of methylation/demethylation.
Jeff Hsu identified the following problems as the most central to clinical use of IVG in humans:
Other topics
Summary:
I spoke to Dr. Simon Dadoun, who was excited about artificial wombs as a supplement for current neonatal care for extremely preterm infants but pessimistic about artificial wombs as a total substitute for natural pregnancy any time soon. Some key points we discussed:
Summary: I spoke with Steve Hsu (co-founder of Genomic Prediction) about IVF usage rates, some of GP’s technology (particularly their aneuploidy screening), and IVF optimization. He agrees with me that large-effect-size changes in the IVF protocol have probably been found already, but thinks there is substantial room for finding more small and medium-size effects. His ideas are similar to Jack Wilkinson’s: the sample sizes used in IVF studies are too small to reliably detect the likely effect sizes of interventions, so many purported effects probably don’t replicate. He recommends a project centered on coordinating many IVF centers to try different tweaks to the protocol. He thinks that IVF rates close to Denmark’s (approximately 2x the current US rate) are a good proxy for future US rates; US IVF experts also generally predict a lot of growth. He also thinks that if IVF success rates improved substantially, that would change parental calculus re: IVF. He is somewhat optimistic re: more speculative ART technology, like in-vitro gametogenesis, but estimates that even under highly optimistic timelines, clinical adoption of such technology would take a decade or more due to regulatory concerns.
One concrete improvement he proposed is using better techniques for aneuploidy screening. Per Hsu, current aneuploidy screening from embryo biopsies is woefully inadequate and has a high technical failure rate. Technical failures arise from inconclusive test results, but are called (and reported back to patients) as “aneuploid” to avoid the possibility of implanting aneuploid embryos. This results in a high rate of embryos being called aneuploid, and thus unnecessary waste of embryos. Better aneuploidy testing, such as GP’s technology, would thereby improve IVF cycle success rates compared to current methods of aneuploidy screening. GP has not tried generating embryo scores for implantation success; that might work, but nobody has tried it yet. His GP co-founder Nathan thinks that “mosaicism” is partially a result of lab error / imperfect assays, which seems like another case where better testing would improve outcomes.
He thinks that Gulf states might be a good place for creative use of ART/screening, because cousin marriage is common there and people are highly aware of the possible issues it causes.
[1] For instance, prominent behavioral geneticist James Lee has argued strongly against polygenic selection for most traits on the grounds that it would eventually fundamentally change “an aspect of our nature”.
[2] Of course, within countries, after the demographic transition has occurred, higher religiosity predicts higher fertility.
[3] Empty Planet: The Shock of Global Population Decline by John Ibbitson and Darrell Bricker
[4] Demographic Engineering: Population Strategies in Ethnic Conflict by Paul Morland
[5] Speroff’s Clinical Gynecologic Endocrinology and Infertility 9th Edition
[6] The better counterfactual is comparing the total number of children born with a shorter delay to ART versus a longer delay, which will surely be higher in the first case, but the difference attributable to ART will be smaller than [total number of children w/ shorter delay]-[total number of children w/ longer delay], since some of the children in the former would have occurred anyways with natural reproduction.
[7] This also doesn’t include extramarital/extra-couple relationships/affairs.
[8] My belief, derived from speaking/working with researchers in the genetics of male infertility, is that whole-genome sequencing for men with idiopathic azoospermia will eventually increase the diagnostic yield considerably, perhaps in addition to polygenic risk scores for male infertility. This will necessitate large consortia of infertile men who undergo whole-genome sequencing after the known genetic causes of male infertility, such as Y-chromosome microdeletions, aneuploidy, and other known mutations, are ruled out.
[9] It is likely that some andrologists would quote higher numbers for patients offered the TESE procedure, likely because of differences in patient selection or because many studies only report sperm retrieval rates (sperm retrieval is necessary but not sufficient for a live delivery).
[10] Some male fertility procedures are very invasive, e.g., micro-TESE (microsurgical testicular sperm extraction), in which part of the testis is biopsied and sperm are retrieved with microscopy.
[11] French TFR likely started declining around 1790, so couples from 1670-1789 can reasonably be assumed to approximate a “natural fertility” population.
[12] Having a lower than average age at menarche is associated with somewhat earlier age at menopause, but there does not appear to have been a cohort-level effect– women are not undergoing menopause any earlier now, even though on average they are undergoing menarche earlier.
[13] An important mechanism by which STIs can cause infertility is pelvic inflammatory disease (PID).
[14] This raises interesting population ethics questions, which, incidentally, were recently brought up in this SBF conversation with Tyler Cowen, implying SBF may be amenable to valuing pro-natalist interventions over life-extending interventions if the former is more cost-effective.
[15] A similar paper (H/T Lyman’s twitter feed...) showed a reduction in abortions in Italy but no rise in births.
[16] An uncharitable way to summarize this is that developmental idealism is a kind of “cargo-cult” development ideology, where epiphenomena of economic growth are taken to be causal factors in improving economic growth.
[17] Panhypopituitarism is a condition where the pituitary gland is damaged and reduces (or stops entirely) its production of hormones. Life expectancy may be somewhat reduced but is still close to normal with treatment.
[18] previously known as “intersex” or “hermaphrodites”, now a defunct term.
[19] There are important differences in mouse and human gametogenesis, but this summary applies relatively well to both.
[20] Speroff cites this study as evidence of this claim, which finds that a combination of male and female factors accounts for 39% of infertility in couples, female infertility alone for 33%, and male infertility alone for about 20%. Another way to support this claim is the following: 1) I argue elsewhere that delays in the age at which couples begin trying to conceive account for the majority of the decline in fertility; 2) female fertility declines much more with age than male fertility.
[21] One source estimates the cost of a cycle at between $15,000 and $30,000; another source says ~$500/year for egg storage. Some lower-cost clinics charge around $4,000 to $5,000.
[22] Best thought of as the extreme lower part of the bell curve of normal female reproductive aging, with the caveat that various insults (genetic, chemotherapy, radiation, etc.) effectively shift women to the left.
[23] A needle is used to aspirate oocytes from the follicles in the ovaries under ultrasound visualization, as shown here.
[24] Even if a single embryo is transferred, monozygotic twinning can still occur after transfer.
[25] A different metric than live birth rate per cycle, as the preceding paper used.
[26] In the fertility space, “add-ons” is often used to refer to treatments that can increase the chance of having a baby, so these would not be “add-ons” in that traditional sense.
[27] As I cover more in-depth in the Infertility by the Numbers section, this is using a loose definition of “difficulty with infertility”, and the proportion of men and women who are sterile if they begin trying to conceive in their early 20’s is closer to 2% than 10%.
[28] Jack and colleagues have a paper asking UK clinicians and embryologists their reaction to the traffic light system.
[29] This may already be standard practice– I have not looked deeply into insurance coverage for ART by state.
[30] Women with high antral follicle counts, who are good candidates for IVM
[31] Cancer is a common reason for ovarian transplantation.
[33] Which are high-quality, and represent the upper bound of IVF performance
[34] Which would overestimate the live birth rate
[35] See here for an accessible informal introduction to somatic mutations and evolution.
[37] However, I have not looked deeply into the level of testing scientists have subjected IVG derived organisms to.
[38] Per speaking with someone with an interest in this field– I have not verified this myself.
[39] As a result of the imperfect editing technology currently available, likely only a fraction of embryos would be successfully edited.
[40] Based on conversations with employees from two different genetic testing companies, who found their methods had fewer false aneuploidy calls and lower rates of technical failure than the conventional ploidy testing.
[41] Turley et al also bring up other problems, such as pleiotropy.
[42] Here is a lecture explaining confounding in GWAS
[43] This should theoretically only be an issue in embryos of admixed parents.
[44] The reasoning for this is as follows: while the proportion of infants with a pathogenic de novo mutation might be as high as ~1/300 (and the proportion with harmful but not quite pathogenic mutations is likely much higher, depending on where the threshold for “pathogenic” is set), the probability of a specific locus being mutated is much lower. Since we are concerned with calling a specific de novo mutation, and these are very rare at any given location (though relatively common across the whole genome), we would need very accurate sequencing to identify de novo mutations reliably. Playing around with this calculator using reasonable values for sequencing accuracy (e.g., 98%, 99%, 99.9%, 99.99%) and for the probability of a specific location having a mutation (perhaps 10^-8), you need highly accurate sequencing to be confident in DNM calling.
[45] More precisely, unlike patients who generally use ART, they are unselected for fertility problems, and undergo an evaluation to make sure they are good candidates for egg donation.
[46] Among children under 5 years old, prematurity is the leading cause of death.
[47] A repository of risk calculators in medicine is here– click [specialty] –> [OB-gyn] to see all those available for obstetrics-gynecology.
[48] Close to 25% for infant mortality, and another 25% for child mortality, per Volk and Atkinson.
[49] That is, conditioning on a given gestational age, outcomes have improved. The lower limit on viability has decreased at the same time, which is likely resulting in more infants of very low gestational ages being born with substantial morbidity.
[50] Per an OB-GYN, this is likely because the uterus is thicker the earlier the gestational age.
[51] Jack and colleagues have a paper asking UK clinicians and embryologists their reaction to the traffic light system.
[52] This may already be standard practice– I have not looked deeply into insurance coverage for ART by state.
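Footnote [44]'s argument is a base-rate calculation, and can be sketched with Bayes' rule. This is a minimal sketch, not the linked calculator's method: it assumes (hypothetically) that a single per-site "accuracy" figure serves as both the sensitivity and the specificity of a de novo mutation call, and it uses the footnote's prior of ~10^-8 per site. The function name is hypothetical.

```python
def dnm_call_ppv(accuracy: float, prior: float) -> float:
    """Probability that a de novo mutation call at one site is real.

    Simplifying assumption: `accuracy` is used as both sensitivity
    (P(call | mutation)) and specificity (P(no call | no mutation)).
    """
    true_pos = accuracy * prior
    false_pos = (1.0 - accuracy) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


PRIOR = 1e-8  # per-site probability of a de novo mutation (footnote's guess)

for acc in (0.98, 0.99, 0.999, 0.9999):
    print(f"accuracy {acc:.2%}: P(call is real) = {dnm_call_ppv(acc, PRIOR):.1e}")
```

Even at 99.99% accuracy, false calls at the many non-mutated sites swamp the rare true mutation, so almost every call is a false positive; the per-site error rate has to fall to roughly the 10^-8 prior before most calls can be trusted.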