Anything attributed to someone in an interview is my best paraphrase of what they said; any error or awkward phrasing is mine alone.
The people who were interviewed or who provided comments looked only at their specific section, so their presence does not constitute endorsement of the rest of the whitepaper.
I have sketched out some reasons I find fertility and advances in assisted reproductive technology exciting below. Of course, any detailed ethics arguments against or in favor of all or specific ART would themselves constitute an immensely long work– so I ignored them entirely in this whitepaper. There is much less informed and detailed futurology on the likely consequences of various reproductive technologies.

A good starting point in that genre is The End of Sex by Hank Greely, which is fairly balanced and focuses on how society might respond to and regulate dramatic advances in fertility tech in the medium term (say, 10-20 years); here is a tweet thread summarizing it. More long-term advances, such as achieving iterated embryo selection (or large-scale embryo editing / genome synthesis) with highly accurate genetic predictors, have received fewer scholarly treatments. An exception is Shulman and Bostrom on embryo selection, which touches on some possible consequences of it, though they largely ignore more amorphous objections to radical reproductive tech^[1]. Of course, there is plenty of journalism on the topic, probably best captured by searching for relevant terms in Antonio Regalado’s twitter feed, eg, Gattaca.

Motivation

Trying to articulate why fertility is important feels a bit like arguing that “suffering is bad”, or something similarly self-evidential, but the following stylized facts may be convincing to some:

People in general wish they could have more children than they end up having. Even more succinctly: achieved fertility < intended fertility. Nowadays, achieved fertility tends to be consistently lower than desired fertility, as Lyman Stone has documented on a global scale.
Many sensible moral theories agree that, on the margin, more fertility in today’s world would probably be good.

Dean Spears points out that person-affecting utilitarians may still want more people at current margins, because there are positive returns to more people in an economy that grows through innovation. We are far from a Repugnant Conclusion, where additional people have lives that are progressively less “worth it” (though still net-positive).
Total utilitarians prefer more people, as long as their lives are net positive.
Religious pronatalism is generally pro having more children.

However, many assisted reproductive technologies conflict with some (not all) religious teachings. Principally, any ART that increases the production and especially the destruction of human embryos is very problematic for Catholics and some other Christian groups. Sperm donation is also problematic for some groups, and some forms of collecting sperm (even if from the woman’s husband) are also verboten, depending on circumstance. However, there are some more futuristic fertility technologies which may not be as objectionable: for example, editing sperm precursor cells, which are then reimplanted and later produce sperm carrying specific mutations. I have not explored these questions much, as this whitepaper is focused on technology/applications much more so than morality/ethics.

Separately from pronatalist reasons, fertility technology is especially promising as a general-use health-improving technology. The reasoning is as follows:

Many diseases, not just “genetic diseases” (the popular name for monogenic diseases), are partially (often substantially) heritable.
We will eventually unravel the genetic architecture of most heritable diseases, and reliably identify most of the genetic variants that confer increased or decreased risk.
Disease risk will mostly either be somewhat positively correlated, or uncorrelated. The result will be that instead of disease tradeoffs being ubiquitous, and current humans being approximately “as good as it gets”, humans that are healthier on many dimensions will be possible. We will eventually be able to safely reduce genetic risk for disease in our offspring through some or all of the following:

Selection on embryos
Selection on gametes
Direct embryo editing

This will result in offspring that have substantially lower genetic risk for most diseases, improving health in a durable fashion. Some diseases already have effective prevention and/or treatment. However, for diseases that have no effective treatment or prevention, and dim prospects for short-term success (eg, Alzheimer’s disease, or many other neurodegenerative conditions, such as Huntington’s disease), this approach, which is largely agnostic to the underlying (extraordinarily complicated) molecular biology, may be our only short/medium-term hope.

Glossary

TFR: total fertility rate, the average number of children a woman would have if she experienced the age-specific fertility rates of a given point in time through the end of her child-bearing years.
Fecundability: the probability of conception (ie, pregnancy) in a given month.
Live birth rate: the number of embryo transfers, out of 100, that result in the successful delivery of a live baby.
Gametes: cells that contain (usually) half of an organism’s chromosome number, which fuse with other gametes in reproduction. There is substantial diversity even within animals on the details of this process, but at a high-level, in mammals, gametes contain half of an organism’s chromosome number, and all other cells contain the full (“diploid”) number.
Gametogenesis: the process of generating gametes.
Polygenic: usually used to refer to a “polygenic trait”, a trait that is controlled by 2 or more genes– often many thousands more.
Monogenic: usually used to refer to a monogenic disease, a disease that is caused by a mutation in a single gene.
Heritable: colloquially, something passed down from parents to children. In genetics, “heritability” has a few definitions, all aiming at quantifying the proportion of phenotypic variation in a given trait, within a population, that is due to genetic variation. Two important types are broad-sense heritability and narrow-sense heritability.
IVF: in-vitro fertilization: a reproductive technology wherein gametes are combined outside of the body, usually involving oocyte (“egg”) retrieval, and subsequent embryo implantation.
IVG: in-vitro gametogenesis: a technology to generate gametes outside the body.
IVM: in-vitro maturation: used variously to refer to a (currently) hypothetical technology that would mature immature germ cells into fertilization-competent gametes and a particular kind of IVF that uses a lower dose of hormones (“mini IVF”).
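
To make the TFR definition above concrete, here is a minimal sketch of the calculation in Python. The age-specific fertility rates are made-up illustrative numbers, not real data, and the function name total_fertility_rate is just a label for this example. The idea is that a hypothetical woman spends five years in each age group, so her expected lifetime births are the sum of the yearly rates times five.

```python
# Hypothetical age-specific fertility rates (births per woman per year)
# for five-year age groups. Illustrative numbers only, not real data.
asfr = {
    "15-19": 0.015,
    "20-24": 0.065,
    "25-29": 0.095,
    "30-34": 0.100,
    "35-39": 0.055,
    "40-44": 0.012,
    "45-49": 0.001,
}
AGE_GROUP_WIDTH = 5  # years spent in each age group


def total_fertility_rate(rates, width=AGE_GROUP_WIDTH):
    """Sum the yearly rates, weighting each by the years spent in its group."""
    return width * sum(rates.values())


tfr = total_fertility_rate(asfr)
print(f"TFR = {tfr:.2f} children per woman")
```

Because TFR is built from rates at a single point in time, it describes a synthetic cohort rather than the completed fertility of any real group of women.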

Fertility in sequence

As a way to organize this document, I’ve decided to proceed in a mostly chronological order, from the factors that influence reproductive choice (demography) to producing gametes and embryos, relevant technological interventions (IVF, IVM, IVG, and more), choosing between embryos (embryo selection), and pregnancy.

At the end of the document, Potential Opportunities, I gather all the funding opportunities I identified in my research, which are also scattered throughout in their relevant sections.

The following topics are covered in each section:

Demography: demographic transition, population momentum, transition speed of the demographic transition, infertility by the numbers, infertility over time, hormonal birth control, policy interventions on TFR
Producing gametes: biochemistry and hormones, development of gonads, gametogenesis, ovulation, reproductive aging
Interventions: IVF, other ART, IVF success over time, IVF/ART use trends, IVM, IVG, sperm selection, somatic cell nuclear transfer
Choosing between embryos: embryo selection for monogenic diseases, PGT-A, polygenic selection, improving embryo prediction, incorporating rare variants
Pregnancy: uterus and endometrium, implantation, child/infant mortality trends, neonatal care trends, artificial wombs, in-vitro embryo culture

Demography

Summary

The TL;DR is that global fertility rates are converging.

US, Europe, much of Latin America, and most of Asia are below replacement, but with substantial heterogeneity in TFR. Parts of Africa, the Middle East, and Central Asia are still substantially above replacement but will probably converge soon.

A condensed summary on the decline in fertility rates and their causes, from Our World in Data: women’s empowerment, economic development, declines in religiosity, access to contraception, elite and media driven change in norms.

Figure 1.

Demographic transition

The demographic transition is the transition from a demographic regime with high infant mortality + high number of children to low infant mortality + low numbers of children. During the intermediate phase, as infant mortality rates fall but the number of children being born is still high, population growth rates are very high. Birth rates have continued to fall, and in some countries, are low enough (in combination with lower or negative population momentum) to cause population decline. Strong economic growth increases fertility rates somewhat, with the post-WWII baby boom in the US as the prototypical example (though aided by higher religiosity).

From Empty Planet, the demographic transition is robust to differences in contraception technology/access, religion^[2], and ethnicity.

Population momentum

An important concept to understand in addition to total fertility rate is population momentum. For a period of time after TFR falls below replacement, a population can still grow because of the relatively young age structure of the population.

Another way to understand this: TFR is a per-woman measure, so the same TFR has different consequences depending on the % of a population that is currently of reproductive age. If only 10% of your population is of reproductive age versus 20%, the same TFR of 2 will produce different population growth rates. The % of a population that is made up of people of reproductive age is affected by past population growth rates and death rates. An especially young population with a TFR of 2 will grow for a time; an especially old population with a TFR of 2 will shrink. Thus, young populations “lock in” some growth even with below-replacement TFR and, conversely, an old population, even with a TFR at replacement, will experience population decline. Populations will eventually stabilize at a new equilibrium if a replacement rate fertility is maintained for long enough.
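
The momentum effect described above can be illustrated with a toy cohort model. This is only a sketch under strong simplifying assumptions that are mine, not taken from any of the sources cited here: sixteen 5-year age groups, no deaths before age 80, half of each group female, and childbearing spread evenly over ages 20-39. Under those assumptions replacement fertility is exactly 2. The two starting populations are the same size and differ only in age structure.

```python
# Toy cohort model: sixteen 5-year age groups (0-4, 5-9, ..., 75-79).
# Assumptions (all hypothetical): nobody dies before 80, everyone dies at 80,
# half of each age group is female, births are spread evenly over ages 20-39.
N_GROUPS = 16
FERTILE = range(4, 8)  # indices of age groups 20-24 through 35-39


def step(pop, tfr):
    """Advance the population by one 5-year step."""
    fertile_women = 0.5 * sum(pop[i] for i in FERTILE)
    # Each woman has tfr / 4 children during each of her 4 fertile steps.
    births = fertile_women * tfr / len(FERTILE)
    # Everyone ages one group; the oldest group drops out.
    return [births] + pop[:-1]


def simulate(pop, tfr, n_steps):
    """Return total population size at the start and after each step."""
    totals = [sum(pop)]
    for _ in range(n_steps):
        pop = step(pop, tfr)
        totals.append(sum(pop))
    return totals


# Same headcount, opposite age structures.
young = [100 * 0.85**i for i in range(N_GROUPS)]  # mostly children
old = list(reversed(young))                        # mostly elderly

young_totals = simulate(young, tfr=2.0, n_steps=10)  # 50 years
old_totals = simulate(old, tfr=2.0, n_steps=10)

print(f"young: {young_totals[0]:.0f} -> {young_totals[-1]:.0f}")
print(f"old:   {old_totals[0]:.0f} -> {old_totals[-1]:.0f}")
```

With identical TFR, the young population grows over the 50 simulated years and the old one shrinks, purely because of how many people pass through the fertile age groups during that time.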

There are a variety of UN population projections available here with different rates of birth rate convergence, life expectancies, and other changes in parameters.

Transition Speed

Another important wrinkle in the demographic transition is that the timing of it has changed over time. While Europe underwent the demographic transition over many decades (starting in the 1800s, with France as the earliest) and multiple generations, Iran and China halved their fertility rates in just 10 years. Per Empty Planet^[3], early 2000s UN population projections, in the "medium" scenario, assumed that countries would follow similar timing as previous countries' transitions and overestimated the resulting fertility. Empty Planet also argued that Chinese TFR may be lower than is reported. Large Chinese urban centers, like Shanghai and Beijing, have fertility rates from 1-1.3, and the reported ideal family size is around 1. The latter may be biased downward, since social desirability bias may push respondents to state lower fertility preferences than they actually hold, but is in any case much lower than the approximate ideal of 2 in Europe.

Figure 2.

Differences in how quickly different ethnic or religious groups transitioned from growing to stable population sizes have played a large role in ethnic conflict (eg, Catholics in Ireland), as Morland reports in Demographic Engineering^[4]. Morland also states that ethnic conflict was more common in the 20th century than in previous centuries (I am unsure how this was operationalized, so I’m not very confident in this claim).

Immigration can make up for population shortfalls for some time, but since even high-fertility countries are generally converging to replacement fertility, this cannot continue indefinitely.

In addition, immigration restrictionist politics in some regions may prevent the high levels of immigration that would be needed to offset projected population decline. East Asian countries (eg, South Korea, Japan, China) have, so far, not accepted immigrants in large enough numbers to offset their expected fertility decline. Eastern Europe seems to be following a similar path. Depending on the electoral success and subsequent policies of anti-immigration parties in other parts of Europe and the US, immigration may slow or expand in those areas. Guest worker programmes, used in parts of the Middle East (like Dubai), are another possibility.

Infertility by the numbers

The goal of this section is to give a quantitative sketch of infertility.

Couples

What is the per-cycle fecundability for heterosexual couples?

For healthy young heterosexual couples, their chance of achieving pregnancy (not a live birth) after 1 year of trying without ART is probably about 85% and about 93% after 2 years.

Per-cycle fecundability is the probability of successfully achieving conception (pregnancy), not live birth, in a given menstrual cycle. A classic study found that in a cohort of couples trying to conceive without assisted reproductive technology (ART), the probability was 29% in the 1st menstrual cycle in which the couple began trying to conceive; the 2nd-cycle fecundability was 29%, the 3rd was 16.8%, and subsequent cycles were lower. Over a whole year, the cumulative probability of achieving pregnancy for a couple was 82%.
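
One way to see why per-cycle rates fall across cycles while the cumulative probability keeps rising is to model the cohort as a mix of couples with different fecundability: the most fertile couples conceive first and leave the pool. The sketch below uses made-up group shares and per-cycle probabilities (they are not taken from the study above), and conception_curve is a hypothetical helper name.

```python
# Hypothetical mixture of couples: (share of cohort, per-cycle fecundability).
# These numbers are illustrative assumptions, not estimates from any study.
groups = [(0.40, 0.35), (0.45, 0.15), (0.15, 0.03)]


def conception_curve(groups, n_cycles):
    """Return (observed per-cycle rate among couples still trying,
    cumulative share of the cohort that has conceived) for each cycle."""
    remaining = [share for share, _ in groups]  # not yet pregnant, per group
    observed, cumulative = [], []
    for _ in range(n_cycles):
        still_trying = sum(remaining)
        conceptions = sum(r * p for r, (_, p) in zip(remaining, groups))
        observed.append(conceptions / still_trying)
        remaining = [r * (1 - p) for r, (_, p) in zip(remaining, groups)]
        cumulative.append(1 - sum(remaining))
    return observed, cumulative


observed, cumulative = conception_curve(groups, n_cycles=12)
for cycle in (1, 2, 3, 6, 12):
    print(f"cycle {cycle:2d}: per-cycle {observed[cycle - 1]:.1%}, "
          f"cumulative {cumulative[cycle - 1]:.1%}")
```

In this sketch no individual couple's fecundability changes over time; the observed decline comes entirely from selection, which is one common reading of declining per-cycle rates in cohort data.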

Live birth rates per cycle are probably somewhat lower, with other studies suggesting that about 15% of pregnancies end in miscarriage in the 1st trimester, and more pregnancy losses occur later on. The overall proportion of conceptions that result in a live birth is not known with precision; here is a study examining these issues in more depth.

Ignoring those complications, and using “exposure to unprotected intercourse over time” instead of per-cycle fecundability, we get the following data, from the Speroff Clinical gynecologic endocrinology and infertility textbook^[5]:

Figure 3.

These rates are approximately the same as the fecundability data above, though live birth rates are expected to be somewhat lower. My best guess is that these inconsistencies are driven by differences in study population, like age, fertility, etc.

What % of couples will eventually be able to have a child, with and without ART?

The proportion of couples who achieve at least one live birth (a higher bar than a pregnancy) depends critically on the age at which they start attempting to conceive. Thus, as people delay childbirth, it is likely that the proportion of couples who will succeed in achieving pregnancy without ART will decline.

This simulation, using data derived from France before the Demographic Transition, implies that if couples begin trying to conceive at a young age (between ages 20-24), about 96% can expect to have at least one child:

Women who married at age 20–24 years between 1670 and 1789 had 7.0 children on average and 3.7% remained childless. Women who married at age 25–29 years had a mean of 5.7 children and 5.0% remained childless. Women who married at 30–34 years had a mean of 4.0 children and 8.2% remained childless.

I have not found a simulation addressing the specific question of “how much does increasing the average age at which couples start trying to conceive by x years affect per-couple fecundability”.

This simulation does address some of this question. Some of the assumptions:

using fertility parameters derived from the French population pre-demographic transition,
Assuming two cycles of IVF use

The results without ART:

final proportions of women who deliver a live baby reach 94% for women starting at age 30 years, 86% for those starting age 35 years and 65% for those starting at age 40 years....

While with ART:

In both cases, ART only partly reduces the gap. If a woman postpones an attempt to become pregnant by 5 years, from age 30 to 35 years, her chances of conceiving will be reduced by 9% (91–82%) and ART will make up for only 4%. If she postpones from age 35 to 40 years, the chances will be reduced by a further 25% (82–57%) and ART will make up for only 7%. In other words, ART makes up for only half of the births lost by postponing an attempt to become pregnant from 30 to 35 years (4.2/8.5), and <30% of the births lost by postponing from 35 to 40 years (7.1/25.2)....More optimistic results might be reached by encouraging women aged 35–40 years to turn to ART faster than assumed in the model, after 3 and 2 years respectively. Note, however, that this delay includes the time to decide to visit a doctor plus the time to make the necessary medical investigations, plus the time to start ART. It does not mean that the woman is not doing anything before 2 or 3 years.

That simulation assumes a relatively long interval between childlessness and seeking ART, and a longer delay will reduce the number of children born through ART^[6].

The net impact of ART, per those simulations is:

Our results show that the chance of giving birth to a live baby decreases between ages 30 and 35 years, and even more so between ages 35 and 40 years. In both cases, ART only partly reduces the gap. If a woman postpones an attempt to become pregnant by 5 years, from age 30 to 35 years, her chances of conceiving will be reduced by 9% (91–82%) and ART will make up for only 4%. If she postpones from age 35 to 40 years, the chances will be reduced by a further 25% (82–57%) and ART will make up for only 7%. In other words, ART makes up for only half of the births lost by postponing an attempt to become pregnant from 30 to 35 years (4.2/8.5), and <30% of the births lost by postponing from 35 to 40 years (7.1/25.2).
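
The "makes up for only half" and "<30%" figures in the quoted passage are simple ratios of percentage points recovered by ART to percentage points lost by postponing. A minimal sketch of that arithmetic, using only the numbers quoted above (the dictionary layout and variable names are my own):

```python
# Percentage points of live-birth probability lost by postponing, and the
# percentage points recovered by ART, as quoted from the simulation study.
postponement = {
    "30 to 35": {"lost": 8.5, "recovered_by_art": 4.2},
    "35 to 40": {"lost": 25.2, "recovered_by_art": 7.1},
}

# Share of the lost births that ART makes up for in each scenario.
share_recovered = {
    ages: v["recovered_by_art"] / v["lost"] for ages, v in postponement.items()
}

for ages, share in share_recovered.items():
    print(f"postponing from {ages}: ART recovers {share:.0%} of lost births")
```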

It is important to point out that in some cases of infertility, such as tubal infertility or many cases of male infertility, IVF +/- can effectively turn sterile couples into normal-fertility (for their age) couples. In the above paragraph, the decrease in fertility is largely (female) age-driven, which is only somewhat amenable to IVF.

What % of couples will ever have infertility?

The CDC estimates that about 12% of women from the ages of 15-49 have ever used infertility services in 2015-2019. Other sources, like this simulation, have found the following numbers for heterosexual couples who are unable to conceive^[7] (which is not the same as having a live birth, and is a lower bar), ranging from 1% at age 25 to 5% at age 35 to 54% at age 45:

Figure 4.

The numbers above, since they derive from inability to achieve conception as opposed to achieving a live birth, are likely a conservative lower bound on infertility at different ages.

What % of infertility comes from male vs female factors?

The data here are messy but Speroff estimates that male factors account for about 20% of infertility and play a role in another 20-40%, with estimates deriving from this study.

How much of TFR reduction is driven by a delay in childbearing, as opposed to other factors, such as reduced desired number of offspring?

A simulation study that tried to keep other factors constant found the following for six European countries:

Our results suggest that by delay of first motherhood, the incidence of permanent involuntary childlessness rose from 2 to 3% in 1970/1985 to 6 to 7% in 2007 for the countries studied (Fig. 1). In other words, 3–4% of the population of women who wanted to have at least one child did not succeed in fulfilling this wish because they had postponed too long...In spite of the massive delay of parenthood, TFRs recovered in almost all European countries since the 1980s and the 1990s (Goldstein et al., 2009). This trend is also obvious for the six countries studied where recoveries varied from 0.08 in Austria to 0.41 in Sweden (Table I). These recoveries are mainly due to the fact that after a period of marked postponement during which temporarily less children were born and consequently TFRs dropped, many couples still tried to realize the number of children they had previously envisaged, after years of delay. Most of them succeeded in doing so but some waited too long as the data of Fig. 1 demonstrate. Apart from this so-called tempo or timing effect (Bongaarts and Feeney, 1998), part of the recovery is also explained by more structural determinants such as the level of economic stability and unemployment, the cultural background and also by policy measures aiming to have a more woman- and child-friendly society (Goldstein et al., 2009; ESHRE Capri Workshop Group, 2010; Mills et al., 2011). Apparently, the positive effect of TFR recoveries is much larger than the negative effect of postponement (Table I).

This effect works out to a TFR reduction of between 0.03 and 0.06.

Why did I ignore same-sex couples?

An important caveat is that the data examined here are derived from heterosexual couples. I chose to focus on heterosexual couples due to time constraints and because same-sex couples make up a relatively small portion of total parents.

However, same-sex couples do use ART at relatively high rates, and some of the technology profiled later on seems especially attractive for them, such as in-vitro gametogenesis (which would enable cross-sex gamete production) and artificial wombs.

Male

What % of men will ever be infertile?

Per this paper, about 5% of men are subfertile or infertile. Some possible causes are shown below, from Speroff:

A careful reader will note that the male infertility prevalence presented above exceeds the prevalence of couple infertility presented earlier, which seems logically impossible. This is because “infertility” and “subfertile” are often defined in different ways depending on the context. In the former case, couple infertility would more precisely be called “couple sterility”, while “male infertility” would include any delay in achieving a pregnancy as well.

What % of male infertility is treatable by current ART/IVF techniques?

As a broad generalization, mild cases of male subfertility characterized by low sperm counts are straightforwardly treatable by current ART techniques, ranging from gonadotropins for some hormonal causes of low sperm counts to surgeries to repair varicoceles and more. About 10-20% of infertile men have azoospermia, diagnosed when there is no sperm in the ejaculate, and generally considered the most severe form of male infertility.

Within this category, clinically, andrologists distinguish between obstructive and non-obstructive azoospermia. The former includes conditions that disrupt sperm transport and/or ejaculation, like cystic fibrosis, congenital absence of the vas deferens, or nerve damage that prevents ejaculation. Men with obstructive azoospermia can usually achieve fertilization with a variety of techniques that retrieve sperm directly, whether from the epididymis or the testicles directly. With the use of ICSI, which directly injects a sperm into an oocyte, even very small numbers of sperm (in one case, a single sperm) can be used successfully. In addition, men with a variety of sperm abnormalities that impair sperm motility can still achieve fertilization with ICSI.

Men with nonobstructive azoospermia (NOA), who likely make up about 1% of the male population, can have a variety of causes for their infertility: cryptorchidism, mutations, chromosomal abnormalities, trauma or illness, radiation, chemotherapy, disorders of sexual development, and more (~50% have no identifiable cause^[8]). In this group the problem is disrupted sperm development. This group of men has the worst outcomes: a cohort study found a success rate of 13.4%^[9] (where outcome=live delivery) for men undergoing testicular sperm extraction followed by IVF and ICSI.

Putting all that together, perhaps 87% of men with NOA (who make up about 1% of the male population) will not be treatable with current ART, giving a final figure of about 0.87% of all men.
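
The back-of-the-envelope figure above combines two numbers from the preceding paragraphs: the roughly 1% prevalence of NOA and the 13.4% live-delivery rate from the cohort study. A minimal sketch of the arithmetic, treating that single cohort's success rate as if it applied to all men with NOA (a strong assumption):

```python
# Inputs taken from the text above; variable names are illustrative.
noa_prevalence = 0.01        # share of all men with nonobstructive azoospermia
art_success_rate = 0.134     # live delivery after sperm extraction + IVF/ICSI

# Share of men with NOA for whom current ART does not lead to a live delivery.
untreatable_given_noa = 1 - art_success_rate

# Share of all men who have NOA and are not helped by current ART.
untreatable_share_of_men = noa_prevalence * untreatable_given_noa

print(f"untreatable among men with NOA: {untreatable_given_noa:.1%}")
print(f"untreatable among all men:      {untreatable_share_of_men:.2%}")
```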

In addition, sperm donation prices are much lower than egg donation costs. Sperm preservation to preserve fertility is substantially cheaper than egg freezing and usually much less invasive^[10]. Since male factor infertility is also a smaller fraction of total infertility cases, it may be less impactful to focus on overall.

Female

What % of women will ever be infertile?

Note: some of this content overlaps with content in the female reproductive aging and diagnosing female infertility section.

There are several ways to answer this question. With data from women in rural France who were married between 1670 and 1789, assumed to be naturally fertile^[11], this simulation showed that 3.7% of women married between ages 20-24 years remained childless, as did 5% of women married between ages 25-29 years and 8.2% of women married between ages 30-34 years. Women with access to modern medical technology, all other factors being equal, should have substantially lower infertility rates, since tubal infertility and STD-related infertility are now easily treatable, as are a significant proportion of ovulatory disorders like hyperprolactinemia and PCOS.

Infertility over time / environment

Summary

If we want to put an upper bound on how important environmental causes of infertility could be, we need an estimate of how prevalent environmental-induced infertility is and a sense of how it has changed over time. To clarify, “infertility” as it is used in this section refers to an inability to achieve a live birth when it is desired. This is less precise than the medical definition, which includes a specified time period.

More bluntly (h/t Milan): how much of the problem of people having fewer kids than they want is because of infertility?

Infertility Increase?

A speculative possibility is that widespread circadian rhythm derangement from less sun exposure and widespread artificial lighting may influence fertility. There is some evidence that the pineal gland affects fertility and humans exhibit seasonal variation in conception rates. However, since infertility rates overall have not increased substantially despite large disruptions to circadian rhythm over the last century, it is unlikely this can play a large role in fertility rates overall.

Another concerning trend is the change in pubertal timing. The age at which girls begin puberty has been decreasing since the beginning of the 20th century, and the precise cause(s) are not well understood. So far, the consensus points to better nutritional status, higher rates of obesity, lower levels of physical activity (since high levels of physical activity can delay puberty), and perhaps endocrine disruptors. As far as I know, there is no evidence that this trend has caused infertility^[12], and it should instead be viewed as evidence that we don't understand puberty and fertility very well, which should perhaps make us more uncertain as a whole.

On the other hand, increases in obesity have probably reduced fertility somewhat as well. An ASRM practice bulletin summarizing the effects of obesity on reproduction focused on the relationship between female obesity and PCOS (which often causes anovulation), female obesity and pregnancy complications, and male obesity on sperm function. I have not seen quantitative estimates of the effects of obesity on infertility overall– eg, what % of infertility globally/nationally is caused by obesity?

Advances in fertility preservation have somewhat reduced the burden of infertility caused by cancer (in both men and women), though increased survival has very likely increased the proportion of the population with cancer-related infertility.

Infertility Decreases?

My best guess is that infertility rates, once adjusted for delays in reproduction, have not substantially increased and may have decreased. From 1982 to 2002, infertility rates appear to have declined in the US, which continued into 2015 (Speroff cites CDC data on this). Globally, infertility appears to have been stable, with some decline in infertility in low-income countries (primarily Sub-Saharan Africa and South Asia). A caveat with the above data is that it uses as its denominator “proportion of women of reproductive age (20–44 y) who are exposed to the risk of pregnancy...desire a child”. It seems that if women who are infertile also don’t desire a child, they could get systematically undercounted in those surveys, and thereby cause underestimation of infertility rates. However, I did not look very deeply into these studies, so this may not be an important risk.

Age-adjusted infertility rates may have decreased somewhat since the early 20th century, primarily due to better treatment of STIs^[13] and post-birth complications, and advancements in medical care for infertility. As an example of the potential impact of STIs on infertility, consider the “infertility belt” in Central Africa, which suffers from poor treatment of STIs, as well as poor treatment of post-birth complications (which can sometimes cause infertility).

Both STIs and post-birth complications are treated better in high-income countries relative to historical norms, implying a reduction in infertility, as long as there hasn’t been a large rise in the prevalence of STIs that might cancel out better treatment. However, I have not been able to find a review trying to answer this question, so I’m very unsure about this conclusion. A recent CDC study found that PID rates had decreased from 2006 to 2017, but I’m unsure of the long-term trajectory, eg, what PID rates were in the 1900s.

A decline in smoking in the US has probably slightly reduced infertility as well, since it has a consistent correlation with infertility that seems at least partially causal. Speroff cites this study to argue that “up to 12% of female infertility could be related to smoking”.

Sperm Count decline?

A much-publicized meta-analysis from 2017 found a decline in sperm counts of 59.3% in men from Western countries since 1973. One of the same authors has published papers arguing that animal data and some human epidemiology suggest that phthalates, a common ingredient in plastics, have anti-androgenic effects.

The opposite side of this debate, summarized in a NYTimes article, argues that this apparent decline may be an artifact of changes in measurement technique, or not all that important if it is real. Since there is not a strong relationship between sperm count and fertility above a certain threshold, it is unlikely that a moderate decline in sperm counts would substantially increase male infertility rates. A paper using simulations by a respected French demographer came to a somewhat similar conclusion, stating:

A decline in fecundability by 15% implied a decrease in fertility by 4%, and an increase in the proportion of couples eligible for infertility treatments by 73%. An increase in the mean age at initiation of first pregnancy attempt by 2.5 years from 25 years entailed a decrease by 5% in fertility and an increase by 32% in the proportion of couples eligible for infertility treatments...A relatively important decrease in fecundability and an increase by 2.5 years in age at first pregnancy attempt are likely to have only a limited impact on fertility. However, they may have a large impact on the proportion of involuntarily infertile couples, likely to resort to assisted reproduction techniques.

Overall skepticism

A more high-level reason to be skeptical that biological infertility per se is currently a large constraint on TFR is the following:

Populations that explicitly aim for high fertility, such as Orthodox Jews, Hutterites, and Amish achieve comparable or higher fertility than historical populations with very high fertility, such as the Quebecois. A counter-argument to this might be that Hutterites are exposed to very different levels of pollution than the average American, but Orthodox Jews live in urban environments that are comparable to the average urban-dwelling American. This places a ceiling on how large an effect pollution could be having on fertility.

The data is sparse, but one point in favor of Orthodox Jews having similar exposure to the environment (in spite of strict dietary laws) is the presence of relatively similar levels of obesity in the Orthodox Jewish community relative to non-Jewish English neighbors.

Isabel Juniewicz has written a more in-depth blog post on this topic, and comes to a roughly similar conclusion: that biological infertility per se is not yet an important factor in declining TFR.

I spoke to Daniel Goodwin, who is working on a project for managing small molecule pollution. He argued convincingly that on a societal level, we take too long to recognize the harmful effects of novel chemicals, but I was not convinced that biological infertility per se is impacted significantly by pollution. He pointed at evidence that testosterone levels and sperm counts are dropping. I am less skeptical that pollution may be having subtle effects on behavior, which may in turn be reducing fertility, but this seems especially difficult to study in humans– mice, of course, could be fed a diet rich in pollutants and checked for behavioral dysfunction.

Another approach is to look for multiple markers of environmental disruption, instead of a single measure like sperm count. This paper does that, and finds that multiple markers of sexual development dysfunction are all moving in the same direction, which makes me (and some of the people I spoke to) somewhat more willing to believe this idea than before.

Solutions?

While I am skeptical that environmental pollutants have a large impact on infertility, there are other benefits of better pollution management that may make it a smart idea overall. With that in mind, some infertility add-ons to a pollution-focused project may be wise. I would defer to Daniel Goodwin on project ideas for this. Some ideas:

The NIH’s All of Us program, while a genomics-focused biobank, is interested in adding an environmental exposure component. Some fertility metrics that it might be interesting to propose as an add-on:

antral follicle count (AFC) as an endophenotype for fertility;
blood levels of pollutants vs various fertility phenotypes.

Genetic variants that track environmental pollution could be identified and used as instruments for Mendelian Randomization studies. Eg, are there any variants that cause lower or higher blood levels (whether through changes in behavior or metabolism) of some candidate pollutant that's thought to influence fertility?

Birth Control

My colleague Mackenzie Dion has a more extensive discussion of birth control following this section, so I will only sketch my impressions of the science here, summarized from Speroff.

Summary

There are many variations on hormonal birth control which vary in dosage, timing, and method of delivery. There are some known risks of hormonal birth control, such as increased clotting risk, but it seems generally safe, and likely has some positive health benefits over the longer term related to reduced risk of ovarian cancer. In my view, the most relevant aspect of hormonal birth control to the whitepaper is that there is some contradictory research on its effect on libido and sexual/partner preferences. Mackenzie disagreed with this take, citing research linking hormonal birth control to autoimmune disease, some changes in brain activity, increased antidepressant use, and links to vitamin/mineral deficiency.

Of course, hormonal birth control relates to TFR in a more prosaic way: reducing unwanted births should, all else being equal, reduce fertility. This probably has some effect, but there are enough ways to control fertility that even countries with less access to birth control have undergone the demographic transition. At the extreme end, France underwent the demographic transition in the 1700s, well before reliable contraceptive technology was available.

Conversely, widespread availability of contraceptives probably speeds up the demographic transition, at least per Empty Planet, and may reduce abortions (since abortion is sometimes used as a form of birth control), but likely does not have a strong effect on fertility overall.

The widespread availability of LARCs (long-acting reversible contraceptives), and their promotion in the US beginning in the 1990s, may have contributed to changed fertility timing by reducing teen pregnancies: this study on Colorado attributes a 5% reduction in teen pregnancies to them specifically. The reduction in teen pregnancies may have reduced fertility overall, or simply pushed some teen births into the 20s and 30s, changing fertility timing.

A minor positive effect that hormonal birth control may have on fertility is that it may reduce rates of STD infections. STDs can cause infertility (primarily in women), and in some regions are a leading cause of infertility (eg, the “infertility belt” in Africa). Through this mechanism, hormonal birth control might reduce infertility.

Birth control and Partner Preference

For the reasons described above, it seems that the most plausible path through which hormonal birth control could affect TFR would be through changed behavior. For that reason, my colleague (Mackenzie Dion) has focused on the possible effects of birth control on partner preferences.

There has been persistent speculation that hormonal birth control use may affect fertility-related factors such as partner preference, which could in turn have social implications such as contributing to increased divorce rates. The literature in animals has found some evidence that MHC similarity affects health outcomes.

In a CDC survey of women aged 15-49 conducted in 2017-2019, 65% were on some form of contraception, including 14% taking the oral contraceptive pill, 10.4% using long-acting reversible contraceptives (ie, IUDs and arm implants), and 3.1% using Depo-Provera, the contraceptive ring, or the patch. This amounts to at most about 27.5% of women in the US taking some form of hormonal contraception, since the data did not distinguish between women on hormonal and non-hormonal IUDs.
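The hormonal-contraception share above is just the sum of the three method shares quoted from the survey. A minimal sketch of that arithmetic follows; the dictionary labels and function name are my own, and it assumes each woman is counted under a single method. Because the survey does not separate hormonal from non-hormonal IUDs, the sum is an upper bound, and dropping the whole LARC category gives a (hypothetical) lower bound.

```python
# Shares of US women aged 15-49 (CDC survey, 2017-2019), as quoted in the text.
HORMONAL_METHOD_SHARES = {
    "oral contraceptive pill": 14.0,
    "LARC (IUDs, arm implants)": 10.4,  # includes non-hormonal IUDs
    "Depo-Provera, ring, or patch": 3.1,
}


def total_share(shares):
    """Sum percentage shares, assuming each woman is counted under one method."""
    return round(sum(shares.values()), 1)


# Upper bound: treat every LARC user as being on a hormonal method.
upper_bound = total_share(HORMONAL_METHOD_SHARES)

# Lower bound: hypothetically, no LARC user is on a hormonal method.
lower_bound = round(upper_bound - HORMONAL_METHOD_SHARES["LARC (IUDs, arm implants)"], 1)

print(f"hormonal contraception share: between {lower_bound}% and {upper_bound}%")
```

The true figure sits somewhere between the two bounds, depending on the hormonal share of IUD users.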

Much of the literature about hormonal birth control and partner preferences speculates whether taking hormonal birth control changes partner preferences. The mechanism often cited is that people are attracted to potential partners who have differing major histocompatibility complex (MHC) alleles and that the use of hormonal contraception is associated with preferring MHC-similar partners. MHC genes code for cell-surface proteins that bind fragments of pathogens and present them to T cells for recognition.

Research on the association between MHC similarity and partner choice is conflicting. A 2020 meta-analysis found no significant effect of MHC preference on mate selection, whereas a 2017 meta-analysis did. A recent genetic analysis of 3691 couples found that MHC similarity between couples did not differ from chance, and hormonal contraception use when the relationships were initiated also had no effect. A study that instructed women to smell t-shirts worn by men found a significant preference shift toward MHC-similar men after initiating pill use that was not found in the control group. A preference for MHC-dissimilar mates has been found in mice.

It seems possible that the effect of hormonal birth control on MHC preference shifts can be detected in a controlled research setting, but given the complex nature of human partner selection, this effect is then swamped by other variables in uncontrolled environments.

There appear to be reproductive advantages to MHC heterozygosity in mice. For example, MHC-heterozygous mice were found to have higher rates of reproductive success than MHC-homozygous mice.

There is some suggestive evidence that MHC heterozygosity may produce immune benefits in mice and humans: for example, HIV-infected people with MHC heterozygosity had less viral replication than HIV-infected people with MHC homozygosity, and MHC-heterozygous mice had higher survival rates and larger weights than MHC-homozygous mice when infected with multiple strains of Salmonella and Listeria.

There have not been any studies on whether administering hormonal contraception to mice causes a preference shift from MHC-dissimilar mates to MHC-similar mates. Given the conflicting evidence in humans and the demonstration of MHC-dissimilar mate preference in mice, studies along these lines may further elucidate the nature of this effect.

Although the results are mixed and often contradictory, hormonal contraception may have effects on female sexuality beyond MHC preference, including on sexual function and desire. Many hormonal contraceptives, namely the combination pill (and, inconsistently, the progestogen-only “minipill”), the patch, NuvaRing, and hormonal IUDs (although mostly just during the first year), suppress ovulation, which may affect sexual behavior and self-perception. Women feel more attractive and desirable when ovulating, and men find both their female partners and themselves more attractive when their partners are ovulating.

While there appears to be weak evidence that hormonal contraception affects self-perception of attractiveness and desirability, the direct link to fertility is not clear. One possible path by which perceived attractiveness could affect fertility might be frequency of intercourse or different levels of interest in having children. Hormonal birth control may therefore subtly impact fertility, but given the various confounding factors and the weak effect sizes found so far, further research will likely not be high-impact.

Policy Interventions

At various times, different countries have tried different pronatalist policies. From speaking with Lyman Stone of the Institute for Family Studies and Demographic Intelligence on this topic, within the extant range of policy interventions a change in fertility rates of 0.05 to 0.2 (where replacement TFR = 2.1) is about what is realistically achievable for fiscal pronatalism.

His guess for the most cost-effective fiscal intervention is a single large cash payment like a “baby bonus” that front-loads the incentive. His guess was that $100k-400k is the approximate cost of incentivizing an additional US birth, which is much cheaper than the value of a statistical life used by US government agencies^[14]. A meeting with Dean Spears and his team, who are starting an interdisciplinary economics and demography group at UT Austin, generally corroborated these claims.

Lyman emphasized that these estimates are derived from policies within the Overton Window (ideas considered acceptable by the mainstream population), eg, nobody has ever tried paying women to have children at rates that are comparable to a regular job.

Lyman was pessimistic about new fertility technology having large effects on TFR. He estimated that in high-income countries, ~6-7% of all births involved ART (assisted reproductive technology) and that we are not close to the limits of natural fertility, even for older women. That is, biological infertility per se is not the main constraint behind below-replacement TFR. He also thought that if ART could fix reproductive aging, this might not boost TFR all that much, since it might just push child-rearing to later in life. Finally, he pointed to a paper on the “child penalty” to mother wages that suggests the work of parenting, not pregnancy per se, is the main “cost” of having a child. To the degree that childcare per se, and not pregnancy and childbirth, is the main cost of having a child, this suggests that new ART would not radically change the current decision calculus.

Lyman thought changes in culture/religion/norms could have much more powerful effects, though effecting cultural/norm change is easier said than done. He has a working paper (not yet published) arguing that a change in the Georgian Orthodox Church, which raised the status of parenting, boosted TFR from 1.5 to 2.2 without substantial change in government spending. Another neat example along cultural lines: “inviting the Pope to do a speaking tour to all the Catholic churches in your country...”, presumably referring to this paper in Brazil^[15].

He also pointed towards data showing that secularization in France caused a decline in fertility, as well as data showing that reforms in censorship laws in the UK had similar effects, as evidence that values –> fertility. As evidence for fertility preferences being important drivers, he pointed to evidence that stated fertility preferences at 18 are predictive of completed fertility at 40 and that such preferences are higher in high-fertility groups.

Along similar cultural lines, Lyman argued that some of the intense focus among development experts on population control and contraception was driven by “developmental idealism”. Basically, instead of focusing on hard-to-export/copy institutions like rule of law, private property, etc., development experts emphasized the demographic transition more than they should have, under the mistaken assumption that declining family size per se had a large causal influence on economic development^[16]. He argues that this focus on exporting small family sizes may be somewhat responsible for low TFR in some countries, but it is unclear to me how much he attributes to this. For further reading he recommended work by Arland Thornton and William Easterly, as well as The Anti-Politics Machine and Legacies of Despotism and Development.

Lyman also used the example of breastfeeding rates rising over time as an example of values driving behavior more than fiscal incentives:

A hugely time intensive element for moms is breastfeeding, and yet breastfeeding rates have RISEN dramatically even as women's wages have risen! Breastfeeding time has risen even as pumps have become more common! Why??? Simple: because the last few decades have seen a change in how parents conceptualize health, chemicals, nature, and children, such that today parents see formula as inferior and breastfeeding as what "good parents" do. This, despite the fact that formula has gotten tons better over time, the health benefits of long-term breastfeeding are empirically shaky, and the opportunity cost of breastfeeding has risen dramatically! It's values all the way down. Values, values, values.

Another topic that Lyman brought up was the cost of childcare. High-fertility groups rely on surplus labor from grandparents and older siblings, which lowers the cost of childcare. He recommended I speak with Samuel Hammond of Niskanen Center and Patrick Brown at EPPC about childcare, its effects on fertility, ways to reduce the cost, etc. Another point he raised along these lines was that Utah’s laws on children are the most “free-range” of any state, and it also has the highest TFR.

This modeling paper on the effects of IVF on TFR, given certain reasonable assumptions, largely agrees with Lyman’s pessimism regarding IVF boosting national fertility. The assumptions:

We assume that all couples want two children. Thus, all couples who have achieved a first child try for a second one, except those who have two or three children from the first LBD...IVF delivery gives on average 1.26 children. The twin and triplet rates in natural pregnancies add to an average of 1.01 children per delivery...assume that after 1 year of non-conception, a diagnostic fertility work up is performed, by which couples with an absolute or severe cause of infertility, such as two-sided tubal blockage or very poor semen quality, are identified and treated by IVF without delay. [they also assume 100% uptake and access to IVF services]

The authors then model two different scenarios: requiring 1 year or 3 years of waiting before IVF services are offered (the latter of which was a reasonable stand-in for European healthcare provision of IVF). The results:

Figure 5.

That is, under the unrealistically optimistic assumption of 100% uptake of IVF by women who are having trouble conceiving, and assuming every couple wants 2 children, moving from no IVF access to IVF access after 1 year of trouble conceiving would result in a TFR boost of 0.11. A more realistic uptake of ~50% would halve that difference, and further adjustments, such as some couples only desiring 1 child, would reduce it further. In addition, part of the advantage is driven by the higher average number of children in an IVF pregnancy, which has likely converged to natural pregnancy rates as single-embryo transfer (described later in this document) has become the norm.
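The uptake adjustment can be written out as a short sketch. It assumes, as the paragraph above implicitly does, that the TFR boost scales linearly with the fraction of eligible couples who actually use IVF; the function name and the extra 25% scenario are mine, not the modeling paper's.

```python
# TFR gain from the modeling paper: IVF offered after 1 year, 100% uptake.
MODELED_BOOST_FULL_UPTAKE = 0.11


def adjusted_tfr_boost(full_uptake_boost, uptake):
    """Scale the modeled TFR boost linearly by IVF uptake (a simplifying assumption)."""
    if not 0.0 <= uptake <= 1.0:
        raise ValueError("uptake must be a fraction between 0 and 1")
    return full_uptake_boost * uptake


# 100% is the paper's scenario, 50% is the "more realistic" case in the text,
# and 25% is an extra illustrative value.
for uptake in (1.0, 0.5, 0.25):
    boost = adjusted_tfr_boost(MODELED_BOOST_FULL_UPTAKE, uptake)
    print(f"uptake {uptake:.0%}: TFR boost of about {boost:.3f}")
```

Even before adjusting for couples who want only one child, a boost of roughly 0.055 at 50% uptake is small next to the gap between current TFR and replacement.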

Solutions?

Dean Spears and his team proposed some project ideas:

a pilot project trialing very large baby bonuses (not a few hundred dollars, but something like 50k for a few years), ideally with a few different incentive sizes to get a sense of the demand curve.

Lyman was excited about this project as well.

A project focused on improving childcare technology, like making more technology like the automatic baby rocker (SNOO).

A good start to this project would be researching what parts of childcare are most costly and most time-intensive as a way to prioritize tech in those areas.
Lyman was not optimistic about this project, for the following reasons:

Automating one part of a multi-step process tends to increase demand for other steps in the process. Analogously, automating baby-rocking may simply increase parenting inputs at other steps, eg, perhaps SNOO-using parents will invest even more effort in reading to their children or in after-school programs.
It is unclear if an auto-rocker is “a valid developmental substitute for skin-to-skin touch with a parent”.
His tongue-in-cheek idea was “an arms control treaty...for parents to de-escalate the child-investment competition and agree to let all our kids be middling-to-fair.”

General source when something is unsourced:

Our World in Data

Producing gametes

Biochemistry and hormones

This section focuses on the basics of fertility, biochemistry, hormones, etc. that are necessary to understand the rest of the whitepaper.
As a general sourcing note, all unsourced information is from Speroff’s Clinical Gynecologic Endocrinology and Infertility 9th Edition, a textbook published in 2019.

Hormones are the signaling molecules used to coordinate biological activity on a large scale. In fertility, the relevant hormones are mostly steroid hormones and peptide hormones. Steroid hormones are variations on a core of three 6-carbon rings joined with one 5-carbon ring, and are sorted into 21-carbon (progestins and corticoids), 19-carbon (androgens), and 18-carbon (estrogens) groups, with varying functional groups making up the rest of the variation. They are derived from cholesterol.

Because steroid hormones are not water-soluble, the majority of steroid hormones are carried in the blood by proteins. For sex steroid hormones, sex-hormone-binding globulin, which is mostly made in the liver, carries them. However, the free fraction of a hormone, which is not carried by carrier proteins, is the biologically active component.

Figure 6.

A normal human ovary can produce all three sex steroid classes: estrogens, progestins, androgens.

My general impression is that the main actions of sex steroids are well-understood: their structure, their receptor transduction pathways, their degradation pathways, etc. This knowledge has translated into a variety of synthetic hormone analogs/drugs with varied effects, eg, tamoxifen, which has estrogenic effects on some tissues (endometrium, bone), and anti-estrogenic effects on breast tissue. There are estrogen analogs, SERMs, anti-estrogens, aromatase inhibitors, anti-progestins, and the equivalents for androgens (though SARMs are not clinically used). There is also an equivalent level of knowledge for the trophic hormones (hormones, mostly made in the hypothalamus and pituitary, which regulate the actions of other hormone-producing tissues), eg, GnRH, FSH, LH, hCG, and synthetic equivalents for all of them as well.

Measuring hormone levels is routine, though the currently used methods are not perfect. Some problems include autoantibodies causing hormone clumping and slightly different hormone isoforms having substantively different biological activity but showing up as the same on immunoassays. This is likely responsible for some diagnostic “fuzziness” and heterogeneity. A possible takeaway is that better hormone measurement techniques may yield unexpected fruit by improving diagnostic precision. For example, anti-Müllerian hormone (AMH) levels are, aside from age, the best predictor of ovarian reserve. If AMH levels are currently measured imperfectly, better assays might improve ovarian reserve prediction, which is something I’m somewhat interested in.

Endocrinology (the field of medicine focused on hormones) is sufficiently well understood that people with practically all varieties of hormone deficiency can be adequately sustained with synthetic hormones, though not perfectly^[17]. There are secondary effects of hormones that are less well-understood. For example, vasopressin, whose main effect is regulating kidney function and blood pressure, may have some important CNS/behavioral effects. Similar caveats apply to CNS effects of androgens and estrogens/progestins, which are real but not well-understood in humans. The onset/timing of puberty is also not well-understood, though leptin and kisspeptin likely play a role, and rising obesity rates (obesity increases leptin levels) likely play a role in secular changes in pubertal timing.

Hormone effects can vary substantially with the timing and duration of dosing. The best example is GnRH, which when given in pulsatile fashion initiates puberty and stimulates sex hormone production/release and ovulation, but if given continuously has the opposite effect, eg, delays puberty, shuts down sex hormone production, etc. My impression is that precise hormone timing is slightly less well-understood, but it is understood well enough to induce ovulation, provide safe and effective birth control, and increase uterine receptivity to implantation.

Development of gonads

Gonads are the organs that produce germ cells (gametes) and sex hormones. The knowledge of how they develop embryonically comes in large part from various disorders of sexual development (DSDs)^[18]. DSDs are relatively rare, with an estimated prevalence of ~1/4500, though a much more expansive definition of genital anomalies (including cryptorchidism and hypospadias) yields an estimate of 1/200. If the definition is widened further to include late-onset congenital adrenal hyperplasia (the majority of whom may be completely asymptomatic), Turner syndrome, and Klinefelter syndrome, then estimates of up to 1.7% can be obtained, though the majority of those affected individuals are not at all sexually ambiguous.
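The three prevalence figures above are quoted in different formats (1/4500, 1/200, 1.7%), which hides how far apart they are. A small sketch putting them on a common scale of cases per 10,000; the labels and function name are mine, and the inputs are the figures quoted in the text rather than independent estimates.

```python
# Prevalence estimates quoted in the text, from narrowest to broadest definition.
DSD_PREVALENCE = {
    "narrow DSD definition": 1 / 4500,
    "incl. cryptorchidism and hypospadias": 1 / 200,
    "incl. late-onset CAH, Turner, Klinefelter": 0.017,
}


def per_10k(prevalence):
    """Express a prevalence (fraction of individuals) as cases per 10,000."""
    return prevalence * 10_000


for label, prevalence in DSD_PREVALENCE.items():
    print(f"{label}: {per_10k(prevalence):.1f} per 10,000")

# How many times larger the broadest estimate is than the narrowest one.
broad_to_narrow_ratio = (
    DSD_PREVALENCE["incl. late-onset CAH, Turner, Klinefelter"]
    / DSD_PREVALENCE["narrow DSD definition"]
)
print(f"broadest / narrowest: about {broad_to_narrow_ratio:.0f}x")
```

The spread of well over an order of magnitude comes almost entirely from the choice of definition.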

Depending on the exact diagnosis, current assisted reproductive technology (ART) can sometimes assist people with DSD. My sense is that the “long tail” of specific reproductive disorders in both men and women will be very difficult to address without a technology that sidesteps/fixes gametogenesis wholesale like IVG. This is because they are very heterogeneous, both in outcomes and in causes, and a specific treatment would likely address only a small subset of fertility issues. Current ART can effectively address inability to carry a pregnancy (surrogacy), hormonal issues that make pregnancy difficult (ovulation induction), and somewhat address moderately low quantities of viable gametes (IVF + ICSI). Individuals who cannot make any viable gametes will also be helped by IVG.

An important distinction between male and female fertility is that newborn females start off with about 500 thousand to 2 million germ cells, which are constantly undergoing follicular atresia (a form of programmed cell death). Beginning at puberty, some are ovulated (~400-500 mature over a lifetime). There is some debate about stem cells possibly generating new oocytes after birth, but my impression is this didn’t pan out as a research direction. Males continuously produce new gametes beginning in puberty, though de-novo mutations increase with paternal age, and sperm counts and male fertility do decline somewhat with age.
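To make the scale of atresia concrete, here is a minimal sketch of the fraction of the birth germ-cell pool that is ever ovulated, using only the ranges quoted above (~400-500 ovulated, 500 thousand to 2 million present at birth). The function name is mine, and pairing the extremes of the two ranges is an assumption made to get outer bounds.

```python
def ovulated_fraction(ovulated, germ_cells_at_birth):
    """Fraction of the germ cells present at birth that are eventually ovulated."""
    return ovulated / germ_cells_at_birth


# Outer bounds, pairing the extremes of the ranges quoted in the text.
lowest = ovulated_fraction(400, 2_000_000)   # fewest ovulations, largest pool
highest = ovulated_fraction(500, 500_000)    # most ovulations, smallest pool

print(f"ovulated: {lowest:.2%} to {highest:.2%} of germ cells present at birth")
print(f"lost to atresia: at least {1 - highest:.1%}")
```

On these figures, at least 99.9% of the germ cells present at birth are lost to atresia rather than ovulated.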

Gametogenesis

Brief review of natural (in-vivo) gamete formation (gametogenesis) derived from a mix of work in mouse and human cells^[19], paraphrased/copied from here:

Primordial germ cells (PGCs) derive from the pluripotent epiblast, on embryonic day 6.5

These are not sex-specified yet
This differentiation is spurred by bmp4, and also governed by prdm1, prdm14, tfap2c, nanog

These PGCs migrate towards the embryonic gonads (which will become the testis or ovary), and proliferate. This proliferation involves genome-wide epigenetic reprogramming.
Once they reach the gonads, they undergo sex-dependent differentiation, expressing sex-dependent factors like dazl, ddx4 and others.

In the testis, PGCs continue to reproduce, and then they arrest at G1 and become prospermatogonia, and male-specific epigenetic modifications occur
In the ovaries, PGCs stop reproducing and enter meiosis, and become primary oocytes.

Some of the spermatogonia become spermatogonial stem cells that can renew and also differentiate
At the perinatal stage, 70% of the primary oocytes apoptose, and the remaining oocytes form primordial follicles with the surrounding somatic cells, termed “squamous pregranulosa cells”.
At puberty, some of the primordial follicles are periodically activated, and the oocytes then undergo oocyte growth, which involves storing lots of maternal protein/RNA in cytoplasm and undergoing female-specific epigenetic modification.
During this oocyte growth, they have a large nucleus, called a “germinal vesicle” (GV). Once oocyte growth reaches a plateau, oocytes resume meiosis, signaled by GV breakdown, and then arrest again at MII.

Ovulation

The most fertility-relevant hormones are: GnRH, FSH, LH, hCG, prolactin, estrogen, progesterone, testosterone, anti-Müllerian hormone (AMH), activin, and inhibin.

Ovulation is a tightly organized hormonal sequence whose fundamentals (FSH/LH surge, rise of estrogen, etc.) are well-understood, as they form the basis for the medical induction of ovulation, as well as hormonal birth control. There are likely still some possible improvements to the precise timing of some medications (GnRH, FSH, LH), since there is substantial interindividual variability of timing and some changes in timing that occur with age. A review of tailoring FSH dose for IVF based on biomarkers did not show any benefit for live birth rate, though it might reduce the incidence of ovarian overstimulation.

Because FSH and LH have different glycoforms with different levels of biological activity, and the timing of the FSH and LH surge matters for IVF, better hormone assays might improve our understanding of ovulation. Two schematics of the key hormonal cycle are shown below, with the difference between the two stemming from the moment that estrogen switches from inhibiting to stimulating FSH/LH production (the “FSH/LH surge”).

Schematic of pituitary and sex hormones before FSH/LH surge

Schematic of pituitary and sex hormones after FSH/LH surge

An important takeaway is that natural ovulation in humans generally results in a single dominant follicle, with the others undergoing atresia. Crudely put, IVF consists of giving enough hormonal support that more than one follicle becomes “dominant”, which can then be extracted in the egg retrieval procedure.

FSH increases the number of LH receptors and itself prepares follicles for further maturation. Follicles consist of a single oocyte and support (“granulosa”) cells. Progesterone augments pituitary secretion of LH and is responsible for the FSH response to GnRH. As progesterone levels keep increasing, this eventually feeds back and inhibits GnRH secretion. The number of follicles that grow each cycle depends on the “residual pool of inactive primordial follicles”.

The follicle that eventually gets recruited to undergo maturation/growth has actually begun recruitment 85 days before. The follicles in the cohort that goes through follicular growth undergo atresia unless they become the dominant follicle. Having high levels of FSH receptors allows the dominant follicle to survive the later drop in FSH. AMH inhibits primordial follicle growth but is also associated with higher ovarian reserve.

Lots of other local factors are involved in follicle maturation and survival, some of which are used in in-vitro maturation (IVM) experiments, like BMP, NGF, BDNF, NT-3/4/5, inhibin, activin, IGF-1. The process of follicle maturation requires angiogenesis of the dominant follicle. Some of these factors may not be totally necessary, since women with Laron dwarfism, who don’t produce IGF-1, are still fertile. Oocytes depend on neighboring granulosa cells to manage a lot of basic metabolic functions (supplying pyruvate, synthesizing cholesterol, etc.), with the products delivered through gap junctions. Genetic mutations in the growth factors, the gap junction proteins, etc. can all cause varying degrees/kinds of ovarian failure/infertility. The heterogeneity of possible genetic causes of female infertility represents a situation where a “side-stepping” approach like in-vitro gametogenesis seems likely to fix many causes at once.

In young women, each cohort of follicles that gets recruited is ~3-11 follicles per ovary. With high FSH levels, estrogen is the dominant follicular fluid substance, which is necessary for the follicles. Steroid hormone levels in the follicular fluid are orders of magnitude higher than in the blood, such that administration of estrogen into the blood would not influence local concentrations much. LH is important for the final stage of maturation because it speeds up androgen production in the dominant follicle (which can then be converted to estrogens) while simultaneously speeding up regression of the other follicles.

The key to selection of 1 dominant follicle is that high FSH sensitivity within the dominant follicle (through local estrogen causing more FSH receptor production), combined with negative feedback from high levels of systemic estrogen, causes all other follicles to lose gonadotropin support, because FSH levels drop. Decline in FSH causes a decline in FSH-dependent aromatase activity, which leads to a decline in estrogen, which causes the androgen-estrogen balance to swing towards androgens, which leads to atresia. It seems like FSH is much more important to follicle maturation, since you can effectively eliminate LH activity in primates and just use FSH alone to stimulate ovulation; the same thing has been done in gonadotropin-deficient women. For more on ovulation, pages 363-367 of the Speroff textbook have a clear and more detailed explanation.

Reproductive Aging

Female Reproductive Aging

Female infertility is more often^[20] the rate-limiting step in couple fertility than male infertility. Of all the causes of female infertility, reproductive aging is the most common and has the fewest available treatments. In addition, as people in high-income countries continue to delay childbearing, it will likely become more important going forward. For all those reasons, reproductive aging seems like an especially high-impact area to focus on. On the other hand, if interventions for reproductive aging were substantially more successful, it might lead to a compensatory rise in delayed childbearing, which might reduce the net benefit in fertility terms.

I present a quantitative summary of age-related declines in female fertility below. Some important takeaways:

The number of oocytes in the ovary declines with age, as does the number of oocytes collected per IVF cycle
The quality of an oocyte (as measured by the probability of live birth per embryo implanted) declines with age
The decline is gradual until after 35, and then accelerates. From here:

“biological ALB curve demonstrates that the average chance of involuntary childlessness slowly increases to 12% at 35 and 20% at age 38. From there this chance sharply rises to 50% at about 41 and reaches almost 90% at age 45.”

There is probably some decline in uterine function with age, but the decline is less dramatic than oocyte decline, and in cases of high risk to the mother, or evidence of severe uterine dysfunction, has an expensive but efficacious treatment in gestational surrogacy.

This high-quality source on the decline of female fertility with age is drawn from a dataset of historical natural-fertility populations, which did not restrict their fertility. The graph below illustrates the ALB (age at last birth) for women in these populations, that is, the age at which a woman is recorded to have had her last birth.

Figure 9.

There is substantial individual variation in this pattern and it is substantially heritable, with a moderate correlation between menopausal age (which follows after ALB by a few years) of mothers, daughters, and sisters. Though there have been some substantial changes in the environment that might affect ALB, such as better nutritional status and likely lower rates (and/or better treatment) of STI-related infertility, these data mostly match modern data well.
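As a rough aid to reading the ALB figures quoted earlier (12% at 35, 20% at 38, 50% at about 41, almost 90% at 45), here is a minimal sketch that interpolates between those four points. The straight-line interpolation and the function names are my own assumptions for illustration; the underlying curve is smooth and the quoted values are approximate.

```python
# Points quoted in the text: (age, approximate chance of involuntary childlessness).
ALB_POINTS = [(35, 0.12), (38, 0.20), (41, 0.50), (45, 0.90)]


def childlessness_chance(age: float) -> float:
    """Linearly interpolate between the quoted points (an assumption, not the source's curve)."""
    if age < ALB_POINTS[0][0] or age > ALB_POINTS[-1][0]:
        raise ValueError("age is outside the quoted range of 35-45")
    for (a0, p0), (a1, p1) in zip(ALB_POINTS, ALB_POINTS[1:]):
        if a0 <= age <= a1:
            return p0 + (p1 - p0) * (age - a0) / (a1 - a0)
    raise AssertionError("unreachable: age was range-checked above")


def yearly_increase(segment: int) -> float:
    """Average increase in risk per year of age within one quoted segment."""
    (a0, p0), (a1, p1) = ALB_POINTS[segment], ALB_POINTS[segment + 1]
    return (p1 - p0) / (a1 - a0)


if __name__ == "__main__":
    for i in range(len(ALB_POINTS) - 1):
        start, end = ALB_POINTS[i][0], ALB_POINTS[i + 1][0]
        print(f"ages {start}-{end}: +{yearly_increase(i):.1%} per year")
```

Under this reading, the risk rises by under 3 percentage points per year between 35 and 38, and by roughly 10 percentage points per year after that.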

There are other lines of data showing that fertility declines with age in women, even after controlling for factors like reduced intercourse frequency and increased male partner age. These include fertility data on extant modern populations that avoid birth control (Hutterites), women trying to conceive with donor sperm (which eliminates the older male partner effect), and rates of egg retrieval and success with IVF cycles.

Molecular mechanisms of female reproductive aging

The pathophysiology of female reproductive aging is an active area of research, but likely involves several mechanisms. This article provides an overview:

Follicular depletion
Progressive decline of meiotic competence
AGEs (advanced glycation end products)
DNA damage

I am more convinced that DNA repair is involved with reproductive aging than the other listed mechanisms since a GWAS confirmed an association between genes involved with DNA repair/maintenance and menopause timing, in addition to immune system function.

Mitochondrial dysfunction

More on this later.

Proinflammatory cytokines and “inflammaging”
Oxidative stress
Telomere shortening

Apart from follicle counts that decrease with age, a key observation that these mechanisms must explain is the rising rate of aneuploidy with age, which likely accounts for higher rates of miscarriage in older women, as well as the higher rates of Trisomy 21 in children of older mothers.

These mechanisms are all associated with aging in general and many of the proposed treatments, like rapamycin and dasatinib/quercetin, are being investigated for general anti-aging purposes. There is some promising animal data showing rapamycin can extend reproductive lifespan in mice, but no human data on most of these interventions, with CoenzymeQ as a minor exception. Many of these treatments, like rapamycin, would have to be trialed before conception, since they likely have some harmful effects on fetal development.

One potentially promising intervention is NAD+ repletion using NMN. There is very promising mouse data showing this can rescue female fertility in aged mice. However, there are no ongoing clinical trials on reproductive aging using NMN.

There are hormone changes that occur with age that are likely not causally linked to lower fertility, such as a rise in FSH and a decline in inhibins. A rise in FSH partially compensates for reduced FSH sensitivity.

Some more notes on timing: of the 300k-500k follicles remaining at the onset of puberty, only 400-500 end up undergoing ovulation. Follicular depletion speeds up with time. FSH rises and Inhibin-B, IGF-1, and AMH all decrease. The increase in FSH at first causes follicular growth to begin sooner, during the late luteal phase; later, anovulation becomes more common.

The number of follicles that mature is dependent on FSH levels and sensitivity to FSH. Control of ovum maturation is very complex, per Speroff:

“Events that yield an ovum for fertilization....are the products of essentially every regulating mechanism in human biology...classic endocrine signals, autocrine and paracrine/intracrine regulation, neuronal input, and immune system contributions.”

Though the increase is much less dramatic compared to the increase in de-novo mutations with paternal age, oocytes from older mothers probably have more de-novo mutations on average, which likely has a very small negative effect on offspring. This is in addition to the large increase in chromosomal abnormalities seen in oocytes from older women.

Current approaches to treating reproductive aging that are in active clinical use do not address the underlying pathologies and instead focus on “increasing the density of gametes”-- eg, using IVF to increase the number of oocytes and sperm that meet– or using donor eggs. The latter approach is very effective. IVF does improve pregnancy rates in older women compared to natural reproduction or other ART (like IUI) in older women, but the cost is high and the outcomes are still far from ideal. In addition, after the early 40’s, many IVF centers will not offer IVF at all, since outcomes become even worse. Because the number of healthy follicles is the rate-limiting step, simply increasing the dose of IVF hormones does not help with diminishing fertility, and has higher rates of side effects.

As part of the normal variation in reproductive aging, some women have substantially lower fertility even by their mid 30’s. At the extreme, if a woman undergoes menopause before the age of 40, which occurs with a prevalence of ~1%, this is termed primary ovarian insufficiency. About 10% of women are menopausal by 45; menopause tends to follow the onset of reduced fertility by about 13 years. These women have an especially hard time achieving pregnancy without donor eggs.

One possible solution to the problem of reproductive aging is increasing the proportion of women who use egg retrieval and cryopreservation earlier on in life, but this is very limited by:

costs of IVF and egg storage^[21];
availability of IVF centers (who may have long wait times at current levels of demand);
the risks, side-effects, and time-cost of undergoing IVF for women.

Diagnosing female infertility

How reliable are our methods for determining a woman’s ovarian reserve, and hence, her likely fertility? The high-level summary is that doctors have a variety of biochemical tests, imaging modalities, and genetic testing that can accurately diagnose specific causes of female infertility or subfertility. I will cover a few below.

However, our methods for accurately determining a woman’s ovarian reserve are much more crude.

A brief note regarding sensitivity/specificity: any test that is imperfect will incorrectly call some normal people “abnormal” and incorrectly call some abnormal people “normal”. Using a test with the same characteristics in different situations will affect how correct it is. If you use a test in a population with a high prevalence of a disorder, it will correctly call people “abnormal” more often. Since ovarian reserve and fertility diminish with age, the accuracy of prediction of those two traits will change with age.
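To make the dependence on prevalence concrete, here is a minimal sketch of the standard calculation of positive and negative predictive value from sensitivity, specificity, and prevalence. The test characteristics and prevalences in the example are hypothetical values chosen for illustration, not figures from any study.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(abnormal | test says "abnormal")
    npv = true_neg / (true_neg + false_neg)  # P(normal | test says "normal")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test: low sensitivity, high specificity.
    for prevalence in (0.05, 0.40):
        ppv, npv = predictive_values(0.30, 0.90, prevalence)
        print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

With the same hypothetical test, an "abnormal" result is far more likely to be correct in the high-prevalence group than in the low-prevalence group, which is why the same ovarian reserve test behaves differently at different ages.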

Tests are either biochemical or imaging. The important biochemical tests are FSH and AMH; the important imaging test is antral follicle count.

Biochemical: FSH & AMH

High FSH levels means high probability of poor response to ovarian stimulation, but normal FSH levels don’t mean much, from Speroff:

With current assays (using IRP 78/549), FSH levels greater than 10 IU/L (10–20 IU/L) have high specificity (80–100%) for predicting poor response to stimulation, but their sensitivity for identifying such women is generally low (10–30%) and decreases with the threshold value.222

AMH levels correlate with residual follicle pool. From Speroff:

The performance of AMH as a screening test of ovarian reserve has been examined in the general IVF population and in populations of women at low or high risk for DOR. Overall, lower AMH levels have been associated with poor response to ovarian stimulation and low oocyte yield, embryo quality, and pregnancy rates,242,243,267,268,269 but studies correlating mean AMH levels with IVF outcomes have not yielded threshold values that can be applied confidently in clinical care,226,243,245,267 and more recent studies failed to show an effect of low AMH levels on pregnancy rate, when corrected for age.270,271 In the general IVF population, low AMH threshold values (0.2–0.7 ng/mL) have had 40–97% sensitivity, 78–92% specificity, 22–88% PPV, and 97–100% NPV for predicting poor response to stimulation (≤3 follicles or ≤2–4 oocytes), but have proven neither sensitive nor specific for predicting pregnancy.242,272,273,274 AMH is a very promising screening test for DOR but is likely to be more useful in a general IVF population or in women at high risk for DOR than in women at low risk for DOR. Low threshold values have good specificity for poor response to ovarian stimulation, but not for predicting pregnancy.

Clomiphene Citrate challenge: if you administer clomiphene citrate, women with lower inhibin levels have an exaggerated FSH response. But it probably doesn’t work better than basal FSH alone, so it is being phased out.

Imaging is another approach, primarily antral follicle count (AFC), in which the number of follicles is counted with transvaginal ultrasound; it also has high specificity and low sensitivity.

Other more complex tests of ovarian function have been tried but the results have been similar to antral follicle count alone.

From Speroff, summarizing the literature:

as different tests of ovarian reserve are highly correlated, using more than one measure in a prediction model does not necessarily improve its performance.228,244,286 The use of combined tests will not only increase the cost of testing but also complicate clinical decision-making. A combination of AMH, inhibin B, AFC, and ovarian volume was not found to be a better predictor of response to stimulation than only AFC and AMH alone.289 A meta-analysis of cohort studies investigating the performance of various combinations of tests concluded that models combining tests do not perform significantly better than individual tests such as the AFC.309

Predicting female infertility

Better prediction of which women will have earlier-onset subfertility would be useful for advising earlier pregnancy in those women or offering fertility preservation. One approach that seems somewhat promising is developing polygenic risk scores for early menopause and related phenotypes. Some work on this has been done already.

This recent study developed a polygenic risk score (PRS) for Primary Ovarian Insufficiency that, in the top 1% of women, confers a 4.5x risk, equivalent to that of canonical monogenic causes of POI, like FMR1, though FMR1 is about 2.5x more rare (occurring with a 1/250 prevalence). However, many of these women likely have a family history of early menopause, so it is unclear to me how much extra utility there is in current polygenic scores relative to family history alone. Larger sample sizes, more diverse cohorts, and deeper phenotyping (if possible– it's unclear how realistic obtaining Antral Follicle Counts for a few thousand women in a biobank would be) would all likely improve these scores.

There are also numerous specific conditions that can cause female infertility. The TL;DR is that the standard workup for female infertility will identify women with hormonal causes of infertility, such as hyperprolactinemia, hypothyroidism and PCOS, and structural causes of infertility, such as an obstructed fallopian tube. In most cases, these can be treated relatively easily, a demonstration of the maturity and utility of current ART.

Figure 10.

When the cause is localized to the uterus, as a last resort, gestational surrogacy (and now, increasingly, uterine transplantation) is an (expensive) possibility. Some structural issues can be surgically repaired or bypassed with IVF. PCOS can be treated with ovulation induction (though these patients run a higher risk of ovarian hyperstimulation syndrome), while other hormonal issues (like hyperprolactinemia) are addressed differently but, as explained in the endocrinology section, can generally be treated well.

By contrast, Premature Ovarian Failure^[22], whether through genetic causes (eg, FMR1) or radiation or chemotherapy exposure, does not have good treatment options besides using donor eggs. Women without ovaries or without ovarian follicles are in about the same situation, though the former also require hormonal support during pregnancy and for general health. There is a possibility that the residual follicles found in women with premature ovarian failure (from whatever cause) might eventually be useful with some future form of in-vitro maturation (IVM) technology.

There is at least 1 successful case study involving in-vitro maturation of immature oocytes and another approach involving intentional fragmentation of some ovarian follicles combined with drug treatment (Akt stimulation) in a woman with POF resulting in a live birth. While I think IVM might be somewhat promising for patients with some residual follicles, it seems very unlikely to work for patients who have undergone menopause already.

Another category of infertility is “unexplained infertility”, which overlaps with age-related infertility. There are good diagnostic tests available for ovulatory function, ovarian reserve, uterine function, and tubal patency. However, we lack good predictors of gamete function and implantation ability, so problems in those areas likely explain a big chunk of unexplained infertility. Some possible causes then include recurring genetic defects in gametes and endometrial function abnormalities.

There are some uterine abnormalities that reduce fertility somewhat, and there are now low-risk surgeries (hysteroscopic surgeries) that can fix those problems, so fixing them is recommended. The evidence for treating myomas is better than the evidence for correcting anatomic uterine differences, which may simply be normal variation in uterine shape. From Speroff:

In sum, the accumulated body of evidence indicates that submucous myomas reduce IVF success rates by approximately 70% and intramural myomas by approximately 20–40%, and subserosal myomas have no adverse impact on outcomes. Submucous myomas increase risk for miscarriage after successful IVF at least threefold and intramural myomas by more than half.

Younger women who want more kids are better candidates for surgical treatments of uterine issues since those surgeries are a 1-time cost and IVF is a per-cycle cost.

The treatment for unexplained infertility is similar to age-related infertility: “increase gamete density”; bring together more eggs and sperm and hope that a healthy embryo will eventually successfully implant. Per Speroff, infertility causes by numbers are:

The major causes of infertility include ovulatory dysfunction (20–40%), tubal and peritoneal pathology (30–40%), and male factors (30–40%); uterine pathology is relatively uncommon, and the remainder is largely unexplained.

Male reproductive aging

Males experience reproductive aging. This consists mostly of decreasing sperm counts with age and increasing rates of de novo mutations, though this is somewhat complicated by heterogeneity among older men. Sperm quality as measured by motility and other phenotypes also falls with age.

Rates of live pregnancy do decrease somewhat with paternal age, even after controlling for maternal age, but the effect is small. Thus, male fertility does decline somewhat with age, but in contrast to female fertility, which faces a hard limit years before menopause, men have had biological offspring well into their 80’s.

The number of de novo mutations in sperm and offspring increases with paternal age, which likely has consequences on offspring phenotype. There is some evidence that men differ substantially on this trait, such that some men produce sperm with many more de novo mutations at similar ages.

Potential projects in reproductive aging

Develop better predictors of embryo implantation success. This could focus on better prediction of embryo quality or uterine receptivity.

As described above, we lack good predictors of gamete function and it seems likely that differences in gamete function explain a large proportion of unexplained infertility. There are several possible approaches to this:

Developing large datasets of embryo genetic data, perhaps through multiple genetic testing companies/labs pooling data, and subsequent live birth rate, and trying to develop polygenic risk scores for embryo implantation success. (this idea is indebted to Steve Hsu)
Using machine learning on video of embryos in culture to see if any ML models can be developed that can predict embryo implantation success. This has been tried, but I’m unsure if the groups doing this had sufficiently large sample sizes and ML expertise. Because this is something that benefits significantly from scale, a large-scale project could be a natural Schelling point for many large consortiums to pool data and expertise.

Uterine receptivity seems poorly understood as well. Better hormone monitoring and/or more experimental methods of monitoring uterine function (eg, single-cell transcriptomics of uterine tissue, focusing on immune function/rejection) might yield data on what predicts uterine receptivity and how to increase it.

Interventions

In this section on IVF and IVG, I’ve deviated from a pure chronological sequence, since explaining IVG is best done with the context of IVF and IVM.

In-Vitro Fertilization

What is IVF?

In-Vitro Fertilization is the fertilization of an oocyte outside of the body, as opposed to natural fertilization. The first step is retrieval of one or more mature oocytes (“eggs”) from the woman, followed by fertilization, culture in a laboratory setting, and subsequent implantation.

As additional context, a successful live birth necessitates:

1. production of gametes (gametogenesis),
2. maturation of gametes,
3. bringing gametes together (fertilization),
4. implantation of the resulting zygote in a uterus,
5. carrying a pregnancy to term,
6. delivery.

IVF intervenes at steps #2, #3, and #4 by increasing maturation and decreasing destruction of already extant gametes in females, physically retrieving those oocytes, bringing them into direct physical contact with sperm, and then placing the resulting embryo in a uterus for implantation.

A brief note on outcomes– the desired end goal of couples undergoing ART is a live birth of a healthy child. Because tracking live births involves around 9 months of waiting after an ART intervention, some ART studies do not report live birth rates. Instead, they may report related outcomes, such as pregnancy. Depending on when pregnancy is measured, this has a moderate or strong relationship to a live birth. Early on, pregnancies have a high rate of failure– pregnancies closer to delivery, however, are more likely to result in a live birth. I will make a note of which outcome is being reported when appropriate.

Egg retrieval takes place shortly before ovulation and is performed via transvaginal aspiration^[23] with ultrasound guidance. The majority of IVF cycles involve exogenous hormone administration, termed “ovarian stimulation”, which substantially increases the number of oocytes available for retrieval, but has some risks, namely ovarian hyperstimulation syndrome and much higher rates of twin (or higher order) pregnancies, though the latter can be largely prevented with single-embryo transfer. Other IVF cycles are natural IVF cycles (also referred to as in-vitro maturation), which don’t use any exogenous ovarian stimulation, but have a lower per-cycle success rate, mostly because they yield far fewer oocytes per cycle.

The IVF process may confer additional risk for preterm birth and some other perinatal conditions, though the data are not totally clear on how much of the higher risk in IVF offspring is a selection effect (couples who would have worse perinatal outcomes anyway being more likely to use IVF) versus a treatment effect. The data including sibling controls still show a mildly harmful effect of IVF, and seem solid.

Oocytes can be cryopreserved for later use, fertilized immediately and transferred “fresh”, or fertilized and the subsequent embryos frozen for later use. Fertilization can occur through either incubating the oocyte(s) with many sperm, and letting a more natural fertilization process occur, or using intracytoplasmic sperm injection, which can effectively treat many types of male infertility. Embryos are generally grown for either 3 (cleavage stage) or 5 (blastocyst stage) days before being transferred back to the woman for implantation.

A schematic of the IVF process is shown below.

Figure 11.

Source

Before IVF

There are a number of ARTs available before full-fledged IVF. These involve optimizing the frequency and timing of insemination, exogenously stimulating ovulation, or inserting sperm directly into the uterus, as well as a variety of surgeries that are performed infrequently.

Timing intercourse

For in-vivo fertilization to occur, sperm must contact oocytes within a certain time frame. Since sperm have a lifespan of about 3-5 days and unfertilized oocytes a lifespan of about 12-24 hours, insemination must occur within 3-5 days before and ~ 1 day after ovulation, when the oocyte is released from the ovary. As Speroff notes, different methods of estimating ovulation timing will yield different results. Timing intercourse is a low-cost intervention, but a Cochrane Review meta-analysis found meager benefit to doing so.
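The timing arithmetic above can be written out as a small sketch. The function name and the default lifespans (5 days for sperm, 1 day for the oocyte, taken from the upper ends of the ranges above) are my own choices for illustration, and the estimated ovulation date is itself uncertain, as noted.

```python
from datetime import date, timedelta


def fertile_window(ovulation: date, sperm_lifespan_days: int = 5,
                   oocyte_lifespan_days: int = 1) -> tuple[date, date]:
    """Return the (first, last) day on which insemination could plausibly lead to fertilization.

    Assumes the ovulation date is known exactly, which in practice it is not.
    """
    first_day = ovulation - timedelta(days=sperm_lifespan_days)
    last_day = ovulation + timedelta(days=oocyte_lifespan_days)
    return first_day, last_day


if __name__ == "__main__":
    start, end = fertile_window(date(2024, 3, 15))
    print(f"fertile window: {start} to {end}")
```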

Induction of ovulation

For women whose sole fertility problem is irregular or absent ovulation, inducing ovulation is reasonably effective. Precisely quantifying the benefit of inducing ovulation over no treatment in anovulatory women is difficult, since ovulation induction is now considered standard of care. However, Speroff states that per-cycle fecundability rates of about 15-22% can be achieved with clomiphene, which is close to that achieved by normal fertile couples.

The rates of twin births, which are riskier for both mother and babies, are substantially higher in clomiphene-induced births than natural births, at rates of about 7-10% and 1.25% (though note the regional variation in twinning rates), respectively. Higher-order births are also more common, though they are still rare in absolute terms.

There are several different medications that can be used to induce ovulation. HCG is often used in combination with clomiphene. Letrozole, an aromatase inhibitor, is also used, particularly in women with PCOS, where it may result in higher live birth rates. FSH and LH, and some analogs with different pharmacokinetics, are also available. Some protocols also use GnRH agonists or antagonists to suppress endogenous gonadotropin production.

Compared to IVF, the major downside of inducing ovulation alone is the high risk of multiple pregnancies. This can be mitigated (not eliminated^[24]) with single embryo transfer in IVF, in which only one embryo at a time is transferred for implantation.

Intrauterine insemination

Intrauterine insemination is the placement of sperm directly in the uterus and may be the oldest form of ART:

Before this, in 1770, John Hunter described the first case of human intravaginal insemination because of severe hypospadias. In the mid-1800s J. Marion Sims reported on 55 intravaginal inseminations. Only one pregnancy occurred..

Because of the low cost, it may also be the most widely used, particularly in lower-income countries. The evidence for IUI’s superiority over natural insemination is low-quality, per the WHO report above, but IUI is a convenient option for same-sex female couples or single women who are using donor sperm. It is used for unexplained infertility and for some mild cases of male infertility.

IVF success over time

Success rates for IVF (as measured by pregnancy rates) were around 6% per cycle in the early 1980’s (the first IVF cycles did not use any hormone stimulation) and reached about 30% by 1983 as hormone stimulation became routine and more oocytes were retrieved per cycle. Apart from refinement of the hormone protocol, advances in embryo cryopreservation, which led to improved rates of embryo survival after thawing, likely helped, as did a better understanding of appropriate culture media for embryos and, perhaps, the move towards transferring embryos at day 5 instead of day 3.

The less invasive transvaginal ultrasound method of egg retrieval, as opposed to the laparoscopic approach, made IVF an outpatient procedure that could be performed in an office setting in about 15 minutes of procedure time, instead of a 1-2 hour operation in a hospital requiring anesthesia. A more recent change has been a move towards single embryo transfers, in which only one embryo is transferred for implantation at a time–this brings the risk of twin (or higher-order) pregnancies to natural conception levels. At least in the US, single embryo transfer has become the norm, moving from 18% in 2010 to 77% in 2019, per CDC reporting (derived from data reported by all IVF clinics in the US):

Figure 12.

IVF success rates globally, as measured by live birth per cycle, appear to have peaked around 2009 at 30% and declined to about 22% by 2016. There are several possible explanations:

Changes in IVF practice that reduce live birth rates overall. Some possibilities here include:

The widespread use of PGT-A, which reduces the number of embryos available to transfer
Using lower-dose or “natural-cycle” IVF, which has fewer side effects and risks but may result in fewer oocytes retrieved per cycle, and therefore, available for transfer.
Widespread use of “all-freeze” cycles, in which no embryo is transferred fresh (and some embryos are lost in the freezing and thawing process).

Changes in IVF practice that reduce per-cycle live birth rates, but don’t change cumulative live birth rates (eg, transferring embryos 1 at a time)

This study from the UK appears to support this view, finding that cumulative live birth rates were higher in 1999 to 2007 vs 1992 to 1998.

Changes in patient selection, such that older women are using IVF more.
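The distinction between per-cycle and cumulative live birth rates in the list above can be made concrete with a small sketch. It assumes each embryo has the same, independent chance of producing a live birth, which is a simplification, and the 30% per-embryo figure is hypothetical.

```python
def at_least_one_success(p_per_embryo: float, n_embryos: int) -> float:
    """Chance that at least one of n embryos leads to a live birth, assuming independence."""
    return 1 - (1 - p_per_embryo) ** n_embryos


if __name__ == "__main__":
    p = 0.30  # hypothetical per-embryo live birth probability

    # Two embryos transferred together: success is counted in a single transfer.
    double_transfer = at_least_one_success(p, 2)

    # Two embryos transferred one at a time: each transfer looks less successful...
    single_transfer = at_least_one_success(p, 1)
    # ...but the cumulative chance across both transfers is unchanged.
    cumulative_two_singles = at_least_one_success(p, 2)

    print(f"per transfer, two embryos at once: {double_transfer:.0%}")
    print(f"per transfer, one embryo at a time: {single_transfer:.0%}")
    print(f"cumulative over two single transfers: {cumulative_two_singles:.0%}")
```

Under these assumptions, moving to single embryo transfer lowers the per-transfer rate without changing the cumulative rate, which is the pattern the second possibility describes.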

Figure 13.

Source

On the other hand, data from Sweden show an improvement in cumulative live birth rate per oocyte^[25] retrieved from 2007 to 2017, coinciding with increasing use of newer methods of embryo freezing (vitrification instead of slow freezing) and prolonged embryo culture methods. CDC (USA) data appear to show an improvement in IVF success rates from 2010 to 2019, as measured by % of ART cycles that result in live-birth deliveries:

Figure 14.

Some US data appear to support some decline in IVF success rates: in Figure 6 of a CDC report, the number of live-birth deliveries has not increased as much as the number of ART cycles:

Figure 15.

On the other hand, this might be better explained by more banking cycles which have not yet translated into live births.

I have not investigated this question in enough depth to be confident, but my guess is that if the decrease in IVF success is real, which I’m not sure about, changes in patient population are the most important factor, followed by widespread use of PGT-A. There may be better quality data that can firmly answer this question, but I could not find a definitive answer.
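To illustrate how a change in patient population alone could lower the headline success rate, here is a minimal sketch. All age bands, per-cycle rates, and patient mixes below are hypothetical numbers chosen for illustration, not data from any registry.

```python
def overall_rate(rates: dict[str, float], shares: dict[str, float]) -> float:
    """Overall live birth rate per cycle as a share-weighted average of age-specific rates."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("patient shares must sum to 1")
    return sum(rates[group] * shares[group] for group in rates)


# Hypothetical age-specific live birth rates per cycle, held fixed over time.
RATES = {"under_35": 0.40, "35_to_40": 0.25, "over_40": 0.10}

# Hypothetical patient mixes: the later mix has more older patients.
MIX_EARLY = {"under_35": 0.60, "35_to_40": 0.30, "over_40": 0.10}
MIX_LATE = {"under_35": 0.40, "35_to_40": 0.35, "over_40": 0.25}

if __name__ == "__main__":
    print(f"earlier mix: {overall_rate(RATES, MIX_EARLY):.1%}")
    print(f"later mix:   {overall_rate(RATES, MIX_LATE):.1%}")
```

In this toy example the overall rate falls by several percentage points even though no age group does any worse, so a definitive answer would need age-stratified data.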

IVF Use Trends

Regarding ART in general, there was an increase in the use of infertility services among women aged 15-44 years from 9% in 1982 to 15% in 1995, which declined to 12% in 2002, increased to 16.8% in 2010 and was 14.3% in 2019. This was largely because of delays in childrearing and an increase in availability of ART services. However, this includes all ART services, a broader category than IVF. In addition, from 2000 to 2014, the mean age of first-time mothers increased 1.4 years from 24.9 to 26.3.

For IVF in particular, in the US in 2010, infants born through IVF accounted for about 1.5% of all infants born that year, with considerable between-state variation, from “0.2% in Puerto Rico to 4.7% in Massachusetts”; by 2019 the nationwide average had reached 2.1%, with 0.5% in Puerto Rico and 5.5% in Massachusetts. Europe in 2010 had generally higher rates than the US, ranging from 0.6% in Moldova to 5.9% in Denmark. An update in 2017 (the latest available) found similar rates in Europe, but Spain had reached 7.9% of all infants born that year being born through IVF.

Since at least 1997, and until 2010 there has been a year-over-year increase in the number of infants in Europe born through IVF. In Israel, infants born through IVF went from 2.5% of all infants in 1997 to 4.1% in 2010.

I am very unsure what the long-term proportion of infants born through IVF will be. Spain’s rate of ~8% in 2017 may have been the highest globally (though Denmark reached 10% per a 2018 news article), and may have increased since, though more recent systematic data do not appear to be available. Several European countries are within 1-2 percentage points of Spain’s rate.

The most important factor that will likely increase the proportion of infants born through IVF is age at first birth continuing to increase. In the US, if more states adopt generous insurance coverage policies for IVF, that would also likely increase IVF rates.

All things being equal, substantial advances in IVF success rates and reductions in cost would also increase IVF uptake. More speculatively, some IVF add-ons^[26], like embryo selection against polygenic diseases, may make IVF a more attractive option than natural conception even for couples that do not have infertility issues. For example, if IVF + embryo selection can reduce the risk of certain currently unpreventable diseases, such as Alzheimer’s disease or schizophrenia, some parents may choose to undergo IVF for the purpose of using embryo selection.

A drug that could reduce the burden of reproductive aging, particularly for women, might reduce IVF use, as it would reduce the number of women who underwent IVF for age-related infertility. A technology like in-vitro gametogenesis, if it was cost-competitive with IVF, would also likely reduce IVF use, as it would avoid the risks of exogenous hormone administration involved with IVF.

Insurance coverage for infertility treatment

One factor that likely accounts for large cross-national differences in proportions of infants born through IVF is insurance coverage and affordability.

About 11 percent of women and 9 percent of men experience difficulty with fertility^[27]. It’s estimated that 85 percent of IVF expenses are paid out of pocket. Only 17 states legally require insurers to cover or offer coverage for infertility diagnosis and treatment though to varying degrees. According to proprietary data from FertilityIQ, a digital database for information about fertility benefits and treatments, most patients spend $40,000-60,000 on IVF, the most common assisted reproductive technology (ART), and 56 percent of IVF patients have no insurance coverage for their treatment.

Given the high costs, if more states were to require insurers to cover fertility treatments, it seems plausible that their use would increase somewhat. Massachusetts requires insurers to cover ART and has the highest percent of babies born from ART in the US, reaching levels similar to those of Denmark– though it is only one of the 17 states with some form of coverage requirement, and I have not seen data systematically comparing coverage to IVF uptake rates.

A reason for future optimism

A reason for optimism regarding further improvements to the IVF process comes from the following observation: IVF research is generally underpowered for the effects it purports to detect. It is likely that some small and medium-size effect improvements to the IVF protocol have still not been identified. Some current add-ons to the IVF process, which add to the cost, are likely superfluous, or possibly even mildly harmful, so removing them could reduce IVF costs somewhat.
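To give a rough sense of what "underpowered" means here, the sketch below uses the standard normal-approximation formula for comparing two proportions to estimate how many patients per arm a trial would need. The baseline and improved live birth rates are hypothetical placeholders, not figures from any trial discussed in this whitepaper.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(p_control: float, p_treat: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm for a two-sided test of two proportions
    (normal approximation, equal allocation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    effect = p_treat - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# HYPOTHETICAL per-cycle live birth rates: 30% baseline vs. small/medium/large gains
for p_treat in (0.33, 0.35, 0.40):
    print(f"30% -> {p_treat:.0%}: about {n_per_arm(0.30, p_treat)} patients per arm")
```

Under these assumed rates, detecting a three-percentage-point improvement requires several thousand patients per arm, which is far larger than the small trials described in this section.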

In conversation with Steve Hsu, he noted that many trials that clinicians use to justify add-ons are quite small and/or low-quality and recommended a “reproducibility center” that would focus on reliably improving per-cycle success rates. He is also optimistic that improvements in embryo screening can improve per-embryo implantation rates, since current aneuploidy screening methods are very crude and have high rates of technical failure (which clinical labs report as “aneuploidy”).

I spoke with Jack Wilkinson, a statistician who works on ART methodology, who echoed these concerns. He also raised other issues:

ART often involves multiple stages of treatment, which are trickier to evaluate from a statistical perspective (especially for clinicians without solid statistical training).
The use of many different outcome measures (live birth, per-embryo/cycle success rates), which introduces substantial researcher degrees of freedom.
Heterogeneity in outcome reporting, which makes subsequent meta-analyses difficult to interpret, as different studies have used different definitions.
The use of surrogate measures that are not thoughtfully chosen (eg, intracytoplasmic sperm injection (ICSI) reduces fertilization failure rates but, in couples that don’t have male-factor infertility, may not increase live birth rates).
Clinicians and funders who (wrongly) think that large-scale observational data (“Big Data”) combined with machine learning can replace RCTs.

A lucid summary of these methodological concerns, as well as possible solutions, can be found here. Apart from the usual reasons for a “replication crisis” in a scientific field, Jack attributed this to a few things:

The incentives of IVF clinics, which benefit from marketing unproven add-ons to customers for extra income and to differentiate themselves, and the lack of regulatory scrutiny of clinics.
A lack of deep knowledge of research methodology on the part of the physicians who design and run ART trials, as well as peer reviewers and journal editors, even though these trials are as challenging to understand and design as any in research.

Some solutions

Over email and in conversation, Jack and I touched on ways to address these problems. All credit to him (and blame to me):

Education

Educating clinicians to be more skeptical consumers of research, eg, that “big data + ML” cannot replace RCTs. Doctors (at least in the US) are required to take a certain amount of CME credits for continued maintenance of their medical license, so developing a “research literacy” course that qualifies for CME credits might be one high-leverage place to intervene.

Regulatory

Regulating how clinics market add-on treatments with no strong evidence. A partial step in this direction was the UK regulator instituting a traffic light system to explain the levels of evidence behind IVF add-ons. Jack was pessimistic regarding this being a realistic goal because the reaction of clinicians and embryologists was mixed even to the traffic light system described above^[28].

Funding

Funders (government, eg, NIH) ought to award funding for large and simple RCTs on fertility interventions. Funding should be set aside explicitly for this purpose and awarded to teams with a track record of carrying out large RCTs.
Funders who are supporting RCTs should require a methodological collaborator on grants.
Funders should fund methodologists like Jack to investigate some especially tricky problems that fertility trials run into, like multi-stage treatment and participants receiving multiple treatments.

Journals

Pushing for the routine use of methodological peer reviewers (not just clinical peer reviewers) in prominent fertility journals
Trying to standardize outcome reporting in prominent fertility journals
Moving to pre-registration for more research beyond RCTs, eg, pre-registration of in-vitro studies.

Misc.

Steve Hsu proposed a project focused on coordinating many IVF centers to try different tweaks to the IVF protocol. Jack raised a similar idea, focusing on embryo culture mediums, which vary between centers and have not been rigorously evaluated. Specifically, he proposed cluster RCTs, randomizing different centers to receive different culture mediums, which reduces the administrative burden of running trials for clinicians.
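As a minimal sketch of what cluster randomization means in practice, the snippet below assigns whole centers (rather than individual patients) to culture media while keeping the arms balanced. The center IDs, medium names, and the `randomize_centers` helper are all hypothetical; a real trial would likely also stratify by center size or baseline success rate.

```python
import random


def randomize_centers(centers, media, seed=0):
    """Assign each center (cluster) to exactly one culture medium.

    Centers are shuffled with a seeded RNG for reproducibility, then dealt
    round-robin across media so arm sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(centers)
    rng.shuffle(shuffled)
    return {center: media[i % len(media)] for i, center in enumerate(shuffled)}


centers = [f"center_{i}" for i in range(1, 9)]  # hypothetical clinic IDs
media = ["medium_A", "medium_B"]                # hypothetical culture media
allocation = randomize_centers(centers, media, seed=42)

for center in sorted(allocation):
    print(center, "->", allocation[center])
```

Because every patient at a given center receives the same medium, clinicians do not need to randomize or consent patients individually to a medium, which is the administrative saving described above. The analysis then has to account for clustering within centers.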

Some more speculative ideas I’m interested in:

Funding a small group to more aggressively market the UK “traffic light” system by naming and shaming clinics who have especially aggressive/deceptive marketing of IVF add-ons. The UK government’s Competition and Markets Authority has now issued directives for fertility clinics on how they can market and advertise their services.
Some US states mandate insurance coverage for ART, to varying degrees. Investigate how coverage policies were formed, and push to make insurance coverage available only for treatments with good evidence^[29].
Reaching out to influential and well-networked science policy leaders like Stuart Buck (who funded the Reproducibility Project, focused on psychology) and making them aware of the reproducibility crisis in fertility.
Helping organize and fund the equivalent of SIPS (Society for the Improvement of Psychological Science) for fertility, to raise awareness in the field.

Beyond IVF

In-vitro maturation in clinical application

A more incremental type of in-vitro maturation is already in clinical use, though less so in America. It is not capable of maturing implantation-competent oocytes from primordial follicles (eg, from slices of ovarian tissue), but it can take immature oocytes that have not been primed by exposure to high-dose exogenous LH or HCG and mature them well enough to result in live births. Practically speaking, the current protocol for IVM usually involves some exposure to either HCG or FSH, but only once, resulting in less exposure to exogenous hormones.

This is a useful modality for women who are either more likely to face side-effects from traditional IVF cycles (eg, women with PCOS who have high rates of ovarian hyperstimulation syndrome) or who require fertility preservation very urgently (women with some cancers) and can’t undergo a full-length IVF cycle. It is also more affordable per cycle since it uses less medication. However, it results in fewer embryos per cycle, which makes downstream ART that relies on embryo numbers (embryo selection and editing) less effective. Compared to IVF, it has a slightly lower or similar implantation rate and a higher miscarriage rate. For children with cancer, IVF is not an option, as they have not begun puberty. IVM offers the possibility of fertility for them as well, though this work is very preliminary, and as of 2020, no patients with pediatric cancers had live offspring through this method. This appears to be the result of lower oocyte quality in pediatric ovaries.

Figure 16.

A recent cost-effectiveness analysis of IVM vs IVF found “IVM is more cost-effective than IVF at a willingness-to-pay up to €18000 for an additional child. Above €18000 IVF became more cost-effective”, a finding driven by the lower cost of IVM but the higher effectiveness of IVF, as well as a lower rate of side effects for IVM.
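The logic behind a willingness-to-pay crossover like this can be sketched with a net monetary benefit calculation. The costs and live birth probabilities below are hypothetical placeholders, chosen only so that the crossover lands at the quoted €18,000; they are not the study's inputs.

```python
def net_monetary_benefit(wtp: float, cost: float, live_birth_prob: float) -> float:
    """Value of the expected outcome at a given willingness-to-pay, minus cost."""
    return wtp * live_birth_prob - cost


# HYPOTHETICAL inputs (euros, probability of a live birth per treatment course)
IVM = {"cost": 3000.0, "live_birth_prob": 0.25}  # cheaper, less effective
IVF = {"cost": 7500.0, "live_birth_prob": 0.50}  # costlier, more effective


def preferred(wtp: float) -> str:
    """Return whichever option has the higher net monetary benefit at this WTP."""
    nmb_ivm = net_monetary_benefit(wtp, **IVM)
    nmb_ivf = net_monetary_benefit(wtp, **IVF)
    return "IVM" if nmb_ivm >= nmb_ivf else "IVF"


# Crossover: extra cost of IVF divided by its extra effectiveness
threshold = (IVF["cost"] - IVM["cost"]) / (IVF["live_birth_prob"] - IVM["live_birth_prob"])
print(f"crossover willingness-to-pay: {threshold:.0f} euros per additional child")

for wtp in (10_000, 25_000):
    print(wtp, preferred(wtp))
```

The point of the sketch is only that a cheaper but less effective treatment wins below some willingness-to-pay and loses above it. The published analysis also accounts for side effects, which this toy version ignores.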

A recent non-inferiority randomized trial comparing IVM to IVF in a select patient population^[30] found a lower cumulative pregnancy rate at 12 months for IVM vs IVF, driven by a lower number of embryos obtained with IVM. It is important to note that multiple rounds of IVM would of course increase the number of embryos obtained, and likely increase the pregnancy rate, but would erode the cost-effectiveness and convenience of IVM (IVM involves fewer hormone injections) vs IVF.

Figure 17.

Cumulative Pregnancy rate since randomization, IVM vs IVF, from here.

Per conversation with an expert in this field (Dr. Robert Gilchrist), he thought that substantially fewer resources had been invested in IVM vs traditional IVF, making naive comparisons unfair– that is, IVF is a relatively more mature technology that has been more optimized than IVM. Hypothetically, if IVM’s success rates were equivalent to IVF, the lower cost, reduced rate of side effects, and reduced number of injections would make it clearly superior. I did not investigate IVM in sufficient depth to be confident in this argument, but it does seem plausible. Before supporting IVM research, I would recommend research on IVM success rates in animal breeding to see if it has achieved success rates comparable to IVF in that setting.

Since competence in IVM is effectively required for IVG, I think better basic science understanding of IVM-relevant topics would help IVG as well.

IVM “all the way”

A more speculative application of IVM is maturing oocytes derived from slices of ovarian tissue, a technique also referred to as “ovarian tissue oocyte IVM” or “OTO-IVM”, as well as “in-vitro culture”. This ties into the topic of ovarian transplantation, which I cover below with the assistance of my colleague Mackenzie Dion.

To summarize, one advantage of OTO-IVM is that it avoids the possibility of transplanting back ovarian tissue which may harbor cancer^[31]. In addition, ovarian transplantation is far from a routine procedure. Other theoretical advantages of IVM over IVF include:

There are many primordial follicles in slices of ovarian tissue, and any procedure that could obtain reasonable maturation yields could produce high numbers of oocytes. This has many implications, ranging from more efficacious embryo selection to reducing the number of procedures needed before pregnancy is achieved.
Like current clinical IVM techniques, it could theoretically avoid or reduce the risks of hormone stimulation.

With reliable OTO in-vitro maturation, an oophorectomy (a simple and low-risk procedure) could be performed immediately, the ovary cryopreserved, and then oocytes matured in-vitro. With reliable autotransplantation, IVM would not be required.

This kind of IVM is explored in this review. Per this review, the number of successful live births from OTO-IVM documented in the literature is eight. Overall the authors view the results as favorable:

“3% live birth rate per oocyte. This is a promising figure when compared to 4.5–6.7% LBR per vitrified oocyte reported in oocyte donation programmes”

By contrast, the number of live births from ovarian tissue transplantation is at least 130 as of 2019. OTO-IVM is a very early-stage ART, with currently very niche applications. I did not investigate the cost of OTO-IVM, but given its experimental nature, it likely requires expertise found only in a handful of fertility centers, limiting its short-term spread, and is probably very expensive. I lack enough wet-lab expertise to have a strong sense of how promising this line of research is overall, but especially given its overlap with IVG, I think it should be funded. A PhD student in a relevant discipline (working in IVG) agreed that progress in IVM would help IVG as well. He also thought that the difficulty in obtaining human tissue for experimentation, relative to mouse tissue, was the biggest barrier to faster progress in IVM, followed by the faster development timeline in mice.

Ovarian Cryopreservation

At least 75% of the follicles lost during ovarian tissue autotransplantation are lost after the transplantation procedure itself (likely due to lack of oxygen, “ischemia”), as the graft revascularizes (forms blood vessels) and regains homeostasis. It seems possible that improving post-transplantation procedures would improve follicle preservation rates and reduce the amount of tissue lost during autotransplantation. There may be some pharmaceutical treatments that could reduce ischemic damage. A 2021 study administered N-acetylcysteine (NAC) after human ovary transplantation into immunodeficient mice and found better outcomes relative to controls.

Another potential area of improvement is the cryopreservation method. Slow freezing is currently the dominant method of ovarian tissue cryopreservation. A meta-analysis compared vitrification (ice-free cryopreservation, also sometimes called “glassification”) and slow freezing, the two main methods of ovary cryopreservation, and did not find significant differences in follicle preservation. The meta-analysis did report that vitrified tissue had less DNA damage and better preserved stromal cells. The authors suggest that this may be indicative of vitrification being a better method for preserving fertility but that their findings need to be validated in studies “with healthy live births as the primary endpoint”, instead of laboratory-measured endpoints.

Ideas

Improving recovery rate of follicles per follicle aspirated, perhaps through funding the development of better surgical tools or imaging technology (eg, something like AI-enhanced ultrasound for follicle retrieval) for in-vitro maturation.

This was something Robert Gilchrist mentioned

Prize for successful human IVM from primordial follicle (through prize authority of federal government or private donors, if the goal is too controversial for public funding)
I don’t know enough about the research in IVM in general and OTO-IVM in particular to know if Jack Wilkinson’s general methodological critiques apply to it; my guess is that they do. In that case, some of the same ideas raised in that section could be applied here.
A more in-depth investigation of ovarian cryopreservation and autotransplantation, specifically to ask:

Are there any well-validated surrogate endpoints for live births?
What is the current state-of-the-art for organ cryopreservation in organ transplantation? Does ovarian cryopreservation follow those methods? Other organs are generally not frozen, merely kept a few degrees above freezing and I’m unaware if this has been a line of research pursued for other organs.

From Mackenzie: supercooling organs to a few degrees below zero has been done, which can extend transplantation viability to a few days, but this method isn’t compatible with long-term storage for months/years. The consensus is that you need to vitrify organs for long-term storage/recovery. The upshot is that cryopreservation vs extending organ viability are quite different problems and shouldn’t be compared.

How much ischemic damage is seen in other organ transplantation surgeries? Is the ischemic damage observed in ovarian cryopreservation particularly high relative to other organ transplantation outcomes?
Once those questions are answered, a more informed answer on whether it is worth funding further research in it would be possible.

Mitochondrial replacement therapy

As briefly mentioned in the reproductive aging section, one hypothesized mechanism for the decline in oocyte quality with age is mitochondrial dysfunction driven by mutations in mitochondrial genomes. This article makes the argument at length. To summarize the key points of evidence in favor:

Mitochondrial genomes have higher mutation rates than nuclear genomes.
During the process of oogenesis, oocytes are less dependent on oxidative phosphorylation, and hence less vulnerable to mitochondrial dysfunction.
This allows for the accumulation of oocytes with dysfunctional mitochondria (a form of “relaxed selection”).
Mitochondrial function becomes more important beginning with fertilization.
Older women appear to have higher rates of mitochondrial mutations, and markers of higher mitochondrial activity appear to predict higher embryo implantation success.

The authors subjected these ideas to testing, and found that, as expected, there was strong purifying selection against mutations in mitochondria beginning with fertilization.

An intriguing possibility is treating mitochondrial dysfunction through replacing mitochondria with mitochondrial replacement therapy, in which nuclear DNA from either an embryo or egg is extracted and placed into a donor cytoplast (oocytes that have had the nucleus removed) containing wild-type mitochondria.

This method has been used in some jurisdictions for treating mitochondrial disease. While legal in the UK and several other jurisdictions for this purpose, the FDA is currently barred from considering applications for MRT, so it is effectively illegal in the US. In 2019, the Senate came close to allowing the FDA to consider applications for MRT, but reversed course at the last minute, so it is still effectively illegal to run clinical trials involving MRT in embryos in the US. This Vox article provides an accessible summary of the regulatory issues up to 2018.

In the late 1990s, a related technique involving a small-volume injection of donor cytoplasm into patient oocytes was trialed on patients who had experienced repeated implantation failures. Of the 7 patients, none had a live birth, though 4/30 embryos resulted in successful implantation (with later miscarriage). The authors frame this as a preliminary sign of success. However, the small sample size and lack of a successful live birth do not strike me as especially promising evidence. There were a few successes with a similar approach (1, 2), with later work in the 2010s using more sophisticated methods.

A promising case study along these lines published in 2016 treated a woman who had previously had two IVF cycles in which all of her embryos arrested at an early (two-cell) stage. In the third IVF cycle, her embryos’ pronuclei were transferred to enucleated donor oocytes, subsequently producing 5 apparently healthy embryos for transfer, and resulting in a pregnancy, though tragically, the three embryos that successfully implanted failed to produce a live birth. The mitochondrial DNA of the embryos matched the donor mitochondrial DNA, implying absent or low levels of parental mitochondria. A later case study published in 2017 by the lead author (Dr. John Zhang) on the previous paper used MRT to prevent the transmission of a mitochondrial disease, resulting in the birth of an apparently healthy boy. A Ukrainian clinic that made headlines in 2018 for producing a 3-parent baby later presented data from 30 women showing that MRT did not improve fertility in older women, with the caveat that the study was small-scale and tried 5 different methods of MRT, implying that an optimal technique has not yet been developed.

Researchers at OHSU’s Center for Embryonic Cell and Gene Therapy, led by Shoukhrat Mitalipov and Dr. Paula Amato, have recently published some promising work on rhesus macaques (nonhuman primates) demonstrating that their MRT technique appears effective and safe. The same center is also pursuing an IVG method that induces haploidy in somatic cells through transplanted somatic nuclei in mature oocytes.

With the caveat that I did not explore this issue in-depth, my overall impression is that significant uncertainties regarding the contribution of mitochondria to aging remain unresolved, but that this area of research seems promising. My key uncertainties, in order of importance:

Overall, how much does mitochondrial dysfunction contribute to age-related infertility in humans? Eg, how much does giving a 40-year old woman’s oocytes the mitochondria of a 20-year old donor improve her fertility?
Conditional on mitochondrial dysfunction playing a substantial role, how efficient can MRT techniques become? How many donor eggs would be required for a successful enucleation?
Could artificially generated oocytes be used to substitute for donor eggs as a means to reduce the cost?
How much do differences in mitochondria, within the normal human range, contribute to individual differences? If they play a large role, selection of the proper donor becomes more important.

In-Vitro Gametogenesis

Figure 18.

From here.

Overview

In-vitro gametogenesis is the production of gametes from somatic cells through laboratory techniques instead of natural developmental processes. Such a technology would address many different causes of infertility at once and also synergize extremely well with other reproductive technologies like embryo editing and embryo selection. It would also enable cross-sex gamete production for same-sex couples. IVG has been achieved in mice and has resulted in live (apparently) healthy offspring with the ability to have offspring naturally themselves. Several different academic labs are focused on achieving human IVG, as are at least 3 different startups as of 2022: Conception, Gameto, and IvyNatal. Another, Renewal Bio, appears to be aiming at a similar goal.

The idea behind IVG is to take a somatic cell, transform it into an induced pluripotent stem cell (iPSC), transform that into a primordial germ cell-like cell (PGCLC) and then differentiate it in-vitro into the desired germ cell. IVG is usually divided into the production of PGCs and the subsequent differentiation of PGCs into sex-specific gametes (oocytes and spermatozoa). In mice, oocyte IVG is currently more advanced than sperm IVG, since the latter still seems to require an in-vivo step, such as transplantation into a mouse testis, for full maturation.

A reason to be optimistic about IVG being successful eventually is that there are at least two promising techniques being used: one involves genetic manipulation of transcription factors; the other uses only signaling factors and chemical inhibitors added to a culture medium (eg, this paper generating functional oocytes from adult granulosa cells). IVG should be distinguished from in-vitro maturation, which is concerned with maturing primordial follicles from ovaries into fertilization-competent oocytes.

In addition to the human ART applications, IVG would accelerate animal breeding efforts, especially if full iterated embryo selection could be achieved in-vitro. It would also aid conservation efforts for endangered animals and, more speculatively, de-extinction efforts. Thus, it seems likely that even if human IVG is more difficult than expected, such that current efforts fail, there will be substantial scientific and economic interest in IVG technology as a whole.

Animal Studies

IVG in mice has resulted in the production of healthy fertile offspring, who have themselves had healthy offspring. Both oocytes and sperm have been successfully produced from somatic cells, though sperm maturation (as far as I know) required transplantation into an in-vivo testis. Mouse oocytes have been successfully produced in combination with fetal ovarian somatic cells, which are required for proper maturation of oocytes; recently, however, mouse fetal ovarian somatic cells have been successfully generated from stem cells, theoretically removing the need for any fetal ovarian tissue at all for mouse IVG. A recent preprint accomplished something similar in human cells, producing granulosa-like cells (which surround oocytes in-vivo) from human iPSCs, though this technique did not succeed in advancing primordial germ cells into later stages of maturation. Another recent preprint by the same group^[32] developed a faster method to produce human oocyte-like/oogonia-like (pre-meiotic) cells from iPSCs and related cells. There are also some related reproductive “tricks” that have succeeded recently, such as inducing parthenogenesis in a mammal.

Brief review of in-vivo gametogenesis in mice, paraphrased/copied from here:

Primordial germ cells (PGCs) derive from the pluripotent epiblast on embryonic day 6.5

These are not sex-specified yet
This differentiation is spurred by BMP4, and also governed by Prdm1, Prdm14, Tfap2c, Nanog

These PGCs continue to differentiate, which involves expressing: Dazl, Ddx4, and other sex-dependent factors
These PGCs migrate towards the embryonic gonads (which will become the testis or ovary), and proliferate. This proliferation involves genome-wide epigenetic reprogramming.
Once they reach the gonads, they undergo sex-dependent differentiation

In the testis, PGC’s continue to reproduce, and then they arrest at G1 and become prospermatogonia, and male-specific epigenetic modifications occur
In the ovaries, PGC’s stop reproducing and enter meiosis, and become primary oocytes.

Some of the spermatogonia become spermatogonial stem cells that can renew and also differentiate
At the perinatal stage, 70% of the primary oocytes apoptose, and the remaining oocytes form primordial follicles with the surrounding somatic cells, termed “squamous pregranulosa cells”.
At puberty, some of the primordial follicles are periodically activated, and the oocytes then undergo oocyte growth, which involves storing lots of maternal protein/RNA in cytoplasm and undergoing female-specific epigenetic modification.
During this oocyte growth, they have a large nucleus, called a “germinal vesicle”. Once oocyte growth reaches a plateau, oocytes resume meiosis, signaled by GV breakdown, and then arrest again at MII.

Figure 19.

Work in the 2010s used embryonic ovarian somatic cells to transform PGCLCs into oocytes, which resulted in apparently healthy offspring in mice. An important advance, published in 2021, was the generation of fetal ovarian somatic cell–like cells (FOSCLs) from embryonic stem cells, potentially eliminating the need for fetal ovarian somatic tissue. The advantage is that the primordial germ cells generated don’t have to be placed within embryonic mouse tissue to differentiate properly, because the equivalent gonadal somatic tissue needed to stimulate that differentiation can itself be generated from pluripotent stem cells.

Challenges

Figure 20.

From here.

An important caveat is the efficiency of IVG techniques, so far. From the same paper as above:

We then used mature COCs from rOvarioids for in vitro fertilization (IVF) using wild-type sperm from ICR mice. In IVF followed by in vitro culture, oocytes were fertilized, and 30.2% (301/996) of oocytes used in the IVF became two-cell embryos (Fig. 4D and table S2). Then, 25.8% (24/93) of the two-cell embryos developed to blastocysts (Fig. 4D and table S3). This developmental rate from two-cell embryos to blastocysts was comparable to that observed in embryos derived from reaggregates using E12.5 gonadal somatic cells in our previous report (2) (31.8%, 44/138; P = 0.397 by Pearson’s chi-square test). When the two-cell embryos were transferred into pseudopregnant females, 5.2% (11/212) of the embryos gave rise to offspring and all of them developed to adult mice

This method resulted in a 5% rate of live births per embryo transferred at cleavage stage. For comparison, an IVF cycle using donor eggs^[33] had a live birth rate per embryo transferred (per Table 2 of this paper) of 50-70%, depending on whether PGS was used. For a more realistic estimate, per Table 2 of a different paper, the pregnancy rate^[34] per embryo transfer for women >40 was ~20%. Even if we assume a miscarriage rate of 30%, that would result in a live birth rate for women >40 of ~14%, far better than the 5% achieved above.
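The arithmetic behind that comparison can be written out as a short sketch. It uses only the figures quoted above (a ~20% pregnancy rate, an assumed 30% miscarriage rate, and the 11/212 offspring rate from the mouse study); the function and variable names are my own.

```python
def live_birth_rate(pregnancy_rate: float, miscarriage_rate: float) -> float:
    """Live births per transfer = pregnancies per transfer x share not miscarried."""
    return pregnancy_rate * (1 - miscarriage_rate)


ivf_over_40 = live_birth_rate(pregnancy_rate=0.20, miscarriage_rate=0.30)
mouse_ivg = 11 / 212  # offspring per two-cell embryo transferred, from the quoted study

print(f"IVF, women >40 (assumed 30% miscarriage): {ivf_over_40:.1%}")
print(f"Mouse IVG, two-cell transfer:            {mouse_ivg:.1%}")
print(f"Ratio: {ivf_over_40 / mouse_ivg:.1f}x")
```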

However, the above comparison may not be fair, since the mouse embryos transferred were two-cell embryos, not blastocysts. Using data from this paper:

Figure 21.

A 5% implantation rate for two-cell (cleavage stage) embryos is comparable to the per embryo implantation rate for women over 41. Presumably transfer of later-stage embryos would increase the per-embryo success rate, and reduce the number of failed transfers. Regardless, the overall point is that if IVG methods produce embryos with very low implantation rates, they will need to produce them in large quantities, and relatively cheaply, for it to replace IVF for most people. Some customers may not have other alternatives, such as same-sex couples or women with certain ovarian issues, so a 5% success rate may be acceptable for them.
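To make "large quantities" concrete, here is a rough sketch of how many embryo transfers would be needed to reach a given chance of at least one live birth. It assumes each transfer is an independent trial with the same success probability, which is a simplification (real outcomes are correlated within a patient), and it applies the mouse rate to a human setting purely for illustration.

```python
from math import ceil, log


def transfers_needed(p_success: float, target: float = 0.90) -> int:
    """Smallest n such that 1 - (1 - p_success)**n >= target,
    ASSUMING independent transfers with identical success probability."""
    return ceil(log(1 - target) / log(1 - p_success))


rates = {
    "mouse IVG, two-cell transfer (11/212)": 11 / 212,
    "IVF, women >40 (~14%, from the estimate above)": 0.14,
}
for label, p in rates.items():
    print(f"{label}: {transfers_needed(p)} transfers for a 90% chance of a live birth")
```

Under this simple model, a per-embryo success rate near 5% requires several times as many transfers as a rate near 14%, which is why the cost per embryo matters so much for IVG.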

IVG has also produced fertile offspring in rats, though it required substantial changes in the process.

Apart from low efficiency, clinical use of IVG in humans faces three other important challenges:

Epigenetic/imprinting defects in offspring
Somatic mutations
Chromosomal instability

As part of germ cell development, gametes undergo genome-wide epigenetic reprogramming. If this process does not occur correctly, offspring can be born with imprinting defects. I am unsure how powerful our forensic methods are for detecting epigenetic abnormalities and thus unsure how well this could be detected prior to clinical trials. While some sequencing methods can track methylation patterns (eg, MethylSeq), they do not appear to be in use in pre-implantation genetic testing. There is a case report on using PGT to prevent an imprinting disorder, but it does not appear to have used methylation sequencing. A reason for optimism re: epigenetic reprogramming is that a recent study inducing parthenogenesis in a mammal, which resulted in viable offspring from female gametes, was accomplished through targeted DNA methylation editing.

The second challenge is somatic mutations. Organisms accumulate de-novo mutations as a result of errors in the DNA replication process in their parents’ germlines. Estimates of how many de-novo mutations germ cells carry relative to their parents, per generation, vary, and there are also likely individual differences in germline mutation rates. Following fertilization, an organism’s cells also accumulate somatic mutations. Compared to an organism’s germline, somatic cells have substantially more mutations.

There are no firm estimates of how much disease burden de novo mutations are responsible for, but there is substantial evidence that they play a role in many cases of intellectual disability, sudden infant death, and other genetic disorders. This is corroborated by studies finding higher rates of autism and other disorders in offspring of older parents, as well as whole-exome trio studies on children with unexplained disorders and their parents.

Thus, a large increase in the number of mutations an organism is expected to have is a cause for concern.

Compared to nearly all tissues, germline cells have a much lower mutation rate per year. From this review, table 1:

Figure 22.

Assuming a 35 year-old patient using IVG and a similarly aged partner using their own sperm, and the following values for germline mutation rates:

1-3 additional de-novo mutations per year of paternal age at conception, with 1.5/yr used as the working value
0.24 additional de-novo mutations per year of maternal age at conception
No additional errors are introduced by the IVG process
Skeletal muscle satellite cells are used as the reference somatic tissue (12 mutations per year in the calculation that follows), since they are more accessible than kidney tubules or bile ductules and have a lower mutation rate than other tissues

This would result in:

IVG: (12 × 35) + (1.5 × 35) = 472.5 de novo mutations
no-IVG: (0.24 × 35) + (1.5 × 35) = 60.9 de novo mutations

Under these assumptions, an embryo generated from IVG would have approximately 400 more de-novo mutations in its germline than an embryo generated naturally from similarly aged parents. If a more mutation-prone tissue than a skeletal muscle satellite cell is used, the difference would be even larger. I am very unsure what impact 400 extra germline mutations would have, on average, but my initial guess would be substantially higher rates of disorders that correlate with higher parental age.
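The back-of-the-envelope calculation above can be reproduced with a few lines of code, which also makes it easy to swap in other assumptions. The rates are the ones stated in the text (12/yr for skeletal muscle satellite cells, 0.24/yr maternal, 1.5/yr paternal); the function and constant names are my own.

```python
AGE = 35               # assumed age of both partners at conception
SOMATIC_RATE = 12.0    # mutations/yr, skeletal muscle satellite cells (value used in the text)
MATERNAL_RATE = 0.24   # de novo mutations per year of maternal age
PATERNAL_RATE = 1.5    # de novo mutations per year of paternal age (working value)


def de_novo_mutations(egg_source_rate: float, age: float = AGE,
                      paternal_rate: float = PATERNAL_RATE) -> float:
    """Mutations contributed by the egg's source cell plus the sperm.

    Assumes, as the text does, that the IVG process itself adds no mutations.
    """
    return egg_source_rate * age + paternal_rate * age


ivg = de_novo_mutations(SOMATIC_RATE)       # egg derived from a somatic cell
natural = de_novo_mutations(MATERNAL_RATE)  # egg from the maternal germline

print(f"IVG:        {ivg:.1f}")
print(f"no-IVG:     {natural:.1f}")
print(f"difference: {ivg - natural:.1f}")
```

Passing a different `egg_source_rate` shows how sensitive the gap is to the choice of source tissue.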

Somatic mutations do not arise completely at random in the genome and are also subject to natural selection^[35]. De-novo germline mutations also have a bias^[36] towards certain mutations. Thus, the estimated 400 extra mutations that an IVG-generated embryo would carry would likely be substantially different, on average, than 400 extra de-novo mutations generated through the non-IVG (natural) process. One person I consulted on this question thought the mutations present in somatic cells would be more likely to be damaging than those in germline cells, under the following reasoning, paraphrased:

CpG islands are enriched upstream of genes and are used to regulate gene expression and differentiation. Methyl-CpGs are CpG sites that have been methylated.
Methylation of these CpG islands is one of a few mechanisms somatic cells use to control tissue differentiation, by silencing regulatory regions of genes specific to other tissue types. Eg, a skin cell will silence genes specific to heart, lung, brain, etc. by methylating the CpG islands upstream of those genes.
Methylation also causes hypermutation (higher rates of mutation) at those sites, because DNA repair enzymes don’t recognize methyl-CpG as efficiently. These sites are thus prone to C→T mutations, which similarly silence the affected gene but in a permanent fashion.
This is not usually a problem for differentiated cells: those mutations mostly occur in genes that are not relevant for the given tissue type, and cells never need to be reprogrammed/de-differentiated in nature.
By comparison, mutations in germ cells are more randomly dispersed throughout the genome, because germ cells have much less regulatory CpG methylation. Thus germ cell mutations are much more likely to be truly random errors, which are less dangerous on a per-mutation basis, often happening in “junk” DNA regions and not in critical CpG regulatory regions.
He thinks an important reason germ and early embryo cells have next to no methylation is to prevent too much mutational burden from methyl-CpGs spontaneously deaminating. This is somewhat similar to how eggs will slow/stop their metabolism to prevent too much damage from occurring between generations.

Generating embryos with high rates of de-novo mutations can be fixed, theoretically, through a combination of embryo selection and editing. Embryo editing could fix mutations directly, while embryo selection could be used to select for embryos with fewer mutations and/or mutations that are predicted to be less damaging or neutral.

Gene editing efficiency with current technology is moderate, so not every edited embryo will have the desired edit and some will have off-target edits (low accuracy). Given that, multiple embryos will have to be generated, and subsequently edited, to produce an embryo with the desired changes. Assuming that cost scales with the number of embryos produced, this will raise costs. Embryo selection, as well as confirmation of desired edits, would presumably require embryo sequencing, which is currently performed through trophectoderm biopsies on embryos that are 4-6 days old. Each editing step, as well as each sequencing step, adds to IVG costs.
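To make the "multiple embryos will have to be generated" point concrete, here is a minimal sketch that treats each embryo as an independent trial. The probabilities used (`p_on_target`, `p_no_off_target`) are purely hypothetical placeholders, not measured editing efficiencies, and independence between embryos is an assumption.

```python
import math


def p_usable(p_on_target, p_no_off_target):
    """Probability one embryo has the desired edit and no off-target edits,
    assuming the two events are independent."""
    return p_on_target * p_no_off_target


def expected_embryos(p_success):
    """Mean number of embryos edited until the first usable one (geometric)."""
    return 1.0 / p_success


def embryos_for_confidence(p_success, target=0.95):
    """Smallest n such that P(at least one usable embryo among n) >= target."""
    if not 0 < p_success < 1:
        raise ValueError("p_success must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p_success))


# Hypothetical example values, chosen only for illustration.
p = p_usable(p_on_target=0.5, p_no_off_target=0.8)
print(expected_embryos(p), embryos_for_confidence(p, target=0.95))
```

If cost scales with the number of embryos, the second number is closer to what a clinic would have to budget for than the mean.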

Theoretically, IVG-generated embryos, even with higher rates of mutations than naturally generated embryos, might still produce apparently healthy offspring, since IVG-generated mice and rats are apparently healthy^[37]. Another reassuring datum is that cloned polo horses (apparently derived from a skin sample, and so presumably carrying elevated numbers of somatic mutations, as IVG-derived embryos would) can perform at very high levels, which implies impressive physical and mental performance. It thus seems very unlikely to me that the number of mutations that IVG-derived embryos carry would preclude healthy development, with the caveat that cloning tends to have a low success rate, likely implying a high rate of attrition of embryos carrying especially damaging mutations. My uninformed guess is that regulators might demand approximately similar numbers of mutations between IVG-generated and naturally-generated embryos, or strong evidence to show that the mutations they carry are likely to be low-risk.

While current IVG methods in mice do not appear to cause chromosomal instability, one potential problem with embryo editing (which might be required to fix somatic mutations) is that it requires^[38] the culture of embryonic stem cells for prolonged periods of time, especially in the case of multiple edits. This prolonged culture seems to cause chromosomal instability through large-scale rearrangements. Since large-scale rearrangements are likely incompatible with embryo implantation, this is an important obstacle. However, there is some recent work by the Serrano lab (and likely other groups I’m not aware of) that shows proof-of-concept that human naive pluripotent stem cells can be cultured for a prolonged period of time while preserving genomic stability.

Two subject-matter experts in the IVG space repeatedly made the point to me that there is a diversity of approaches to IVG, which raises the probability that at least one succeeds. One method is a chemical reprogramming approach, while another is a genetic reprogramming approach. They also saw the success of IVG in multiple different animal species as another reason for optimism.

Jeff Hsu, CEO of Ivynatal, identified the following problems as the most central to clinical use of IVG in humans:

Yields at each step may be too low for cost-effective production
Somatic cells have more mutations than germ cells, resulting in a high mutation burden in resultant cells
Some quality control measures, like sequencing, are expensive if done at current prices and at multiple steps.

A PhD student working with Gameto, Merrick Smela, identified the following as problems slowing human IVG research:

Availability of human oocytes and fetal ovarian supporting cells (such as pre-granulosa/granulosa cells) for study

Sequencing costs

One challenge that seems important, though I have not brought it up with subject-matter experts and it may have an easy solution, is where to obtain Y chromosomes for females who wish to generate sperm, and how to transplant them in; the latter seems like the more difficult technical challenge. There are men who have the XX karyotype (typically found in females) with the SRY region (usually found on the Y chromosome), who are phenotypically male. This might suggest that merely editing in an SRY copy would go a long way towards producing sperm, but in fact there are several regions on the Y chromosome that are important for sperm production. It seems likely that a full Y chromosome would be required for sperm production.

Using a generic Y chromosome might not be hugely concerning, because there aren’t many genes on it outside of the sexual differentiation region, but I’m not sure how well-studied it is. My understanding is that standard GWAS doesn’t capture sex chromosomes, so our understanding of them (X and Y chromosomes) lags substantially. Thus, we may be somewhat underestimating how important they are for traits.

When I spoke with Prof. Haiqi Chen, who studies spermatogenesis in his lab, he had longer timelines (10 years for in-lab success in sperm development, 20 years for clinical trials in humans) than Matt Krisiloff of Conception or Jeff Hsu of IvyNatal for IVG. A recent paper he published:

Dissecting Mammalian Spermatogenesis through spatial transcriptomics

The basic idea is that since the function of the testes is tied closely to their spatial organization, we need new methods to understand what is going on in that specific context.
In that paper they built an atlas, found differences between mouse and human testes, and found a possible mechanism linking diabetes to infertility.
They confirmed their method works by recapitulating what is already known.

He generally thought we needed a much better understanding of gametogenesis before it would be advisable to attempt IVG in humans. He pointed to evidence of higher rates of imprinting disorders in offspring born from IVF as a reason we need better understanding before moving forward. A project idea he was excited about was scaling up the sperm atlas work he had done in his lab.

Applications

IVG would enable several unique applications:

It would allow same-sex couples to have children that are equally biologically related to both partners.
It would address the long tail of genetic infertility in men and women, which is otherwise difficult to address one cause at a time.
It would address reproductive aging on the embryo side.

If IVG could be done on a large-scale, it would make embryo selection substantially more effective, as is outlined in this article and in the academic literature. Embryo editing, to the degree it is limited by the number of embryos available^[39], would become more practical.

The most impactful, but also most speculative, application of IVG, would be iterated embryo selection (IES). IES would require the generation of gametes in-vitro, fertilization in-vitro, and then the production of gametes from those embryos. This would enable multiple generations of selection to occur in-vitro. A sketch of such a scenario can be found here, and a detailed exploration here. The TL;DR is that IES, in combination with even mediocre genotype-phenotype prediction methods, would enable very large changes in traits, equivalent to many generations of selective breeding. The large numbers of embryos and general positive manifold between socially desired traits would reduce the possibility of having to make substantial tradeoffs on traits.

There are some important caveats to IES: over time, recombination would break up the linkage between causal variants and the tagging SNPs that current polygenic scores are based on, though this could be remedied through more fine-grained GWASs; the genetic variance available to select on will, in theory, eventually be exhausted; and there are the unknown unknowns of in-vitro IES.

Practically, IES would require achieving IVG and fixing the problems associated with culturing embryonic stem cells for prolonged periods of time. The costs associated with multiple generations of in-vitro culturing are likely to be very substantial relative to one cycle of IVG, and one natural way to reduce the per-unit cost of IES for gamete generation is to create a “stock” of optimized embryos to generate gametes from, instead of generating gametes from each customer. However, this comes with the same downside as using donor eggs/sperm, namely reduced relatedness.

Even if clinical use of IVG in humans takes much longer than anticipated, being able to generate human oocyte-like cells in-vitro would be an important research advance. Per 3 of the subject-matter experts I spoke to on IVG, limited availability and high costs of obtaining human fetal tissue and human oocytes slow fertility research. Being able to generate oocytes and human fetal tissue more cheaply, and without the ethical issues that some view as accompanying naturally derived oocytes and embryos, would accelerate research downstream of that input. Depending on how well they function, these oocyte-like cells, even if they are incapable of fertilization on their own, might be suitable material for somatic cell nuclear transfer, which would be a substantial advance by itself.

There are cases of male infertility that cannot be bypassed through existing methods. Primarily these are genetic cases where sperm production does not occur at all or stops at a very early stage, such that these immature sperm cannot successfully fertilize an egg. Because the causes of these sperm developmental failures are heterogeneous, only an intervention like IVG, which sidesteps this stage, seems likely to fix all these issues at once– and help same-sex female couples as well, who currently must use sperm donors.

Solutions?

Caveat: The IVG space is a rapidly advancing field and I don’t think I achieved sufficient subject-matter knowledge to give confident recommendations on rate-limiting steps. I think a person with more reproductive biology knowledge could come up with better recommendations. That said, the ideas below seem sensible and/or came from people directly involved in the field.

A comprehensive report on the laws and regulations governing clinical IVG use, especially comparing different jurisdictions. The CEO of an IVG company noted this was something they planned to do internally eventually, but only once they were close to clinical trials. Doing this in advance seems useful: if regulations would limit clinical application, knowing so early, and planning ways to change them, would save time.

More ambitiously, this might consist of putting together a group of high-prestige experts to engage the FDA on specific regulatory questions like “what is the minimum set of animal and in-vitro experiments they would demand for human clinical use of IVG?” Since human IVG seems likely to be controversial, and perhaps politicized, obtaining regulatory guidance before any controversy seems wise.
Since the editing of human embryos for subsequent clinical use is illegal, clarifying whether this applies to editing to fix mutations induced by the IVG process (eg, somatic mutations as described above) would also be valuable.
There is a possibility of a backfire effect here: one CEO told me he regretted speaking to regulators informally on this subject, and thought that waiting until the science was much more advanced was a better idea. Others I spoke to echoed this sentiment.

Hiring a subject-matter expert to survey researchers and ask for scientific public goods that would help IVG research specifically. Merrick had some ideas along these lines, but asking for a handful from diverse labs would be helpful.
Do single-cell sequencing on lots of different reproductive cell types. This would require obtaining human fetal tissue, which is difficult with federal funding rules. There’s a recent academic effort that is similar to this, the human reproductive cell atlas, so I’d want clarification on how this is different from that before pursuing this. Per Merrick, the reproductive cell atlas has “exceeded his expectations” so he recommends pursuing the next idea instead.
Making a well-characterized iPSC cell bank for research. Many reproducibility issues stem from slight differences between cell lines, which make protocols less robust. Standardized reporter cell lines would also be helpful: currently, labs need to re-engineer reporter cell lines of interest themselves, which takes a few months and doesn’t always work reliably.
A large-scale project focused on accelerating non-destructive ways to image and screen embryos and gametes.

Here is some distantly related work as an example. In that study, RNAseq profiles could be predicted from Raman microscopy data with machine learning: “... spatially resolved single-molecule RNA-FISH (smFISH) data as anchors to link scRNA-seq profiles to the paired spatial hyperspectral Raman images”.
This would be useful for the numerous quality control steps that are anticipated to be required in IVG, and might also be useful for gamete selection. Jeff Hsu was excited about this idea. When I brought this up to scientists/founders with an expertise in single-cell transcriptomics, they said the group that did that paper is very reputable; the work is still quite preliminary; and this kind of technology would help CAR-T-cell quality control as well.
An important caveat is that this work focuses on predicting gene expression, not genetic variants; additionally, gametes may not express many genes compared to somatic cells further limiting its applicability.

Lower sequencing costs would reduce quality control costs, which are anticipated to be high. I don’t have any specific projects in mind here. There have been some recent advances expected to lower sequencing costs eg, Ultima Genomics.

Miscellaneous papers

Here are a few important recent papers on human IVG, courtesy of Jeff Hsu (CEO of Ivynatal), that were especially helpful in this section, and could be useful further reading:

Using a CRISPRa system, which can selectively activate transcription factors, human PGCLCs were generated from human embryonic stem cell lines. This 2022 paper also identified some important differences in the transcription factor network in human vs mouse embryos. My read of the paper is that the significant advance here is the method: “our re-designed CRISPRa and CRISPRi systems that allow efficient multiplexed modulation of cis-regulatory element”, which can be used to more directly understand the hPGCLC and hPGC differentiation conditions, instead of relying on the “black box” of human fetal ovarian tissue, which is also difficult to obtain.
This 2022 paper managed to produce 8C-like cells (8CLCs), which are totipotent, from human pluripotent stem cells, without knocking in any genes, just by manipulating culture conditions. Though this capability (of producing totipotent cells) is not required for producing primordial germ cells, this does demonstrate the ability to induce substantial reprogramming through culture conditions alone. It may also be helpful for understanding placental biology better, since totipotency is required to form extraembryonic tissue like the placenta.
In-vitro maturation and transplantation of cryopreserved ovary tissue: understanding ovarian longevity. In-vitro maturation (IVM) refers to the maturation of oocytes outside of normal conditions. In addition to being a prerequisite to successful use of IVG, which results in immature oocytes, IVM without IVG would still be a substantial advance, as it would enable more free use of ovarian tissue. The current standard-of-care for fertility preservation for women undergoing cancer treatment is a rapid IVF course, followed by treatment (surgery/chemotherapy). This delays cancer treatment and produces only a limited number of eggs. Once cancer treatment has been completed, there may be substantial depletion from chemotherapy.

Sperm selection

Men usually produce hundreds of millions of sperm in their ejaculate, only one of which fertilizes an oocyte. Fertility physicians have long been interested in choosing the “best” sperm for fertilization when using methods like IVF and ICSI that remove the need for natural fertilization.

Sperm can be measured at-scale on a variety of traits, like motility and shape (“morphology”). If there is even a minor correlation between an easily measured sperm phenotype and offspring outcomes, the large number of sperm would allow for significant improvements in offspring quality. Gwern outlines this in quantitative fashion here. If sperm selection could reduce miscarriage rates or fertilization failure, and increase live pregnancy rates, that would be an additional incentive for its routine use in IVF.
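The "minor correlation, large numbers" argument can be illustrated with a textbook order-statistics calculation. This is a minimal sketch, not Gwern's analysis: it assumes the measured sperm phenotype and the offspring outcome are jointly normal with correlation `r`, that the single top-scoring sperm out of `n` is used, and it ignores measurement error. All names are hypothetical.

```python
import math


def expected_max_std_normal(n, lo=-10.0, hi=10.0, steps=20000):
    """E[max of n iid N(0,1) draws], by trapezoid integration of
    x * n * pdf(x) * cdf(x)**(n - 1) over [lo, hi]."""
    def integrand(x):
        pdf = math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)
        cdf = 0.5 * (1 + math.erf(x / math.sqrt(2)))
        return x * n * pdf * cdf ** (n - 1)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    for i in range(1, steps):
        total += integrand(lo + i * h)
    return total * h


def expected_gain_sd(r, n):
    """Expected outcome gain, in outcome SDs, from picking the best of n
    candidates on a phenotype correlated r with the outcome."""
    return r * expected_max_std_normal(n)


# Illustrative values only: a weak correlation and increasing pool sizes.
for pool in (10, 1_000, 1_000_000):
    print(pool, round(expected_gain_sd(0.1, pool), 3))
```

The gain grows only slowly with pool size (roughly with the square root of the log of `n`), so the correlation `r` matters much more than adding further orders of magnitude of sperm.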

Several sperm characteristics have been investigated:

Sperm motility

Sperm are motile, and two main methods designed to select for more motile sperm are in clinical use: density gradient centrifugation and swim-up. There are some newer methods that may be better but still need to be translated into clinical use, such as the Sperm Syringe, which causes less disruption to sperm integrity and more effectively enriches highly motile sperm with high DNA integrity.

Sperm charge

Zeta potential is the electrical potential between the sperm membrane and its surroundings, which appears to be lower in more mature and functional sperm. With an electrical charge, the mature sperm, which are more electronegative, can be separated from immature sperm.

Sperm morphology

The World Health Organization has guidelines on how to analyze sperm, including sperm morphology.

Many of these sperm selection methods seem to be evaluated on the basis of their effects on surrogate measures instead of results like live birth rates, measures of offspring health, or miscarriage rates.

One surrogate measure that is examined frequently is sperm DNA fragmentation (SDF). There are numerous ways to test for SDF, such as the sperm chromatin structure assay (SCSA), Acridine orange test, Sperm Chromatin Dispersion (SCD) Assay, Aniline blue staining, Terminal deoxynucleotidyl transferase dUTP nick end labeling (TUNEL), and more. Advocates acknowledge that SDF has a mixed evidence base, though it seems to be useful in predicting some cases of unexplained infertility and identifying men who may be good candidates for some surgeries. The 2021 WHO laboratory manual on sperm examination classifies SDF as “not necessary for routine semen analysis but may be useful in certain circumstances for diagnostic or research purposes”.

Overall, however, many of the surrogate measures that sperm selection has so far been evaluated on are not reliably tied to clinically important outcomes like live birth rates, miscarriage rates, etc. A recent proposal to apply machine learning to sperm selection, though something I’m generally excited about, relies heavily on these surrogate measures, with only a brief mention of measuring the relevant clinical outcomes.

There is some evidence from animal studies that non-destructive sperm selection based on sperm phenotype may have beneficial effects on offspring. A series of experiments in zebrafish showed that selecting for sperm longevity (how long after activation sperm can fertilize) can change the phenotype and genotypes of offspring. Similar work has been done in other animal breeding work, though I did not investigate it in-depth. However, my overall impression of the literature in humans, which was confirmed by people in the ART field but not in sperm biology specifically, is that there is no obvious best way to select sperm, and the quality of the evidence is low.

A Cochrane review of the evidence in 2019 came to a similar overall conclusion:

“The current evidence suggests that advanced sperm selection strategies in assisted reproductive technology (ART) may not result in an increase in the likelihood of live birth. The only sperm selection technique that potentially increases live birth and clinical pregnancy rates is Zeta sperm selection, yet these results were of very low quality and derived from a single study, therefore we are uncertain of the effect...evidence gathered was of very low to low quality. The main limitations were imprecision associated with low numbers of participants or events”

A similar conclusion was reached in a 2020 Cochrane review on IMSI, a modification of the original ICSI technique in which a much higher magnification (6000x) is used to select sperm instead of 200-400x magnification used in ICSI. The higher magnification enables a more fine-grained analysis of sperm morphology than the usual ICSI methodology.

Other possibilities

If sperm could be non-destructively sequenced, that would enable direct selection along similar lines as embryo selection. One speculative possibility is capturing spermatogonia before they undergo meiosis (or making spermatogonia with IVG) and destructively sequencing 3 of the 4 sibling gametes, and then inferring the genetics of the remaining gamete. I am unfamiliar with how feasible this project is, but I suspect we would need substantial advances in sperm maturation and culture methods to be able to keep spermatogonia alive in culture, as well as advances in microfluidics to capture sperm. This would also involve substantial sequencing costs. However, as Gwern outlines in his piece, gamete selection can result in larger gains than embryo selection, and combining the two is even more powerful.
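The inference step in the "sequence 3 of 4, infer the fourth" idea rests on the fact that the four products of a single meiosis carry each parental allele exactly twice at any heterozygous site. Here is a minimal sketch under idealized assumptions: no gene conversion, no sequencing error, and known parental genotypes. The data layout and function name are hypothetical.

```python
def infer_fourth_gamete(parent_genotypes, sequenced):
    """Infer the unsequenced meiotic product from its three siblings.

    parent_genotypes: list of (allele_a, allele_b) tuples, one per site.
    sequenced: list of three haplotypes, each a list of alleles per site.
    Returns a list with the inferred allele per site, or None where the
    observations violate the expected 2:2 segregation.
    """
    if len(sequenced) != 3:
        raise ValueError("expected exactly three sequenced sibling gametes")
    inferred = []
    for site, (a, b) in enumerate(parent_genotypes):
        observed = [haplotype[site] for haplotype in sequenced]
        if a == b:
            # Homozygous site: every meiotic product carries the same allele.
            inferred.append(a)
        elif observed.count(a) == 2 and observed.count(b) == 1:
            inferred.append(b)
        elif observed.count(a) == 1 and observed.count(b) == 2:
            inferred.append(a)
        else:
            # 3:0 or unexpected alleles: gene conversion or an error.
            inferred.append(None)
    return inferred


parent = [("A", "G"), ("C", "C"), ("T", "A"), ("G", "T")]
siblings = [
    ["A", "C", "T", "G"],
    ["G", "C", "A", "G"],
    ["A", "C", "A", "G"],
]
print(infer_fourth_gamete(parent, siblings))
```

Sites returned as `None` are where a real pipeline would need extra modeling, since gene conversion and allelic dropout in single-cell sequencing both break the simple 2:2 rule.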

Solutions

The Cochrane review argues that the evidence as of 2019 on sperm selection technology for ART in general was very poor. If that assessment holds in 2022, a reproducibility-focused project for sperm selection would be a good idea. Consult someone like Dr. Daniele Teixeira, who wrote the Cochrane review on IMSI, as well as a subject-matter expert on sperm selection, and sketch sample size necessary, etc.
A project that carries out single-cell sequencing of sperm from a diverse sample of deeply phenotyped men and that also tries to obtain clarity on within-individual, between-sperm correlations between sperm phenotypes and individual genotypes. This is similar to what Gwern proposes in his notes here.

Similar projects along these lines would be sequencing sperm that have been sorted by any number of promising sort-on-phenotype methods and giving lower and upper bounds on how strong any sperm-offspring correlations could be.

Applying Raman microscopy (linked in IVG section) and machine learning to sperm, in combination with sperm sequencing, to see if any genetic-level information can be picked up. Eg, is there any correlation between the type of RNA activity that Raman microscopy can hypothetically pick up and genes of interest? In more detail: apply Raman microscopy, which is non-destructive, to a large sample of diverse sperm, which are then destructively sequenced. Train an ML model on the microscopy data to see if any genetic information can be inferred from the microscopy data. A caveat here is that sperm genomes are kept compacted and relatively inactive, so we might not expect much RNA activity compared to somatic cells.

Embryonic stem cell nuclear transfer/ embryo editing

A technology that might enable some degree of embryo editing, whether for disease prevention or for other traits, is “engineered embryonic stem cell nuclear transfer”– using an edited embryonic cell as the nuclear donor in nuclear transfer. This method was described to me by Max Berry, a bioengineer. It is in contrast to He Jiankui’s method, which relied on a single application of CRISPR editing to an early-stage embryo via microinjection, resulting in substantial mosaicism and potential off-target mutations.

Here is his sketch of this idea:

He proposed extraction of cells from an early-stage embryo (potentially, one pre-selected from a batch of embryos using PGT), and growing those ESCs in vitro. Extensive editing can be performed on cells in a dish, which can be kept stable and growing in tissue culture for months. After modification, they can be seeded monoclonally and expanded so that several hundred ‘colonies’ of modified cells are derived, each from a different ‘parent’ cell. Genome sequencing a portion of each colony will confirm the correct edits and lack of off-target effects for all cells in the colony, as they are all genetically identical.

When a colony of cells possessing all the correct edits and no genetic damage is identified, one cell from the colony has its nucleus transplanted into an enucleated egg cell. This procedure is identical in principle to somatic cell nuclear transfer, the difference being that in SCNT the egg host must reprogram a terminally differentiated nucleus 100% correctly. In this technique, by contrast, the embryonic stem cell donor nucleus is already 99% of the way to having a correct epigenome for becoming an embryo. Thus the extremely low efficiency of SCNT is bypassed.

The last hurdle to implementing this technique was extended in vitro ESC culture, specifically the maintenance of epigenetic imprinting fidelity. This was recently overcome (see this paper from the Serrano lab), meaning that there are no major technical breakthroughs required for this technique to produce viable modified human embryos. In addition, recent advancements in de novo embryogenesis (seen here) may mean that nuclear transfer can itself be skipped, and the modified ESCs can be cultured to form an entire viable embryo on their own.

This technique would potentially be far superior to CRISPR microinjection, albeit with somewhat more lab work involved. However, at scale the expense should not be wildly more than that of traditional IVF, especially considering that microinjection or PGT also require the expense of IVF regardless.

There are two main benefits over microinjection:

Allows for far more extensive genome modification to occur, as cells can be edited in vitro over long timeframes, meaning that entire multi-gene insertions and/or dozens of point mutations can be made simultaneously and/or sequentially. In microinjection, only a scant few edits are possible, often only knockouts, of which there are a limited number of useful targets.
No risk of mosaicism or off-target effects. Unless you destroy the entire embryo for sequencing, microinjection cannot ensure whether all of its cells received the edits, or are carrying deleterious off-target mutations. This is a fundamental limit of the CRISPR technique, making it arguably unacceptable for creating humans.

[author’s note: everything from “He proposed....creating humans” is Max’s]

To contextualize/caveat the above, Merrick estimated it would cost about $10,000 to edit a single specific variant into a stem cell line via HDR (homology-directed repair), including verification of edits with whole-genome sequencing and labor costs. He thinks single-base modifications, as opposed to the HDR above, would be cheaper. IVF is probably about $20,000, with about $7,000 per additional cycle. At scale, and with substantial capital costs to pay for automation, costs would likely decrease substantially below $10,000 per edit. However, the $10,000 estimate does not take into account the extra stringency that FDA oversight (eg, CLIA) would bring, so that estimate is more of a rough guess for embryo editing done without FDA supervision (in other words, illegally in the US, or abroad in jurisdictions that are more friendly to germline editing). Note that sequencing does not need to be performed after every single edit, so multiple modifications could theoretically be made in parallel.
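For a rough sense of scale, the figures above can be combined in a simple additive model. This is only a sketch: the dollar amounts are the rough estimates quoted in this section, and the assumption that each edit costs the full per-edit amount (no savings from parallel edits, shared sequencing, or automation) is mine, so the result is closer to an upper bound than a forecast.

```python
# Rough figures quoted in the text (USD).
IVF_FIRST_CYCLE = 20_000
IVF_EXTRA_CYCLE = 7_000
COST_PER_HDR_EDIT = 10_000


def rough_total_cost(n_edits, ivf_cycles=1, cost_per_edit=COST_PER_HDR_EDIT):
    """IVF cost plus a flat per-edit cost, with no economies of scale."""
    if n_edits < 0 or ivf_cycles < 1:
        raise ValueError("need n_edits >= 0 and ivf_cycles >= 1")
    ivf_cost = IVF_FIRST_CYCLE + IVF_EXTRA_CYCLE * (ivf_cycles - 1)
    return ivf_cost + n_edits * cost_per_edit


for edits in (0, 1, 10):
    print(edits, rough_total_cost(edits))
```

Under this model the editing cost overtakes the IVF cost after only a couple of HDR edits, which is why the per-edit price matters far more than the IVF baseline for any multi-edit use.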

Also relevant: a recent paper in mice placed nuclei from somatic cells into oocytes in metaphase II and succeeded in inducing haploidization (generating a haploid genome, like that contained in gametes, from a diploid genome), generating an oocyte which could then be fertilized with sperm and produce live offspring. While donor human oocytes are expensive, using artificial oocytes produced from IVG might reduce the cost, as the paper’s authors note. It is unclear to me if chromosomal crossing over and recombination occurred in this process.

Critics have argued that PGT can prevent practically any genetic disease from being transmitted and thus that heritable germline editing for genetic diseases is not necessary. There are specific cases where PGT does not work, such as when one of the parents is homozygous for a dominant disease allele, or in older women where the number of embryos is low, but it is true that heritable germline editing would only be truly necessary for disease prevention in a relatively small number of patients.
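The cases where PGT fails follow directly from Mendelian transmission, which can be sketched as below. This is a simplified model assuming a single autosomal locus, full penetrance, no de novo mutations, and perfect testing; "unaffected" here includes carriers for recessive diseases. The function names are hypothetical.

```python
def p_embryo_unaffected(alleles_parent1, alleles_parent2, dominant):
    """Probability an embryo is unaffected at one autosomal locus.

    alleles_parent1/2: number of disease alleles each parent carries (0-2).
    dominant: True if one disease allele is enough to cause disease.
    """
    t1 = alleles_parent1 / 2  # chance parent 1 transmits a disease allele
    t2 = alleles_parent2 / 2
    if dominant:
        return (1 - t1) * (1 - t2)   # must inherit no disease allele
    return 1 - t1 * t2               # must avoid inheriting two


def p_at_least_one_unaffected(p_unaffected, n_embryos):
    """Chance that a batch of n embryos contains a transferable one."""
    return 1 - (1 - p_unaffected) ** n_embryos


# A parent homozygous for a dominant allele: selection cannot help.
print(p_at_least_one_unaffected(p_embryo_unaffected(2, 0, True), 10))
# A heterozygous parent with only three embryos available.
print(p_at_least_one_unaffected(p_embryo_unaffected(1, 0, True), 3))
```

The second case shows the "few embryos" problem: even with a 50% chance per embryo, a small batch leaves a real probability that nothing is available to transfer, before accounting for aneuploidy or failed implantation.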

Legality

Human embryo editing in the US is currently illegal, since FDA approval would be required to perform a clinical trial, and the FDA is banned by Congress from considering any clinical trial applications that propose doing so. In June 2019 Congress again voted to ban the FDA from considering any heritable germline editing applications, though some Democratic House members had urged Congress to instruct the FDA to consider the issue instead of banning it outright. Obtaining regulatory clarity on heritable germline editing, ideally for severe genetic diseases with no alternative treatment, would theoretically allow heritable germline editing to proceed. I am not sure that an advocacy campaign centered on this would work given the potential for backlash if it became highly salient.

There is some polling on these and related issues, eg, Pew polling from 2021 on heritable gene editing to reduce disease risk, which shows a somewhat favorable public with many uncertain, though the proportions differ with different wordings. However, I am uncertain how reliable issue polling is; see David Shor on problems with issue polling in general. My overall guess is that the more that heritable/germline embryo editing resembles prevention/treatment of disease, instead of human enhancement, the more the public will be in favor; also, avoiding a high rate of discarding embryos seems important for alleviating abortion-related concerns for some religious groups.

Embryo Selection

Monogenic diseases

An anonymous colleague who is an early-career human geneticist (henceforth “Hayt”) has written a section (“the road to causally sound embryo selection”) focusing on the limitations and challenges of current methods of embryo selection on complex traits with polygenic scores, and outlines a number of ideas to improve those scores. I have also edited that section to reflect feedback from relevant subject-matter experts– in cases of controversy, assume the more sensible opinion is Hayt’s, while the errors are mine. While embryo selection with polygenic scores has only recently entered clinical practice (Genomic Prediction was founded in 2017, while Orchid was founded in 2019), selection of embryos based on monogenic diseases (known as Preimplantation Genetic Testing, PGT-M) has been part of clinical practice for more than 30 years. I will briefly describe this method, drawing from this book on PGT-M.

Preimplantation genetic testing for monogenic diseases was first performed in humans in 1990 through selecting for female embryos from a couple with an X-linked disease. Screening for autosomal diseases became possible in the mid 1990’s, and current PGT-M techniques can detect a variety of single-gene polymorphisms and chromosomal rearrangements.

While PGT-M was originally performed only for highly penetrant and deleterious diseases, its use has expanded to variants that convey increased but not guaranteed risk for disease and has been performed in 100k+ cycles globally. In one major center, about 13% of all PGT-M cases were for variants that confer increased risk for cancer. Newer techniques for PGT-M have been extended to screening for multiple single-gene disorders at a time, useful for families or populations that carry multiple disorders. In one large center, PGT-M has been used to screen for 45 different inherited cancer syndromes (pg 126 of this book), such as BRCA1, BRCA2, Li-Fraumeni syndrome, and Familial Adenomatous Polyposis. These syndromes carry cancer risks ranging from nearly guaranteed (lifetime risk > 90% for Familial Adenomatous Polyposis) to merely very high (~40-60% lifetime risk of breast cancer in BRCA2).

The lifetime costs of monogenic diseases are very high, and it is likely that offering IVF + PGT-M for free to prevent the transmission of monogenic diseases is cost-effective for many diseases: eg, for BRCA1/2 and for sickle-cell disease.

PGT-A

Another use-case for pre-implantation genetic testing is aneuploidy testing. Apart from embryos with Trisomy 21 or Turner Syndrome (45,X, missing an X chromosome), which often survive pregnancy (though at lower rates than chromosomally normal embryos), the vast majority of embryos with aneuploidy either do not implant successfully or result in miscarriage. To prevent the disappointment and trauma of miscarriage for parents, and to increase the success rate per embryo transfer, IVF clinicians introduced pre-implantation genetic testing of ploidy status (PGT-A).

Likely due to technical limitations of most commonly used PGT-A methods, as well as the possibility of embryo mosaicism, PGT-A does not perfectly predict aneuploidy status, and hence, implantation status. Supporting this limitation, a recent trial comparing IVF with PGT-A versus conventional IVF without PGT-A found non-inferiority of conventional IVF, with a higher cumulative live birth rate in the conventional IVF group. That is, the group not performing PGT-A had a higher live birth rate.

Even if PGT-A correctly prioritizes the embryos with the highest chance of implantation success, embryos that are called as “aneuploid” by current methods, and especially mosaic embryos, still have a chance of implantation and live birth. Better prioritization of embryos likely does reduce implantation failures, but since it doesn’t increase the number of embryos available, it can't increase the cumulative live birth rate. This popular press article does a good job summarizing the controversy. There are reasons to think^[40] that more sophisticated genetic testing may do a better job correctly calling ploidy status, which would reduce false calls of “aneuploidy”.
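
The prioritization argument can be made concrete with a small sketch. It assumes, purely for illustration, that each embryo has a fixed and independent probability of leading to a live birth, and that embryos are transferred one at a time until one succeeds. The probabilities, the 0.20 discard cutoff, and the function names are hypothetical and not taken from any trial.

```python
def cumulative_live_birth_prob(success_probs):
    """Chance of at least one live birth if every listed embryo may be transferred."""
    p_all_fail = 1.0
    for p in success_probs:
        p_all_fail *= 1.0 - p
    return 1.0 - p_all_fail


def expected_transfers(success_probs):
    """Expected number of transfers in the given order, stopping at the first success."""
    expected = 0.0
    p_reached = 1.0  # probability that all earlier transfers failed
    for p in success_probs:
        expected += p_reached
        p_reached *= 1.0 - p
    return expected


# Hypothetical per-embryo live birth probabilities (assumed independent).
embryos = [0.10, 0.50, 0.30, 0.05]
ranked = sorted(embryos, reverse=True)      # idealized, perfect prioritization
kept = [p for p in embryos if p >= 0.20]    # lower-chance embryos are discarded

print("cumulative, original order:", cumulative_live_birth_prob(embryos))
print("cumulative, ranked order:  ", cumulative_live_birth_prob(ranked))
print("cumulative, after discards:", cumulative_live_birth_prob(kept))
print("expected transfers, original:", expected_transfers(embryos))
print("expected transfers, ranked:  ", expected_transfers(ranked))
```

Under these assumptions, reordering changes only how many transfers are expected before a success, not the cumulative chance, while discarding embryos that still have a nonzero chance can only lower the cumulative chance.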

Polygenic selection

The road to causally sound embryo selection

Editor’s note: mostly written by Hayt, subsequently edited by Willy with feedback from various other experts

Human embryo selection is a promising direction to improving the next generation’s health and well-being. As is, much current research associating genetic variants and predisposition to complex disorders is observational and cannot make firm causal conclusions. Here, I outline the basic principles and methodology underlying current embryo selection approaches and argue that in a number of ways they are lacking with regards to predictive power, accuracy, and causal claims. I then propose future research directions that will pave the way to a more scientifically rigorous and effective approach to embryo selection based on principles of improved biological modeling and causal learning.

Evolution, genes, and reproduction

Evolution (on single genetic variants) operates via selection: negative selection removes damaging mutations from the population (e.g. lethal monogenic disorders), positive selection increases the frequency of favorable variants (e.g. lactase persistence in European populations), and balancing selection maintains multiple alleles present in the population (e.g. the sickle cell anemia causing recessive mutation that in the heterozygous state leads to malaria resistance). Under more complex scenarios where there are multiple genetic variants impacting a trait under selection, other evolutionary dynamics emerge such as stabilizing selection, whereby genetic variants that alter a trait are balanced in the population to achieve an optimal trait value.

Selection shapes the mutational landscape of our genome, but there are also stochastic mechanisms that introduce phenotypically neutral mutations into a population’s gene pool. These mutations are not undergoing selection but are rather silently tagging along through random chance. These neutral variants become more or less common through the process of genetic drift (which also affects non-neutral variants). Changes in the environment change the fitness consequences of variants.

Current approaches to embryo selection and their limitations

Scientific advances are making the genetic optimization of a child's health possible. It is already common practice to select embryos based on monogenic disorders and chromosomal abnormalities, which provide no advantage, cause substantial harm, and are reasonably well-understood. But many traits do not operate via a single gene. So how can we select on these complex, polygenic traits? Currently, statistical geneticists are using quite simple approaches with various degrees of success. The procedure they use to derive genetic predisposition is generally based on the following with minor variation:

Conduct a genome wide association study (GWAS) in unrelated individuals to detect the strength of association and effect size for every (common) variant independently, typically on the order of millions of variants.
Use a training sample of people to optimize the polygenic risk score (PRS) prediction accuracy. The PRS is derived by adding up the effects (determined via the GWAS or some reweighting procedure1) of genetic variants across one’s genome as a linear weighted sum across the associated variants.
In some cases, the predictive power of the PRS is also tested in a separate testing sample.
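
The three steps above can be sketched on simulated data. This is a minimal illustration, not the pipeline any group actually uses: it assumes independent variants (no linkage disequilibrium), purely additive true effects, and unrelated individuals, and it skips the reweighting and tuning step entirely by using the raw marginal estimates as weights. All names and parameter values are made up for the example.

```python
import math
import random


def simulate_genotypes(n_people, allele_freqs, rng):
    """Allele counts (0, 1 or 2) per person and variant; variants are independent."""
    return [[int(rng.random() < f) + int(rng.random() < f) for f in allele_freqs]
            for _ in range(n_people)]


def correlation(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def marginal_effect(genotype_column, phenotypes):
    """Step 1: slope from regressing the phenotype on one variant at a time."""
    n = len(phenotypes)
    mg, mp = sum(genotype_column) / n, sum(phenotypes) / n
    sgp = sum((g - mg) * (p - mp) for g, p in zip(genotype_column, phenotypes))
    sgg = sum((g - mg) ** 2 for g in genotype_column)
    return sgp / sgg if sgg > 0 else 0.0


def polygenic_score(genotype_row, weights):
    """Step 2: linear weighted sum of allele counts."""
    return sum(g * w for g, w in zip(genotype_row, weights))


def run_demo(n_train=2000, n_test=500, n_variants=50, seed=0):
    rng = random.Random(seed)
    freqs = [rng.uniform(0.1, 0.9) for _ in range(n_variants)]
    true_effects = [rng.gauss(0.0, 1.0) for _ in range(n_variants)]
    genotypes = simulate_genotypes(n_train + n_test, freqs, rng)
    genetic = [polygenic_score(row, true_effects) for row in genotypes]
    mean_g = sum(genetic) / len(genetic)
    # Noise with the same variance as the genetic values (heritability near 0.5).
    noise_sd = math.sqrt(sum((g - mean_g) ** 2 for g in genetic) / len(genetic))
    phenotypes = [g + rng.gauss(0.0, noise_sd) for g in genetic]
    train_g, test_g = genotypes[:n_train], genotypes[n_train:]
    train_p, test_p = phenotypes[:n_train], phenotypes[n_train:]
    weights = [marginal_effect([row[j] for row in train_g], train_p)
               for j in range(n_variants)]
    test_scores = [polygenic_score(row, weights) for row in test_g]
    # Step 3: check predictive power in a held-out sample.
    return correlation(test_scores, test_p)


print("held-out correlation between score and phenotype:", run_demo())
```

Real analyses differ in scale (millions of correlated variants) and add covariates, LD-aware reweighting, and tuning, none of which is modeled here.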

This procedure is a useful simplification, but does not perfectly model biology. A GWAS measures marginal effects of variants independently, but there are two potentially important caveats:

linkage disequilibrium, the correlation between nearby variants. Per a subject-matter expert this is dealt with well by most PGS methods.
epistasis: variants do not operate independently. Even in the simplest biological systems, we observe complex interactions between genetic variants that are not captured by summing individual marginal effects. The extent to which epistasis limits current approaches is not well understood, so general conclusions about its effect on predictive power are difficult to state. A paper from 2012 argued that some of the “missing heritability” of diseases might be due to epistasis. By contrast, a more recent paper on educational attainment did not find any genome-wide significant SNPs that displayed dominance. There is also a longstanding theoretical argument (dating back to Fisher), supported by recent empirical data, that epistasis should not explain a large proportion of genetic variance, so epistasis may end up being mostly unimportant for prediction. A subject-matter expert states that “there’s fairly strong theoretical and empirical reasons to think it is unimportant for prediction...linear model captures nearly all of the heritability”.

PRSs today show predictive power in analyzing population level data, but our confidence in individual prediction should be lower. In addition, when comparing polygenic scores that perform similarly on an aggregate sample, different PRS of the same trait, constructed in slightly different ways, can vary in trait prediction for individuals. In the most well studied polygenic trait, height, a recent paper argued that some assumptions of the PRS model were significantly violated which led to some systematic (though minor) errors in estimations– and per a subject-matter expert, these errors could be fixed by a monotonic transformation and would not affect embryo ranks.

In addition and perhaps most importantly, the explanatory power for most traits is lackluster for individual level predictions, with optimal current polygenic screening technologies increasing the mean IQ of a selected embryo by an average of 2.5 points and height by under an inch, under certain assumptions. Worryingly, genetic prediction of cognitive traits such as educational attainment, the most well powered cognitive trait studied, has proven to be significantly confounded by nondirect correlates of educational attainment, suggesting selection on educational attainment using current technologies would be less than half as powerful as one might naively predict, discussed here^[41]. On the other hand, other cognitive traits, such as IQ, may display less confounding of that type than educational attainment.
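
To show how estimates of this kind depend on their assumptions, here is a rough Monte Carlo sketch of the expected gain from picking the highest-scoring of n embryos. It assumes that sibling embryos' scores are normally distributed around the parental average, that the within-family variance of the score is half of its population variance, and that the score is a causal predictor. The inputs (10 embryos, r^2 of 0.05, trait SD of 15) are illustrative, so the output is not meant to reproduce the 2.5-point figure above.

```python
import math
import random


def expected_gain(n_embryos, r2, trait_sd=15.0, n_sims=20000, seed=0):
    """Average advantage of the top-scoring embryo over a randomly chosen one.

    r2 is the share of population trait variance explained by the score;
    half of the score variance is assumed to segregate between siblings.
    """
    rng = random.Random(seed)
    within_family_sd = trait_sd * math.sqrt(r2 / 2.0)
    total = 0.0
    for _ in range(n_sims):
        total += max(rng.gauss(0.0, within_family_sd) for _ in range(n_embryos))
    return total / n_sims


for n in (2, 5, 10):
    print(n, "embryos:", round(expected_gain(n, r2=0.05), 2), "trait points")
```

Because the gain scales with the square root of r^2, any confounding that shrinks the within-family r^2 lowers the expected gain correspondingly, and the returns from adding more embryos diminish quickly.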

Figure 23. Meta-analysis estimates of direct and population effects of PGIs.

Confounding

GWAS is observational in nature, which leads to confounding that is difficult to control with current approaches. There are four (potentially overlapping) sources of confounding, briefly summarized below:

Population stratification^[42]

Principal components and linear mixed models adjust for stratification, but are imperfect (as a recent paper on height comparing results in the GIANT consortium and UK biobank shows) and have more trouble with recent structure, per a subject-matter expert.

Assortative mating

Per a subject-matter expert: “Assortative mating confounding in isolation is probably not as troubling since it in general will still lead to a high correlation genome-wide between GWAS estimates and causal effects, but AM in combination with other confounding factors can have complex effects.”

Indirect genetic effects, eg, genetic variants that affect parental behavior, which then affects offspring phenotype.

Can really only be solved with family-based GWAS.

Ascertainment bias

Also causes some issues with obtaining causal estimates, and is more problematic with “highly ascertained datasets such as UKB and 23andme.” A recent study found that higher polygenic risk scores for some diseases, such as schizophrenia and ADD, were associated with lower rates of participation in a longitudinal cohort study.

A classic thought experiment is the “chopstick gene”: imagine you want to find the variants that are responsible for making someone better at using chopsticks. You can take a random sample of people across the world and conduct a GWAS. You would find dozens of strong associations, but did you recover anything biologically, causally meaningful? No, you just found genetic variants that differ between East Asians and the rest of the world due to random genetic drift induced by geographic proximity. Clearly, we are not interested in these spurious correlations, which are pervasive in naively performed GWAS, though the field is well aware of these problems. On the other hand, per RM, ancestry confounding can be well corrected for in a GWAS with the inclusion of Principal Components. Indirect/parental genetic effects are still picked up in a regular GWAS, but these can be teased apart with family-based GWAS, where multiple siblings and/or parents are examined.
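
The chopstick example can be simulated in a few lines. The sketch below invents two groups that differ in the frequency of one variant and in average chopstick skill, with the variant having no effect on skill at all. Every number is made up. For simplicity the group label is treated as known and adjusted for directly; in a real GWAS it is not observed and principal components stand in for it.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_chopstick_cohort(n_per_group=2000, seed=1):
    """Rows of (group, genotype, skill); skill depends on group only."""
    rng = random.Random(seed)
    rows = []
    for group, allele_freq, mean_skill in ((0, 0.2, 0.0), (1, 0.8, 2.0)):
        for _ in range(n_per_group):
            genotype = int(rng.random() < allele_freq) + int(rng.random() < allele_freq)
            skill = mean_skill + rng.gauss(0.0, 1.0)  # genotype plays no role
            rows.append((group, genotype, skill))
    return rows


def centre_within_group(rows, index):
    """Subtract each group's mean from the chosen column."""
    sums, counts = {}, {}
    for row in rows:
        sums[row[0]] = sums.get(row[0], 0.0) + row[index]
        counts[row[0]] = counts.get(row[0], 0) + 1
    return [row[index] - sums[row[0]] / counts[row[0]] for row in rows]


def naive_and_adjusted_slopes(rows):
    naive = slope([r[1] for r in rows], [r[2] for r in rows])
    # Centring within group is equivalent to adding group as a covariate.
    adjusted = slope(centre_within_group(rows, 1), centre_within_group(rows, 2))
    return naive, adjusted


naive, adjusted = naive_and_adjusted_slopes(simulate_chopstick_cohort())
print("naive effect estimate:   ", round(naive, 3))
print("adjusted effect estimate:", round(adjusted, 3))
```

The naive estimate is clearly nonzero even though the variant does nothing, and it collapses towards zero once group membership is accounted for.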

This gets at an additional issue of polygenic score transferability. Per a subject-matter expert:

issues with polygenic score transferability stem primarily from differences in linkage disequilibrium and allele frequencies. Other forms of confounding likely play a role in educational attainment and related phenotypes. There are many people working on methods to ameliorate these issues too. It could be worth mentioning that as well: it's not a completely intractable problem.

To control for the confounding described above, researchers select “genetically homogenous” groups of people to include in their GWAS. These have been overwhelmingly white European individuals. Polygenic scores have reduced utility in individuals that have different ancestry from the GWAS sample, with the reduction increasing with genetic distance from the GWAS sample in which the polygenic score was developed. In one instance, a polygenic score for schizophrenia trained in Europeans correlated much more strongly with ancestry than with the condition itself in other groups. While polygenic scores trained in one ancestry have some degree of transferability to others, the overall reduced predictive power and the unintended consequence of selecting against certain ancestries^[43] in the selected embryo complicate current approaches.

Putting aside the limitations of GWAS and transferability, another important consideration (“pleiotropy”) in selecting an embryo’s trait is that not only do variants not act independently, but the same variant may affect several traits, sometimes in opposite directions. Sometimes a variant increases disposition to multiple desirable traits of interest, but there are some cases where a variant that increases predisposition to one favorable trait decreases it for another. Pleiotropy is not well understood, and some notable and worrying examples emerge upon investigation. For example, a single variant in the metal transport gene SLC39A8 confers decreased risk for hypertension and Parkinson’s disease, but increased risk for schizophrenia and Crohn’s disease, and is also associated with cognitive performance. Searching variants in the Finngen browser (https://r6.finngen.fi/gene/) shows how pervasive pleiotropy is, turning up thousands of examples of single variants having discordant effects on disease risks. Work is being done to quantitatively describe pleiotropy in these cohorts, and the results will shed more light on this issue for PRSs. Another striking example is bipolar disorder, where current GWAS approaches show that in aggregate, genome-wide variants that decrease risk for bipolar disorder would also decrease disposition for higher educational attainment. One possibility raised by a subject-matter expert is that many of these results are inflated or spurious due to assortative mating and population structure. His best guess was that overall, genome-wide genetic correlations are low and positive between most diseases; in other words, most of the time, a given genetic variant does not have discordant effects.

In addition, there is risk with current in vitro fertilization (IVF) techniques which must be weighed against the disease risk reduced by embryo selection. The absolute risk reduction as it stands is low and current embryo selection approaches would yield many false positives. As an example, imagine you select against an embryo in the top 10% of genetic risk for schizophrenia. With the current best PRS and assuming PRSs are completely causal predictors, that individual would have had a 5% chance of developing schizophrenia, compared to ~1% for any average embryo in the population (which would most likely be an underestimate for the parents that would produce an embryo in the top 10% in the first place). Contrast this with an important risk of IVF, ovarian hyperstimulation syndrome (OHSS). Exact numbers are difficult to find, but probably about 3-6% of women experience moderate OHSS and 0.1-2.0% experience severe OHSS, which in some rare cases, can be fatal.
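
The arithmetic behind the schizophrenia example can be written out explicitly. The sketch uses only the 5% and ~1% figures from the paragraph above and keeps its assumptions (the PRS is fully causal, and the embryo chosen instead has average risk). The function and key names are made up for the example.

```python
def screening_summary(risk_if_flagged, risk_if_average):
    """Simple risk arithmetic for deselecting a high-score embryo."""
    absolute_risk_reduction = risk_if_flagged - risk_if_average
    return {
        # Drop in risk from choosing an average embryo instead of the flagged one.
        "absolute_risk_reduction": absolute_risk_reduction,
        # Share of flagged embryos that would never have developed the condition.
        "share_of_flagged_never_affected": 1.0 - risk_if_flagged,
        # Flagged embryos that must be avoided to prevent one case, on average.
        "flagged_embryos_avoided_per_case_prevented": 1.0 / absolute_risk_reduction,
    }


summary = screening_summary(risk_if_flagged=0.05, risk_if_average=0.01)
for name, value in summary.items():
    print(name, round(value, 3))
```

An absolute reduction of this size is in the same range as the moderate OHSS rates quoted above, which is why the two need to be weighed against each other.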

In short, we are 1) not modeling biology correctly (though a subject-matter expert countered that selection in agriculture has done very well without much understanding of mechanisms) and 2) relying on confounded observational data to make causal claims. We need to innovate our approach such that we can confidently make concrete claims regarding genetic causality, model genetic interactions more realistically, and avoid selecting embryos without taking pleiotropy into account.

Research directions to improving embryo selection

We need to approach this problem by taking principles of causality and corroborating evidence into account, using 1) new computational approaches, 2) family study designs, 3) deep phenotyping, and 4) diverse population cohorts.

Deep learning approaches can better control for environmental confounders by identifying nonlinear confounders (e.g. socioeconomic status by age interactions)13 and finding higher order representations of our genome that better model true biology. Nonlinear confounders are environmental variables that interact with genetic makeup of an individual and with disease risk in nonlinear, more complex ways than how they are currently being modeled. Applying deep learning approaches from natural language processing has already increased our understanding of genomic interactions that predict molecular phenotypes,14 pointing to higher order genetic dependencies that were previously unknown and can now be incorporated into polygenic trait prediction and understanding biological causation. There are also various examples of machine learning approaches that have been used to improve transferability of polygenic risk scores.15,16 Deep learning will reveal causality by providing evidence for the biological relevance of associations made in observational studies, and in doing so will improve prediction accuracy. Funding AI fellowships to train people with diverse machine learning backgrounds and get them interested in the field would accelerate this direction of research.

RM was not as excited about this approach, reasoning that the animal breeding field (which is motivated by commercial success, not publications) has not focused much on modeling non-additive genetic variation. This implies that in practice, the additive model is sufficient. He also thinks focusing on causal associations is somewhat overrated, since you can have accurate genetic prediction without the variants in question being causal– presumably due to the causal variants being in strong linkage with the non-causal SNPs used in the model. Another subject-matter expert added that “functionally informed fine-mapping can help in transferability to different ancestries, as demonstrated by finemap/Susie”.
Another subject-matter expert was not convinced deep learning would help with better understanding of causality, though it might help with PGS score construction:
Deep learning won't get us to causality. It's still just observational data. I'm also skeptical that non-linear models will increase PGS R^2 substantially. Deep learning for genomic annotations I think could help in improving priors for PGS construction, though.

Using related individuals allows us to find genetic effects that are not driven by unobserved environmental confounders as there is randomization of genetic material during meiosis, like an RCT. This removes environmental confounds and genetic confounds that are due to long-range correlations of genetic variants of interest with other variants that are due to assortative mating and population structure. However, datasets on related individuals are far outpaced in size by those of unrelated ones. A promising study constructed a large collection of siblings and conducted GWASs for various traits between siblings (rather than unrelated individuals).17 They find that the SNP heritability (heritability explained by common variants) of many traits, including height, weight, ever smoked, and educational attainment, decreases substantially, with the estimate for educational attainment decreasing by over 50%, and the others attenuating less. In addition, they find that the underlying genetic architectures (the inferred variants and their effect sizes influencing a given trait) of various traits differ substantially from expectation when estimated with their sibling approach. One can use genetic correlations to assess how similar one trait’s GWAS association patterns are to another’s, with the interpretation being that their underlying genetics are correlated. Educational attainment has been positively correlated with height and age at first birth, and negatively correlated with weight and having ever smoked, yet these correlations are completely attenuated using the sibship estimates. Domain experts have described this as one of the most important directions of research towards the goal of biologically relevant polygenic risk prediction.
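
A toy simulation illustrates why the sibling design helps. It assumes a single variant, random mating, a direct effect of the child's own genotype, an indirect effect of the parents' genotypes acting through the family environment, and a shared family environment term. All effect sizes and names are invented; this is not the method of the study cited above, only the underlying logic.

```python
import random


def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate_sib_pairs(n_families=5000, allele_freq=0.5,
                       direct=0.2, indirect=0.3, seed=2):
    """Each family yields two siblings as (genotype, phenotype) tuples."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_families):
        mother = [int(rng.random() < allele_freq) for _ in range(2)]
        father = [int(rng.random() < allele_freq) for _ in range(2)]
        parental_count = sum(mother) + sum(father)
        shared_env = rng.gauss(0.0, 0.5)
        sibs = []
        for _ in range(2):
            # Meiosis: each parent passes on one of their two alleles at random.
            genotype = rng.choice(mother) + rng.choice(father)
            phenotype = (direct * genotype + indirect * parental_count
                         + shared_env + rng.gauss(0.0, 1.0))
            sibs.append((genotype, phenotype))
        pairs.append((sibs[0], sibs[1]))
    return pairs


def population_and_sibling_slopes(pairs):
    # "Population" estimate: one sibling per family, analyzed as unrelated people.
    population = slope([a[0] for a, _ in pairs], [a[1] for a, _ in pairs])
    # Sibling estimate: differences within a family cancel everything shared.
    sibling = slope([a[0] - b[0] for a, b in pairs],
                    [a[1] - b[1] for a, b in pairs])
    return population, sibling


population, sibling = population_and_sibling_slopes(simulate_sib_pairs())
print("population estimate:", round(population, 3))
print("sibling estimate:   ", round(sibling, 3))
```

In this setup the population estimate absorbs the parental effect while the sibling-difference estimate stays close to the direct effect, at the cost of a noisier estimate for the same number of people.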

RM was also excited about this work, particularly since it relates very directly to embryo selection, which is about prediction within siblings.

Detailed phenotyping that covers a broad range of fields can help us assess which variants increase predispositions to traits without decreasing optimality for others. In addition to breadth, the depth of phenotyping needs to be improved such that phenotypes that are mechanistically closer to the underlying biology can be studied. As an example, recent work in Finland showed that finer-grained descriptions of schizophrenia subtypes revealed differential genetic architectures for different symptomologies, though RM was skeptical of this work. Doing so across cohorts will improve predictive accuracy as well as provide better avenues for validation of PRSs, as shown successfully for blood traits.18

RM liked this idea in theory, but noted that historically, “bigger biobanks with mediocre phenotypes” have won out. He agrees that large biobanks with deep phenotyping would be best.

Lastly, using diverse cohorts can lead to finding variants that are consistently associated with traits of interest in individuals that are not in the same cultural, environmental, and genetic conditions, which increases our confidence that these variants are causal. They can also identify new variant associations for variants that are not present in white Europeans. Fine-mapping, which is the process of identifying the causal variant within a given identified GWAS “hit”, is also aided by diverse cohorts– different patterns of linkage disequilibrium in different ancestries are very useful.
There are large biobank efforts which have already proved the utility of these cohorts in refining our understanding of genetic associations, and these efforts should be expanded. Many new biobanks and large cohorts are emerging and growing (AllofUs and BioMe in the US, Finngen in Finland, UKBiobank and 100,000 Genomes in the UK, Qatar Genomes, as well as various Asian biobanks). Many of these biobanks have a main focus of expanding diversity, especially in those countries with diverse populations. AllofUs, which aims to recruit over 1 million individuals, is aiming to have over 50% of the cohort be individuals from underrepresented groups. This approach is perhaps the most fruitful: as much as it would be desirable to establish large-scale biobanks in developing and underrepresented countries, the electronic health record infrastructure in these countries is limited, so expanding representation within cohorts in countries that already have such infrastructure is likely to pay off more. However, this is not the case for disease-specific cohorts, where recruiting individuals with rare diseases may be feasible even in countries with limited health infrastructure, such as an effort at the Sanger Institute to recruit Indian patients with severe neurodevelopmental diseases. Given these realities, funding would be most cost effective and beneficial if directed towards expanding existing biobanking efforts (including entire families) in countries with diverse populations and appropriate infrastructure as well as seeking to develop severe disease-specific cohorts in underrepresented countries.

RM was excited about this idea as well, as was a subject-matter expert in statistical genetics.

On a similar note, genetic studies need to be communicated well to the public and efforts should go into public information campaigns to increase both participation as well as acceptance of such studies. Many individuals are worried about eugenics and social preferences against their own characteristics. As an example, an unnamed autism cohort that would have been one of the largest autism cohorts in the world was halted due to severe backlash from the autism community, claiming that the researchers are eugenicists that are trying to eradicate people with autism (an unfounded claim). Society and researchers will benefit from better education on these topics.

Incorporating rare variants into embryo selection

Importantly, there is an entire class of genetic variation that is being left out by current PRS methods (which rely on genotyping) that can be much more readily applied to embryo screening: rare coding variants. It is rare variants that typically have the largest per-variant effect on traits. We already understand certain types of rare variants (especially those in specific parts of the genome that code for proteins) and we can start exploiting this understanding in the context of embryo selection, which has not been done except for some monogenic disease-causing genes. A simple measure of rare variant burden that is biologically interpretable and most convincingly causal has been shown to be associated with reduced fertility, reduced cognitive abilities, and other undesirable traits, with no positive associations with desirable traits. Some of these harmful genetic variants are mutations introduced de novo, that is, in the most recent generation (i.e. they are not present in the parents), potentially making them a great target for embryo screening as pre-screening of the parents is not possible for these variants. Unfortunately, there are some practical issues associated with attempting to screen for de novo (though not rare) variants, principally that most whole-genome sequencing technologies cannot currently reliably detect de novo variants^[44], as they cannot be distinguished from sequencing errors, with the exception of germline mosaicism that recurs in siblings.

Selection through rare variants may be the closest we are to selecting embryos in a biologically informed manner. Computational tools to predict the effects of rare variants can and should be improved, but papers like this, that can rank genes according to how “tolerant” they are of inactivation, are a good start. Increasing size would help as well: some recent work from the UK biobank that obtained whole-exomes of 500k individuals predicts that with sample sizes in the several millions, loss-of-function variants will be found in nearly all genes.

It is an exciting time in human genetics where we must start causally learning from the data so we can improve the health and well-being of future generations and society as a whole.

Responses

From an anonymous researcher & very early-stage (not public) start-up founder in this space

Some comments:

As you pointed out, epistasis is empirically irrelevant for the diseases and traits we care about. It is exponentially harder for evolution to select for a combination of variants together rather than a single variant.

Stratification can be tested after-the-fact: Hsu's height predictor tested for stratification bias and found none. Most GWASs test for 10+ stratification dimensions (PCs) in addition to only using a homogenous sample. A PGS would not be computed for a GWAS that didn't properly address population stratification.

The 2.5 points estimate is essentially a lower bound due to the assumptions made. Check out Gwern's estimate on this topic. It should also be noted that for all polygenic scores, poor on average benefit can still have very large outlier detection benefit. An r^2 of 0.1 is sufficient for very good outlier detection (e.g., top 10% and bottom 10%). For example, even if your neuroticism predictor can only move neuroticism by 1/20th of a standard deviation on average with embryo selection, it still can be very good at detecting embryos that are extreme outliers in neuroticism.
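
The distinction between average benefit and outlier detection can be explored with a short simulation. It assumes the score and the trait are jointly normal with a squared correlation of 0.1, and it arbitrarily defines an "extreme" trait value as the top 2% of the population. It then compares how often extreme values occur among people in the top and bottom 10% of scores. The cutoffs and names are illustrative choices, and the sketch describes unrelated individuals rather than sibling embryos, where score differences are smaller.

```python
import math
import random


def simulate_outlier_rates(r2=0.1, n=100000, score_fraction=0.10,
                           trait_fraction=0.02, seed=3):
    """Rate of extreme trait values overall and at both ends of the score range."""
    rng = random.Random(seed)
    r = math.sqrt(r2)
    people = []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)
        trait = r * score + math.sqrt(1.0 - r2) * rng.gauss(0.0, 1.0)
        people.append((score, trait))
    # Trait value that marks the top `trait_fraction` of the population.
    trait_cutoff = sorted(t for _, t in people)[int(n * (1.0 - trait_fraction))]
    people.sort()  # ascending by score
    k = int(n * score_fraction)

    def rate(group):
        return sum(t >= trait_cutoff for _, t in group) / len(group)

    return {
        "lowest_scores": rate(people[:k]),
        "everyone": rate(people),
        "highest_scores": rate(people[-k:]),
    }


for name, value in simulate_outlier_rates().items():
    print(name, round(value, 4))
```

Comparing the three rates gives a rough sense of how much a modest r^2 concentrates extreme outcomes at the ends of the score distribution under these assumptions.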

It is commonly said that pleiotropy is not well understood--this is true if we're talking about the biological pathways resulting from specific alleles, but I personally would say that pleiotropy is very well understood. There are many hundreds of genetic correlations published in the literature, including those for important traits and diseases. If we were to select for one trait or disease, we generally know the pleiotropic effects it will have.

Issues with polygenic score transferability arise from differences in linkage disequilibrium if you're talking about transferability to other ethnicities or to CRISPR. You can safely ignore LD for cohorts/individuals with similar ancestry to the training population. LD is accounted for both on the GWAS level and the PGS calculation level.

Causal associations are very overrated, as our goal is prediction, not editing specific variants. You can tell that a car has wheels even if you see only the top half of the car. A variant which wasn't very correlated with the true causal variant wouldn't show up as significant in the GWAS. [editor’s note: I largely agree with this point for polygenic embryo selection, but getting causal variants would presumably help a lot with translating PRS to different groups, where LD decay is a problem. ]

Pleiotropy is not that big of a problem. Let's say there is a positive genetic correlation between bad trait A and good trait B. This occasionally occurs, though the correlation is very weak. You can easily find an embryo with low A and high B even though the two are statistically correlated; the correlation doesn't mean that B always increases A. You can select for both at the same time. It should be noted that pleiotropy usually causes good things to be correlated with other good things (see Okbay et al. 2022 for example), so pleiotropy is usually good and causes a synergistic effect.

Pregnancy

Uterus and Endometrium

The uterus is the organ where the embryo implants and later grows. It is a hollow muscular organ that can enlarge substantially during pregnancy. From inside to outside, the uterus has 3 layers, endometrium, myometrium, and perimetrium.

Figure 24.

The endometrium is the site of implantation of the blastocyst. It is a very dynamic tissue that changes during the menstrual cycle, depending on hormone levels, growing and then shedding. Abnormalities in the endometrium, such as Asherman Syndrome, can cause infertility, as can large benign growths of the uterus. There is some variation in the shape of uteri, and uterine malformation likely plays a role in some cases of infertility: uterine abnormalities occur in 7-10% of women, 25% of women with uterine abnormalities have poor pregnancy outcomes, and major anomalies are 3x more common in women with recurrent miscarriages. Though gestational surrogates are expensive, they do provide a workaround for uterine causes of infertility.

For women who are determined to carry a baby to term themselves but have uterine issues, there are surgeries that can fix some problems and as a last resort, uterine transplantation.

The cervix, the lowest part of the uterus that connects it to the vagina, is important for fertility because cervical abnormalities can threaten pregnancies. Weakness of the cervix can cause miscarriages or preterm births. Cervical cerclage can address some of these problems, as can exogenous progesterone administration, and close monitoring of cervical length during pregnancy in individuals with a history of cervical insufficiency.

Implantation

The endometrium must be decidualized (“endometrial fibroblasts transforming into specialized secretory decidual cells”) for a successful pregnancy, a process that is controlled by progesterone. In most animals, decidualization is controlled mostly by the fetus; in humans, there is more maternal control.

Most blastocysts do not successfully implant, with a 40% chance of successful implantation in optimal conditions. The current understanding of failed implantation puts some of the blame on “uterine factors” and some blame on fetal abnormalities. Researchers have defined “recurrent implantation failure” in a variety of ways, but the basic findings on predicting implantation failure (from here) are as follows:

Higher maternal age predicts lower implantation rates
Higher BMI predicts lower implantation rates
Cigarette smoking predicts lower implantation rates

Distinguishing between fetal or uterine causes of failed implantation is important, since it can guide treatment. Unfortunately, a review of treatments for women with repeated implantation failure noted the generally poor quality of evidence on many treatments, stating “we witnessed the emergence of a number of RIF treatment options of simple execution but characterized by weak rational bases....their introduction into current clinical practice occurred rapidly without waiting for adequate evidence of efficacy and safety”. Many of these treatments with uncertain evidence have already been introduced into clinical practice, a recurring problem with IVF treatment add-ons.

The best controlled research on risk factors for failed implantation comes from data on donor egg implantation. Donor eggs are healthy^[45], and more importantly, unrelated to the age of the recipient. There is some indication that beginning in the late 30’s, the age of the recipient begins to reduce success rates, but also some evidence that recipient age does not reduce success rates. Regardless, the effect is small compared to the effect of age on egg quality and number. A related development is uterine transplantation, which has been successfully carried out in a number of different centers and countries, with at least 18 live births as a result. Surrogacy, though expensive, is also an option for women with uterine factor issues. The figure below, from the CDC report on IVF, illustrates the relative stability of success with donor eggs as carrier/parent age increases, versus the clear decline in success with parental eggs:

Figure 25.

After a successful implantation, there is still the possibility of pregnancy loss. Recurrent pregnancy loss, defined as 2 or more pregnancy losses prior to 20 weeks, occurs in about 2-3% of couples. Risk factors for recurrent pregnancy loss are similar to the above, with higher female age as the most consistent risk factor. While many cases (perhaps up to 50%) of recurrent pregnancy loss will remain unexplained after a diagnostic workup, previous unexplained pregnancy losses are a risk factor for future pregnancy losses, implying a stable underlying trait. There may be some maternal “rejection” of genetically abnormal embryos, but it is unclear if this really occurs in humans. From Speroff:

Evidence from several mammalian species indicates that the endometrium is capable of “sensing” the quality of the attaching embryo, mounting a decidualization response that is tailored to individual embryos. Microarray analysis of bovine endometrium has revealed differential gene expression depending on the origin (somatic cell nuclear transfer, IVF, or artificial insemination) and developmental potential of the implanting embryo.211 In humans, endometrial stromal cells have been shown in vitro in a coculture model to respond selectively to low-quality embryos by inhibiting the secretion of key implantation factors, including IL1-beta, HB-EGF, and LIF.212 In addition, low-quality embryos elicit an endoplasmic stress response in human endometrial stromal cells in vitro, as well as in mouse uterus in vivo.21

An intriguing finding from the COVID-19 pandemic was a drop in prematurity rates, without an increase in stillbirths, in Denmark, during lockdown. This finding was later replicated in high-income but not low-income countries. Some proportion of stillbirths and extremely premature births (which have severe health consequences for the baby^[46]) are likely associated with infections, and the authors of the Denmark study viewed reduced maternal infections, as well as reduced exposure to air pollution, as possible causes of this drop.

One feared complication of pregnancy, preeclampsia, is also related to abnormal implantation:

“this invasion process is limited in pregnancies with preeclampsia, and this is the fundamental cause of the poor placental perfusion associated with preeclampsia and intrauterine growth retardation.”

One possible reason that preeclampsia is still poorly understood is that we lack a good animal model for human pregnancy. Only in great apes does the embryo completely invade the endometrium. Since the invasion process is part of what appears to go wrong in preeclampsia, and experimentation on great apes presents ethical/regulatory challenges, this may limit our understanding.

Identification of a logistically easier animal model, or an appropriate organoid model, might improve our understanding of implantation. Overall, I view improving implantation rates as an important target, but a less straightforward one than a goal like “improve our understanding of the genetics of complex traits”. The latter has a clear path forward: increasing GWAS sample size, improving phenotyping, increasing the use of exomes and whole genomes in large-scale genetic studies, and running more within-family studies.

Pregnancy Risks

As a general disclaimer for this section, here is a thread by Lyman Stone exploring how different definitions of maternal mortality in different countries can change results. For that reason I have avoided cross-country comparison. With that caveat out of the way, I will briefly address two specific questions, with the US as the focus:

What is the risk of death in pregnancy?

In 2020 in the US, the rate of maternal death per 100,000 live births was 23.8, with race (non-Hispanic Black women have ~ 2-3x risk vs non-Hispanic white women) and age (older women have higher risk) predicting higher mortality rates. As some context, some people use micromorts (1/1,000,000 risk of death) to compare different risks of death to each other. With that metric, a maternal death rate of 23.8/100,000 is 238 micromorts, about half as dangerous as base jumping.
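The micromort conversion above is simple arithmetic, sketched here for anyone who wants to plug in other rates. The function name is my own; the only input taken from the text is the 23.8 per 100,000 figure.

```python
def to_micromorts(deaths: float, per: float) -> float:
    """Convert a death rate (deaths per `per` events) into micromorts (deaths per 1,000,000)."""
    return deaths / per * 1_000_000


# US maternal mortality in 2020: 23.8 deaths per 100,000 live births
us_2020 = to_micromorts(23.8, 100_000)
print(f"{us_2020:.0f} micromorts per live birth")
```

The same function can be applied to a subgroup rate (for example, a lower-risk group) to compare it against the population average on the same scale.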

However, maternal mortality is not randomly distributed: some women are at predictably higher risk. The term for this in obstetrics-gynecology is “high-risk pregnancy”. I have not found risk calculators^[47] (akin to the ASCVD risk calculator based on Framingham data) for estimating maternal risk, but certain conditions are known to increase risk, to varying degrees: autoimmune diseases, high blood pressure, obesity, higher maternal age, previous C-sections, hypercoagulability, and more. However, it seems very likely that the risk of maternal death for women free of most or all those conditions is substantially lower than the 23.8/100,000 estimated above.

A variety of risk prediction models for pregnancy-related severe illnesses have been developed, though some focus on predicting mortality of obstetric patients that are hospitalized and require laboratory values. For example, the CIPHER model uses: “10 predictors: maternal age, surgery in the preceding 24 hours, systolic blood pressure, Glasgow Coma Scale, serum bilirubin, activated partial thromboplastin time, serum creatinine, potassium, sodium and arterial blood gas pH”.

A CDC report examining all maternal deaths from 2011-2015 argued that about 60% of all pregnancy-related deaths were preventable. I did not investigate this question in-depth enough to guess if that estimate was reasonable or not.

Child/Infant Mortality Trends

The reduction in neonatal and infant mortality is responsible for a large increase in life expectancy over the 20th century. Interestingly, mortality rates for infants and children seem to have been consistently high^[48] across a wide range of historical societies and hunter-gatherer groups. Thus, from a long-term perspective, the dramatic reduction in child mortality is a world-historical accomplishment. Per a 2005 WHO report, the majority of young child (age < 5 years) mortality at that time was driven by communicable diseases, which is roughly consistent with data from hunter-gatherer groups as well, with the caveat that determining the precise cause of an infant death is still occasionally a challenge even with modern diagnostics. Congenital defects, violence against or accidents involving infants, and infanticide/abandonment were also important causes of historical infant mortality.

Neonatal Care Trends

In high-income country settings, preterm births make up a significant portion of infant mortality. Specifically, per the American College of Obstetricians and Gynecologists (ACOG), births before the 3rd trimester are about 0.5% of births but account for 40% of infant deaths. They also account for a significant amount of childhood morbidity through the long-term harms of preterm birth (eg, cerebral palsy). There has been progressive improvement in the outcomes for preterm infants^[49].

I will very briefly outline trends in fetal viability, the physiological limits on fetal viability, and the possibility of exogenesis to improve fetal viability. As a caveat, the approximate fetal viability numbers described here are in high-income countries.

Defining fetal viability involves some degree of ambiguity and controversy. The TL;DR is that infants as young as 22 weeks can achieve, in some highly specialized centers, survival rates of about 5%, though with very severe long-term complications and at very high hospital/long-term costs. At 26 weeks, survival rates reach about 80%, though with high rates of long-term complications. For these reasons, deciding to treat extremely preterm infants is controversial, and below a certain threshold, many centers will offer palliative care only.

A 2020 review of extremely preterm infants’ prognosis provides some useful figures, reproduced below:

Figure 26.

Some important takeaways from this figure:

22-week infants achieve survival rates ranging from 5-60%, while most studies reported “at least 50% survival rate among infants born at 24 weeks”. This differs from the numbers cited above, from this previous source, which reported survival rates of 5% for babies born at 22 weeks.
Importantly, the rate of severe complications for surviving infants was high, including gut (NEC=necrotizing enterocolitis), brain (IVH=intraventricular hemorrhage), lung (BPD=bronchopulmonary dysplasia), and eye (ROP=retinopathy of prematurity) issues.
These organs are not fully developed until relatively later in fetal growth. My impression is that these complications are the rate-limiting problems for extending fetal viability further, but I’m not sure of this and did not investigate this in-depth.

Extremely premature infant survival appears to still be improving, as data from England covering the 1990’s through 2014 shows.

Figure 27.

Figure 28.

It is unclear to me why the survival rates above, which reach about 35% for 22-23 week old infants, are substantially higher than the 5% reported by the ACOG report (which was last updated in 2021, but written in 2005). Differences in inclusion criteria and outcome measures seem like the most likely candidates, as well as some improvement in prenatal care over the last decade. For this reason, I am only somewhat confident in the finding that preterm viability appears to be improving for extremely preterm (less than 26 weeks) infants. I would caution that future in-depth research into this area should first try to obtain clarity on the following:

Have outcomes in very premature infants (<26 week-old infants) improved over the last 20 years, conditioning on the same outcome measurement?
This information would inform estimates of likely improvements in the future.

Artificial wombs

A recent article, as well as many people on twitter, have raised the possibility of artificial wombs, mostly as a means of extending fetal viability and of reducing the burden of childbirth on women (in the hypothetical case that artificial wombs could completely replace natural pregnancy). The scientific paper that started the discussion is this one, in which fetal lambs were gestated in a liquid environment, with better results than previous efforts. Some of the same authors have published a recent review on the challenges of translating research on artificial wombs into humans, whose points I will briefly summarize:

A major medical problem in very premature infants is their underdeveloped lungs, which makes supplying them with oxygen challenging. Their lungs are highly susceptible to injury from the medical interventions (mechanical ventilation) used to keep them alive.
Recent efforts have made major advances by using a lower-resistance oxygenation strategy, which makes more physiological methods of supporting circulation and respiration possible. The fetal circulation is quite different from adult circulation– see here for a summary– and this allows a closer approximation to it than assisted mechanical ventilation.
Immersion in a closed and sterile liquid environment mimics fetal exposure to amniotic fluid, which is important for lung and gut development, and reduces infection risk.
The remaining important challenges appear to be:

Reducing the need for systemic anticoagulation to reduce the risk of germinal matrix (brain) hemorrhages
Improving the mix of nutrients and growth factors delivered to the baby
Developing a “simple, rapid, and effective cannulation technique” – that is, improving the method of accessing the umbilical vein.
Developing animal models that mirror non-healthy human pregnancy conditions that cause prematurity, such as chorioamnionitis and intrauterine growth restriction.
Testing these techniques on animal models that are earlier in development, since 22-week human fetuses appear to have less developed lungs than the lambs used in this study.

The studies mentioned so far have focused on improving the survival of extremely preterm infants, not on extending the period of time from fertilization that embryos can be cultivated for. An additional challenge is that the process of transferring a fetus from the mother to an artificial womb might necessitate a C-section, which is more dangerous earlier in the pregnancy^[50].

From an ethics paper by one of the authors of the lamb paper:

A maternal burden of AWT is that fetal extraction via C-section (as is currently described in all successful AWT models) entails a higher perioperative risk for maternal complications (such as bleeding, complicated extraction, higher risk of uterine rupture in future pregnancies) at earlier stages of pregnancy. Of note, C-section is currently often used as a method of delivery for extreme premature infants in distress, and AWT following vaginal delivery may become possible in the future.

There is at least one conservation biology startup, Colossal, that has floated the possibility of trying to develop artificial wombs for woolly mammoths, as part of a roadmap for de-extinction. Such efforts may advance human artificial womb research as well, though there are likely very substantial differences in biology that will make translation from animals to humans challenging. In addition, a person close to some of the team noted their research was very preliminary and doubted they would actually do this [work on artificial wombs].

Due to time constraints I did not investigate this topic in-depth– tentatively, my impression is that neonatal care for extremely premature infants is so expensive and far from optimal that even highly expensive artificial wombs could be justified on those grounds, though I have no strong sense of how likely to succeed efforts on that front are.

In-vitro embryo culture

Doing the reverse, attempting to culture human embryos in vitro for as long as possible, is another approach. From a regulatory perspective, this is more difficult, due to the 14-day rule (recently relaxed by the International Society for Stem Cell Research), beyond which many countries prohibit, whether formally or informally, culturing human embryos. This rule was developed in the late 1970’s and early 80’s, perhaps in reaction to progress in in-vitro culture of embryos (IVF was first performed successfully in 1978).

With the caveat that I have little expertise in this field, and did not investigate this in-depth, my impression is that this 14-day rule was not an important barrier to research until somewhat recently, when embryo culture methods improved. Before then, culturing embryos much beyond the time they would normally implant (around 5-7 days after fertilization) was not successful.

There have been recent advances in embryo culture methods which likely prompted the International Society for Stem Cell Research to relax their guidelines, obviating the 14-day rule. The new guidelines state:

Should broad public support be achieved within a jurisdiction, and if local policies and regulations permit, a specialized scientific and ethical oversight process could weigh whether the scientific objectives necessitate and justify the time in culture beyond 14 days, ensuring that only a minimal number of embryos are used to achieve the research objectives.

Instead, they now have a tiered system of research regulation, in which research involving cultivation beyond 14 days requires review by a “specialized oversight process”, but is not blanket banned. The categories of research oversight are shown below:

I am unsure what the current state-of-the-art is capable of achieving or how quickly it will likely advance. Some notable examples:

A paper from 2019 managed to culture cynomolgus monkey embryos until day 20.
A 2020 paper cultured aneuploid human embryos up to day 9.
A 2016 paper appears to have cultured human embryos until about day 14.
A 2022 paper grew mouse embryos derived from embryonic stem cells outside of a uterus (coverage here).

Fundable Projects

I have marked project ideas that I find especially promising with either ⭐ or ⭐⭐. I have also marked projects that I think (about 60% confident) I could find a champion/executor for with 🚀, and close to “funder-ready” with 🚀🚀.

Overall, I have tried to prioritize projects by my guess at how positively impactful they seemed likely to be, though without any pretense of a formal calculation of expected value. My reasoning about which projects were especially promising often hinged (not exclusively) on the following considerations:

Tractability: does a project seem likely to succeed?

I think that public-facing advocacy along traditional lines to make, eg, embryo editing for human enhancement acceptable is unlikely to succeed.
A related consideration is track record: if a given approach has worked in the past, and doesn’t seem to have exhausted its possible gains, continue doing it.

Large-scale (national-level) biobanks with open data access policies that combine genetic and phenotypic information have been enormously important for improving our understanding of human genetics– best illustrated through the UK Biobank. While some traits are nearing saturation for common variants, much remains to be discovered in rare variants, structural variants, within-family prediction, and more diverse cohorts. We should thoroughly mine the biobank goldmine before concluding it is dry...

Neglectedness: is there plenty of extant interest in the area already? As Tyler Cowen would say, is the idea overrated or underrated by the broader science funding ecosystem?

By that criterion, embryo editing seems likely to benefit substantially from broader scientific interest in gene editing, so even though it seems highly impactful, I didn’t find projects specifically aimed at improving editing especially promising. However, projects aimed at specific challenges in embryo editing do seem more neglected.

Leverage: if large-scale funders of science can be convinced to shift their priorities by X% through a much smaller expenditure Y, then that small expenditure has effectively produced a large change in funding.

That’s why ideas that promise to change policy maker decisions in a broad sense are especially promising: eg, arguments along the lines of Dean Spears’ group that the statistical value of human life implies pro-natalist government policies are highly underfunded.

Demography

Giant baby bonus

A pilot project trialing very large baby bonuses (not a few hundred dollars, but something like 50k for a few years), ideally with a few different incentive sizes to get a sense of the demand curve.

Improving childcare tech

A project focused on improving childcare technology, such as developing more devices like the automatic baby rocker (the Snoo).

A good start to this project would be researching what parts of childcare are most costly and most time-intensive as a way to prioritize tech in those areas.
There seems to be substantial interest in pro-natalist policies, ranging from some governments (eg, China, Russia) to Elon Musk, and the number of officially pronatal countries has increased steadily from 1976 to 2015 (data from UN database of World Population Policies, courtesy of Lyman).

⭐⭐🚀Pro-natalism is underfunded relative to other causes

Dean Spears’ group contends that making more people (eg, raising TFR) is underfunded relative to the statistical value of human life that federal agencies use. One project would be funding them to explore this question further and make this argument publicly.
Another would be funding pro-natalist think tanks like the Institute for Family Studies.

Fertility education programs

Lyman noted that some small RCTs had found that fertility education changed stated fertility preferences, though none had directly measured changes in completed fertility. He suggested a large-scale study involving 4k women, who are currently childless, age 20-30, comparing [no treatment] vs [fitness and wellness program] vs [pro-natalist fertility awareness/education program] vs [both programs]. More importantly, he emphasized a long follow-up period over a decade to see if the change in fertility preferences actually altered fertility behavior.

There’s a small RCT from Canada showing that a fertility education intervention reduced intentions to delay childbirth relative to an alcohol-related education intervention.
There is a small RCT from Japan showing that partnered women exposed to a fertility education intervention accelerated birth timing, but the study had a high attrition rate and the main finding (if I understand correctly) was the result of post hoc subgroup analysis.
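The four-arm comparison Lyman suggested is a 2x2 factorial design. As a rough sketch of how participants might be allocated, the snippet below shuffles 4,000 hypothetical participant IDs and deals them evenly into the four arms. The arm labels follow the text; the ID scheme, seed, and equal allocation are my assumptions, and a real trial would likely stratify by site or age.

```python
import random
from collections import Counter

# Four arms of the 2x2 factorial design described in the text
ARMS = [
    "no treatment",
    "fitness and wellness program",
    "fertility awareness/education program",
    "both programs",
]


def randomize(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into arms for equal arm sizes."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: ARMS[i % len(ARMS)] for i, pid in enumerate(ids)}


# Hypothetical participant IDs 0..3999
assignment = randomize(range(4000))
arm_sizes = Counter(assignment.values())
for arm, size in arm_sizes.items():
    print(f"{arm}: {size}")
```

With 4,000 participants this gives 1,000 per arm, and the factorial structure lets each program's effect be estimated from 2,000 vs 2,000 women if the two programs do not interact.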

Environmental pollution

While I am skeptical that environmental pollutants have a large impact on infertility, there are ancillary benefits of better pollution management that may make it a smart idea overall. With that in mind, some infertility add-ons to a pollution-focused project may be wise. I would defer to Daniel Goodwin’s ideas on this.

⭐⭐🚀Exposome add-on to NIH AllofUs biobank:

The NIH’s AllofUs program, while a genomics focused biobank, is interested in adding an environmental exposure component. Some fertility-related ideas that it might be interesting to propose as an add-on, if they’re not already there:

antral follicle count (AFC) as an endophenotype for fertility;
blood levels / metabolites of pollutants vs various fertility phenotypes;
identifying genetic variants that track environmental pollutant levels, which can then be used as instruments for Mendelian Randomization studies. Eg, are there any variants that cause lower or higher levels of some candidate pollutant that's thought to influence fertility? This would be analogous to using Mendelian Randomization for alcohol consumption.
The MillionMarker team seems like a natural synergy here
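For readers unfamiliar with Mendelian Randomization, the simplest single-variant version is the Wald ratio: the variant's effect on the outcome divided by its effect on the exposure. The sketch below uses that standard estimator with a first-order standard error. All numbers are made up for illustration, and the variable names (pollutant level as exposure, a fertility phenotype as outcome) are my assumptions about how such an analysis would be framed.

```python
def wald_ratio(beta_exposure: float, beta_outcome: float, se_outcome: float):
    """Single-instrument MR estimate of the exposure's effect on the outcome.

    beta_exposure: variant's effect on the exposure (eg, pollutant blood level)
    beta_outcome:  variant's effect on the outcome (eg, a fertility phenotype)
    se_outcome:    standard error of beta_outcome
    Returns (estimate, standard_error), using the first-order approximation
    that ignores uncertainty in beta_exposure.
    """
    if beta_exposure == 0:
        raise ValueError("instrument has no effect on the exposure")
    estimate = beta_outcome / beta_exposure
    standard_error = se_outcome / abs(beta_exposure)
    return estimate, standard_error


# Hypothetical summary statistics, for illustration only
est, se = wald_ratio(beta_exposure=0.5, beta_outcome=-0.1, se_outcome=0.02)
ci_low, ci_high = est - 1.96 * se, est + 1.96 * se
print(f"MR estimate: {est:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The estimate is only causal if the variant affects the outcome solely through the exposure, which is the main thing a biobank add-on would need enough data to probe.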

Improving IVF

Improving hormone assays for monitoring IVF cycles and predicting ovarian reserve

Measuring hormone levels is routine, though the currently used methods are not perfect. Some problems include autoantibodies causing hormone clumping and slightly different hormone isoforms having substantively different biological activity but showing up as the same on immunoassays. This is likely responsible for some diagnostic “fuzziness” and heterogeneity. A possible takeaway is that better hormone measurement techniques may yield unexpected fruit by improving diagnostic precision. Because anti-Müllerian hormone (AMH) levels are, aside from age, the best predictor of ovarian reserve, better assays might improve ovarian reserve prediction if AMH is currently measured imperfectly.
There have been some attempts at this, including a GnRH pump designed to mimic the natural rhythms of FSH/LH for FSH/LH-deficient patients, but it was expensive and cumbersome and is no longer on the market.

Improving IVF research quality and methodology

Educating clinicians

Educating clinicians to be more skeptical consumers of research, eg, that “big data + ML” cannot replace RCTs. Doctors (at least in the US) are required to take a certain amount of CME credits for continued maintenance of their medical license, so developing a “research literacy” course that qualifies for CME credits might be one high-leverage place to intervene.

Regulating IVF clinic add-ons more stringently

Regulating how clinics market add-on treatments with no strong evidence. A partial step in this direction was the UK regulator instituting a traffic light system to explain the levels of evidence behind IVF add-ons. Jack was pessimistic regarding this being a realistic goal because the reaction of clinicians and embryologists was mixed even to the traffic light system described above^[51].

⭐⭐Funding pivotal IVF replication and methodologists

Funders (government, eg, NIH) ought to award funding for large and simple RCTs on fertility interventions. Funding should be set aside explicitly for this purpose and awarded to teams with a track record of carrying out large RCTs.
Funders who are supporting RCTs should require a methodological collaborator on grants.
🚀🚀Funders should fund methodologists like Jack Wilkinson to investigate some especially tricky problems that fertility trials run into, like multi-stage treatment and participants receiving multiple treatments.

Pressure fertility journals to raise rigor

Pushing for the routine use of methodological peer reviewers (not just clinical peer reviewers) in prominent fertility journals
Trying to standardize outcome reporting in prominent fertility journals
Moving to pre-registration for more research beyond RCTs, eg, pre-registration of in-vitro studies. One possible way to incentivize this would be for large-scale funders to award extra grant money for doing this: eg, an NIH grant that unlocks an extra 10% of funding conditional on investigators pre-registering in-vitro trials.

Misc.

Large-scale embryo culture medium trial

More speculative ideas:

Naming and shaming IVF add-ons with poor evidence

Funding a small group to more aggressively market the UK “traffic light” system by naming and shaming clinics who have especially aggressive/deceptive marketing of IVF add-ons.

Restricting insurance coverage to IVF add-ons with evidence

Some US states mandate insurance coverage for ART, to varying degrees. Investigate how coverage policies were formed, and push to make insurance coverage available only for treatments with good evidence^[52].

⭐ Reproducibility project for fertility

Reaching out to influential and well-networked science policy leaders like Stuart Buck (who funded the Reproducibility Project, focused on psychology) and making them aware of the reproducibility crisis in fertility.
Helping organize and fund the equivalent of SIPS (Society for the Improvement of Psychological Science) for fertility, to raise awareness in the field.

Reproductive Aging

⭐Better prediction through improving biobanks

Better prediction of which women will have earlier-onset subfertility would be useful for advising earlier pregnancy in those women or offering fertility preservation. One approach that seems somewhat promising is developing polygenic risk scores (PRS) for early menopause and related phenotypes. Some work on this has been done already, but larger sample sizes, more diverse cohorts, and deeper phenotyping (if possible: it's unclear how realistic obtaining antral follicle counts for a few thousand women in a biobank would be) would all likely improve these scores.

A project focused on predicting embryo implantation success.

Better prediction of embryo quality or uterine receptivity.

We lack good predictors of gamete function and it seems likely that differences in gamete function explain a large proportion of unexplained infertility. There are several possible approaches to this:

Predicting embryo quality through pooling data, genetic data, and ML video analysis of embryos in culture

Developing large datasets of embryo genetic data, perhaps through multiple genetic testing companies/labs pooling data, and following the subsequent live birth rate, and trying to develop polygenic risk scores for embryo implantation success. (this idea is indebted to Steve Hsu)
Using machine learning on video of embryos in vitro to see if any ML models can be developed that can predict embryo implantation success. This has been tried, but I’m unsure if the groups doing this had sufficiently large sample sizes and ML expertise. Because this is something that benefits significantly from scale, a project could be a natural Schelling point for many large consortiums to pool data and expertise.

Understanding uterine receptivity

Uterine receptivity seems poorly understood as well. Better hormone monitoring and/or more experimental methods of monitoring uterine function (eg, single-cell transcriptomics of uterine tissue, focusing on immune function/rejection) might yield data on what predicts uterine receptivity and how to increase it.

Pressuring aging researchers to add reproductive measures as outcomes in trials

Something Marco Demaria suggested in an email is adding reproductive measures as outcomes in preclinical and clinical trials of anti-aging interventions. Awarding grants to investigators to add these measures to planned or extant trials seems like a reasonably simple approach. Eg, fund Loyal to test for effects of their candidate drug on fertility in dogs.

IVM

IVM in animals as a ceiling

As a sanity check to get a sense of how optimistic we should be, commission a review paper on IVM success rates in animal breeding to see if it has achieved success rates comparable to IVF in that setting.

⭐🚀 Funding better surgical tools for follicular aspiration

Improving the recovery rate of oocytes per follicle aspirated, perhaps through funding the development of better surgical tools or imaging technology (eg, something like AI-enhanced ultrasound for follicle retrieval) for in-vitro maturation. More speculatively, better surgical tools could de-skill oocyte retrieval, allowing more practitioners to perform the procedure (which would lower costs) or improve per-physician productivity.

Prizes for successful human IVM from primordial follicle

(through prize authority of federal government or private donors, if the goal is too controversial for public funding)

In-depth investigation of ovarian cryopreservation and autotransplantation

What is the current state-of-the-art for organ cryopreservation in organ transplantation? Does ovarian cryopreservation follow those methods? Other organs are generally not frozen, merely kept a few degrees above freezing and I’m unaware if this has been a line of research pursued for other organs.
How much ischemic damage is seen in other organ transplantation surgeries? Is the ischemic damage observed in ovarian cryopreservation particularly high relative to other organ transplantation outcomes?

Hanna Olesen studies this topic and would be a good person to follow up with, per Merrick.

Once those questions are answered, a more informed answer on whether it is worth funding further research in it would be possible.

IVG

Legal Report on IVG in different jurisdictions

Comprehensive report on the laws and regulations on clinical IVG use, especially comparing different jurisdictions. The CEO of an IVG company noted this was something they planned to do internally eventually, but only once they were close to clinical trials. Clarifying this in advance seems useful: if regulations limit clinical application, knowing so early, and planning ways to change them, would save time.

More ambitiously, this might consist of putting together a group of high-prestige experts to engage the FDA on specific regulatory questions like “what is the minimum set of animal and in-vitro experiments they would demand for human clinical use of IVG?” Since human IVG seems likely to be controversial, and perhaps politicized, obtaining regulatory guidance before any controversy seems wise.
Since the editing of human embryos for subsequent clinical use is illegal, clarifying whether this applies to editing to fix mutations induced by the IVG process (eg, somatic mutations as described above) would also be valuable.
There is a possibility of a backfire effect here and one CEO told me he regretted speaking to regulators informally on this subject. He thought that waiting till the science was much more advanced was a better idea.
Note: this report explores US state and federal policy on embryo and embryoid research.

⭐🚀🚀Ask experts for public goods ideas

Hiring a subject-matter expert to survey researchers and ask for scientific public goods that would help IVG research specifically. Merrick had some ideas along these lines, but asking for a handful from several labs would be helpful.

Single-cell seq atlas for reproductive cells

Do single-cell sequencing on lots of different reproductive cell types. This would require obtaining human fetal tissue, which is difficult with federal funding rules. There’s a recent academic effort that is similar to this, the human reproductive cell atlas, so I’d want clarification on how this is different from that before pursuing this.
Scale up Prof. Haiqi Chen’s sperm atlas work

⭐🚀🚀 Making a well-characterized IPSC cell bank for research.

There are lots of reproducibility issues that stem from cell lines being slightly different, which results in protocols not being robust. Also, having standardized reporter cell lines would be helpful, since in the current state of affairs, labs need to reengineer reporter cell lines of interest, which takes a few months and doesn’t always work reliably.

⭐🚀Non-destructive ways to image and screen embryos and gametes.

Here is some distantly related work as an example.
In that study, RNAseq profiles could be predicted from Raman microscopy data with machine learning: “... spatially resolved single-molecule RNA-FISH (smFISH) data as anchors to link scRNA-seq profiles to the paired spatial hyperspectral Raman images”. This would be useful for the numerous quality control steps that are anticipated to be required in IVG, and might also be useful for gamete selection. Jeff Hsu was excited about this idea. When I brought this up to scientists/founders with an expertise in single-cell transcriptomics, they said the group that did that paper is very reputable; the work is still quite preliminary; and this kind of technology would help CAR-T-cell quality control as well.

⭐⭐Lowering sequencing costs

Lower sequencing costs would reduce quality control costs, which are anticipated to be high. I don’t have any specific projects in mind here. Single-cell or few-cell whole-genome data (the kind of data you might get from a trophectoderm biopsy) is likely to be especially relevant for fertility, so improvement there would be especially valuable.
This would have big implications for a variety of other biotechnology, in a way that likely promotes defense over offense, so I’m excited about this idea even outside of its implications for fertility.

Sperm selection

Reproducibility Project for sperm selection

The Cochrane review argues that the evidence as of 2019 on sperm selection technology for ART in general was very poor. If that assessment holds in 2022, a reproducibility-focused project for sperm selection would be a good idea. Consult someone like Dr. Daniele Teixeira, who wrote the Cochrane review on IMSI, as well as a subject-matter expert on sperm selection, and sketch sample size necessary, etc. to answer relevant questions.

⭐⭐Clarifying between-sperm correlation with individual differences

A project that carries out single-cell sequencing of sperm from a diverse sample of deeply phenotyped men and also tries to obtain clarity on within-individual, between-sperm correlations between sperm phenotypes and individual genotypes. This is similar to what Gwern proposes in his notes here.
Similar projects along these lines would be sequencing sperm that have been sorted by any number of promising sort-on-phenotype methods and giving lower and upper bounds on how strong any sperm-offspring correlations could be.

⭐Raman microscopy combined with machine learning vs sperm sequencing

Applying Raman microscopy (linked in IVG section) and machine learning to sperm, in combination with sperm sequencing, to see if any genetic-level information can be picked up.
Eg, is there any correlation between the type of RNA activity that Raman microscopy can hypothetically pick up and genes of interest? In more detail: apply Raman microscopy, which is non-destructive, to a large sample of diverse sperm, which are then destructively sequenced. Train an ML model on the microscopy data to see if any genetic information can be inferred from the microscopy data.

Embryo Editing

Obtaining regulatory clarity (overlaps with an idea from the IVG section)

Obtaining regulatory clarity on heritable germline editing, ideally for severe genetic diseases with no alternative treatment, would theoretically allow heritable germline editing to proceed. I am not sure that an advocacy campaign centered on this would work given the potential for backlash if it became highly salient.

Embryo Selection on polygenic traits

Improving computational methods

Funding AI fellowships to train and get people with diverse machine learning backgrounds interested in the field.

⭐⭐🚀Funding more within-family studies in biobanks to improve between-embryo prediction and better understand causality

Using related individuals allows us to find genetic effects that are not driven by unobserved environmental confounders, since the individuals are from the same family. It also improves our ability to perform between-embryo prediction (embryos are generally siblings).
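To make the confounding argument concrete, here is a minimal simulation sketch, not taken from any study. It assumes a hypothetical true genetic effect (`beta`), a family environment that is correlated with the family's average genetic score (`confound`), and two siblings per family; every parameter value is made up for illustration. A regression across the whole population absorbs the shared family environment, while a regression of sibling phenotype differences on sibling score differences cancels it.

```python
import random


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n_families=5000, beta=0.3, confound=0.5, seed=0):
    """Return (population_slope, sibling_difference_slope).

    All parameters are hypothetical illustration values, not estimates.
    """
    rng = random.Random(seed)
    scores, phenotypes = [], []        # pooled individuals
    score_diffs, pheno_diffs = [], []  # one difference per sibling pair
    for _ in range(n_families):
        family_mean = rng.gauss(0, 1)  # shared genetic component
        # Family environment correlated with the family's genetics:
        # this is the unobserved confounder.
        family_env = confound * family_mean + rng.gauss(0, 1)
        pair = []
        for _ in range(2):
            score = family_mean + rng.gauss(0, 1)  # random segregation
            pheno = beta * score + family_env + rng.gauss(0, 1)
            pair.append((score, pheno))
            scores.append(score)
            phenotypes.append(pheno)
        score_diffs.append(pair[0][0] - pair[1][0])
        pheno_diffs.append(pair[0][1] - pair[1][1])
    return ols_slope(scores, phenotypes), ols_slope(score_diffs, pheno_diffs)


population_slope, within_family_slope = simulate_slopes()
print(f"population estimate:    {population_slope:.2f}")
print(f"within-family estimate: {within_family_slope:.2f} (true effect 0.30)")
```

Under these assumptions the population estimate is inflated by the family environment, while the sibling-difference estimate stays close to the true effect. Between-embryo prediction is a within-family comparison, so the sibling-difference estimate is the relevant one.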

Improving biobank phenotyping

Depth of phenotyping needs to be improved so that phenotypes that are mechanistically closer to the underlying biology can be studied.

⭐⭐Increasing diversity in biobanks

Given the logistical difficulties in setting up new biobanks in countries without national electronic healthcare systems, funding would be most cost effective and beneficial if directed towards expanding biobanking efforts (including entire families) in countries with diverse populations and appropriate existing infrastructure as well as seeking to develop severe disease-specific cohorts in underrepresented countries.

Incorporating rare variants into embryo selection

Computational tools to predict the effects of rare variants need to be improved, as well as pushing for large whole-genome and whole-exome cohorts, not just genotyping (which cannot pick up rare variants).

Pregnancy

Identifying and characterizing better animal models for human pregnancy

One possible reason that preeclampsia is still poorly understood is that we lack a good animal model for human pregnancy. Only in great apes does the embryo completely invade the endometrium. Since the invasion process is part of what appears to go wrong in preeclampsia, and experimentation on great apes presents ethical/regulatory challenges, this may limit our understanding. Identification of a logistically easier animal model or appropriate organoid model, might improve our understanding of implantation.
Overall, I view improving implantation rates as an important target. Relative to a goal like “improve our understanding of the genetics of complex traits”, which has a straightforward mechanism (increasing GWAS sample size and phenotyping, the number of whole genomes, and within-family studies), improving implantation rates seems less straightforward.

Fund organoid development for better research models of key questions

Uterus placenta interface is poorly understood and important for implantation and preeclampsia

Racial equity element here: since African American women face higher mortality from preeclampsia, any treatment would probably help them disproportionately.
Making a good model that can substitute for great ape research.

Examining the COVID lockdown –> reduction in prematurity

Rates of extreme prematurity dropped during COVID in some areas, which raises my suspicion that less exposure to some microorganisms was the cause.
Use next-gen “forensic” (like Daniel Goodwin uses the term) technology to rigorously track a cohort of pregnant women and identify the causative microorganisms.

Miscellaneous

⭐🚀High-quality polling on public attitudes toward ART

Do high-quality polling (eg, not biased issue polling by partisan organizations) on IVG, IVM, and various reproductive technologies to get a better sense of what the public actually thinks about a lot of these issues, and see how much interest there could be in less regulation on some of these questions.

⭐Fund an animal breeding expert to answer “what low-hanging fruit are we missing in humans?”

Ongoing “replication fund”

Fund independent replications of controversial but pivotal results in fertility, à la the Reproducibility Project.

Contact me

DM me on Twitter or email me at wjchertman@gmail.com.

Acknowledgements

Thanks to Isabel Juniewicz, Lyman Stone, Mackenzie Dion, Steve Hsu, Jack Wilkinson, Daniel Goodwin, Simon Dadoun, Haiqi Chen, Robert Gilchrist, Paula Amata, Matt Krisiloff, Marco Demaria, Max Berry, Jeff Hsu, Dean Spears, Aria Babu, Alexander Young, Reza Nosrati, Ruxandra Tesloianu, Noor Siddiqui and Merrick Pierson Smela for being generous with comments/feedback/interviews. To the pseudonymous contributors: thank you too!

Even bigger thanks to Milan Cvitkovic (awesome blog) for lots of feedback, continued encouragement, and prodding me to do this in the first place!

And thanks to Gwern for writing an Embryo Selection FAQ all the way back in 2016.

Appendix

Contains summaries of conversations with most of the people I’ve spoken to as well as notes on some of their papers I read.

Environmental pollution

Daniel Goodwin

Summary: We spoke about the possible impact of pollutants on fertility. He pointed me to two companies in this space: Maximus and Millionmarker. Daniel is working on a whitepaper focused on small molecule pollution. He relayed a conversation with a senior ex-FDA official, who said that getting a drug approved to remove toxins that are themselves EPA-approved would be a tough sell at the FDA. He points to changes in testosterone and sperm levels as reasons to think pollution may be causing some infertility. He also thinks some large-scale changes in behavior might have some relationship to pollutants. I am convinced that pollution is underexplored, and impressed by his ideas on how to better understand and address it, but not convinced that pollution is causing a large increase in infertility per se. I think his pollution –> behavior –> reduction in fertility idea is more plausible. My skepticism about pollution having large effects on fertility in high-income countries, other than through behavioral change, comes from the following reasoning:

Populations that explicitly aim for high fertility, such as Orthodox Jews and Hutterites, achieve comparable or higher fertility than historical populations with very high fertility, such as the Quebecois. A counter-argument to this might be that Hutterites are exposed to very different levels of pollution than the average American, but Orthodox Jews live in urban environments that are comparable to the average urban-dwelling American. This places a ceiling on how large an effect pollution could be having on fertility.
Age-adjusted infertility rates have probably not increased over the last 100 years, and may have decreased due to improvements in preventing/treating STDs and the availability of ART.
The much-publicized drop in sperm counts may be due partially to changes in measurement techniques. In addition, past about 40 million sperm per mL there may be no increase in pregnancy rates. While a drop in average sperm count would cause an increase in men below the 40 million threshold, it is unclear to me what % of men would be affected and how that would affect TFR overall.
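How many men a drop in average sperm count pushes below the threshold depends on the shape of the distribution. As a purely illustrative sketch, assume sperm concentrations are lognormally distributed. The medians (100 and 50 million/mL) and the log-scale spread (`sigma = 0.8`) below are made-up placeholders, not measured values; only the 40 million/mL threshold comes from the text above.

```python
from math import log
from statistics import NormalDist

THRESHOLD = 40.0  # million sperm per mL, the threshold mentioned above


def share_below(threshold, median, sigma):
    """Share of men below `threshold` if concentration is lognormal.

    `median` is in million/mL; `sigma` is the standard deviation of
    log-concentration. Both are hypothetical inputs.
    """
    return NormalDist(mu=log(median), sigma=sigma).cdf(log(threshold))


# Hypothetical before/after medians, chosen only to illustrate a halving.
before = share_below(THRESHOLD, median=100.0, sigma=0.8)
after = share_below(THRESHOLD, median=50.0, sigma=0.8)
print(f"share below threshold before: {before:.1%}")
print(f"share below threshold after:  {after:.1%}")
```

Plugging in real distributional data would turn this into an actual estimate. Even then, translating the share of men below the threshold into an effect on TFR would need further assumptions about how much fertility is reduced below it.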

Pro-natalist policy

Lyman Stone

Summary: The consensus of literature reviews is that pro-natalist policies work, but the magnitude of the effect isn’t large. Eg, a change in TFR from 1.4 to 2.2 is unlikely, while 1.6 to 1.8 may be feasible with expensive policies. Typical estimates are a .05-.2 fertility boost at a cost of 100k-400k per additional US birth, which is much cheaper than the value of a US statistical life. Baby bonuses sit at the edge of the Overton window and are the most cost effective, and front-loading the money helps. Something he thinks is underrated is better messaging that names programs in a more pronatalist manner, such as calling something a “baby bonus” directly. On the Overton window: nobody has ever tried paying women to have kids at rates so high that it becomes more profitable than having a job.

On ART and TFR: ART in total accounts for ~6-7% of births in high-income countries. In the US it’s about 4% (IVF + ART drugs). So it can’t be a huge effect, but it could be a moderately sized effect (eg, 4% of 1.7 is .068, which is comparable to lower estimates of pronatalist policies). However, if reproductive technology just extends reproductive lifespan, it may just push fertility to later in life, so the net effect is unclear to him. We’re also not close to the limits on natural fertility, so it’s not the rate-limiting step.
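The back-of-the-envelope arithmetic above can be written out as a small sketch. It assumes, as the text implicitly does, that ART births are purely additive (none of them would have happened otherwise); the function name is just an illustrative label.

```python
def art_tfr_contribution(tfr, art_share_of_births):
    """Births per woman attributable to ART, assuming ART births are purely additive."""
    return tfr * art_share_of_births


# Figures from the text: US TFR of roughly 1.7, ART at roughly 4% of births.
us_contribution = art_tfr_contribution(tfr=1.7, art_share_of_births=0.04)

# Range of pronatalist policy effects quoted in the summary above.
policy_low, policy_high = 0.05, 0.2

print(f"ART-attributable TFR: {us_contribution:.3f}")
print(f"inside the quoted policy range: {policy_low <= us_contribution <= policy_high}")
```

If some ART births merely shift births to later in life, as discussed above, the additive assumption overstates the contribution, so this is best read as an upper bound.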

What’s a good lever to push on? There’s good data arguing the mommy wage penalty (Henrik Kleven) is from childcare, not pregnancy, by comparing adopted vs biological mothers, so childcare may be a better lever. Parenting norms are less intense in higher fertility areas like Utah. He has unpublished data (which he blogged about here) showing that a change in Georgian Orthodox Church rhetoric/policy that raised the prestige of parenting caused a big jump in fertility (1.5 to 2.2) without any government expenditure/change.

On expected vs realized fertility and fertility preferences: There’s a robust gap between realized and preferred fertility. Surplus labor like older relatives and siblings can reduce the cost of childcare. Preferred fertility has predictive power, even if people consistently undershoot their preferred fertility, and stated child number preference at 18 predicts TFR at 40. Higher fertility groups have higher preferences. Cost of childcare and opportunity cost have increased because as income rises, the scope of leisure opportunities has increased. Different attitudes towards career satisfaction and family preferences do predict fertility somewhat;

Wider social attitudes and fertility: Changing the social script on childbearing seems important, and his research is focused on this. Many TV shows have basically 0 children; nobody has any kids, eg, The Expanse. [editor’s note: there’s some econometric evidence that Brazilian telenovela exposure reduced fertility, particularly in lower SES women.]

When/why did the demographic transition happen: The best evidence is that the first fertility transitions (France, Massachusetts) came with secularization. The fertility transition happens before infant mortality rates fall; France had its fertility transition 1 century before Germany. Paper: Censorship and birth control censorship in UK, which apparently had a big effect on fertility rates. Culture is really the spark.

Exporting smaller family sizes: “Developmental idealism” (Arland Thornton) is the culprit for fertility transition today. Development experts and mass media basically propagandize about smaller family sizes, so individuals associate development with small family sizes, and countries tackle the correlates of development instead of the core stuff like good institutions. The development industry arose with 1940-60’s institutions and politics, which he thinks neutralized their ability to say X institution is good, not just Y correlate of development/growth is good. The World Bank and IMF are sometimes prohibited from saying “in order to get economic development you need to expand voting franchise and have competitive elections and you need good property rights regime”; this is tied up with anti-colonialism. Good author on this; Book recs on British colonial legacy: James Ferguson, The Anti-Politics Machine; Matthew Lange, Lineages of Despotism and Development.

Dean Spears

Summary of meeting

Summary: Dean and colleagues at UT Austin are starting a group, and one of their core ideas is that increasing fertility rates, at current TFR levels, is good even for average utilitarians. This is because of positive returns to scale that seem to hold in the modern economy.

A specific project along these lines is combining the Nordhaus model of climate change with Romer’s model of endogenous growth theory, and realistic TFR projections, to show that because of population momentum, even with a rapid TFR rebound to replacement or above-replacement, the critical Q of “will climate change be a big deal” will be baked in / dealt with (or not) by the time higher TFR increases population size. Basically, his group is arguing that even with climate change in mind, increasing TFR rates on the margin is a good idea. Their preliminary results are that paying up to 1 million dollars for an extra child today has positive returns.

Along with this goal, he also thinks the Second Demographic Transition, in which TFR drops below replacement, is best explained by changes in preferences and an increase in the opportunity cost of children, as opposed to constraints (à la Becker’s quantity-quality tradeoff). In other words, having a child is a larger opportunity cost in a world with lots of entertainment that a child competes with. The case for this is that we clearly live in a much richer world than 50-100 years ago, and fertility rates are much lower, so what is the material constraint? In addition, there are places with female labor force participation of ~25% where desired family size is still around 2, so the constraint of female time can’t be the explanation there.

Some project ideas he had: a pilot project trialing very large baby bonuses (not a few hundred dollars, but something like 50k for a few years), ideally with a few different incentive sizes to get a sense of the demand curve; project focused on improving childcare technology, like making equivalents of a baby snoo.

He is very skeptical that the extant range of pro-natalist policies will change things on the scale required to move TFR to 2. Sweden has generous parental leave policies and other policies, but its TFR is only 1.76. He also flagged that the empirical demography field may have something of a file-drawer problem, such that positive effects are reported while negative results are not, and advised some skepticism of smaller studies. Getting TFR to replacement would require, in his view, policies that are far outside the range of current pro-natalist policies, something he seemed to agree with Lyman about.

He also emphasized that surveys on intended fertility, at least outside of East Asia, show desired family size is higher than is achieved, so we would be helping women achieve desired fertility, not burdening them with children they don’t want.

Somewhat contra Kaufmann, he thinks that religious groups retaining high fertility rates will likely not be enough to stave off below-replacement TFR. He argues that given retention rates of 50%, groups would need fertility rates above 4 to continue growing, which is quite high. Another important point he made was that even if we are somewhat convinced by Kaufmann’s argument, we should hedge our bets against the possibility of religious groups’ TFR declining by pushing for higher TFR for everybody. A working paper his group wrote finds that, under certain assumptions, heritability of fertility and the existence of high-fertility subgroups still don’t fix declining populations unless fertility is quite high relative to defection rates. This is similar to the conclusion that Isabel Juniewicz comes to in a recent blog post.
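The retention argument can be sketched in a few lines. This is a simplified illustration, not the model from the working paper: it assumes a replacement-level TFR of about 2.1 (the conventional figure for low-mortality countries), no converts joining the group, and that defectors’ children do not return. The function names are illustrative.

```python
REPLACEMENT_TFR = 2.1  # assumed replacement level for a low-mortality population


def min_tfr_for_growth(retention, replacement_tfr=REPLACEMENT_TFR):
    """TFR a group needs to avoid shrinking, given the share of children who stay."""
    return replacement_tfr / retention


def generation_growth_factor(tfr, retention, replacement_tfr=REPLACEMENT_TFR):
    """Group size multiplier per generation: above 1 means growth, below 1 means decline."""
    return tfr * retention / replacement_tfr


print(f"break-even TFR at 50% retention: {min_tfr_for_growth(0.5):.1f}")
for tfr in (3.0, 4.0, 5.0):
    factor = generation_growth_factor(tfr, retention=0.5)
    print(f"TFR {tfr:.1f} at 50% retention -> growth factor {factor:.2f} per generation")
```

With 50% retention the break-even TFR comes out slightly above 4 under these assumptions, which matches the point that the required fertility is quite high.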

They come at ART from a market failure/externality angle. People’s private choices are shaped by preferences and incentives, and it is difficult to make policy that really changes those, so fertility is a quantitatively important instance of market failure. They’re starting a new group that thinks of the marginal social benefit of extra people as going in the other direction, because a larger economy is better for everybody (eg, returns to scale); a lot of this is driven by higher rates of economic growth. Their first project takes off-the-shelf components from the work of the two 2018 Nobel laureates, Nordhaus and Romer, and puts them together to see whether climate externalities or the effects of innovation are more important. The most important story is the positive externality of more economic growth. Population momentum is part of this story: basically, the climate change story will have played out by the time fertility differences make a big difference. Their best guess is that one extra person today will be worth ~1 million $ in making the world richer/better. The good news is that many women in low-fertility populations say they want more children than they’re having. The goal of their group is to fill in the details. Because per-capita standards still improve, the argument gets around the utility aggregation problem.

Notes from the papers he recommended

Very long range global Population Scenarios to 2300 and the Implications of Sustained low Fertility

European ~ 1.5; SE Asia and Central America ~2.5;
Global TFR 4.9 in the 1960’s
The UN population projections out to 2300 only use a narrow band of possible fertility scenarios, from 1.85 to 2.35.
“fertility ideals and intentions have been described as powerful predictors of future fertility behavior (e.g. Morgan and Rackin 2010).”
The two-child family size is the ideal in Europe, but the ideal is likely smaller in East Asia, and the “Low Fertility Trap” hypothesis states that this ideal will make raising TFR in the future harder.
East Asia– Japan, Singapore, Hong Kong, South Korea, and Taiwan– have bottomed out below 1.3 TFR. Major Chinese cities have even lower levels of 1.07-1.23, and lower ideals of family size, though people’s responses may be somewhat biased downwards by social desirability bias.
The authors think that extant fertility projections assume the European fertility norm is the model one, but it may not be.
They assume substantial progress in medicine with even the 100-year average lifespan, and especially with 120 average lifespan, but they’re also assuming not reaching longevity escape velocity.
Lifespan increases don’t make a huge difference.
Authors don’t think dependency ratio is a huge deal if elderly health continues to improve and young productivity increases;
Seems like the important open questions are:

Where do TFR rates stabilize?
When does Sub-Saharan African TFR stabilize?

Misc thoughts I had in response

There are unpredictable political and cultural responses that should perhaps increase the variance of outcomes we expect. Eg, South Korea just elected a politician who has explicitly invoked some incel ideas on feminism.

A working paper his group wrote finds that, under certain assumptions, heritability of fertility and the existence of high-fertility subgroups still don’t fix declining populations.

Embryo Selection

Meeting with RM

Note: RM is a pseudonym

Summary: endorses more within-family studies as a way to get better within-family prediction (which is what embryo selection is). Thinks that personality and facial attractiveness are understudied and that linking genetic data with social media could fix that (lots of pictures of faces, lots of output on social media that indicates personality), but is pessimistic this will happen in the near/medium-term. Thinks that rare variants are generally underrated: there’s plenty of evidence rare variants can have big effects on polygenic traits, eg, the FBN1 variant in Peruvians that causes a substantial decrease in height, and data showing that rare variants that disrupt protein function have large (usually negative) effects on phenotype.

In-vitro Gametogenesis

Haiqi Chen

A challenge with understanding spermatogenesis is that it takes place in a spatially ordered manner, with different sections of the seminiferous tubule corresponding to different stages of development. I spoke to Prof Haiqi Chen, who focuses on spermatogenesis and applied Slide-seq, which is a method of performing transcriptomics that preserves spatial information, to mouse and human testicles. While learning more about spermatogenesis will surely improve our ability to treat male infertility, the heterogeneity of hard-to-treat male infertility makes me somewhat pessimistic that this is an especially scalable solution. However, if this method enables IVG for sperm, in a mostly disease-agnostic way, that would sidestep the heterogeneity issue.

One of his papers:

Dissecting Mammalian Spermatogenesis through spatial transcriptomics

Basic idea is that since the function of the testes is tied closely to their spatial organization, we need new methods to understand what is going on in that specific context.
In that paper they built an atlas; they find differences between mouse/human testes; found possible diabetes-->infertility mechanism
They confirmed their method works by recapitulating what is already known.

Summary: thinks that we need a comprehensive understanding of how spermatogenesis happens before we can attempt to copy it; to rule out subtle issues like imprinting issues we would need lots of testing. Optimistic scenario: Guess 10 years for an in-lab human spermatogenesis, 20 years to do thorough testing. His general feeling on in-vitro fertilization is that it's somewhat crude and the harms of it are understudied. We don’t understand most unexplained infertility, so we’re not actually fixing the core issue in most cases of IVF that don’t involve aging. He thinks we need a precision medicine approach to male infertility– different treatments for different disorders of male infertility. Thinks there are some subtle long-term issues with IVF we don’t fully understand (eg, imprinting issues).

Merrick Pierson Smela

Summary: Merrick was optimistic that IVG could be in clinical trials in 5-10 years and that human oocyte-like cells could be achieved in-vitro in 2-3 years. He is optimistic because multiple groups are working on this and it has been successfully achieved in multiple mammals. He thinks specific details of protocols (that work in animals) will have to be modified substantially. When I told him about Haiqi Chen’s work on single-cell transcriptomics that preserves spatial information, he was unsure how useful it was for IVG in oocytes, but thought it might be needed for sperm development.

He agrees that improving IVM knowledge would help with IVG research. He attributes the delay in human IVG and IVM success to human fetal tissue being much harder to obtain and mouse development timelines being much shorter. He is optimistic regarding somatic cell nuclear transfer, since the pig xenotransplants were made with that method– however, he noted this was quite expensive. He agrees that IVF research and labs are inconsistent, and is somewhat optimistic that uterine preparation could be improved.

He is less optimistic re: embryo editing, thinks there are many challenges with it. In animals, the F2 generation is what’s usually used in research to get around the editing efficiency issue, which is problematic in humans–something that might work better is editing a stem cell line to make edited gametes. He doesn’t know a lot about sperm selection, but he thought that non-destructive sequencing would be quite tough, and would require something like immobilizing gametes during meiosis and sequencing some of the sibling(?)-cells of a resultant gamete, and then inferring the gamete genotype. If there really is a reliable correlation between an easily measurable sperm phenotype (eg, motility) and some other genetic or phenotypic trait, then that would be ideal. He recommends looking into the animal breeding literature, since he thinks they have probably explored that question in more detail.

In the scenario in which complete IVG in humans is very difficult or impossible, being able to induce meiosis in addition to somatic cell nuclear transfer would still be highly impactful, since that would permit some degree of iterated embryo selection, though with the requirement of needing oocyte acceptor cells. Oocyte acceptor cells (also known as “ovarian supporting cells”) have been generated from pluripotent stem cells in mice, and if this could be achieved in humans, could reduce the bottleneck of requiring natural human oocytes, which are expensive and scarce.

Some project ideas he was excited about:

Do single-cell sequencing on lots of different reproductive cell types. This would require obtaining human fetal tissue, which is difficult with federal funding rules. There’s a recent academic effort that is similar to this, the human reproductive cell atlas, but he wasn’t sure how much data would be released from this effort when we first spoke. However, they did release all the data recently (1 2).
Making a well-characterized iPSC cell bank for research. There are lots of reproducibility issues that stem from cell lines being slightly different, which results in protocols not being robust. Also, having reporter cell lines would be helpful, since in the current state of affairs, labs need to reengineer reporter cell lines of interest, which takes a few months and doesn’t always work reliably.
Making a livestock species that’s resistant to most viruses by inserting Cas13 into animal cells and making it act as a second immune system, presumably based on this work.

Matt Krisiloff

Summary: Matt is the CEO of Conception, which is likely the most late-stage IVG company. He was optimistic that in 5-10 years, there could be IVG clinical trials in humans and recently tweeted that a human egg could be generated in labs by 2023. Our conversation generally steered clear of detailed scientific discussion. He viewed regulatory caution on the part of the FDA as a barrier. He cited the example of mitochondrial replacement therapies (MRT), which are technically under FDA jurisdiction, but cannot legally proceed because Congress has prohibited the FDA from accepting clinical applications related to genetically modifying human embryos with heritable modifications. Since there’s heterogeneity in jurisdictions on laws relating to embryos (eg, UK and Australia both allow MRT), he is optimistic that other jurisdictions, in the scenario where the US initially prohibits it, would allow it.

He is optimistic re: reproductive aging approaches to fertility but thinks the gains would be incremental relative to IVG. He thinks endometrial/uterine preparedness is understudied and could improve IVF outcomes, since a lot of IVF fails because of that. He agrees ART uptake right now is low, but thinks uptake could improve substantially if more convenient, cheaper, and better ART technology was available. He thinks IVM could benefit from better surgical tools. He is generally interested in artificial wombs, unsure how realistic it is.

Jeffrey Hsu

Summary: CEO of Ivynatal, overall very optimistic re: achieving in-lab (not necessarily in clinical use) in-vitro gametogenesis in humans in 5-10 years. Some reasons for optimism: there are multiple labs working on this, at least three different startups, and substantial commercial interest from the agriculture and animal breeding and (a more minor contribution) de-extinction world. There are multiple approaches to IVG: somatic cell nuclear transfer and reprogramming. Reprogramming can be done through specific factors added to culture medium or through genetic manipulation. There have been substantial advances recently in finer control of methylation/demethylation.

Jeff Hsu identified the following problems as the most central to clinical use of IVG in humans:

Yields at each step may be too low for cost-effective production, even if it is eventually technically feasible in labs.
Somatic cells have more mutations than germ cells, resulting in a high mutation burden in the resultant cells.
Some quality control measures, like sequencing, are expensive if done at current prices and at multiple steps and high volume.
Imprinting issues

Artificial Womb

Simon Dadoun

Summary:

I spoke to Dr. Simon Dadoun, who was excited about artificial wombs as a supplement for current neonatal care for extremely preterm infants but pessimistic about artificial wombs as a total substitute for natural pregnancy any time soon. Some key points we discussed:

Neonatal care for very preterm infants is very expensive, so an artificial womb could be quite expensive and still cost no more than the current standard of care.
Fetal lungs, brains, and eyes are the rate-limiting organs for premature infants.
There are very high rates of lifelong disability from various causes for infants born in the 20-28 week period– anything that improves this would be a big deal. He thinks this is a much better approach/framing than full-on “artificial wombs”.
The fetal lambs in the 2017 paper on artificial wombs were developmentally roughly equivalent to 20-28 week old human fetuses.
When fetal surgery is performed, some centers use small 5 mm ports vs an open approach, but all approaches leave the fetus in the uterus. You don’t take a fetus out and put it back in with current approaches.
He doesn’t think early C-sections to remove a fetus for transfer to an artificial womb are a good idea: early C-sections are very morbid, since the uterus is much thicker at that time, which makes the procedure more dangerous and repair of the uterus harder.
The evidence base for fetal interventions is somewhat poor, and a lot of it comes from decades-old research; he thinks there may be a component of [paraphrasing] “IRBs are limiting research that would be helpful.”

Misc

Steve Hsu

Summary: I spoke with Steve Hsu (co-founder of Genomic Prediction) about IVF usage rates, some of GP’s technology (particularly their aneuploidy screening), and IVF optimization. He agrees with me that large-effect size changes in the IVF protocol have probably been found already, but thinks there is substantial room for finding more small and medium-size effects. His ideas are similar to Jack Wilkinson’s: the sample sizes used in IVF studies are too small to reliably detect the likely effect sizes of interventions, so many purported effects probably don’t replicate. He recommends a project centered around coordinating many IVF centers to try different tweaks to the protocol. He thinks that IVF rates close to Denmark’s (approximately 2x current US rates) are a good proxy for US rates in the future; US IVF experts also generally predict a lot of growth. He also thinks that if IVF success rates improved substantially, that would change parental calculus re: IVF. He is somewhat optimistic re: more speculative ART technology, like in-vitro gametogenesis, but estimates that even under highly optimistic timelines, the clinical adoption of such technology would take a decade or more due to regulatory concerns.
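To make the sample-size point concrete, here is a rough sketch of a standard power calculation for comparing live birth rates between two arms of a trial. The 30% baseline live birth rate, the candidate improvements, and the function name are all hypothetical values chosen for illustration; none of them came from the conversation. The calculation uses the usual normal approximation for a two-sided test of two proportions.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-sided comparison of two
    proportions, using the normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    p_bar = (p_control + p_treatment) / 2          # pooled proportion under the null
    null_term = z_alpha * sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * sqrt(
        p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    )
    return ceil((null_term + alt_term) ** 2 / (p_treatment - p_control) ** 2)


# Hypothetical baseline live birth rate per cycle (an assumption, not a sourced figure).
BASELINE = 0.30

# Hypothetical absolute improvements: large, medium, and small effects.
for lift in (0.10, 0.05, 0.02):
    n = patients_per_arm(BASELINE, BASELINE + lift)
    print(f"+{lift:.0%} absolute improvement: about {n} patients per arm")
```

Under these assumed inputs, a 10-point improvement needs a few hundred patients per arm, while a 2-point improvement needs several thousand, which is one way to see why coordinating many IVF centers would be necessary to detect small effects.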

One concrete improvement he proposed is using better techniques for aneuploidy screening. Per Hsu, current aneuploidy screening from embryo biopsies is woefully inadequate and has a high technical failure rate. Technical failures are inconclusive test results, but they are called (and reported back to patients) as “aneuploidy” so as to avoid the possibility of implanting aneuploid embryos. This results in a high rate of embryos being called aneuploid, and thus in unnecessary waste of embryos. Better aneuploidy testing, such as through GP’s technology, would thereby improve IVF cycle success rates compared to current methods of aneuploidy screening. GP has not tried generating embryo scores for implantation success; that might work, but nobody has tried it yet. His GP co-founder Nathan thinks that “mosaicism” is partially a result of lab error / imperfect assays, which would be another case where better testing improves outcomes.

He thinks that the Gulf states might be a good place for creative use of ART/screening because they have high rates of cousin marriage and are highly aware of the possible issues arising from it.

[1] For instance, prominent behavioral geneticist James Lee has argued strongly against polygenic selection for most traits on the grounds that it would eventually fundamentally change “an aspect of our nature”.

[2] Of course, within countries, after the demographic transition has occurred, higher religiosity predicts higher fertility.

[3] Empty Planet: The Shock of Global Population Decline by John Ibbitson and Darrell Bricker

[4] Demographic Engineering: Population Strategies in Ethnic Conflict by Paul Morland

[5] Speroff’s Clinical Gynecologic Endocrinology and Infertility 9th Edition

[6] The better counterfactual is comparing the total number of children born with a shorter delay to ART versus a longer delay, which will surely be higher in the first case, but the difference attributable to ART will be smaller than [total number of children w/ shorter delay]-[total number of children w/ longer delay], since some of the children in the former would have occurred anyways with natural reproduction.

[7] This also doesn’t include extramarital/extra-couple relationships/affairs.

[8] My belief, derived from speaking/working with researchers in the genetics of male infertility, is that whole genome sequencing for men with idiopathic azoospermia will eventually increase the diagnostic yield considerably– perhaps in addition to polygenic risk scores for male infertility. This will necessitate large consortiums of infertile men that undergo whole-genome sequencing after the known genetic causes of male infertility are ruled out, such as y-chromosome microdeletions, aneuploidy, and other known mutations.

[9] Some andrologists would probably quote higher numbers for patients offered the TESE procedure, likely because of differences in patient selection or because many studies only report sperm retrieval rates (sperm retrieval is necessary but not sufficient for a live delivery).

[10] There are some male fertility procedures which are very invasive, eg, micro-TESE (micro-surgical testicular sperm extraction), in which a part of the testes is biopsied and sperm are retrieved with microscopy.

[11] French TFR likely started declining around 1790, so couples from 1670-1789 can reasonably be assumed to approximate a “natural fertility” population.

[12] Having a lower than average age at menarche is associated with somewhat earlier age at menopause, but there does not appear to have been a cohort-level effect– women are not undergoing menopause any earlier now, even though on average they are undergoing menarche earlier.

[13] An important mechanism by which STIs can cause infertility is pelvic inflammatory disease (PID).

[14] This raises interesting population ethics questions, which, incidentally, were recently brought up in this SBF conversation with Tyler Cowen, implying SBF may be amenable to valuing pro-natalist interventions over life-extending interventions if the former is more cost-effective.

[15] A similar paper (H/T Lyman’s twitter feed...) showed a reduction in abortions in Italy but no rise in births.

[16] An uncharitable way to summarize this is that demographic idealism is a kind of “cargo-cult” development ideology, where epiphenomena of economic growth are taken to be causal factors in improving economic growth.

[17] Panhypopituitarism is a condition where the pituitary gland is damaged and reduces (or stops entirely) its production of hormones. Life expectancy may be somewhat reduced but is still close to normal with treatment.

[18] previously known as “intersex” or “hermaphrodites”, now a defunct term.

[19] There are important differences in mouse and human gametogenesis, but this summary applies relatively well to both.

[20] Speroff cites this study as evidence of this claim, which finds that a combination of male and female factors accounts for 39% of infertility in couples, female infertility alone accounts for 33%, and male infertility alone about 20%. Another way to support this claim is the following: 1) I argue elsewhere that delays in the age at which couples begin trying to conceive account for the majority of the decline in fertility; 2) female fertility declines much more with age than male fertility.

[21] One source estimates a cycle cost at between $15,000 and $30,000; another source says ~$500/year for egg storage. There are some lower cost clinics that cost around $4,000 to $5,000.

[22] Best thought of as the extreme lower part of the bell curve of normal female reproductive aging, with the caveat that various insults (genetic, chemotherapy, radiation, etc.) effectively shift women to the left.

[23] A needle is used to aspirate oocytes from the follicles in the ovaries under ultrasound visualization, as shown here.

[24] Even if a single embryo is transferred, monozygotic twinning can still occur after transfer.

[25] A different metric than live birth rate per cycle, as the preceding paper used.

[26] In the fertility space, “add-ons” are often used to refer to treatments that can increase the chance of having a baby, so these would not be “add-ons” in that traditional sense.

[27] As I cover more in-depth in the Infertility by the Numbers section, this is using a loose definition of “difficulty with infertility”, and the proportion of men and women who are sterile if they begin trying to conceive in their early 20’s is closer to 2% than 10%.

[28] Jack and colleagues have a paper asking UK clinicians and embryologists their reaction to the traffic light system.

[29] This may already be standard practice– I have not looked deeply into insurance coverage for ART by state.

[30] Women with high antral follicle counts, who are good candidates for IVM

[31] Cancer is a common reason for ovarian transplantation.

[32] The work was funded partially by Gameto, another IVG-focused company.

[33] Which are high-quality, and represent the upper bound of IVF performance

[34] Which would overestimate the live birth rate

[35] See here for an accessible informal introduction to somatic mutations and evolution.

[36] See here, section “parental origin...” for an overview.

[37] However, I have not looked deeply into the level of testing scientists have subjected IVG derived organisms to.

[38] Per speaking with someone with an interest in this field– I have not verified this myself.

[39] as a result of imperfect currently available editing technology, likely only a fraction of embryos would be successfully edited

[40] Based on conversations with employees from two different genetic testing companies, who found their methods had fewer false aneuploidy calls and lower rates of technical failure than the conventional ploidy testing.

[41] Turley et al also bring up other problems, such as pleiotropy.

[42] Here is a lecture explaining confounding in GWAS

[43] This should theoretically only be an issue in embryos of admixed parents.

[44] The reasoning for this is as follows: while the proportion of infants having a pathogenic de novo mutation might be as high as ~1/300 (and the proportion with harmful but not quite pathogenic mutations is likely much higher, depending on where the threshold for “pathogenic” is set), the probability of a specific locus being mutated is much lower. Since we are concerned with calling a specific de novo mutation, and such mutations are very rare at any given location (though relatively common when considering the whole genome), we would need very accurate sequencing to identify de novo mutations reliably. Playing around with this calculator with reasonable values for sequencing accuracy (eg, 98%, 99%, 99.9%, 99.99%) and the probability of a specific location having a mutation (perhaps 10^-8) shows that you need highly accurate sequencing to be confident in DNM calling.

[45] More precisely, unlike patients who generally use ART, they are unselected for fertility problems, and undergo an evaluation to make sure they are good candidates for egg donation.

[46] Among children and babies <5 years, prematurity is the leading cause of death

[47] A repository of risk calculators in medicine is here– click [specialty] –> [OB-gyn] to see all those available for obstetrics-gynecology.

[48] close to 25% for infant mortality, and another 25% for child mortality, per Volk and Atkinson

[49] That is, conditioning on a given gestational age, outcomes have improved. The lower limit on viability has decreased at the same time, which is likely resulting in more infants of very low gestational ages being born with substantial morbidity.

[50] Per an OB-GYN, this is likely because the uterus is thicker the earlier the gestational age.

[51] Jack and colleagues have a paper asking UK clinicians and embryologists their reaction to the traffic light system.

[52] This may already be standard practice– I have not looked deeply into insurance coverage for ART by state.
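
The reasoning in footnote [44] can be sketched with Bayes’ rule. This is a simplified illustration rather than how any real variant caller works: it treats each site as a single observation, assumes sensitivity and specificity both equal the quoted “sequencing accuracy”, and ignores read depth and parental genotypes. The function name is hypothetical; the accuracy values and the 10^-8 per-site prior are the ones mentioned in the footnote.

```python
def posterior_true_variant(prior, sensitivity, false_positive_rate):
    """P(site is truly mutated | a variant was called), via Bayes' rule."""
    true_calls = sensitivity * prior                  # mutated and called
    false_calls = false_positive_rate * (1 - prior)   # not mutated but called
    return true_calls / (true_calls + false_calls)


# Per-site probability of a de novo mutation, as suggested in footnote [44].
PRIOR = 1e-8

for accuracy in (0.98, 0.99, 0.999, 0.9999):
    # Simplifying assumption: sensitivity = specificity = accuracy.
    posterior = posterior_true_variant(PRIOR, accuracy, 1 - accuracy)
    print(f"accuracy {accuracy:.2%}: P(real DNM | call) = {posterior:.1e}")
```

Under these assumptions, even 99.99% per-site accuracy leaves the posterior probability that a called de novo mutation is real at roughly 1 in 10,000, because false calls at unmutated sites vastly outnumber true mutations. In practice, effective per-site error would need to be driven far lower (for example through high read depth and sequencing of both parents) before individual DNM calls become trustworthy.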