323 Comments

> And then generalize further to the entire world population over all of human history, and it stops holding again, because most people are cavemen who eat grubs and use shells for money, and having more shells doesn’t make it any easier to find grubs.

This is inaccurate. The numbers are pretty fuzzy but I find reputable-looking estimates (e.g. https://www.ined.fr/en/everything_about_population/demographic-facts-sheets/faq/how-many-people-since-the-first-humans/) that roughly 50% of humans who ever lived were born after 1 AD.

Expand full comment

Also, I am often satisfied with psychological studies that may only apply to modern Americans. Or even modern upper-middle class Americans (if so labelled). If the findings don't hold in ancient Babylon I am okay with that.

Expand full comment

Exactly. There’s nothing wrong with studies that only apply to developed Western democracies if the people that will read and use the info are from developed Western democracies.

Expand full comment

Generalizability says a lot about mutability. Show me a population that thinks differently from mine and that is at least weak evidence my population can be changed.

Expand full comment

Having more shells might not make it easier to *find* grubs, but it does make it easier to *trade* your shells for grubs.

Expand full comment

Yeah, that part didn't make sense either. If having more shells doesn't allow you to obtain more grubs and services, then I don't think your society really uses shells as money.

Expand full comment

No. In that specific case it doesn't work. It would work for nuts, but grubs won't keep if you don't have some method of preserving them. You eat them as you dig them up.

OTOH, there's also a lot of question as to what "money" would mean to a caveman. It's quite plausible that saying they used shells for money is just wrong. But maybe not. Amber was certainly traded a very long time ago, and perhaps shells were also, but saying it was money is probably importing a bunch of ideas that they didn't have or want. That trading happened is incontestable, but that's not proof of money.

I *have* run into reports of shells being used for money, but I've also read of giant stone wheels being used for money. I really think it's better to think of things like that as either trade goods or status markers. Those are features that money has, but it also has a lot of features that they don't, like fungibility.

Expand full comment
author
Dec 28, 2022 · edited Dec 28, 2022

This is irrelevant and I shouldn't be arguing about it, but since I am, James Scott says that until 1600 AD the majority of humans lived outside state societies. I don't know if they were exactly grubs-and-shells level primitive, but I think it's plausible that the majority of humans who ever lived throughout time have been tribalists of some sort.

Expand full comment

The gulf between shells-and-grubs-level primitive and 1600 AD pre-state "primitive" is not irrelevant.

Wealth doesn't become irrelevant no matter how far back you go, either; it's just accounted for differently.

Expand full comment

Granted! But the sentence as written absolutely implies a less nuanced comparison, e.g. BC population > AD population; one seldom thinks of more-modern stateless societies as mere shell-grubbers. ("Don't they, like, hunt and gather or something?") Seems like it'd pass a cost-benefit analysis to reword for clarity, or just remove entirely, since it is in fact irrelevant to the rest of the post. Noticing-of-confusion ought to be reserved for load-bearing parts, if possible.

Expand full comment

I agree. But it shows a pattern of Scott's repeatedly dismissing valid criticism of his contrived selection criteria. I.e., he doubles down with some weird and wrong cutoff for when we moved away from shells and grubs, or something akin to that.

It's similar to dismissing outright the criticism that his selection criteria for mental health might be invalid.

Expand full comment

I frankly didn't understand the hubbub about the previous post. Felt like one continuing pile-on of reading-things-extremely-literally leading to bad-faith assumptions and splitting-of-hairs, for what seemed like a pretty lightly advanced claim. It's riffing half-seriously on some random Twitter noise, the epistemic expectations should be set accordingly low. Like some weird parody playing out of what Outgroup thinks Rationalists are really like. Strange to watch.

Which doesn't give Scott free cover to be overly defensive here (even about minutiae like this - it's clearly not irrelevant if there's like 5 separate threads expressing same confusion, and you can't do the "I shouldn't argue about this, but I will" thing with a straight face), which is also Definitely A Thing. But I'm sympathetic to feeling attacked on all fronts and feeling the need to assert some control/last-wording over a snowballing situation. Sometimes the Principle of Charity means giving an opportunity to save face at the cost of some Bayes points, especially when it seems obvious both the writer and the commentariat are in a combative mood to begin with. Taking a graceful L requires largesse from both sides...

Expand full comment

Absolutely. The follow up thread from the original poster does a good job clearing up what he "really meant" by the tweet. There's some grace towards Scott, though probably more bitterness.

And yes, I would have been more okay with Scott's first article if he took a lighter approach, or even did the same work and ended with a more open conclusion. I.e., there are a lot of people who report happiness and well-being without also reporting spiritual experience; then go on to acknowledge, yes, this is a half-serious tweet, intentionally provocative, but what else might they be pointing at?

Democratizing spiritual, mystical, or at least profound and sacred experience is vitally important. Attacking the claim with an unbalanced, unwavering, "fact-check" comes across as antithetical to that purpose.

Expand full comment

I think that the "obesity vs. income" concept becomes overextended once you go back to primordial times. "Income" implies an at least partly monetized economy; that is not how cave people traded stuff, so the question becomes meaningless for them.

Expand full comment

Wealth still applies. But I get what you're saying. Still, orders of magnitude more people existed after the widespread adoption of agriculture. I don't appreciate Scott's dismissal of this by contriving an arbitrary cutoff of 1600 AD as the time when most people came to be ruled by a state.

Expand full comment

Would this be something like how you expected things to work out if you had a conflict and needed help with it? Like, do you go to the police/legal system, or to your local lord, or to the local big man in the village, or to your brothers and cousins and extended family, or what?

Expand full comment
Dec 29, 2022·edited Dec 29, 2022

> James Scott says that until 1600 AD the majority of humans lived outside state societies

For all that I enjoy James Scott's work, I'm very skeptical of that claim. Estimates from multiple sources here https://en.wikipedia.org/wiki/Estimates_of_historical_world_population put world population around year 1 at between 150 and 300 million, the average being 230 million. Population estimates for the Roman Empire and Han China give 50-60 million each*; Persia probably adds a couple tens of millions, and I'd expect the kingdoms in the Ganges plain to be not very far behind China. I would be very surprised if non-state people had amounted to more than 25% of the world total. Perhaps JS was counting some people who lived physically inside state boundaries as not really being *part* of state societies?

The majority of all humans *ever* is more plausible, though (most would still be farmers of some kind, but not necessarily ruled by a state).

* Eerie how evenly matched Rome and China were around year 1. Kind of a pity they never got to interact directly.

Expand full comment

Estimating the population of non-state societies is obviously difficult, but, yes, my guess would be that this overlooks the much higher population density of state societies. Anyway, it's irrelevant, because most people certainly lived in agricultural societies, which are able to accumulate wealth and trade it for other resources.

Expand full comment

Yeah, it's pretty counterintuitive.

Also, there have only been about 115B humans, ever, so far, which means the death rate is currently just 93%. Before I saw that number, I'd have intuitively guessed it was something akin to >99.9% - virtually everyone died.

This makes not solving aging ASAP rather catastrophic. Sure, there were countless generations before who all died - but at least the population was relatively small back then...

Expand full comment

"Which means that death rate is just 93%, currently."

I have been known to point out that contrary to the common belief that death is certain, a more scientific approach notes that if we randomly select from all humans only 93% of them have died. With a large enough (and random enough) sample the obvious conclusion is that any human has only a 93% chance of dying rather than a 100% chance of dying.

Most people stubbornly cling to the common belief, however, even when shown the math. Sad.

Expand full comment

Empirical odds of any given human surviving 120 consecutive years still seem very slim.

Expand full comment

You're counting everyone currently alive as a sample case of a human that won't ever die.

Expand full comment

Well, by definition alive people haven't died ...

Expand full comment

Worth noting that, measuring this naively, most people who have ever lived died before age 3, so you probably want to pick a more precise measure.

Expand full comment

I assumed Scott was just exaggerating to make a point

Expand full comment

I'm pretty confused by this kind of attitude. To be quite frank I think it's in-group protectionism.

I'll start off by saying I think most psych studies are absolute garbage and aella's is no worse. But that doesn't mean aella's are _good_.

In particular, aella's studies are often related to extremely sensitive topics like sex, gender, wealth, etc. She's a self-proclaimed "slut" who posts nudes on the internet. Of course the people who answer these kinds of polls _when aella posts them_ are heavily biased relative to the population!

I think drawing conclusions about sex, gender, and other things from aella's polls is at least as fraught as drawing those conclusions from college freshmen. If you did a poll on marriage and divorce rates among college-educated people you would get wildly different results than at the population level. I don't see how this is any different from aella's polls.

Expand full comment
Dec 27, 2022·edited Dec 27, 2022

>If you did a poll on marriage and divorce rates among college-educated people you would get wildly different results than at the population level. I don't see how this is any different from aella's polls.

If you did the former in real life on a college campus you could publish your results in a journal, potentially after playing around a bit to find some subset of your data that meets a P value test. If you run an internet poll you will be inundated with comments about selection bias and sample sizes to no end.

There is no actual difference but there is a massive difference in reception/perception.

Expand full comment

My comment says explicitly "I'll start off by saying I think most psych studies are absolute garbage and aella's is no worse."

Just because there's high prestige in publishing garbage doesn't mean truth-seeking people should signal boost bad methodological results. As I said, this sounds like in-group protection, not truth-seeking.

Expand full comment

We don't know what you may be rolling into the word "most".

Imagine data X is published by aella on twitter, and Y is published by the chair of cognitive science at Harvard in the American Journal of Psychiatry. Most people, it's fair to say, will automatically give a bit more credit to the latter. If that's you, that's understandable.

But the *reason* for doing that should never be given as "aella has selection bias" - they will both have some amount of that. If the reason is going to be "one has MORE selection bias", then that should be demonstrated with reference to the sample both parties used.

The actual reason we give less credit to aella is likely to be about the fact that some sources of information are more "prestigious" than others; whether or not that information tells you anything about reality is, often, irrelevant. This creates a bias in all of us.

Expand full comment

You can't just simultaneously make a demand for greater rigor, and also ensconce internet polling behind a veil of unfalsifiability. It seems like you're asserting that we should treat these things equally until given a substantive reason for doing otherwise.

If the prestige and the selection bias are both unknown quantities and it's either impossible or impractical to establish quality and magnitude of effect, then why push back against the skepticism?

This article would be more truthful if it made the simple observation that data, as it is, generalizes rather poorly.

Expand full comment

I don't think you fairly described the point of the person you are responding to.

Expand full comment

That may be true. It's a bit difficult to discern the underlying arguments since the disagreement I'm responding to is happening on a higher level. I'm not trying to be disputatious for the sake of it, but it seems like there's something contradictory about the way the discussion is being framed.

Expand full comment

It's not just bad methodology, it's antithetical to psychology, it doesn't study psyche as an absolute but hand waves around the topic and commits itself to utilitarianism. It makes what random groups say about themselves provisionally universal (but not really, haha, we're too postmodern for that....). It's not psychology at all, and it should out itself as demographic data in every headline and every summary so that people can see who and why the illusion of generalization is being made in the name of science. This is Scott's worst take ever!

Expand full comment

...what? What should out itself as demographic data, and what is it being perceived as instead, and what illusion of generalization is being committed by whom?

Expand full comment

Journals vary significantly in their publishing standards, up to and including pay to play ones existing, but every standard psych undergrad education involves teaching students to think about selection bias in psych studies run on people willing to participate in psych studies at a college. This is as psych 101 of an observation as it gets. People in leadership positions at journals think about this too, and it's too cynical to assert they'll publish anything if the authors just fiddle with the p-values the right way. That's not really true once we separate out that the term "journals" includes everything from reputable high-quality journals to predatory fly-by-night operations.

Expand full comment

>but every standard psych undergrad education involves teaching students to think about selection bias in psych studies run on people willing to participate in psych studies at a college. This is as psych 101 of an observation as it gets.

I would be more impressed with this if most of the research wasn't so garbage.

>People in leadership positions at journals think about this too, and it's too cynical to assert they'll publish anything if the authors just fiddle with the p-values the right way.

I think you mean publish anything if it hits their feelies the right way. Academia seems to have really hemorrhaged people actually interested in the truth in recent decades. And the high quality journals are nearly as bad.

Expand full comment

Yeah, I don't agree with your assertion that most published psychological research is garbage and publication standards are little more than the emotional bias of editors due to a loss of people who care about truth.

To the original point, psychology as a field is aware of types of bias that derive from convenience samples. Individual actors and organizations vary in how well they approach the problem, and it helps no one to flatten those distinctions.

Expand full comment

Meh, the field can win back my respect when it earns it. Too much Gell-Mann effect triggering all the time from most non-hard-science parts of academia (and even a little bit there) to really give it much faith these days.

You cannot be double checking everything you read, and so often when you do the papers are poorly thought out, poorly controlled, wildly overstating what they show, etc. And that is when they aren’t just naked attempts to justify political feelies regardless of what the facts are.

Academia had a huge amount of my respect from when I was say 10-20, to the extent I thought it was the main thing in society working and worth aspiring to.

Since then I have pretty much been consistently disappointed in it, and the more I look under the hood the more it looks like the Emperor is substantially naked.

Still much better than 50/50, but that isn't the standard I thought we were aiming for…

Expand full comment

Sounds like you agree with Scott's point: that "real" surveys also have unrepresentative samples so it doesn't make sense to single out Aella for criticism.

Expand full comment

Most people do not have a good way to respond to the authors of "real" surveys in a public way.

Expand full comment

It seems clear to me that Aella's audience vs Aella's questions are massively more intertwined than psych students vs typical psychology questions. Both have some issues, but no question I'd trust her studies less.

Expand full comment

This is fair. For me the big focus of a piece like this would be less on defending her/twitter polls and more on dragging a lot of published research to just above "twitter poll" level.

Expand full comment

"In particular, aella's studies are often related to extremely sensitive topics like sex, gender, wealth, etc."

I think Scott's point about correlations still applies. For example, if she tries to look at 'how many times a month do men in different age bins have sex', it doesn't particularly matter that her n=6000 are self-selected for the kind of man who ends up a twitter follower of a vaguely rationalist libertine girl. The absolutes may (for whatever reason; maybe her men are hornier, maybe they're lonelier) be distinct from those of the general population, but she can draw perfectly valid (by the academic standard of 'valid') conclusions about trends.

She does seem to have the advantage of larger sample sizes than many studies. And if she were writing a paper, she'd list appropriate caveats about her sample anyway, like everyone else does.

Expand full comment

She makes claims of the form "more reliable" (https://twitter.com/Aella_Girl/status/1607482972474863616). Most people would interpret this as "generalizes to the whole population better." I simply don't think this is true.

Her polls do certainly have larger sample sizes, but it doesn't matter how low the variance is if the bias is high enough.
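A minimal sketch of that variance-vs-bias point, with entirely invented numbers: when a self-selected sample's true mean is shifted away from the population's, a huge sample size just buys a very tight interval around the wrong answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented numbers, purely illustrative:
TRUE_POP_MEAN = 4.0   # population average of some quantity
SAMPLE_SHIFT = 2.0    # the self-selected audience runs higher on it

for n in (100, 6_000, 1_000_000):
    sample = rng.normal(TRUE_POP_MEAN + SAMPLE_SHIFT, 3.0, size=n)
    stderr = sample.std(ddof=1) / np.sqrt(n)
    print(f"n={n:>9}: estimate = {sample.mean():.2f} +/- {1.96 * stderr:.2f} "
          f"(population mean is {TRUE_POP_MEAN})")

# The confidence interval shrinks as n grows, but the ~2-point gap from the
# population value never does: more data cuts variance, not bias.
```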

Expand full comment

Re: “more reliable”, I wouldn’t interpret that as “generalizing to the whole population better”. One possible goal is to get results that generalize over the whole population, but that is far from a universal desideratum. Many interesting things can be learned about subsets of the population other than “everyone”!

For example, if I’m interested in what people in my local community think of different social norms, attempting to get more representativeness of the US population as a whole would actively make my data worse for my purpose.

Expand full comment

I think that's fair. It wasn't a very precise statement, but that claim comes across as overbold.

"No less reliable" on the other hand...

Expand full comment

There are two forms of reliability, sometimes called "internal validity" ("is this likely to be a real effect?") and "external validity" ("is this effect likely to generalise to other settings?") They're often in tension: running an experiment under artificial lab conditions makes it easier to control for confounders, increasing internal validity, but the artificiality reduces external validity. Aella's large sample sizes give her surveys better internal validity than most psychology studies (it's more likely that any effect she finds is true of her population); it's unclear whether they have better or worse external validity.

Expand full comment

It’s not at all clear to me that the binning would work accurately. What if Aella has a good sampling of young men, but only lonely old men without partners follow her? Then that one cohort would be off relative to the others. You can come up with many hypotheticals along these lines pretty easily.

I think Aella and Scott’s surveys are great work and very interesting, but I think it’s also important to keep in the back of your mind that they could have even more extreme sample issues than your average half-baked psych 101 study.

Expand full comment

This is not correct. Aella's blog is selecting for "horny people," or at least "people who like reading and thinking about sex."

If old men in the real world are less horny than young men, they'll be less likely to read Aella's blog. As a result, Aella's going to end up drawing her young men from (say) the horniest 50% of young men, and her old man sample from (say) the horniest 10% of old men. As a result, she'd likely find a much smaller drop-off in sexual frequency than exists in the real world (if sexual frequency is positively correlated with horniness).
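Here's a quick simulation of that mechanism, with all the numbers made up: "horniness" declines with age in the population, blog readership selects on horniness, and the age trend measured among readers comes out much flatter than the true population trend.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

age = rng.uniform(20, 70, n)
# Invented model: horniness declines with age, plus individual variation
horniness = 10 - 0.10 * (age - 20) + rng.normal(0, 2, n)
# Sexual frequency tracks horniness (again, purely illustrative)
frequency = 2 + 0.8 * horniness + rng.normal(0, 1, n)
# Readers of a sex blog are selected on horniness
reader = horniness > 8

def age_slope(mask):
    # least-squares slope of frequency on age within the masked group
    return np.polyfit(age[mask], frequency[mask], 1)[0]

print("slope in population:", round(age_slope(np.ones(n, dtype=bool)), 3))
print("slope among readers:", round(age_slope(reader), 3))
# The readers-only slope is much closer to zero: the few old readers are the
# unusually horny ones, so the age drop-off looks smaller than it really is.
```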

Expand full comment

Yes, although that last if is a big one. Are they hornier and thus have more sex, or do they have less sex, making them hornier?

Expand full comment

Right--the bias could go either way! Which makes the whole thing a giant shitshow, because your relationship might be upwardly or downwardly biased by selection and it's hard to say how big of a bias you're likely to have.

What you'd probably want to do if you were doing a careful study with this sort of sample would be to characterize selection as much as possible. How underrepresented are old men in your sample relative to samples that aren't selected on horniness (i.e. maybe compare Aella's readership to the readership of a similar blog that doesn't talk about sex). How does the overall rate of sexual activity of your readers compare to some benchmark, like published national surveys? If your readers have a lot less sex, then you're probably selecting for lonely people and should expect the old men to be lonelier, on average. If your readers have more sex, it's the opposite.

But all of this is a problem when asking sex questions to a sex readership that you wouldn't have if you asked sex questions to the readers of a blog about birdwatching or something. They're not "representative" either, but they aren't in or out of your sample on the basis of their sexual attitudes.

Expand full comment

I haven't looked at Aella's survey in detail, but I presume it asks a number of demographic questions of its respondents which allows her to do basic adjustments (e.g. for relationship status). As I understand it, the quarrel isn't with her statistical methods, it's with the quality of her sample.

And the very fact that we cannot agree on a direction of bias leads me to question the likelihood that bias introduced specifically by 'aella follower' is non-uniform across age or any other given attribute. One could just as well presume that birdwatching as a hobby correlates negatively with getting laid in youth (nerds!) and positively in old age (spry, outdoorsy, quirky) - but you'd have to at least claim a mechanism in both cases, and have something to show for it empirically. And it would be interesting if you did. And, naturally, we see that kind of back and forth in ordinary research and it's the stuff knowledge is made of.

Obviously calibration and characterisation are good, but then they're always good. So is more data, especially if all unusual qualities of the sample are clearly and honestly demarcated.

Expand full comment

I don't know more about aella's survey than has been discussed here, but I agree that the question is sample quality.

The reason we can't agree on the direction of bias is because we don't know how horniness/interest in internet sexual content is related to sexual frequency. We're not disagreeing that "horniness" is likely related to age (indeed, this is the research question!) And we're not disagreeing that reading aella's blog is related to horniness. It's possible that there are a bunch of disparate relationships that cancel each other out, but the fact that blog readership is selected on a characteristic closely related to the research question makes the existence of bias quite likely, imo.

I agree with you that a bird watching sample could also have problems! Any descriptive research should think about and acknowledge the limitations of the data. That said, I'd believe a finding on age and sexual frequency that came from a bird watcher sample more than one that came from a sex-blog-reader sample. Let's say that bird watchers are awkward nerds who are physically fit enough to spend time in the woods. If nerds have less sex (debatable!), you'd expect all bird watchers to be less sexually active than non bird watchers of the same age. That's not a problem for internal validity--a bird watcher sample could still tell you how sexual frequency changes with age among nerds.

If physically fit people have more sex, you'd expect that among the general population, sexual frequency would drop off as people got older and less fit. If less fit people also stop birdwatching, you wouldn't see this (or wouldn't see it as much) in a birdwatching sample. That might be a big problem if you want to know how sexual frequency changes from 50s -80s, but probably isn't as big of an issue if you want to see how it changes from 20s-40s.

Expand full comment

Maybe, maybe not.

Among the general US population, height is weakly but positively correlated with most basketball skills (since tall people are more likely to play basketball). Among NBA players though, height is negatively correlated with basketball skills, since a six foot guy needs to be really really skilful to compete against seven-footers.

It could well be that Aella's readership is like the NBA of horniness.
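The NBA analogy is essentially Berkson's paradox (collider bias), and it reproduces easily with made-up numbers: two traits that are weakly positively related in the general population become negatively related once you keep only the people who are extreme on their combination.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000

height = rng.normal(0, 1, n)
# Invented assumption: tall people play a bit more, so skill is weakly
# positively related to height in the general population
skill = 0.2 * height + rng.normal(0, 1, n)

# "Making the NBA" depends on the combination of height and skill
nba = (height + skill) > np.quantile(height + skill, 0.999)

print("corr in population:", round(np.corrcoef(height, skill)[0, 1], 2))
print("corr among 'NBA':  ", round(np.corrcoef(height[nba], skill[nba])[0, 1], 2))
# Weakly positive overall, clearly negative within the selected group:
# conditioning on a high combined score induces a trade-off between the traits.
```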

Expand full comment

I can't argue against that, really. I'd only say that you'd have to show a mechanism for that kind of thing before assuming it exists.

Expand full comment

I agree with this, and will add that (afaict) the majority of Aella's Twitter polls are, well, polls, akin to "how many people oppose abortion?" – the situation where Scott says selection bias is "disastrous", "fatal", etc.

Sure, some of them ask for two dimensions, so you could use them to measure a correlation, as long as you ignore all the things that could go wrong there (conditioning on a collider, etc.). But that's a fraction.

Expand full comment

I am also confused. This feels like "Beware the man of one study (unless it's Aella, then it's fine)".

If you're citing any single Psychology study, without demonstrating replication throughout the literature, you either haven't done enough research or you have an ideological axe to grind. Part of replication is sampling different populations and making sure that, at the very least, X result isn't just true for college students at Y university. Hopefully, it extends much further than that.

I don't see how Aella's data is any different than a single psychology study of a university called "Horny Rationalists U".

If what Scott is saying is that Aella's data is a good starting point and maybe worth doing some research into -- yeah, absolutely! (I feel like it won't surprise you to know that there is already a lot of psychological research, some of which replicates, some of which doesn't, on Aella's topics).

Otherwise, I can't help but agree that this feels like in-group protectionism.

Expand full comment

> If what Scott is saying is that Aella's data is a good starting point and maybe worth doing some research into -- yeah, absolutely!

I think Scott would say that about any study, including Aella’s (except maybe “worth doing _more_ research into” because Aella’s studies are themselves research).

Expand full comment

*ARE* they research?

If, as some have said, she just asks one question of her audience, then the only thing I can think that they might be researching is "What would make my site more popular?". You need multiple questions to even begin to analyze what the answers mean.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

Nah, it's "Beware Isolated Demands For Rigor".

Some people say this is some new weird take by Scott (and/or imply that he's just unprincipledly defending the ingroup), but it's really not.

Expand full comment

I would expect both the marriage and divorce rate among college freshmen to be very low. :)

Expand full comment

I worked tangentially with an IRB board at a large public university a few years back. I wasn’t actually on the board, but I worked to educate incoming scientists, students, and the community about expectations and standards for human-subject research.

In that role, I sat in a lot of IRB review meetings. Our board talked extensively about recruitment methods on every single one of those proposals. And, because we always brought in the primary researcher to talk about the project and any suggested changes, we often sent them back for revision when the board had objections about how well-represented the population groups were.

Now, there were some confounding factors that the board took into account when determining whether the selection criteria for participants needed reworking:

1) The level of “invasiveness” of the research. How risky, sensitive is the required participation?

2) The potential communal rewards of the research. Are the risks of the research worthwhile?

3) The degree to which the scientists on the board could see problems with the stated hypothesis being more general than the proposed population would support.

The reason the risks and rewards were relevant to the board was that, if the risks were lower, the selection criteria standards could be lower. Likewise, if the potential rewards to the community were higher and the risks were low, the selection criteria would also be less stringent. But anything with high risk or low potential benefit got run through the absolute wringer if they tried some version of the things you wrote out above: “The real studies by professional scientists usually use Psych 101 students at the professional scientists’ university. Or sometimes they will put up a flyer on a bulletin board in town, saying “Earn $10 By Participating In A Study!”” Perhaps things are run differently elsewhere, but that kind of thing definitely did not pass my university's IRB.

All that to say, psych studies (generally speaking) were considered by the IRB to be quite low on the invasiveness scale, but were considered of good potential value to the community, so the selection standards were…not high. While this might make for a bunch of interesting published results, I don’t know that you could, by default, argue that any of the results of the psych research at the school would translate outside of the communities the researchers were evaluating. Correlations between groups of people are super-valuable, of course, but if you're looking for more "scientific," quantifiable data to apply at huge scale, that's just not the place to find it.

That’s an important distinction, though. Because the board would have to ask the researchers to correct the ‘scale’ of their hypothesis on almost all the psych research proposals that I sat in on, as researchers were usually making grand, universal (or at least national) statements, but usually only testing very locally.

I think our IRB had the correct approach on this. And I think it’s one that others should use (including Scott). To be clear, I think this self-critique and examination is something that Scott does regularly, judging from his writing. I also think he’s aware of the “selection bias” in his polls, such that it is, in that he tries to spot easily identifiable ways in which his demographic’s results might not translate to a broader community. That doesn’t mean he always sees it accurately, but I've seen the attempts enough times to respect it.

However, from what I’ve seen, most researchers don’t have the instinct to try to find fault or limitations to their own research's relevance. Perhaps that’s just my experience of seeing so many first-time graduate researchers come through the IRB, though. I don’t have any idea who Aella is, so perhaps I’m missing something, but it seems to me that stating the possible limits of the applicability of your research should be pro forma, and I'd be very hesitant to play down the importance of that responsibility.

But I think I get where Scott is coming from. He's right that selection bias is a part of research that you can’t get rid of, and a lot of the established players seem to get away with it while "amateurs" get dismissed completely because of it. But I think it’s something that you should definitely keep in mind and try to account for in both your hypothesis and your results.

To steel-man Scott's point, I don't think he's denying that selection bias can be a real problem (even at his most defensive, he just says it "can be" "fine-ish"). I think he's just trying to argue against using “selection bias” and “small sample size” as conversation-enders. Doesn't mean he's discounting them as real factors. But some people use them in a similar way that I often see people shout “that’s a straw man” or “that’s a slippery slope fallacy” online—to not have to think any further about the idea behind sometimes flawed reasoning. Those dismissive people often aren’t wrong in terms of identifying a potential problem, but they ARE wrong to simply dismiss/ignore the point of view being expressed because the argument used was poor. If I ignored every good idea after having heard it expressed/defended poorly, I wouldn't believe in anything.

In the same way, there's no reason to completely dismiss the value of any online poll, as long as you (the reader of a poll) have the correct limits on the hypothesis and don't believe any overstating of results.

Expand full comment

Well said.

Expand full comment

"However, from what I’ve seen, most researchers don’t have the instinct to try to find fault or limitations to their own research's relevance."

Alternatively, when applying for grants or trying to start a new big thing, researchers are often very much encouraged by grant committees to oversell their projects, which does not encourage openly discussing the projects' limitations.

Expand full comment

Ding ding ding. You make everything a race, you end up with people focused on speed and not safety, even if you claim to be very very concerned about safety. Especially if there are few penalties for mess ups.

Expand full comment

There's nothing wrong with giving your blue-sky hopes when pitching the grant application. Why not? The program manager definitely wants to know what the best-case outcome might be, because research is *supposed* to be bread on the waters, taking a big risk for a potential big outcome.

I think the discussion here is about what goes into your paper reporting on the work afterward -- a very different story, where precision and rigor and not running ahead of your data are (or ought to be) de rigueur.

Now, if you are pitching the *next* grant application in your *current* paper, that's on you, that's a weakness in your ethics. Yes, I'm aware there's pressure to do so. No, that isn't the slightest bit of excuse. Withstanding that pressure is part of the necessary qualifications for being entrusted with the public's money.

Expand full comment

This seems a really naïve interpretation of how the process actually works and what the incentives are.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

It's certainly not naive, since I've been involved in it, on both ends, for decades. You may reasonably complain that it's pretty darn strict, but that doesn't bother me at all.

Being a scientist is a sweet gig, a rare lucky privilege, something that any regular schmo trying to sell cars, or cut an acre of grass in 100F Houston heat, or unfuck a stamping machine on an assembly line that just quit would give his eyeteeth to be able to do -- sit in a nice air-conditioned office all day, speculate about Big Things, write papers, travel the world to argue with other smart people about Big Things.

If you're independently wealthy and can afford to theorize about the infinite, then do whatever you damn well please. But if you are doing this on the public's dime -- on money sweated out of the car salesman, the gardener, or the machine shop foreman -- then yeah there are some pretty strict standards. If you don't like it, check out the Help Wanted ads in your hometown and do something for which someone will pay you voluntarily.

Expand full comment

My point would be that people very rarely live up to those standards, in large part because the way science is done has the incentive structure all wrong. The naïveté I was talking about was your belief that the system was working very well.

Expand full comment

If Scott had a list of work Aella has done related to something like banana eating I'd take his argument much more seriously. Using such a specific example seems like a bad idea in general for the purpose of this post and it seems like an even *worse* idea to use the specific specific example he used.

I honestly wonder if there is some sort of weird trick question/fakeout going on with this whole post just because of that.

Expand full comment

There are three main points to consider that I think you're glossing over.

1. This only matters if aella is *claiming* to represent a larger population than 'people who read my blog and people who are similar to them.' Having accurate data that only tells you about one thing rather than everything is not the same as having inaccurate data. You would have to read each blog post to judge whether or not the results are being misrepresented each time.

2. This only matters if the selection criteria and the thing being measured are correlated. You say that aella mostly asks about things relevant to their blog, which implies there will be a correlation, which I'm sure is true for some things they measure, but won't be true for everything. Psychology studies are also not as stupid about this as people imagine; it is normal to just use psych students when you are studying low-level psychophysics concepts that should not correlate with college attendance, and to get a broader sample when studying social phenomena that will correlate.

3. This primarily matters if you are doing a descriptive survey of population counts and nothing else, and matters a lot less if you are looking at correlations between factors or building more complex models. For example, let's say that you were asking about sex positivity and lifetime number of partners; sure, you might plausibly imagine that both of those things are higher among aella's audience, so you wouldn't represent the simple counts as representative of the general population. But if you were asking what the *relationship* between those two factors is, there will still be variation within that sample that lets you tell the relationship, and there's no particular reason to believe that *how those factors vary together* is different in aella's audience than in the general population.
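A toy version of point 3, with invented numbers: the subgroup is shifted upward on both variables, so its raw counts don't generalize, but the relationship between the variables is the same inside and outside it. The caveat from the NBA example elsewhere in this thread still applies: this breaks down if the selection interacts with the relationship itself.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

audience = rng.random(n) < 0.05      # a small, unusual subgroup
# Invented model: the audience is higher on both traits, but the
# trait-to-trait slope (2.0) is the same for everyone
sex_positivity = rng.normal(0, 1, n) + 1.5 * audience
partners = 2.0 * sex_positivity + 3.0 * audience + rng.normal(0, 1, n)

def slope(mask):
    # least-squares slope of partners on sex_positivity within the group
    return np.polyfit(sex_positivity[mask], partners[mask], 1)[0]

print("mean partners, audience vs rest:",
      round(partners[audience].mean(), 2), "vs",
      round(partners[~audience].mean(), 2))
print("slope in audience:", round(slope(audience), 2))
print("slope in rest:    ", round(slope(~audience), 2))
# The levels differ a lot between groups, but both slopes come out near 2:
# the raw counts don't generalize, the relationship does (in this toy setup).
```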

Expand full comment

I get tons of responses from people who have no idea who I am! People seem to not understand I do research that's not just twitter polls.

Expand full comment

What studies are good?

Expand full comment

If smart people eat bananas because they know they are good for their something something potassium then we should be skeptical about the causal language in your putative study title. Perhaps something more like "Study finds Higher IQ People Eat More Bananas" would be more amenable to asterisking caveats and less utterly and completely false and misleading.

Expand full comment
author

I realized someone would say this two seconds after making the post, so I edited in "(obviously there are many other problems with this study, like establishing causation - let’s ignore those for now)"

Expand full comment

Sorry for being predictable ;)

I think I was primed to have this concern because when I initially read "selection bias" my brain went right to "selection into treatment" (causality issues) rather than "sample selection bias".

Expand full comment

Same thought. Super surprising he did not say "Study finds positive correlation between IQ and banana consumption". I thought he was going to be funny and put both disclaimers in the asterisk.

Expand full comment

I think the real difference here is that the studies are doing hypothesis testing, while the surveys are trying to get more granular information.

I mean you have a theory that bananas -> potassium -> some mechanism -> higher IQ, and you want to check if it is right, so you ask yourself how does the world look different if it is right versus if it is wrong. And you conclude that if it is correct, then in almost any population you should see a modest correlation between banana consumption and IQ, whereas the null hypothesis would be little to no correlation. So if you check basically any population for correlation and find it, it is evidence (at least in the Bayesian sense) in favor of your underlying theory.

On the other hand, if you were trying to pin down the strength of the effect (in terms of IQ points/ banana/ year or something), then measuring a correlation for just psych 101 students really might not generalize well to the human population as a whole. In fact, you'd probably want to do a controlled study rather than a correlational one.

Expand full comment
Dec 27, 2022·edited Dec 27, 2022

This is a much better explanation than Scott's post. Very helpful comment

Expand full comment

That would work very nicely if the actual steps indeed were:

1. formulate hypothesis

2. randomly pick a sample

3. make measurements

4. check if the hypothesis holds for that sample

But I suspect that often it's: 2 is done first, then 3, then 1, then 4.

Then, I think it doesn't work.

And there might be some approaches along this spectrum where the hypothesis "shape" is determined first, but it has some "holes" to be filled later. The holes could range in size from "huge", like "${kind of fruit}", to "small", like "${size of effect}".

Expand full comment

I mean if you formulate your hypothesis only after gathering your data and aren't correcting properly for multiple hypothesis testing or aren't using separate hypothesis-gathering-data and hypothesis-testing-data, then you are already doing something very very wrong.

But I think that there are probably lots of things you might want to test for where whether or not the effect exists is relatively stable group to group, but details like the size of the effect might vary substantially.

Expand full comment

That way lies the "green jellybean" effect.

If you have a bunch of data points, you can always find an equation to produce that collection within acceptable error bounds. Epicycles *do* accurately predict planetary orbits. And with enough creativity I'm sure they could handle relativity's modification of Mercury's orbit.

Expand full comment

Isn't the green jellybean effect like 30% of modern "research"?

Expand full comment

This doesn't sound quite right to me. I would say the main difference is not the granularity, but that "correlation studies" look for x -> y, where both x and y are measured within persons, while polls look at x between/over different persons. Thus in the "correlation studies" the interesting thing happens (in a way) within each participant of the study: whether something they have is related to something else they have (not). Thus, who the participants are is less relevant, as they kinda work as a ""control"" for themselves.

In case it sounds like nitpicking, this difference is relevant in that my point leads to a different conclusion on the effect sizes. I see no reason to think that estimating effect sizes from psych students (generalized to the whole population) is more wrong than estimating the existence of an effect. The latter is really just a dichotomous simplification of the former ("it is more/less than 0"), and if we draw a conclusion that there is an effect, say, >0, we might as well try to be more nuanced of the size of it.

Because if you say that the effect size does not generalize, why would the sign of it (+ or - or 0) then generalize? Of course, it is easier to be right when only saying yes or no, but there is no qualitative difference between trying to generalize an effect existing and trying to generalize the size of that effect. The uncertainty of the effect being -.05 or .05 is not really different from the uncertainty of the effect being .05 or .1.

Expand full comment

I don't think that this holds up. You don't really have correlations within a single person unless you measure changes over time or something. Remember correlation is defined as:

([Average of X*Y] - [Average of X]*[Average of Y])/sqrt{([Average of X^2]-[Average of X]^2)([Average of Y^2]-[Average of Y]^2)}.

It's a big mess of averages and cannot be defined for an individual person. I suppose it is robust to certain kinds of differences between groups that you consider, but it is not robust to others.

My point is that if you are looking at a relatively big effect and want to know whether it is there or not, most groups that you look at will tell you it's there. However, if you want to know more accurately how big it is, sampling just from one group is almost certainly going to give you a biased answer.

Expand full comment

Yes, the correlation is measured across the sample, but my point is that the correlation is like testing whether BMI and running speed are correlated, while the poll-approach is like testing whether running speed is, say, on average more than 8 km/h. The former usually works in a non-representative sample because it has two measurements for each participant and is testing the relationship between them. The latter does not, as it tries to test some quality of the people as a whole.

I still claim that it has nothing to do with the granularity, you can answer both of those questions at different levels of exactness, but only one of them can give meaningful results on an unrepresentative sample.

Expand full comment

I think we're only seeing a difference in this example because the correlation between running speed and BMI is not close to 0 while the average running speed *is* close to 8km/h. If you were trying to test whether average running speed was more than 2km/h, you'd probably get pretty consistent answers independent of which group you measured.

Actually, measuring correlation over just a single group might even be less reliable because of Simpson's paradox. The correlation could be positive within every group and yet be negative overall.
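Simpson's paradox is easy to demonstrate with two invented groups: x and y are positively related within each group, but the group that sits higher on x sits much lower on y, so the pooled correlation flips negative.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

group = rng.integers(0, 2, n)               # two arbitrary subgroups
x = rng.normal(0, 1, n) + 3 * group         # group 1 sits higher on x...
y = x - 6 * group + rng.normal(0, 1, n)     # ...but much lower on y

for g in (0, 1):
    m = group == g
    print(f"group {g}: corr = {np.corrcoef(x[m], y[m])[0, 1]:+.2f}")
print(f"pooled : corr = {np.corrcoef(x, y)[0, 1]:+.2f}")
# Positive within each group, negative once the groups are pooled together.
```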

Expand full comment

I agree that most people rush to "selection bias" too quickly as a trump card that invalidates any findings (up there with "correlation doesn't mean causation"). However, I disagree that "polls vs correlations" is the right lens to look through it (after all, polls are mostly only discovering correlations as well).

The problem is not the nature of the hypotheses or even the rigor of the research so much as whether the method by which the units were selected was itself correlated with the outcome of interest (i.e., selecting on the dependent variable). In those cases, correlations will often be illusory at best, or in the wrong direction at worst.

Expand full comment

I agree that "polls vs correlations" isn't right, but partly because I think "polls" are far more heterogeneous than Scott suggests. If you want to find out who's going to win an election, then you really care about which side of 50% the numbers are on. But if you're polling about something like "how many people support marijuana legalization?" then your question is more like "is it 20% or 50% or 80%?" and for this, something that has a good chance of being off by 10 is fine (as long as you understand that's what's going on).

Expand full comment

But off-by-10% is not that bad a scenario; real life can be worse. You can be off by much, much more. For example, by polling the much-mentioned psych students, I know from results I've seen that you can get >40% support for a political party that is at 10% for the overall population. So the main difference is not the hypothesis; the main difference is whether you are looking at population-level descriptive statistics or at a link between two variables within people ("liberals like cheesecake more than conservatives").

Expand full comment

What do you all think about the dominance of Amazon’s Mechanical Turk in finding people for studies? Has it worsened studies by only drawing from the same pool over and over?

Expand full comment

Seems probably better than just using college students tbh - you get the same pool of students volunteering over and over for studies as well, and they are more demographic-restricted.

(Of course there is natural turnover in that most college students are only college students for 4-5 years - I wonder how that compares to the turnover in MTurk workers)

Expand full comment

There needs to be a rule that you can only volunteer for one paid psychology experiment in your lifetime.

I did a bunch of these when I was in school and you quickly realize that the researchers are almost always trying to trick you about something. It becomes a game to figure out what they're lying about and what hypothesis they're testing, and in most cases that self-awareness will ruin the experiment.

Expand full comment

That is not true of most psychology experiments. For a meaningful portion of them (say 20% or something), sure, but definitely not most. But sure, this self-awareness can be a big problem (although I have participated in multiple studies which included cheating, and I don't think it affected me in any of them; maybe I'm just a gullible person).

Expand full comment
Dec 27, 2022·edited Dec 27, 2022

"Selection bias is fine-ish if..."

I'm interpreting this as saying that one's prior on a correlation not holding for the general population should be fairly low. But it seems like a correlation being interesting enough to hear about should be a lot of evidence in favour of the correlation not holding, because if the correlation holds, it's more likely (idk by how much, but I think by enough) to be widely known -> a lot less interesting, so you don't hear about it.

As an example, I run a survey on my blog, Ex-Translocated, with a thousand readers, a significant portion of which come from the rationality community. I have 9 innocuous correlations I'm measuring which give me exactly the information that common sense would expect, and one correlation between "how much time have you spent consuming self-help resources?" and "how much have self-help resources helped you at task X?" which is way higher than what common sense would naively expect. The rest of my correlations are boring and nobody hears about them except for my 1,000 readers, but my last correlation goes viral on pseudoscience Twitter that assumes this generalises to all self-help when it doesn't and uses it to justify actually unhelpful self-help. (If you feel the desire to nitpick this example you can probably generate another.)

I agree that this doesn't mean one ought to dismiss every such correlation out of hand, but I feel like this does mean that if I hear about an interesting survey result's or psych study's correlation in a context where I didn't also previously hear about the survey/study's intention to investigate said correlation (this doesn't just require preregistration because of memetic selection effects), I should ignore it unless I know enough to speculate as to the actual causal mechanisms behind that correlation.

This pretty much just bottoms out in "either trust domain experts or investigate every result of a survey/every study in the literature" which seems about right to me. So when someone e.g. criticises Aella for trying to run a survey at all to figure things out, that's silly, but it's also true that if one of Aella's tweets talking about an interesting result goes viral, they should ignore it, and this does seem like the actual response of most people to crazy-sounding effects; if anything, people seem to take psych studies too seriously rather than not taking random internet survey results seriously enough.

Expand full comment

It's a good point. But it seems this applies equally to psych studies, so it doesn't weaken Scott's point that we shouldn't single out internet surveys as invalid.

Expand full comment

Like any kind of bias, selection bias matters when the selection process is correlated with BOTH the independent and dependent variables and as such represents a potential confounder. Study design is how you stop selection bias from making your study meaningless.

Expand full comment

The way I think about the key difference here (which I learned during some time doing pharma research, where these kinds of issues are as bad as... well) is that when claiming that a correlation doesn't generalize, some of the *burden of proof* shifts to the person criticizing the result. Decent article reviewers were pretty good at this: giving an at least plausible-sounding mechanism by which, when going to a different population, there's some *additional* effect to cancel/revert the correlation. It's the fact that the failure of the correlation requires this extra mechanism that goes against Occam's Razor.

Expand full comment

It's not about correlations; it's about the supposed causal mechanism. Your Psych 101 sample is fine if you are dealing with cognitive factors that you suppose are universal. If you're dealing with social or motivational ones, then you're perhaps going to be in danger of making a false generalization. This is particularly disastrous in educational contexts because of the wide variety of places and populations involved in school learning. It really does happen all the time, and the only solution is for researchers to really know the gamut of contexts (so that they realize how universal their mechanisms are likely to be) and make the context explicit and clear instead of burying it in limitations (so that others have a chance of catching them on an over-generalization, if there is one). Another necessary shift is for people to simply stop looking for universal effects in the social sciences and instead expect heterogeneity.

Expand full comment

“But real studies by professional scientists don’t have selection bias, because . . . sorry, I don’t know how their model would end this sentence.”

...because they control for demographics, is how they’d complete the sentence.

Generically, we know internet surveys are terrible for voting behavior. Whether they’re good for the kinds of things Aella uses them for is a good question!

I’m on the record in talks as saying “everything is a demand effect, and that’s OK.” I see surveys as eliciting not what a person thinks or feels, but what they are willing to say they think and feel in a context constructed by the survey. Aella is probably getting better answers about sexual desire (that’s her job, after all!) and better answers on basic cognition. Probably worse on consumer behavior, politics, and generic interpersonal.

Expand full comment
author

Here is a randomly selected study from a top psychiatry journal, can you explain in what sense they are "controlling for demographics"? They have some discussion of age but don't even mention race, social class, etc.

https://ajp.psychiatryonline.org/doi/10.1176/appi.ajp.2020.19080886

Expand full comment

That’s an RCT, not a survey, and it’s probably more useful for them to run with the selection effects that get people into the office, rather than attempt to determine what would work for an unbiased sample of the population at large.

Expand full comment
author

I'm not sure why the RCT vs. survey matters for this purpose. Randomization only guarantees that people don't have extra confounders aside from the ones that brought them into the study, it doesn't address selection bias in getting into the study itself.

If I did an RCT of ACX readers, where I artificially manipulated the mental health of one group (by giving them addictive drugs, say), and then tested which group had more spiritual experiences, they would still be ACX readers, different from the population in all the usual ways.

Nor does it have to do with "getting them into the office". You find the exact same thing in nonclinical psychology studies - for example, can you find any attempt to control for demographics or extend out of sample in https://asset-pdf.scinapse.io/prod/2001019597/2001019597.pdf (randomly selected study that appeared when I Googled "implicit association test", a randomly selected psych construct that came to mind).

Maybe it would be more helpful if you posted a standard, well-known psychology paper that *did* do the extension out of sample as a routine part of testing a psychological construct. I think I've never seen this and would be interested to know what you're thinking of.

Expand full comment

I’m kind of going study by study as you post them. The RCT you suggest would be great if you were proposing a treatment for ACX readers, but I think you’d agree that results would be much weaker if you were proposing to extend to a large inner city hospital. Correlations in one population can reverse in another, famously from collider bias!

An example from Aella: I would not be surprised if Aella finds a negative correlation between kinks that is positive in the general population. This would be driven by a non-linearity: mildly kinky people are into lots of things, so there's a positive correlation between kinks in the general population. Very kinky people have an obsession that excludes others (so the correlation is weak or negative in a survey that selects people by interest in a sex worker's feed).
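
To make that concrete, here's a toy simulation of the mechanism (all of the numbers, and the "interest budget" model itself, are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent "kinkiness" varies a lot in the general population.
k = rng.lognormal(mean=0.0, sigma=1.0, size=n)
# Each person splits that interest between kink A and kink B.
split = rng.uniform(0, 1, size=n)
kink_a = k * split + rng.normal(0, 0.1, size=n)
kink_b = k * (1 - split) + rng.normal(0, 0.1, size=n)

# Whole population: both kinks are driven by overall kinkiness, so the
# correlation is positive (roughly +0.3 with these made-up numbers).
print(np.corrcoef(kink_a, kink_b)[0, 1])

# "Followers" are selected for overall kinkiness (top 5%). Within that group,
# kinkiness varies much less, the split dominates, and the same two kinks
# now look negatively correlated (roughly -0.2).
followers = k > np.quantile(k, 0.95)
print(np.corrcoef(kink_a[followers], kink_b[followers])[0, 1])
```

Same people, same kinks, opposite sign, purely because of who ends up in the sample.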

In an academic psych talk, a query about selection bias might receive the response "we corrected for demographics", or it might get the response "what confound do you have in mind?" In political science I think the demographic question is more salient, because they have a (rather imaginary) "polis" in mind.

IMO the most interesting questions involve an interaction between group and individual, so the most interesting work is asking "what happens for people in group X?" and talking about both the cognitive mechanisms and how they interact with the logic of the group.

Expand full comment
author
Dec 28, 2022·edited Dec 28, 2022Author

Sorry, I do want to stick with the original thing we were discussing, which is whether most real scientists control for demographics. As far as I can tell, this still seems false. Do you still believe it is true? If so, can you give me examples? If we now agree this is false I'm happy to move on to these other unrelated points.

Expand full comment

I'm a psychology PhD and in my experience at least ~1/4 of the time reviewers will ask you to control for basic demographics like gender/race/age. (And lots of papers will throw this in there as a supplementary analysis.) If this seems incongruent with other people's experience, I can pull up a few random papers & see if I'm right...

And I agree with Simon that "did you control for gender/race/etc?" or "does this differ across genders/etc?" are common questions in talks.

Expand full comment

No apology necessary! You’re asking a science-of-science question which is best settled not by our anecdotes, but a quantitative survey of the literature. My suggestion, ironically enough, is to control, if you do such a survey, for selection bias.

Expand full comment

If you did the RCT of ACX readers and you found that giving addictive drugs led to more spiritual experiences, you could be pretty confident that it was true that the type of people who read ACX are more likely to have mystical experiences if they take addictive drugs. You'd then have to wonder whether this is only true for the type of people who read ACX, but you'd at least have solid ground on which to generalize from.

Without the RCT, you might find a relationship between drug use and spiritual experiences because drug use and spiritual experiences influence readership of your blog. In that case, your correlation might not even hold for "the type of people who read ACX"--just for actual ACX readers.

You're a psychiatrist who writes about drugs, and you're also an irreligious rationalist. Let's suppose that your irreligious rationalism makes you less appealing to people who are more likely to be spiritual, but your drug writing makes you more appealing to people who are more into drugs. Together, that might mean that spiritual-experience people who don't like drugs tend to stop reading your blog, but spiritual-experience people who like drugs stick around, while non-spiritual people stick around regardless of whether they like drugs.

If so, you'd end up finding a positive correlation between drugs and spiritual experiences when looking at blog readers, even if that correlation doesn't exist for people with low interest in spirituality (as a group), or for people with a high interest in drugs (as a group), or people with a high IQ, or any other way of characterizing your readership. The correlation would be entirely an artifact of the selection process into becoming a blog reader.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

I think a big part of the argument is "what is a spiritual/mystic experience?"

Tweeter didn't define it in that tweet, and that does make a difference.

I have had profound feelings of awe and gratitude at the beauty of the universe, but I would not define those as "mystic". Now, if Tweeter does say "but that is a mystic experience!", then we can begin to arrive at some kind of definition: a 'healthy' mind will have feelings of 'more than the usual grind or the rat-race'.

So everyone who has had the oceanic feeling can agree that they have had it, whether or not you define that as spiritual/mystic, and *then* we can ask "so how is your mental health?" and correlate one with the other.

If "excellent mental health" and "regularly experience of profundity" go together significantly, then Tweeter has made their case.

As it is, it's just more "Eat Pray Love" tourism showing-off about being *so* much finer material than the common clay normies.

The Hopkins poem "God's Grandeur" speaks to me, but that is because (1) we're co-religionists so I get where he's coming from and (2) I too have had experiences of the beauty of the world, despite all the evil and pain and suffering, but I would not necessarily call those mystic or spiritual:

https://www.poetryfoundation.org/poems/44395/gods-grandeur

Expand full comment

Nope, that matters enormously.

RCT vs. survey does make a critical difference as soon as you are secretly thinking about some causal nexus the correlation might somehow suggest even if not quite imply, which is to say basically always.

With an extra assumption that the causal mechanisms work the same for the entire population, the RCT on the unrepresentative sample is actually good evidence for some kind of causal connection in the underlying population. And then of course you get to speculate on direction of causality, common causes, multicausality and so on.

On the other hand *even with such an assumption* a sample unrepresentative in one of the correlated variables will change correlations enough to make them basically non-evidence for causal stories in the underlying population, rather than the already weak evidence they would be in a representative sample. So "convenience sample" is a fairly weak objection to an RCT and a pretty fatal one to a correlational survey.
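
A toy simulation of the contrast, using the drugs-and-spirituality example from upthread (the zero true effect, the selection rule, and every number here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# Assume two traits that are causally and statistically unrelated in the population.
likes_drugs = rng.normal(size=n)
spirituality = rng.normal(size=n)
print(np.corrcoef(likes_drugs, spirituality)[0, 1])   # ~ 0 in the full population

# Selection into the readership depends on both traits (a collider):
# say you stick around if you like drugs and/or aren't very spiritual.
reads_blog = (likes_drugs - spirituality + rng.normal(size=n)) > 1.0

# A correlational survey of readers finds a clearly positive, entirely spurious correlation.
print(np.corrcoef(likes_drugs[reads_blog], spirituality[reads_blog])[0, 1])

# An RCT run inside the same weird sample: randomize the "drug" and, since the
# assumed true effect on spirituality is zero, the RCT correctly finds ~nothing.
readers = np.flatnonzero(reads_blog)
treated = rng.random(readers.size) < 0.5
outcome = spirituality[readers]          # unchanged, because the true effect is zero
print(outcome[treated].mean() - outcome[~treated].mean())
```

The randomization is what rescues the second estimate: it guarantees the treatment is independent of whatever got people into the sample, which is exactly what the self-selected correlation lacks.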

(That said, of course you will also find correlational studies on convenience samples in literature. In that case I think the conclusion should not be phrased as "Aella polls as good as science" but rather as "Lots of 'science' no better than Aella polls, this is part of why replication crisis". Logically the same, but correctly identifies which side of the false distinction people are wrong about).

Expand full comment

>And then generalize further to the entire world population over all of human history, and it stops holding again, because most people are cavemen who eat grubs and use shells for money, and having more shells doesn’t make it any easier to find grubs.

I know this is somewhat tongue-in-cheek, but for accuracy's sake: the number of people who were born before widespread adoption of agriculture was on the order of 10 billion, vs. about 100 billion after. https://www.prb.org/articles/how-many-people-have-ever-lived-on-earth/

Expand full comment

Ah but you are only considering the past and present, not the whole of human history.

Expand full comment

If the future of humanity involves a total collapse of civilization including the loss of all agricultural capacity such that it's no longer possible to exchange wealth or non-food labour for food, then the human population will rapidly collapse to match the Earth's carrying capacity for hunter-gatherers.

Even under the generous assumption that the carrying capacity after this unspecified apocalypse is not lower than it was in the past, it would take on the order of a million years of hunter-gatherer life without anyone reinventing agriculture for the cumulative hunter-gatherer population to match the cumulative agricultural-industrial population.

Expand full comment

What if humanity ends up colonizing other planets? Assume a scenario where interstellar space travel is 1. possible, 2. extraordinarily expensive and time-consuming, to the point where it's always a one-way trip and there's no transportation or communication between human worlds. Many of these worlds could end up reverting to primitivism, especially if the resources necessary to develop modern technological devices are scarce there. In that situation, each individual world's population would be much lower than modern-day Earth, but the overall population of humans scattered across the galaxy would be far higher than Earth's population alone.

This is an extraordinarily unlikely and ridiculously contrived scenario, I know. But it's one way that Melvin's statement could end up being accurate, and I've seen plenty of sci-fi universes built on some variant of this premise.

Expand full comment

My prior on "there exist a large number of planets where humans can subsist in viable numbers as hunter-gatherers but cannot establish even rudimentary agriculture, and the total hunter-gatherer carrying capacity of these planets is higher than the total agricultural-industrial carrying capacity of all the planets suitable for agriculture" is...basically zero.

Like I don't even know what it would mean for a planet to be suitable for hunter-gatherer lifestyles but not for any kind of agriculture. There's a reliable, sustainable supply of nonpoisonous organic matter with adequate micronutrient content for humans, but you can't breed/cultivate it and you can't use it as a planting medium? What would that even look like?

Expand full comment

>Like I don't even know what it would mean for a planet to be suitable for hunter-gatherer lifestyles but not for any kind of agriculture.

There are plenty of such regions on Earth. Permafrost and polar caps. Dense forests, especially jungle, without the means to clear it. Basically the opposite of the arable-land gradient:

https://en.wikipedia.org/wiki/Arable_land

Expand full comment

Polar caps are not suitable for agriculture, and they also aren't suitable for hunting-gathering. A population there will just die.

Permafrost is suitable for hunting-gathering, and it's not suitable for agriculture. But hunting-gathering isn't going to happen, because the permafrost is also suitable for pastoralism and the pastoralists will easily defeat the hunter-gatherers due to their vastly superior numbers. https://en.wikipedia.org/w/index.php?title=S%C3%A1mi_people

Dense jungle is suitable for hunting-gathering and for agriculture. There is no such thing as not having the means to clear it; doing so is well within the means of hunter-gatherers. There is only jungle that no one has yet bothered to clear.

Expand full comment

In addition to what Michael Watts said:

Those are regions, not planets. The 'single-biome planet' exists only in fiction; neither the physics nor the biology of it works in reality. Any real planet will have a gradient of climates and interdependent biomes. If the warmest of them is permafrost, it won't resemble Earth's arctic region, which is full of organisms that evolved in warmer conditions and then adapted to the cold. It also most likely won't have an atmospheric composition hospitable to humans.

Also, importantly, there are many kinds of agriculture in the broad sense I'm using the term that don't require arable land. Pastoralism often works where farming doesn't. There's also aquaculture (which would be perfectly viable in the Arctic), greenhouses, hydroponics, algae vat farms, and more. These aren't all *economically* viable at scale in a world with huge amounts of arable land, but it's important not to confuse "unprofitable" with "impossible."

Expand full comment

Maybe the people of the future are technologically advanced but they eat grubs because the government has outlawed all other foods for environmental reasons.

And they use shells as currency because repeated financial crises have shown both fiat and crypto currencies to be unreliable.

Expand full comment

In that scenario they'd have grub farms and grub-harvesting specialists who would accept shells (or some other currency) in exchange for grubs.

(My correction wasn't about the specifics of the food or the currency - I understand "grubs and shells" to be metonyms for all the foods and currencies that hunter-gatherers might use - but about the economic circumstances where someone wouldn't be able to exchange currency for food.)

Expand full comment

Unless food is severely rationed by the government and money is only useful for buying NFTs and new hats for your avatar.

Expand full comment

Severe food rationing by the government results in the government becoming food, and then everything stabilizes again.

Expand full comment

Unless the future involves the birth of -90 million humans (or 90 million anti-humans?), these numbers can only get worse for Scott's claim by including the future.

Expand full comment

Yeah but you also have to count the billions of people ordering paleo GrubHub with TurtleCoins.

Expand full comment

I am a professor of political science who does methodological research on the generalizability of online convenience samples. The gold standard of political science studies is indeed *random population samples* -- it's not the whole world, but it is the target population of American citizens. Yes this is getting harder and harder to do and yes imperfections creep in. But studies published in eg the august Public Opinion Quarterly are still qualitatively closer to "nationally representative" than are convenience samples, and Scott's flippancy here is I think a mistake.

My research is specifically about the limitations of MTurk (and other such online convenience samples) for questions related to digital media. My claim is that the mechanism of interest is "digital literacy" and that these samples are specifically biased to exclude low digital literacy people. That is, the people who can't figure out fake news on Facebook also can't figure out how to use MTurk, making MTurk samples almost uniquely bad for studying fake news.

(ungated studies: http://kmunger.github.io/pdfs/psrm.pdf

https://journals.sagepub.com/doi/full/10.1177/20531680211016968 )

This post is solid but it doesn't emphasize enough the crucial point: "If you’re right about the mechanism...". More generally, I think that there are good reasons that Scott's intuitions ('priors') about this are different from mine: medical mechanisms are less likely to be correlated with selection biases than are social scientific mechanisms.

There is a fundamental philosophy of science question at stake here. Can the study of a convenience sample *actually* test the mechanism of interest? As Scott says, there is always the possibility of eg collider bias (the relationship between family income and obesity "collides" in the sample of college students).

So how much evidence does a correlational convenience sample *actually* provide? This requires a qualitative call about "how good" the sample is for the mechanism at issue. And at that point, if we're making qualitative calls about our priors and about the "goodness" of the sample....can we really justify the quantitative rigor we're using in the study itself?

In other words: should a study of a given mechanism on a given convenience sample be "valid until proven otherwise"? Or "valid until hypothesized otherwise"? Or "Not valid until proven otherwise"? Or "Not valid until hypothesized otherwise"?

Expand full comment

For psych (as opposed to poli sci) you’re looking for reasonably robust mechanisms that can survive restriction to weird subpopulations. If you’re describing some cognitive bias X, you want it to be present even if you restrict only to (say) “high digital literacy Democrats”.

The debates are (in other words) really specific to field. Trying to explain voting behavior is really different from studying (say) risk perception.

Expand full comment

"Scott's flippancy here is I think a mistake."

That's what I came here to say. Glad someone smarter than me said it first.

Expand full comment

> and Scott's flippancy here is I think a mistake.

I was thinking that initially, but he did address that.

"b) hire a polling company like Gallup which has tried really hard to get a panel that includes the exact right number of Hispanic people and elderly people and homeless people and every other demographic"

Expand full comment

Is there a reason why you just wouldn't want to be somewhat specific with the headline of what you're publishing? So instead of "Study Finds Eating Bananas Raises IQ," you instead publish “Study Finds Eating Bananas Raises IQ in College Students," if they're all college students.

Expand full comment

You certainly can, and to a great degree good science involves judging where to place your title on the spectrum between "Eating Bananas Raises IQ for Everyone " and "Eating Bananas Raises IQ for Three Undergrads named Brianna and One Jocelyn." That said, one of the first things that often happen in pop science reporting is that these caveats get left out.

Expand full comment
author

Because "Study Finds Eating Bananas Raises IQ In College Students" is not, in fact, what you found. How do you know if a study in college freshman generalizes to college sophomores? If a study in Harvard students generalizes to Berkeley students? If a study in 2022 college students generalizes to 2025 college students.

You could title it "Study Finds Eating Bananas Raises IQ In This One Undergraduate Berkeley Seminar Of 80% White People, 20% Hispanic People, Who Make Between $50K and $100K Per Year, and [so on for several more pages of qualifications]", but the accepted way to avoid doing that is just to have a Methods section where you talk about the study population.

Expand full comment

There does seem to be a fair amount of reasonableness to this, as @DaveOTN pointed out, at least from my experience. Lots of papers I've been reading have some degree of specification in their title, but not all of them. And then in the Methods section, even more specificity is given. Maybe this is more of a recent trend.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

Completely agreeing with what Scott already answered, but to flesh it out a bit more.

Sometimes you see (mostly old) papers titled something like "x does y in women", and what that title really does, at least to me, is raise an assumption that this is somehow an effect specific to women and not men. Now, I have seen enough of those studies to know that they almost never have tested that difference; they just happened to have only female participants.

But it still sounds completely odd. It is correct in a way, but I think the title of a study, or of anything really, is something you are not supposed to read completely literally. It tries to convey as much information in as few words as possible, and if you include something in it, the reader will assume it is of importance. And this is good. We need titles, and we need them not to state every detail of the study, otherwise they would be useless. So by adding a detail like that to your title, you are really communicating something to the reader beyond the bare fact of the study population.

So yes, I think there is a good reason you don't want that in your headline. A possible compromise would be something like "Eating bananas raises IQ: a study on college students", in case you want to stress the college-student participants a bit but do not want to create false assumptions. But usually it's not worth it.

Expand full comment

I think the important issue is whether the selection bias is plausibly highly correlated with the outcomes being measured. I think the reason people scream selection bias about internet polls is that participation is frequently selected for based on strong feelings about the issue under discussion.

So if you are looking for surprising correlations in a long poll (as you do with your yearly polls), that's less of an issue. But with the standard internet survey, the audience can either guess at the intended analysis and decide whether to participate based on their feelings about it, or they are drawn to the blogger/tweeter because of similar ways of understanding the world, and so are quite likely to share whatever features of the author prompted them to generate the hypothesis in the first place.

Choosing undergrads based on a desire for cash is likely to reduce the extent of these problems (unless it's a study looking at something about how much people will do for money).

Expand full comment

Real scientists control for demographic effects when making generalizations outside the specifics of the dataset used. I'm confused why this article doesn't mention the practice - demographic adjustments are a well-understood phenomenon and Scott would have been exposed to them thousands of times in his career. And honestly, I think an argument can be made that the ubiquity of this practice in published science but its absence in amateur science mostly invalidates the thesis of this article, and I worry that Scott is putting on his metaphorical blinders due to his anger at being told off in his previous post for making this mistake.

This article does not feel like it was written in the spirit of objectivity and rationalism - it feels like an attempt at rationalization in order to avoid having to admit to something that would support Scott's outgroup.

Expand full comment
author

I have no idea what you're talking about. I have been reading psychology and psychiatry studies for years and have never seen them do this.

Here are some studies from recent issues of the American Journal of Psychiatry, one of the top journals in the field. Can you show me where they do this?

https://ajp.psychiatryonline.org/doi/10.1176/appi.ajp.2020.19080886

https://ajp.psychiatryonline.org/doi/10.1176/appi.ajp.20220456

https://ajp.psychiatryonline.org/doi/10.1176/appi.ajp.21111173

Expand full comment

pewresearch.org/our-methods/u-s-surveys/frequently-asked-questions/

From the article:

"...To ensure that samples drawn ultimately resemble the population they are meant to represent, we use weighting techniques in addition to random sampling. These weighting techniques adjust for differences between respondents’ demographics in the sample and what we know them to be at population level, based on information obtained through institutions such as the U.S. Census Bureau."

My apologies for assuming you were already familiar with this concept.

Expand full comment

Opinion polling by a think tank is not most people's central example of "real scientists". Is "real pollsters" maybe the category you have in mind? I have no personal info one way or the other on whether psych researchers at universities and other scientific research institutions do what Scott says, but I don't think Pew Research is a very useful counter-example.

Expand full comment

One gets the impression you didn't read the essay before commenting. Right in bold near the top it says this:

"Selection bias is disastrous if you’re trying to do something like a poll or census."

Scott is talking about a different kind of research, one that is *not* a poll or census, which is *not* attempting to say "x% of the people in [some large group] have [characteristic], based on measurement of [some small group]."

As I understand it, he is talking about the difference between measuring a distribution and measuring a correlation function. A distribution says "x% of the population has this characteristic." A correlation says "if a member of a population has characteristic Y, then the probability that he also has characteristic Z is x%." They are two very distinct kinds of measurement, and so far as I can tell, he is correct that to a first approximation you can at least test for the existence of correlations on any subset of the distribution without worrying a great deal about your sample well representing the overall distribution.

There are certainly weird edge cases where this would not be true, but that doesn't mean the general rule is unreasonable.

Expand full comment

This actually points (as does the whole article) at a much more interesting point: the belief in "proper official people who are doing everything right" vs "random cranks larping at [science/politics/law/history]." I suspect this is the intuition behind people saying "selection bias:" they want a reason why some things are proper official science and some things aren't.

Anyone who works in any field is aware that they’re staying ahead of the cranks through cumulative experience and sharing their homework, but there are no grown-ups and no-one has access to special doing-things-properly methods.

Expand full comment

I read Scott's post as saying the emperor, and lots of other people, are barely dressed, and that's OK. The replies read like imperialist counter-claims, but seem lacking in evidence, i.e. proper official studies that select from a spread of the population with lower bias than psych students or poor/bored folks.

Nothing against imperialists

Expand full comment

Yep. This thread on TheMotte is relevant: https://www.themotte.org/post/221/culture-war-roundup-for-the-week/41477?context=8#context

> Aella recently made an online survey about escorting and posted a chart on Twitter. It shows monthly earnings binned by BMI and clearly depicts that escorts with lower BMI making more on average than escorts with higher BMI. I would not have thought anybody would be surprised by that. The comments under the post proved me wrong.

> Christ almighty, I had no idea that there are so many statistically literate whores around just waiting to tell you your survey is bad. I also wasn't aware that escorts advertise their services so openly on social media.

> The number of escorts, both slim and not so slim, calling her out with little to no argument is mind blowing. The arguments they do give basically amount to sample size too low, BMI isn't real or "your survey is bad, and you should feel bad". Some of them also appear to lack reading comprehension. (...) Some give the argument that they themselves have high BMI but earn way more than that, and therefore the survey result must be wrong. Averages are seemingly a foreign concept to some.

> **A few are asking what Aella's credentials are or whether the survey has been reviewed by an ethics committee, as if you need any of that to do a random google forms survey on the internet. They appear to believe that ethics committees are to protect people who might find the result offensive and not the participants of the study.**

Expand full comment

This surely depends on the field and the questions asked, but I can assure you that a large portion of psych studies do not do that. And (most of the time) that is completely fine (as argued in the main post).

Expand full comment

(1) It's also worth noting that you can do a lot of sensitivity tests to see how far the results within your sample appear to be influenced by different subgroups which can help indicate where the unrepresentativeness of your sample might be a problem. IIRC the EA Survey does this a lot. This also helps with the question of whether an effect will generalise to other groups or whether, e.g. it only works in men.

Of course, this doesn't work for unobservables (ACX subscribers or Aella's Twitter readers are likely weird in ways that are not wholly captured by their observed characteristics, like their demographics).

(2) I think you are somewhat understating the potential power of "c) do a lot of statistical adjustments and pray", which leads to understating the potential gap between an unrepresentative internet sample which you can and do statistically weight and an unrepresentative internet sample (like a Twitter poll) which you don't weight. Weighting very unrepresentative convenience samples can be extremely powerful in approximating the true population, while Twitter polls are almost always not going to be representative of the population.
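
For what it's worth, the core of cell weighting is only a few lines; a bare-bones sketch (the cells, shares, and outcome numbers are all made up, and real adjustment methods like raking or MRP are considerably fancier):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# A convenience sample that wildly over-represents the young (made-up data).
sample = pd.DataFrame({
    "age_group": ["18-34"] * 700 + ["35-64"] * 250 + ["65+"] * 50,
    "y": np.concatenate([rng.normal(0.6, 1, 700),
                         rng.normal(0.3, 1, 250),
                         rng.normal(0.1, 1, 50)]),
})

# Known population shares for the same cells (e.g. from the Census).
pop_share = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}
samp_share = sample["age_group"].value_counts(normalize=True)

# Weight = population share / sample share, per cell.
sample["w"] = sample["age_group"].map(pop_share) / sample["age_group"].map(samp_share)

print(sample["y"].mean())                            # raw convenience-sample mean, ~0.50
print(np.average(sample["y"], weights=sample["w"]))  # reweighted estimate, ~0.35
```

Of course, as your point (1) notes, this only fixes imbalance on the variables you can observe and weight on; it does nothing about the ways a readership is weird that never show up in the demographics.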

Expand full comment

Seems like a good argument for rejecting studies done on Psych 101 undergrads, not for accepting surveys done on highly idiosyncratic groups of blog readers.

Expand full comment

I would agree with that.

I think that may be a bridge too far, especially since somebody trained in good methodology could look at a dataset with care to try to balance out and offset these risk factors. (Note: they may not do it, but the evidentiary value of biased data is not zero.)

Just..... this is really coming after the Elon Twitter surveys, and for people who are used to idiosyncratic group surveys in other contexts. Surveys on the Fox News website, or those performed by the RNC on their likely voters, would also have (and do have) clear biases, even if the questions were worded in an unbiased manner.

Expand full comment

Yeah. My baseline prior for *all* psych and sociology studies is "more likely than not utter garbage" unless the effect size is *huge*. And even then it's "probably utter garbage". Those done on Psych 101 undergrads start at "almost absolutely utter garbage". And internet polls, *especially* of "social media followers", are in that same bucket. Too many uncontrolled variables that are very likely to correlate strongly with the effect under study.

And lest you think I'm particularly biased there, my baseline for *hard physics* studies is "50% chance of being utter garbage."

Basically, almost all science is utter garbage. But some is more often utter garbage. And both "surveys of internet followers" and "Psych 101 undergraduate studies" are in the "don't even bother looking further except for amusement" bucket for me.

And @Scott--even in correlations, bias matters strongly. Giving the banana study to Mensa members means that your effect, if any, is out there in the part of the curve that we can't measure very well at all. Measuring the difference in IQ past a standard deviation or two is basically just noise *anyway*, so trying to correlate that with banana consumption is just noise squared.

Expand full comment

Care to explain your baseline on hard physics studies? IME it's pretty rare for an experimental paper to make false positive claims, though plenty are weaker than they should be or testing hypotheses that were probably not worth wasting time on.

Expand full comment

It doesn't have to be outright false to be utter garbage. It just has to fail to say anything meaningful. It could be 100% true, 100% valid...and still be utter garbage such that the writer and the world would have been better off if it hadn't been done (ie was a waste of resources). And not just experimental work--I'm including all the theory. This is based on my own training--I have a PhD in computational quantum chemistry. Plus my usual jaundiced eye--I'm a firm believer in Sturgeon's Law (90% of everything is crap). So a 50% "crap rate" is actually doing much better than normal.

Expand full comment

Or, instead of employing the binary of reject/accept to whole categories of studies, one may wish to adopt a more nuanced, Bayesian, approach. Like, isn't this whole blog basically about weak evidence being evidence too?

Expand full comment

The underlying phil of sci question is to what extent you are justified in believing your sample is representative for the question you are testing. It's generally understood that "we test on people we can rope into our studies" is a problem that generates potential bias and can undermine representativeness for a general conclusion, but I think the article is far too flippant about the amount of effort that goes into this kind of question when psychologists are drawing inferences (or failing to do so), as compared to amateur Internet polls. It flattens the distinction between a known problem that exists to varying degrees and is handled with varying degrees of rigor, and simply throwing your hands up in the air.

Expand full comment

I wouldn't settle for such a simple heuristic, as any non-expert in those fields can do much better than this simply by asking whether the study question sounds plausible (as the studies on people trying to predict which studies replicate have shown). The studies done on Psych 101 students are probably not much worse than the other ones, as the main reason for bad studies is not the sample but the p-hacking etc.

If you want a simple heuristic, it's more like "boring psych results" -> true, "surprising and interesting psych results" -> false.

Conflict of interest: I'm doing boring research.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

People's internal sense of plausibility is informed by their cultural beliefs about folk psychology, which in turn are influenced by pop psychology. This sometimes transforms what maybe should be thought of as a surprising, interesting idea into something boring. False memory research was in vogue when I was a psych student. This had a lot of sexy results that called into question "repressed memories" whose reality for the public had become a rather conventional, boring belief.

Expand full comment

Since someone evaluating a claim can never know how many polls didn't show interesting results, the fact that real-world surveys are much more expensive to conduct and leave fewer variables outside the survey giver's control (accepted practice is not to tell the undergrads what they are coming in for, and cash is the primary motivator in all of them) is a very strong justification for treating online polls as less reliable.

In some sense the real selection bias is in which polls you never hear about, but that is a good reason. Though it leads to an interesting epistemic situation where the survey giver may have no more reason to doubt their poll than an academic polling undergrads has, but the people they inform about it do.

Expand full comment

What you’re describing is not unique to amateur internet studies. The term File Drawer Effect refers to the exact phenomenon you describe, but in officially real science.

Expand full comment

Yes, I'm aware of that, but things like the cost of running in-person surveys, IRB approval, etc. mean the problem is orders of magnitude worse for online surveys.

As I suggest in another comment, if each poll was accompanied by a certain-sized charitable donation it might help make them comparable.

Expand full comment

> It doesn’t look like saying “This is an Internet survey, so it has selection bias, unlike real-life studies, which are fine.”

Eh, this seems like a highly uncharitable gloss of the concern. I would summarize it more as "Selection (and other) biases are a wicked hard problem even for 'real-life' studies that try very hard to control for them; therefore, one might justly be highly suspicious of internet studies for which there were no such controls."

One good summary of the problem of bias in 'real-life' studies: https://peterattiamd.com/ns003/

The issue is always generalization. How much are you going to try to generalize beyond the sample itself? If not at all, then there is no problem. But, c'mon, the whole point of such surveys is that people do want to generalize from them.

Expand full comment
author

They're not a hard problem for real-life studies! Most people just do their psychology experiments on undergraduates, and most of the time it's fine! Most drug trials are done in a convenience sample of "whoever signs up for drug trials", and although there are some reasons you sometimes want to do better, it's good enough for a first approximation.

Expand full comment

Why do you say it's fine? It's publishable, sure. But it's not like this even led to a body of literature that's reproducible in the SAME unrepresentative population of freshman undergrads, let alone generalizes to tell us true facts about the world.

Expand full comment
author

Yes, I agree it had unrelated problems, which were not selection bias.

Expand full comment

What? Selection bias is definitely one of the issues that caused (/is still causing) the replication crisis. I definitely disagree that "most of the time it's fine", and I'm pretty surprised to see you, in particular, making that claim.

Expand full comment

“The replication crisis” normally describes ideas which aren’t true for *any* population. For example, psychology studies which tend to support the researcher’s favourite intervention don’t behave that way because different researchers use different sets of undergraduates (maybe there’s some “how-popular-is-this-method” effect, but I’d suspect the researcher effect would still apply between studies in the same university and year).

Something which is true of undergraduates but not of the general population would be interesting, and might get counted as part of the replication crisis, but it’s not a central example of the replication crisis.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

> What? Selection bias is definitely one of the issues that caused (/is still causing) the replication crisis.

I'm with Gres; the replication crisis was caused by having standards of publication that didn't even refer to whether the finding was true or false. Selection bias isn't an issue if your entire paper is hallucinated. There was nothing to select.

Expand full comment

The problem with this argument is that you have no evidence either way on the selection bias issue. We had a bunch of psychologists do a bunch of research that was hopelessly contaminated with selection bias. The selection bias didn't matter, because there were so many other problems with this body of research that it had no value at all, and therefore there was nothing for selection bias to ruin. If selection bias takes the value of a body of research from zero to zero, it hasn't hurt anything.

But you appear to be claiming that, if those other problems hadn't existed, the selection bias still wouldn't have been a problem. This is not obvious; maybe it would have been a big problem.

Expand full comment

There’s also a big body of psychology research which does replicate on both undergrads and the general population, and a much smaller body of research which replicates for undergrads but not for anyone else. Thus, selection bias is rarely a problem among good studies on undergrads.

Expand full comment

The problem has led to an entire literature of observational studies in nutritional epidemiology that is essentially worthless –– impossible to discern signal from noise in many cases.

Many drug trials are deeply compromised, not at all good enough for a first approximation. The book "Ending Medical Reversal" by Prasad and Cifu goes deep on this.

https://www.amazon.com/Ending-Medical-Reversal-Improving-Outcomes/dp/1421429047/

Expand full comment

I can't agree with your summary because what I'm reading in this article is an argument that the 'real-life' studies *don't* try very hard to control for selection bias (even if they acknowledge that it is a wicked hard problem) and so you should justly treat internet studies about as highly or lowly as other studies, because they have about the same level (lack of) controls for selection bias.

Expand full comment

The fact that academic studies are often terrible does not imply that internet studies are therefore OK. That's just an emotional reaction ("How come you pick on me when you don't pick on them?") It's totally possible for them both to be terrible –– and, as a matter of fact, I do also pick on them.

Expand full comment

So, this is kinda accurate, but I feel like you're underestimating the problems of selection bias in general. In particular, selection bias is a much bigger deal than I think you're realizing. The correlation coefficient between responding to polls and vote choice in 2016 was roughly 0.005 (Meng 2018, "Statistical Paradises and Paradoxes in Big Data"). That was enough to flip the outcome of the election. So for polls, even an R^2 of *.0025%* is enough to be disastrous. So yes, correlations are more resistant to selection bias, but that's not a very high bar.

Correlations are less sensitive, but selection effects can still matter a lot. As an example, consider that among students at any particular college, SAT reading and math scores will be strongly negatively correlated, despite being strongly positively correlated in the population as a whole: if a student had a higher score on both reading and math, they'd be going to a better college, after all, so we're effectively holding total SAT constant at any particular school.
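
A quick sketch of the SAT example, with made-up numbers (a shared ability factor plus noise, and a "college" that admits only a narrow band of total score):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

ability = rng.normal(size=n)                 # shared factor behind both scores
math = ability + rng.normal(scale=0.8, size=n)
verbal = ability + rng.normal(scale=0.8, size=n)

print(np.corrcoef(math, verbal)[0, 1])       # roughly +0.6 in the full population

# One particular college admits a narrow band of combined score.
total = math + verbal
admitted = (total > 1.8) & (total < 2.2)
print(np.corrcoef(math[admitted], verbal[admitted])[0, 1])   # strongly negative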

So the question is, are people who follow Aella or read SSC as weird a population as a particular college's student body? I'd say yes. Of course though, it depends on the topic. For your mysticism result, I'm not worried, because IIRC you observe the same correlations in the GSS and NHIS--which get 60% response rates when sampling a random subset of the population. But I definitely wouldn't trust the magnitude, and I'd have made an attempt at poststratifying on at least a couple variables. Just weighting to the GSS+Census by race, income, religion, and education would probably catch the biggest problems.

Expand full comment
author

I think you're specifically selecting categories where this weird thing happens, and then saying we should expect it in other categories.

(also, polls are a bad example - because of Median Voter Theorem we should expect them to be right on the verge of 50-50, and so even small deviations are disastrous)

Expand full comment

I was selecting those as examples, but these are actually very common. Surveys in general are very sensitive to these problems unless you're at least a bit careful to get good samples.

Maybe a good way to explain this is that seeing a correlation in a survey provides about as much evidence of correlation in the population, as seeing a correlation in the population provides of causality. This isn’t just a metaphor: there’s a very real sense in which selection biases are just backwards confounding. They’re often called “Inverted forks” in the causal inference literature—a “fork” being the classical confounder where you have one variable affecting two unrelated things. (If you imagine a diagram with lines going from cause to effect, a confounder has lines going to the two correlated variables, which looks like a two-tined fork if you’re sufficiently high and/or hungry.) A selection effect is the exact same, except flipping which variables are observed—you have two effects going into the same variable (e.g. mental illness and spirituality might both affect the probability of answering the SSC survey, in which case considering only responders creates a bias). Having to think backwards is a lot harder, so we intuitively imagine these problems must be rare, but they’re just as common as their flipped counterparts.

To be clear I’m not saying the survey data is useless; I’m guessing it’s right! But I definitely feel like this post isn’t urging sufficient caution. Pollsters put millions of dollars into trying to get representative samples, or reweighting responses to make the sample representative, and they *still* get correlations very wrong sometimes (e.g. underestimating the correlation of being black with the probability of voting for Walker in GA by a factor of 2).

I’d like to see commenters offering more reasons to expect the results could be wrong, but the presumption that the data here aren’t confounded is pretty weak, and we should be pretty uncertain about it—at least until we’ve tried to make the results somewhat more representative with weighting or poststratification.

Expand full comment

I just read the Meng (2018) paper because of your mention above.

It says that when using a non-random sample, a correlation between opting into the sample (by, for example responsing to a poll) and the variable being tested causes huge problems. The paper is specifically addressing this problem in the context of very large datasets or Big Data which are non-random and showing how the sample size in those cases doesn't improve predictive power.

The example presented to illustrate this is 2016 pre-election polling. In that case, the correlation between opting into the sample and actually voting for Trump (based on post-election results) was tiny but negative, namely people who were going to vote for Trump were slightly less likely to respond to the poll, and this caused the result of the poll to significantly underweight the Trump vote. And, of course, this didn't flip the outcome of the election, it flipped the outcome of the poll. The results of the election were based on the result variable, not the choice to respond.

Basically, I think this paper doesn't tell us much about relatively small samples from relatively small populations, like Scott's annual survey. Its issue is that the above correlation scales with the square of the population, so Big Data isn't all that great if it's not random.

Expand full comment

If this is about the last article, your general point is correct, but you polled a readership that's notoriously hostile to spirituality to determine if mental health correlates with spirituality. It'd be like giving Mensa folks a banana and measuring their IQ. You selected specifically for one of the variables, and that's likely to introduce confounders.

Expand full comment
author

I'm worried we're still disagreeing on the main point, based on your example. To a first approximation, testing the correlation between banana-eating and IQ in a Mensa sample should still be fine. Everyone will have high IQ, but if the super-high-IQ people eat more bananas than the just-regularly-high IQ people, this could still support the hypothesis.

(the main reason you wouldn't want to do this is ceiling effects, I think).

Likewise, in a very-low-spirituality sample, you should still be able to prove things about spirituality. For example, I bet ACX readers will have more spiritual experiences on LSD than off of it, just like everyone else.

Also, less important, but I wouldn't describe this sample as "notoriously hostile to spirituality" - 40% said they had a spiritual experience or something like it, some of my most popular posts are ones on meditation and jhanas and stuff.

Expand full comment

I agree it's still evidence but it's evidence that should be treated with extreme skepticism and then more skepticism in case you were insufficiently skeptical the first time.

It's not certain that high IQ folks will have different reactions to IQ increasing techniques than the general population but it's more likely than not.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

This is true to first order, and probably true in most cases, but I suspect not all.

Suppose bananas increase the variance of IQ without affecting the mean, and Mensa contains a random sample of people with IQ >140. Then Mensa banana-eaters would have higher IQs on average, even though banana-eaters in the general population would have an average IQ of 100. I think this is the main reason people argue for smaller schools - the top 100 schools include more small schools than you'd expect, because small schools have higher variance in scores.

To give a more relevant example, consider (my mental model of) crypto. I imagine crypto use has a U-shaped correlation with techiness, where normal people distrust it, moderately techy people like it, and very techy people distrust it again (this is probably wrong, but pretend it’s true for my example). Then in the general population, IT people would use more crypto than normal, but on ACX, IT people would use less crypto than the general population.

Probably 99% of surveys aren’t like this, but probably only 10% of surveys look like they might be like this, to a given observer. For that observer, those surveys should only be able to update their belief by p=0.1, unless they can be convinced that there’s actually less than a one-in-ten chance this survey has a nonlinearity like the second example.

Expand full comment
founding

You’re right each and every survey might have sampling bias, but whether that’s a problem ultimately all depends on what constitute a good model of reality. Would you like to see some code to prove this point?

In words: suppose banana-eating increases IQ but only if you *don't* eat enough fish. Then, sampling high IQ means sampling high status, then more diversified food, then less impact of banana-eating, and you might miss it even if statistical power is good.

Or, suppose banana-eating increases IQ, but only if you *do* eat enough fish. Then, sampling high IQ means sampling high status, then more diversified food, then more impact of banana-eating, then failure to replicate in general population.
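
Since the offer was made: here's roughly what that code could look like for the first scenario (the +5 point banana effect, the status-to-fish link, and every other number are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

status = rng.normal(size=n)                                   # latent "status"
eats_fish = (status + rng.normal(scale=0.5, size=n)) > 0      # higher status -> more fish
eats_banana = rng.random(n) < 0.5                             # bananas independent of everything

# Invented causal rule: bananas add 5 IQ points, but only if you DON'T eat enough fish.
iq = 100 + 12 * status + rng.normal(scale=9, size=n) + 5 * (eats_banana & ~eats_fish)

def banana_gap(mask):
    return iq[mask & eats_banana].mean() - iq[mask & ~eats_banana].mean()

print(banana_gap(np.ones(n, dtype=bool)))   # general population: roughly +2.5 IQ points
print(banana_gap(iq > 130))                 # high-IQ sample: the gap all but vanishes
```

Flip the interaction (bananas only work *with* fish) and you get the second scenario instead: a clear effect in the high-status sample that then fails to replicate in the general population.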

Expand full comment

This example is exactly the point I was trying to make. Except I don't think you need the causal parts at all. It doesn't matter that IQ is a proxy for status is a proxy for more diversified diet. If banana-eating increases IQ only if you *don't* eat enough fish, almost certainly Mensa folks who don't already eat a ton of bananas are already eating enough fish. And you know that because they have massively high IQs. By selecting "Mensa members who don't eat many bananas" you're virtually guaranteeing that you're selecting for every single confounder to your banana hypothesis, whether you're able to identify those confounders or not.

If you want to see whether spiritual experiences correlate with mental health, and your sample is "non-religious folks who are mentally healthy" then you've selected exactly the group that would confound the hypothesis, even if it's true.

To be fair, that's not *exactly* what happened with the survey, but it's close.

Expand full comment

> For example, I bet ACX readers will have more spiritual experiences on LSD than off of it, just like everyone else.

ACX readers - maybe. On LW, maybe not? LSD doesn't seem to just generate random beliefs out of nowhere. Unless stuff like ego death counts as spirituality.

Expand full comment

Going to Aella's tweet that was linked:

> using it as a way to feel superior to studies, than judiciously using it as criticism when it's needed

just because people use selection bias as a way to feel superior to studies doesn't mean that the study isn't biased in the first place

and

> But real studies by professional scientists don’t have selection bias, because...

ignoring the fact that professional studies control for selection bias, or at least have a section in the paper where the participants are specified, unlike twitter polls

Expand full comment
author

As I've said many times in this post, I challenge you to find these professional psychology and psychiatry studies that "control for selection bias". I think doing this would actually be extremely irresponsible without a causal model of exactly how selection into your study works. If you look at actual psych studies (eg https://asset-pdf.scinapse.io/prod/2001019597/2001019597.pdf and https://ajp.psychiatryonline.org/doi/10.1176/appi.ajp.20220456 , randomly chosen just so we have concrete examples), they don't do anything of the sort.

I agree they sometimes mention participant characteristics (although I think that psych study I linked doesn't even go so far as to mention gender, let alone class), but so does the SSC survey! I agree Twitter polls are extremely vulnerable to selection bias (especially since they're polls), but my impression is that Aella also does more careful surveys.

Expand full comment

If someone's doing an RCT they're "controlling for selection bias." Their inferences are comparing a treatment group to a fundamentally similar control group.

What they're not doing is demonstrating or accounting for external validity. You're right that the best you can ordinarily expect is some description of the sample, plus perhaps a heterogeneity analysis or a reweighting of the sample to look like some population of interest.

But these are different problems, and the "selection bias" problem is a lot more fundamental than the "external validity" problem. If you have an unbiased study of a weird population, you're still measuring a real effect, and can think about how likely it is to generalize by thinking about the likely mechanisms of effect. If you have a study that's biased by the weirdness of your population, the correlation you measure might just be measuring how the factors you study affect people's likelihood of reading your blog, without any real relationship or real mechanism.

Expand full comment

Selection bias can and absolutely does break correlations, frequently. The most obvious way is through colliders (http://www.the100.ci/2017/03/14/that-one-weird-third-variable-problem-nobody-ever-mentions-conditioning-on-a-collider/) - but there's tons of other ways in which this can happen: the mathematical conditions that have to hold for a correlation to generalize to a larger population when you are observing it in a very biased subset are pretty strict.

Further: large sample sizes do help, but they do not help very much. There is a very good paper that only requires fairly basic math that tackles the problem of bias in surveys: https://statistics.fas.harvard.edu/files/statistics-2/files/statistical_paradises_and_paradoxes.pdf (note: this is not specifically about correlations, but the problem is closely related). Here is the key finding:

Estimates obtained from the Cooperative Congressional Election Study (CCES) of the 2016 US presidential election suggest a ρR,X ≈ −0.005 for self-reporting to vote for Donald Trump. Because of LLP, this seemingly minuscule data defect correlation implies that the simple sample proportion of the self-reported voting preference for Trump from 1% of the US eligible voters, that is, n ≈ 2,300,000, has the same mean squared error as the corresponding sample proportion from a genuine simple random sample of size n ≈ 400, a 99.98% reduction of sample size (and hence our confidence)

And keep in mind - this is in polling, which 'tries' to obtain a somewhat representative sample (ie, this sample is significantly less biased than a random internet sample).
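
If I'm reading Meng's identity right (mean squared error of the biased sample mean is roughly rho^2 * (N - n)/n * sigma^2, so the equivalent simple-random-sample size is roughly n / (rho^2 * (N - n))), the back-of-the-envelope version of that quoted finding is just:

```python
# Rough numbers: ~230 million US eligible voters in 2016 (an assumption on my part),
# the ~1% sample from the quote, and the data defect correlation reported in the paper.
N = 230_000_000
n = 2_300_000
rho = 0.005

n_eff = n / (rho**2 * (N - n))
print(round(n_eff))   # ~400, matching the "n = 400" in the quote
```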

Expand full comment

Looking at Aella's data & use of it, I don't have the same concerns I may have about the SSC survey used on religious issues.

So this chart, for example:

https://twitter.com/Aella_Girl/status/1607641197870186497

I am not aware of a likely rationale for these results to change by the selection effect, specifically on the axis studied. I might raise selection-effects concerns if the exact slope were the specific question, but not over the mere presence of a slope.

Even further, there isn't a backstory here where Aella is trying to turn a very messy & vague original problem statement into something to attempt to refute without providing a number of caveats.

It is valid to push back that selection effects are everywhere. It is valid to argue that SSC data has some evidentiary value, and that as good Bayesians we should use it as evidence. The tone of the post, though, does not hit the right note to keep it from being rejected.

However, to push back on the push-back, I would seriously try to assess whether you have difficulty dealing with disagreements or challenges. Not to psychologize this too much, but is this post actually trying to raise the discourse, or is it just trying to nullify criticism? Are you steelmanning the concern, or merely rebutting it?

Expand full comment
Dec 27, 2022·edited Dec 27, 2022

if you're talking about mental health issues and mystic visions, then I'm (1) religious (2) have had *very* few 'spiritual experiences' (about one, maybe two tops that I remember) (3) have *never* had the big flashy ones and (4) do think that the first question that should be asked about people reporting big flashy experiences is "are you nuts in the noggin?"

So, absolutely the audience on here is selected for people who aren't religious and wouldn't quantify experiences as "spiritual experiences". *But* it is also selected for people who have used all kinds of drugs, brain-hacking, and nootropics, so there's a very good chance they have had the 'mystic trip I was talking to entities' experiences. That they put those down to "yeah well drugs fuck your brain up so you have those kinds of visions" rather than "it absolutely was the spirit of the drug enlightening me to the cosmic secrets" makes me trust their reports more rather than less.

Even I don't think that everyone who claims they have regular weekly chats with Jesus, the Blessed Virgin, or God Almighty about the state of the world is having what they say they are having; they may be sincere but deluded (nuts in the noggin) or they may be fakes and fraudsters scamming people. People who are mostly sane and are having genuine experiences are not that common.

Expand full comment

Right, but if you want to say that drugs can change the frequency of religious experiences and that this group is biased, then the selection effects may distort the value of the study.

I don't think any caveat was given in the original Tweet that mystical experiences had to be full on hallucinations. Most people do not hallucinate that vividly without some pharmaceutical help. And vague things like "I feel oneness" or "I feel calm" or "I feel spiritually empowered" are easily distorted by mental priors.

Expand full comment

An interesting solution to the problem that surveys are so easy to give online (creating strong publication / "heard-of" bias) would be to set up a website where poll-givers have to post a certain-sized donation (say, to GiveWell) in order to give the survey. This would duplicate the effect of offline polls being expensive to run, thereby reducing publication bias.

Expand full comment

I’ve been thinking a possible online business / salve for democracy would be “a weekly election on what matters most to you.” Basically like a Twitter poll but slightly less crazy.

If people volunteer their demographic info, this would be very valuable for customers like businesses and politicians. End users get the satisfaction of someone somewhere finally listening

Expand full comment
Dec 27, 2022·edited Dec 27, 2022

I'm sympathetic to pushing back on lazy criticism, but also I think the context of how the result was produced is very important for calibrating how strongly one can take it as evidence. It's certainly true that all surveys are inherently "flawed" due to selection bias issues. There's a few ways to proceed from this:

(1) Throw up one's hands, declare the truth unknowable, and post a picture of an airplane wing with bullet holes.

(2) Acknowledge that this survey, like all surveys, is imperfect. But hey, the result sure is interesting, it makes some kind of intuitive sense, and there's no obvious reason why it really shouldn't generalize. Take the exact numbers with a grain of salt and hope that the first order effect dominates, as it often does.

(3) Do a lot of careful statistical analysis to attempt to correct for unrepresentative aspects of the sample. Compare results to literature for previous research into related questions. Submit to peer review and respond to critical feedback. Attempt to replicate.

Response (1) is the kind of lazy critique that this post argues against, and I agree that it is poor form and doesn't contribute much. Response (2) is reasonable for generating hypotheses and building intuition about the world, but it will also lead you astray a nontrivial fraction of the time. Response (3) is closer to what a professional researcher would do, but it takes a lot more time and expertise and will still be wrong sometimes.

I think the interesting conflict comes from conflating (2) and (3). Someone accustomed to (3) may look at people doing (2) as naïve and out of their depth, and also as dilutive to more rigorous work because it may look the same to undiscerning lay people. Meanwhile, someone doing (2) may look at people demanding (3) as gatekeepers with excessive demands for rigor whose preferred methods aren't exactly bulletproof either. This could easily degenerate into a toxic discourse where people just yell past each other. But provided they are given with appropriate context, I think both (2) & (3) can be useful ways to build knowledge about the world. Rigor is useful, but it's not a binary where everything insufficiently rigorous must be discarded as useless and anything that meets the bar accepted as eternal truth.

Expand full comment
author
Dec 27, 2022·edited Dec 27, 2022Author

As I've said above, here is a typical well-regarded study from a top psychiatry journal: https://ajp.psychiatryonline.org/doi/10.1176/appi.ajp.20220267

They mention that they recruited the participants "through advertisements". Where is the careful statistical analysis to correct for unrepresentative aspects of the sample?

Here is a typical psychology study in a good journal, same question: https://asset-pdf.scinapse.io/prod/2001019597/2001019597.pdf

If these studies don't have any of (3), would you call them non-careful or bad research for that reason? If you looked at them outside the context of this post, would you have thought to criticize them for not caring about selection bias enough?

Expand full comment

For the 1st study, this uses randomized assignment which I'd put in a different category from survey-based studies. I generally trust hypotheses testing via randomized assignment as being an inherently more robust form of knowledge production compared to survey studies (which I am more skeptical of) because the direct source of randomization buys you a lot, so I'm less inclined to worry about selection bias in such cases even if it could theoretically distort the results somewhat.

For the 2nd study, I don't think I can fully conclude whether it is a good study or not in a quick skim but I note that they did some things that help make it more likely to be a good study. They cite a lot of relevant previous work (which makes me think they thought carefully about the problem space). They went through peer review at a reputable journal (which makes me think they received and responded to 3rd party critiques). They provided measures of test-retest reliability. They used their IAT study to predict outcomes in a separate experiment and found it more predictive of those outcomes than other measures, validating their prior hypothesis. That said I am a bit more inherently skeptical of this one, though I may also be a bit biased against social psych.

I want to take a step back though to emphasize that I'm not trying to lay down "correct for unrepresentative aspects of the sample" as a core ironclad rule of good research, just one tool that can be useful in some cases. My point is more about a package of things that goes into producing rigorous academic research, where checkpoints like peer review help to weed out mistakes and ensure that the methods used are appropriate for answering the question of interest (while still being highly imperfect!).

Also I really don't think that amateur research on online surveys is bad either, I hope this came across in my comment. I do however think it is less rigorous than professional research. In both the amateur and professional research cases, I think your statement:

>In real life, worrying about selection bias for correlations looks like thinking really hard about the mechanism, formulating hypotheses about how you expect something to generalize to particular out-of-sample populations, sometimes trying to test those hypotheses, but accepting that you can never test all of them and will have to take a lot of things on priors.

holds true and is a good summary of the issue. I just also think that professional research has typically gone a lot further in that process of thinking hard about the mechanism, formulating hypotheses, discussing, testing, applying hard-won best practices, etc. As such I have more faith in professional researchers to have thought carefully about whether selection bias issues are important for their particular study, and attempted to deal with them if so. But it is the overall scientific process here more than any one silver bullet methodological component that I think is valuable. But this process is also cumbersome, slow, cloistered, and inaccessible -- all of which leaves a lot of space for amateur research to have a role in knowledge production too, even if it likely has a higher false discovery rate.

Expand full comment
author
Dec 28, 2022·edited Dec 28, 2022Author

When you talk about RCTs being more robust than surveys, and correcting for unrepresentative aspects of the sample, are you talking about controlling for confounders between two groups in the same experiment?

I agree this is useful (if you can do it right), but I think of it as very different from correcting for selection bias in the whole sample.

If you are actually talking about something different aimed at selection bias in particular, aimed at adjusting the results to make the study representative of the entire population, I would be very interested in seeing an example of a psychology, psychiatry, or medical study that does this, so I know more about what you mean.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

You're right that what I'm saying about the robustness of RCTs is not about selection bias in particular. I think there are two points here, neither of which is exactly what you want re: aiming at selection bias. One point is, as you say, that an RCT lets us control for confounders between two groups in an experiment, which is a very useful and powerful thing but doesn't help with selection bias in the whole sample. The second point is subtler and perhaps a bit more speculative. Because RCTs are very good at isolating causal effects in relation to a specific treatment, they tend to sit closer to an actual mechanism and have fewer ways to be unexpectedly distorted by selection bias. In other words, one typically has to tell a more convoluted story to explain away an RCT with selection bias, compared to a survey. This is admittedly just a heuristic, though, and not foolproof.

Expand full comment

"But generalize to the entire US population, and poor people will be more obese, because they can’t afford healthy food / don’t have time to exercise / possible genetic correlations."

And, to be impolite, because many of the same things that make them more likely to be poor make them more likely to be obese: lower intelligence, less ability to defer gratification, less ability to plan and follow through, etc.

Expand full comment

Ah yes, the undeserving poor.

Expand full comment

I don’t see how any of those things imply (or would even seem to imply) undeservingness? Since when is intelligence (or ability to plan, etc.) a determinant of someone’s moral character (notably distinct from the morally-relevant effects they have on the world).

Lower intelligence/planning ability/whatever may make someone less able to make the world a better place, but I don’t think it’s really relevant to their moral *character* (and so deservingness).

Expand full comment

Well it’s a Victorian phrase, the undeserving poor were those of the poor who didn’t deserve big Irish charity.

I realise it’s misleading here alright, so I retract it. Instead you have a belief in the deserving poor, the poor that deserve to be poor.

Expand full comment

I think the word "deserve" here is unreasonably freighted with moral overtone. The actual statement was more along the lines of "there are often reasons rooted in the nature of the individual that make people poor" -- which is important because it suggests that no amount of engineering of the society *around* the poor will eliminate all of (or perhaps even most of) their problems, because the source of their problems is to some extent their own nature and habits.

Whether one attaches a moral judgment to that observation is optional. Some people do -- it looks like you do, so OK. But many people don't. I don't. Everyone has his struggles. Some people have difficulty not getting fat, or staying in shape, or getting or staying married, or being happy. Some other people have difficulty in staying off the booze or pills, or holding down a job, or not being poor. None of these things are good, and everyone should do his best to overcome whatever flaws he got dealt by life, but it's perfectly possible and reasonable to judge the effort to do so independently of any success.

Expand full comment

"Undeserving" had both a moral and a pragmatic sense. The moral sense was, "he did something wrong; he should suffer the consequences." The pragmatic sense was, "If you help him out, he'll just keep doing the things that messed him up in the first place." (People who have stayed awake in an economics course probably just had the word "incentives" pop into their heads.) Some of the "undeserving" were the first, some the second, and some both.

Expand full comment

This morning I was reading Raymond Aron's Main Currents in Sociological Thought, volume 2: Durkheim, Pareto, Weber and came upon the following quote. Not exactly on point but related.

"Furthermore, in [Max] Weber's thought the theory of justice involves a fundamental antinomy. Men are unequally endowed from the physical, intellectual, and moral standpoints. At the outset of human existence, there is a lottery, the genetic lottery, and the genes each of us receives results literally from a computation of probabilities--each individual represents an improbable combination of tens of thousands of genes. Since inequality exists at the outset, there are two possible orientations: one that would tend to obliterate the natural inequality through social effort; and another that on the contrary would tend to reward everyone on the basis of his unequal qualities. Weber maintained, rightly or wrongly, that between these two antithetical tendencies--the adjustment of social conditions to natural inequalities and the attempt to erase natural inequalities with a view to a kind of social equalization--there is no choice governed by science; every man chooses his God or his devil for himself." (pp192-3 in the Routledge Classics edition)

Expand full comment

Has no one seen the musical "My Fair Lady"?

"I don't need less than a deservin' man, I need more! I don't eat less 'earty than 'e does and I drink, oh, a lot more." -- Alfred P. Doolittle

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

idk man my executive function (things like ability to defer gratification and ability to plan and follow through) is complete trash - I have ADHD and autism - and yet I'm neither poor nor obese. If I lost 20 IQ points I expect I would still be neither.

Honestly my most likely route to poverty would have been if I went to grad school and stayed in academia, which probably would have happened if I had slightly *more* intelligence and/or EF.

Expand full comment

Congrats on being neither poor nor obese. I suspect that your executive function is a lot better than you realize (perhaps with the help of one or more psychoactive chemicals).

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

I... really don't think so? If avoiding being poor or obese required showing up on time to work consistently, regularly choosing not to impulse-buy objects, not eating when I was hungry, stopping eating before I was full, exercising with any regularity, tracking what I ate, or consistently choosing to skip high-calorie junk foods, I would just fail at it.

(I have tried all of these things despite them not being required of me, in an effort to be healthier and more self-disciplined, and in fact just failed. And while exercise does not appear to be required of me to avoid obesity, I could be in significantly less physical pain if I did, and just don't.)

Like yes, I think it's correct that the poor and obese lack executive function, but my point is that a lot of rich and thin people also do and are just lucky. (Similarly - yes, if you give money to homeless people they will likely spend a bunch of it on drugs and alcohol, but you were also going to do that.)

Re obesity my luck is mostly genetic, perhaps also taste in food (I actively do not want McDonalds and soda).

Re money, I chalk it up mostly to the fact that some combination of nature and nurture have made me a person who is happy to sit around solving moderate-difficulty logic puzzles all day every day, and that I happen to live at a time in human history where that's a very lucrative thing to do. (Scott recently called this position "Senior Regional Manipulator of Tiny Numbers")

Expand full comment

Congrats on creating a niche where you are happy. It sounds like you have a job which pays reasonably well and doesn't require you to "show[] up on time to work consistently". And that the impulse buys that you make aren't big enough to do you harm.

Expand full comment

"moderate-difficulty logic puzzles" -- sounds fun, what do you do for work? Programming?

Expand full comment

Well, sure. I think the steelman argument is that selection bias is often much worse for a survey on the internet than a psych 101 study. No psych professor has to worry whether all their respondents are all horny, always online boys because they recruited by posting nudes on Twitter, or whether they’re all participating in the study just to fuck with someone’s results.

Also, your banana study title is killing me. It shows correlation, not causation, and as we all know…

Expand full comment

When I was diagnosed with pancreatitis, I immediately searched the internet for information. Unfortunately, the first serious-looking research paper I found declared the ailment had a 60% survival rate in five years.

I didn't like that one bit, so I kept looking. After a couple weeks I found another paper that declared the five-year survival rate was over 90%. I liked that paper a lot better.

Seven years on, my survival rate is 100%. So, Is my confirmation bias confirmed?

Expand full comment
Dec 27, 2022·edited Dec 27, 2022

I wonder if the mere fact that you restrict the sample on the x axis, or the y axis, can cause the correlation between the x and y variables to be completely different than in the general population.

For example: suppose that psychology students never eat less than one banana per year - other than that they do not have any fancy physiology or mental properties - wouldn't that alone restrict the "elliptic" picture of the x-y correlation to a fragment in which this ellipse has a particular slope?

I've made a tool to help me visualize this:

https://codepen.io/qbolec/pen/qBybXQe

in this demo there are two variables:

X is a normal variable with mean=0 and variance=1

Y depends on X, in that it is a Gaussian with mean=X*0.3 and variance=1

So, we expect the correlation to be positive, because the higher the X, the higher the Y in general and indeed the white dots form a slanted elliptic cloud. And the correlation in general population seems to be ~0.29.

But if we restrict the picture to the green zone in the upper right corner of the ellipse, I sometimes get a negative correlation for such a sub-sample, and I never get close to 0.3.

(Sorry, I could not get this demo to robustly show the negative value, though)

IIRC the https://www.lesswrong.com/posts/dC7mP5nSwvpL65Qu5/why-the-tails-come-apart was about this phenomenon.
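
For anyone who would rather reproduce this outside the codepen, here is a minimal Python sketch of the same setup; the 0.3 slope and the cutoff at +1 for the "upper right corner" are just the demo's assumptions, nothing principled.

import numpy as np

# X ~ N(0,1); Y = 0.3*X + N(0,1), so the population correlation is ~0.29.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.3 * x + rng.normal(size=100_000)
full_r = np.corrcoef(x, y)[0, 1]

# Restrict to the "upper right corner" of the cloud, as in the demo.
corner = (x > 1) & (y > 1)
corner_r = np.corrcoef(x[corner], y[corner])[0, 1]

# corner_r comes out much closer to 0 than full_r; with a smaller sample the
# corner estimate gets noisy and can even dip negative, as described above.
print(full_r, corner_r)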

Expand full comment

Scott is a clever guy, but here he is on thin ice, for reasons others have pointed out above. Testing correlations (hunting for causality) the way he did in the blog post he refers to (healthy people less often report mystical experiences) is a subtler version of what is commonly referred to as “red wine research”.

…lots of studies find a positive correlation between drinking red wine and scoring high on various health measures. Some researcher is then quoted in the news media suggesting a causal relationship: There must be something in red wine that improves health. And there may be.

However, drinking red wine is correlated to being upper middle class. And upper middle class people score higher on many/most health indicators.

You can do multivariate regressions and the like to reduce the problem, but the number of control variables will always be limited. Unobserved heterogeneity is always with us, in such correlation studies. The problem is particularly acute if you do not even have a time series (panel study).

The problem with the correlation between health and mystical experience is more subtle - it is not a straightforward 3rd variable problem. So it is not a straightforward “red wine research” problem (I do not want to insinuate that Scott is not aware of statistics 101). The subtler problem first has to do with possible selection in which healthy people are ACX readers, and which of them filled in the survey. Perhaps they are a particularly secular bunch of healthy people, who give secular explanations to “strange” personal experiences that run-of-the-mill healthy people would label mystical experiences. Secondly, it has to do with the possibility that not-so-healthy ACX readers who filled in the survey may be a more mystically oriented bunch of people than run-of-the-mill not-so-healthy people. If so, they might be more likely than other not-so-healthy people to interpret “strange” experiences as mystical.

…this is based on a speculative hypothesis that ACX readers are composed of two groups of people: particularly secular rationalists drawn to Scott’s writing on rationalism, and particularly mystically-oriented people drawn to his writings on, well, mystical experiences of various sorts. And that there are correlations with self-declared health between these two select groups of readers (who responded to the survey).

Who knows.

Expand full comment

I would have speculated that there's a group of ACX readers who are primarily attracted by Scott's writings on psychiatry (I mean, _somebody_ has to be reading all those psychopharmacology posts that I skip).

I would also speculate that the only groups of people particularly interested in psychiatry are psychiatrists and their patients, and that the latter group is larger than the former, so many people in this group are going to have some kind of mental health issues.

Expand full comment

That is a good point. Not least considering that those who responded to the survey are a very select group of ACX readers.

My follow-up hunch is that readers with mental health problems are more likely to have participated for several reasons: more time on their hands, participation as an indirect way to be “seen”, and perhaps less concerned with the risk that your replies might be hacked and end up public. (The last risk, although small, was the reason I declined to participate - the questionnaire contained some very personal questions, not questions where you would be very relaxed if e.g. insurance companies got hold of the answers.)

Further, mystical experiences are in themselves often an indicator of mental health problems. A single psychotic episode can give you enough mystical experiences to last a lifetime.

Scott does not discuss the direction of the causal arrow/s, so this is not a criticism of how he presented the correlation.

But it illustrates that selection can move in mysterious ways.

Expand full comment

"Further, mystical experiences are in themselves often an indicator of mental health problems."

Which contradicts the claim in the tweet that healthy minds naturally have mystical experiences and not having them means something is wrong, so the claim remains unproven. If both healthy and unhealthy minds can have mystic experiences, then having mystic experiences cannot be used as a sign of mental health, nor can absence of mystic experiences be used as a sign of mental illness.

Expand full comment

Well...Scott's title was deliberately misleading, as he stated, since the survey showed that, on the contrary, mystical experiences were more common among ACX respondents who scored low on mental health.

Expand full comment
Jan 9, 2023·edited Jan 9, 2023

>However, drinking red wine is correlated to being upper middle class. And upper middle class people score higher on many/most health indicators.

Would the result look any different if red wine did indeed have health benefits? The upper middle class drinks more of it, hence is in better health.

Or, as I've seen it once put: "if you do enough statistical regression, you end up finding the exact genes that help you speak Chinese."

Expand full comment

Is this because of all the comments on your last post?

The issue I had wasn't that selection bias is present in your survey, that's unavoidable. The issue I had was that you were far more conclusive than your survey allowed you to be. You misused your data and stood on a soapbox at the end there.

Expand full comment
author

What are the reasons you think this?

Expand full comment

In short, if you had concluded that "SSC Readers show no clear relationship between self-perceived mental health and self-perceived mystical experiences", you would have accounted for bias in your conclusion. Instead you claimed the tweet is false.

Here is my full comment from the original post if you want to read it. AFAIK you have differing thoughts about 1) and we don't have to get into that. 2 & 3 are more relevant to this, but I wouldn't have mentioned 2 & 3 if you had ended with something like my proposed conclusion. There wasn't much to say, I know you're aware of different biases and survey question misinterpretation as potential concerns.

==============================================================

I'm not agreeing or disagreeing with the conclusion, because I think your analysis is bad and I declare a mistrial.

1) Your defined categories don't align with what the tweet is saying

> Someone qualified as very mentally healthy if they said they had no personal or family history of depression, anxiety, or autism, rated their average mood and life satisfaction as 7/10 or higher, and rated their childhood at least 7/10 on a scale from very bad to very good. Of about 8000 respondents, only about 1000 qualified as “very mentally healthy”.

You are characterizing a person as healthy or not and this is not equivalent to a healthy state of mind. A person is not their state of mind. It is common for people's minds to occupy healthy and unhealthy states.

I'll also note, I would have to respond as "not healthy" since I have a severe trauma history... but I would describe myself as mentally healthy as a result of all the work I have done. 1-1 and Group Therapy (Modalities include: CBT, DBT, ACT, IFS, Art, Music), somatic experiencing and breath work, acupuncture, PT, long distance running and swimming, meditation and mindfulness, spiritual practices, more time outside, emotional support dog, improving sleep and diet habits... I could go on.

2) Sample Bias

It was an SSC Survey. It is important to note that this community is full of thinkers. I would posit that this community will have fewer spiritual experiences than a more representative sample of the overall human population, regardless of how they self-assess.

3) Inability to evaluate respondent understanding of the survey and self-assessment

i.e. What does a "spiritual experience" and the definition you provided mean to someone else?

> So this tweet is false, unless you’re using some kind of hokey ad hoc definition of “the mind is healthy”.

And here, you share your result while dismissing other interpretations of the tweet as "hokey ad hoc". To me the other errors and complications are understandable, but this conclusion feels callous and close minded to me. It is an interesting idea and I would love to explore it more.

Expand full comment
author

1. I'm not claiming that only people who meet my definition are "mentally healthy", just that this is an artificial category correlated enough with good mental health to check what the sign is. To use another example, if someone said rich people lived longer than poor people, and I tried to test this using "owns a private plane" as a proxy for rich, this wouldn't be a perfect proxy - many rich people don't own private planes - but it's a close enough approximation that if people who owned private planes lived shorter lives than people who didn't, this would be strong evidence that "rich people live longer than poor people" is false.

2. This is exactly the objection I'm bringing up in this post. It doesn't matter if ACX readers have fewer spiritual experiences on average, so long as things that are true about other people's spiritual experiences are also true about ACX readers'.

3. This is true of all possible questions, including questions asked in person.

The original tweet made a very strong claim about the relationship of mental health to spiritual experiences (that ALL people with good mental health had spiritual experiences, and it was more common among people with good mental health than bad mental health). While we can never perfectly define "mental health" or "spiritual experience", using the proxy definitions available I found the opposite of the tweet's very strong claim (rather than all mentally healthy people having spiritual experiences, in a group of people selected for extra-strong mental health, 80% had no such experiences, and it seemed less likely than in less mentally healthy people). I think this is very strong disconfirmation, and we can poke around the edges of exact definitions but the finding was too strong for the subjectivity of definitions to matter very much unless you use a really bizarre definition, which is why I said "hokey" and "ad hoc".

Maybe a better way of coming to mutual understanding here would be for you to explain what kind of experiment you think *could* test the tweet's claim?

Expand full comment

You could work toward a mutual understanding by being more generous toward what the tweet was attempting at.

You didn't have to crunch a single line to say: I'm sure healthy people exist that haven't had a "mystical" experience. We'd buy it.

But by doing all the crunching and leaving it at "this tweet is false", period, we're left wondering. What did they mean? Are mystical experiences healthy? Normal? Did this quick run at a correlation miss something?

> Is transcendence, awe, wonder - the ability to commune with the great mystery of Being - are these irreplaceable parts of the human condition?

Without access to them, is a person deprived of something they deeply, intrinsically need?

If you interpreted the tweet like that, then the original analysis seems ill equipped to answer the question.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

Let's take a look at what the tweet said:

"My most controversial mental health take is that mystical experiences naturally occur when the mind is healthy.

If you’re not regularly encountering the strange, the numinous, the indescribably beautiful, something isn’t right."

Now, if Stuck-Up Prat had posted only the first line, there wouldn't be too much of an argument. They acknowledge that this take is controversial. But they had to go on to the second, which says that if you're not having (their definition of) mystical experiences, your mind is not healthy.

Forget Scott, this is insulting to everybody on the planet (except maybe the Dalai Lama, and I can't speak for his calendar of regular mystic experiences).

By what metrics did Stuck-Up Prat decide what is and isn't an experience of the "indescribably beautiful", that this is what everyone would consider a "mystical experience" and that this should be a measure of "the mind is healthy"?

I can feel sorry for someone who doesn't appreciate art or beauty, I can even feel that this is a deficiency in their experience of being human, but I don't get to say that this means "something isn't right" with their mind.

The same way I do not get to say - even my own Church rules on this - that if you're not having regular visions of the Blessed Virgin every week when you pray the Rosary, that "something isn't right" with your religious belief.

So this tweet is false in that it makes a universal claim: healthy minds have naturally occurring mystic experiences, and if you don't have them, something isn't right. This claim can be disproven by "I have good mental health and I have not had a mystical experience", which many people on here did use when filling out the survey.

The tweet did *not* say "lack of access to transcendence may be a deprivation of the full experience of being human" - and it evoked the numinous, not the transcendent - it said that having mystic experiences was a mark of a healthy mind and not having these, by implication, means an unhealthy, which is sick, which is mentally disordered, mind.

This is the problem with tweets - too short to convey anything useful but lending themselves all too freely to controversy.

The definitions of mystic, etc. used were not defined, so I don't even know if Stuck-Up Prat means the divine, the supernatural, the outside the material confines of this world, or do they just mean "really big but natural feeling, high on life, wow isn't this a gorgeous sunset, makes you feel there's more to life than the rat-race".

(And if they don't want to be called a Stuck-Up Prat, then they shouldn't write like a Stuck-Up Prat).

Expand full comment

Reading this is like watching Russell Crowe in the arena in "Gladiator." A hundred quatloos on the Celt!

Expand full comment

I get that because it's so vague, unless it's interpreted generously, it's offensive. And he certainly did intend some of its ambiguity to provoke.

But what I quoted was his follow up with a more expanded explanation. It's roughly how I interpreted the original tweet. I would check it out. It provides a lot of clarity and depth to the topic.

I didn't interpret the initial tweet as an attack, or superiority posturing. Instead he claims that mystical experiences are normal, and happen regularly. No esoteric prescriptions needed. It's saying sacred connection belongs to everybody.

Mystical experience can't be precisely defined. But if you have an experience that affords appreciation of life and its potential and that experience causes a transformation within you to move towards that appreciation, you're on the right track.

Without these experiences, I think it's hard to argue that all's well.

Expand full comment

You are clearly offended by the "stuck-up prat" "with their head stuck up their own backside who like to look down their prodnose at ordinary people..."

and I think it is clouding your ability to have an open mind towards the tweet. Like Jacob, I advise you take a deeper look at the intent rather than clinging to the idea that they have offended, pathologized, or wronged you by expressing an idea of their own to the world at large.

Expand full comment

Well put Jacob, well put. You phrased it much more succinctly than I did, by naming some of the unresolved questions of this approach.

> But by doing all the crunching and leaving it with a this tweet is false period, we're left wondering. What did they mean? Are mystical experiences healthy? Normal? Did this quick run at a correlation miss something?

Expand full comment

Thanks for the thoughtful reply.

1. Yes, I understand your synthetic definition, and I'm saying I think the proxy falls short. I have no idea how reliable these statistics are, but according to this site (https://www.stratosjets.com/blog/private-jet-statistics/#111_How_many_private_jets_are_there_in_the_world), there were a total of 21,979 active private aviation jets in 2019. According to wikipedia (https://en.wikipedia.org/wiki/List_of_countries_by_number_of_millionaires), there are ~62.5 million millionaires in the world. Again, the proxy falls quite short. You can claim it's close enough, I can disagree, and there can be a struggle for alignment (as we saw in the comments of the original post).

2. ACX reader experience being akin to non-reader experiences is a key assumption. It can be made and I don't agree with it. I think the sample needs to be stratified across different religious and spiritual self-identifications, as I imagine that ACX is not representative of all major religions and spiritual groups. That is my hypothesis. Do you have data on the religious identifications of people from the survey that you used for this analysis?

3. Agreed.

I also agree the tweet makes a very strong claim. The poster knows it is controversial and we have seen the controversy in all of this conversation. Even though the claim aligns with my personal experience, I would not make this claim due to the strength of it and the way that it can trigger certain people who feel pathologized as unhealthy by the statement.

> Maybe a better way of coming to mutual understanding here would be for you to explain what kind of experiment you think *could* test the tweet's claim?

I am going to answer this with an experiment. First, it may feel like a non-answer, but I don't feel like this claim can be reliably evaluated. The main reason is that a mystical experience is an inherently internal experience, and words can only do so much to communicate and describe these. Someone can have the experience and not realize it, or not have the experience and say they did. If we want to design another experiment, I think we have to address this rather than assuming it away.

If I was going to give it a best effort, I would introduce the following:

a) stratify the sample across different religious and spiritual self-identifications

b) broader definition of healthiness with non-binary responses

c) self-assessment of confidence on health

d) frequency of mystical experiences

e) self-assessment of confidence on mystical experiences

f) whether or not an increase in the frequency of mystical experiences correlated to a period of time improving mental health

g) whether or not a decrease in the frequency of mystical experiences correlated to a period of time of decreasing mental health and/or trauma

h) my gut says there is perhaps more, though nothing is coming to mind at the time.

This is clearly a difficult and messy undertaking with plenty of pitfalls. I understand why one would want to use a simplified approach, it just doesn't do it for me. I'm happy sticking to my own experiences and not generalizing it with a claim like this.

Expand full comment

The tweet was claiming that if you're not regularly having mystical experiences, there's something wrong with you. That needs a whole heap of clarification.

(1) How do you define mystical experiences? Full-on divine revelations, drug trips, a sense of peace and communion with nature, 'I'm so superior in my lifestyle of mindfulness and expanding my horizons that I can appreciate things the common clay cannot'? That last was a big thing by the more educated/better-off about the poor or the working class or non-white people or non-male people, by the way; poor/working-class people are coarser in their emotions and sensibilities, they are not as badly affected by pain and suffering and they can't appreciate things like fine art and the life of the mind.

(2) How frequently should you be experiencing these mystical experiences?

(3) Define a healthy mind. Tell me how you can psychoanalyse and diagnose me with improper mental functioning, deficiency or disorder merely on the basis of "when was your last experience of the numinous and strange?"

People can experience a sense of great peace and communion with nature, feel awe at the majesty and beauty of the cosmos, be lifted out of themselves by experiencing great art (of all the arts) and yet not define these as "mystical".

So is our anonymous tweeter someone with a healthy mind so that mystical experiences naturally occur for them and they regularly encounter the strange, the numinous and the beautiful - or are they a conceited prat with their head stuck up their own backside who like to look down their prodnose at ordinary people who don't have the same superior experience of the world due to their unhealthy minds? You tell me!

Expand full comment

> The tweet was claiming that if you're not regularly having mystical experiences, there's something wrong with you. That needs a whole heap of clarification.

The tweet does not say "something is wrong with you", the tweet says "something isn't right". I have no conclusion for or against the idea that "not having mystical experiences" implies "something isn't right".

As a mental health professional, I would never ever say "something is wrong with you". The difficulties we have as people are normal responses to the abnormal and unnatural circumstances of the societal systems we've created. With the right tools and introspection, we can identify the root cause of the behaviors that make us think "something is wrong with us" and realize that "something isn't right" with the world.

I do know from my personal experience, that the issues I have from my childhood trauma inhibited my ability to have mystical experiences. I do know from my personal experience, that in attempting to allow myself to have mystical experiences (which is subtly different than attempting to have the experience itself) I have been able to identify and work through the related trauma. I can have several of these moments in a day or none at all, and the moving average keeps going up. I leave it at that, because I have no ground to stand on to make a broader claim like the tweet.

1) I don't understand what you're trying to say here in response to me. If you're defining mystical experiences for yourself, I still don't understand the takeaway and the introduction of education and class... To try and further the conversation... I'm of the opinion that people with less privilege often, not always, spend more time sitting with the difficulties of their life and build greater psychological flexibility and emotional resilience.

When I was protesting in the summer of 2020, I was in the middle of a manic episode. Part of what shocked me out of it was seeing how calm and collected many of the people who were directly affected by the social injustices were... whereas I was dysregulated as all get out even though I could walk away without being directly affected.

2) How frequently should you be experiencing these mystical experiences?

I don't think there is a "should". That sets an expectation/judgement that is hard to meet given the adversity we face.

I can answer "How frequently could you be having these mystical experiences?""

I think life itself can be a predominantly mystical experience rather than the opposite. To me, this is akin to higher degrees of enlightenment.

3) Define a healthy mind

One that is at peace with itself and able to relate to the world around it with fluidity.

I understand the purpose of mental diagnoses and I only find them to be useful up to a point; they're inherently limiting and wrong. I advise my clients not to attach their identities to their diagnoses. They're western medicine's best attempt so far, and nothing more.

4) So is our anonymous tweeter someone with a healthy mind so that mystical experiences naturally occur for them and they regularly encounter the strange, the numinous and the beautiful - or are they a conceited prat with their head stuck up their own backside who like to look down their prodnose at ordinary people who don't have the same superior experience of the world due to their unhealthy minds? You tell me!

Um.... okay.

I think they are a person who is sharing a personal opinion, that they recognize in their own words as a "controversial mental health take", with the hope that some people will get curious and explore why they don't have the mystical experiences.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

Sample selection can be a problem for other reasons as well (i.e. Berkson's paradox).

Expand full comment

There's a specific circumstance where selection bias is fatal for correlations: when examining correlations on characteristics related to selection. Take your obesity example:

"in a population of Psych 101 undergrads at a good college, family income is unrelated to obesity. This makes sense; they’re all probably pretty well-off, and they all probably eat at the same college cafeteria. But generalize to the entire US population, and poor people will be more obese, because they can’t afford healthy food / don’t have time to exercise / possible genetic correlations."

The big problem here isn't that everyone's reasonably well-off, it's that because college selects for well-off people, people who aren't well-off and who end up in college anyway will have a bunch of compensatory characteristics that help them get selected into college. To make it extremely simple, we could imagine that whether you go to college is entirely a function of family income and something like personal grit/self-control. In this case, we'd expect that the minimum amount of self-control necessary to get into college would be higher for lower-income people. As a result, if there was no other relationship between self-control and family income, we'd end up with a negative correlation between the two among college students that was stronger the more selective the college was (and thus the more people are on the line between being selected and not).

So now when you do your obesity study, you'll get a biased estimate of the effect of family income on obesity because family income will be negatively associated with self-control, which is itself negatively associated with obesity. This will be true despite the fact that there's no relationship between self-control and family income in the full population.
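
A toy simulation makes the mechanism concrete (the weights, noise level, and admission cutoff below are invented purely for illustration, not estimates of anything):

import numpy as np

# In the full population, family income and self-control are independent.
rng = np.random.default_rng(0)
income = rng.normal(size=200_000)   # standardized family income
grit = rng.normal(size=200_000)     # standardized self-control

# Admission depends (noisily) on both; raising the cutoff makes the college
# more selective and the induced correlation among admits more negative.
admit_score = income + grit + rng.normal(scale=0.5, size=200_000)
admitted = admit_score > 2.0

print(np.corrcoef(income, grit)[0, 1])                      # ~0 in the population
print(np.corrcoef(income[admitted], grit[admitted])[0, 1])  # clearly negative among admits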

In the case of the ACX reader surveys, this might mean that people who are least like other ACX readers (for instance, non tech people, women) will be more selected for ACX-ness than are the people most likely to read ACX.

My favorite example of this is basketball players and height, btw. My guess is that if you surveyed NBA players on how much time they spent playing basketball as kids, the shorter players would have spent more time playing basketball than the taller players, because short people need fantastic basketball skills to be NBA players while tall people only need decent basketball skills. This would be the exact opposite correlation you would get with any other group of people.

Expand full comment

Scott has previously discussed this kind of example (which makes me surprised he didn’t here!):

https://slatestarcodex.com/2014/03/01/searching-for-one-sided-tradeoffs/

“There is a fun legend I heard in a stats class – I don’t know if it’s true – of a psychology professor who got very excited about her new theory that the brain traded off verbal and mathematical intelligence – being better at one made you worse at the other. She got SAT Math and SAT Verbal scores from her students and found it supported her theory. A friend of hers did a replication at his college and found support for the theory there as well.

But larger scale testing disconfirmed the theory. What the professors working off college samples were finding was that all of the kids in their college were equally “good”, in a general sense, so excellence in any quality implied a tradeoff in other qualities. Suppose the professor worked at a mid-tier college – students with SATs much less than 1200 couldn’t get in; students with SATs much more than 1200 could and did go to better schools instead. Then all her students would have SATs around 1200. Which meant a student with an SAT Verbal of 700 would have an SAT Math of 500, a student with an SAT Math of 800 would have an SAT Verbal of 400, and boom, there’s your “trade-off of verbal and mathematical intelligence”. Obviously the tradeoff wouldn’t be perfect, since there’s random noise and since students are also trading off less obvious qualities like attractiveness, wealth, social skills, athleticism, musical talent, and diligence. But it would be more than enough for her to find her correlation if she was looking for it.”

Expand full comment

This is such a perfect example--thanks for sharing this!

Expand full comment

The statistical phenomenon there is generally known as "restriction of range".

Expand full comment

I believe "Berkson's paradox" also refers to this

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

Yes, but Berkson's paradox is a subset of the general problem.

On the other hand, it's a much better wikipedia page; https://en.wikipedia.org/wiki/Berkson%27s_paradox has several detailed descriptions of what's happening, while https://en.wikipedia.org/w/index.php?title=Restriction_of_range is a redirect to a stub page that spends 55 words on the concept.

That stub description also appears to be incorrect; it states that correlations are attenuated by restriction of range, where I would interpret "attenuated" as meaning "drawn towards zero". Berkson's paradox is most typically the transformation of a zero correlation into a negative correlation, and can also easily show the transformation of a positive correlation into a negative correlation. And since it is caused by restriction of range, those are also effects of range restriction.

Expand full comment

This is a side issue but because of very high recent population growth this almost certainly isn’t true right: “And then generalize further to the entire world population over all of human history, and it stops holding again, because most people are cavemen who eat grubs and use shells for money, and having more shells doesn’t make it any easier to find grubs.” I’m not sure when the median person who ever lived was born but I bet it was sometime in the 20th century, no?

Expand full comment
author

This source says the median person lived around 1 AD: https://www.prb.org/articles/how-many-people-have-ever-lived-on-earth/ . There weren't that many humans around in the Paleolithic, but there were 200,000 years of it.

Expand full comment

Wow! Interesting, and very counterintuitive for me.

Expand full comment

The problem is not with the surveys themselves, the problem is with how people interpret their results. Yes, you're smart enough and savvy enough to mentally append "this is true for the types of people who follow Aella's Twitter account" to every conclusion you draw from her surveys. But I doubt that all of Aella's followers are also that smart and savvy. A lot of people will probably assume that the results are representative of the general populace, simply because they haven't even considered the fact that the results might be unrepresentative.

And for what it's worth, I really like Aella's surveys, and I genuinely think there's a lot of value to be found in them! I just also think saying "take internet survey results with a grain of salt" is a useful reminder, because not everyone takes their full context into consideration by default.

Expand full comment

"A lot of people will probably assume that the results are representative of the general populace, simply because they haven't even considered the fact that the results might be unrepresentative."

I think that is the major objection and it's the best one. The objections I've seen, though, seem to be trying to discredit the survey/her completely due to internal disagreements over the alleged results, with some people wanting to purge the whole thing because it doesn't fit their model of kink or sex-positivity or whatever the hell is going on, and others wanting to do the whole appeal to authority bit because 'but it's Aella'.

I think the most that can be said is indeed "this is representative of the kind of people who follow this account and are willing to fill out surveys about their sex lives" but no more, but a lot of people are treating it as if it is more than that, both pro- and anti-.

Expand full comment

"Internet studies have selection bias and academic studies don't" is a strawman. A stronger form of the argument is that it's typical for Internet studies to *select on the dependent variable* in ways that are much more concerning than the typical Mechanical-Turk or psych-undergrad samples of an academic study.

While academic studies often use WEIRD samples that are somewhat better-educated, richer, etc than the global average, Internet convenience samples—particularly those from blogs like this or Aella's that have a strong "flavor"—are biased along ideological, cultural, or interest-based affinity dimensions, in addition to selecting for literacy and Internet access in ways similar to psych-undergrad studies. Furthermore, a typical Internet study asks questions about topics specific to the interest(s) distinctive to the sampled population, which makes it much more likely that results will be unrepresentative and even correlations won't generalize.

Aella is a clear example of this: she's a former sex worker who has gathered a following by flouting normal social taboos about sex and sex talk, and she asks these followers about exactly these topics. It's certainly interesting to see what this large sample of highly-open-to-discussing-sex-in-written-English people thinks, but there are obvious reasons to think most people's thoughts about sex are more similar to those of the median Mechanical Turk user than the median Aella poll participant.

Expand full comment

This is the core issue; the bias is meaningfully relevant to the question (in a way that the Psych 101 students example rarely is).

Expand full comment

> (obviously there are many other problems with this study, like establishing causation - let’s ignore those for now)

I agree that failure to generalize out of sample can be fine-ish if you already know or don't care about the causal model, but when I see something criticized for selection bias, it's almost always to caution against making inferences related to causation.

Expand full comment

>It doesn’t look like saying “This is an Internet survey, so it has selection bias, unlike real-life studies, which are fine.” Come on!

Do you think the people willing to criticize your blog generally trust shitty psych studies?

Expand full comment

Is the data available for download? If so, does someone have a link? Thanks in advance.

Expand full comment

What data?

Expand full comment

Aella neither has to face peer review nor the scrutiny of replication by other, unaffiliated scientists.

This feels like a major shortcoming.

Expand full comment

I either have or plan on publishing my raw data so all my peers can check my work

Expand full comment
founding

Neither of those is true, unless by "peer review" you mean only what 'official academic journals' do. And even that 'peer review' is (often) extremely weak as-is. And, AFAIK, replication is rarely performed anyway, and efforts to replicate studies and experiments are something only some people have 'just' begun to push to do more frequently in many fields – and that seems to be an uphill battle in almost all cases, for many reasons.

If anything, Aella's work seems MUCH better along these dimensions than almost all academic work.

Expand full comment

This is why I pay attention to anecdotal evidence, Amazon reviews, and dietary cults. Where humans are not homogeneous, people select diets/products/etc. based upon their individual circumstances. A diet/drug/product/meditation technique may be beneficial to some and harmful to others for an average effect of zero. Even attempting a scientific study to determine higher order statistical terms is double plus expensive. Anecdotal evidence is often mixed with revealed preference.

It takes a bit of human judgment to tease out the underlying truth of such data. People will cling to a diet or ideology even if it isn't working for them. But even there, the truth leaks out. For example, the fact that most Paleo and Keto advocates tout weight training and dismiss aerobic workouts is a pretty decent indicator that such diets are not optimal for running marathons. Conversely, carbo-mush advocates tend to be big on aerobics, and the tiny number of vegan body builders out there are big on "superfoods."

Expand full comment

This is true. Especially for things you can try out, like a diet or other life intervention, anecdotal evidence is valuable and it makes sense to pay attention to it. If you are the type of person to do well on paleo, you can do really well on it.

Expand full comment

Yep.

Just keep in mind that there are literally billions of people who can maintain lowish body fat on a carbo mush diet. So much for Atkins or Taubes. Think Japan or Indonesia.

But can an Eskimo or Viking comfortably go high starch?

I'll take anecdotes over small sample controlled studies.

Expand full comment

Aella's followers seem to be men. Who are mostly heterosexual, I guess, as they like to look at pics of her without too much clothing on - if I understand correctly - with enough text in between that even Alex Tabarrok feels OK reading her tweets. I would not see this as a reason to consider her polls unrepresentative. - Now can someone please provide links to those fabulous pics, instead of tweets where she argues that "my samples are likely more reliable than most published social research" - which may very well be true. Just gimme those pics! Please! And maybe some links to her legendary polls. ;)

Expand full comment

Most of her twitter polls have 4 options, formatted something like Yes/male, No/male, Yes/female, No/female, so that at least seems reasonably controlled for.
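For what it's worth, here is a minimal sketch (with made-up counts, not any real poll of hers) of why that four-option format helps: splitting a single question into Yes/No-by-sex options lets you compute within-group rates, so a lopsided sex ratio among followers doesn't by itself drive the comparison.

```python
# Hypothetical counts from a 4-option Yes/No-by-sex Twitter poll.
poll = {"yes_male": 450, "no_male": 350, "yes_female": 120, "no_female": 80}

# Within-group rates, so the male-heavy audience doesn't swamp the female signal.
male_rate = poll["yes_male"] / (poll["yes_male"] + poll["no_male"])
female_rate = poll["yes_female"] / (poll["yes_female"] + poll["no_female"])

print(round(male_rate, 2), round(female_rate, 2))  # 0.56 vs 0.6 despite far more male voters
```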

But a lot of the Aella data that people take issue with isn't the polls; it's her massive kink survey, which actually went viral on TikTok briefly, and I've definitely seen it shared on unrelated gaming Discord communities where 90% of members are young women. The data there is *at minimum* as representative as a poll conducted on undergraduates.

Expand full comment

The little I've picked up by osmosis leads me to believe (a) if you're polling kinky people about kink, isn't that at least representative of kinky people? I grant that if you try and extrapolate that out to the wider population you will likely go astray (either "more people are kinky than you think" or "there are a lot of disgusting horrible practices out there"), and (b) the objections seem to be from kinky people going "That's not what kink means/how kinky people work/what I do when I do kink", which is the kind of small-group squabbling among themselves that I'd expect in such cases.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

I once asked a bunch of Ivy league students working in a physics lab to try to draw maps of the USA from memory. The results were pretty interesting.

One thing that was apparent was that the students I thought were dullest produced the best maps. I think this is best explained by Berkson's paradox--smart students don't need as good a memory to get into Ivy League schools.

I worry about correlation in SSC surveys.
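A quick illustration of that Berkson's-paradox point, as a simulation with invented numbers rather than anything from the actual lab: two traits that are independent in the population become negatively correlated once you look only at people admitted on the basis of their sum.

```python
# Minimal sketch of Berkson's paradox: independent traits become negatively
# correlated after conditioning on selective admission. Numbers are made up.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
smarts = rng.normal(size=n)
memory = rng.normal(size=n)          # independent of smarts by construction

admitted = (smarts + memory) > 2.0   # admission selects on the sum of the two

print(np.corrcoef(smarts, memory)[0, 1])                      # ~0.0 overall
print(np.corrcoef(smarts[admitted], memory[admitted])[0, 1])  # clearly negative
```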

Expand full comment

> But real studies by professional scientists don’t have selection bias, because . . . sorry

Because they do have selection bias. Which is why psychology (most of it) cannot be trusted as a science. It doesn't get a pass.

Expand full comment

Instead of talking about Selection Bias in the abstract, as many commenters have, why not speak about it in the particular instance? Let's agree that there is some degree of selection bias at play in the correlations drawn from internet surveys of certain groups, what now? Should we totally throw out the results? That doesn't seem right if we're genuinely interested in learning something. I wholeheartedly agree with the Aella tweet that too many people on the internet use some of these biases as easy ways to dismiss research they don't like. Open up r/science on reddit and you can see countless examples of this, even examples where the researchers are accused of not controlling for things they specifically did control for. Similarly, in this comment section you can see people troubled by the selection bias in SSC surveys while simultaneously using faulty logic and personal anecdotes to make causal claims.

On a more meta-level, I think whenever statistical techniques are used to draw inference one either has to be very careful and specific about their conclusion (as Scott demonstrates in another comment) or one opens their analysis up to some methodological criticism. Unfortunately, particularly on the internet, there's little to no effort on the part of the criticizer to demonstrate that the cited bias (OVB, SB, etc.) is actually important here. Instead, they can merely claim it exists, write down some plausibly true example of it playing out and call it a day.

I'm not sure where Andrew Gelman would land on all of this, but to me this whole thing is propagated by the NHST framework that encourages a binary classification of research as either being good or bad. Lastly, I'll just say that despite years of studying statistics/econometrics I still find colliders hard to think about and that makes me mad.

Expand full comment

Do people selectively apply selection bias to Aella tweets? Or do her most engageable tweets draw in normies who consistently dismiss *all* of her "research"? For the average Twitter user, and probably the average person in general, you will probably do better off-handedly dismissing anything someone like Aella says vs. taking any or all of it seriously.

Expand full comment

What makes you so confident in your last statement "you probably will do better off-handedly dismissing anything someone like Aella says vs taking any or all of it seriously"? Do we even need to take such a strong stance on the Aella surveys, where we either totally dismiss the results or totally accept them as true?

I'd imagine there are both people who selectively apply selection-bias critiques to Aella tweets and also normies who see only her most engageable tweets and generally dismiss all her research. Clearly the first group of people is wrong to selectively apply methodological critiques to people/research they don't like; this is a form of the ingroup/outgroup thing that's been written about on SSC before. Is the second group of people correct in consistently dismissing everything from a certain person/field? To me it seems like even if the answer is yes, there often isn't enough evidence presented by the dismissers.

Expand full comment

I know you know what a heuristic is.

Expand full comment

Aren't most studies by (respectable) academics these days done on Mechanical Turkers? Which is surely a skewed sample, but probably less skewed than, say, Cornell undergrads or Aella's Twitter followers.

Expand full comment

Is there any data about this transformation in how respondents are found?

Expand full comment
Jan 9, 2023·edited Jan 9, 2023

Less skewed than undergrads, sure. But less skewed than Aella's Twitter followers? How much does Mechanical Turk pay?

"A 2018 academic study analyzed 3.8m tasks completed by 2,676 workers on MTurk and found that average earnings through the platform amounted to $2 per hour. Only 4% of all workers earned more than the federal minimum wage of $7.25/hour."

You'll find nearly nobody there who holds a job that pays over minimum wage.

Expand full comment

bet against.

Expand full comment

Perhaps banana eating is more popular at highly competitive high schools and geographic diversity criteria make a higher IQ necessary to be admitted from them.
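A toy simulation of that proposed mechanism, with numbers invented purely for illustration: if admitted students from competitive high schools both eat more bananas and had to clear a higher admissions bar, the admitted sample shows a banana-IQ correlation with no causal link anywhere.

```python
# Sketch of the suggested confound among *admitted* students. Invented numbers.
import numpy as np

rng = np.random.default_rng(3)
n = 4_000
competitive_hs = rng.random(n) < 0.5
# Banana eating is simply more fashionable at competitive high schools.
bananas = rng.poisson(np.where(competitive_hs, 5, 2))
# Geographic-diversity criteria demand more from competitive schools,
# so admitted students from those schools skew smarter.
iq = rng.normal(np.where(competitive_hs, 115, 100), 10)

print(np.corrcoef(bananas, iq)[0, 1])   # positive in the admitted sample, no causal link
```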

Expand full comment

Nice rejoinder. I think of this as one of the Wikipedia memes, the argument conventions that come from people steeped in Wikipedia (or similar collaborative efforts) to the point that assorted WP rules seem like the only natural way to put rules on argument. So you get this one, and the wild overuse of "correlation is not causation" and assorted other logi-slogans, plus the belief that adding a citation to anything you say necessarily increases its logical force tenfold.

Arguably it's all a reason to restore the study of rhetoric to greater prominence in general education.

Expand full comment

It would be unfortunate if people who understand that causation is inferred from correlations where certain conditions obtain were to use that to defend blatantly illegitimate inferences of causation from correlation. Fortunately, in my experience, a person educated enough to do the former usually isn't prone to the latter. But, I guess I can imagine a person getting a lot of online flack for a poor inference of causation from correlation trying to defend it by going on a philosophical rant about how sometimes it is justified to infer causation from correlations and that short-hand slogan is an oversimplification.

Expand full comment

Sure, I could imagine all kinds of psychological rat's nests underlying why people say what they do. As it happens, though, I couldn't care less about any of that, so I never waste my time in psychoanalyzing why people say what they do. Seems like a weird and grubby kind of undertaking anyway, some distasteful cross between a taste for ad hominem and voyeurism. All I care about is whether what people say is true and reasonable or not.

Expand full comment
Dec 29, 2022·edited Dec 29, 2022

The more wordy, sophisticated version of admonishing "correlation is not causation" is that "one should not infer causation from correlation unless a suite of other conditions obtain, which does not appear to be the case here."

I think a lot of people saying the former very well know they mean the latter. Occasionally, you do see people ape them and not understand that correlations are evidence of causation in specific circumstances and misapply the slogan.

If someone were to take the existence of those more naïve people and attack their unsophisticated sloganeering with a more nuanced understanding, in order to brush aside legitimate complaints that they are making a poor inference of causation from a mere correlation, I think you can criticize them for being unreasonable without having to bother "psychoanalyzing" them. We merely should know that people defending their poor ideas might be apt to do that, because it is a path of lesser resistance.

Expand full comment

Alas, I don't believe in your underlying assumption that there exist two distinct classes of people, easy to tell apart: one of naive idiots who use the phrase without understanding what it means, and one much more sophisticated that never uses it as a bullshit objection to an inference they don't like for personal reasons.

My experience is actually in the opposite direction: I find that well-read well-educated people are more likely to bullshit themselves and others about their reasons for disagreeing with stuff. The normie will just say "I don't like that" but the sophist will lay out many paragraphs of sophistry to conceal an essentially identical personal origin for his issue.

Expand full comment

This isn't my underlying assumption. All I suggested was that when people say "correlation isn't causation" as an objection to some causal inference on the basis of a correlation, they often understand what that means in its sophisticated sense, and the existence of people who do not doesn't change that. I further suggested, as an analogy for the discussion going on here, that attacking the more naive understanding of that slogan is a rather poor way of addressing these kinds of objections.

Expand full comment

There are also reasons to distrust surveys generally that have nothing to do with selection bias.

https://carcinisation.com/2020/12/11/survey-chicken/

> Comprehension is difficult enough in actual conversation, when mutual comprehension is a shared goal. Often people think they are talking about the same thing, and then find out that they meant two completely different things. A failure of comprehension can be discovered and repaired in conversation, can even be repaired as the reading of a text progresses, but it cannot be repaired in survey-taking. Data will be produced, whether they reflect the comprehension of a shared reality or not.

(And yes, I think we should somewhat distrust professionally done research too.)

Expand full comment

Look, sometimes you want to say "consider this way your data may be biased."

That doesn't mean "your data is trash, we can learn literally nothing from your trash contaminated data. sit in the corner and feel bad." it means "consider this way your data may be biased."

If a political poll gets retweeted by e.g. Contrapoints and no other big bluechecks, and so 70% of the voters are Contrapoints followers, that is really worth mentioning while people are trying to derive meaning from the poll!

Expand full comment

Imagine AOC and Ben Shapiro putting up the same Twitter poll. I think we can all agree they would get wildly different results.

Now imagine a person like Aella putting up a poll on sex frequency or kinks or polyamory and then like, a megachurch pastor. Selection bias...

I dunno how people can pretend this isn't an issue.

Expand full comment

Selection bias can be fatal to polls, but like many poisons it is all a matter of degree. How much selection bias? What kind of selection bias? The kind that has a big effect on the kind of question being polled for? What measures have been taken to minimize the effect of any selection bias in the poll? Since there's always selection bias, these are the important questions. There's a whole technology for minimizing the effects of selection bias in polls, and it works, but it's not always used, because the purpose of many polls is to support a result rather than detect it.

With correlations, it's actually the same. It matters to the degree it affects the findings. The questions are: is the selection bias relevant to the conclusion, how much has the selection been biased, and what has been done to account for selection bias? Yes, this posting does a good job in saying that there's a different relation between a poll, which attempts to detect things like what a population believes, and a correlation study, which attempts to detect relationships between characteristics and perhaps generalize from them. But this cannot mean selection bias isn't important in correlation studies. It only means it plays a different role. If a sample is very biased with regard to the matter under study, that is going to distort the result. And in a good correlation study, measures would be taken to account for the inevitable selection bias present in all studies, as well as, to the extent feasible, to minimize it. But as with polling, it's all a matter of degree.

As the posting convincingly points out, selection bias is always present to some degree in both these areas, polling and studying correlations. I'm not sure what is gained by trying to say that selection bias per se is a big deal in polls but not a big deal at all with correlations, except to create a false dichotomy in support of ignoring some selection bias and overvaluing other selection bias.

Expand full comment

Information is information. Just modulate how you take it based on stuff like selection bias and sample size.

Expand full comment
founding

Well, yes, of course! :)

But this post seems aimed at those that refuse to do so and want to (or attempt to) justify that decision.

Expand full comment

I would recommend deeper research before asserting an opinion. This is math-related, and in that field there is no room for opinion; hypotheses, yes. A reading I recommend as a starting point is the book Seeing Through Statistics.

Expand full comment
founding

There are no simple (e.g. unarguable) isomorphisms between mathematical theories and 'real world ontologies' so your point seems very unrelated to the topic of this post.

Expand full comment

It kinda varies. When Aella does an "imagine a random number generator" type study, there's probably some selection bias, but no worse than academic studies. When she tries finding correlations via something like "do you think abortion should be legal | do you support bestiality", her audience bias is much, much worse.

Expand full comment

The alternative I'd compare it to is something like how opinion polling is done, where they put a major effort into getting demographically representative population samples and/or weight the final result proportionately. Obviously there's some debate about the best way to do this, but the general technique is accepted.
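As a rough sketch of that weighting idea (post-stratification style), with group labels and numbers that are entirely hypothetical: reweight each respondent group to its share of the target population before averaging, so an over-sampled group stops dominating the headline figure.

```python
# Toy post-stratification example; all counts and shares are invented.
sample = {            # group -> (respondents, share answering "yes")
    "men":   (800, 0.60),
    "women": (200, 0.40),
}
population_share = {"men": 0.49, "women": 0.51}

# Unweighted average is dominated by whoever answered the poll.
raw = sum(n * p for n, p in sample.values()) / sum(n for n, _ in sample.values())
# Weighted average counts each group at its population share instead.
weighted = sum(population_share[g] * p for g, (_, p) in sample.items())

print(round(raw, 3))       # 0.56: dominated by the over-sampled group
print(round(weighted, 3))  # ~0.50 after weighting to population shares
```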

Expand full comment

Stop Confounding Yourself <-> Selection Bias Is A Fact Of Life <-> On Bounded Distrust

Probably forgetting some other related posts, like anything which references the Elderly Hispanic Woman Effect. Yes, I remember that bit. Anyway, sorta feels like there's a more general principle that ought to cover all these cases, without going *too* meta (e.g. Knowing About Biases Can Hurt You, which of course is on LW).

Also, real-life studies suffer from selection bias - they disproportionately exclude Very Online people who'd never see a physical bulletin board. (Does it count if a real-life study recruits people using the internet, or vice versa? Maybe that's the secret!)

Expand full comment

This is a horrible take. She isn't just hampered by selection biases, she's actively engaging with people so as to seek certain results. It's bunk science methodology that politicians and corporations might use to discredit narratives. If you want to do sociology, just do sociology; don't pretend you can generalize to all humans because EvErY oNe ElSe Is DoInG iT! lol

Expand full comment

You cannot eliminate selection bias completely (unless you have literally all people in a database, and you can force the random selection to take your test), but depending on how you design the study, the bias can be smaller or greater.

I think the argument is that on the scale from smaller selection bias to larger selection bias, Aella is not even trying.

Expand full comment

Imagine if Scott wrote this post but didn't mention Aella. I think both he and Aella would have been better off.

Expand full comment

As a point of reference, I didn't know who Aella is so I didn't notice the reference to her, and as a result my reaction to the whole piece was "Scott seems to be mad about something but I have no clue what it is."

Expand full comment

I haven't read anything by anyone named Aella, but I have to say I'm impressed with the hostility towards her seen here. What does she say that angers so many?

Expand full comment

Hostility? People saying she does sloppy work isn't hostility.

Expand full comment

Correct. The hostility comes in how they say so. I can say "that's a sloppy understanding of the word 'hostility' in this context" considerately, kindly, regretfully, cheerfully, mockingly, with hostility, with contempt, with respect, and many other things besides.

Expand full comment

I read “this internet study has selection bias” as “some subset of users are likely gaming your survey to produce amusing results.” Any system that doesn’t have robust anti-trolling systems in place is open to “Boaty McBoatface” brigade attacks or script kiddies. Is this an actual problem in your results? Given the way your surveys work I’d guess not but I think Aella’s Twitter poll format is more vulnerable.

Expand full comment

Would this post: "Will people look back and say this is where ACX jumped the shark? Let's do a poll." meet the 2 of 3 criterion?

I find it useful to frequently come back to W. E. Deming's important paper: On Probability as a Basis for Action, The American Statistician, November 1975, Vol. 29, No. 4

https://deming.org/wp-content/uploads/2020/06/On-Probability-As-a-Basis-For-Action-1975.pdf

Expand full comment

You are missing collider bias, where the selection mechanism induces a correlation. The classic example is the correlation between good looks and acting ability among Hollywood actors. You only get to be a Hollywood actor if you are good-looking or a good actor, so we don't see any bad-looking bad actors, and there is a negative correlation between ability and looks which might not hold in the general population.
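A minimal simulation of that collider story, with invented thresholds: looks and acting ability are independent in the general population, but selecting people who are high on either trait produces a negative correlation among those selected.

```python
# Sketch of collider bias via "good-looking OR good actor" selection.
# Illustrative numbers only.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
looks = rng.normal(size=n)
acting = rng.normal(size=n)            # independent of looks by construction

is_actor = (looks > 1.5) | (acting > 1.5)   # become an actor if either trait is high

print(np.corrcoef(looks, acting)[0, 1])                      # ~0.0 in the general population
print(np.corrcoef(looks[is_actor], acting[is_actor])[0, 1])  # negative among actors
```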

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

Imagine a medieval peasant hearing that people will get obese because of poverty.

A few years ago, I lost 15 kg when my bank accounts were blocked and I had only some cash on hand to buy food with for some months.

btw, how do you pronounce 'Aella'?

Expand full comment

I've seen Aella make the claim that "Aella's audience that responds to Aella's surveys" is pretty close to equivalent to other populations, but I'm not sure I buy it - "very online > twitter > rat adjacent > Aella" is a pretty strong filter; I'd expect, among other things, the normal skew towards "more autistic than most groups" that you end up seeing when you survey most rat populations. Ditto "is likely to be pretty accepting of a wide variety of sex stuff" and similar.

She claims this isn't a problem, but the *way* she claims this bothers me a bit. Here's two quotes from her main article on this (https://aella.substack.com/p/you-dont-need-a-perfectly-random):

***And key to this, I can see how their responses differ. I have a pretty good grasp on the degree to which “people who follow me” is a unique demographic. And surprise - for most (though admittedly not all!) things I measure (mostly sex stuff, which is the majority of my focus), they’re very similar to other sources.***

***I also am really familiar with my twitter follower demographics, so I can anticipate when stuff might be confounded or warped due to selection bias.***

These kinds of statements are essentially her asking me to trust her, but my general impression of Aella is that she's extremely eager to prove that most out-there sex stuff is very, very healthy and good and you should definitely be weird topical sex/relationship outlier X. I don't particularly trust her to be perfect at factoring this out.

I don't think this is unique to Aella - basically everything I've said here replicates in my views on, say, most diet studies/surveys, or any survey I see about some overton-window friendly marginalized group. I'm in the rarer "surveys in general are trash" group of people.

I think that gets worse when you start to get into the *kind of stuff Aella asks about*. Most people don't care a ton about how many bananas they eat - it's sort of a factual thing. And you can test them for IQ, so their bias doesn't enter into that part of it as much (or at least doesn't have to). But Aella asks questions that often boil down to the general sphere of "Is polyamory great, and should everyone do it?" - questions that you'd expect to interact a lot with people's self-worth and tribalism.

To put that another way, I suspect that Aella's following is highly:

1. Autistic

2. Attracted to Aella specifically and trying to get her attention

3. Sexually liberal

And she asks a lot of questions that interact with that. I *do* expect that someone following Aella is more likely to want to impress her than most, and that they aren't unaware that she's incredibly pro-weird-sex-stuff. I *do* expect they are more autistic than most and tend to approach potentially disturbing/provocative questions more analytically than most. I do expect there's a greater number of people who would be reluctant to report that their experiences with weird sex/relationship thing X have been negative, because it would be letting other-tribe have an opportunity to count coup.

Again, this isn't unique to her. But her being the example at hand is a way for me to talk about how much I distrust surveys in general.

Expand full comment

Excellent points. I would add that if you were the hypothetical person who tried polyamory for a while and found that it really was a problem for you, you probably would stop hanging around blogs saying how great it is.

Further, we should be curious about the turnover rate in the community. If it takes, say, 2-3 years for most people to decide poly is just super bad for them, and people try it out at a steady rate, roughly the same rate as they drop out, it is easy to have nearly a 100% "I hate poly, as it turns out" rate overall but also a large and steady number of people responding "poly is fine!" on a blog about how poly is fine. You are just surveying people who haven't hit the 3-year burnout point yet, with new fodder replacing the outgoing people who leave the blog when they lose interest in poly.
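A toy steady-state version of that scenario, with all parameters invented: if unhappy people leave the community around year three and are replaced by fresh triers, a survey of current members can look entirely positive even though nearly everyone who ever tried it ends up unhappy.

```python
# Toy cohort model of the survivorship effect described above. Invented numbers.
inflow_per_year = 100       # new people trying it and joining the community each year
burnout_year = 3            # year at which most people sour on it and leave
eventually_unhappy = 0.95   # share of triers who end up unhappy

years = 30
ever_tried = inflow_per_year * years
ever_unhappy = eventually_unhappy * ever_tried

# Current community = the cohorts still inside their first `burnout_year` years.
current = inflow_per_year * burnout_year
currently_unhappy = 0       # the unhappy ones have already left the blog/community

print(f"lifetime unhappy rate: {ever_unhappy / ever_tried:.0%}")                        # 95%
print(f"unhappy rate in a survey of current members: {currently_unhappy / current:.0%}")  # 0%
```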

Expand full comment

Very reasonable writeup.

I don't see a couple of real issues being addressed:

1) Structural biases due to the differences in social and/or economic class between the average online user vs. the overall population, and

2) Structural biases due to the web sites/email lists involved.

The former is an issue because the average online user is significantly wealthier and better educated than the overall population, and that has an impact. Income differentials introduce large skews in health, in political views, etc.

The latter is an issue because no web site or mass emailing is likely to be random, even beyond the inherent online-vs.-overall skew. Just as the average Fox online viewer is different from the average CNN online viewer, every web site has a largely self-selected population of like thinkers. Email lists also derive from something, somewhere, and are just as likely to contain inherent structural biases.

This doesn't invalidate your main points, but the different types of subtle structural fingers on the scales are potentially very problematic.

Expand full comment
Dec 28, 2022·edited Dec 28, 2022

Maybe it's me, but I don't recall having seen many criticisms of Aella's or Scott's polls that don't suggest any mechanism for why the selection bias could be affecting the results. And I wouldn't expect tweets to flesh out every argument in the mind of whoever wrote them.

But yeah, I get that published psych papers deserve more scrutiny and it can be frustrating, and also no information should be fully discounted because one can think of possible bias.

Expand full comment

...so is this article an argument that everyone needs to specify "relevant selection bias" instead of just "selection bias"? Would that extra word satisfy the complaint?

Expand full comment

Re-read the last sentence. It’s that “selection bias” is not a reason to dismiss a survey. All surveys have selection bias. The difference is how you deal with it or what you are trying to conclude.

Expand full comment

Thing is, the questions Aella asks really are polls, so the answers she's going to get are going to be subject to selection bias. And tbh, I'd expect Aella followers to be a particularly unrepresentative group, because Aella herself is deeply weird.

But to be fair to Aella, she's smart, and I'm sure she realises that the results from her survey questions don't generalise to wider society. Probably the response she'd get from your average normie is "why the hell should I care about this dumb hypothetical scenario?"

Expand full comment

> Sometimes the scientists will get really into cross-cultural research, and retest their hypothesis on various primitive tribes - in which case their population will be selected for the primitive tribes that don’t murder scientists who try to study them.

Given how (relatively) frequently scientists try this, and how few people live in primitive tribes, does that mean some primitive tribes are spending a significant part of their day-to-day life responding to scientific surveys?

Expand full comment

Selection bias is correlated with the topic of the online survey. When you post a survey online, it gets passed around among people with an interest in the topic you're surveying. If that banana-IQ survey gets passed around a forum dedicated to the banana-IQ hypothesis and populated by people who are going to give 110% on the IQ test because they care a lot about the topic, you have a problem with bias that a static group selection will never have.

This is actually great if you want to, say, find out what the other beliefs of banana-IQ believers are, but you can't test the banana-IQ hypothesis that way.
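A toy sketch of that mechanism, with made-up numbers: bananas never affect anyone's score, but the sample mixes ordinary respondents with forum believers who both eat more bananas and grind harder on the test, which manufactures a positive correlation in the pooled data.

```python
# Topic-driven sharing as a source of spurious correlation. Invented numbers.
import numpy as np

rng = np.random.default_rng(2)
n_public, n_forum = 2_000, 2_000

# Ordinary respondents reached by the posting.
public_bananas = rng.poisson(2, size=n_public)
public_score = rng.normal(100, 15, size=n_public)

# Forum believers the survey got passed around to.
forum_bananas = rng.poisson(6, size=n_forum)          # fans eat more bananas
forum_score = rng.normal(100, 15, size=n_forum) + 8   # extra effort on the test, not extra IQ

bananas = np.concatenate([public_bananas, forum_bananas])
score = np.concatenate([public_score, forum_score])
print(np.corrcoef(bananas, score)[0, 1])   # positive, despite no causal link in either group
```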

Expand full comment

"I think these people are operating off some model where amateur surveys necessarily have selection bias, because they only capture the survey-maker’s Twitter followers, or blog readers, or some other weird highly-selected snapshot of the Internet-using public. But real studies by professional scientists don’t have selection bias, because . . . sorry, I don’t know how their model would end this sentence."

Certainly there are many people who are inconsistent on this, but "it's fine because academic psychology does it" is only valid if academic psychology is actually fine. As a wise man once argued in "The Control Group Is Out Of Control" (https://slatestarcodex.com/2014/04/28/the-control-group-is-out-of-control/), academic psychology is not fine, and its methods and epistemics don't even suffice to disprove psi powers.

Even when used well, the method of running surveys to uncover psychological truths is generally pretty weak, and it is extraordinarily hard to use well. Most attempts produce noise and nonsense. This is as true of Twitter surveys as it is of academic surveys.

Expand full comment

Here is a study that successfully replicated studies based on convenience surveys, using representative sampling: https://www.pnas.org/doi/full/10.1073/pnas.1808083115.

Expand full comment

I think this is the first time I’m sure Scott is completely wrong about something.

Aella polls are completely and totally useless. If you take a sample of psych undergraduates, your result will be biased. But there’s a lot of diversity within that population of psych undergraduates. The results may not generalize, but if something is true of that population there’s good reason to think it may be true of people in general (obviously more representative samples are better).

In contrast, the type of people who follow and engage with Aella are nearly a distinct population. Her content is directed at such a strange consortium of techno-optimists/crypto people/sex workers/intersectionalists/etc. that none of her poll results have any validity whatsoever. The type of person who regularly answers an Aella poll is simply built different; they are not representative of any population except themselves.

You can’t just throw your hands up in the air and say “Well, everything is biased anyway who knows what the truth is.” If I take a survey in front of the Hershey’s chocolate factory asking people what their favorite candy is, it is a shit survey. If you don’t try to control bias you might as well not bother doing a study.

Expand full comment

And you are deriving this from the 1% of people who comment on her work or based on what all of her followers tweet about?

Expand full comment

Thx, Kevin. Though, I don't share your opinion. After all, how can you tell there is not any bias? My comment was pointing to the fact that correlations are hard to detect, but they will always appear.

Expand full comment