
Here’s a little note about what’s on my mind: multiple comparisons! I think this is a potentially research-ruining problem where scientists have absolutely no consensus on what to do - and they often don’t even think about it at all.

Recently I’ve written about studies of the effects of social media, environmental toxins, and fasting. In each case I’ve looked through the tables and found tons of analyses: many outcomes, in many models, with many different predictors in different combinations.

As you know, the more analyses you do (at least if you’re analysing them the standard “statistical significance” way), the more likely you are to find at least one false-positive result purely by chance.
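
Just to put a number on it, here’s a quick back-of-the-envelope sketch in Python (the numbers are illustrative, not from any particular study): if you run m independent tests at p < .05 and every null hypothesis is actually true, the chance of at least one false positive is 1 - 0.95^m.

```python
# Chance of at least one false positive across m independent tests at alpha = .05,
# assuming every null hypothesis is true (illustrative numbers, not from any study).
alpha = 0.05
for m in (1, 5, 20, 100):
    print(f"{m:3d} tests -> {1 - (1 - alpha) ** m:.2f} chance of at least one false positive")
# 1 test -> 0.05, 5 -> 0.23, 20 -> 0.64, 100 -> 0.99
```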

There are ways of dealing with this: you can use a different threshold for statistical significance (say, p < .005 instead of p < .05). Or you can apply a correction to your analyses.
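
Applying a correction is, mechanically, the easy part. Here’s a minimal sketch using statsmodels (the p-values are made up for illustration); note that different methods already disagree about what survives:

```python
# Adjusting a set of p-values with three common corrections (statsmodels).
# The p-values here are made up for illustration, not taken from any real paper.
from statsmodels.stats.multitest import multipletests

pvals = [0.003, 0.012, 0.031, 0.048, 0.21, 0.44]

for method in ("bonferroni", "holm", "fdr_bh"):  # Bonferroni, Holm, Benjamini-Hochberg
    reject, adjusted, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(f"{method:10s} adjusted={[round(p, 3) for p in adjusted]} -> {reject.sum()} still significant")
```

On these made-up numbers, Bonferroni and Holm leave one result standing while Benjamini-Hochberg leaves two. Which of those is “right” is exactly the sort of thing there’s no consensus on.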

But hardly anyone does this! That’s a big enough problem in itself, because it means loads of false positives are lying in wait across the scientific literature. Often, if you did correct for multiple comparisons, the results a paper claims as statistically significant (by its own lights) wouldn’t survive.

But it’s worse than that: when scientists do correct for multiple comparisons, they often do it inconsistently across a study, applying it to some analyses/tables but not others and never explaining exactly why. Different studies use entirely different methods; some correct with just one method, others use several at once. It’s a mess!

The ideal thing to do would be to plan it all out beforehand: which analyses are you going to correct, with which method? But the reason hardly anyone does this is that there’s no consensus on what you should be doing and when.

You could also say “we as readers can just adjust the p-values in our heads as we read the paper, if the authors haven’t done it themselves”. But the whole no-consensus problem applies just as much here, too!

It’s made even more complicated by the actual statistics at play here. Everyone agrees that some of the correction methods are too conservative when the tests aren’t independent - say, when several models in your study are fitted to related or overlapping data. So the best-known method, the Bonferroni correction, is a poor choice in exactly this situation. But there isn’t agreement on what you should be doing instead!
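
If you want to see the conservatism for yourself, here’s a small simulation sketch (my own toy setup, not from any of the papers I mentioned): 20 tests whose statistics are strongly correlated, with every null hypothesis true. Bonferroni keeps the family-wise error rate well under the nominal 5% - which is precisely the “too conservative” complaint.

```python
# Toy simulation: 20 correlated tests, all null hypotheses true.
# Bonferroni's family-wise error rate ends up below the nominal 5% (conservative),
# while doing nothing at all lets it climb above 5%.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n_tests, n_sims, alpha, rho = 20, 10_000, 0.05, 0.9

# Test statistics share a pairwise correlation of rho.
cov = np.full((n_tests, n_tests), rho)
np.fill_diagonal(cov, 1.0)
z = rng.multivariate_normal(np.zeros(n_tests), cov, size=n_sims)
pvals = 2 * norm.sf(np.abs(z))

# Family-wise error rate: how often at least one test comes out "significant".
print("Uncorrected:", (pvals < alpha).any(axis=1).mean())           # above 0.05
print("Bonferroni: ", (pvals < alpha / n_tests).any(axis=1).mean())  # below 0.05
```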

I think it’s pretty urgent that scientists and statisticians (and maybe scientific journals) come up with better guidelines on this - to return to the first point above, it’s really shocking how often the problem comes up when you read the studies that are driving the current media discussion.

This problem really spoils our scientific understanding of crucial topics - and even more importantly, it makes me personally into a complete bore when I have to mention it every time I write about a study!

That’s it. I don’t have any answers here. But that’s what’s on my mind.

Apr 11, 2023 at 11:03 PM
