demost_ on Astral Codex Ten


Jan 21, 2023

I was curious about the two Moscow-Paris questions, so I'll scoop Scott on those. For the analysis, I restricted to the 6378 people who answered both questions.

Before I started the analysis, I suspected that for a question like this, you should take the geometric mean (GM) instead of the arithmetic mean (AM). In other words, the data will make more sense on a log scale. Indeed:

- The true answer is 2,486km.

- The arithmetic mean of all estimates is very bad. For the first estimates the AM is 7,088km, for the second estimates it is 9,331km, and for first and second estimates together it is 8,210km.

- The geometric mean of all answers is pretty good. For the first estimates the GM is 2,722km, for the second estimates it is 2,961km, and for first+second it is 2,839km. That is only 9% / 19% / 14% above the truth.
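For anyone who wants to see why the two means behave so differently, here is a minimal sketch. The estimates in `toy_estimates` are made up for illustration and are not the survey data; the function names are my own.

```python
import math

def arithmetic_mean(xs):
    """Plain average on the original (linear) scale."""
    return sum(xs) / len(xs)

def geometric_mean(xs):
    """Average on the log scale, then convert back by exponentiating."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

TRUTH_KM = 2486  # true Moscow-Paris distance quoted above

# Hypothetical guesses (NOT the survey data): mostly reasonable, one huge outlier.
toy_estimates = [1000, 2000, 2500, 3000, 40000]

am = arithmetic_mean(toy_estimates)
gm = geometric_mean(toy_estimates)

print(f"AM = {am:.0f} km, GM = {gm:.0f} km, truth = {TRUTH_KM} km")
```

A single large outlier drags the AM far from the truth, while the GM moves much less because the outlier is only "a few doublings" away on a log scale.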

Now to the interesting part: If you only have access to yourself, should you a) trust your first guess, b) trust your second guess, or c) trust the GM of your two guesses? In my book review on Consciousness and the Brain [1], I mentioned a paper [2] which claims that you can get a better estimate by taking the mean of your two answers. I didn't have too much trust in it, so let's see:

For each estimate I computed a factor F >= 1 by which the estimate was off. So I computed the quotient "estimate/truth" if the estimate was larger than the truth, and computed "truth/estimate" otherwise. This is equivalent to the distance from the truth if we convert the data to a log scale. (The case distinction is because "the distance" is the absolute value of the difference.) The result:

- The first estimate was off by a factor of 1.815. (This means that the GM of all those factors was 1.815.)

- The second estimate was off by a factor of 1.901.

- The GM of each person's two estimates was off by a factor of 1.791.
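The off-factor described above can be sketched as follows. This is only my reading of the procedure put into code, with hypothetical function names and toy numbers rather than the survey data. In symbols, F = max(e/t, t/e) = exp(|ln e - ln t|), and the "typical" factor is the GM of the individual factors.

```python
import math

def off_factor(estimate, truth):
    """Factor F >= 1 by which an estimate misses the truth, in either direction."""
    return max(estimate / truth, truth / estimate)

def typical_off_factor(estimates, truth):
    """GM of the individual off-factors, i.e. exp of the mean absolute log error."""
    log_errors = [abs(math.log(e) - math.log(truth)) for e in estimates]
    return math.exp(sum(log_errors) / len(log_errors))

def combined_guess(first, second):
    """GM of one person's two guesses."""
    return math.sqrt(first * second)

TRUTH_KM = 2486

# Half the truth and double the truth are both off by a factor of 2.
print(off_factor(1243, TRUTH_KM), off_factor(4972, TRUTH_KM))

# Toy example of one person's two guesses (made up, not from the survey).
first, second = 2000, 4000
print(off_factor(combined_guess(first, second), TRUTH_KM))
```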

To look at it another way, I removed the 75 answers where both estimates were equal, and asked:

- How often was the first estimate better than the second: in 53.3% of the cases.

- How often was the GM better than the first estimate: in 52.8% of the cases.

- How often was the GM better than the second estimate: in 60.0% of the cases.
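The head-to-head comparison can be sketched like this. The pairs in `toy_pairs` are invented for illustration, the helper names are hypothetical, and the tie handling (a strict "less than" on the off-factor) is an assumption of this sketch.

```python
import math

def off_factor(estimate, truth):
    """Factor F >= 1 by which an estimate misses the truth."""
    return max(estimate / truth, truth / estimate)

def head_to_head(pairs, truth):
    """Return the share of cases where (first beats second, GM beats first, GM beats second).

    Pairs with identical first and second guesses are dropped first.
    "Beats" means a strictly smaller off-factor (an assumption of this sketch).
    """
    kept = [(a, b) for a, b in pairs if a != b]
    first_wins = gm_beats_first = gm_beats_second = 0
    for a, b in kept:
        fa, fb = off_factor(a, truth), off_factor(b, truth)
        fg = off_factor(math.sqrt(a * b), truth)
        first_wins += fa < fb
        gm_beats_first += fg < fa
        gm_beats_second += fg < fb
    n = len(kept)
    return first_wins / n, gm_beats_first / n, gm_beats_second / n

TRUTH_KM = 2486

# Hypothetical (first guess, second guess) pairs, NOT the survey data.
toy_pairs = [(2000, 3000), (2500, 5000), (1000, 4000), (3000, 3000), (6000, 2000)]

shares = head_to_head(toy_pairs, TRUTH_KM)
print(shares)
```

Note that the GM of two guesses always lies between them on a log scale, so it can only lose to both guesses if they are both closer to the truth than their midpoint, which cannot happen. That is why "GM beats first" and "GM beats second" together must cover every case.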

So what is the conclusion? First of all, the second estimate was clearly worse than the first. My partner said before the analysis that naively we should expect the opposite: when you answer the second question, you have thought twice about the problem (and possibly harder the second time), so you have considered more information for your second estimate. Shouldn't this improve your answer?

My best explanation is that the second question asked you to imagine that your first answer was off by a non-trivial amount. This might push you away from whatever correct reasoning you had. But the paper [2] also found that the second estimate was much worse for their questions, which were probably not phrased like this. And they found the same effect when there were three weeks between the two questions. I am not sure why the second estimate is so much worse.

But coming back to our ACX question: even though the second answer is not really good, the GM of both estimates is still slightly better than the first one, though the advantage is small. For a random person, the probability that the GM is better than the first answer is slightly higher than 50%. The factor by which you are off is better for the GM, but only by a small amount.

Overall, I could reproduce the conclusion from [2], though the effect looks pretty small to me, while it was huge in [2]. This could be because they used a very different type of analysis (arithmetic mean + mean square error). Mean square errors punish outliers a lot, so it may help the mean to shine.
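To illustrate the point about outliers, here is a small sketch comparing how much a single wild guess inflates a mean square error versus a log-scale error. It does not reproduce the analysis in [2]; the numbers are made up and the function names are my own.

```python
import math

def mean_squared_error(estimates, truth):
    """Average squared distance from the truth on the linear scale."""
    return sum((e - truth) ** 2 for e in estimates) / len(estimates)

def mean_abs_log_error(estimates, truth):
    """Average distance from the truth on the log scale."""
    return sum(abs(math.log(e / truth)) for e in estimates) / len(estimates)

TRUTH_KM = 2486

# Hypothetical guesses (NOT the survey data), with and without one wild outlier.
without_outlier = [2000, 2500, 3000]
with_outlier = without_outlier + [40000]

mse_ratio = (mean_squared_error(with_outlier, TRUTH_KM)
             / mean_squared_error(without_outlier, TRUTH_KM))
log_ratio = (mean_abs_log_error(with_outlier, TRUTH_KM)
             / mean_abs_log_error(without_outlier, TRUTH_KM))

print(f"MSE grows by a factor of {mse_ratio:.0f}, log error by a factor of {log_ratio:.1f}")
```

In this toy case one outlier inflates the MSE by three orders of magnitude but the log-scale error by less than one, so any method that tames outliers (like averaging two guesses) will look far more impressive under MSE.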

Still, it's remarkable that you can take the (much worse) second guess to improve your first one. Apparently, you can gain a little bit from harvesting the wisdom of your inner crowd, even if your second guess on its own is less accurate.

[1] https://astralcodexten.substack.com/p/your-book-review-consciousness-and

[2] https://journals.sagepub.com/doi/10.1111/j.1467-9280.2008.02136.x (paywall, but accessible through sci-hub if that is legal for you)
