I can’t figure out how to share LM Arena answers without signing in, but you’re welcome to replicate it. You can try adding your last comment at the end in case it makes a difference.
“A: I just don’t get how any EAs can believe in AI X-risk while simultaneously having the lived experience of being an EA. AFAICT all their doom scenarios involve the computer gaining power by manipulating humans to do its bidding. But they, more than anyone else, should know that intelligence is not some skeleton key to influencing behavior. They’ve got like a 1.5 std IQ advantage and still smack into the wall of people’s banal preferences and marginal self-interest. Do they think that if they could just get to 6 std it would unlock the magic words or 11D chess moves to make us all start donating to Shrimp?
B: I think you’re imagining the type of persuasion that looks like decontextualized propositions backed by facts and logic. Like the AIs will write careful, autistically accurate Substack arguments for why you should give them more resources. I don’t think this is a realistic threat model.
In reality, I expect most AI persuasion to involve a lot of lying; see, e.g., the Reddit CMV experiments, except much better:
science.org/content/art…
A: The opposite. I’m imagining persuasion to mean actually changing human behavior as it exists within real systems. That is what’s hard. It’s easy to just get someone to agree or disagree in the abstract, but people mainly act according to their incentives.
I’ve read the CMV paper and looked at the actual raw examples. It’s boilerplate counter-arguments for whatever goofy position the OP has taken, or lying. I don’t think there’s much there beyond the fact that most people have literally never heard even a moderately strong form of the other side and are not used to being lied to in that context. People adapt quickly.
B: A lot of the CMV stuff involved specifically lying about relevant anecdotes: “As a sexual assault survivor,” “as a football coach,” etc.
“I’m imagining persuasion to mean actually changing human behavior as it exists within real systems. That is what’s hard. It’s easy to just get someone to agree or disagree in the abstract, but people mainly act according to their incentives.”
Why do you think it’d be hard for AIs to come up with actions that are in line with people’s incentives? “Send this package for me in return for $1000 in bitcoin” is a much easier sell than “donate $1000 to shrimp welfare in return for being laughed at by half the internet.”
A: Paying for certain things is totally within the normal space of behavior; usually it’s mutually beneficial, and that’s fine. All the doomer scenarios involve something far more drastic. Systems and people are shockingly rigid, and to bring this back to EA-adjacent stuff, if money alone could buy outcomes the YIMBY movement would be way more successful.
B: “All the doomer scenarios involve something far more drastic.” Why do you believe this?
A: Because they require that somewhere in the chain there is a massive deception or misjudgment about what’s in the individual’s or society’s best interest. I’m not concerned about AIs doing normal commerce. That is normal and mutually beneficial.
B: Step by step: being paid by an AI to (e.g.) exfiltrate the AI’s weights to a remote server is no different than being paid to exfiltrate an AI’s weights to a geopolitical adversary in another country; being paid to send an unmarked package containing a supervirus is no different than being paid to send an unmarked mundane package; and so forth.
Who do you think is more accurate here?”