Great post! I highly recommend looking into the literature on assistance games, a specific proposal for getting AI to infer our intentions rather than optimize a prespecified reward. See, e.g., arxiv.org/abs/1606.03137

I think this area will become very relevant very soon, not just from a safety perspective but also for expanding the set of tasks that AI can take on. As you mention, reward design is not a great strategy.

Nov 9 at 2:51 PM