All models were substantially better at selecting the best solution from a set of candidates than at producing a solution themselves. For example, Claude 3.5 Sonnet successfully completed 47% of management tasks, where the job is to pick among proposed solutions, but only 21.1% of implementation tasks, less than half that rate. This suggests that AI may first help engineering teams with code review and architectural decisions before it can reliably write complex code.