opus 4.6 does still struggle a decent amount with research 'creativity'
have been running my own claude-powered rl research loop (~@karpathy autoresearch stuff) for the past 3 weeks and the one thing I have to keep nudging it on is trying more exotic things
no amount of subagents or markdown seems to be enough to pull myself out of that part of the loop
for now, I've settled on:
1. [weekend] align on a research plan (scope/budget, 'let's explore these 3 abstract questions, you get $1k of fireworks ai credits')
2. [week] completely hands off, let it loop, run the plan and spin off sub-plans to get as much empirical research signal as possible, converting compute into signal and hillclimbing a scoring metric
3. [weekend] take all the logs/reports from the week and align on a new research plan that's about a week's worth of looping and $1k worth of gpu compute
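
rough sketch of what step 2 might look like as a budget-capped loop (python). everything here is an assumption for illustration, not my actual harness: `Plan`, `run_experiment`, `weekly_loop`, the flat $50/run cost and the random score are all hypothetical stand-ins.

```python
import random
from dataclasses import dataclass, field

COST_PER_RUN_USD = 50.0  # assumed flat cost per experiment, purely illustrative


@dataclass
class Plan:
    questions: list      # the abstract questions agreed on over the weekend
    budget_usd: float    # hard spend cap for the week
    log: list = field(default_factory=list)  # one entry per run, reviewed next weekend


def run_experiment(question, rng):
    # hypothetical stand-in for a real training/eval run; returns a score in [0, 1)
    return rng.random()


def weekly_loop(plan, seed=0):
    """Cycle through the questions until the next run would exceed the budget."""
    rng = random.Random(seed)
    spent, best, step = 0.0, None, 0
    while spent + COST_PER_RUN_USD <= plan.budget_usd:
        question = plan.questions[step % len(plan.questions)]
        score = run_experiment(question, rng)
        spent += COST_PER_RUN_USD
        entry = {"step": step, "question": question, "score": score, "spent_usd": spent}
        plan.log.append(entry)
        # hillclimb: keep the best-scoring run seen so far
        if best is None or score > best["score"]:
            best = entry
        step += 1
    return best, spent


plan = Plan(questions=["q1", "q2", "q3"], budget_usd=1000.0)
best, spent = weekly_loop(plan)
print(len(plan.log), "runs,", spent, "usd spent, best score", round(best["score"], 3))
```

the point of the shape: the budget is the only stopping condition, and `plan.log` is what gets carried into the weekend review in step 3.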
aligns with my building thought that ai is best for taking 'larger leaps', pushing human effort to higher and higher level loops