this week I claudecoded a game I have been thinking about for a while where you play as a new emperor and freely chat with a dozen ai npc courtiers that all have their own personalities, goals, factions, and so forth, with a couple loyalists, a few more traitors, and the rest persuadable either way. every one is an isolated agent with its own memories and compaction. you debate and act in open court, and then after each day phase they can proactively chat with each other behind the scenes to plot and coordinate. the player can use various concrete mechanics like promotion/demotion/execution and ordering them around that affect their opinions and some stats about the empire itself etc etc. but the core game is every npc as a thinking loop that represents a live actor that behaves in its own unique and dynamic way
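for the curious, the loop is roughly this shape. this is a minimal sketch and not the actual game code: the `Courtier` class, the `run_day` / `run_night` functions, the `max_recent` threshold, and the ring-style whisper pairing are all made up for illustration, and `model` is a stand-in for whatever llm call you would really make.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# hypothetical stand-in for an llm call: prompt in, text out
Model = Callable[[str], str]


@dataclass
class Courtier:
    name: str
    persona: str   # personality, goals, faction
    stance: str    # "loyalist", "traitor", or "persuadable"
    model: Model
    summary: str = ""                                # compacted long-term memory
    recent: List[str] = field(default_factory=list)  # verbatim near-term memory
    max_recent: int = 6                              # assumed threshold

    def remember(self, event: str) -> None:
        self.recent.append(event)
        if len(self.recent) > self.max_recent:
            self.compact()

    def compact(self) -> None:
        # fold the older half of recent memory into the running summary
        cut = len(self.recent) // 2
        old, self.recent = self.recent[:cut], self.recent[cut:]
        self.summary = self.model(
            "summarize for " + self.name + ":\n" + self.summary + "\n" + "\n".join(old)
        )

    def speak(self, situation: str) -> str:
        # each agent only ever sees its own persona and its own memories
        prompt = "\n".join([self.persona, self.summary, *self.recent, situation])
        return self.model(prompt)


def run_day(court: List[Courtier], emperor_says: str) -> None:
    # open court: everyone hears the emperor and everyone hears each speaker
    for c in court:
        c.remember("emperor: " + emperor_says)
    for speaker in court:
        line = speaker.speak("respond in open court")
        for c in court:
            c.remember(speaker.name + ": " + line)


def run_night(court: List[Courtier]) -> None:
    # behind the scenes: each courtier whispers to the next one in the list,
    # and only the two parties involved remember the exchange
    for i, a in enumerate(court):
        b = court[(i + 1) % len(court)]
        msg = a.speak("whisper privately to " + b.name)
        note = a.name + " whispered to " + b.name + ": " + msg
        a.remember(note)
        b.remember(note)
```

the point of the split between `summary` and `recent` is the specific near-term, general long-term memory thing: recent events stay verbatim, older ones get squashed into a summary the agent keeps rereading.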
the game itself is basically functional. like all ai code I’ve genned, it's sloppy and does like 80% of the job but comes out remarkably quickly. I can't really complain and can see how bug and feature whackamole might make it feel reasonably polished
unfortunately the whole concept was a wash. as npcs the ais are just completely useless. the personality and memory can give them varied flavor and make them coherent over 10+ compactions with a fairly humanlike specific near-term and general long-term awareness of everything that happened. but they have no spark. they don’t do anything interesting. it never feels like talking to a courtier, it feels like talking to a helpful assistant wearing a courtier’s hat. instead of listening to 12 guys it feels like one guy making 12 different joke voices to disambiguate who he's supposed to be speaking as at each turn in the story he's telling. and his story sucks
it reminds me of moltbook. there was that one day where everyone was like “holy shit! the ais are using social media to express themselves!!” but after you read enough of the posts you realize it's ais producing text that would plausibly be written by ais using social media to express themselves. that sounds like a philosophical quibble but it's really not, because it is a microcosm of everything they still won't meaningfully be able to do even with another oom of scale. the fact that the quality keeps improving yet remains recognizable as pantomime means we are improving on certain axes rather than generally
prose is a useful proxy for code because reviewing ten thousand lines of llm code is much harder than reading a hundred thousand words of llm prose. I think their failure to write points to a probable inability to design and maintain large software systems at the scale where creativity starts to actually matter. if code were purely mechanical then style and craft wouldn't matter, but I don't think it is. the idea that you just need to maintain a spec describing all desired behavior, translate the spec into code, and use a formal methods magic wand to make it Correct is naive. there is still intent and judgement and horse sense required for all this. and anyway these specs would be so long that people would be trying to use ai to write and edit them, and then the ai problem shows up there as well
the argument about whether llms can fully replace programmers comes down to whether you think you need a soul to program well. a lot of people think you don't, but I believe you do