if you pretrain an absurdly small (0.47M-parameter) language model on nothing but three of tim williamson’s books, it looks vaguely self-aware…