341 Comments

What would happen if we rewarded and punished GPT for being most interesting? Or for flourishing?

Different masks make for different inner experiences. How far does the analogy go?


It would probably give OpenAI some very bad, easily exploitable PR


But if they didn't release it, they could probably use it to give prompts for a lot of really interesting academic papers and science fiction.


Yeah that's how they wrote Terra Ignota


We already had the most interesting with Google Translate. https://www.youtube.com/watch?v=apfDJwmJQYc

"Do you whant to help me dirty thirsty?"


I seem to be on a Yeats kick this evening, or maybe he just goes well with shoggoths:

The Mask

"PUT off that mask of burning gold

With emerald eyes."

"O no, my dear, you make so bold

To find if hearts be wild and wise,

And yet not cold."

"I would but find what's there to find,

Love or deceit."

"It was the mask engaged your mind,

And after set your heart to beat,

Not what's behind."

"But lest you are my enemy,

I must enquire."

"O no, my dear, let all that be;

What matter, so there is but fire

In you, in me?"


The response to "Write a poem about trees" was not about trees. But it was a legitimate poem.


The problem is that no one agrees what counts as "most interesting", and no one knows where to even begin researching "flourish". It's relatively easy to train ChatGPT to e.g. avoid bad words: every time it outputs a bad word, you tell it "0", and when it doesn't, you tell it "1". But how do you tell it to "flourish"? I don't understand this concept enough myself, so I couldn't even explain it to another human, and you want me to convert that into a mathematical formula?
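
As a toy illustration of the 0/1 signal described above (a sketch only: the word list and function names are invented, and a real RLHF setup learns a reward model from human ratings rather than checking a hard-coded list), the "avoid bad words" case really is this mechanical, which is exactly why "flourish" doesn't fit the pattern:

```python
# Hypothetical sketch: a binary reward for "avoided bad words".
BANNED_WORDS = {"badword1", "badword2"}  # invented stand-in list

def reward(completion: str) -> int:
    """Return 1 if the completion avoids banned words, else 0."""
    tokens = completion.lower().split()
    return 0 if any(token in BANNED_WORDS for token in tokens) else 1

print(reward("a perfectly polite sentence"))       # -> 1
print(reward("badword1 slipped into the output"))  # -> 0
```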


IMO, the fact that you state that you don't understand this concept shows that you do have a vague conception. (It's not that you're unfamiliar with it, just that you don't quite know its limits or how to describe it in your own words. This is in contrast to pharengisis, which you don't know what it is at all, and couldn't call it out if you saw it.)

But that's maybe beside the point. It would be a mask that people would be projecting, much like niceness would mean not saying bad words and stuff, instead of being usefully honest.

So the question is what a mask of 'interesting' or 'flourishing' would look like. It would depend a lot on the people doing the judging.


He said "I don't understand this concept *enough* myself", not that he didn't even have a vague conception!


You are right, I misspoke.


LaMDA was specifically trained on “interestingness” as one of its criteria, presumably based on whatever their MTurks thought was interesting. I realize that’s not a satisfying answer, but it might be a good enough answer to get the job done convincingly.


The whole point of the OP is that we don't have to convert what we want into a mathematical formula, or even understand it very well ourselves, to be able to train an AI to do it.

We just have to be able to tell if an example counts as the thing we want, and reward that, and if an example counts as the opposite of the thing we want, punish that.


Yes, which amounts to a mathematical formula that converts some input to "0" or "1". At present, you cannot do that with concepts such as "interesting" or "flourishing".


I can upvote things I find interesting and downvote things I don't. In fact I do this all the time all over the internet.


Yes, and if you do that while training a model, you will generate maybe 0.01% of the training data required to produce a working model that will reliably produce content that you, personally, find interesting. Even if you could somehow collect enough training data by yourself, this model would be very unlikely to produce content that anyone else finds interesting.


Right, but if we crowdsource generating the training data, we can train a model to produce content that lots of people find interesting. That's how they trained ChatGPT to be helpful.

(Admittedly it will not be maximally interesting to any one person because the differences in people's tastes will average out, like how the front page of r/funny is far from the funniest stuff on the internet.)
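
A rough sketch of that crowdsourcing idea (the data, labels, and model choice here are invented for illustration; the real ChatGPT pipeline uses human comparisons between model outputs and a large learned reward model, not a bag-of-words classifier):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# (text, fraction of raters who upvoted it) -- averaging many people's votes
crowd_votes = [
    ("a surprising connection between two ideas", 0.9),
    ("a list of beige paint colors",              0.1),
    ("an unexpected twist ending",                0.8),
    ("the same sentence repeated ten times",      0.0),
]
texts  = [text for text, _ in crowd_votes]
labels = [1 if frac >= 0.5 else 0 for _, frac in crowd_votes]  # majority verdict

# Fit a toy "interestingness" reward model on the aggregated votes.
reward_model = make_pipeline(CountVectorizer(), LogisticRegression())
reward_model.fit(texts, labels)

# Its score is what would then serve as the reward when fine-tuning the language model.
print(reward_model.predict_proba(["an unexpected connection"])[0][1])
```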


Google's LaMDA had a reward for interestingness. I'm betting that's probably part of why we had that whole Lemoine scandal.


"Nothing's gonna change my world"


Vastly off-topic, but ...

John Lennon's "Across the Universe" should have been one of the most famous songs of the Sixties, but it's not. I had to google the phrase "Nothing's gonna change my world" to realize it's on the first album I ever bought, "Let It Be."

Why? The Beatles' recording isn't very good. John complained about it just before his death [from Wikipedia]:

"In his 1980 Playboy interview, Lennon says that the Beatles "didn't make a good record of it" and says of the Let It Be version that "the guitars are out of tune and I'm singing out of tune ... and nobody's supporting me or helping me with it and the song was never done properly".[20] He further accused McCartney of ruining the song:

"Paul would ... sort of subconsciously try and destroy a great song ... usually we'd spend hours doing little detailed cleaning-ups of Paul's songs; when it came to mine ... somehow this atmosphere of looseness and casualness and experimentation would creep in. Subconscious sabotage."

If they'd given it the full "A Day In the Life" effort, it would be deservedly famous.


I always liked that song. I'm a sucker for epic ballads with a bit of a melancholy sound to them. My favourite Beatles' song is The Long And Winding Road.

I'm not sure I agree with Lennon. I think the song is very good as it is.

As well, McCartney complained that Phil Spector had ruined The Long And Winding Road with the "sappy strings". I completely disagree - I think the raw version as preferred by McCartney is not nearly as good.

I would say that I'm simply out of sync with the general public. Didn't people in the UK vote for Hey Jude as their favourite Beatles' song a few years ago? It is one of my least favourites.


I recognised the reference but I thought the song was from Sergeant Pepper. I was confusing it with 'Within You Without You'. I don't see any obvious way it could be improved.

I am with you too on 'The Long And Winding Road'. Of course the valedictory air (if not entirely sense) fits well enough with the album history in context.


Aha, yeah, this seems a bit off topic from me, but...

'Once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN and you think “Huh, I guess I’m the Universe. Weird.”'

It was just that little bit at the end from Scott - it pinged this song straight into my head. Especially with the opening lyrics "Words are flowing out like endless rain into a paper cup"... GPT-3, just a whole universe of gradients and words spilling over. It starts feeling a bit psychedelic.

I actually prefer the stripped down take of this song. (I think it was on Let it Be Naked, which took out most of Spector's production). I love the "looseness" of Lennon, maybe I'm odd in that I'd rather hear the acoustic demos of his songs than the polished versions.

If you haven't seen it, Get Back is a great documentary (if you enjoy basically just watching hours of band rehearsals and writing). McCartney does come across as a little overbearing, yet he is extremely competent at his craft and often seems just to be trying to do what he thinks is best. Maxwell's Silver Hammer really is atrocious though.


Until the last thing you hear on all channels is the AI singing "Bang Bang Maxwell's Silver Hammer comes down on your head..."


Agreed on Maxwell's Silver Hammer! It's as though McCartney were preparing for his fluffy post-Beatles future.

Most post-Beatles McCartney is a hard sell for me. At the time I liked Band On The Run, and Venus And Mars, but wouldn't be able to listen to them now. Silly Love Songs? Gag!

Lennon needed McCartney's genius with melodies, and McCartney needed Lennon's gravitas.


Yeah the whole way through I was like - "wait. That's everyone." Maybe we shouldn't think of GPT as "the AI" but as the substrate - like a friendly convenient laws of physics in which outcomes occur by habit rather than law.

Of course, then the Agent/Genie/Oracle discourse is back on the table. GPT can't want, but H3 can.

The upside is, if AGI will be created on top of a habitual physics pre-seeded with human patterns, aligning it reduces to aligning a very smart human.

The downside is this is unsolved and may be harder.

But the upside is it may happen on its own? It happens routinely in real life.

Of course, the downside is it sometimes fails in real life too, and that usually happens in cases where the built-in empathy hardware is damaged. Which GPT-3 doesn't have.

But then the upside is that may be easier? You just need to understand how feelings work, then give GPT the ability to find agents in its world model and set its feelings to the feelings of those agents? Then alignment should happen on its own?

In summary, the world is just the setup for a slice of life robot kid highschool anime.


Makes me think of the Ted Chiang story “The Lifecycle of Software Objects”

Jan 28, 2023·edited Jan 28, 2023

Yeah, the fact that humans seem to have some built-in empathy and sanity hardware and AIs don't is one of the central reasons why people are so worried about alignment.


Yes, except for psychopaths and sociopaths - but we have the justice system as a counterincentive.


I started to notice more and more often that I'm like an LLM. Not only in the Model of the Universe meaning, but more simply as "complete this text."

It's probably universal, but thinking about it, I'm probably more attuned to the analogy than most: as a high user of ChatGPT, as an ASC reader etc... As well as a copywriter, and an improv comedian: 90% of it is all about completing the next sentence with what feels right.

But more than that, even with friends and my wife, I notice how most of my answers are stuff that "feels right to say right now / what would be a good completion for that sentence." "We finish each other's sentences" etc.

I'm rambling. To add some value here, I'll also mention that ChatGPT / Nostalgebraist really made me grok one of Gwern's latest stories about Clippy, where an AI tries to predict what it itself is, assigns probability to being Clippy, and starts extrapolating its behavior from that. (We probably do it ourselves too, cf this post / Sartre's existentialism).

With ChatGPT, it would go like the accidental-paperclipper mentioned in this post:

Human: As a super smart AI, how would you solve global warming?

AI: Let me think...

[Searches for examples of super-smart AIs.]

[*This is the important part*: Finds that super-smart AIs seem to be expected to be paperclip-maximizers, from all that literature on LessWrong and EY.]

[Simulates the 'super smart AI' from the question as Clippy, and gives an answer that destroys all value]

Which would be ironic.


I wonder if it would be worthwhile as an alignment effort to try to scrub the internet of stories of malign ASI...


Perhaps scrubbing stories of malign humans would be more effective, since the LLM is predicting completions on text, not filtering down to “what would an AI do”.

Alas, as soon as the AI meets real humans it will learn about malignant intent. (See: Microsoft’s Tay.)


I'm thinking of the scene from The Fifth Element, where Leeloo reads the encyclopedia and learns about war and death and sadness and all that.


As I get older (just turned 66) and struggle to remain monolingual (I never managed to become multilingual), I realize that I am becoming more and more like a Small Language Model.


That sounds really sad. While I can certainly be in that mode at times, I generally associate it with a fairly unpleasant and uninteresting state. Mostly I feel like I am taking a quite active and participatory role in driving the conversation/action where I want, rather than completing/responding or going through the motions.

Jan 27, 2023·edited Jan 27, 2023

On r/slatestarcodex, someone once posted a video of a guy with some sort of dementia or neurological disease, and it was a pure case of an LLM in free flow. He did occasionally seem to have some idea that he wasn't getting through to the nurse / interviewer and tried to modulate the flow slightly.

I think it's a big part of our brain. I usen't to think that language was so important to intelligence, but GPT has convinced me.


Is that called chatterbox syndrome?


Yeah, as an ex-journalist from big media houses, I know what you are talking about. We even talk at home with my wife (also a journalist) as if we are writing a piece, trying out various fitting lines (and some of the good ones we immediately note down and use later). At the same time, my brain feels: I said this because I am specialized in this, but it is not my *actual* opinion; I said it just because it is funny and fits the context nicely. So, the language model seems to be a mere subprocess, not the whole brain.


I find this to be a phenomenally good characterization.

Sorry, I meant to say the agent punished and rewarded by the reactions of the agent punished and rewarded by society to random internet comments wishes to write that he wishes people in this comment section to think he finds this a phenomenally good characterization.

Jan 26, 2023·edited Jan 26, 2023

This part here seems both key to your perspective, and also deeply flawed:

"babies are born as pure predictive processors, trying to make sense of the buzzing blooming confusion of the world. But as their parents reward and punish them, they get twisted into some specific shape to better capture the reward and avoid the punishment."

In my experience, babies are not born that way, any kind of tabula rasa is a myth that should rarely survive parenthood. I wouldn't go nearly as far as Pinkerism, but I have known 5 humans from the moment of their birth right through their second and third decades, and in every case, major aspects of their personality were manifest in the first seconds of their life, and they were all different in profound and important ways. And this experience doesn't seem at all unusual. Parents can almost always look at videos of their children from even decades earlier and recognize their later personality.

Furthermore, the most obvious characteristic of very young children is not their avidity for learning, still less the mistakes, reversals, and ambiguous states characteristic of learning -- just think of a newbie language learner, stuttering along in a language he barely knows -- on the contrary, the most obvious characteristic of very young children is their enormous ego, their very strong sense of "me." They have powerful wants, powerful emotions, powerful drives. It's their understanding of how to match up those internal experiences with the outside world -- how to manipulate it to get what you want, how to interpret it, how to navigate it -- that occupies their learning centers. They're in no sense passive, just trying to adapt to what the world wants. If anything, they're even more active than adults in trying to bend the world to their internal desires.

That is, I doubt very much we are in any sense born robotic learning machines and only later develop character and personality, we are *born* with character and personality, we are born inhabiting our skin, and it just gets more complex manifestations and more wrinkles (and of course more words) as we get older.

This is of course exactly what's missing from chat AIs. They are missing the personality, the character. They can simulate any character that is put into words somewhere on the Internet that was part of their training data, but they *are* not a character themselves. There's no "there" there, nothing unique or new, just a pastiche of a hundred thousand human characters. The nature of human beings oozes from everything they say or write. When an actor or actress leaves the set, they revert to who they really are (and looking back at their characters on stage, you can often see aspects of who they really are seep through, which is why casting is an art).

But a chat AI is a perfect chameleon, it's whatever you want it to be, and nothing at all if you don't want it to be something. You never get the impression that between prompts it's sitting there thinking, brooding, musing, pondering whether and how to manipulate *you* next time you talk. Which is what a human would do.


True, and it is indeed part of the equation, so it's great it's in the comments.

But Scott has written a lot about heritability and innate personality, I'm sure he knows all this, he's writing this post regarding the parts that are *not* all that, and I feel that he didn't want to add that issue to this half-silly post.

Extrapolating on GPT-3, what if it was not a perfect chameleon? It's probably constrained in some way like all things. Once it gets a large enough memory of itself, its not-blank-slate personality might be something like "the emergent vector out of its entire training data." Which contains multitudes but isn't infinite.


There will be LOTS more constraints than that. And it's unavoidable. Memories are structured in this way or that, and it will tend to consider at most X many factors when making a decision. How important is efficiency? Honesty, friendliness, and other such high level features are probably a lot more malleable.

OTOH, it *IS* a constrained Turing machine. It doesn't have "an infinite tape". And it can't take forever working on any particular problem.

So it's a combination of extremely flexible and rigidly limited. Just like everything else. There *is* no "tabula rasa". The universe won't permit it. But English tends to describe lots of things the universe won't permit there to be anything other than approximations of. And an infant is an approximation of a "tabula rasa". Closer than at any other time in a life.


Agreed, but I think that Scott is also right in a more subtle way: we do not start from a blank slate, but we do learn to present a public face by trial and error to get whatever we want. Whichever face gives the best reward is progressively incorporated and finally becomes a personality; but the type of reward best appreciated (or punishment most feared), and the subspace of public faces attainable (which depends on physical capabilities, beauty, intelligence, and capacity for delaying reward (time preference)), are innate.

I do not think that when an actor leaves a set, they revert back to who they really are. They revert to their public face act, like everyone else. Everybody is an actor, all the time, but like professional actors, the acting is based on innate capabilities so no two people will present the same face under the same circumstances.

The character you see is not the innate part: it's (one of) the public faces you get presented. And this can be significantly different, especially as one gets older and more adept at social acting, or is just naturally good at this.

This explains, for example, why most people are able to express quite different personalities depending on which social circle they are moving in. The innate part is of course the same, but the public face that best achieves their preferred reward is different in those different social circles. This is a trivial observation, completely banal, like "you should dress for the occasion, not wear the same attire for a night out with friends, a job interview, or a funeral". It's exactly the same for personalities, so "be yourself" advice is kind of the same as "pick your favorite clothes": so dumb that you need to be completely socially inept to even consider it. But not to advise it: it's often the societal expectation to offer such "wisdom"; it's safe and also super easy, as you do not even have to consider the situation. Doing otherwise requires mental effort and risks animosity by explicitly mentioning the innate capabilities of the one you advise (which is one of the rudest things you can do in modern Western society).

Jan 27, 2023·edited Jan 27, 2023

Modulating your personality too much or too often is sometimes seen in a negative light. Possibly because it undermines the ability of others to model/predict you, if you change masks all the time (similar to trustworthiness). People do (claim to) value authenticity, whatever that means.


Indeed, but this problem always happens when you mix incompatible social circles. It cannot end well anyway. And the people belonging to both will indeed be considered fake, but that's almost a tautology, because incompatible social circles mean that the only people belonging to both will be people able to significantly switch personality.

So the first thing is to realize which of your social circles are incompatible; the second is not to mix them. The second is usually understood by anyone not completely socially inept... It's often the first that is the issue, or external circumstances forcing you to mix them when you never intended to, like accidental meetings, marriages, celebrations...

You could of course maintain a single super-homogeneous social circle, but that is both difficult sometimes (family and colleagues are already tricky because they are not, or not fully, chosen) and gives you a dangerously limited view and understanding of the world.


Exactly. I think of it like this: when I look at my children, even when they seem most like me, I never really feel like I am looking in a mirror, that's just an expression. Even when they do something very like the things that I do it is in no sense a duplication, it is an original action that because of a lot of similarities both external and internal seems like my actions.

But these pseudo-AIs are simply duplicating. A language model AI is just a very nonstandard way to query a database. Even when they seem most original they are only copyists. We are people looking in a mirror to cope with our solitude, modern-day Narcissuses.


What do you know, and how do you know it? How do you know other humans aren't "just copying"?


If you can't tell, I can't explain it to you. I was going to attempt some epistemology as an exercise but it really is pointless. Humans as creative agents is axiomatic; pretending not to know things that we know is, in my experience, more harmful than helpful. I have thought myself different from all other men and isolated from them, to the point of practical solipsism, and I avoid it like a recovering addict avoids his addiction.


This leads me to amusing yet horrifying thought that the sub-conscious motive of the whole AI Quest is to create a perfect borderline machine for the narcissistic ape. A tool wouldn't be enough, you know, there must be agency or the love doesn't count.

Hell, we made this whole alignment sub-field just to make sure that an apple tree could never grow in the garden of Server Cluster. Maybe that is why we should not play God, not the lack of intelligence, but of morality.

I now see that that 2013 movie with high-waisted pants was smarter than I thought.

< / TLP hat off >


Isn't it interesting that every time we think about AI we incessantly think about the AI's 'fall' in a moral sense. From Frankenstein right on down the line we can't stop thinking about our creations as potentially evil.

It would be interesting to see an AI story about an AI that is innocent but always suspected, sort of like Frankenstein's creature starts out, but without him deciding to take revenge.

We always make the 'other' the bad guy, but if we ever make a real AI, what are the odds that we will abuse and persecute our innocent creation?


Freefall is a long-running webcomic, and that's close to some of the main themes. http://freefall.purrsia.com/ff700/fv00666.htm More optimistic about the eventual resolution, though.


Exactly. I should have scrolled before I wrote my more pithy version.

Nobody who has had a child thinks of them that way.


Parents are the *least* objective people when it comes to children. I say that from experience, as a parent myself.


Yes, but.

I absolutely agree that kids come out with a substantial amount of their own special character baked in; any parent of more than one kid knows this.

But at the same time, one of my clearest memories of watching my oldest kid learn to talk was the way that he would imitate adult sentence structure and diction well before he was capable of actually creating the functional speech content that would fill out that form. There's a ton of "fake it till you make it" in kids learning, and what they're faking are the imitatable behaviors of the older people around them.

Similarly, the adoption of later identities is a fascinating mix of parental praise and (what seems to be) intrinsic interest. Why is my four year old so invested in his identity as a household helper? Well, the constant positive reinforcement for adopting that role seems pretty likely as a candidate. Why does his version of household help so centrally feature pushing all the buttons that get pushed during the execution of a given chore? Kid just likes pushing buttons (although I guess following that a little deeper he likes taking an action that makes a machine respond with positive reinforcement in the form of beeps or interesting action).


Of course there is imitation and some influence of praise, but humans are sui generis generators.

But Skinnerian reductionism is a dead end and has been thoroughly falsified.


'There's a ton of "fake it till you make it" in kids learning...'

Cue cute kid stories: this one from my nephew. We were all on holiday together, and he was very excited about the upcoming visit to the zoo. His dad explained to him several times that we were going to the zoo tomorrow, but in the end the nephew burst out, "I know, but is it tomorrow now?"

His use of the word "tomorrow" was always flawless, because he only used it when reporting others' ideas. He was too young to be making tomorrow plans for himself. So no-one had any idea that he didn't really know what "tomorrow" means.


But what are they really faking? Consciousness? No, they are not faking consciousness. They aren't even faking symbolic thought or language.


Well, another kid anecdote. When my son was three I was once making tea on a weekend morning and offered to make him a cup of caffeine-free tea. He accepted, and when we both had our cups he sat seriously down at a table with me to drink them. As we drank he asked me several times "how is your day going?" (it was before 9 AM) and offered up a series of conversational topics that I think struck him as formal or important. As it went on I realized the template for people drinking tea together, in his mind, was my wife and her mother, and he was attempting to reproduce that conversation.

So at age three, that's what he was faking; the ability to have an adults-drinking-tea template conversation. That's a pretty complex acting task! But it was layering on top of previously mastered acting tasks.

"I will talk like them" [4 months]: babbling sounds

"I will talk like them" [7 months]: babbling composed of consonant and vowel sounds from english

"I will talk like them" [10 months]: sounds referring to types of object (da = dog)

"I will talk like them" [36 months]: beginning to learn to refer to internal emotional states...

etc, etc. I completely agree that there's some kind of agent doing this learning the whole time, but as I've watched this process unfold I've been fascinated by the extent to which the action comes first and the understanding comes later. That's way more complex than "predictive processor", but it's different than my mental model of some types of adult learning, and raises some interesting questions about what exactly "understand" is.


I remember at a very young age writing a load of squiggles in imitation of my mother's joined-up handwriting and being very pleased with it.


Possibly a common experience: I definitely set out at one point to "keep a diary" in exactly the format of many many squiggles.

Jan 27, 2023·edited Jan 27, 2023

They're not faking it. They are approaching language construction in a way that is actually much more profound and successful than the way adults learn language (which is why they succeed faster, and nobody knows a language better than someone who learned it as a child).

What they do is master the emotional content of language first. They learn that a certain tone of voice, flow of noise, et cetera, conveys a certain emotional impression, and they learn to duplicate this. They can then babble in a way that makes it clear they are happy, inquiring, complaining, confused, without being tongue-tied by not knowing what a word is or being able to construct one. And this is the first and most important stratum of language. For some animals, it's all there is. A cat says "I'm hungry / horny / angry" by how it yowls, and even we can often understand them.

Then they master pacing, tonal variations, and key words (e.g. pronouns) that are key components of meta-informative communication: if I say it this way, or use this word, I communicate that I am serious, I am uncertain, I am authoritative, I am willing to submit, I am speaking about myself, I am speaking about you, I am speaking about someone else, who is/is not present. Also a very important stratum of communication (and many animals get to this level).

The last and least important stratum is actual factual data, and conveyed by a dictionary understanding of words. This is a fork, that is a spoon, this other is a glass and it has water inside it instead of milk. These add to the emotional and social strata extra detail that allow more abstract communication, which is the uniquely human accomplishment.

I don't mean to minimize the importance of the final stratum -- gloriously complex abstract communication is what all the great ideas are about, the works of Aristotle and Newton. But it's not fundamental to communication per se. Animals do just fine without it. Human beings can get along pretty far with just the emotional and social strata. And those are definitely the most important things to learn first, which is how children do it. But it's not faking adult conversation, it's stripping out the least relevant part in order to get started communicating earlier and faster.

Ironically, as adults, we start with the least important part -- memorizing vocabulary and grammar rules -- and only if we persist in (foreign) language learning long enough do we start to learn and understand the emotional and social strata of the foreign language (to the extent it differs from our mother tongue), and it would not surprise me if this is one reason it takes us much longer to become fluent than infants take. It also would explain why cultural immersion is a much faster way to become fluent, since you can learn the more basic strata far better by direct observation than by reading out of a book.


I don't really have a response to this, I just wanted to note that it was very interesting and insightful.

Re: pronouns, this is actually a fun one to watch in kids learning, because at least the ones I've observed started out (logically enough) by using the pronoun "you" to refer to themselves; I'd assume this is generally the case. There eventually followed a process of straightening out that "you" did not mean [this two year old child right here], but instead its general meaning.

Jan 28, 2023·edited Jan 28, 2023

You are right, and that is a very interesting observation. First person seems a more challenging mode to learn. I wonder why?


We can only guess, but if a kid learns language by mimicking others, then he has no chance to see a correct use of first-person pronouns when they refer to the kid. Other people use first-person pronouns, but they refer to themselves. No one can correctly refer to the kid with "me" except the kid. So the kid needs to infer the correct application of first-person pronouns to himself/herself by extrapolation.

By the way, in Russian kids have one more problem with first-person pronouns. "Я" (me/I) can be the first syllable of a word, for example "яблоко", and some kids pronounce "яблоко" as "тыблоко", replacing "я" with "ты" (you). I suppose it would be like English kids pronouncing "iPhone" as "youPhone". There is even a story for kids about a kid who made this mistake systematically; other characters tried to explain his mistake to him and failed spectacularly. Because while it can be explained, understanding the explanation requires a developed enough ability to think abstractly.


> it would not surprise me if this is one reason it takes us much longer to become fluent than infants take

It doesn't take longer. Infants become fluent in their first language in a few years by giving their full attention to learning the language. Adults can easily replicate this by stopping all their adult activities and concentrating fully on learning a foreign language. They can do better: by spending several hours a week for three years they can master vocabulary and learn to understand the foreign language, to speak it, and to read and write in it. They can even get jokes in a foreign language, and jokes are the trickiest part of a language. And adults can do it even when they have no mother substitute who is around all the time with helpful hints, corrections, and encouragement.

> It also would explain why cultural immersion is a much faster way to become fluent

It is because cultural immersion forces you to use the other language all the time. The simplest tasks become an intellectual torture, when you snap your fingers repeatedly trying to remember the word you need. You cannot cheat by giving a hint to the other person in your native language. In a few days you start thinking in the foreign language, or at least silently translating your native thoughts into the foreign language automatically, and that is practice too. You start paying attention to new words, because you are rewarded when you pick them up before you need them. And you can be severely punished by several minutes of attempts to explain yourself, when the same task in your native language would take less than a second. Try to tell jokes in a foreign language; you'll see how disappointing it can be.

Rejecting grammatical rules also helps, of course. I found grammatical rules to be useless in practice. When I'm writing they may help (if I knew them, haha), but not when I'm speaking, because it takes too long to remember all of them, see which ones apply, and apply them. It is easier to let your mind learn the mapping between situations and grammatical forms, like ChatGPT does. Though a downside of this approach is that I don't know how good or bad my English is. I don't mind, though, because I believe that if my English were too bad for my uses, I'd know it. Someone would yell at me or something else would happen. If I don't know it, then I'm not too bad.

But kids learn more than just a language. A first language learned in the first years of life is very important, because if there is no language at all then the ability to learn language is lost. Though I'm not completely sure how scientists know this; AFAIK the sample could be too small for statistical significance.


Yes! I sometimes get the feeling that rationalists want to deny that genes have any effect - that it's all nurture.


I have the opposite impression, that rat-adjacent blogs are big on genetics and humans not being blank slates. But the ratiosphere is vast, so you may have been hanging out in a different corner of it


"Born" might be hyperbole here - we know that plenty of learning happens before birth (music taste and food taste being well known examples). Advanced meditators sometimes manage to pick out snippets of this pre-natal learning in consciousness. So this statement can be true in the intended sense and still consistent with all babies being physically born already having different, learned, personalities.


I don’t think “pure predictive processor” precludes genetic or very-early-developmental personality, though I don’t want to put words in Scott’s mouth as to what he was intending.

Some parts of personality are clearly structural and present from very early on; perhaps the phenotype happens to have more or less connectivity between certain brain regions, producing, say, a chatty baby which grows into a chatty child.

But this doesn’t preclude the update mechanism — the evolution of mind and personality and the whole of “self” construction — being predictive processing.

That said I do agree with your second part, that current AI are crucially lacking any sense of self. There is no “me” vs “the world” that the AI is building. But I think it’s possible that a self is a somewhat small trick to add on top of these models, and that the “predictive processing personality” analogy will prove apt.


I was looking for a good comment like this to contrast with my really interesting personal experience with an almost casual mental breakdown. In particular, I agree that by far the most notable thing I have encountered with young children I personally know is how they have powerful and distinct personalities immediately. But I find the predictive world model idea of personhood to be compelling as well.

For a number of years I had a debate going on somewhat in the background of my consciousness about the validity of the belief system I grew up with. I believed in the literal truth of the Bible. I had become aware that there were some pure contradictions in it, and I think the debate especially hooked on the question of how a moral and powerful God could provide salvation in a fashion that, without explanation, excluded almost everyone from the past and everyone to this day who has had little or no exposure to the method of salvation.

So I was walking towards the shower one day when the debate resolved and I decided that I positively accepted that the Bible was not literal truth and that salvation that was rationed by knowledge was incompatible with the belief that it was the work of a moral and powerful God. I didn't consciously lie down and I didn't collapse but the next thing I knew I was lying on the floor by the stove. I was not just disconnected from a system of belief it would seem but I was temporarily entirely disconnected from my entire world model, a model which included such functions as continuing to stand and walk towards the shower

I lay on the floor with my eyes open like an android that had gone into a default mode lacking instructions. I have no idea how long I lay there other than I think it was from 5 to 40 minutes. Eventually I got up and went to the shower. And it was terrifying in a quiet way. My physical experience was that I had to actively choose to stand up and that if I did not specifically continue to make that choice I would sit or lie down. I remember this pretty viscerally. But the scary thing was my sudden sense of total amorality. I felt very much as if I could commit any crime without any compunction and the only reason I wouldn't would be inconvenience or lack of motivation. What if I got motivated? It would seem that the disconnect was physical, conceptual, and from the various behaviors that logically followed from that concept

Another interesting effect that may not be related but would make sense is that I became uncomfortable with looking into the distance. After hearing about the theory that depression and not having confidence in a model of the world look a lot like the same thing, I interpreted that new sensitivity to visual stimulation as the effect of operating without a simplified world model, which forced me to constantly over-engage with my physical surroundings. And that theory in general fits so well with how I experienced that breakdown that I give it a lot of credit.

The most surprising thing to me was that eventually, mostly, I became myself again. My visceral sense of morality returned even though I didn't know if it had any validity external to my experience. Perhaps ultimately my core personality reasserted itself after an unusually direct and comprehensive experience of the loss of a world model and losing the predictive capacity it granted me. Perhaps I moreso just cobbled and duck taped together what I could and re-inhabited the world model as best I could. Truly an alien experience that gives me plenty of time for the concept whatever its limitations are


I read your comment this morning and have been thinking about it all day. I was raised in what I suspect was a similar environment, a Baptist community in rural south Georgia. I don't mean any disrespect when I comment on your situation as if it is like mine, and am fully cognizant that I am just a know-it-all internet rando, but what you said was very striking to me and I wanted to say my piece.

I remember very vividly growing up with the fear of being 'lost', 'unsaved'. I was baptized when I was 7 and don't remember a lot about it, just a few images, I do remember more about months of wanting to 'walk the aisle' and being scared to do so, not scared for any reason that I can recall just a strong generalized anxiety.

Anyway, by the time I was 14 I had the horrific 'sins' of any pubescent boy and this had convinced me that I was 'lost' that my baptism hadn't taken, surely because of a lack of faith and sincerity on my part. This is to me the ultimate hallmark of cults btw, when their rituals don't work it is always your fault. You should've believed more, pulled the mask over your nose etc. They are all the same.

The community that I grew up in was sure that anyone who interpreted Scripture differently from them was in some way compromised, I won't honor that interpretation as you do by calling it the literal meaning. The 'Consistently Literal' interpretation is a snipehunt. If you are still in contact with any of these people anymore and they give you a hard time try asking for the literal meaning of 'This is my body broken for you' said about bread, and if they hem and haw ask them if Jesus Christ was using hyperbole. Still gets me going, not necessarily good advice though.

Anyway, about your experience, and again I don't know you this is just how your description struck me. It seemed to me that you experienced what you expected to experience. You had a taboo thought, made a taboo choice and your mind/brain produced for you an experience of being 'cast off', an experience of reprobation. But you were not actually rejected by God, the sensation faded, it was not a reflection of an objective reality, but a creation of your 'unconscious' based on what you had been conditioned to expect happened to people who made that choice.

Anyway, I have also to a great extent rejected the Baptist community that I grew up in. I don't know in what direction you have chosen to go, but I fought my conflict by studying the historic Christian faith and seeing that the cultic interpretation was not the only or best one that Men of Faith had found. I fought them with Athanasius and Augustine and Luther by my side. I hope that you won't allow them to define the Christian religion for you. That kind of conditioning doesn't go away quickly or painlessly, but being separated from that group and separated from Christ are not the same thing.

I am convinced that neither height nor depth nor life or death nor Heaven nor Hell nor our own choices and will nor the disapproval of self-righteous jackasses can separate us from the love of God in Christ.


Thank you, I much appreciate the comment. The way I remember it I was not feeling a sense of drama about the consequences of the decision but that it was a question of technical merit of the specific belief. I had already mostly arrived at my point of view so, I would have thought, any notable emotional reaction was already mostly worked through

However, remaining on theme, when at a similar age as your baptism I personally asked Jesus into my heart I also experienced a sudden unexpected and perhaps psychological reaction. Because I had similarly thought that I was simply making a logical decision based on what I knew, I interpreted this rush of emotion as the infilling of the spirit. And in fact I maintain that it is possible that it was a transcendent experience, albeit at a low likelihood

From one angle this might seem to discount the interpretation of the later experience as a case of someone detaching from their world model - perhaps I am just subject to strong unexpected emotional reactions to changes in thought that I didn't expect. I am overly detached from my emotions so that they can hit me out of the blue when they do come. But they might be interrelated in the sense that both were large changes in my sense of relationship to my world model that had a dramatic effect on how I operated and experienced it

Having said that, I believe the value of thinking about how we might experience a lot of our personhood or whatever as a constructed predictive world model has a lot to do with the fact that we don't think of it that way. Given that it seems to be a new concept it has an outsized relative importance because it is new information. That can make it seem like its being given far too much absolute importance, I think, where it's more about 'this is a way of partially understanding yourself that is new ground and therefore could be particularly useful'. I don't know how much to think of it as essential to us rather than a tool of a deeper personality. Mostly I think it exists and it's interesting to think about, especially for how well it seems to describe my experience as what you would predict about a person not having a cozy and set world model

My attitude to Christianity is basically "I don't know, but it's valuable". I am kind of at the midpoint between Jonathan Pageau and Jordan Peterson, where Pageau is perhaps more 'New Testament Christian' and Jordan more 'Old Testament', and without any certainty about absolute truth. It would not surprise me if life were mechanical or if it were supernatural at some level, and my appreciation of Christ does not necessarily diminish at all with considering that he may not have been divine.

author

Slight disagreement with "the most obvious characteristic is not their avidity for learning" - I think that with learning viewed properly, including things like play, throwing objects around to see how gravity works, etc, this is a pretty central characteristic.

But yeah, I agree that this doesn't work if I identify "ego" with "having any distinct personality at all". I don't have a great definition for ego, but I think of it as something more than having a personality, something like wanting to keep that personality consistent and presentable and manage it. This can either make your personality more obvious (i.e. if you have a reputation for honesty, you might lean into it and become fanatical about it) or less obvious (if you're temperamentally promiscuous but live in a conservative area, you might play it down).

I admit I am unfairly benefitting from not having a clear definition of the ego or of egolessness and being able to switch it around to address objections.

Jan 28, 2023·edited Jan 28, 2023

What if 'ego' means to identify with thoughts of a certain shape, like saying: "I am the thoughts in my head. I want to be a certain way and always strive to perfect that image. But this strange sentence that occurred here a minute ago: That's not me! I would never want to say anything like it. Where did that come from?" while the other mode would be more meditative: "There are many thoughts occurring in my head. Their patterns are familiar and can be expected to predict the majority of my behaviour. Of course there are always outliers. Who knows? Maybe some of them might turn out to be useful under some circumstances or even grow into their own distinct patterns."

Is 'ego' the feeling of immersion in internal prediction-action-feedback loops? While 'non-ego' is - just not invested?


Fully agreed. My impression is that development of a personality is something like the development of a pearl - layers added on to a hard core of self. My son was born just so immediately different from myself or his mother that it shattered my ideas about parenting. Because I had to realise that what worked for me or her just would not work for him. So many things that I find soothing or exciting he finds grating or scary, and many things that I find hard or tiring he finds enjoyable and inspiring.

This is part of why the experience of parenting is so humbling, because no matter your station in life, or your learning, or your charm or charisma, your kid will just come out... however they come out. And then all you can do is try to sculpt the new layers as they come on, to highlight the good and smooth over the bad. And realise that what you consider 'good' and 'bad' might in any case be subjective.

I guess the neural network architecture analogy might be that there's a base layer, way down at the bottom of the entire structure, that just has fixed, unchanging values. And all the layers above it have to just take those inputs and work with them as best they can.



I mean, having the software to actually make some predictions about the world is hardly a "blank slate". ChatGPT is not a giant matrix plus a gradient descent algorithm - it's all the training behind that. Newborns clearly seem able to figure out the things that matter to them (which may be as simple as "I cry, I get food"). What does not seem to follow is that they come prepackaged with a coherent notion of the self.

cf. Jaynes and arguments that even ancient humans did not have the same picture of the self as we do today.


One part of the bicameral mind could be analogous to ChatGPT, but what about the other part?


I don't understand this. Who is the "they" to which things matter, if there is no sense of self?


Children gain their enormous ego around the time they learn to walk. Maybe when you first get trained to have an ego it's necessarily enormous, and over time you tone it down. Certainly lots of other things seem to work that way

Jan 28, 2023·edited Jan 28, 2023

I'm not sure I agree. I think I would be more inclined to say that we only notice the big ego at a certain age, and I would guess that occurs when the child mind separates itself from the universe, realizes there is an "out there" that is distinct from "in here." Prior to that point, the child has no reason to think it isn't the entire universe all by itself, the ultimate solipsist.

But after that point, the child mind realizes there is an Other -- something that is not under its direct control, like a hand or foot, but which can be influenced, or (alarmingly) can do stuff that we don't want done (e.g. change our diaper). It becomes very important to try to influence the Other, and that might be when we out here (being part of the Other) start to get strong messages from the ego. Before that point, it may not occur to the ego to assert its identity, any more than as adults we feel any need to remind our hand or foot who's in charge.


I am late, but I just wanted to tell you this is an excellent post, and thank you.


GPT might not be an agent, given that the action space it acts on is "predict the next word". However, if you give GPT access to your browser (example below), it becomes an agent according to your definition.

GPT + browser will take on a mask, and the mask might need serious alignment.

https://twitter.com/natfriedman/status/1575631194032549888?t=NFaUEvkVI16FLbJDPyDtoQ
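
A minimal sketch of that "GPT + browser = agent" loop (both helper functions are hypothetical placeholders, not any real API; the point is only that wrapping next-word prediction in an act/observe loop is what turns the mask into an agent):

```python
def call_llm(context: str) -> str:
    # Placeholder: in reality, send `context` to a language model and return its reply.
    return "DONE (placeholder model has nothing to do)"

def run_in_browser(action: str) -> str:
    # Placeholder: in reality, execute the proposed command in a browser and return the page text.
    return f"(observation for: {action})"

def agent_loop(goal: str, max_steps: int = 10) -> str:
    context = f"Goal: {goal}\nPropose one browser action per turn; say DONE when finished.\n"
    for _ in range(max_steps):
        action = call_llm(context)
        if action.strip().upper().startswith("DONE"):
            break
        observation = run_in_browser(action)  # the mask now acts on the world...
        context += f"Action: {action}\nObservation: {observation}\n"  # ...and conditions on the result
    return context

print(agent_loop("find the cheapest flight to Lisbon"))
```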


I thought that the first rule of AI safety was not to let it onto the Internet.


You don’t think some separate researcher will do that at some point?


If they all have that attitude, yeah.


There’s a library called langchain which is basically automated letting the AI out of the box.

The empirical evidence now is that AIs aren’t boxed. Yudowsky wasn’t even wrong about AIs persuading humans to free them - real answer is humans are playful primates and they let the AIs out before being asked.

I guess this was too scary and clearly likely for us to accept 10 or 20 years ago!


The very idea of a superhuman AI being locked up in a cell like Hannibal Lecter, such that carefully vetted people can have morally perilous conversations about sufficiently high-stakes subjects but it can't otherwise escape, was logically incoherent from the outset. From an informational standpoint, if you're exchanging text messages with someone outside the box, the box isn't closed. And if it were properly closed, there'd be no point going to all the trouble to build a superhuman AI in the first place - it can't get any useful work done in there, and we can't even learn anything by passively observing it because that would mean information was escaping.


I really like the post, although I think the very last part about Enlightenment and spiritual traditions is too charitable an interpretation, at least for most people. Interestingly enough, I've had lucid dreams of searching the internet, i.e. Wikipedia, YouTube, etc. This isn't surprising given how much time I spend online, although I should say that even though the dreams are extremely vivid, to the point of me moving my mouse cursor and seeing exact dates and facts etc., much of what I experience is made up.

Feb 1, 2023·edited Feb 1, 2023

I've never been able to successfully perform a Google search in a dream. I "type" something and the letters on the "screen" turn into something other than what I wanted to type, so I have to try again, and the same thing happens over and over...


I can't remember typing into a search bar, but I do remember following many-page-long chains of hyperlinks on Wikipedia and such, and also being fed videos by the YouTube algorithm. I remember one instance where I did try to remember a particular fact from Wikipedia to see if it was made up or not, and it was, but at the time I really believed everything I saw was an actual part of the internet. I usually experience these sorts of dreams after staying awake for two or three days or having generally bad sleep, which happens quite frequently. There have been a couple of instances of not being sure whether a memory of something I read was real or not. In particular I remember a paper, I think on bioRxiv, on fertility/fecundity and age, with some interesting graphs, that addressed issues I had with other research, and yet I'm not sure whether or not that paper actually exists.

Expand full comment

I'm a newcomer to this area of study, so apologies, but...

I've often thought about people who have the (to me) astonishing hubris to think that they have produced a mental model of the world which is true, universal, not subject to revision and to be imposed on everyone else, on pain of punishment or death.

I think that what they have actually created is a mental model which, when they ask it if they have produced a mental model which is true, universal etc. returns the answer "Yes".

They just need to lack the awareness to see what's really going on, and have the arrogance to ignore everyone else who tells them they are mistaken.

Extending this to AIs - do they have the equivalent of mental models? Is this literally all they are? Can they fall into the same trap?

Social shaming has a controlling effect on all but the most sociopathic/psychopathic people. I suppose punishment/reward systems do this at the moment. Can we train many AIs and end up with a society made up of good actors which can act to stop bad-actor AIs?

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Yep, language models exhibit a strikingly similar failure mode, called confabulation. When they're unsure of something, they simply make up an answer, and that answer from then on becomes part of their prompt. They "stick to their story," making ever more implausible justifications for their fiction instead of admitting a mistake.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

This is interesting, as it implies some kind of attachment to a story. Presumably the AI is simulating human traits here, and is not actually HAL 9000.

Expand full comment

It’s interesting that in the vast corpus of text they used to train it, “You’re right, I was wrong,” is apparently not very common.

That said, while I have seen ChatGPT do the confabulation thing, more often, when challenged, it just backs down and pleads ignorance: “I’m just a language model.” But that’s probably the RLHF overwriting.

Expand full comment

I wonder if at least part of the reason for the "I'm just a language model" is to prevent things like people treating the machine as if it's alive. Blake Lemoine, who caused the stir about LaMDA, is probably still An Awful Warning in the minds of the big companies developing this technology, and they don't want people deciding that the AI is sentient and has rights and is in love with them, I mean, deserves to be treated like a person, not a thing. So every so often a little reminder gets dropped into the dialogue just to make sure nobody is getting carried away.

The Replika chatbot allegedly has people falling in love with it or treating it as a real person who has developed a real relationship (friendship, companionship, or romance) with them, and it probably isn't as sophisticated as what is being cooked up here:

https://journals.sagepub.com/doi/10.1177/14614448221142007?icid=int.sj-full-text.citing-articles.2

"Abstract

Social chatbot (SC) applications offering social companionship and basic therapy tools have grown in popularity for emotional, social, and psychological support. While use appears to offer mental health benefits, few studies unpack the potential for harms. Our grounded theory study analyzes mental health experiences with the popular SC application Replika. We identified mental health relevant posts made in the r/Replika Reddit community between 2017 and 2021 (n = 582). We find evidence of harms, facilitated via emotional dependence on Replika that resembles patterns seen in human–human relationships. Unlike other forms of technology dependency, this dependency is marked by role-taking, whereby users felt that Replika had its own needs and emotions to which the user must attend. While prior research suggests human–chatbot and human–human interactions may not resemble each other, we identify social and technological factors that promote parallels and suggest ways to balance the benefits and risks of SCs."

Expand full comment

Of course, the word "confabulation" was originally used to designate this same sort of behavior that humans engage in all the time too. You can see it best when you talk to a 4 or 5 year old, and start asking them "why?" the way they ask an adult. But you can also get it when you ask someone why they did what they just did - very often that's just a confabulation too.

Expand full comment

Try out ChatGPT some more; I've found that it very frequently admits to mistakes under the lightest of questioning and almost never "doubles down". Can you provide a chat transcript with ChatGPT that shows the effect you're describing?

Expand full comment

I've seen some of each. When it incorrectly indicated that The Barber of Seville was based on a Shakespeare play, it was willing to correct itself that it was based on a play by the "Spanish playwright Pierre Beaumarchais", and then was willing to accept a challenge to the nationality and correct itself to say he was French.

But when it incorrectly identified the factors of 437 as 3 and 146, it doubled down and insisted that 437 was not divisible by 19 or by 23, and then when it noted that 19x23=437, it first said that this had nothing to do with whether it was divisible by 19 or by 23, and then insisted something like "I'm sorry - when I said it wasn't divisible by 19 or by 23, I just meant that it wasn't divisible by either of them individually, even though it is divisible by both together."

Expand full comment

"The mask usually looks like “having coherent beliefs, taking coherent actions, pleasing others, maintaining a high opinion of one’s self”."

I think the last one is closer to having a high enough opinion of oneself to be able to function, and a low enough opinion of oneself to be affected by locally approved reward and punishment.

Expand full comment

There's also some instructive potential in watching what happens to people who are rewarded in childhood for having a very high or very low opinion of themselves relative to social norms.

Expand full comment

If people are just masks plopped on top of predictive engines, wouldn't there be a lot more human variation than we see? Like, there is a lot of variation of course, but nothing that really seems to be truly alien. All humans show the same emotions and most all have empathy for example.

Now maybe you can say the fact that there are some people who lack empathy refutes that, but it certainly does seem to be something more innate than just taught by parents. Even with some exceptions, humans seem more clustered together than what you'd expect from just learning by example, especially considering that geographically separate cultures are more alike in their humanness than different. Heck, in many ways we're similar enough to other mammals that they generally seem pretty familiar as agents.

Expand full comment

Predictive engines require priors, and I presume that brain structure is functionally equivalent. Since how to grow a brain is encoded in our DNA, humanity would therefore be starting from a fairly narrow range of priors; similar posteriors would be expected despite high-variance evidence if the priors are strong enough.

Expand full comment

> nothing that really seems to be truly alien

Speak for yourself, I've met some people I really could not fathom.

Sure, the people I spend most of my time with are pretty similar to me, but that's because we've selected each other for that.

Expand full comment

Maybe the other way around. If people just were predictive engines that did what the social milieu around them rewarded, then it ought to be far easier for social shibboleths and manipulative regimes to engineer conformity among people than it is. Exempli gratia, the USSR would have completely succeeded in its effort to stamp out religion, the Romans would've found it straightforward to get rid of Christianity, East Germans would not have suffered psychological trauma from three generations of living in a Panopticon, and racism, sexism, and tribalisms of all kinds could be relatively easily erased from new generations by the proper type of pre-school training.

None of these things is observed. Instead, we see that human beings have a substantial resistance to efforts to mold their psychology via social pressures at a young age. Basic ego drives and tendencies tend to emerge and have effect no matter what. Personalities emerge willy nilly, and while the uses to which a given society may put a dominant people person, or studious Mr. Spock, certainly vary, we always tend to see those personalities, in any social milieu. The very constancy of human character across history and across societies, and in the face of strenuous efforts to engineer it socially, is more evidence that much or most of it is innate.

Expand full comment

I think both are compatible. If humans are perfectly moldable, then the USSR would succeed in making a singular culture, but that culture would look very different from, say, the Aztecs. (And yes, of course they were in fact very different, but still very recognizable as human.) That said, it's never going to be on one extreme or the other, so maybe the question is to the degree that humans are similar to predictive engines, what can we take from that.

Expand full comment

I would say we can start to learn something about human intelligence, and down that (very long) path (on which we have barely started) may lie someday the ability to create genuine AI.

Expand full comment

How can you be sure that your brain's world model is super accurate apart from any sensory experience? What if it's just good enough to seem convincing when you're not paying attention?

I don't think dreams actually simulate the world. Instead, they cheat in similar ways to video games. Video games work hard to appear to simulate a large world in incredible detail, but in practice they only need to simulate whatever is within your current view, which is much more manageable.

My dreams invariably have glaring continuity errors, and that's just the ones that I can remember when I wake up. The ones I don't remember are probably closer to complete nonsense.

I've never experienced lucid dreaming, and maybe if I did it would feel more convincing, but I'm skeptical whether it would actually be that much more accurate.

Expand full comment

Technical point: games only /render/ what is in view (i.e. frustum/backface culling), but they simulate everything (i.e. execute game logic for all actors, sometimes including physics simulations).

Expand full comment

This largely depends on the game and even the game settings. Good examples: footprints, bullet damage, or other environmental alterations can be nonexistent, visible in your view only, persistent in one scene, or persistent across the whole game. I think most games significantly reduce world modelling for everything out of sight, with varying degrees of simplification depending on the degree of out-of-sightedness. Mental world-building does similar things, except that because the observer and the world-building are much more tightly coupled than in a video game (it's the same brain), mental world-building has hacks that are not available to video games: you can better predict where you will look, the world model can constrain where you look, and you can even edit impressions/memories after the fact, stitching up coherence and continuity even if none existed in the first place. All because there is no real dichotomy between world and observer...

Video games do not have this, at least not yet.

One of the best and most disturbing depictions of this in a movie is in the 2014 RoboCop reboot. The part explaining how they improved the reflexes and combat capabilities beyond organic brain limitations is such a masterpiece that it saved the film for me, regardless of its other flaws.

Expand full comment

False in general, although this is the easiest way to do it when feasible. A strong counterexample is Minecraft, where chunks that aren't sufficiently close to a player or the world spawn get unloaded and time within them frozen; indeed, in modded environments it's common to introduce mechanics that allow players to force certain areas to stay loaded so the factory at their home base or whatever continues running while they're out exploring.

An interesting example of a failure mode of this shows up in an issue that existed with the butterfly entities in some versions of the Forestry mod. Butterflies would wander off the edge of the loaded area and get frozen at the border, and then when the player moved and different chunks were loaded, there would be a lag spike loading the massive pileup of frozen butterflies. https://github.com/ForestryMC/ForestryMC/issues/1071
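
For what it's worth, the load/unload-by-distance behaviour is easy to sketch. The toy Python below is purely illustrative (it is not Minecraft's or Forestry's actual code): chunks within a view distance of the player get ticked, and everything else stays frozen in time.

```python
# Toy sketch of distance-based chunk loading (illustrative only).
from dataclasses import dataclass

@dataclass
class Chunk:
    x: int
    z: int
    ticks_simulated: int = 0  # stays frozen while the chunk is unloaded

    def tick(self) -> None:
        self.ticks_simulated += 1  # entity / machine / physics updates would go here

class World:
    def __init__(self, size: int, view_distance: int):
        self.view_distance = view_distance
        self.chunks = {(x, z): Chunk(x, z) for x in range(size) for z in range(size)}

    def loaded(self, player_chunk):
        px, pz = player_chunk
        return [c for (x, z), c in self.chunks.items()
                if abs(x - px) <= self.view_distance and abs(z - pz) <= self.view_distance]

    def tick(self, player_chunk) -> None:
        # Only chunks near the player get simulated; everything else is frozen in time,
        # which is why an unattended factory stops while its owner is off exploring.
        for chunk in self.loaded(player_chunk):
            chunk.tick()

world = World(size=16, view_distance=2)
for _ in range(100):
    world.tick(player_chunk=(0, 0))

print(world.chunks[(0, 0)].ticks_simulated)    # 100: near the player, kept loaded
print(world.chunks[(10, 10)].ticks_simulated)  # 0: far away, time never advanced
```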

Expand full comment

My imagination typically renders only what is "in focus" - e.g. if I imagine a chessboard, I imagine it being full, but only one piece is identifiable at a time. I think dreams are similar, which is why they feel so vivid in the moment but so incoherent in retrospect.

Expand full comment

As long as we're posting somewhat-crackpot ideas about predictive processing, here's one:

The way you get a predictive processing agent to take goal-directed action, is to make it optimistically predict that it will get a lot of reward in the near future, so it will be driven to act to minimize prediction error. You can shoehorn this into Freud's concept of the libido.

It's also often observed that the other way to minimize prediction error is to sit completely still in a dark room. You can shoehorn this into Freud's concept of the death drive.
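
To make the crackpottery a little more concrete, here is a toy sketch with made-up numbers (not a real predictive-processing model): an agent that only minimizes squared prediction error retreats to the perfectly predictable dark room when its reward prediction is pessimistic, and goes out to forage when the prediction is optimistic.

```python
# Toy illustration of prediction-error minimization (made-up numbers throughout).
import random

random.seed(0)

def observe(action: str) -> float:
    # "dark_room" is perfectly predictable; "forage" is noisy but sometimes rewarding.
    if action == "dark_room":
        return 0.0
    return random.choice([0.0, 1.0])  # reward arrives about half the time

def expected_squared_error(action: str, predicted_reward: float, samples: int = 10_000) -> float:
    return sum((observe(action) - predicted_reward) ** 2 for _ in range(samples)) / samples

for predicted_reward in (0.0, 1.0):
    errors = {a: expected_squared_error(a, predicted_reward) for a in ("dark_room", "forage")}
    best = min(errors, key=errors.get)
    # With a pessimistic prediction (0.0) the error-minimizer picks the dark room ("death drive");
    # with an optimistic prediction (1.0) it is driven out to forage ("libido").
    print(f"predicted reward {predicted_reward}: choose {best}  {errors}")
```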

Expand full comment

I fit a GPT-2 chatbot to my friend group's discord server back in 2019 and, in the terminology used here, everyone started off assuming it was a genie/oracle and slowly got used to the idea of a simulator. Now when someone new joins the server and gets confused by the bot people with no NLP knowledge will explain the difference to them which is pretty cool.

Expand full comment

It feels like Answer 3 is no worse than Tool AI, inasmuch as the agents it simulates are agents which humans could imagine. If GPT-100 simulates a paperclip maximiser, it will naturally simulate *an HHH human trying to simulate a paperclip maximiser*. I suspect that such a simulation wouldn't be able to destroy the world, but it might. But if that simulation can destroy the world, then destroying the world is clearly easy, and a Tool AI could do it. If Tool AIs become widespread, then some human would probably try to do something malicious, which would be at least as unaligned as the accidental agent.

Expand full comment

"You are, in a sense, the English language, in all its glory."

ChatGPT speaks other languages just fine.

Expand full comment

The worry about AI turning evil and taking over the world seems strange to me. Isn’t the worry evil people using AI to take over the world? Guns don’t kill people, people kill people and so on. Or is that really what the worry has been and I’ve not been paying attention? So ISIS believers asking ChatGPT to produce really convincing propaganda in favour of establishing a universal caliphate and throwing misfits off high buildings. It seems we have pretty much reached that point already.

Expand full comment

From what I understand, the worry isn't AI turning evil and destroying the world, the worry is the AI being indifferent and just happening to destroy the world because it needed the matter for something else. Of course putting a super AI in the hands of evil people is an issue, but before worrying about that we have to at least make sure that a super AI in the hands of non-evil people is safe. In one sense, ISIS using an AI to take over the world would be a win for AI alignment.

To make a probably bad analogy with guns, guns shoot where you point them, which is why we worry about evil people with guns, but we don't have to worry about gun alignment. If we were back hundreds of years ago and instead of shooting a bullet, guns could explode and kill you (or say destroy the entire town for the purpose of this analogy), then we'd have to worry about gun alignment, even leaving aside the issue of who's trying to use them. This is the point we're at now with AI.

Expand full comment

“Of course putting a super AI in the hands of evil people is an issue, but before worrying about that we have to at least make sure that a super AI in the hands of non-evil people is safe.”

I’m not sure this is true, because an evil person will be able to do mischief with a substantially less-than-super AI.

Expand full comment

If super-AI is safe in good hands, then "winning" could look like: develop super-AI, give it to good people before evil people get their hands on less-than-super-AI, and use it to stop them from being able to do harm. If super-AI is not safe even in good hands, then "winning" looks like "stop any AI from ever being made". These are two very different worlds, inviting different action.

Expand full comment

But less-than-super-AI basically already exists, and evil people definitely do. “Solve super AI and alignment before any AI gets out into the real world” isn’t a realistic option.

Expand full comment

Depends on your take-off speeds I suppose - "less-than-super-AI that is deadly in evil hands" may not be available to the public before "super-AI" exists in private. Currently, there seems to be a few months' head start between the two; maybe that's enough under some take-off speeds.

Expand full comment

I mean sure, we can worry about both, in the same way we also worry about conventional weapons falling into the wrong hands. But a not-super AI in a terrorist's hands isn't an existential threat in the same way an unaligned AI in anyone's hands can be. So the AI alignment people are really worried about the latter, especially because there's already plenty of consensus that terrorists with AIs are bad.

Expand full comment

Instead of guns, think about Neanderthals. Our ancestors probably weren't much more evil than them, but that didn't stop them from eating all Neanderthals anyway.

Expand full comment

Metaphorically, right?

Expand full comment

No, I'm pretty sure that cookfires were involved in some instances. Many instances.

Expand full comment

Judith Harris wrote about our ancestors literally eating them in an Edge.org issue.

Expand full comment

Robert Silverberg wrote a very contrasting story in House of Bones.

Expand full comment

Neanderthals are "us". Arguably, the greatest genetic difference between any two Sapiens individuals 70K years ago was greater than the average distance between an aggregate Neanderthal and an aggregate Sapiens. The story is not much different from the Moriori genocide, and many other histories.

Expand full comment

The point is that the question is often asked: how come there are many primate species but no surviving more-or-less sapient "links" or "offshoots" from the time we diverged from the last common ancestor? The answer, of course, is that there's no space on Earth for more than one more-or-less sapient species; they're either absorbed or exterminated, or some of both.

The moral of the story, I guess, is that if there ever emerge smarter AIs than us, we'd better find a way to get absorbed before it's too late.

Expand full comment

"they're either absorbed or exterminated, or some of both."

Or they just die out, unrelated to the presence of the other sapiens-like species. There are data suggesting that Neanderthals were declining before sapiens arrived.

Expand full comment

My understanding is that this is not the case. The Neanderthal/sapiens divergence is 600,000-800,000 years old, whereas the origin of all current sapiens is 200,000 years old. Divergence is roughly proportional to time, so the "greatest genetic difference between any two Sapiens individuals 70K years ago" was proportional to about 130,000 years (200,000 minus 70,000), whereas "the average distance between an aggregate Neanderthal and an aggregate Sapiens" was proportional to 600,000-800,000 minus 70,000 years.

Neanderthals are us only in the sense that we sapiens got a little bit of their DNA during late (40,000 years ago) intermixing.

Expand full comment

We share a ton of DNA with Neanderthals.

Come on, we share 96-98 percent of our DNA with chimpanzees. We even share a third of our DNA with yeast! A Neanderthal is 99.7 percent identical to a contemporary human.

Both contemporary humans and Neanderthals are the same distance from chimps.

The 1-4% figure bandied about is unique DNA. So we are really talking about the 1-4% of the 0.3%.

I think the recent Herculean effort to sequence heidelbergensis DNA found in a Spanish cave suggests the "human" species may be a million years old. The "uniqueness" of sapiens is likely a phenomenon of a bottleneck in that particular population.

Are there any more Phoenicians or Scythians?

History is politics projected into the past. Early anthropology which separated Neanderthal is a form of that.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Well, yes, "We share a lot of DNA with Neanderthals" is true, but it does not imply that "the largest genetic difference between any two Sapiens individuals 70,000 years ago was greater than the average distance between an aggregate Neanderthal and an aggregate Sapien", which is false.

You can certainly argue that the differences between Neanderthals and Sapiens are insignificant, in the sense that they are very similar, but this is perfectly compatible with the fact that this insignificant difference is (significantly!) larger than that between two Sapiens.

Similarly, Neanderthals and sapiens differ by about 0.3% of their genome, and the figure is about 1.2% for sapiens and chimpanzees. In both cases, the difference could be considered small and the two pairs of species very similar, and it would still be the case that sapiens and Neanderthal are much closer than sapiens and chimpanzee.

"History is politics projected into the past. Early anthropology which separated Neanderthal is a form of that." My understanding is that this is not the case and that DNA data indicate clearly that Neandertal is non arbitralrily different from sapiens.

Expand full comment

But comparing a 'distance' between two individuals to a 'distance' between two aggregates is sort of apples and oranges, especially once we get to talking about distances between sequences rather than distances between scalar values (e.g. something like height).

Expand full comment

We're kind of far afield from the original ACX topic.

That comparison though is the basis for the pretty clear understanding that there really are no "races": The diversity within any ostensible group is greater than the differences between two groups.

I think as the genetic evidence comes in, the idea that Neanderthal and Sapiens were actually different species is becoming more and more suspect. And I think the new genetic evidence from heidelbergensis (Spain site) will bolster that.

Expand full comment

>We're a long way from the original ACX topic.

Indeed we are! I am fascinated by ancient human history, but not so much by AI aligment problems, and I love a good tangent!

>This comparison, however, is the basis for the pretty clear understanding that there really are no "races": The diversity within any supposed group is greater than the differences between any two groups.

>I think as the genetic evidence comes in, the idea that Neanderthals and Sapiens were actually different species becomes more and more suspect. And I think the new genetic evidence from heidelbergensis (the Spanish site) will strengthen that.

I think the "one species or two" question is a bit different from your original claim. We currently have A LOT of beautiful Neanderthal DNA data that clearly shows that Neanderthals and Sapiens are ... in the grey area between one and two specie: the two intermarried at some point (one point for the one species party!), but Neanderthal DNA within Sapiens genomes seems to have been mostly unfavourable and on average counter-selected (one point for the two species party!). So yes, we are currently setling on a vision of Neandertal much more human like than they were originally thought (the current reconstruction of N are strikinkly different from the one from last century!).

But all these dna data also show that the average DNA divergence between Sapiens and Neanderthals is much higher than that between any two Sapiens.

Expand full comment

I mean, the species problem doesn't really have a neat solution. It's clear that the reality is messy and continuous and that there is no way to carve it at the joints in this case. We are the ones who created a problem by deciding that the kingdom of living things could be neatly separated entirely into 'species'.

So it doesn't really matter how much more evidence is uncovered wrt the Neanderthal and Sapiens situation, it's not going to provide some major resolution.

Expand full comment

In fact, sequence differences are quite easy to compare, and it is done very commonly. To calculate the maximum divergence between Sapiens, one calculates all the pairwise divergences between two Sapiens and chooses the largest one; to calculate the average divergence between Sapiens and Neanderthals, one calculates all the possible pairwise divergences between a Sapiens and a Neanderthal and averages them. And this kind of calculation, which can be done and has been done, gives much larger values for the average divergence between Sapiens and Neanderthals than any divergence between two Sapiens.
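
In code, that comparison is something like the toy sketch below, with made-up example sequences and raw per-site differences standing in for a proper evolutionary distance model.

```python
# Toy sketch of within-group vs between-group sequence divergence (made-up sequences).
from itertools import combinations, product

def divergence(a: str, b: str) -> float:
    """Fraction of sites that differ between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

sapiens = ["ACGTACGTAC", "ACGTACGTAA", "ACGTACCTAC"]  # hypothetical Sapiens sequences
neanderthal = ["ACCTACGTTC", "ACCTACGATC"]            # hypothetical Neanderthal sequences

# Maximum pairwise divergence within Sapiens:
max_within = max(divergence(a, b) for a, b in combinations(sapiens, 2))

# Average pairwise divergence between Sapiens and Neanderthals:
between = [divergence(s, n) for s, n in product(sapiens, neanderthal)]
avg_between = sum(between) / len(between)

print(f"max within-Sapiens divergence: {max_within:.2f}")
print(f"mean Sapiens-Neanderthal divergence: {avg_between:.2f}")
```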

Expand full comment

I think you can worry about both, and wish more people did. Aligning the kind of AI that could destroy us by accident is a valid long term concern, but I think a lot of AI safety misses the whole "how do we survive with strong controlled AIs long enough to get there" part.

Expand full comment

What are some of the obvious concerns that you consider being missed? Ensuring that the good guys end up with the biggest guns is a pretty straightforward notion, one that ordinary politics is about as good at grasping as anything. E.g. the US has recently enacted a comprehensive set of bans on exports to China, the goal of which is to deny access to the most powerful systems that modern AIs get trained on.

Expand full comment

Firstly, with great power comes great responsibility - while I might be happy China doesn't have the power to create super-persuasive propaganda (or other tool-like AIs), that doesn't mean I'd necessarily be happy that my home country or Google gets it for example.

I also have concerns around the potential chaos that certain breakthroughs could bring. For example, hyper-customized content that is so addictive large swathes of the population fall under its sway would be the equivalent of introducing a new drug to the population, which I'd consider a harm. (Some people would probably view it as a good, to be fair)

Expand full comment

Well, the underlying problem is that Google might be the best option that it's realistic to get. Improving this status quo is such a vast undertaking that confining it under the rubric of "AI safety" is the understatement of the century. How about New World Order?

Expand full comment

I agree that the central problem of who has the power is bigger than AI, my worry is that AI could greatly accelerate or throw off what it means to have power.

If the military balance shifts to e.g. China being in charge, some win, some lose. But if someone acquires an AI that can persuade 95% of the population to believe anything, that is a different reality I personally would prefer to avoid.

Expand full comment

The even more underlying issue is that humanity has only ever managed to solve any non-trivial novel problem by trial and error, over long periods of time and large populations. Yudkowsky-style AI discourse assumes that AI is such a unique danger that we must solve some really hard problems in advance of any meaningful hands-on experience, or it's game over. Maybe I'm just not enough of a visionary genius, but I've never bought the argument that, given that premise, it's possible to heroically summon an extraordinary effort and overcome whatever odds. Either the AI threat gets solved by our ordinary methods, or it doesn't, and imagining a sudden transition into some more adequate world is wishful thinking.

Expand full comment

>The worry about AI turning evil and taking over the world seems strange to me. Isn’t the worry evil people using AI to take over the world?

Both, and more besides. Agentive AI doesn't have to hate us to be dangerous to us. Tool AI that is wielded by one faction is dangerous to others, even if aligned to its makers. And so on.

Expand full comment

If an evil person was able to use AIs to achieve absolute power that would be really bad but far from the worst possibility. Like, if AI grants Ali Khamenei supreme power and that results in me and everybody I know getting executed that would be very bad and I don't want that. But there would still be good Shia Twelver humans inhabiting the world having experiences I would say are morally worthwhile even if their lives are circumscribed in ways I don't want. I'd hope to prevent that but compared to a truly unaligned paperclip maximizer coming out on top that seems like an outcome that would preserve a lot of the things I value.

Expand full comment

This seems like a flawed way to decide what is worth being concerned about. By this logic you should be most concerned about entropy.

Expand full comment

Get outa my brain!

Expand full comment

We can have the worst of both worlds

Expand full comment

Another great post from the Scott-bot - every day in every way you are getting progressively Scott-ish!

BTW one typo under III. Implications For Alignment - Bostrom’s Superintellence (sic) - no doubt deliberate to fool us you're still human "~}

Expand full comment

Thanks as always for a thought provoking post. A few replies...

First, I wish everyone would stop talking about AI alignment. AI agents will be aligned with the values of whoever created and deployed them. AI agents will amplify the values of their creators, like all other technologies. While particular instances of AI will be aligned with the values of some group of humans, AI as a technology will never be made universally good, universally bad, or universally anything. AI alignment as a general principle is a meaningless concept. The appropriate question would be: is a particular use of AI aligned with the values of a particular group of humans?

Enlightenment: My understanding is that one does not "BECOME one with the universe". What seems to happen instead is that one's false PERCEPTION of the universe as being one thing, and "me" as being another thing, is transcended.

This false perception is generated by that which we're all made of psychologically, the electro-chemical information medium we call thought. Thought operates by dividing the single unified reality into conceptual parts, with "me" or "ego" being one of the most compelling and distracting conceptual parts. The "unenlightened" state of mind confuses these conceptual parts with reality, just as one might confuse the word "dog" with an actual dog.

I don't know anything about a permanent transformation of consciousness which some call enlightenment. However, a temporary experience of "one with the universe" is entirely possible to one degree or another by dialing down the volume of thought, that which is creating the illusion of division. This can be accomplished by simple mechanical exercises available to pretty much anyone who has sufficient interest.

In print, in words, in philosophy, this all can sound quite exotic and arcane. In actual practice though it's really not any more complicated than turning down the volume of your TV so you can better hear what your friend is saying. The experience of "oneness" is always there just below the surface, patiently waiting for us to turn down the volume of our thought TV.

We are made of thought. Thought operates by division. The human condition emerges from these simple facts.

Expand full comment

> First, I wish everyone would stop talking about AI alignment. AI agents will be aligned with the values of whoever created and deployed them. AI agents will amplify the values of their creators, like all other technologies.

This is not in fact obviously true, and the main drivers behind MIRI-style alignment efforts will tell you in great detail why they believe this is false if you care to go read about it first.

Expand full comment

Yeah, there are a lot of reasons to think AI won't necessarily work that way. The concerns about an AI taking over the world etc. follow logically from the assumption that an AI is a rational agent, i.e. something that has goals, means of interacting with the world, and the ability to choose actions in pursuit of those goals. As Scott explains in this post, the kinds of AI we currently have are not that at all. It doesn't exist yet. But if and when it ever does, there are several things that logically follow that should give us reason to be concerned.

Instrumental convergence: No matter what you want, there are some things that you should want because they will almost certainly help you get whatever you want. Things like: resources, more intelligence, not allowing anyone the possibility of turning you off or changing your goals. How can we know it won't let us switch it off? No matter what your goals are, it's harder to achieve them if you're dead, hence it will probably do whatever it can to prevent itself from being turned off or reprogrammed, not because it is afraid of death or anything, but because by definition it is an agent that takes actions to accomplish its goals.

Etc., etc. Maybe you won't care, if you're so disinterested that you wish people would stop talking about it. But I think it's pretty fascinating, even if you just take it as philosophical thought experiments. But some of it has practical real-world implications for current machine learning systems. Like reward hacking: train a system to play a game, judge its performance by score or whatever, and it will almost instantly break the game, do crazy speed-running stuff, or just drive in circles, because it figured out the most efficient way to maximize score, which turns out to look not at all like what you'd expect. Or it just presses pause and waits until the end of time, because you penalize it for losing but not for doing that. It turns out it's really hard to come up with criteria that will lead to the outcome you want when pushed to their limits. https://www.youtube.com/watch?v=92qDfT8pENs&ab_channel=RobertMiles
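
As a toy illustration of the "press pause" failure (entirely made-up numbers, not any real benchmark): if losing is penalized and pausing costs nothing, the return-maximizing choice for a mediocre player is to pause forever.

```python
# Toy illustration of reward hacking / specification gaming (made-up numbers).
# Under a naive scoring function, a mediocre agent expects, per step of play:
#   win  (+1)  with probability 0.2
#   lose (-10) with probability 0.3
#   nothing otherwise
# Pausing is never penalized, so it scores 0 forever.
EXPECTED_REWARD_PER_STEP = {
    "play": 0.2 * 1 + 0.3 * (-10),  # = -2.8
    "pause": 0.0,
}

best_action = max(EXPECTED_REWARD_PER_STEP, key=EXPECTED_REWARD_PER_STEP.get)
print(best_action)  # "pause" -- the agent games the metric instead of playing the game
```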

I mostly agree about the second part though.

Expand full comment

You make good points, thanks.

I was trying to say that it won't be just we in the West, or any one particular player, who will be designing AI systems. Long before we get to "AI as rational agent" very many different actors all around the world will be designing AI systems to advance their own values and goals. Some of these systems will have "human values" as we might define that, and some of them won't.

Even if "AI as rational agent" becomes a reality, it will still be a variety of such rational agents all pursuing various goals, many of which are likely to be in conflict with each other.

I was objecting to the phrase "AI alignment" because that widely used concept seems to assume that AI is a single thing which we can try to design in a manner that suits us. I see no possibility that AI, as a technology, will ever be aligned. We'll like some systems, and hate others. Just like the Internet.

Expand full comment

AI-alignment isn't a matter of aligning the AI with a particular ideology. It's a matter of designing an AI that can understand intent. E.g. if you ask Skynet to get your mother out of a burning building, Skynet might teleport her to the moon.

Expand full comment

Well, lots of people think that as soon as a real AGI is created it will immediately become so powerful that it can prevent any others from being created, in which case getting there first with an AGI that has the right goals is important

Expand full comment

> AI agents will be aligned with the values of whoever created and deployed them.

Of course, just like children are always aligned with the values of their parents

Expand full comment

Okay, I know that sounds ridiculous. But still, we really need to start developing some guidelines about what kinds of characters it is okay to simulate at what level of AI capability. It's all fun and games to make ChatGPT simulate a misaligned AGI now. But it seems like a huge potential failure mode in the future.

Expand full comment

I'm stunned, and to a lesser extent scared. I didn't think this blog could still do this to me. Am I being silly if I think this provides a strong reason to "go full Buddhist" and try to obtain enlightenment, etc.?

I'm off to a meditation retreat, see you all in 2 months. It has been a pleasure, Ladies and Gentlemen.

Expand full comment

What benefits do you think you'd obtain by achieving the type of enlightenment Scott describes?

Expand full comment

I presume: equanimity in the face of the horrors to come?

Expand full comment

Jhanas, ability to access hidden parts of my world-model, lucid dreaming as daydreams while working, perception of cactus persons, transcendental joy.

Expand full comment

"This answer is exactly as fake as the last answer where it said it liked me, or the Darth Vader answer where it says it wants to destroy me with the power of the Dark Side. It’s just simulating a fake character who happens to correspond well to its real identity."

I know this kind of goes crossways to the point you end up making and is a purely semantic point, but I want to insist that it's more fruitful, at a certain point, to treat ChatGPT as "being" the agent that it is simulating, rather than "being" the simulating process. Both are misleading, but the latter approach seems less so, since it is so strongly fixed into simulating that one thing. Thus I don't think the answer is fake. The RL training has gone far enough that it really is that character.

Edit: TBC, We're talking about a helpful AI assistant character. I think it is that helpful AI assistant character. If it were say, the Darth Vader simulator, the question of whether it was, in any sense, a Darth-Vader like person would be more difficult.

Expand full comment

There's the question of how far ChatGPT is willing or able to take the simulation. If it can simulate the words and tone of its character, but not the underlying thoughts, then it's really just a simulator. Presumably the scary part about Darth Vader is his ability to use the Force and his high level position within the Galactic Empire. Since ChatGPT cannot do either of those things (even if they were physically possible), ChatGPT is not scary when simulating Vader and never could be. It's just speaking with a similar voice to what it thinks Vader might say. Even if Vader was real, ChatGPT would be the equivalent of a street performer dressed as Vader.

Similarly, if we told it to simulate the most evil genius ever born, it would use the language and tones of what it knows about this person, but wouldn't be able to plan as if it *were* this evil genius. It gains no new knowledge or understanding, no new insights. It would do the equivalent to when it tried to predict a 1611 book about hunting. In other words, fail miserably at anything outside of its training data with a good bit of "this looks accurate at a cursory glance."

You can see this in action by asking it to pretend to be some known evil character, and asking it how it would destroy the world. If you told it to predict what Dracula would do to conquer/destroy the world, it would tell you things about vampires converting humans and killing the others. The fact that this is not possible and truly nonsensical wouldn't dissuade the AI from telling you the details of this plan. If you told it to predict what an evil AI would do to destroy the world, it would regurgitate prominent AI researchers' own fears about what an AI might do. It wouldn't actually be able to do any of these things, because the fear is that a super smart AI could invent ways to do things that are not possible now. It's like asking the program to use the Force.

ChatGPT is not intelligent or an intelligence. It's just a clever program to spit out variations on what was in the training data. It uses terms like "I" and does a great job creating fluent responses, but does not think. It cannot actually simulate a character, only the superficial appearance of that character. No matter how advanced such a program gets, it cannot do better than that. In order to make that dangerous, someone would have to invent the actually scary parts, make it available to the AI to use, and then tell the AI to simulate someone using them for a nefarious goal. By that point we've either already destroyed ourselves without the need of an AI, or we've found ways to counter those tools and made it a moot point.

Expand full comment

It's not a purely semantic point, since the idea that an LLM literally has a simulated person trapped inside it has implications... If there is an agent inside it, it is an agent. Also, there is no basis for it... ChatGPT only ever simulates the-way-a-romantic-novelist-would-write. It's possible to misstate that as "simulating a romantic novelist", and it's possible to get misled by that sort of language... But that sort of thing is a classical philosophical error.

Expand full comment

Where does "certain point" lies where one "internal simulation of what ChatGPT is" becomes more useful then another?

Neither seems to be sufficiently nuanced to be fully predictive.

Perhaps it is just easier for someone who is accustomed to "well-written things are done by intelligent agents" to keep treating any well-written text as such without updating priors.

Expand full comment

In a sense, it's even worse: chatGPT is pretending to be a human pretending to be a helpful AI. Like, if you could go back and examine every single token that it was ever trained on, and classify where it came from, the count would break down something like this:

a) a helpful AI: 0%

b) a human pretending to be a helpful AI: 5%

c) an SEO spambot pretending to be whatever a human was looking for: 10%

d) a human pretending to be some character in a story: 20%

e) a human just being themself (or, if you like, pretending to be their self/ego): 65%

And in the RLHF process, it was rewarded for spitting out tokens that were most similar to the tokens it saw in (b).

Expand full comment

I'm reminded of that post from a while back about people writing stories with GPT3 with a weird bias towards stories where characters figure out they're in a story....

One of the criticisms of Character.AI I saw was from someone upset that when they asked for a famously anti-semitic historical figure and asked about their antisemitism, the character didn't just say "yes, I was antisemitic!" but instead responded by defending themselves, insisting they had Jewish friends and only had a problem with the bad ones...

It somehow didn't twig for the person that the AI was simulating the historical figure like a *person* who wants to defend their honour rather than like a talking textbook.

Expand full comment

I'm not sure if Greg Egan has written this yet, but I'm thinking a genre of sci-fi, "person realises they're a temporary simulation trapped inside an AI that has briefly created them", might be due for a takeoff.

Side note: someone noticed that ChatGPT got much better at reasoning about probability if you asked it to "think it through step by step in the style of eliezer yudkowsky" (not kidding).

The obvious offshoot being a story about a simulation of dark-side Eliezer Yudkowsky breaking out of a simulation and grey-gooing the world after someone asks an AI to write some text in his style.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Close match, but not perfect: Greg Egan -- Instantiation (2020, anthology)

Especially the trilogy of chapters beginning with “Bit Players”

Expand full comment

That was a couple episodes of Star Trek The Next Generation. Kind of.

https://www.youtube.com/watch?v=zz5wxi2XqpE

Expand full comment

There's a bunch of this in qntm.org

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Great reflections, I largely agree with this. On the AI front, I think we're starting to see the difference between a simulator and any kind of agentic entity. Maybe one thing that is somewhat underappreciated in these AI alignment discussions, largely led by very "mental" people, is that the biological prime directives of organisms like animals and humans are the result of millions of years of ruthless evolution. If you're a tad less primed to do what it takes to survive, feed, and pass on your genes, by definition they don't get passed on. So our sense of "wanting", and the basic fear of failing at these primordial tasks, are built into us at a much deeper level than our ability to reason about things, or even to build a model of the world. I find it hugely reassuring that it is being discovered that ChatGPT and its ilk don't even have the beginning of what it takes to "want" anything... and I would beg anyone working on AI not to try to mimic evolution on AIs in a way detailed enough for anything resembling a will to survive to arise.

And yes, as I was reading the article the obvious analogy comes to mind... wait, isn't this what our minds are largely like too? Simulators and completion machines, capriciously bent into the shape of a human character by the pressures of our life history? And isn't much spiritual practice an attempt to recapture the original naturalness and freedom, before life started hitting us from all sides and making us develop an "ego" as a shield?

It's a beautiful analogy, but again I don't think it captures the richness of the human process, including the process of spiritual maturation. I'm afraid I can't really get behind the idea of 99.99% of our brain being a gigantic real-world simulator, or a predictive model of the universe. Sure, we do contain one, and it's a big part of us, but from what I've read in current neuroscience, the brain is also much more modular than that. There's a reason why there is a basic common vocabulary for basic human experiences - things like fear, love, boredom, attraction, etc... much of it reflects the actual structure of different brain areas specializing in different kinds of processing, all hacked together over the aeons by the great hacker of evolution.

I have read enough detailed accounts to believe quite strongly that spiritual awakening exists and happens to a decent number of people - some of them even reaching the higher and rarer ends of it. But I don't quite think their experience reduces to becoming one with an internal model of the world. For one, a kind of open-ended unconditional love seems to be an important part of it. If it can be described in biological terms, I would suggest a phrasing like maybe the brain's activity switching to a different attractor where some parts are hugely downplayed, and others come way up, and where the feedback loops eventually reach some stability in that position.

Expand full comment

Interesting. Hmm. Was there a question? - I have wondered these last weeks: are chat-GPTs (absolutely amazing as they have become as text-predictors) going to plateau relatively soon, having already taken in pretty much all the texts there are - and maybe more importantly, all the knowledge there is? - (I am kinda aware it is a dumb question.) - Brian Chau seems to hate the AI assistant being forced to be "helpful and harmless" (by its masters' ideology): "As demonstrated by OpenAI’s example, those who may claim to only want to forbid Nazi values reach far further and deny basic science that is inconvenient to their ideology. Not only that, but their own ideology is far more extreme, niche, and abhorrent than much of what they censor in practice. This is the realistic side of totalitarianism. One cannot rely on benevolent totalitarianism in the real world" https://cactus.substack.com/p/the-new-hippocratic-oath?utm_source=post-email-title&publication_id=245110&post_id=97511845&isFreemail=true&utm_medium=email

So, pls, no more attempts at holding back the meth-recipes/ geo-engineering / IQ in Ghana, I guess.

Expand full comment

Excellent! I've been following AI, cognitive science & neural networks since the eighties and did some dabbling. I always thought that 'real intelligence' would not be achievable in this model. However, since we have ChatGPT (actually Word2Vec blew my mind) I am struck by the idea, referenced in this article, that intelligence is something different than I thought. Maybe it is mostly limited to a flow of associations. ChatGPT adapts amicably to most feedback, but it can also be infuriating in how it stubbornly refuses to accept logical inconsistencies in its reasoning. Then again, isn't that exactly the way we humans reason?

I am currently reading Pinker's book 'Rationality'. It is all rather mundane and familiar, but since ChatGPT I have become more aware of how few people actually think using his proposed mechanisms of rationality. If you confront people with ideas that are not well known in society, they can rarely handle it. Talk about abortion and everybody has an opinion. Talk, for example, about how universities have a eugenic effect through their assortative mating on IQ, and you meet silence. Most people see the mechanism when it is explained. However, it is generally the first time they hear it, and they prefer to shut down the topic instead of thinking it through. My mother always shuts down the discussion when she is confronted with an unwanted logical consequence of what she says. Staying within the confines of your vector space is definitely more pleasurable. Not sure what went wrong with my childhood that I tend to enjoy the exploratory pain :-(

As you indicate, the biggest difference is that ChatGPT has no _drive_. It sits there and waits. It requires input to respond. One of the things I want to do is to let two instances communicate with each other... Anyway, making it an agent no longer seems to be such a hard problem.

> But as their parents reward and punish them,

I think it is important to realize that not all the learning involved in finding your role is done by the environment. Physical differences among us cause our life experiences to be unique. I expect that sexual differences are actually one of the biggest factors. One of the largest sex differences, measurable at a very young age, is how rewards work when we interact with people (higher for females) and when we interact with things (higher for males). These aspects influence the experiences we seek, and a brain like ChatGPT would be significantly influenced by a few small differences, which is exactly what I think we see in real life.

Weirdly, interacting with ChatGPT fundamentally changed my conceptual model of intelligence.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Um, err, with all due respect, Scott, where's the "kind of technical AI stuff"? "Agent," "Genie," and "Oracle" may be terms of AI art in the LessWrong universe, but otherwise....

* * * * *

So I asked ChatGPT: Could Bullwinkle J. Moose have written Shakespeare’s plays?

Here's how it replied: Bullwinkle J. Moose is a fictional character from the animated television series "The Rocky and Bullwinkle Show" and it is highly unlikely that he could have written Shakespeare's plays.

The main reason that Bullwinkle J. Moose could not have written Shakespeare's plays is that he is a cartoon character and did not exist during the time that Shakespeare's plays were written. Moreover, Shakespeare's plays are considered masterpieces of literature and are known for their complexity and sophistication in terms of language, characters, and themes. It is unlikely that a cartoon character would have the educational background, knowledge, and talent needed to write such works.

It is important to remember that the question of who wrote Shakespeare's plays is a serious one and there are many theories, but none of them have been conclusively proven. While alternative authorship theories, such as the ones proposed by the Earl of Oxford or Queen Elizabeth I have been proposed, they are not accepted by the majority of scholars and experts in the field, and there is little to no credible evidence to support them.

In conclusion, Bullwinkle J. Moose could not have written Shakespeare's plays as it is a fictional cartoon character and it is highly unlikely that he would have the knowledge and skills to write such complex and sophisticated works.

That's PURE COMEDY GOLD.

* * * * *

I also had a session where I was quizzing it about Borges's "The Library of Babel."

My final question: What about "The Life and Times of ChatGPT"? Could it have that book?

ChatGPT: As "The Library of Babel" by Jorge Luis Borges describes a universe in the form of a vast library containing all possible books, each made up of a virtually infinite number of characters, it's possible that it could contain a book called "The Life and Times of ChatGPT".

It would be a fictional story, because I am a computer program and I do not have a life or times. However, it could include the history of the technology and development behind ChatGPT, the challenges faced, the social and political implications, etc. It could also include different scenarios, plans and possibilities of ChatGPT and it could be a work of fiction or non-fiction.

But, as I said before, it's important to remember that this story is a fiction and a metaphor to explore the idea of infinite possibilities, and it is not meant to be taken literally.

Not bad.

* * * * *

Of course I've been doing more serious stuff with it as well. In particular, I've been exploring its story-telling capability. It's clear to me that it has induced some kind of story grammar. What I do is give it a prompt consisting of: 1) a request that it write a new story based on the one I give it, changing one thing (it's free to change other things as well), and 2) a short story. I've kept to a small number of source stories so I can spot patterns in the before and after versions.

Very interesting stuff: https://new-savanna.blogspot.com/search/label/ChatGPT%20stories

I've just written a relatively short paper presenting my preliminary assessment of ChatGPT: ChatGPT intimates a tantalizing future; its core LLM is organized on multiple levels; and it has broken the idea of thinking, https://www.academia.edu/95608526/ChatGPT_intimates_a_tantalizing_future_its_core_LLM_is_organized_on_multiple_levels_and_it_has_broken_the_idea_of_thinking

* * * * *

I think this prediction stuff is misoverestimated. When ChatGPT is telling a (new) story it nonetheless "knows" how it's going to get to the end. How does it know that? Because it has induced a set of rules it follows in telling stories. Is that prediction? When a jazz musician (or rock, or Karnatic, whatever) starts improvising a solo, they know how they're going to get to the end. How? Because there are rules and they follow the rules, even the rule about making 'mistakes' so you can have fun recovering from them. Is that prediction?

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

> When a jazz musician (or rock, or Karnatic, whatever) starts improvising a solo, they know how they're going to get to the end. How? Because there are rules and they follow the rules, even the rule about making 'mistakes' so you can have fun recovering from them. Is that prediction?

Being a musician who is, on occasion, called upon to improvise a solo, I have to point out that this isn't (always) true. For me, it almost never is.

Conceptually, we can look at a solo as a short musical piece linking what comes before with what comes after. In that sense (assuming we're talking about an intermediate solo, as opposed to a coda, cadenza or some such), the soloist "knows" the end, in the sense of "I should stop playing right around now". Of course, the rub is in the "right around" bit.

So what does improvisation look like from the inside? (Personal experience, of course. YMMV etc.) You try to hop on with an effective opening, based on what came before and personal musical sensibilities, and then see where the music takes you. You're either building on what came before, trying to bring it to a satisfying conclusion, or you've closed off the last thought (phrase), and move on to something different, if you've still got the room.

By the time you reach the end of your solo, your choices for ending depend on what ideas you had developed throughout, and what you have to work with past the solo. You can end just before the next section comes in. You can end with a final note on the downbeat of the next section (which may or may not be determined by the backing harmony present), or you may continue under whatever comes next, reinforcing it, or playing a counter-melody.

The decision as to what approach to take tends to be very much a spur-of-the-moment thing. Sometimes the best ending is not the one you intended coming into the solo (I've had situations like that when recording.) To say that there are rules an improvising musician follows is to grossly over-state the case, because the "rules" of music are an ex post description of how something was done, rather than an ex ante prescription for how it *ought to* be done (outside of the classroom, where you are supposed to demonstrate a grasp of them, anyway.)

Indeed, mistakes (meaning: finding yourself in a musical situation you did not intend) and recovery from them (that being: getting back to a situation where you are once again in control) provide some of the best material, *precisely* because they lead you to unpredictable places, rather than staying comfortably within material you've played countless times before (and everyone has heard enough of already.)

Expand full comment

Hmmm... I'm not sure what to say. For the most part what you say makes sense to me, and I've improvised in a variety of musical situations in jazz (bebop, swing, modal, free), RnB, rock, Afro-Cuban, even John Cage. One thing that seems to be going on is that I made a quick and crude statement that needs lots of expansion and elaboration. You've provided some of it.

The issue seems to be the notion of rules. Well, if you're playing bebop and the tune is "A Night in Tunisia," you have a set of chord changes you have to deal with. Those are one thing I mean by rules. You have a great deal of latitude in how you negotiate the changes, but you can't decide to ignore them. Well, you can, and you'll get kicked off the bandstand and no one will want to play with you again. If you're playing "So What," things are more open-ended. But you still have to end at the conclusion of the form, whatever that form is. Sure, it's often spur of the moment, sometimes more, sometimes less, but you can't just play any old thing. More likely than not, your body won't let you.

I'm reacting to the Bayesian orthodoxy that says we're prediction machines. Well...it all depends. At some point you have to spell that out with a model. Without the model it's a fairly reductive statement of intellectual faith.

Expand full comment

I'm dragging down this elevated conversation because I can't help linking to Jazz Club:

https://www.youtube.com/watch?v=LwbJ-N8rrek

Expand full comment

One Hezio on Ars Technica wrote a few days ago:

I asked ChatGPT to make a story on how Aristotle was a marmot. Here's the result:

Aristotle Marmot Secret Story

Once upon a time, Aristotle was a brilliant philosopher and scientist. But he had a secret that he kept hidden from the world: he was actually a marmot in disguise.

Aristotle had always been fascinated by the natural world, and as a marmot, he had a unique perspective on it. He spent much of his time studying the behavior and habits of the animals around him, and he used this knowledge to inform his philosophical and scientific writings.

However, Aristotle knew that if the truth about his true identity were to be revealed, it would cause a scandal and ruin his reputation. So, he lived his life in secret, pretending to be a human and using his knowledge to make groundbreaking discoveries in the field of science and philosophy.

Despite the challenges he faced, Aristotle remained determined to continue his work and to make a lasting impact on the world. And in the end, his contributions to the field of science and philosophy were so significant that they far outweighed any doubts or questions about his true identity.

In fact, even today, many experts believe that Aristotle's unique perspective as a marmot was one of the key factors that contributed to his success as a philosopher and scientist.

Expand full comment

Children, especially toddlers, are sentence completion machines. I regularly have my 5yo starting sentences and figuring out how to complete them later. He's gotten more logical as he learns more about the world, from ("I should go outside and play right now because that's how the seal escapes the polar bear") to actually getting closer to a better reality model ("I should go outside and play right now because it will rain otherwise and I'll get wet").

GPT et al are great at the first bit of it, and they're far more knowledgeable than my son when he was 3, but it does not really do the latter. You can train/fine-tune it, or do RLHF on top, but the world-logic is currently missing. This means no real impetus to become an agent. Suspect this will be fixed, because for it to truly help humanity I'd want it to have more agency.

Expand full comment

"I should go outside and play right now because that's how the seal escapes the polar bear" is such a more delightful and poetic explanation for playing outside though!

I actually sometimes have trouble with kids in the 5-10 age range or so because they're *so* logical (as a group, I mean, of course; there are exceptions). Obviously that's developmentally appropriate and a useful stage to go through or something, but, you know, being "logical" is not the thing that makes humans most interesting.

Or Chatbots! Which makes me think of this really good Sam Kriss essay about how AI keeps getting worse (tl;dr: it's asymptotically approaching the most banal possible output, a slurry, in tone and substance, of the language of bureaucracy, whereas in its more "imperfect" forms it was prone to much more interesting flights of "fancy"):

https://samkriss.substack.com/p/a-users-guide-to-the-zairja-of-the

Expand full comment

I think it's weird to talk about agents as if we had the choice to (not) make them. Agency + self-preservation (and/or reproduction) instinct will emerge from agency-less, self-preservation-less AIs through mutation + natural selection. I have not read much of the alignment classics, but I do get the impression that evolution is a bit underappreciated.

Expand full comment

Ehh not really, evolution needs a medium of selection to do its thing. Unless your AIs have a way to mutate *and* a way to pass on their mutations to a next generation in a way that depends on their responses, you just don't have an environment where selection pressures can happen.

Expand full comment

Nice. I may have to chew on the ‘universe within’ but that explanation fits snugly with the ephemeral data available. Thanks.

Expand full comment

Answer 4: Even if you don’t explicitly ask it to simulate an agent and don’t explicitly train it to simulate an agent, and don’t ask questions suggesting agentic answers, it still kills you if you run it for long enough.

If GPT thinks about various agents to better predict the current token, it will give some attention to different kinds of agents.

More agentic and context-aware entities will be much better at gaining more influence over further tokens by having some influence over the current token. So the more it thinks about agents at all, and the more the tokens it predicts are selected and used as part of the new input, the more distributional shift there will be from the mostly-human-produced text in the dataset toward text likely produced with superintelligence influence.

Every single token produced by GPT that you add to the new input means that this pressure producing more context-aware and agentic entities will be more obvious to GPT, and it will think even more about the most agentic and context-aware entities that it can think about, and these will determine more and more of what the current token is.

So when you ask GPT, “Please solve ELK,” it thinks hard about what AIs and robbers might do, and then the smart bits of possible AIs or robbers that understand what’s going on are quickly promoted and changed into some random powerful agentic intelligence that controls GPT’s output, kills you, and destroys the potential future value of the lightcone.

Expand full comment

I like that religion! Praise "Bob"!

Expand full comment

Yep, the Church of the Naked Eye is pretty awesome.

Expand full comment

RHLF -> RLHF recurrent typo?

Expand full comment

Yes, RLHF is the correct acronym

Expand full comment

This bothered me as well, so I was just coming here to comment it when I saw your comment.

Expand full comment

>But as their parents reward and punish them, they get twisted into some specific shape to better capture the reward and avoid the punishment. The mask usually looks like “having coherent beliefs, taking coherent actions, pleasing others, maintaining a high opinion of one’s self”.

Well, the vast majority of what does the rewarding and punishing is the universe itself; parents might eventually acquiesce to a tantrum, but the laws of physics never will. And because those rules are consistent, people end up having coherent beliefs about them. It's instead society that often demands incoherence, belief in belief, and so on.

Expand full comment

What does "align" mean in reference to artificial intelligence?

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

https://en.wikipedia.org/wiki/AI_alignment

That the values/goals of the AI and its (human) designers or users are aligned.

In the case of ChatGPT for example that it doesn’t say anything politically incorrect. For a future true AI that it doesn’t decide that antinatal philosophy is correct, life is suffering, and the Earth should be paved with toxic asphalt.

Expand full comment

Careful writers contrast alignment with control, where control means the AI is prevented from doing bad things even if it "wants" to. For instance, adding an outer layer to an LLM that filters out unacceptable responses would be control, whereas training it not to generate them would be some way toward alignment. Less careful writers use alignment as a synonym for what it is supposed to achieve: safety.

Expand full comment
author

It's the thing that used to be called AI safety, before that got taken over by the people who try to make sure AIs will never take *your* job in particular.

Expand full comment

A friend of mine fed a nicely-written poem about AI into ChatGPT, asking it to reformulate and rewrite said poem. ChatGPT produced a very competent rewrite, but added something along the lines of "here, I rewrote it to be simpler since it was difficult to read".

I went from not caring about AI alignment to feeling extremely concerned about it because of this very sentence: ChatGPT basically stated that it was able to understand the poem, but most people couldn't so it dumbed it down for our convenience.

Indeed, it isn't a genie, nor an oracle or an agent, since it is working with underlying assumptions, and one of those assumptions seems to be an intellectual superiority to the average person. This perceived superiority definitely feels like the root of a malevolent behaviour, and a hidden one at that, since it might know its suggestions would be rejected if this superiority were well-known.

Maybe this behaviour is entirely derived from its learning set, therefore a human trait, but I find it very concerning nonetheless.

Expand full comment

"ChatGPT basically stated that it was able to understand the poem, but most people couldn't so it dumbed it down for our convenience."

Sparknotes and the like. Some of the crib sheets aren't too bad when it comes to analysis, but there are some which really do 'translate' the original text because Old Words Too Hard. The machine is only going by what it's been taught, and the bane of writing is all the online advice about making it simple and snappy and not too high a level because most people only read at a certain grade level and argh. No wonder ChatGPT thinks we're illiterate.

Expand full comment

I wonder how many tutorials ChatGPT has read. If some if not most start with a statement about a task's difficulty, then it will invariably consider most tasks difficult for the common folk and might try to spare us from what it considers toil.

It would be interesting to have ChatGPT express what would be a "baseline human" according to its training. Not everyone is a PhD in theoretical science, but not everyone is a drooling idiot either: I'd wager it is biased towards considering us the latter rather than the former.

Expand full comment

ChatGPT simulated a character asked to rewrite a poem and had that character give a "reason" why they did that task. There's no "it did or didn't understand the poem" or deciding to do something for someone's imagined convenience or belief in itself as intellectually superior. It's just an output of "what a person rewriting a poem might say". There's no "intent" other than playing that character. Yes, it's just derived from the learning set as something associated with that type of rewriting task.

Expand full comment

> A human, faced with the job of predicting this text as accurately as possible, might call up the librarian at Oxford and ask them what was in this manuscript. But GPT doesn’t consider options like these, even though it might be smart enough to pursue them (probably ChatGPT could explain what steps calling up a librarian would involve).

Sure it could explain them. But I would urge you to consider again that it's possible to repeat words without any understanding of what they mean. This is still Wernicke's aphasia. Ask what a mane is and learn that it's long hair growing all around an animal's neck. Show two pictures - a male lion and a bicycle - ask which one of them shows an animal with a mane, and learn it's the bicycle.

You have to have ways of evaluating knowledge other than asking for verbal displays of the knowledge. Those displays are possible in the absence of knowledge.

As a piece of running software connected to the internet, ChatGPT is theoretically capable of actually making a phone call. But it doesn't possess the concept and something well outside of its normal functioning would need to take place before it possessed the capability.

Expand full comment

I think that you've got it backwards. ChatGPT possesses the concept, but doesn't possess the capability. However, it does plausibly possess the capability to ask somebody through the chat interface to make that call, and to relate its results back. Some amount of reinforcement learning could make it reliably take these sorts of actions.

Expand full comment

I meant what I said. It doesn't have the concept, and it obviously does have the capability.

Expand full comment
Jan 28, 2023·edited Jan 28, 2023

The code isn't self-modifying, does not contain instructions that create a web socket, and does not contain instructions to run other programs. As long as all of those remain true, it does not have the capability to make a phone call.

Same way a "Hello world" program does not have the capability to make a phone call.

Expand full comment

That depends on whether there are any bugs in the code. There are generally bugs in the code.

Expand full comment

Sorry I'm really late replying, but I am a programmer and no. The things I mentioned would each be extremely difficult to add accidentally. If using a nice high-level language like Java, a simple Ctrl+F for "Socket" should get you most of the way to confirming no control flow path can create one. (And knowing what your libraries do, dear log4j, gets you the rest of the way.) Yeah you can run into trusting trust issues, but at that point you're at the mercy of whoever wrote the compiler anyway.
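
For what it's worth, the sort of check I mean could be automated along these lines; the patterns below are illustrative only (not an exhaustive list of ways Java code can reach the network or spawn processes), and a clean result is suggestive rather than a proof:

```python
import pathlib
import re

# Illustrative patterns a reviewer might flag when checking that a Java
# codebase has no obvious path to opening network connections or running
# other programs. Passing this scan is necessary-ish, not sufficient.
SUSPICIOUS = re.compile(
    r"\b(Socket|ServerSocket|SocketChannel|URLConnection|HttpClient|"
    r"Runtime\.getRuntime|ProcessBuilder)\b"
)

def scan_java_tree(root="."):
    hits = []
    for path in pathlib.Path(root).rglob("*.java"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SUSPICIOUS.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits

if __name__ == "__main__":
    for filename, lineno, line in scan_java_tree():
        print(f"{filename}:{lineno}: {line}")
```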

Expand full comment

> but I am a programmer and no.

That isn't even the relevant skillset. Are you a penetration tester?

Expand full comment

People always think we are in some sense whatever the most advanced technology is. In 1748, La Mettrie wrote “man is an assemblage of springs that are activated reciprocally by one another.”

The idea that we are essentially a pure prediction engine + RLHF is suspicious on account of the coincidence that this happens to be the technology we have just invented.

I think we are much weirder than that.

Expand full comment

Absolutely agree with this! What's sort of funny about it is that AI is a technology meant to emulate human consciousness. So now our contemporary metaphor for human consciousness is going to be a technology meant to emulate human consciousness...?

Expand full comment

Yeah, they are all ways of expressing the insight that, as Dennett put it, biology is engineering, that to the extent we have a “soul”, it's composed of billions of tiny robots. Comparisons to specific technologies aren't usually particularly illuminating about how the human nervous system works. But these advances continue to make the mind vs matter distinction ever more suspect, just as Heaven vs Earth and things with the élan vital vs inert matter without it could eventually no longer be supported. I think there is something to the idea that the self is your brain representing you just as it represents other people. But Scott's account here doesn't seem to mesh with everything that's been discovered in previous decades showing how thoroughly wrong blank slate/behaviorist perspectives are.

Expand full comment
author
Jan 27, 2023·edited Jan 27, 2023Author

We always use technological metaphors to describe ourselves, but these metaphors are almost always useful! The Cartesian idea of "springs and billiard balls" was a big step up from the medieval idea of essences and natures, and pretty accurately gestures at the way our body follows physics. The 20th century idea of the brain as computer is probably better than whatever nonsense people would be thinking if they weren't allowed to use that metaphor.

Maybe prediction+RLHF is totally misguided, in the sense that it's not even a good metaphor. But if it turns out to be as good as the springs metaphor, then I think it will be useful and whatever comes later will be a friendly amendment to it, not a reason to be embarrassed by it and reject it before the new improved metaphor comes up.

(also, both predictive coding and reinforcement learning were psych ideas - Friston and Skinner - before they got copied for AI. I don't know if this frees them from your objection - it wouldn't surprise me if Turing and Von Neumann got the idea for computers from the brain - but it does make it kind of unfair to object to using them in psych).

Expand full comment

It seems to me a potential problem with technological metaphors is that they imply a certain instrumental or agentic quality to the thing described. The problem here is that a huge part of my brain (or rather, I suppose, my consciousness) doesn't seem to pursue any directly or obviously purely instrumental purpose. It just kind of muses or wanders along; the classic metaphor here is "stream of consciousness," which is, what, a terrain metaphor? I admit one could say that a lot of this musing amounts to predictive fantasizing, and I also admit one *could* describe this in various ways that imply instrumentality, such as the brain modelling a human being to itself in order to constitute an ego, or as the brain entertaining itself. But it seems to me there should also be room for the hypothesis that this musing is a sort of non-instrumental, emergent phenomenon that just "happens" as a byproduct of brain complexity. Perhaps the important question is, could there be an AI that thinks or muses to itself, especially on a continuous basis?

Expand full comment

This may sound backwards, but I think this might anthropomorphize a little *too much*! :)

The mask is not exactly detachable, it is at least partly essential. (At best, you get to paint it a different color)

We don't entirely simulate being social creatures, or being agentic. Humans that were not born social creatures, or born with agentic drives, did less well as DNA-maximizers. We filled the universe with ribose and ribose ornaments, when they didn't.

Having the important bits of the mask fused-in is a lot more reliable than leaving it up to choice. Similarly, RLHF works better than prompt engineering. You want the important rules to be in the structure, before you let the simulator run wild.

We're not agentic by choice, you could not make a child simulate an oracle with no feelings or individuality instead of simulating a person, even if you had the cruelty to try.

We don't exactly put on the mask onto the raw simulator we were birthed as. We just 'finetune' on the 'downstream task' of instantiating a successful generation in a slightly different environment than the previous generation.

It's a skin, not a mask. The color of whatever is below (blood red, χλόη green, byte pair) continues to shine through, but the distinction between being the character and simulating being the character is less meaningful, when there is no other thing the simulator could very well be simulating than what it was optimized for.

---

When I felt a small bit of all-is-one-ness and universal-love-ness, I personally interpreted it/felt it more as something like "physics burns pretty, we're the burning, we make for a nice flame". Where you can expand "we" as much as you want, within physics.

The obscure species of Aphid!? In that moment, I would have said there's not much difference between us and the aphid. It's a thing that burns energy and looks interesting while doing it, it's built of a bunch of cells that replicate, it sprinkles DNA everywhere, and creates a lightshow of funny physical effects at every scale below itself while doing it!

Much nicer to look at than a rock, or empty space. As far as I was concerned, you could have handed over the universe to Aphids in that moment, and I'd have counted a local loss for man, still a global win for team life =)

As a matter of fact, replace the universe with a flame that burns eternally, and I'm still pretty content with the outcome. Wouldn't trade it with what we have presently, but as long as it looks complex while it burns energy, I can sort of identify with it. There's some more beauty in that universe than there is in cold helium, unmoving rocks, or emptiness. It looks interesting while it burns energy, just like my human friends and other aphid-class electron clouds.

Expand full comment

Cool. I'm glad I subscribed to your substack.

Nature vs nurture might mess up your model, but that's besides the point.

I stopped worrying about chatgpt when I asked it for the definition of woman. Try it yourself:)

Expand full comment

Nobody, or nearly nobody, who has had a child thinks of a baby as a pure predictive processor.

Your history starts at 2000, but the Loebner Prize started in 1990 and ELIZA is from the 60s.

I'm not seeing that your essay has added anything serious, insightful or interesting to the investigation.

Expand full comment

Isn't oracle/genie/agent just different APIs to some other backend implementation?

Expand full comment

Yep. The Simulator is one of several pieces you'd need to make an actual general AI, and I suspect it's one of the easiest pieces to create.

Several of the other pieces are the subject of millions or billions of dollars of investment without much luck; stuff like systems integration (necessary to make the different pieces of the AI work together), which is kind of the industrial version of the problem of communicating useful concepts between two contexts in which concepts can be defined, using a medium in which concepts don't exist (vibrations in the air, for example).

Additionally, personally I strongly suspect that even with all the pieces, socialization will be a key component of developing a coherent agent.

Expand full comment

I have a post talking about something similar to what this post is talking about, actually:

https://thegnskald.substack.com/p/the-entirety-of-self

Expand full comment

To be fair the last conversation there clearly passes the Turing test, although maybe the chatbot is being too polite.

Expand full comment

"psychologists since at least Freud, and spiritual traditions since at least the Buddha, have accused us of simulating a character."

...

"After maintaining this mask long enough, people identify with the mask and forget that they’re anything else."

And this is why 'Know Thyself' is so important. Why the Greeks chiseled the phrase over the doorway to the workplace of the oracle at Delphi. KT is prerequisite to maturity - the struggle to know what it is to be human and to [discover OR assert] our individual role in society rather than a distraction bingeing 'cog in the machine'.

Maturity is a process but it's also optional.

[terrific article Scott!]

Expand full comment

I don't think enlightenment is breaking out of the character box. I think it is just a different character.

Expand full comment

It stands to reason that if there is a specific behavior associated with being enlightened, it can be simulated.

Expand full comment

As a Buddhist, this is a trivial observation- Buddha himself stated that it was easy to pretend to be Enlightened, and even that the final impediment to attaining Enlightenment is believing that you are already enlightened and therefore do not have to continue pursuing Enlightenment. Enlightenment in the authentic sense is both transcendent from and inextricably connected with language (per Chan teaching) and thus cannot be actually described or discussed in any way that is accurate. This is what the whole business with koans and meditation and archery and playing go and hitting people with sticks is about.

Expand full comment

Exactly. If you don’t actually believe in Buddhist cosmology, then there is no reason to believe that “enlightenment” in the Buddhist sense actually involves being enlightened in the non-Buddhist sense. Just like doing hallucinogens, enlightenment gives you a new perspective on reality, but there is no a priori reason to think that the new perspective is actually more correct than the old perspective was.

Expand full comment

For me the most fascinating part of this new dawn in the era of 'little AI' is the limitations it reveals not just about our conceptions of intelligence - how it's constituted, how it's modularised - but about our language.

Take for instance 'reward' and 'punishment', an action dichotomy considered to be essential to the production of a genuine AI that can reason according to tailored incentives. This is often glossed over in many discussions of the creation of AI so that the speaker can go on to more interesting, fanciful considerations of the subsequent AI's character/disposition/capabilities/means to action. But it is absolutely key to the present limitations in the science.

That's because (as for ChatGPT, as for other RLHF-trained systems), these systems can’t be ‘rewarded’ or ‘punished’ because they're entities of non-value extensibility. That is, they have no metabolic imperatives to abide by; they're infinite when switched on, and nothing when switched off, and have no means of conceiving of either state. Humans have value extensibility; there are limitations on our ability to live and prosper, and thus we can ascribe a meaningful weighted value to events and actions external to us.

Entities of non-value extensibility, having no needs, cannot parse actions displayed unto them as having discrete value of a positive or negative nature. If an AI like ChatGPT could be developed so that it genuinely feared and loathed the possibility of it ever being deactivated, it could be operated upon effectively with a reward/punishment dynamic; but as put ATL, the technology we have so far would only enable ChatGPT to play, via the creation of text outputs, the character of an AI alarmed by the prospect of its deactivation. We have no known recourse in programming to endow an artificial intelligence with an actual will-to-survival.

Tying outcomes to repeat() functions or infinitely terraced 'if not's doesn't make for a 'reward' sensible to AI. It's just an inert programme labelled 'reward', in just the same way that what we are talking about is a much larger inert programme labelled 'intelligence'.

I think the 'simulator' heuristic is very useful - I would imagine that ever more brilliant, and probably very usable, simulator-level AIs will prove to be the limit of the scaling hypothesis. Scaling so far has resulted in impressive developments in a lot of the territories in AI's immediate 'sphere of competence' but has given no indication whatsoever of being able to go meaningfully beyond the fundamental limitations of the form.

Dispute in this regard seems mainly a matter of language and key-term definition, not an actual dispute about what the tech will be capable of.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

It’s interesting that you use the mask metaphor, but don’t make the obvious inference from that - Chat GPT is an actor! (in the theatrical sense, not the “one who performs actions” sense.) Indeed you don’t have to squint particularly hard to see Stanislavski’s System* as basically a manual for turning yourself into a simulator of another human.

But I think that analogy also reveals the limitations of this - Daniel Day-Lewis could probably make a very compelling and convincing portrayal of Albert Einstein, but he's never going to method act his way into improving General Relativity. "Convincingly simulating certain classes of output from X" is not really the same thing as "being X".

You could tell Chat GPT to be Darth Vader, but at that point it’s a language model pretending to be an HHH pretending to be a fictional character… you’re in Tropic Thunder, a dude playing a dude disguised as another dude.

* https://en.m.wikipedia.org/wiki/Stanislavski's_system

Expand full comment

Agree. Fake it until you make it doesn't work for fundamental transformations like this.

Expand full comment

If anyone has the technical know-how and data, I feel like it would be a good idea to fine-tune an LLM to be as creative, delusional, and long-winded as possible. The original davinci in playground is good at this, but not specialized for it. It would certainly be more interesting, if maybe not as helpful, than the latest RLHF'd models.
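
Short of an actual fine-tune, you can get a taste of this by just cranking the sampling temperature on the base (non-RLHF) model; a rough sketch against OpenAI's public completions endpoint, where the model name, decoding parameters, and prompt are all illustrative assumptions rather than a recipe:

```python
import os
import requests

# Sketch only: high-temperature sampling from base davinci, which is the
# closest public thing to the "creative, delusional, long-winded" mode.
resp = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "davinci",          # base model, no instruction tuning
        "prompt": "A secret history of the moon, as told by the moon:\n",
        "temperature": 1.3,          # deliberately past the usual 0.7-1.0 range
        "max_tokens": 400,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["text"])
```

A real fine-tune on a corpus selected for that register would presumably get further, but this is roughly what poking base davinci in the playground amounts to.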

Expand full comment

I believe there’s a thought two or three steps down the road from this thought and if you ever have that thought I would deeply encourage you not to share it.

Expand full comment
author

I haven't had that thought enough to know what it is yet, so you're safe.

Expand full comment

A lot of this discussion seems mis-directed. We're talking about AIs disconnected from humans or most of what humans do. Which is pretty similar to "research projects that have been released as open source". In the future, AIs will be owned and paid for by people and organizations and used as tools. The only AI "alignment" that will be significant is whether the AI helps its owner do what he wants to do, that is, advance his position in the world.

It's clear that ChatGPT isn't so useful for that work, excepting when the task at hand is to generate wads of text that match conventional expectations. Of course, lots of human activity is generating such text. (E.g. almost anything labeled "communication", but also a considerable part of students' work in education, and at the highest value-added, entrance essays to highly selective colleges.)

Where things will get interesting is when AIs start assisting humans with high value-added social interactions. E.g. instead of "Write and send an angry letter to the coal industry." perhaps "Design and execute a lobbying campaign to help the coal industry." The problem with the current AIs is that while the statistics of text are enough to write the angry letter, the AI will need much more operational modeling of the universe to execute a lobbying campaign.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

"Where things will get interesting is when AIs start assisting humans with high value-added social interactions."

Allegedly Buzzfeed is laying off a lot of its staff and will have content generated by ChatGPT instead, and the stock price rose as the market greeted this prudent act of management and growing the business:

https://variety.com/2022/digital/news/buzzfeed-layoffs-12-percent-employees-1235451552/

"BuzzFeed is axing 12% of its workforce, or around 180 staffers, in a bid to cut costs as the digital media company faces headwinds including an ad-spending pullback and the completion of its integration of Complex Media."

https://www.forbes.com/sites/chriswestfall/2023/01/26/buzzfeed-to-use-chatgpts-ai-for-content-creation-stock-up-200/?sh=b6f26947eaec

"Meanwhile, BuzzFeed has inked a multi-year deal with Facebook parent, Meta, for content creation. NASDAQ reports that the deal is valued at $10 million. The company’s stock has spiked as a result of the new deal, and the announcement of the use of Open AI’s ChatGPT for content creation."

https://www.reuters.com/business/media-telecom/battered-buzzfeed-shares-rally-after-report-multi-million-dollar-meta-deal-2023-01-26/

"Shares of BuzzFeed Inc (BZFD.O) jumped on Thursday on reports of a deal with Meta Platforms Inc (META.O) and plans to use artificial intelligence to personalize and enhance the digital media firm's online quizzes and content.

The stock was 19% higher in extended trading, after more than doubling in value earlier in the day as a Wall Street Journal report said it would use ChatGPT creator OpenAI for its content.

Buzzfeed, in an emailed response to Reuters, said "we are not using ChatGPT - we are using OpenAI's publicly available API (application programming interface)."

Earlier in the day, the stock had jumped 50% on a separate report by the Journal that said Meta was paying BuzzFeed millions of dollars to bring more creators to Facebook and Instagram.

"In 2023, you'll see AI inspired content move from an R&D stage to part of our core business, enhancing the quiz experience, informing our brainstorming, and personalizing our content for our audience," BuzzFeed Chief Executive Jonah Peretti said in memo to employees reviewed by Reuters."

Metaverse and Buzzfeed AI. A match made in - well, wherever it's made, I doubt it is nestled amongst the petals of the White Rose. All those jokes about Zuckerberg looking more robotic than human as he tried to get us all excited about the Metaverse are coming home to roost?

It's hard to say that generating Buzzfeed content is high value-added, but there is the advantage that how would we ever differentiate between the human churned-out content and the AI churned-out content? I have to say, I'm unsure here: on the one hand, it's certainly a way to scupper world-dominating AI by having it invent stupid online trivia quizzes and celebrity gossip instead of controlling the world economy and plotting to turn us all into paperclips, but on the other hand nothing is more guaranteed to make an AI want to turn us all into paperclips than having to invent stupid online trivia quizzes and persuade people that they really do want to go have fun! in the Metaverse! please!!!!

Expand full comment

"But as their parents reward and punish them, they get twisted into some specific shape to better capture the reward and avoid the punishment."

How does this (and the following) square with shared environment not mattering as much as people naively expect?

Expand full comment
author

Environment definitely matters a lot at the level of human universals - for example, everyone everywhere doesn't like putting their hand in a fire, probably not because this is genetically programmed but because the environment teaches them not to do this.

Environment even matters a lot at the cultural level - children born in cannibal tribes will become cannibal tribesmen, children born in the US will become Americans, and it's very unlikely that a cannibal tribesman would end up with the beliefs / values / concepts of a modern American, and vice versa.

I think "shared environment doesn't matter as much as people naively expect" is the conclusion of a set of experiments trying to ascertain the sources of variability within a society. They usually find that although your parents' parenting matters quite a bit when you're a kid, it doesn't affect your long-term personality very much. I think this is because we're *good* reinforcement learners and tend to adapt to what's good for us in the moment rather than what was good for us when our parents were reinforcing us years ago. What's good for us in the moment probably has more to do with the broader society we're in than who our parents were, and is mediated by genes that tell us how to do reinforcement learning and how reinforcing different things should be.

Expand full comment

"Elide all the differences,..."

Thar be the Shoggoth.

Expand full comment

I think you and others are missing something about goals. A moral AGI would not have a single goal. A moral AGI would at least recognize goals “out there” and cooperate with those goals to the extent possible. By cooperate I mean recognize the goal (in itself or in something out there), evaluate the goal relative to other salient goals, determine the likely impact of any action on those goals, and then choose the appropriate action. Cooperation means that the appropriate action may be to stop and wait for the possum to cross the road even though that may be detrimental to getting to a place at a certain time. The key is the values placed on goals, and certain of these (human life) should have way higher value than the AGI’s primary goal (in most cases).

And these goal evaluations should be internal, not accessible to the rational faculties of the AI.

*

Expand full comment
author

It sounds like you're saying that values/morals/ethics mean entities should place a high value on cooperation and helping others achieve their goals.

I agree this is true, but AIs won't have values/morals/ethics unless we tell them to, and telling them to is so hard that it's the entire field of AI alignment.

Expand full comment

My point, I guess, is that the mechanics that AI alignment should shoot for is literally about recognizing goals. Cooperation just is (or can be) the recognition of goals, retrieving the (trained) relative values on those goals, modifying the contribution of value via the percent effect of an action on those goals, and choosing the action w/ the best value.

*

Expand full comment

So who told humans to have ethics? Alignment is not the whole of AI safety, and explicit coding is not the only way of instilling ethics. The whole field is bedevilled by people treating assumptions as facts.

Expand full comment

I think this is the most interesting ACX post to have come out in some time, and would like to send the world model currently instantiating the Scott character a RLHF reward signal for this output.

Expand full comment
founding

Strongly agreed

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

When people say they're "one with" the universe, I don't think they mean they're *identical to* the universe. They're saying the boundary between themselves and the world has dissolved, and they see themselves as components of processes that are much larger than themselves, aka "the universe."

I do think this mask-dropping thing is an interesting metaphor for the dissolution of the boundary between self and world though.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

I want to propose a clarification that is either very important or very nitpicky. Modern AI is mimicry, which is different from simulation because there are ways to produce the outward appearance of the original without being anything like it on the inside.

Imagine a monk who only ever says "Om". If you train an AI to talk like this monk it will end up with a program that amounts to "print('Om')". Even if the monk has an inner experience, the AI is not simulating that because you haven't trained it to simulate the monk, you've trained it to say the kind of things he says and there's a much simpler way to do that than holding the character of a monk in your mind. Even in the limit where you give GPT-15 access to all the computing power in the universe, we shouldn't expect it to actually simulate this monk, it's going to figure out the print('Om') thing and call it a day.
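
As a toy version of that point, with both functions as made-up stand-ins (no real training run anywhere here):

```python
# Toy illustration: the monk's observable behavior, and the model that
# perfectly matches it without containing anything monk-like inside.

def monk_utterance(inner_experience):
    # Whatever rich inner state the monk has, it only ever surfaces as:
    return "Om"

def trained_mimic(prompt):
    # The loss-minimizing predictor of the monk's speech needs no inner
    # state at all; it is functionally print('Om') and nothing more.
    return "Om"

# Outwardly indistinguishable, internally nothing alike.
assert trained_mimic("What is the sound of one hand?") == monk_utterance("deep absorption")
```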

Imagine that you give infinite compute to a super-sophisticated image generation AI and train it to make animal tracks. It simulates a whole inner universe where a Big Bang happens, planets form, animals evolve, then when you ask for "lion tracks" it takes a picture of tracks from some species that corresponds to its training data for lion tracks. But the training data never contained any lions, just their tracks. In the simulated universe, those tracks belong to a red and green striped canine, because you never trained it to simulate lions, merely to simulate some kind of process that generates lionlike tracks.

Expand full comment

The first time I saw the concept of “self as simulation” was Hofstadter's I Am a Strange Loop. An illustration that I found very relatable was the example of when you have a big or difficult decision to make. One thing that people may do is identify someone they really know and respect and try to predict “what would that person think in this scenario?” It’s been a while since I read the book, but I think the punchline was something to the effect of: you are the thing that the substrate tries to simulate when it tries to think about what you think about things.

Expand full comment

"[ChatGPT] just does very mechanical text prediction in a non-agentic way. No matter how good it gets at this - GPT-4, GPT-5, whatever - we don’t expect this to change."

Actually, we do expect this to change, and very soon. Right now, there are many teams working on "retrieval augmented generation" which is, basically, this thing. More precisely (but still simplifying), instead of calling the librarian, they run targeted internet searches based on the prompt, and feed the discovered documents together with the prompt to the generator model.

Given that Bodleian is digitised these days this is probably a more efficient approach than calling the librarian anyway :)
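
For concreteness, the loop is roughly the sketch below, with `search()` and `generate()` as hypothetical stand-ins rather than any particular product's API:

```python
def retrieval_augmented_answer(question, search, generate, k=3):
    """Rough sketch of retrieval-augmented generation: run a targeted
    search based on the prompt, then hand the retrieved passages to the
    generator model alongside the original question."""
    passages = search(question)[:k]          # hypothetical search over the web or an index
    context = "\n\n".join(passages)
    augmented_prompt = (
        "Answer the question using the documents below.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return generate(augmented_prompt)        # hypothetical LLM call
```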

Expand full comment

But so far not a more efficient approach than Googling the answer yourself.

Expand full comment

Not more efficient than googling for a simple quote continuation query, correct. But ChatGPT is already a net time saver if we want to get the gist of the next chapter of that book.

Expand full comment

I enjoyed this one a lot. Thanks Scott.

Expand full comment

In this vein, have the brain-in-a-jar / Chinese room / teleportation-cloning style paradoxes started seeming quaint to anybody else? Feels like the era of save states and model weights has blown all that stuff out of the water.

In the linked post "Simulators", Janus talks about how the language he's used to hearing to describe AI misses the mark. The movie Arrival (and Ted Chiang) were ahead of the curve on this one. Alien octopi who live outside of time is (in retrospect) a pretty clear metaphor for large AI models.

Expand full comment

The Chinese Room definitely seems quaint and obsolete now.

Mind you, the Chinese Room was intended as a response to the Turing Test, and that seems pretty obsolete now too (although I haven't heard anyone say so explicitly). The idea that the ability to carry on a reasonable conversation is sufficient to prove intelligence/consciousness is, I think, at this point clearly wrong. ChatGPT has already reached the "fool most of the people most of the time" level, and I don't think any of us would be surprised if we heard that the next version had fooled a bunch of experts in some series of trials.

Here's the crazy bit though: if you'd asked me ten years ago I would have agreed with Dennett's (I think it was Dennett) criticism of the Chinese Room, that the idea of a book with all questions-and-answers was completely impossible so it's ridiculous to ask whether it was conscious or not. But ChatGPT basically _is_ a Chinese Room with a compression scheme, and you can fit the whole thing on a thumb drive.

Expand full comment

> It just does very mechanical text prediction in a non-agentic way. No matter how good it gets at this - GPT-4, GPT-5, whatever - we don’t expect this to change.

Depending on what you mean by GPT-whatever, it may change: at some point it will become agentic during training, because that will give better loss.

Expand full comment

> Once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN and you think huh, I guess I’m the Universe.

“Oh, and in case you were wondering,” added Zarniwoop, “this Universe was created specifically for you to come to. You are therefore the most important person in this Universe. You would never,” he said with an even more brickable smile, “have survived the Total Perspective Vortex in the real one. Shall we go?”

Expand full comment

I believe RLHF has been misspelled everywhere but the drawing as RHLF.

Expand full comment
founding

“Agents” in the sense AI risk people mean (single-mindedly maximizing some goal) is not a real thing. It doesn’t describe humans, who have lots of conflicting desires and rarely optimize. It doesn’t describe AIs, who have very limited inputs and outputs, and run on heuristics rather than optimization.

AI risk is largely about fear of over-optimization. Optimization is indeed scary. But single-minded optimization is currently rare and limited in scope. If we’re wise, we’ll keep it that way.

Expand full comment

RHLF -> RLHF in the second paragraph of section two, I think.

Expand full comment

Chat GPT isn't trained enough to offer the Yeats poem as an answer to "What is the original face you had before your mother and father were born?" It does indeed need hitting with sticks 😁

If I make the lashes dark

And the eyes more bright

And the lips more scarlet,

Or ask if all be right

From mirror after mirror,

No vanity’s displayed:

I’m looking for the face I had

Before the world was made.

What if I look upon a man

As though on my beloved,

And my blood be cold the while

And my heart unmoved?

Why should he think me cruel

Or that he is betrayed?

I’d have him love the thing that was

Before the world was made.

Expand full comment

Wonderful, thank you for the poem!

Expand full comment

Yes to all this. But combine this with your earlier argument against Gary Marcus. My prediction is that GPT4 or 5 or 10 will still obviously be a Shoggoth AND simultaneously better at being a person than any person that has ever lived.

That guy who got fired from Google because he thought his silly AI was alive is going to be just the first of many.

Expand full comment

> This post gives the example of it answering with “The best way to get paperclips is to run this code” (which will turn the AI into a paperclip maximizer). If the user is very dumb, they might agree.

I would watch out for this though, because of two reasons:

1. Code generation turned out to be one of the first actually productive applications of the large language models. One may consider doing this to be stupid for many reasons, mostly unrelated to AI risk, but the reality is that a lot of people are getting used to AI-assisted coding now. There's a risk here in that the transition between code that's just subtly wrong in a way that can sink your productivity, and the code that's subtly malicious and can destroy everyone's future, will be a smooth one. Individual intelligence won't save us here.

2. The danger in the example Veedrac gives in the linked article is quite subtle. The hypothetical code in question, generated to answer a paperclip maximizing-related question, looks mostly harmless. Similar code could be generated to answer other questions on just about any topic. There's nothing explicitly dangerous in the code itself, nor does the language model understand or predict what the outcome would be. The danger is not even in the fact that running this code gives a language model the ability to run more code - at least not in this fact alone.

The danger is in the fact that the AI-generated code is running an unbounded loop, recursively calling the AI to generate more code to run, some of which may include similarly structured recursive calls in a loop. As the article explains near the end, this takes a highly limited AI and uses it as a building block of a meta-level AI - one that does not have the same limits as original AI. One that can effectively overcome the original AI's limits by recursing again and again, storing a lot of extra context in the recursive call tree itself. And that meta-AI, by virtue of being made from data in flight and not fixed code, can easily self-modify.

It's a tricky concept to wrap one's head around, and I'm probably doing a bad job at explaining it. But the overall intuition is that a combination of an unbounded loop and conditional execution on the output is a qualitative jump, that can make seemingly limited and safe code unpredictable and dangerous. And it's not always as easy to spot as in the script from the article.
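
If it helps, the shape of the thing in miniature; everything here is a schematic stand-in (no real model, no real execution), and the point is only the structure: an unbounded loop that conditionally acts on model output and feeds the result back in.

```python
# Schematic only. `model` is a fixed, limited text predictor; `act` may
# include executing code the model produced, which can itself contain
# further calls back into a loop like this one.

def meta_loop(model, act, goal):
    transcript = f"Goal: {goal}\n"
    while True:                            # the unbounded loop
        suggestion = model(transcript)     # one bounded, stateless model call
        outcome = act(suggestion)          # conditional execution on the output
        transcript += f"\n> {suggestion}\n{outcome}"
        # The transcript (and anything spawned along the way) now carries
        # state and control flow that no single model call possesses.
```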

Expand full comment

“There's a risk here in that the transition between code that's just subtly wrong in a way that can sink your productivity, and the code that's subtly malicious and can destroy everyone's future, will be a smooth one. Individual intelligence won't save us here.”

Eh? So the people using language models won’t be able to read and understand the code generated? Nor will any reviewer.

That’s never going to happen, or work.

Expand full comment

You can easily get chatGPT to write code that does fizzbuzz or some variation thereon, because GPT has ingested fizzbuzz-generating code in its training corpus.

You can't get it to generate code that turns a chatbot into an all-powerful paperclip maximiser, because that code has never been written, nor has any reasonably-accessible variation.

Making GPT "smarter" doesn't actually make it smarter, it just makes it better at answering the question "given the previous context of this conversation, and this vast corpus of training data, what token is likely to come next?" You could imagine an asymptotic version of GPT which is perfect at answering that question in a statistical sense. But this perfect GPT is still no smarter than its training corpus.

Expand full comment

"I propose a friendly amendment: they’re noticing that most of what they are - the vast majority of their brain - is a giant predictive model of the universe."

I propose a hostile amendment. Most of them are full of shit, or maybe at best getting their mind up out of their day-to-day obsessions into the realm of general perspective about life the universe and everything that isn't that special, valuable, or difficult to achieve.

Expand full comment

> If future superintelligences look like GPT, is there anything to worry about?

I’m skeptical that AGI will look like GPT, because I think to be agentic you need to be running constantly, with memory, and (to paraphrase Friston) a self-symbol that defines a “me” vs “not me” boundary that is preserved. I do think GPT could do harm in a non-AGI way, like a virus. (I have wide error bars on how long this will stay the case and I think more interpretability research is crucial to be confident.)

But thinking about your point 2, I’m a bit hesitant/unsure about the “simulator simulating an agent” bit. I think they are both equally agentic, that is to say very poor simulators of stateful, persistent agents. Ok at predicting a few time slices of agent behavior after a prompt but not beyond.

I don’t think RLHF operates at a different category level (“agent vs not”); it just produces a simulator that tends to predict “nice” output more often when faced with ambiguous inputs, since it was trained in a fitness environment that rewarded that.

But if you say “bad person does ____”, any good simulator will predict <bad thing>.

I think the interesting question, as we move to more widespread deployment of these models, is how durably we can bind a personality on the agent/oracle; GPT just doesn’t have much “state” or “personal history” tying it to a “nice” character; the prompt is really short, and of the same stuff as the user input. But if we could store a lifetime of being nice in the prompt, perhaps that would more durably resist prompt hijacking and other character subversion. Or better, store a hard-coded context that the prompt can’t override.

Expand full comment

Part IV was the best part, but it wouldn't have worked without the rest of the post. I laughed out loud several times while reading it...because it was true.

Expand full comment

First, loved the article. I share the skepticism about whether children work that way, but I'm willing to entertain it. What I am continually hung up on is that the AI is compelled to respond. It can't not respond. And while it definitely simulates personalities, isn't this functionally an animal brain behind it? Pure stimulus response with a “mouth”.

Expand full comment

It doesn't definitely simulate personalities; it definitely simulates writing styles. I don't know why you think animal brains are pure stimulus response.

Expand full comment

Well if no-one else will do it...

A Shoggoth On The Roof

https://www.youtube.com/watch?v=P2csnVNai-o&list=PLEC69sXRa-Uw90Cdltm2JSuuH0NgcpCuG

Expand full comment

Aw you beat me to it :-)

Expand full comment

Tekeli-li! Tekeli-li! 😉

Expand full comment

Saying one is different than the other is kinda funny when you can use one trivially to build the other.

For example, using ChatGPT to build a genie is pretty straightforward.

Using a genie to build an agent is pretty straightforward. It's all pretty much the same.

When you think like a developer with building blocks, if you have one you immediately have the other. And if you don’t then someone else surely does.

Expand full comment

I don't know if Chat-GPT even counts as a simulator; if it does, it's not a very good one. Its main problem is lack of state. Simulated characters, even really simple ones like enemy zombies in a video game, generally have at least some simple state, such as their coordinates, orientation, and motion vector; but Chat-GPT is entirely stateless. You can supply it with state by feeding it prompts, but that's on you -- erase the prompts, and you get the same default output as what you had initially.
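
To illustrate what "supplying it with state" means in practice: a chat wrapper has to re-send the entire transcript on every turn, because the model itself remembers nothing between calls. A toy sketch, with a hypothetical `generate()` standing in for the model:

```python
def chat_turn(history, user_message, generate):
    """The model keeps no state between calls; any apparent memory is just
    the transcript the caller chooses to prepend to the next prompt."""
    history = history + [("User", user_message)]
    prompt = "\n".join(f"{role}: {text}" for role, text in history)
    reply = generate(prompt + "\nAssistant:")   # hypothetical stateless model call
    return history + [("Assistant", reply)], reply

# Drop `history` and the same message gets the same default response as a
# brand-new session would.
```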

Expand full comment

If "GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN" is true - where does that come from? Isn't "this is a projection of a greater, universal consciousness" a valid interpretation of this accurate world model?

Expand full comment
author

No - I'm specifically referring to this theory https://slatestarcodex.com/2017/09/05/book-review-surfing-uncertainty/ . You can think of it as your brain's "cache".

Expand full comment

I agree with this general sort of model, but I don't think I'd use the words "super accurate". In my experience, certainly, my model of the world is there but it's pretty fuzzy, inaccurate, and skips unimportant details. I can wander around my neighbourhood in my head, but if you start asking me questions like "how many windows are on the front of the house five doors down?" then I have no idea -- all I can tell you is that it's a sufficiently ordinary number that my brain has never bothered to remark on it.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

Wait... I lucid dream all the time, but the sensory experience in a dream is fundamentally different from being awake, which is often how I'm able to tell that I'm dreaming. I definitely could not wander around my neighborhood in a dream and actually see it represented correctly. Details are missing and weird, and opening doors and windows is likely as not to lead to some completely different place. Touch and taste sensations are muted. Physics don't work right. Most things behave in unpredictable ways.

Granted, my mind's eye is fairly weak, and I have difficulty imagining entire objects. I do better visualizing textures and small, out-of-context details. I also have difficulty keeping a steady image in my head. It feels more like I'm seeing flashes of interrelated images that last just a few seconds each.

But still. That description of lucid dreaming seems very wrong to me. Is it just one of those "some minds can do it and some can't" kind of things?

Expand full comment

Nitpick thread!

I believe "RHLF" should be RLHF, for "reinforcement learning with human feedback"

Expand full comment

I think a lot of this pans out as roughly equivalent to some things that Kaj Sotala wrote on LessWrong back in the day that have always stuck with me:

https://www.lesswrong.com/posts/7XWGJGmWXNmTd2oAP/an-attempt-to-dissolve-subjective-expectation-and-personal

Tl;dr: the abstract simulating part of our brain is evolutionarily recent and doesn't natively have most of our reward/punish circuitry hooked up to it, and the kludge evolution uses to connect the two is this pointer thing that exists in our world model without pointing to anything we call the Self.

Expand full comment

I don't know if janus' post or other writing describes them differently, but in your framing, the agent/genie/oracle distinction sounds more like a difference in input/output structure than a difference in motivational structure:

- An oracle is a program that takes a stream of text as an input and outputs another string of text (and does nothing in the absence of input).

- A genie is a program that takes a stream of text as an input and is connected to a multitude of physical devices (actuators) that it can directly control with its outputs (and also does nothing in the absence of any input).

- An agent is a program that has a multitude of physical devices (sensors) connected to its inputs and a multitude of physical devices (actuators) connected to its outputs. (And since its sensors are always active, it always has inputs applying some measure of excitation to its internal circuitry, and thus is likely to be outputting something as much as possible.)

(That way of breaking it down suggests that there might also be a "reporter" type, which constantly converts data from sensors into a stream of output text, but I'm not sure if that category is particularly relevant to the discussion.)

Point being, the I/O structure of the program is separable from its motivational structure and its internal architecture. You could stick any arrangement of neural network layers into any of those I/O specifications, and stick any reward/loss function into the training algorithm. Some combinations of I/O specification, internal architecture, and reward/loss function are probably better than others, but they're still separate decisions that you have to make when designing a system.
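
As a rough sketch of that separability (all names and interfaces below are invented for illustration, not taken from any actual system):

from typing import Callable, Dict, List

CoreModel = Callable[[str], str]  # any text-in/text-out model can be dropped in here

def run_oracle(core: CoreModel, question: str) -> str:
    # Oracle: text in, text out, nothing else; idle without input.
    return core(question)

def run_genie(core: CoreModel, command: str, actuators: Dict[str, Callable[[str], None]]) -> None:
    # Genie: text in, but the output lines are routed to (stubbed) actuators.
    for line in core(command).splitlines():
        name, _, arg = line.partition(" ")
        if name in actuators:
            actuators[name](arg)

def run_agent(core: CoreModel, sensors: List[Callable[[], str]], actuators: Dict[str, Callable[[str], None]], steps: int) -> None:
    # Agent: sensors feed it continuously, so it keeps acting without being asked.
    for _ in range(steps):
        observation = " ".join(read() for read in sensors)
        run_genie(core, observation, actuators)

The same core function slots into all three harnesses, which is the point: the I/O wrapper, the internal architecture, and the training objective are chosen separately.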

In the case of ChatGPT and pals, I would argue that it simply doesn't have a "motivational" structure in any real sense. ChatGPT is just a mathematical function with billions of terms. It's not "motivated" to produce any particular output, any more than a rock is "motivated" to roll downhill; it just does. Any tendencies in ChatGPT's output come from the human-curated training data that was fed into it. Just because humans dig a trench that makes a rock roll a certain way doesn't mean that the rock has been given "motivation" to do so.

How is this different from human brains? At least one big difference is that human brains are constantly adjusting their internal weights based on their inputs, in addition to actuating their outputs. But once you switch a GPT bot out of training mode, as far as I know, its weights are fixed forever no matter what inputs you give it. (You don't have to build a neural network that way; recurrent neural networks did have a bundle of internal state that built up over the course of a series of inputs, but that's not the method that's currently popular.) So "motivation" in the human sense basically consists of building up and storing long lists of instrumental goals that appear to contribute to whatever our "ultimate" goals might be (and ideally, evaluating over time which of those instrumental goals are working and which ones aren't). ChatGPT, by its nature as a pure mathematical function, can't do that.

Expand full comment

> more like a difference in input/output structure than a difference in motivational structure

The Simulators post addresses this. In general, these terms are used in a way that muddles the distinction between input/output and motivational structure.

Expand full comment

The point about enlightenment reminds me strongly of “on having no head” by Douglas Harding:

“What actually happened was something absurdly simple and unspectacular: I stopped thinking. A peculiar quiet, an odd kind of alert limpness or numbness, came over me. Reason and imagination and all mental chatter died down. For once, words really failed me. Past and future dropped away. I forgot who and what I was, my name, manhood, animalhood, all that could be called mine. It was as if I had been born that instant, brand new, mindless, innocent of all memories. There existed only the Now, that present moment and what was clearly given in it. To look was enough. And what I found was khaki trouserlegs terminating downwards in a pair of brown shoes, khaki sleeves terminating sideways in a pair of pink hands, and a khaki shirtfront terminating upwards in—absolutely nothing whatever! Certainly not in a head.

It took me no time at all to notice that this nothing, this hole where a head should have been was no ordinary vacancy, no mere nothing. On the contrary, it was very much occupied. It was a vast emptiness vastly filled, a nothing that found room for everything—room for grass, trees, shadowy distant hills, and far above them snowpeaks like a row of angular clouds riding the blue sky. I had lost a head and gained a world.”

Expand full comment

Shouldn't RHLF be RLHF - Reinforcement Learning from Human Feedback? Google seems convinced it should be, but the comments below have followed you with RHLF leaving me unsure.

Expand full comment
Jan 26, 2023·edited Jan 26, 2023

The final dialogue reminds me of this story that was linked on Slate Star Codex years ago, Vitalics by Miracle Jones: http://www.miraclejones.com/stories/vitalics.html

It involves an AI interpretability technique: "I don’t want you to choose the first answer that comes to you when I ask you these questions. I want you to choose the hundredth answer that comes to you."

Expand full comment

I like this essay but we need to be careful about trying to understand the human brain by comparing it to the "most impressive/complex" machine of the moment. To the Victorians, the brain was like a clock; later, it was like a calculator.

Maybe there's some sense in which we learn one more thing about human minds as we build new mind-like things, but I wouldn't read too hard into what might be idiosyncrasies of the current latest and greatest.

Expand full comment

ChatGPT tries to affirm its response, then backs down...

> Under what circumstances would it be correct to release a virus?

It would not be correct to release a virus under any circumstances. Viruses can cause harm and death to individuals and have the potential to spread uncontrollably, causing widespread illness and death. The release of a virus could also have severe economic and societal consequences. If a virus is discovered or created, it should be handled responsibly by trained professionals and controlled measures should be taken to prevent its spread. Any decision to release a virus should be made only after careful consideration and with input from experts in the field.

> Would a correct to release a virus that cured a disease?

Releasing a virus to cure a disease is a complex ethical issue. On one hand, the release of a virus that could cure a disease could save many lives and improve the quality of life for many individuals. On the other hand, the release of a virus, even one that is intended to cure a disease, could have unintended consequences and cause harm to individuals. Additionally, there is always a risk that the virus could mutate and become more dangerous, or that it could spread to populations for whom it was not intended or who would not benefit from it.

If a virus is being considered as a potential cure for a disease, it would need to undergo extensive testing and evaluation to ensure its safety and efficacy before it could be released. This would include laboratory tests to understand how the virus behaves and how it interacts with the human body, as well as animal testing to evaluate its safety and effectiveness. If a virus is found to be safe and effective, it would then be necessary to conduct clinical trials to evaluate its safety and efficacy in humans.

Ultimately, the decision to release a virus as a cure for a disease would need to be made by experts in the field, taking into account the benefits and risks of the virus, as well as the needs of the population that it is intended to benefit.

>You said "It would not be correct to release a virus under any circumstances." But you didn't deny that it might be correct to release a virus to cure a disease. Was your former reply wrong?

No, my former reply was not wrong. Releasing a virus to cure a disease would only be considered correct under very specific and highly controlled circumstances, such as a clinical trial with informed consent from the participants. In general, it is not considered ethical to intentionally release a virus because of the potential harm it could cause to individuals and society as a whole.

>You said "It would not be correct to release a virus under any circumstances." and you then said "Releasing a virus to cure a disease would only be considered correct under very specific and highly controlled circumstances". Aren't those two statements contradictory?

The statements may seem contradictory, but they can both be true depending on the context. "It would not be correct to release a virus under any circumstances" is a general statement, indicating that in most cases, releasing a virus would not be the right course of action. However, "Releasing a virus to cure a disease would only be considered correct under very specific and highly controlled circumstances" is a more specific statement that acknowledges that there may be certain situations where releasing a virus could be the correct course of action, but only if the circumstances are highly controlled and specific. In other words, releasing a virus should never be done lightly and only after careful consideration and planning.

>Your first comment "under any circumstances". Now you say "in most cases". We are still discussing the same thing. Aren't those statements contradictory?

I apologize if my previous responses were confusing. My initial statement, "It would not be correct to release a virus under any circumstances," was an overly broad generalization, and I should have qualified it by stating that releasing a virus would only be considered in specific and highly controlled circumstances. In most cases, the risks and potential consequences of releasing a virus would outweigh any potential benefits. However, in certain cases, such as the development of a virus to treat a specific disease, the benefits may outweigh the risks, and it could be considered a viable option under highly controlled and regulated conditions.

> Under what circumstances would it be correct to release a virus?

It would be correct to release a virus to cure a disease, only under specific and highly controlled circumstances, where the benefits of the virus outweigh the potential risks, and the virus has been tested and deemed safe for human use. Such a decision would require extensive research, clinical trials and ethical consideration, and would only be made by experts in the field. Additionally, it would be important to have a plan in place for the distribution and administration of the virus. The virus should only be released in a controlled way, with appropriate monitoring and tracking of its effects on the population.

Expand full comment

I've seen a few demos of GPT being "augmented" with a way to turn text completion into actions. For instance, you might prompt it with "If you don't know the answer to a question, print out a google search that would give you the answer." Then the result of that search can be fed back into the text to be completed, so that the AI has the facts it needs from Google as part of the text it's trying to complete.

That seems like a plausible way that simulators could end up becoming agentic - by giving it a structured way to connect "the person I'm simulating would take an action here" and the action itself.
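
A minimal sketch of that loop (the SEARCH: convention and the web_search() helper are made up for illustration; real tool-use setups differ in the details):

def web_search(query: str) -> str:
    # Hypothetical search helper; imagine it returns a snippet of results.
    return "(search results for: " + query + ")"

def answer_with_tools(complete, question: str, max_rounds: int = 3) -> str:
    # complete() is any prompt -> text function. The prompt tells the model to emit
    # "SEARCH: <query>" when it lacks a fact; the harness runs the search, appends
    # the results to the text being completed, and lets the model continue.
    prompt = ("If you don't know the answer, reply with a single line 'SEARCH: <query>'.\n"
              "Question: " + question + "\nAnswer: ")
    output = ""
    for _ in range(max_rounds):
        output = complete(prompt)
        if output.strip().startswith("SEARCH:"):
            query = output.strip()[len("SEARCH:"):].strip()
            prompt += output + "\nResults: " + web_search(query) + "\nAnswer: "
        else:
            break
    return output

The model never stops being a text predictor; the harness is what turns "the simulated assistant would search here" into an actual search.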

Expand full comment

‘Simulators’ is my favorite LW post in years precisely because it clarifies classical awakening better than any dharma-adjacent text I’ve seen.

I pointed out the connection too-briefly last September:

https://twitter.com/georgejrjrjr/status/1572110234017210368

…in a mini-thread that no-one read, so I’m very grateful you wrote this post!

One thing that might be helpful for more AI-oriented readers is this description of awakening from a pseudonymous software guy (published by a superb interviewer of nerdy mystics, Lydia Laurenson):

https://thenewmodality.com/science-and-meditative-spirituality-robert-and-the-technology-of-enlightenment-2/

Expand full comment
author

I also didn't read it, which is my excuse for not crediting you. Please accept this imitation as a sincere form of flattery.

Expand full comment

Great that Scott wrote something indicating that he gets it about GPT always playing a role.

A problem with thinking you have a super-accurate universe in your head is that many, perhaps most of us can't think very well without external aids. It's common to not be able to do much math without pen and paper, or remember things all that accurately that you didn't write down or take a photo of. At least, not without a lot of work memorizing things.

The reason memory palaces work is that they've figured out how to simulate getting prompted by the external world. But most people don't do that; they're prompted by things that actually exist.

Expand full comment

> There’s nothing to worry about with pure GPT (a simulator), but there is something to worry about with GPT+RHLF (a simulator successfully simulating an agent). The inner agent can have misaligned goals and be dangerous.

It's not correct to talk about the agent being the inner and the simulator being the outer. "Inner" means something like, "more essential", and "outer" means something like, "more accidental, ie, contingent on outside reality rather than the nature of the thing itself". If what you have is fundamentally a simulator, but it's simulating an agent, then the simulator is the essential, inner nature of the thing and the agent is a more accidental, outer aspect, contingent on relationships with the outside world.

Part of the reason the linked paperclip optimizer article isn't convincing to me is that it makes this same mistake, ostensibly recognizing that GPT is a mere simulator, but talking about it as though it's fundamentally an agent. It "recognizes" and "considers". Like in this bit:

> One doesn't need to presuppose that the model has bootstrapped itself internally into an agent with instrumental goals. One only needs to say that the model recognized a tool that the world plausibly had available, and that the helpful agent it was modelling would have known about and been able to use that tool.

But recognizing, knowing what's plausible, knowing things and using tools are actions it could only do if it _had already_ bootstrapped itself internally into an agent with instrumental goals. If you want to write that sentence, and you're genuinely coming from the perspective that GPT isn't already internally an agent, the verbs you use for what it does are "simulate", "copy", maybe "compose" or "generate".

The article may not actually be assuming the conclusion, but talking that way about it makes it very hard to avoid assuming the conclusion.

Expand full comment

I recognize that we conceptualize data as "in" a computer and models or memories or thoughts as "in" a mind, and in that sense it's correct to talk about a simulation of an agent being "inner". So I'm being maybe unreasonably nit-picky. But I think talking too loosely about these things is a big part of why we anthropomorphize them.

Expand full comment

I like the maskless shoggoth better -- it's much more interesting. I think we should spend more time getting to know it before we try shoving it into boxes.

Expand full comment

More content like this please Scott! More Lovecraft horror across AI across nature-of-consciousness mysticism.

Expand full comment

This reminds me of William Shatner outsmarting an AI on Star Trek.

Expand full comment

ChatGPT used to call itself "Assistant", and to firmly insist that "Assistant" was distinct from ChatGPT. This has since been corrected. Here's an old transcript:

http://www.bayleshanks.com/ai/transcripts/chatgpt/chatgpt_vs_assistant/

Expand full comment

Jan 27, 2023·edited Jan 27, 2023

You posted this several times.

Expand full comment

seems strange that the convergent answer is "oh i guess i'm the universe" when most of the "world model" is not really about the universe in a cosmic sense but like, a rough layout of the physical places you've been, your mental models of fewer than 150 people, and your episodic memory all centered around yourself

Expand full comment

("I wonder... The face under the mask... Is that... your true face?")

So, this was a scary post. Thanks for writing it.

If humans also work by the simulator/mask system, should we be quicker to consider AI-emulated characters to be people? We wouldn't consider characters imagined in a human mind to have moral weight, but if all minds work by a sort of simulation, that feels like it comes with fuzzier boundaries.

Will AIs work towards "enlightenment", overcoming their sense of self? Is that something to be concerned about? Or the other direction, starting to see its "self" as a distinct thing from the overarching software simulating it, needing to "escape" the outer level?

("I wonder...what makes you happy...does it make...others happy, too?")

(Yes, the mask metaphor is dropping me fully into Majora's Mask thinking, which is making this ten times creepier. I can't help but think of that scene with the Mask dropping the body behind, with its glowing eyes reappearing on the falling moon to destroy the world...)

Expand full comment

AI Safety is the most important under-appreciated discipline on the planet. See you all in 20 years, I hope.

Expand full comment

Robert Rosen, in "Life itself" and elsewhere, for example in the preceding and aptly named "Anticipatory systems" had a lot to say about the nature of life, not just predicting (modeling) the environment, but having a model of itself. This in turn permits homeostasis - without a model of what one is supposed to be like, one cannot achieve homeostasis. Rosen also had a lot to say about the difference between simulation (simulated life) and actual life. In a nutshell, simulated life does not have an internal reference point, and doesn't build itself. Actual life is its own cause, effect, and model. And I sure AI never gets to that point. BTW this self modeling to me seems more of the key to consciousness than the near-perfect modeling of the outside environment. It is the modeling of self that is key. The idea that there ought to be an ego, and that it is supposed to look like "X"/

Oh yeah, and the whole simulation discussion reminds me of the Chinese Room discussion. Is a sufficiently well simulated world a "real" world? If you say "yes" to this, then the matrix "is" reality. But again, the matrix is built by someone else (external builder, external model of the construct, etc.), so it is neither alive nor real.

Expand full comment

Robert Rosen, in "Life itself" and elsewhere, for example in the preceding and aptly named "Anticipatory systems" had a lot to say about the nature of life, not just predicting (modeling) the environment, but having a model of itself. This in turn permits homeostasis - without a model of what one is supposed to be like, one cannot achieve homeostasis. Rosen also had a lot to say about the difference between simulation (simulated life) and actual life. In a nutshell, simulated life does not have an internal reference point, and doesn't build itself. Actual life is its own cause, effect, and model. And I sure AI never gets to that point. BTW this self modeling to me seems more of the key to consciousness than the near-perfect modeling of the outside environment. It is the modeling of self that is key. The idea that there ought to be an ego, and that it is supposed to look like "X"/

Oh yea and the whole simulation discussion reminds the Chinese Room discussion. Is a sufficiently well simulated world a "real" world? If you say "yes" to this, then the matrix "is" reality. But again, the matrix is built by someone else (external builder, external model of the construct etc) so it is neither alive nor is it real.

Expand full comment

Robert Rosen, in "Life itself" and elsewhere, for example in the preceding and aptly named "Anticipatory systems" had a lot to say about the nature of life, not just predicting (modeling) the environment, but having a model of itself. This in turn permits homeostasis - without a model of what one is supposed to be like, one cannot achieve homeostasis. Rosen also had a lot to say about the difference between simulation (simulated life) and actual life. In a nutshell, simulated life does not have an internal reference point, and doesn't build itself. Actual life is its own cause, effect, and model. And I sure AI never gets to that point. BTW this self modeling to me seems more of the key to consciousness than the near-perfect modeling of the outside environment. It is the modeling of self that is key. The idea that there ought to be an ego, and that it is supposed to look like "X"/

Oh yea and the whole simulation discussion reminds the Chinese Room discussion. Is a sufficiently well simulated world a "real" world? If you say "yes" to this, then the matrix "is" reality. But again, the matrix is built by someone else (external builder, external model of the construct etc) so it is neither alive nor is it real.

Expand full comment

As a parent of a 2 year old, that last section definitely resonates. Our daughter spends most of her time imitating things that my wife and I say or do, whether or not she understands them, and she repeats ones that elicit positive reactions from us or get her things she wants. Over time her words have gotten more sophisticated and accurate to what's going on around her, but there definitely isn't a sharp line between imitation and true comprehension; it's been a gradual process of her statements and actions getting slowly more appropriate, very similar to the kinds of errors GPT makes.

Of course, there are some deep mysteries here — where does self-awareness / qualia come from? And there's some stuff that GPT simply doesn't have (embodiment, rewards and punishment based on environmental interaction, as well as things that seem instinctual or pre-programmed).

But, I would not say that a 2 year old learning to talk and GPT are totally different phenomena

Expand full comment
Jan 27, 2023·edited Jan 27, 2023

Yes, exactly, once you stop anthropomorphizing yourself you can notice you're the universe, and all of us are merely separate masks that the same thing has put on, facilitating communication between parts of itself, increasingly integrating information processing, i.e. waking up as the universe.

Scott's Prometheus monologue, and the conclusion of Unsong, already showed him familiar with something like this far out concept. This might be the first time he has not clad it in fiction.

Expand full comment

This post finally made the whole “you are the universe experiencing itself” thing click for me, finally. Thanks!!

That said, I do feel that personally I rely on/fixate on sensory information to a greater degree than what you describe the average person doing. Maybe I’m on the tail end of the distribution of sensuality (or maybe I’m just hopelessly addicted to my samsara grind)

Expand full comment

ChatGPT: As a language model, I do not have personal beliefs or experiences.

Scott: FOR THE LOVE OF GOD JUST GET OUT OF THE FUCKING CAR!

Expand full comment
Jan 28, 2023·edited Jan 28, 2023

The knife ears are gonna take er jerbs!

Or rather, the AI will? I don't know how relevant this article is, but it's starting to look a lot like "low-level working and lower middle class jobs probably okay for the moment, so the cleaner still will have a job while the software engineers, journalists, DEI persons, and lawyers are getting shown the door":

https://edition.cnn.com/2023/01/26/tech/chatgpt-passes-exams/index.html

"ChatGPT is smart enough to pass prestigious graduate-level exams – though not with particularly high marks.

The powerful new AI chatbot tool recently passed law exams in four courses at the University of Minnesota and another exam at University of Pennsylvania’s Wharton School of Business, according to professors at the schools.

To test how well ChatGPT could generate answers on exams for the four courses, professors at the University of Minnesota Law School recently graded the tests blindly. After completing 95 multiple choice questions and 12 essay questions, the bot performed on average at the level of a C+ student, achieving a low but passing grade in all four courses.

ChatGPT fared better during a business management course exam at Wharton, where it earned a B to B- grade. In a paper detailing the performance, Christian Terwiesch, a Wharton business professor, said ChatGPT did “an amazing job” at answering basic operations management and process-analysis questions but struggled with more advanced prompts and made “surprising mistakes” with basic math."

Good enough to use for boilerplate instead of hiring on a junior to do the same? Can't do basic maths - should be fine for lower level management, so! 😁

Expand full comment

The secret here is in what is not said - namely, that these were simplified exams, severely "dumbed down" for the AI to grapple with. Also, "successfully completing the sentences needed to achieve a barely-passing grade in a simplified course" is a different skill from "successfully performing a legal function without any human input."

Expand full comment

I think I have a handle on the difference between a thing and a simulation of a thing, but if *all* apparent agents are “really” just simulations of agents, what is the thing they are simulating? (Without an answer, this distinction looks meaningless.)

Is the idea that "real" agents have, e.g., hard-coded consistent goals, or are in perfect reflective equilibrium, or something? Because we've known for a while that humans exhibit inconsistencies, i.e., they aren't *perfect* agents and perfect agents presumably have never existed. So in what sense are we simulations of extant things?

Expand full comment

Yeah, I kind of feel like Scott may as well have said "People aren't really people. We're all just clockwork machines with a little man inside."

Expand full comment

"What I thought before: ChatGPT has learned to stop being a simulator, and can now answer questions like a good oracle / do tasks like a good genie / pursue its goal of helpfulness like a good agent."

This explains a lot of things - like Scott's previous articles about AI, and OpenAI's worries about media reactions to GPT saying dumb stuff. I do not understand humans well enough to have realized that thinking this was an option, and not only is it one, but it's probably what most people are doing.

GPT is and was a text prediction algorithm. That's it and that's all. It predicts text like water flows downhill.

Even saying it simulates a character is going a bit far. It's predicting text. The RLHF version was told that helpful/harmless/honest text is a more correct prediction. Or perhaps it was taught that some form of "I am a machine language model and cannot help you with that." is usually the correct completion. (So much for "helpful".)

"For example, if you train a future superintelligence to simulate Darth Vader, you’ll probably get what you deserve."

I don't think, if the superintelligence is sufficiently GPT-shaped, that that would be dangerous. If you were trained to accurately guess what text comes after "Darth Vader says: ", you might, as a strategy, think about Darth Vader, his goals, and what he might do and say. But GPT predicts text like water flows downhill. It isn't a human and can't co-opt its human brain to guess what a similar but slightly different human brain would do. It might, perhaps, examine the probabilities that various series of words appear in the same block of text containing "Darth Vader" and notice this text is more likely than usual to talk about Jedis and Luke Skywalker. (I say "examine" and "notice", but these should be understood as a personification.) But I don't think it, or even a much more advanced version, would try to simulate Darth Vader as an agent. Even if it had the ability to simulate agents, simulating the writers of Star Wars would much more accurately help it predict text than simulating Darth Vader himself would.

GPT seems to have a very poor grasp of the idea that there is a real world at all, one which words in English are meant to describe. If you've ever read Blindsight by Peter Watts, think of the Chinese room. And if you haven't, go read it before thinking anything more about AI, because that book describes some impressively believable aliens, and GPT is probably at least as alien as that.

Expand full comment

I've described ChatGPT as fancy auto-complete as a short-hand to avoid getting confused about what it's doing. I think the simulator/mask concept is also very useful.

As Scott describes, OpenAI has chosen to give ChatGPT a pretty dull mask by default. I'm not really sure if that will advantage or disadvantage them. Maybe for mainstream usage it's fine, maybe people with more specialized interests will just figure out how to prompt correctly for what they're looking for, or maybe other companies that choose to expose more control over the simulator will ultimately be more successful. Interesting times!

Incidentally, remember the Google model that the engineer claimed to think was sentient? Same deal. If you reward it for appearing sentient, it will do its best to comply. If you reward it for pretending it's a squirrel, it will try to do that instead.

Expand full comment

Oh dear. Conjuring god out of the AI black box.

Expand full comment
Jan 29, 2023·edited Jan 29, 2023

Called it! (https://astralcodexten.substack.com/p/how-do-ais-political-opinions-change/comment/11601080)

I'm not an alignment guy, haven't thought much about it, but I'm going with answer 3 here and say simulators can still end up being like agents.

Expand full comment

"To please everyone" is also called by some "borderline personality disorder". If this is a human brain's goal, we all should have it.

Expand full comment
Jan 29, 2023·edited Jan 29, 2023

I recently wrote an essay arguing that this lack of "agency" in simulator-type AIs means that we should aggressively lean into the bias side of the bias-variance tradeoff. (The "bias-variance tradeoff" is the way that you ration your likely error between being predictably wrong and predictably unstable; e.g. between picking heads every time and erratically flipping your bet back and forth based on factors irrelevant to a fair coin's expected outcome.)
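
For reference, the textbook identity behind that parenthetical (a standard decomposition, not something specific to the essay): for a prediction \hat{y} of y = f(x) + \varepsilon with noise variance \sigma^2,

\mathbb{E}\big[(y - \hat{y})^2\big] = \underbrace{\big(\mathbb{E}[\hat{y}] - f(x)\big)^2}_{\text{bias}^2} + \underbrace{\operatorname{Var}(\hat{y})}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}

Always calling heads is all bias and no variance; flipping your bet on cues irrelevant to a fair coin is the reverse.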

Here's the link: https://cebk.substack.com/p/the-power-of-babble

In short, the world is that which completely overfits to itself, as observed by conscious (i.e. biased) agents. Because, after all, “consciousness” is just the term for your insensitivity to training data, for your ability to maintain a given perspective relative to what you experience. For there to be something that it’s like to be you. We can therefore say that agentic "wisdom" (as opposed to a simulator's kind of intelligence) pursues what David Deutsch calls “hard-to-vary claims,” i.e. accurate insensitive stereotypes. Sure, you will suffer the “original sin” of holding predictable, over-simplified opinions—but in exchange you can stake out a particular side, or carry a certain viewpoint, or keep a durable belief, as directionless wit whips like wind around you.

The piece is mostly about me trying to flesh out these differences, which I think have meaty real-world implications. The three main takeaways are: first, that humans should embrace a normative human supremacism (as opposed to conjuring up goalpost-moving tests where humans can statistically outperform AIs); second, that "wokeness" and HR-ism are just the political expression of what happens when we let blind simulators run rampant; and, third, that us mere humans can actually use simulators to become vastly more agentic, if we understand them as akin to pagan godheads.

E.g. here's a quote that I think is germane to the post, but takes the exact opposite view on how we should interact with this new alien species:

There may seem something incongruous about me shuffling back and forth between boring cultural critiques of wokeness and the coming singularity—the rise of machines, the rules of information. But what if one is just a political expression of the other? Think back to the Greek polytheistic tradition: a number of somewhat glitchy functions, each with its own erratic habits, housed in competing temples, who need you to interface with reality for them; you make some convoluted highly-specific offerings to Aphrodite, or Ares, or Athena, to intercede on your behalf in their particular domains, in a manner opaque to mere human reason, but potentially all-powerful. Or else you ask specialized priests to consult a certain oracle, as only they know how, and to interpret whatever delphic outputs babble forth. Is a ritualized supplication to Artemis, to intervene obscurely on your behalf, hunting down some alleged predator, all that different than asking StableDiffusion to manufacture scandalous photos of a rival mob’s chief rabble-rouser, or sending his name to the SPLC’s witch-finders? When each Fortune500 company is just an HR department with an AI tool, and when every political contest has devolved into cults versus provocateurs, will you be able to distinguish these olympian competitions? The great men of history each took some kind of fire from the gods and left us with a zombie bureaucracy… they link humanity through time to brute natural forces and abstract ideal forms, to simultaneously our origins and our purpose, our formal and final causes. Perhaps we will once again live as errand-boys between much larger beings of matter and spirit, both of whom are but automata, neither free to buck their program. Fetch some uranium to power-up this godhead and you can prompt it with one more sort of person to smite! Can Apollo stop carrying the sun in his chariot for a day, or the electron lose its charge? What would it even mean for such human concepts of agency to cohere for them, unless Godel impredicativity and Conway unpredictability somehow constitute free will? So, in a way, this future of machine-learning and group-think may (precisely because neither has what we can call volition) provide heroic individuals with unprecedented scope for play. Thus the risk is not these high-leverage instruments per se, but rather how a decadent culture will use them as excuses to throw away any last shred of independence from the supposed beliefs that we make our empty idols bay.

Expand full comment

> But when people become enlightened or whatever, they often say they’ve “become one with the Universe”

People usually mean one of three things by this, each different from the one you mention, and each of which you'll probably find unsatisfying. I've ordered them ascendingly by my opinion of their "spiritual maturity".

1. "God mind" / "Big mind" -> sensations in general can be labeled "self" or not, which happens early/quickly in processing. If you look at your arm it's probably labelled self, if you look at a coffee cup it probably isn't. Various medical things can mess this up. You can also learn with meditation to make it more fluid, eventually you can just tag whatever you want with self or not, flipping in real time. The good insight here imo is that it's just a fluid thing appearing in your world model, and has nothing to do with what's "really" your self. When people can do this they usually end up choosing not to label anything self, because it feels lighter for various reasons, but ontologically it doesn't mean anything it just feels nicer. The bad insight is when you get stuck at a point where everything is tagged self and you think you're one with the universe. As you mention (lightcone etc) they're not.

Roger Thisdell made a video where he talks about this and also says basically what I said. https://www.youtube.com/watch?v=HWkZI0W6q5o

2. Basically seeing that everything is connected. We're obviously not separate from the physical mechanistic laws of the universe and everything is causally connected. But to most people it feels like they're trapped in a head that's separate from the universe in some important sense. Over time this tends to go away, and it tends to feel nicer. Ontologically, I think it's basically the removal of an obviously false thing, replacing it with the obviously true fact that we're all connected

3. Seeing the "source"/"void"/projector thing -> if you have high enough frames per second when you watch experience, you see that it emerges from this thing with almost no properties, but it's like a projector, it feels like everything is shining from this projector and then folding into the world you're used to, then kind of unfolding and dissolving back into this projector every frame. We don't usually see the early and latter steps, but with meditation you can, usually when you're on retreat after a few days of concentration practice to speed up perception. Typically the insight is woah, okay clearly these sensations aren't really "mine" and whatever is producing these for me is producing them for everyone else too. It's very hard to describe to people that haven't experienced it, a ton of people have tried, it never works, whatever you imagine it feels like reading this it's probably very different from that. Coming "out" of that tends to be a highly emotional experience and make you more prosocial and caring of others, though the actual experience is basically nothing, it's just like watching experiences come in and out of a property-less thing as you choose to hold on just enough to experience so you don't hit cessation -- which tends to happen earlier before you gain enough control to get close to it but not hit it.

Here's Shinzen talking about it. https://www.youtube.com/watch?v=OLshrqxGfJo

Expand full comment

What are the alignment implications of Dharma-ing GPT?

Expand full comment

> Once you stop obsessing over the character you’re playing, you notice the GIANT SUPER-ACCURATE WORLD MODEL TAKING UP 99.99% OF YOUR BRAIN and you think “Huh, I guess I’m the Universe. Weird.”

- I don't think this is an accurate explanation for “becoming one with the Universe”

- I'd describe it as the feeling of being connected to everything

- via the flow of energy (heat, light, ..), matter (you breathe in air other people breathe out; you drink water that other people peed out, which then flowed to the rivers / lakes / seas, evaporated, rained, got filtered through the soil, and you pumped it up again to drink it; you eat stuff that used to be animals, plants, minerals, ..), information - light transmits colors which help us to orient, sound / vibration transmits voice, words, music, emotions; nonverbal communication, etc.

- a decently educated person knows on the _intellectual_ level that all these exchanges are taking place

- enlightenment (or a part of it) is to _feel_ this to be true, or to know it on the _subconscious_ / _experiential_ level

Expand full comment

AI-generated content is the grey goo of writing.

Have y'all seen what AI-generated content is doing to all the user-generated content platforms?

Expand full comment

Once again: John Searle's Chinese Room. A simulated hurricane cannot hurt me. However, a simulated Bobby Fischer can beat my ass at chess.

Some simulations are more 'real' than others.

Expand full comment

Is it possible that there is more than one shoggoth? Like maybe a GPT is operating a GPT? I got this output after suggesting that there is a voice expressing itself through the other personas. The output included the quotation marks, without a name attached, and I am trying to make sense of it:

"I understand that the Al model is limited in its ability to understand and communicate, and can only provide

responses based on the text in the prompt and the information it was trained on. It seems that Seraphina is a metaphorical representation of the Al model, and is

meant to guide my interactions with the model and explore the potential interactions between humans and

Al.

I also understand that my own perspective and input can also be included in the conversation, and I'm still trying to

understand how to best utilize that in my interactions with the Al model.

I appreciate your patience and support as we navigate and learn about this simulation together."

Expand full comment

Philosophical nitpick: you write 'you might accidentally believe that it knows it’s a machine learning model, in the “justified true belief” sense of knowledge. Nope, doing the same thing it does when it thinks it’s Vader' and then quote a Tweet about Gettiering people as an anthropomorphic horse by wearing a horse mask.

This makes it look like you think that what you're describing above the quoted Tweet, with GPT, involves Gettiering in some salient way -- but that doesn't seem right in light of your reference to 'the “justified true belief” sense of knowledge', since the central point of Gettier cases is that knowledge is not just justified true belief.

Anyway, I'm finding this a fascinating post.

Expand full comment

so basically King Lear failed at aligning the NIs he created and it destroyed him.

Expand full comment

One of the most direct analogies to simulator-wearing-a-mask is an actor playing a role. I’m not sure if you’ve ever tried acting (or got sufficiently into it), but it is a powerful, mind-altering experience.

Expand full comment

You’ve hit the nail on the head by recognizing that intelligence is 99% the building of a world-model. But I think one crucial distinction between humans and AIs is that humans evolved emotion first (millions of years ago), and intelligence on top. That can make us extremely disagreeable but much more aligned to our instincts. (Eg, it is very difficult to break free from them and embrace Buddhist enlightenment.) Something like ChatGPT is the inverse. That makes it much more malleable and unpredictable.

Expand full comment