Janus' GPT Wrangling

Sep 19, 2022

...

73 Comments

Crotchety Crank

I wish the Loom HPMOR imitation came furnished with some indication of how many branches were considered, and how many times they only took the first sentence of three GPT generated, and all that - the same way I wish scientific studies would always correct properly for multiple comparisons.

There's the potential for me to be wildly impressed and mildly frightened by that output! But as it is, I can't tell how impressed to be; how much is just humans tirelessly mining the search space for something that would provoke that reaction in me?

Expand full comment

Relevant to this post and the general topic/audience: you can trick GPT models to ignore the prompt, or exfiltrate it: https://simonwillison.net/2022/Sep/12/prompt-injection/

And using further AI-based approaches to try to patch up this kind of security hole works about as well as you'd expect: https://simonwillison.net/2022/Sep/17/prompt-injection-more-ai/

Expand full comment

I really enjoyed reading this post. It reminds me of the last time I had this much fun, which was at a wedding party for my good friend Jane. Ah, Jane is a real hoot...

Expand full comment

dyoshida’s Substack

I personally don't read the quoted snippet as even remotely displaying any sort of self-reference. Am I missing something?

Expand full comment

"The self is a relation, which relates to itself, or is precisely that in the relation that the relation relates to itself; the self is not the relation but that the relation relates to itself."

Expand full comment

Peter S. Shenkin

Sep 19, 2022·edited Sep 19, 2022

A long time ago — in fact, in the fabled 1960s — I asked a friend who worked at a brain research center, "If we were able to create a machine as good as a brain, would it also be as bad as a brain?" He thought for a minute and said he didn't know, but thought it was a good question.

The examples in this blog posting, together with some of the responses, seem to strengthen the case for"Yes."

Expand full comment

I clicked the link without reading on, and thought, "wow, there's no way an AI wrote this!" Then I realized that the italicized part of the page was the original HPMoR, not the AI output. (I don't remember if I read that far before dropping HPMoR in either of my attempts)

Then I got to the actual "AI bit" and thought "wow, there's no way an AI wrote this", and came back to this post to see if I was right.

Expand full comment

Alec’s Substack

"This is not a complement." Do you mean that it does not slot nicely beside prior examples (complement), or that it does not flatter them (compliment)?

Expand full comment

The Loom HPMOR was so good I thought it was fake and somebody wrote it. Knowing that it's a curated human-machine centaur makes me feel, uh, better? Less displaced from reality?

Gotta stay away from stories like that, they're my catnip.

Expand full comment

Putting aside GPT for the moment, consider a picture is worth 1000 words. Since the open sourcing of Stable Diffusion, the rapid progress in mere weeks is astounding. And yet...

SD has an interesting premise: Take an image with known description (tokenized) and add noise to it, in a coherent way, in a large number of steps. Make those steps somewhat reversible (insert lots of math). At the end of the steps, your image is purely "random" noise. Not really, but it looks like noise. NOW, the fun begins: train the neural network to step BACKWARDS from the noise to the original image. Teach it millions of these... Now give it a random image of noise, and a tokenized description of what you want to find out of the noise. And it works.

It's the Infinite Library, of Borges, replaced with a 512x512 grid of 16 million colored pixels... Chaos with no index, and all indexes right or wrong, but you've taught the machinento move from book to book, looking for the nearest image to what you want to find... And each page it opens, leads it closer to your goal.... Move towards the shelves with dog photos, and also towards the shelves with Christina Hendricks, and eventually it finds order in the chaos and triumphantly it returns a photo of Christina Hendricks with a dog... Never mind that she's got six fingers and the dog has three eyes. It did as you asked. Ask again.

It has no actual clue about Christina Hendricks or Dogs... Only how to pattern match.

It's a remixer, not a creator.

True creators are rare and special and easy to miss.

RNGesus isn't.

Expand full comment

BTW, this is why OpenAI will fail and Open Source like Stable Diffusion will win, quoted from the email sent out today:

With improvements in our safety system, DALL·E is now ready to support these delightful and important use cases – while minimizing the potential of harm from deepfakes.

We made our filters more robust at rejecting attempts to generate sexual, political, and violent content – while also working to reduce false flags – and built new detection and response techniques to stop misuse.

Our content policy still prevents uploading images of anyone without their consent, or images that you do not have the rights to.

Expand full comment

Proof at last that excessive political correctness makes one dull and unimaginative.

We really are going to cripple our machines, aren't we?

Expand full comment

FWIW, the word "Dittomancy" isn't created by GPT3 from whole cloth. It's from a different self-aware modern fantasy series, [Erfworld](https://archives.erfworld.com/) . That's a LitRPG with a [well-described set of 24 magical disciplines](https://scratchpad.fandom.com/wiki/Erfworld_Magic), of which Dittomancy is one; it deals with making magical copies of things or people. It bears very little resemblance to what Quirrell describes in the GPT3 HPMOR chapter, which sounds much more like the investigations into theoretical Thinkamancy we see.

(You can also count me among those who don't see any indication of self-awareness in that GPT3 HPMOR chapter. I enjoyed reading it, boggled at some of the impressive wordsmithing in it, but also spotted a number of telltale misuses of language or inconsistent thoughts that show there's no underlying point being made.)

Expand full comment

"Harry’s mind was looking up at the stars with a sense of agony."

I can't discern if this is from the original text, or the one generated by GPT-3, so congratulations, I suppose? It's the same kind of awful prose as the original. How does a *mind* look up? Is it meant to be figurative, where the idea of the mind 'looking up' is 'the mind is imagining and thinking about the stars' or is it meant to be literal?

This is writing on the level of Rings of Power "why does a stone sink and a ship float?" prose, and if GPT-3 is revealed to be writing parts of the script I won't be at all surprised.

I'm sorry, I know a lot of you love HPMoR, but oh dear God. Terrible, terrible, terrible prose.

"It was the sort of grim face an ordinary person might make after biting into a meat pie, and discovering that it was rotten and had been made from kittens."

Or the sort of face I make whenever I read another excerpt from it. Please tell me that this is not the original but the machine creation.

Expand full comment

Sep 20, 2022·edited Sep 20, 2022

Ha, I wrote a short story on LessWrong about this a few months ago - turns out it's already true: https://www.lesswrong.com/posts/Ke7DiT2DHMyGiv3s2/beauty-and-the-beast

Expand full comment

Would it be possible to get GPT-3 to comment on each article in this Substack. I would love to read its comments.

Expand full comment

At least one major world religion teaches that history will culminate in a wedding party.

Expand full comment

Machines are like us, when they're too good at optimizing their lives, they lack the chaos needed to find even better, system-changing, unexpected treasure.

Expand full comment

That you can do this is kind of fun. It's a choose your own adventure interactive fiction on a (near?) infinite scale. It's a new take on collaborative writing.

Expand full comment

Peter’s Substack

Sep 20, 2022·edited Sep 20, 2022

I'm pretty confused as to what's interesting or suggestive about these examples. I mean yes, ha ha, a machine talking about itself is kinda fun art in a Escher/Hofstadter (tho Hof isn't a good example of this genre imo) way but it seems like the examples are actually examples of a total lack of any kind of self-awareness/self-reference.

Am I missing something? Like yah, you ask a language model to predict some text about a common discussion in it's training set and it does...the fact that it's discussed in that training set doesn't seem very deep. If that's all it's supposed to be my bad but I got the sense that ppl are seeing more in it.

Where things would get interesting is if the machine was asked to predict text like "What GPT-3 says when given the prompt 'say something about cookies' is" and it gave the completion that was genuinely the result of just prompting it with "say something about cookies'".

Now that would indicate a certain level of self-awareness/use of reflection in generating beliefs. This is no different than a child learning prefacing "I" before cookie gets a cookie ...the understanding that I refers to them is still far off.

Expand full comment

> for this prompt, it’s mostly 63. Its internal probability meter says there’s a 36% chance that 63 is the right answer, although it chooses it more than just 36% of the time. When it doesn’t choose 63, it usually chooses 66.

This seems slightly wrong; 36% means it will pick it 36% of the time - at T=1. Lowering the temperature makes it biased towards picking more likely stuff more often. At T=0, it'll always pick the best option.

Expand full comment

Musings on the Alignment Problem

That the model likes saying "there is no definitive answer to this question" is likely not due to overoptimization in the sense described in the Stiennon et al. summarization paper (a problem that's not very hard to avoid). Instead, our best explanation is that this particular problem stems from issues with our training data: some of our labelers thought these kind of responses were good in some settings, and the model over-generalized this. We're in the process of improving this.

Expand full comment

" Its internal probability meter says there’s a 36% chance that 63 is the right answer, although it chooses it more than just 36% of the time. When it doesn’t choose 63, it usually chooses 66."

> This sentence seems a bit confused. It depends on the temperature. With temperature 1 it will choose 63 36% of the time. With temperature 0 it will choose the most probably answer 100% of the time.

Expand full comment

RandomSourceAnimal

Has anyone trained a model to detect socioeconomic class from portraits or video data of people? It seems like something that would be easy to do. Even if you limit it to a particular population (e.g., white people in the US, or in NYC, etc.) Then you could use the model to identify indicia of high and low class, and see how those indicia map to popular conceptions of the same.

Expand full comment

The Tension of Reflexive Identi…

Just goes to show that self-awareness can be imitated just like about any other human characteristic. Makes me think of a song called coin-operated boy by The Dresden Dolls.

Expand full comment

Virtual Marginalia

I started having a derealization panic attack while reading the HPMOR-like story.

Controlled it by convincing myself that finding out that reality isn't real by reading a blog post discussing an AI-generated story about characters realizing that their reality isn't real would be just... *too* on the nose.

Expand full comment

#nojs-banner { position: fixed; bottom: 0; left: 0; padding: 16px 16px 16px 32px; width: 100%; box-sizing: border-box; background: red; color: white; font-family: -apple-system, "Segoe UI", Roboto, Helvetica, Arial, sans-serif, "Apple Color Emoji", "Segoe UI Emoji", "Segoe UI Symbol"; font-size: 13px; line-height: 13px; } #nojs-banner a { color: inherit; text-decoration: underline; } This site requires JavaScript to run correctly. Please turn on JavaScript or unblock scripts