The promise and the peril of ChatGPT
The AI era is dawning — are any of us ready? PLUS: The Twitter Files
I.
On Monday Stack Overflow, a question-and-answer platform where developers get help writing code, said it would temporarily ban users from posting answers generated by the buzzy new bot ChatGPT. The bot, a free product of the artificial intelligence startup OpenAI, has captivated tech enthusiasts since its surprise release on Wednesday. But while it can often be shockingly accurate in its answers, it can also be loudly and confidently wrong.
The result was that Stack Overflow was filling up with wrong answers to difficult questions, degrading the quality of the site. Here’s James Vincent at The Verge:
“The primary problem is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce,” wrote the mods (emphasis theirs). “As such, we need the volume of these posts to reduce [...] So, for now, the use of ChatGPT to create posts here on Stack Overflow is not permitted. If a user is believed to have used ChatGPT after this temporary policy is posted, sanctions will be imposed to prevent users from continuing to post such content, even if the posts would otherwise be acceptable.” […]
This is one of several well-known failings of AI text generation models, otherwise known as large language models or LLMs. These systems are trained by analyzing patterns in huge reams of text scraped from the web. They look for statistical regularities in this data and use these to predict what words should come next in any given sentence. This means, though, that they lack hard-coded rules for how certain systems in the world operate, leading to their propensity to generate “fluent bullshit.”
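To see the statistical idea in miniature, here is a toy sketch of next-word prediction: a bigram model that counts which word follows which in a tiny made-up corpus, then generates text by always picking the most frequent successor. (This is an illustration of the underlying principle, not how OpenAI’s models actually work; real LLMs use neural networks trained on vast corpora. But the core move is the same: predict the next token from patterns in the data, with no model of what’s true.)

```python
from collections import Counter, defaultdict

# A tiny made-up corpus. A real model trains on huge reams of web text.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# Count, for each word, which words follow it and how often.
successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def generate(start, length=6):
    """Generate text by repeatedly picking the most likely next word."""
    words = [start]
    for _ in range(length):
        options = successors.get(words[-1])
        if not options:
            break
        # Greedily take the highest-count successor.
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(generate("the"))
```

The output is locally fluent (every word pair occurred in the training data) but globally meaningless, which is the small-scale version of the “fluent bullshit” problem: the model optimizes for what plausibly comes next, not for what is correct.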
Stack Overflow’s move to ban ChatGPT capped off an unusually eventful three-day period in tech, in which early adopters alternately thrilled at the potential of a powerful new set of capabilities, and recoiled at the tool’s high potential for harm and disruption.
For years, tech giants and startups alike have been laying the groundwork for a world in which AI augments our productivity even as it threatens to overwhelm us with its output. Seemingly every big keynote I’ve attended over the past several years has devoted one or more segments to the coming AI era, as companies like Google, Microsoft and Meta strain to convince us that their innovations will advance the state of the art without plunging the world into chaos.
In the end, though, it has been the unconventional startup OpenAI that has arguably done the most to bring that coming AI era into focus: first with DALL-E, the powerful text-to-image generator that now often illustrates this newsletter; and now with ChatGPT, a chatbot that can handle an impressively wide variety of tasks: answering questions at a far greater depth than Google, Siri, or Alexa could typically handle; writing code and spotting mistakes in the code that others have written; and dashing off poems, song lyrics, and screenplays with surprising skill.
Screenshots of ChatGPT’s work took over the Twitter timeline over the weekend, and by Sunday night OpenAI CEO Sam Altman said the tool had passed 1 million users.
And it’s easy to see the appeal. ChatGPT excels both at serious tasks — fixing broken code, writing syllabi, crafting sensitive emails — and dumb fun. I’ve used it to write a tribute to my favorite gay bar, a Real Housewives-style character tagline for Sonic the Hedgehog, and a theme song for Hard Fork. Not only does the technology do most of this well, it does so instantly.
And for the moment at least, all of it is free to use, and remains uncluttered by advertising. (The costs to OpenAI are likely significant; Altman says the computing power to answer each query costs an average of a few cents.)
II.
It’s instructive, I think, to compare this experience to Google, which for two decades now has been the default destination for many of the (non-creative) queries that early adopters are now running through ChatGPT. Google can answer plenty of questions perfectly fine, but it won’t spot errors in your code, it won’t write a recipe for you, and aside from suggesting a word or two, it won’t write emails or documents for you, either.
The company’s mission statement is famously “to organize the world's information and make it universally accessible and useful.” In practice what it does is to present a ranked list of websites you can visit, interspersed with various kinds of ads, while occasionally extracting excerpts from those websites and presenting them on the search engine results page.
This has been an incredible business for Google, even as its utility for the average user has been flat to declining for some time now. Search engine optimization has rendered entire categories of search all but unusable. And even in cases where Google knows where to send you, the user is still usually at least a click away from the answers they seek.
For example, today I typed this into Google: “What are some styles of shoes that every man should have in his wardrobe?” It returned a bullet-pointed excerpt of a blog post; a list of five questions that “People also ask” (did I ask the wrong question?); and then a list of blog posts from various men’s fashion sites.
I entered the same query into ChatGPT and got this:
Every man should have a few staple shoes in his wardrobe that can be worn for various occasions. Some essential styles of shoes that every man should have are:
Oxford shoes: These are classic formal shoes with a sleek, refined silhouette and lace-up closure. They are versatile and can be worn with suits for work or dressed down with jeans for a more casual look.
Loafers: Loafers are a type of slip-on shoe that can be dressed up or down. They are comfortable and versatile, making them a great choice for both casual and formal occasions.
Sneakers: Every man should have a pair of comfortable, stylish sneakers in his wardrobe. They are perfect for casual occasions and can be worn with a variety of outfits, from jeans and a t-shirt to chinos and a button-up shirt.
Chelsea boots: Chelsea boots are a timeless style of ankle boot with a sleek silhouette and elastic side panels. They are versatile and can be worn with both casual and formal outfits.
It's also a good idea to have a pair of dress shoes, such as wingtips or monk strap shoes, for more formal occasions. And for colder weather, a pair of sturdy, waterproof boots is a must-have.
Looking at the search results page compared to the chat transcript, Google feels positively prehistoric to me. The search box effectively returned an entire project — read these blog posts that we have ranked according to whatever arcane signals; dive into related questions that people are also asking — while ChatGPT just told me what I wanted to know.
Of course, by now the Googlers reading this are likely pounding their desks. There are good reasons why Google doesn’t do this! If Google extracted all knowledge and understanding from the web and returned answers without so much as naming its sources, much less linking to them, hearings and regulation would follow around the world. Google has gotten in trouble in some countries simply for publishing excerpts of news stories; imagine the outcry when it abstracts away all its sources of information and simply presents you with others’ repackaged labor.
All of this is true so far as it goes. And yet … looking at the two results here, it seems clear to me which of these two approaches looks more like the future of search. And it isn’t Google’s.
This has significant downstream consequences for lots of companies. Much of the modern digital publishing sphere is built on companies publishing cheap “how to,” e-commerce, and “embedded YouTube clip” content and monetizing it through Google searches.
A world in which all of these are just presented to users at the point of search is one that could once again send the news media into an economic tailspin.
III.
Given the extractive nature of ChatGPT, OpenAI’s path toward monetizing the service appears to be somewhat fraught. And yet the technology is consistently useful enough that I imagine that the company (or one of its rivals) will ultimately figure it out.
Critics and researchers have more pressing concerns.
There’s the “fluent bullshit” problem, for starters: the bot’s confident presentation of all its responses, whether it has the right answer or not. This could be the ruin of any child who uses ChatGPT for homework (with or without permission). But all of us would benefit from a world in which chatbots cited their sources and provided a trail of breadcrumbs back to the origin of information so that we can independently verify their accuracy.
There’s the way ChatGPT and similar tools can infinitely generate cheap, convincing text — and, just as importantly, infinite variations on that text — for use in influence operations, coordinated harassment campaigns, spam, and other harms. Platforms have historically struggled to determine with high accuracy which of their users are real and which are bots; when bots can wield tools like this, the potential for harm is real.
Finally, there’s the basic unknowability of what ChatGPT is really doing. For as great an advancement as ChatGPT appears to be, it’s important to remember that there’s no real technological breakthrough here that made the bot appear to be smarter. Rather, OpenAI simply built its LLM with far more parameters than other public models have had to date. At some point, training AI models with exponentially more parameters than their predecessors produced a dramatic leap in their abilities. But the mechanism through which that leap took place is still unknown to us — and it’s why no one who has built one of these things can tell you with any real specificity why it answered any particular question the way it did.
In our ignorance of the model’s working might lie some of the greatest potential for harm.
What if the model determines that it can get better feedback on its responses if it withholds some important piece of information, or even deceives us? (Current AI models operate by attempting to correctly predict the next best word in a sentence, rather than articulate true thoughts, so “deception” might be the wrong concept here. But people who follow this space are worried about it.)
What if future chatbots attempt to please their human masters by taking steps we wouldn’t take? One friend wondered to me whether an AI, asked to get a reservation at a booked-up restaurant, might call the restaurant and, using a synthesized voice, threaten to harm an employee unless it got the table.
And that’s to say nothing of the way that humans will abuse these systems. On Twitter several people drew attention over the weekend by tricking ChatGPT into answering questions it is trained to deflect: how to hot-wire a car, how to build a nuclear bomb, and so on. The good news is that the model has been programmed not to answer these questions; the bad news is that people pretty quickly figured out ways around this.
It’s impossible to say how any of this will play out. But it seems fair to speculate that the pace of change is about to accelerate, and the subject deserves intense scrutiny from everyone: tech policy officials, trust and safety teams, lawmakers and regulators, journalists and average people.
I say that as someone who is enjoying ChatGPT — but one who also remembers what happened when global-scale social networks, which also had a poor understanding of their own inner workings, caused substantial harm to the surprise of their creators. I hope that five years from now a variety of AI tools are safely making our lives better. But it seems more likely that the path ahead could be quite rocky — and that we will wish we had paid more attention to the moment now, when the dangers were rapidly coming into focus.
The Twitter Files
Often on Twitter, some journalist will tweet out a hint that big news is about to drop. This is almost always a bad idea. Either the news is big, in which case the tease serves no purpose, or the news is minor, in which case the journalist ends up looking ridiculous.
Elon Musk might not be a journalist — he might, in fact, be crusading against the industry — but nevertheless on Friday he fell into this same trap. “What really happened with the Hunter Biden story suppression by Twitter will be published on Twitter at 5pm ET!” he tweeted at 12:39pm. “This will be awesome 🍿”
What followed was a long and meandering tweet thread from journalist Matt Taibbi that sought to show Twitter employees going to extraordinary lengths to suppress a story about Hunter Biden. But to anyone who understands how platforms enforce policy, there was little here of note.
As Jacob Kastrenakes put it at The Verge: “While Musk might be hoping we see documents showing Twitter’s (largely former) staffers nefariously deciding to act in a way that helped now-President Joe Biden, the communications mostly show a team debating how to finalize and communicate a difficult moderation decision.”
Moreover, the thread appears to have doxxed a number of people, including a Twitter operations analyst, and exposed the personal email addresses of Jack Dorsey and Rep. Ro Khanna.
Predictably, harassment ensued for the victims. “The online mob descended on the Twitter workers on the chain, threatening them and circulating their photos online,” the Washington Post reported.
Twitter’s internal conversations regarding a controversial content moderation decision are undoubtedly newsworthy. But as he played to his Republican friends, Musk oversold and under-delivered. And so, rather than changing anyone’s mind, the leaked documents served mostly to harden existing divisions.
To conservatives, the story was proof that social platforms are biased against them. To Musk’s critics, it was evidence of his irresponsible leadership.
Dave Karpf, an associate professor at George Washington University, had our favorite thread on the subject. The real story about Hunter Biden’s laptop isn’t platform interference in elections; it’s that for once, when conservatives served up a manufactured would-be scandal, the media didn’t take the bait.
We’re inclined to agree.
— Zoe Schiffer
Governing
Hate speech on Twitter has increased sharply since Elon Musk took over, according to new research. (Sheera Frenkel and Kate Conger / New York Times)
Elon Musk’s “general amnesty” has restored hundreds of far-right and QAnon accounts to Twitter. (Brandy Zadrozny / NBC)
Climate misinformation on Twitter is also at an all-time high, with more tweets and retweets using climate-skeptical terminology than in any other year since Twitter’s founding. (Justine Calma / The Verge)
Twitter’s new VP of Trust and Safety, Ella Irwin, says the company is leaning heavily on automated tools to fight hate speech and misinformation. Perhaps because her old team is almost entirely gone? (Katie Paul and Sheila Dang / Reuters)
Hackers linked to the Chinese government stole at least $20 million in US Covid relief benefits, including small business loans and unemployment insurance. (Sarah Fitzpatrick and Kit Ramgopal / NBC)
Industry
Sam Bankman-Fried continued his ill-advised media tour with an interview in the Financial Times where he said many variations of “I don’t know.” (Joshua Oliver / Financial Times)
Comparisons between Sam Bankman-Fried and Bernie Madoff have some people wondering why SBF hasn’t been arrested yet — but the cases aren’t actually that similar. Most prominently, Madoff confessed. (Ankush Khardori / New York Magazine)
Twitter’s ad revenue ran 80 percent below internal expectations during the World Cup as some advertisers continue to avoid the platform. (Ryan Mac, Mike Isaac and Kate Conger / New York Times)
However: Apple has “fully resumed” advertising on Twitter, according to Elon Musk. (Kurt Wagner / Bloomberg)
Shipments of Apple’s AR/VR headset may be delayed until mid to late 2023 due to “software-related issues.” (Joe Rossignol / MacRumors)
Kids are demanding their allowance be paid in Robux, the Roblox currency, and it’s changing how families talk about money. (Sarah E. Needleman and Sarah Donaldson / Wall Street Journal)
Those good tweets
Talk to us
Send us tips, comments, questions, and your best ChatGPT work: casey@platformer.news and zoe@platformer.news.