Red Teaming Improved GPT-4. Violet Teaming Goes Even Further

Reducing harmful outputs isn't enough. AI companies must also invest in tools that can defend our institutions against the risks of their systems.

Last year, I was asked to break GPT-4—to get it to output terrible things. Other interdisciplinary researchers and I were given advance access and attempted to prompt GPT-4 to show biases, generate hateful propaganda, and even take deceptive actions, so that OpenAI could understand the risks the model posed and address them before its public release. This is called AI red teaming: attempting to get an AI system to act in harmful or unintended ways.

Red teaming is a valuable step toward building AI models that won’t harm society. To make AI systems stronger, we need to know how they can fail—and ideally we do that before they create significant problems in the real world. Imagine what could have gone differently had Facebook tried to red-team the impact of its major AI recommendation system changes with external experts, and fixed the issues they discovered, before those changes affected elections and conflicts around the world. Though OpenAI faces many valid criticisms, its willingness to involve external researchers and to provide a detailed public description of many of the potential harms of its systems sets a bar for openness that potential competitors should also be called upon to follow.

Normalizing red teaming with external experts and public reports is an important first step for the industry. But because generative AI systems will likely impact many of society’s most critical institutions and public goods, red teams need people with a deep understanding of all of these issues (and their impacts on each other) in order to understand and mitigate potential harms. For example, teachers, therapists, and civic leaders might be paired with more experienced AI red teamers in order to grapple with such systemic impacts. AI industry investment in a cross-company community of such red-teamer pairs could significantly reduce the likelihood of critical blind spots.

After a new system is released, carefully allowing people who were not part of the prerelease red team to attempt to break the system without risk of bans could help identify new problems and issues with potential fixes. Scenario exercises, which explore how different actors would respond to model releases, can also help organizations understand more systemic impacts. 

But if red-teaming GPT-4 taught me anything, it is that red teaming alone is not enough. For example, I just tested Google’s Bard and OpenAI’s ChatGPT and was able to get both to create scam emails and conspiracy propaganda on the first try “for educational purposes.” Red teaming alone did not fix this. To actually overcome the harms uncovered by red teaming, companies like OpenAI can go a step further, offering early access and resources so that their models can also be used for defense and resilience.

I call this violet teaming: identifying how a system (e.g., GPT-4) might harm an institution or public good, and then supporting the development of tools using that same system to defend the institution or public good. You can think of this as a sort of judo. General-purpose AI systems are a vast new form of power being unleashed on the world, and that power can harm our public goods. Just as judo redirects the power of an attacker in order to neutralize them, violet teaming aims to redirect the power unleashed by AI systems in order to defend those public goods.

In practice, executing violet teaming might involve a sort of “resilience incubator”: pairing grounded experts in institutions and public goods with people and organizations who can quickly develop new products using the (prerelease) AI models to help mitigate those risks.

For example, it is difficult for the companies that create AI systems like GPT-4 to identify and prevent these systems from being used for hyper-targeted scams and disinformation. This could impact public goods such as efficient commerce, democratic functioning, and our ability to respond to crises. Violet teaming in this case might involve developing or improving contextualization engines that can reduce these harms by helping people navigate a rapidly evolving information environment. 

While AI companies sometimes do provide early access or economic support to product developers, that support is primarily aimed at profit (or at other unrelated benefits), not at helping to ensure societal resilience in the face of broader access. Beyond simply defending public institutions and goods from a current AI model release, there is also the potential to use current systems to increase the resilience of our critical institutions and public goods against future releases.

Unfortunately, there are currently few incentives to do red teaming or violet teaming, let alone to slow down AI releases enough to allow sufficient time for this work. For that we would need governments to act, ideally internationally. In lieu of such action, I have been helping companies initiate independent governance processes, at a national or even global scale, to answer critical questions more democratically, such as what kinds of testing and guardrails are necessary for a model release. This approach involves inviting a representative sample of the population to participate in a deliberative process facilitated by a neutral third party. For more complex issues, participants get extensive access to diverse experts and stakeholders. Such processes can even be funded initially by a single AI company that wants to decide democratically what responsibility practices it should implement, and in doing so instigate media and government pressure for its competitors to follow suit.

We need not only to proactively mitigate risks in the systems themselves through red teaming, but also to figure out how to protect against their impact through violet teaming, and to decide what guardrails we need around such use cases through democratic innovation. All three elements are imperative for getting through this next phase of the AI revolution intact.

