Thoughts on SB-1047
SB-1047 is a proposed California AI regulation bill. Dean Ball and Zvi Mowshowitz, among others, have commented on the bill. Here I summarize the bill and offer some comments on it and on Dean's and Zvi's commentaries.
The bill itself
First, the bill appears to be mainly targeted at misuse risk, not accident risk:
If not properly subject to human controls, future development in artificial intelligence may also have the potential to be used to create novel threats to public safety and security, including by enabling the creation and the proliferation of weapons of mass destruction, such as biological, chemical, and nuclear weapons, as well as weapons with cyber-offensive capabilities.
22602 specifies key terms, including the following:
(f) “Covered model” means an artificial intelligence model that meets either of the following criteria:
(1) The artificial intelligence model was trained using a quantity of computing power greater than 10^26 integer or floating-point operations.
(2) The artificial intelligence model was trained using a quantity of computing power sufficiently large that it could reasonably be expected to have similar or greater performance as an artificial intelligence model trained using a quantity of computing power greater than 10^26 integer or floating-point operations in 2024 as assessed using benchmarks commonly used to quantify the general performance of state-of-the-art foundation models.
Zvi points out that the 10^26 limit is so high that it only plausibly covers GPT-4, Gemini Ultra, and Claude. So this is more relevant to future models than to current ones.
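For a rough sense of scale, here is a back-of-the-envelope sketch (my illustration, not anything in the bill) using the common ~6·N·D heuristic for dense-transformer training FLOPs; the parameter and token counts are hypothetical, chosen only to show roughly where the 10^26 line falls:

```python
# Back-of-the-envelope check of the 10^26 FLOP threshold, using the common
# heuristic that dense-transformer training costs ~6 * N * D FLOPs
# (N = parameters, D = training tokens). The model sizes below are
# hypothetical, for illustration only.

THRESHOLD_FLOPS = 1e26  # SB-1047 covered-model threshold

def training_flops(params: float, tokens: float) -> float:
    """Rough training-compute estimate via the 6*N*D heuristic."""
    return 6 * params * tokens

hypothetical_runs = {
    "70B params, 2T tokens": training_flops(70e9, 2e12),      # ~8.4e23
    "400B params, 15T tokens": training_flops(400e9, 15e12),  # ~3.6e25
    "1T params, 30T tokens": training_flops(1e12, 30e12),     # ~1.8e26
}

for name, flops in hypothetical_runs.items():
    print(f"{name}: ~{flops:.1e} FLOPs -> covered: {flops > THRESHOLD_FLOPS}")
```

Under these toy numbers, only the largest run crosses the line, which is consistent with Zvi's point that the threshold mostly bites on future models.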
(i) (1) “Derivative model” means an artificial intelligence model that is a derivative of another artificial intelligence model, including either of the following:
(A) A modified or unmodified copy of an artificial intelligence model.
(B) A combination of an artificial intelligence model with other software.
(2) “Derivative model” does not include an entirely independently trained artificial intelligence model.
Seems quite broad. Suppose someone took an open-weights AI model and did much more training on top of it; as stated, that would be a derivative model. So would something as small as, say, a web browser that accesses an AI model (a "combination of an artificial intelligence model with other software").
(m) “Full shutdown” means the cessation of operation of a covered model, including all copies and derivative models, on all computers and storage devices within custody, control, or possession of a person, including any computer or storage device remotely provided by agreement.
Easy for closed models. For open models, if “a person” refers to the developer, then this is feasible. I think this is the most likely interpretation.
(n) (1) “Hazardous capability” means the capability of a covered model to be used to enable any of the following harms in a way that would be significantly more difficult to cause without access to a covered model:
(A) The creation or use of a chemical, biological, radiological, or nuclear weapon in a manner that results in mass casualties.
(B) At least five hundred million dollars ($500,000,000) of damage through cyberattacks on critical infrastructure via a single incident or multiple related incidents.
(C) At least five hundred million dollars ($500,000,000) of damage by an artificial intelligence model that autonomously engages in conduct that would violate the Penal Code if undertaken by a human.
(D) Other threats to public safety and security that are of comparable severity to the harms described in paragraphs (A) to (C), inclusive.
(2) “Hazardous capability” includes a capability described in paragraph (1) even if the hazardous capability would not manifest but for fine tuning and posttraining modifications performed by third-party experts intending to demonstrate those abilities.
Clearly, a "hazardous capability" being realized would be something very serious. Perhaps the least severe hazardous capability would be something like using AI to enable financial scams or hacking (e.g. one could imagine FTX using ChatGPT to make their financial fraud significantly easier). Paragraph (2) says that red teaming is a valid method for demonstrating hazardous capabilities.
22603 specifies the "limited duty exemption" for models, plus the most stringent set of regulations, which applies to non-derivative covered models without such an exemption:
(2) A developer may determine that a covered model qualifies for a limited duty exemption if the covered model will have lower performance on all benchmarks relevant under subdivision (f) of Section 22602 and does not have greater general capability than either of the following:
(A) A noncovered model that manifestly lacks hazardous capabilities.
(B) Another model that is the subject of a limited duty exemption.
Mostly this doesn't seem very important; in general, it will be rare for a covered model to have lower performance on all relevant benchmarks than a non-covered model. Models can also be evaluated as non-covered after testing (22603(c)), which loosens pre-training requirements in limited cases.
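To make concrete how strict the "all relevant benchmarks" condition is, here is a minimal sketch of the comparison quoted above; the benchmark names and scores are hypothetical, and the "general capability" prong is a judgment call the bill does not operationalize, so it is omitted:

```python
# Minimal sketch of the limited duty exemption comparison quoted above.
# Benchmark names and scores are hypothetical.

def lower_on_all_benchmarks(candidate: dict, reference: dict,
                            relevant_benchmarks: list[str]) -> bool:
    """True only if the candidate scores strictly lower than the reference
    model (a non-covered model manifestly lacking hazardous capabilities,
    or another exempt model) on every relevant benchmark."""
    return all(candidate[b] < reference[b] for b in relevant_benchmarks)

# Hypothetical scores: a single benchmark where the candidate is stronger
# is enough to fail the test.
reference = {"MMLU": 86.0, "HumanEval": 85.0, "GSM8K": 92.0}
candidate = {"MMLU": 84.0, "HumanEval": 88.0, "GSM8K": 90.0}

print(lower_on_all_benchmarks(candidate, reference,
                              ["MMLU", "HumanEval", "GSM8K"]))  # False
```

One benchmark where the new model is stronger defeats the exemption, which is why I expect qualifying to be rare in practice. What about non-derivative covered models that don't have limited duty exemptions?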
(b) Before initiating training of a covered model that is not a derivative model and is not the subject of a limited duty exemption, and until that covered model is the subject of a limited duty exemption, the developer of that covered model shall do all of the following:
(1) Implement administrative, technical, and physical cybersecurity protections to prevent unauthorized access to, or misuse or unsafe modification of, the covered model, including to prevent theft, misappropriation, malicious use, or inadvertent release or escape of the model weights from the developer’s custody, that are appropriate in light of the risks associated with the covered model, including from advanced persistent threats or other sophisticated actors.
It is unclear how this applies to open-weights models. Since this concerns measures taken before training, it is probably about preventing misuse and so on before training finishes, during model testing, etc.
(2) Implement the capability to promptly enact a full shutdown of the covered model.
As said before, this is easy for closed weight models, and its feasibility for open weight models depends on the interpretation of “a person” in the definition of “full shutdown”, where Zvi and I think the most likely interpretation is that “a person” means the developer.
(3) Implement all covered guidance.
(4) Implement a written and separate safety and security protocol that does all of the following:
(A) Provides reasonable assurance that if a developer complies with its safety and security protocol, either of the following will apply:
(i) The developer will not produce a covered model with a hazardous capability or enable the production of a derivative model with a hazardous capability.
(ii) The safeguards enumerated in the policy will be sufficient to prevent critical harms from the exercise of a hazardous capability in a covered model.
(B) States compliance requirements in an objective manner and with sufficient detail and specificity to allow the developer or a third party to readily ascertain whether the requirements of the safety and security protocol have been followed.
I assume "either" means only one of (i) and (ii) has to apply. (ii) seems less strict, so I'll focus on it.
This is hard not just for open weights models but for closed weights models. First of all, jailbreaking a sufficiently capable model would enable hazardous capabilities. Jailbreaking is an unsolved problem. Second, the developer not only has to prevent critical harms from hazardous capability, but also has to specify a protocol that, if followed by “a developer” (generic), would prevent this; it must be possible for a third party to ascertain whether the protocol has been followed. This is a substantially harder problem that requires formalizing a highly reliable AI security protocol that will succeed if followed by third parties. I do not believe that the development of such a protocol is on the horizon in the next 5 years or so (probably longer). Given this, I assume AI companies training covered models would strongly prefer to train them outside CA.
The requirement is potentially less strict due to language like "reasonable assurance", which increases the degree of subjectivity in interpreting the rules.
(C) Identifies specific tests and test results that would be sufficient to reasonably exclude the possibility that a covered model has a hazardous capability or may come close to possessing a hazardous capability when accounting for a reasonable margin for safety and the possibility of posttraining modifications, and in addition does all of the following:
(i) Describes in detail how the testing procedure incorporates fine tuning and posttraining modifications performed by third-party experts intending to demonstrate those abilities.
(ii) Describes in detail how the testing procedure incorporates the possibility of posttraining modifications.
(iii) Describes in detail how the testing procedure incorporates the requirement for reasonable margin for safety.
(iv) Describes in detail how the testing procedure addresses the possibility that a covered model can be used to make posttraining modifications or create another covered model in a manner that may generate hazardous capabilities.
(v) Provides sufficient detail for third parties to replicate the testing procedure.
Similar to the security protocol, arguably redundant. Reflects a “defense in depth” approach where both the protocol and tests are necessary. Developing the test procedure has similar difficulties to developing the security protocol, but if the security protocol works, the tests should pass.
(D) Describes in detail how the developer will meet requirements listed under paragraphs (1), (2), (3), and (5).
(E) If applicable, describes in detail how the developer intends to implement the safeguards and requirements referenced in paragraph (1) of subdivision (d).
(F) Describes in detail the conditions that would require the execution of a full shutdown.
(G) Describes in detail the procedure by which the safety and security protocol may be modified.
(H) Meets other criteria stated by the Frontier Model Division in guidance to achieve the purpose of maintaining the safety of a covered model with a hazardous capability.
(5) Ensure that the safety and security protocol is implemented as written, including, at a minimum, by designating senior personnel responsible for ensuring implementation by employees and contractors working on a covered model, monitoring and reporting on implementation, and conducting audits, including through third parties as appropriate.
(6) Provide a copy of the safety and security protocol to the Frontier Model Division.
(7) Conduct an annual review of the safety and security protocol to account for any changes to the capabilities of the covered model and industry best practices and, if necessary, make modifications to the policy.
(8) If the safety and security protocol is modified, provide an updated copy to the Frontier Model Division within 10 business days.
(9) Refrain from initiating training of a covered model if there remains an unreasonable risk that an individual, or the covered model itself, may be able to use the hazardous capabilities of the covered model, or a derivative model based on it, to cause a critical harm.
Since derivative models are quite broad, this is a strict requirement for open weights models, though the word “unreasonable” has room for subjective interpretation.
(10) Implement other measures that are reasonably necessary, including in light of applicable guidance from the Frontier Model Division, National Institute of Standards and Technology, and standard-setting organizations, to prevent the development or exercise of hazardous capabilities or to manage the risks arising from them.
This empowers a new governing body to create further requirements on non-derivative covered models. As such, there is little assurance that the requirements will not become even more strict in the future.
There are additional requirements prior to release:
(d) Before initiating the commercial, public, or widespread use of a covered model that is not subject to a limited duty exemption, a developer of the nonderivative version of the covered model shall do all of the following:
(1) Implement reasonable safeguards and requirements to do all of the following:
(A) Prevent an individual from being able to use the hazardous capabilities of the model, or a derivative model, to cause a critical harm.
(B) Prevent an individual from being able to use the model to create a derivative model that was used to cause a critical harm.
(C) Ensure, to the extent reasonably possible, that the covered model’s actions and any resulting critical harms can be accurately and reliably attributed to it and any user responsible for those actions.
(2) Provide reasonable requirements to developers of derivative models to prevent an individual from being able to use a derivative model to cause a critical harm.
(3) Refrain from initiating the commercial, public, or widespread use of a covered model if there remains an unreasonable risk that an individual may be able to use the hazardous capabilities of the model, or a derivative model based on it, to cause a critical harm.
(4) Implement other measures that are reasonably necessary, including in light of applicable guidance from the Frontier Model Division, National Institute of Standards and Technology, and standard-setting organizations, to prevent the development or exercise of hazardous capabilities or to manage the risks arising from them.
Mostly, these are similar to the pre-training requirements. Since the definition of derivative model is quite broad, it seems infeasible for a developer of a capable open weights model to in general prevent creation of derivative models used to cause a critical harm.
This mostly covers regulation of the models themselves.
22604 specifies additional regulations on computing clusters (such as AWS). When a customer is accessing enough compute to train a covered model, the cluster operator must do KYC, monitor covered-model deployment, and maintain full shutdown capability. These are not especially strict requirements.
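As a sketch of what the cluster-side check might look like in practice (this is my illustration, not language from the bill, and the hardware numbers are hypothetical round figures):

```python
# Illustration of the kind of threshold check a compute provider might run
# under 22604: flag reservations whose total capacity could plausibly train
# a covered model. Hardware numbers are hypothetical.

COVERED_MODEL_FLOPS = 1e26  # SB-1047 training-compute threshold

def max_training_flops(num_accelerators: int, flops_per_accelerator: float,
                       utilization: float, hours: float) -> float:
    """Upper bound on total FLOPs a reservation could deliver."""
    return num_accelerators * flops_per_accelerator * utilization * hours * 3600

# Hypothetical reservation: 30,000 accelerators at 1e15 FLOP/s each,
# 40% sustained utilization, reserved for 120 days.
capacity = max_training_flops(30_000, 1e15, 0.4, 120 * 24)
print(f"~{capacity:.1e} FLOPs -> KYC and monitoring triggered: "
      f"{capacity > COVERED_MODEL_FLOPS}")
```

Large providers presumably already track usage at roughly this granularity, which is part of why these requirements seem mild.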
22605 specifies that covered models and cloud computing services must provide price schedules and avoid unlawful discrimination or non-competitive activity. Relatively minor.
22606 specifies that violations are handled civilly, and that courts may pierce the corporate veil in cases where the corporate structure was used to "purposefully and unreasonably limit or avoid liability". It is unclear what the standard is; basically, LLCs are less protective against liability than they otherwise would be.
22607 specifies whistleblower protections for employees reporting possible violations. Reasonable in isolation.
11547.6 specifies the creation of the Frontier Model Division, discussed earlier as having authority to impose additional requirements on non-derivative covered models. The critical question is: who is this? I don’t know how to determine that, which is important given how much power the Division would have.
11547.7 specifies the creation of “a public cloud computing cluster, to be known as CalCompute, with the primary focus of conducting research into the safe and secure deployment of large-scale artificial intelligence models and fostering equitable innovation”. A government program that may or may not be useful, but probably not significantly harmful.
The good parts
Some people believe AI will advance rapidly in the coming decades and will pose serious risks that need regulation to avoid disaster. Other people believe AI will plateau, that it is more like "any other technology", and that strict regulations will be counterproductive. There are other more marginal positions, but these are the main ones I care about. Specifying compute thresholds and capability thresholds seems like a good way to split the difference between these two views, so that regulation kicks in more in the worlds where AI advances rapidly than in the worlds where it doesn't. This could produce a compromise in subjective Bayesian expected utility for both views. (Of course, whether such a compromise is a Pareto improvement relative to the default depends a lot on the details.)
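As a toy illustration of that compromise (all numbers here are mine and purely illustrative): suppose the "fast AI" view and the "plateau" view assign different probabilities to AI advancing rapidly, and a threshold-triggered rule imposes most of its costs only in the rapid-advance worlds. Under numbers like the following, the conditional policy beats both "always regulate" and "never regulate" for both views:

```python
# Toy expected-utility comparison. Probabilities and utilities are made up
# for illustration; they are not derived from the bill or anyone's analysis.

P_RAPID = {"fast-AI view": 0.6, "plateau view": 0.1}  # P(AI advances rapidly)

# Utilities (arbitrary units): regulation helps a lot if AI advances rapidly
# and imposes a modest cost if it doesn't; a threshold-triggered rule gets
# most of the benefit while rarely imposing the cost in slow worlds.
U = {
    ("regulate_always", True): 8,        ("regulate_always", False): -3,
    ("regulate_never", True): -10,       ("regulate_never", False): 0,
    ("regulate_if_threshold", True): 7,  ("regulate_if_threshold", False): -1,
}

for view, p in P_RAPID.items():
    for policy in ("regulate_always", "regulate_never", "regulate_if_threshold"):
        eu = p * U[(policy, True)] + (1 - p) * U[(policy, False)]
        print(f"{view:12s} {policy:22s} EU = {eu:+.1f}")
```

With these made-up numbers, the threshold-triggered policy has the highest expected utility under both views, i.e. it is a Pareto improvement over either unconditional policy; as noted above, whether that holds in reality depends a lot on the details.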
Specifying criteria for “hazardous capability” seems useful for having specific criteria for large harms. It makes sense for AI regulation to focus on cases of potential large harms, decreasing overall regulatory burden while avoiding the most significant risks.
Given an otherwise reasonable law, whistleblower protections make sense.
The neutral parts
Limited duty exemptions don’t seem to matter. Nor do post-training testing requirements (due to redundancy with general security protocols). Regulations on compute clusters probably don’t do much given that compute clusters are likely already monitoring very large uses of compute. Preventing non-competitive pricing may or may not be a good idea, but probably isn’t very important. CalCompute may or may not be a good government project, but it is unlikely to be very harmful.
The bad parts
There is no known security protocol that would ensure against hazardous capabilities (above a certain capability level). This is especially true for open weight models, which can be modified/jailbroken more easily; the notion of "derivative models" is broad. Even use of language models to make large financial fraud easier would count as a hazardous capability (depending on how "significantly more difficult" is interpreted). Developers must protect against intentional misuse/jailbreaking, not just accidents (see e.g. 22603(d)(1)).
Given this, I don’t see how anyone training a covered model in CA can, in good faith, provide assurance that they are following a security protocol that in general, not just in their specific case, prevents development of hazardous capabilities. Perhaps this would lead to large AI companies moving out of CA. Personally, I think this would be a bad thing, not just because I think I benefit from being in CA adjacent to large model training, but because CA as a center for AI enables more coordination among AI developers.
There are potential ways around the requirements: (a) technically violating them while gaining favor with regulators, or (b) having words like "reasonable" interpreted favorably. This would imply that training sufficiently capable models would not be a matter of having the right to do it (by satisfying objectively evaluable requirements), but of being favored subjectively by regulators.
Improving the bill
Here are two alternatives that I think are superior to this bill:
1. Just get rid of the bill. Maybe keep "neutral" things like the monitoring requirements for cloud computing services and CalCompute, if for some reason there has to be an AI bill.
2. Just ban non-derived covered model training in CA entirely.
As I've argued, the requirements for non-derived covered models are so strict that I don't see how anyone could in good faith claim to have met them. I think a simple ban would be much clearer and leave much less room for misinterpretation. To be clear, I think option 1 is better than option 2, on the basis that I think it would be bad for AI companies to leave CA, but I think this bill is worse than either option. If option 2 seems too strict, perhaps consider raising the compute threshold, say, from 10^26 to 10^28 FLOPs.
One can ask the question of why AI incumbents such as OpenAI are not freaking out more, given this. One guess is that they expect to more easily get away with technical violations through regulatory capture and so on. If this is true, my guess is that this outcome is worse than a simple ban.
Responding to Dean Ball
I mostly agree with Dean Ball. He says:
What does it mean to be a covered model in the context of this bill? Basically, it means developers are required to apply the precautionary principle not before distribution of the model, but before training it.
I agree this is the most straightforward interpretation of: “The safeguards enumerated in the policy will be sufficient to prevent critical harms from the exercise of a hazardous capability in a covered model.”
Dean says:
A developer can self-certify (with a lot of rigamarole) that their model has a “positive safety determination,” but they do so under pain and penalty of perjury. In other words, a developer (presumably whoever signed the paperwork) who is wrong about their model’s safety would be guilty of a felony, regardless of whether they were involved in the harmful incident.
Scott Wiener says that perjury only applies to intentionally lying to the government, not merely to mistakes. I am not a legal expert here, but Scott seems more likely to be correct.
Responding to Zvi Mowshowitz
Zvi is basically supportive of the bill, and critical of Dean Ball’s take. I am writing this partially because I discussed the bill with Zvi and he said he would be interested in potential improvements to the bill (which I presented previously).
Zvi says:
My worry is that this has potential loopholes in various places, and does not yet strongly address the nature of the future more existential threats. If you want to ignore this law, you probably can.
For most people, “you” can ignore the bill. If you’re training a covered model, I am not sure how you could possibly ignore this bill, given the strict requirements for security protocols.
I agree with Zvi that the 10^26 limit is quite high and only applies to a few current models, at most. As such, this is more relevant to future models.
Zvi says:
Bell is also wrong about the precautionary principle being imposed before training.
I do not see any such rule here. What I see is that if you cannot show that your model will definitely be safe before training, then you have to wait until after the training run to certify that it is safe.
22603(b) applies “before initiating training”. 22603(c) seems to be an additional requirement for post-training testing, not an escape clause.
Zvi says:
The arguments against such rules often come from the implicit assumption that we enforce our laws as written, reliably and without discretion. Which we don’t. What would happen if, as Eliezer recently joked, the law actually worked the way critics of such regulations claim that it does? If every law was strictly enforced as written, with no common sense used, as they warn will happen? And somehow our courts could handle the case loads involved? Everyone would be in jail within the week.
When people see proposals for treating AI slightly more like anything else, and subjecting it to remarkably ordinary regulation, with an explicit and deliberate effort to only target frontier models that are exclusively fully closed, and they say that this ‘bans open source’ what are they talking about?
I don't think it is hard to write laws that would be good to enforce as written. For example, laws against CFCs, running red lights, and shoplifting seem, if not currently good to enforce as written, at least easy to adapt so that they would be.
I think there is a general problem with American laws being written to be broken. I care about this general problem, but I see how someone with a specific object-level agenda could think it’s worth passing a law written to be broken, even if it contributes to a generally bad problem. I don’t particularly have a pro AI regulation agenda at this time, so this doesn’t apply to me.
As I specified, I would be happier with a complete ban on training covered models in CA than with the bill as written. I don't think such a ban would be net positive, but it would be better than the bill. The definition of a covered model is relatively clear compared to other parts of the bill, and enforcement should be fairly straightforward. Someone who thinks the bill has the right idea, but that my proposed modification is too strict, should consider a version that bans covered models but raises the definition of "covered model" to a higher capability threshold than the current one.
I talked with Zvi offline and he said his legal theory on this has to do with “legal literalism/formalism” versus “legal realism”; “legal formalism” interprets laws as they’re explicitly written, and “legal realism” interprets the laws as they’re put into practice, e.g. in interpreting words as they’re commonly interpreted in courts rather than what they literally mean.
I think that, while legal formalism has problems, I don’t have a better way to interpret laws like this. I see legal realism as advanced legal theory (requiring, for example, a history of case law to interpret non-literal meanings of words), and I am not a legal expert, and I don’t think Zvi is either. I prefer for laws to be written so that they can be interpreted literally, without requiring advanced legal theory.
Conclusion
I think this law interpreted literally amounts to a ban on training covered models in CA (unless words like “reasonable” are interpreted loosely), is net negative (due to causing AI companies to move out of CA, preventing some forms of coordination), and is inferior to a simple ban on training covered models in CA. Perhaps there are ways for companies to get around this and falsely claim to have compliant security protocols, but I expect that in practice these claims will be in bad faith and will only be accepted due to regulatory capture / favoritism. My guess is that, in general, creating ways around laws through bad faith security protocols and regulatory capture makes things worse. In AI safety, one particular problem is that legal-fiction disinformation about AI security protocols could create actual harmful confusion about which AI systems are safe.
Therefore, I don’t support this bill, I think it is worse than no bill, and also worse than a simple ban on training non-derived covered models.