
I read China’s newly released draft rules on AIGC. Initial thoughts on some noteworthy sections, with my own annotations:

ARTICLE 7: outlines the ways a Provider of AIGC products must guarantee that the sources of both pre-training data and optimization data are legal. The requirements cover:

  • No copyright infringement

  • Privacy of personal information (likely following the framework laid out in the 2021 Personal Information Protection Law)

  • Ensuring the “authenticity, accuracy, objectivity, and diversity” of data (not sure what “diversity” means here or how it’ll be evaluated or enforced. This requirement also goes well beyond legality; you can obtain data legally that is still not “accurate” or “diverse”. Could be a way to expand the scope of these rules to regulate input training data)

I’m mildly impressed that the draft rules specify both pre-training and optimization data. Regulators seem to have at least a basic understanding of how AIGC works in the context of RLHF.

“Provider” is a legal term here that includes all organizations and individuals that provide AIGC in the form of chat, text, image, and audio.

ARTICLE 8: Providers must also formulate “clear, specific, and actionable rules for data labeling” and properly train human labelers. (First time I’ve seen human labelers prominently mentioned in any AI policy or regulation. Human labelers are an important part of the AIGC pipeline that few talk about, and they often reside outside a country’s jurisdiction. Tesla used to use labelers based in Africa for Autopilot. Regulators clearly think labelers are important enough to warrant their own section, with the onus of control placed on the Providers who employ them.)

ARTICLE 15: If users report content that violates the rules, Providers not only have to filter it out; the model that generated the “bad content” must also be updated within 3 months. (Relying on users to report objectionable content is a main tool that regulators use to keep the Chinese internet “clean”, and out-of-nowhere complaints often cause websites to go down or get censored before coming back online. To comply with this rule, taking stuff down is not enough; AIGC providers have to quickly find what in the model generated the bad stuff and fix that too!)

ARTICLE 4: This part outlines all the ways that AIGC content will be considered illegal, inappropriate, objectionable, etc., and so trigger regulatory enforcement or user complaints. All the typical political content restrictions that one might expect are included, so nothing super newsworthy here, even though this section will get the most media coverage anyway. The only part that stood out to me is 4.3, which prohibits “unfair competition by exploiting advantages in algorithms, data, and platforms”, a nod to China’s antitrust crackdown that fined Alibaba and Meituan for abusing their platform advantages.

What other sections do you all find interesting or worth discussing? In a perfect world where regulators can learn from each other objectively to face global-scale problems together without political pressure or geopolitical tension (I know, I know, wishful thinking), which sections present some good ideas that the US ought to adopt in its own AIGC regulations (someday)?

Apr 11, 2023 at 10:20 PM
