
"Supplier must not train on Customer Data."

👆 Basically every AI clause today.

Here are 8 ways to handle this clause and manage risk while letting customers and vendors get business value from AI:

1. Carve out categories of data

For example, a vendor building a predictive maintenance model for machinery could agree to train its AI only on:

-> Date of last service

-> Date of manufacture

-> Uptime vs. downtime

Identifying less sensitive data for AI training might satisfy both parties.
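A carve-out like this can also be enforced mechanically in the training pipeline. A minimal sketch (the field names are hypothetical):

```python
# Hypothetical field allowlist enforcing a contractual carve-out:
# only the agreed, less-sensitive fields ever reach the training set.
ALLOWED_FIELDS = {"last_service_date", "manufacture_date",
                  "uptime_hours", "downtime_hours"}

def filter_for_training(record: dict) -> dict:
    """Keep only contractually approved fields; drop everything else."""
    return {k: v for k, v in record.items() if k in ALLOWED_FIELDS}

raw = {
    "last_service_date": "2024-11-03",
    "manufacture_date": "2019-06-12",
    "uptime_hours": 8240,
    "downtime_hours": 112,
    "customer_name": "Acme Corp",   # sensitive: must not be trained on
    "serial_number": "SN-993217",   # sensitive: must not be trained on
}

training_record = filter_for_training(raw)
```

Filtering at ingestion (rather than trusting every downstream job) gives both parties a single, auditable control point.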

2. Anonymize/pseudonymize

While mainly a privacy control (one that doesn't protect trade secrets), removing the connection between the data and the organizations/people it describes may reduce potential damage.
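One common pseudonymization approach is keyed hashing: identifiers stay linkable across records for training, but can't be reversed without the key. A sketch using Python's standard library (the key handling shown is an assumption, not a full key-management design):

```python
import hashlib
import hmac

# Assumption: the key is stored outside the training environment and rotated,
# so the training set alone can't be mapped back to real identities.
SECRET_KEY = b"store-separately-and-rotate"

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a short keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"machine_id": "PRESS-04", "owner": "Acme Corp", "uptime_hours": 8240}
safe = {
    "machine_id": pseudonymize(record["machine_id"]),
    "owner": pseudonymize(record["owner"]),
    "uptime_hours": record["uptime_hours"],  # non-identifying metric, kept as-is
}
```

The same input always maps to the same token, so the model can still learn per-machine patterns without seeing who owns the machine.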

3. Forbid the vendor from selling the model (or itself) to a customer (or its competitor)

A customer may in-source analytics work by acquiring the AI vendor that trains on its data.

This would be a major business threat to the vendor’s other customers (who may compete with the buyer).

The buyer would have exclusive access to an AI model optimized for the specific industry use case.

It might be able to access the (even non-anonymized) training data of competitors.

A solution could be for the vendor to agree not to sell itself (or its models) to competitors of any customer.

4. Destroy training data

A vendor could simply breach these contractual requirements because:

-> It doesn't think anyone will notice

-> It has an enormous legal team

-> It bets no one will challenge it

To mitigate this risk, customers of the vendor can require it to delete customer data used in training.

The vendor can build customer trust in its process by:

-> Explaining how the deletion pipeline works

-> Allowing a 3rd party to audit deletion

-> Providing certificates of destruction
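A certificate of destruction can be more than a PDF: the vendor can emit a machine-readable record of exactly what was deleted and when. A minimal sketch of the idea (the file layout and certificate fields are hypothetical):

```python
import datetime
import hashlib
import os

def destroy_with_certificate(paths: list[str]) -> dict:
    """Delete files and return a record of what was destroyed, for audit."""
    entries = []
    for p in paths:
        with open(p, "rb") as f:
            # Fingerprint the content before deletion so the customer can
            # later verify *which* data was destroyed.
            digest = hashlib.sha256(f.read()).hexdigest()
        os.remove(p)
        entries.append({"path": p, "sha256": digest})
    return {
        "deleted_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "files": entries,
    }

# Demo: create a throwaway "training data" file, then destroy it.
with open("training_shard.jsonl", "w") as f:
    f.write('{"uptime_hours": 8240}\n')

certificate = destroy_with_certificate(["training_shard.jsonl"])
```

A third-party auditor can then check the certificate against the vendor's storage and backup systems, which is where most deletion claims actually fail.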

5. Use synthetic data for training

Not a perfect solution because:

-> It requires real data to generate the synthetic data

-> Synthetic data is generally lower quality

But mimicking live information, instead of training on it directly, can alleviate security and privacy concerns.
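The simplest version of this idea: fit summary statistics on the real data, then sample new rows from those distributions, so real data informs the generator but no real row enters the training set. A toy sketch (the numbers are made up for illustration; production synthetic-data tools are far more sophisticated):

```python
import random
import statistics

# Real customer measurements inform the generator but are never trained on.
real_uptime = [8100, 8240, 7990, 8350, 8120]

# Fit a simple per-column model: mean and standard deviation.
mu = statistics.mean(real_uptime)
sigma = statistics.stdev(real_uptime)

# Sample synthetic rows from the fitted distribution.
random.seed(0)  # seeded only so the demo is reproducible
synthetic_uptime = [round(random.gauss(mu, sigma)) for _ in range(5)]
```

This also illustrates the quality trade-off: the synthetic column preserves the mean and spread but loses correlations and rare events, which is often exactly what a model needed to learn.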

6. Clearly assign ownership of the resulting model

If the customer claims that models trained on its data are its property, this represents an existential threat to the vendor.

Clear model ownership avoids future disputes.

7. Offer discounts for opting-in to training

This allows trading business value for data risk.

Vendors will need to price discounts correctly, though. If not steep enough, no customer will take advantage.

8. Persuade customers to accept the risk of training on data that is confidential but not critical.

Especially for data that isn’t core to business operations, this might be a wise business choice for both parties.

TL;DR- you may be able to train on customer data if you:

1. Carve out categories of data

2. Anonymize/pseudonymize

3. Agree not to sell the company

4. Destroy training data

5. Use synthetic data

6. Assign model IP

7. Discount

8. Accept

How do you handle enterprise agreements banning AI training on customer data?

Jan 27 at 11:50 AM