AI doesn’t exist without data.
Which makes tracking, classifying, and organizing it key to AI governance and security.
Two new standards help companies tackle this problem:
-> HITRUST’s AI Security Assessment with Certification
-> ISO/IEC 42001:2023
What HITRUST requires:
Baseline Unique IDs (BUIDs) 07.07aAISecOrganizational.4-5 state that, for data used:
-> to train, fine-tune, test, and validate AI models
-> in retrieval-augmented generation (RAG)
Organizations must:
-> Maintain a catalog of trusted data sources
-> Inventory data used, including at least:
--- Provenance
--- Sensitivity
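To make the HITRUST requirement concrete, here is a minimal Python sketch of an inventory entry that records provenance and sensitivity and checks each entry's source against a catalog of trusted sources. The sensitivity levels, dataset names, and source names are hypothetical and are not taken from HITRUST.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    # Hypothetical classification levels; substitute your own scheme.
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"


class Usage(Enum):
    # The AI uses called out in the HITRUST requirement.
    TRAINING = "training"
    FINE_TUNING = "fine-tuning"
    TESTING = "testing"
    VALIDATION = "validation"
    RAG = "retrieval-augmented generation"


@dataclass(frozen=True)
class DatasetRecord:
    name: str
    source: str  # provenance: where the data came from
    sensitivity: Sensitivity
    usages: frozenset[Usage]


def validate(record: DatasetRecord, trusted_sources: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    if record.source not in trusted_sources:
        problems.append(f"{record.name}: source '{record.source}' is not in the trusted catalog")
    if not record.usages:
        problems.append(f"{record.name}: no AI usage recorded")
    return problems


# Hypothetical catalog of trusted data sources.
TRUSTED_SOURCES = {"internal-crm-export", "licensed-vendor-feed"}

inventory = [
    DatasetRecord("support-tickets-2023", "internal-crm-export",
                  Sensitivity.CONFIDENTIAL, frozenset({Usage.FINE_TUNING})),
    DatasetRecord("forum-posts", "unvetted-web-crawl",
                  Sensitivity.PUBLIC, frozenset({Usage.RAG})),
]

for record in inventory:
    for problem in validate(record, TRUSTED_SOURCES):
        print(problem)
```

Here the second entry gets flagged because its source isn't in the trusted catalog. A real inventory would likely live in a data catalog tool rather than in code, but the shape of the record is the same.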
For ISO 42001, organizations don’t NEED to do any of this. But the following (optional) Annex A controls require tracking:
A.4.3: Data resources
This covers the following for “data resources utilized for the AI system”:
-> Retention
-> Intended use
-> Update/modification
-> Quality (duplicates A.7.4, in my opinion)
-> Provenance (also duplicates A.7.3 and A.7.5)
It doesn’t specifically say “training,” so this can also cover data the AI system processes in operation.
A.7.2: Data for development/enhancement of AI systems
This requirement focuses on data’s:
-> Privacy and security implications (duplicates A.7.3)
-> Potential security and safety threats
-> Accuracy/integrity (duplicates A.7.4)
-> Transparency and explainability
-> Representativeness
A.7.3: Acquisition of data
A broad control you can summarize as “data governance.” It requires noting:
-> Sources
-> Categories
-> Quantities
-> Demographics/biases
-> Data rights/ownership
-> Privacy and security requirements
A.7.4: Quality of data
ISO/IEC 25024:2015 (the relevant reference) defines data quality as the degree to which the data’s
-> characteristics satisfy
-> stated and implied needs
-> when used under specified conditions.
A.7.5: Data provenance
This is information about data’s:
-> Update
-> Creation
-> Validation
-> Abstraction
-> Transcription
-> Transfer of control
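One way to capture these provenance events is an append-only log per dataset. Below is a minimal Python sketch under that assumption; the event names mirror the A.7.5 list, while the class names, team names, and dates are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class ProvenanceEvent(Enum):
    # The kinds of events A.7.5 lists.
    CREATION = "creation"
    UPDATE = "update"
    VALIDATION = "validation"
    ABSTRACTION = "abstraction"
    TRANSCRIPTION = "transcription"
    TRANSFER_OF_CONTROL = "transfer of control"


@dataclass(frozen=True)
class ProvenanceEntry:
    event: ProvenanceEvent
    actor: str  # who performed the event (hypothetical team names below)
    at: datetime
    note: str = ""


@dataclass
class ProvenanceLog:
    dataset: str
    entries: list[ProvenanceEntry] = field(default_factory=list)

    def record(self, event: ProvenanceEvent, actor: str, at: datetime, note: str = "") -> None:
        # This method only appends; nothing in the class edits or removes past entries.
        self.entries.append(ProvenanceEntry(event, actor, at, note))

    def history(self, event: ProvenanceEvent) -> list[ProvenanceEntry]:
        return [e for e in self.entries if e.event is event]

    def is_consistent(self) -> bool:
        # Basic sanity checks: the log starts with creation and timestamps never go backwards.
        if not self.entries or self.entries[0].event is not ProvenanceEvent.CREATION:
            return False
        times = [e.at for e in self.entries]
        return all(a <= b for a, b in zip(times, times[1:]))


log = ProvenanceLog("support-tickets-2023")
log.record(ProvenanceEvent.CREATION, "crm-team",
           datetime(2024, 1, 5, tzinfo=timezone.utc), "exported from CRM")
log.record(ProvenanceEvent.VALIDATION, "data-quality-team",
           datetime(2024, 1, 8, tzinfo=timezone.utc), "schema checks passed")
log.record(ProvenanceEvent.TRANSFER_OF_CONTROL, "ml-platform-team",
           datetime(2024, 1, 10, tzinfo=timezone.utc), "handed over for fine-tuning")

print(log.is_consistent())
```

Keeping the log append-only means you can answer “who touched this data, and when?” by replaying history instead of trusting a single mutable status field.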
A.7.6: Data preparation
This control requires documenting granular data preparation steps, such as:
-> Encoding
-> Data cleaning
-> Normalization
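A simple way to document these steps is to run them as a named pipeline that logs what each step did. Here is a minimal pure-Python sketch, assuming small in-memory rows; the column names, step descriptions, and helper functions are hypothetical, not something ISO 42001 prescribes.

```python
from typing import Callable

Row = dict[str, object]
Step = Callable[[list[Row]], list[Row]]


def drop_incomplete(rows: list[Row]) -> list[Row]:
    # Data cleaning: drop rows with any missing value.
    return [r for r in rows if all(v is not None for v in r.values())]


def min_max_normalize(column: str) -> Step:
    # Normalization: rescale a numeric column to [0, 1]. Assumes rows is non-empty.
    def step(rows: list[Row]) -> list[Row]:
        values = [r[column] for r in rows]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # avoid dividing by zero when all values are equal
        return [{**r, column: (r[column] - lo) / span} for r in rows]
    return step


def one_hot_encode(column: str, categories: list[str]) -> Step:
    # Encoding: replace a categorical column with one 0/1 column per category.
    def step(rows: list[Row]) -> list[Row]:
        encoded = []
        for r in rows:
            new_row = {k: v for k, v in r.items() if k != column}
            for c in categories:
                new_row[f"{column}={c}"] = 1 if r[column] == c else 0
            encoded.append(new_row)
        return encoded
    return step


def run_pipeline(rows: list[Row], steps: list[tuple[str, Step]]):
    # Apply each step in order and record what it did, as documentation.
    log = []
    for description, fn in steps:
        rows_in = len(rows)
        rows = fn(rows)
        log.append({"step": description, "rows_in": rows_in, "rows_out": len(rows)})
    return rows, log


raw = [
    {"age": 30, "plan": "basic"},
    {"age": None, "plan": "pro"},
    {"age": 50, "plan": "pro"},
]
prepared, prep_log = run_pipeline(raw, [
    ("data cleaning: drop rows with missing values", drop_incomplete),
    ("normalization: min-max scale age", min_max_normalize("age")),
    ("encoding: one-hot encode plan", one_hot_encode("plan", ["basic", "pro"])),
])
for entry in prep_log:
    print(entry)
```

The printed log doubles as a record of which preparation methods ran, in what order, and how many rows each one kept, which is the kind of detail this control asks you to write down.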
The verdict?
ISO 42001’s Annex A controls place much heavier demands on data management.
This makes sense because 42001 is an AI governance standard, while HITRUST’s certification is a security-focused one.
There is understandably a lot of overlap, though.
Are you considering ISO 42001 or HITRUST certification (or both)?