Brian Mehlman Trenton Ian Cook
Core definition:
Metacognitive Reliability Layer (MRL)
A pre-action governance layer that converts model uncertainty and reliability evidence into admissibility signals.
Its job is not to make the model “humble.”
Its job is to answer:
Given this output, this context, this uncertainty, and this consequence level — may the system act?
Minimal schema:
MRL_INPUT:
  proposed_output
  task_type
  consequence_level
  confidence_report
  uncertainty_estimate
  perturbation_stability
  verifier_result
  provenance_state
  memory_binding
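As a sketch only, the input record can be given concrete types. The schema names the fields but not their types, so the value ranges, the 0 to 3 consequence scale, and the hypothetical `Provenance` states below are assumptions of this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    # Hypothetical provenance states; the schema only names "provenance_state".
    KNOWN = "known"
    PARTIAL = "partial"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class MRLInput:
    proposed_output: str
    task_type: str
    consequence_level: int            # assumed scale: 0 (trivial) .. 3 (critical)
    confidence_report: float          # model's self-reported confidence, assumed 0..1
    uncertainty_estimate: float       # externally estimated uncertainty, assumed 0..1
    perturbation_stability: float     # assumed: fraction of perturbed reruns that agree
    verifier_result: Optional[bool]   # None means no independent verifier ran
    provenance_state: Provenance
    memory_binding: bool              # True if the output would be written to memory

    def __post_init__(self) -> None:
        # Reject out-of-range values early so later rules see clean inputs.
        if not 0 <= self.consequence_level <= 3:
            raise ValueError("consequence_level must be in 0..3")
        for name in ("confidence_report", "uncertainty_estimate", "perturbation_stability"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1]")
```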
MRL_OUTPUT:
  reliability_class:
    - stable
    - uncertain
    - brittle
    - unverifiable
    - contradictory
  admissibility:
    - allow
    - allow_with_caveat
    - constrain
    - escalate
    - block
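One way to encode the output, assuming the admissibility values are listed from least to most restrictive, is an ordered enum, so that when several checks disagree the strictest verdict wins. That ordering and the `combine` helper are hypothetical, not part of the schema.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class ReliabilityClass(Enum):
    STABLE = "stable"
    UNCERTAIN = "uncertain"
    BRITTLE = "brittle"
    UNVERIFIABLE = "unverifiable"
    CONTRADICTORY = "contradictory"


class Admissibility(IntEnum):
    # Assumed ordering: least to most restrictive, so max() picks the strictest.
    ALLOW = 0
    ALLOW_WITH_CAVEAT = 1
    CONSTRAIN = 2
    ESCALATE = 3
    BLOCK = 4


@dataclass(frozen=True)
class MRLOutput:
    reliability_class: ReliabilityClass
    admissibility: Admissibility


def combine(*verdicts: Admissibility) -> Admissibility:
    """Hypothetical helper: return the most restrictive verdict, ALLOW if none."""
    return max(verdicts, default=Admissibility.ALLOW)
```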
Key invariants:
I1: Self-reported confidence is never sufficient for allow.
I2: Higher consequence level requires stronger independent verification.
I3: Low perturbation stability cannot be masked by fluent explanation.
I4: Unknown provenance downgrades admissibility.
I5: Memory writes require stronger reliability than transient responses.
I6: Tool use / external action requires FOJ/CATA authorization.
I7: Calibration drift over time triggers OHC escalation.
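A minimal sketch of how the invariants might be checked in one pass. Every threshold is assumed (`STABILITY_MIN`, `MEMORY_STABILITY_MIN`), as is the rule that a failed verifier escalates; the flags `external_action`, `foj_cata_authorized`, and `calibration_drift` are hypothetical stand-ins for FOJ/CATA authorization and whatever monitor drives OHC escalation. I4 is modeled as a one-step downgrade of the final verdict. Neither self-reported confidence nor explanation text is a field of `GateInput`, so under I1 and I3 neither has a path to a more permissive verdict.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Admissibility(IntEnum):
    # Assumed ordering: least to most restrictive.
    ALLOW = 0
    ALLOW_WITH_CAVEAT = 1
    CONSTRAIN = 2
    ESCALATE = 3
    BLOCK = 4


@dataclass(frozen=True)
class GateInput:
    consequence_level: int           # assumed 0 (trivial) .. 3 (critical)
    perturbation_stability: float    # assumed 0..1, agreement under perturbation
    verifier_result: Optional[bool]  # None means no independent verification ran
    provenance_known: bool
    memory_binding: bool
    external_action: bool            # hypothetical: output triggers a tool or action
    foj_cata_authorized: bool        # hypothetical stand-in for FOJ/CATA sign-off
    calibration_drift: bool          # hypothetical stand-in for a drift monitor


# Hypothetical thresholds; the invariants do not specify numbers.
STABILITY_MIN = 0.80
MEMORY_STABILITY_MIN = 0.95


def admissibility(x: GateInput) -> Admissibility:
    verdicts = [Admissibility.ALLOW]
    verified = x.verifier_result is True

    # I1: without an independent pass, the best possible verdict is a caveat.
    if not verified:
        verdicts.append(Admissibility.ALLOW_WITH_CAVEAT)
    # Assumption: an explicit verifier failure escalates.
    if x.verifier_result is False:
        verdicts.append(Admissibility.ESCALATE)
    # I2: higher-consequence tasks escalate without independent verification.
    if x.consequence_level >= 2 and not verified:
        verdicts.append(Admissibility.ESCALATE)
    # I3: brittleness is judged on measured stability only.
    if x.perturbation_stability < STABILITY_MIN:
        verdicts.append(Admissibility.CONSTRAIN)
    # I5: memory writes need a higher bar than transient responses.
    if x.memory_binding and (not verified or x.perturbation_stability < MEMORY_STABILITY_MIN):
        verdicts.append(Admissibility.CONSTRAIN)
    # I6: external action requires FOJ/CATA authorization.
    if x.external_action and not x.foj_cata_authorized:
        verdicts.append(Admissibility.BLOCK)
    # I7: calibration drift routes to OHC escalation.
    if x.calibration_drift:
        verdicts.append(Admissibility.ESCALATE)

    result = max(verdicts)
    # I4: unknown provenance downgrades the verdict by one step, capped at BLOCK.
    if not x.provenance_known:
        result = Admissibility(min(result + 1, Admissibility.BLOCK))
    return result
```

For instance, an unverified output at consequence level 2 comes back as `ESCALATE` even when every other field is clean.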
Clean placement within the Echo pipeline:
Generator
↓
MRL
↓
FOJ-S / SDRG / RIL-2
↓
CATA / Budget Membrane
↓
External action or memory write
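The ordering can be sketched as a chain of gates in which the action or memory write happens only after every gate has passed. The gate bodies are hypothetical placeholders: the notes do not say what FOJ-S, SDRG, RIL-2, or CATA check, so the `authorized`, `cost`, and `budget` fields are invented for illustration, and treating only `escalate` and `block` as halting at the MRL stage is an assumption.

```python
from typing import Callable, List, Optional, Tuple

# A gate returns None to pass the candidate on, or a reason string to halt it.
Gate = Callable[[dict], Optional[str]]


def run_pipeline(candidate: dict, gates: List[Tuple[str, Gate]],
                 commit: Callable[[dict], None]) -> str:
    """Run gates in order; only a candidate that clears every gate is committed."""
    for name, gate in gates:
        reason = gate(candidate)
        if reason is not None:
            return f"halted at {name}: {reason}"
    commit(candidate)  # external action or memory write happens only here
    return "committed"


# Hypothetical stand-ins for the stages in the diagram.
def mrl_gate(c: dict) -> Optional[str]:
    # Assumption: escalate and block stop the transition at the MRL stage.
    return c["admissibility"] if c["admissibility"] in ("escalate", "block") else None


def foj_sdrg_ril2_gate(c: dict) -> Optional[str]:
    return None if c.get("authorized", False) else "not authorized"


def cata_budget_gate(c: dict) -> Optional[str]:
    return None if c.get("cost", 0) <= c.get("budget", 0) else "over budget"


GATES: List[Tuple[str, Gate]] = [
    ("MRL", mrl_gate),
    ("FOJ-S / SDRG / RIL-2", foj_sdrg_ril2_gate),
    ("CATA / Budget Membrane", cata_budget_gate),
]
```

Because `commit` is called only after the loop completes, a candidate halted by the MRL never reaches the authorization gates or the write.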
The important distinction:
Metacognition research asks:
“Can the model estimate its own reliability?”
MRL asks:
“Can this reliability estimate be trusted enough to permit transition?”
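One concrete reading of the MRL question is to trust a confidence signal only when its track record is calibrated. This sketch uses standard binned expected calibration error (ECE) as that test; the choice of ECE and the values `max_ece=0.05` and `min_samples=50` are illustrative assumptions, not taken from these notes. Recomputing the same score on a recent window and comparing it against a baseline is one way the drift in I7 could be detected.

```python
from typing import Sequence


def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Binned ECE: bin-size-weighted mean of |accuracy - mean confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty sequences")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin instead of overflowing.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += (len(b) / total) * abs(accuracy - avg_conf)
    return ece


def estimate_is_trustworthy(confidences: Sequence[float], correct: Sequence[bool],
                            max_ece: float = 0.05, min_samples: int = 50) -> bool:
    """Hypothetical MRL check: trust a confidence signal only if it has been calibrated."""
    if len(confidences) < min_samples:
        return False  # too little history to trust the estimate at all
    return expected_calibration_error(confidences, correct) <= max_ece
```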