Reports and benchmark releases.

This page will hold public evaluation reports, benchmark releases, calibration notes, corrections, and field guidance. The structure is in place now so that the first release has a clear home.

Forthcoming artifacts

Benchmark release report: Scope, methods, model/system behavior, key findings, limitations, and interpretation guidance.
Evaluation reports: System-specific or cohort-level reports tied to named versions, configurations, and claims limits.
Calibration notes: Reviewer agreement, synthetic judge limitations, adjudication process, and remaining validity gaps.
Field guidance: Plain-language reports for faith institutions making adoption, procurement, or policy decisions.

Report principles