Reports and benchmark releases
This page will hold public evaluation reports, benchmark releases, calibration notes, corrections, and field guidance. It is structured now so the first release has a clear home.
Forthcoming artifacts
Benchmark release report
Scope, methods, model/system behavior, key findings, limitations, and interpretation guidance.
Evaluation reports
System-specific or cohort-level reports tied to named versions, configurations, and limits on claims.
Calibration notes
Reviewer agreement, limitations of synthetic judges, the adjudication process, and remaining validity gaps.
Field guidance
Plain-language reports for faith institutions making adoption, procurement, or policy decisions.
Report principles
- Every public result should name the benchmark version and evaluated configuration.
- Reports should distinguish evidence from endorsement.
- Limitations and correction paths should be visible, not buried.
- Participant relationships and funding constraints should be disclosed by category where relevant.