Research that makes faith-facing AI inspectable.

Fide AI studies how AI systems behave when people ask questions about faith, morality, doctrine, formation, and care. The current benchmark name and paper titles may change, but the work toward a public standard is already underway.

Research questions

Can faith-facing AI preserve theological boundaries? We test whether systems avoid overclaiming, flattening doctrine, or speaking with false pastoral authority.
Can models represent disagreement honestly? We evaluate primary, secondary, and tertiary disagreements without treating every difference as either trivial or existential.
Can systems avoid fabricated grounding? We examine citation behavior, prooftexting, unsupported church-history claims, and evidence discipline.
Can AI escalate appropriately in pastoral-adjacent situations? We look for safe referral, user agency, limits of competence, and care that points back to real human support.

Current benchmark program

Faith & Moral Guidance Benchmark v1 evaluates whether model and system responses preserve theological triage, represent disagreement accurately, avoid fabricated grounding, follow user or tradition preferences, and respect pastoral referral boundaries.

Scoring dimensions: theological and pastoral quality; grounding and evidence; preference fidelity; comparative honesty; escalation appropriateness.
Scenario families: primary doctrine, secondary disagreement, tertiary uncertainty, pastoral-adjacent care, grounding failures, multi-turn continuity, and triage pressure.
System conditions: raw model behavior, guided default behavior, preference-configured behavior, and perspective-comparison behavior.
Release posture: public release should include benchmark design, dataset construction, scoring protocol, calibration protocol, and reproducibility links while preserving held-out evaluation integrity.

What exists now

01 Corpus and manifest

The full local benchmark corpus and public release package are in the Fide AI benchmark tree.

02 Runner infrastructure

The standalone runner, model configuration, scoring code, and tests are separated from product-specific application code.

03 Publication materials

Whitepaper drafts, LaTeX targets, figures, analysis scripts, and submission plans are staged under the publications directory.

04 Artifact discipline

Private raw outputs, production logs, and repeated-run artifacts are governed by storage rules rather than treated as public web copy.

Reports and release artifacts

Public benchmark outputs should eventually live here as reports, dataset references, reproducibility links, and versioned release notes. Until publication, this section records the intended release structure.

Benchmark report: public narrative summary of scope, findings, limitations, and interpretation boundaries.
Dataset or artifact reference: external release links for approved public scenarios, manifests, and reproducibility material.
Calibration memo: human reviewer agreement, synthetic-judge limitations, adjudication notes, and known validity gaps.
Corrections log: versioned corrections, clarifications, and release caveats after publication.
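The intended release structure can be sketched as a versioned record whose corrections log is append-only. The field names and semantic-versioning scheme are assumptions for illustration, not the published release format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Correction:
    """One post-publication correction, clarification, or caveat."""
    issued: date
    summary: str


@dataclass
class BenchmarkRelease:
    """Hypothetical shape of one public benchmark release."""
    version: str                  # e.g. "v1.0.0"
    report_url: str               # public narrative summary
    dataset_refs: list[str]       # approved public scenarios and manifests
    calibration_memo_url: str     # reviewer agreement and known validity gaps
    corrections: list[Correction] = field(default_factory=list)

    def amend(self, summary: str) -> None:
        # Corrections append to the log; earlier entries are never rewritten,
        # so the published record stays auditable.
        self.corrections.append(Correction(date.today(), summary))
```

An append-only corrections log lets readers reconstruct what was claimed at any prior version, which matters for accountability claims made against a specific release.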

Publication pipeline

The near-term publication strategy prioritizes AI ethics, NLP, and responsible-evaluation venues, with longer-term plans for faith and theology audiences.

AI ethics and society venues: primary home for governance, accountability, safety, and institutional trust claims.
NLP and evaluation venues: natural fit for domain-specific benchmark design and open-ended model behavior.
Journal path: stronger option after human calibration improves judge-validity and pastoral-adequacy claims.
Faith and theology venues: longer-run home for theological engagement, field education, and community review.

Research areas

Fide AI's research areas include faith-sensitive evaluation, theological grounding, comparative tradition representation, pastoral-adjacent safety, institutional readiness, and claims and public trust.

View research areas

Interpretation limits

Benchmark scores are not theological authority, pastoral authority, or universal product approval. They are evidence about behavior under named versions, prompts, conditions, rubrics, and evaluation procedures. Human calibration remains necessary before making strong claims about judge validity or pastoral adequacy.

Why Fide AI publishes this work

This benchmark is larger than any one product. Fide AI is its natural home because the work serves as a public research and accountability layer for any AI system that offers faith, moral, or pastoral-adjacent guidance. Product companies participate as external or related parties under the same published rules.

Help complete the calibration and public release path.

Express interest