Fact Error Rate (FaER) evaluation framework for deep research reports

Faerie

Fact Error Rate evaluation for AI research reports.


AI systems can now generate research reports with citations, but how do you know that the citations actually support the claims, that each source URL is real, and that the facts are correct?

Faerie (Fact Error Rate) is an evaluation framework that answers these questions. Give it a markdown report and it will parse every claim, fetch every cited source, verify citations against source content, independently fact-check claims via web search, and produce a detailed grade card with metrics. The pipeline runs in seven steps (parse, verify sources, extract claims, verify claims, calculate metrics, assess quality, extract insights) and outputs a structured EvaluationReport with scores, verdicts, and recommendations.

  • Source verification — fetches every cited URL to check accessibility and content
  • Citation accuracy — checks whether each claim is actually supported by its cited source
  • Independent fact-checking — verifies claims against the web, not just the cited source
  • Exponential penalty scoring — false facts halve the score (0.5^n), unverifiable claims cut 25% each (0.75^n)
  • Quality assessment — LLM judge evaluates report structure and thoroughness
  • Insight extraction — identifies Nth-order insights that synthesize across multiple facts
  • Grade card — S+ through F- with ASCII art, per-category assessment, and actionable recommendations
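To make the first stage more concrete, here is a minimal sketch of how cited URLs might be pulled out of a markdown report before any source is fetched. It is not Faerie's parser: the name extract_cited_urls, the regular expression, and the assumption that citations appear as inline markdown links are all illustrative.

```python
import re

# Matches inline markdown links such as [label](https://example.com/page).
# Assumption: citations are written as inline links; Faerie's real parser may differ.
LINK_PATTERN = re.compile(r"\[([^\]]+)\]\((https?://[^\s)]+)\)")


def extract_cited_urls(markdown_text: str) -> list[str]:
    """Return cited URLs in order of first appearance, without duplicates."""
    seen: set[str] = set()
    urls: list[str] = []
    for _label, url in LINK_PATTERN.findall(markdown_text):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


sample_report = (
    "Adoption grew quickly [1](https://example.com/a). "
    "A second survey agrees [2](https://example.com/b), "
    "as does the first [1](https://example.com/a)."
)
print(extract_cited_urls(sample_report))
```

Deduplicating while preserving order means each source would be fetched once even when a report cites it several times.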

Installation

pip install faerie-eval

Quick Start

CLI

faerie evaluate report.md
faerie evaluate report.md --output results.json --verbose

Python

from faerie import FaerieEvaluator

evaluator = FaerieEvaluator(model="gemini-3-flash-preview", verbose=True)
result = evaluator.evaluate_file("report.md")

print(f"Overall Score: {result.overall_quality_score:.0%}")
print(f"Citation Accuracy: {result.faer_metrics.citation_accuracy_rate:.0%}")
print(f"Factual Accuracy: {result.faer_metrics.fact_accuracy_rate:.0%}")
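The three attributes printed above can also drive a simple pass/fail gate, for example in CI. The sketch below is an illustration only: passes_quality_gate and its threshold values are hypothetical, and a SimpleNamespace stands in for the real result object so the snippet runs without any API calls.

```python
from types import SimpleNamespace


def passes_quality_gate(result, min_overall=0.8, min_citation=0.9, min_fact=0.9):
    """Return True when all three rates meet their (hypothetical) thresholds."""
    metrics = result.faer_metrics
    return (
        result.overall_quality_score >= min_overall
        and metrics.citation_accuracy_rate >= min_citation
        and metrics.fact_accuracy_rate >= min_fact
    )


# Stand-in for an evaluation result; only the attributes shown in the
# Quick Start are assumed to exist.
fake_result = SimpleNamespace(
    overall_quality_score=0.85,
    faer_metrics=SimpleNamespace(
        citation_accuracy_rate=0.95,
        fact_accuracy_rate=0.88,
    ),
)
print(passes_quality_gate(fake_result))
```

With a real run, the object returned by evaluator.evaluate_file("report.md") would be passed in place of fake_result.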

Generate + Evaluate

Faerie pairs with Dragen for end-to-end research pipelines. Generate a report, then evaluate it:

# Generate a research report
python examples/deep_research.py "AI agents in 2025"

# Evaluate the generated report
faerie evaluate report.md --verbose

CLI Options

faerie evaluate <report.md> [OPTIONS]

  --output, -o PATH     Save full JSON results to file
  --model, -m MODEL     LLM model for verification (default: gemini-3-flash-preview)
  --skip-fact-check     Skip independent fact verification (faster, citation-only)
  --skip-insights       Skip Nth-order insight extraction
  --max-workers, -w N   Max parallel verification workers (default: 3)
  --verbose, -v         Print detailed progress
  --json                Output full JSON to stdout
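When the CLI is driven from a script, it can help to assemble the argument list programmatically. The helper below is a hypothetical convenience wrapper, not part of Faerie; it uses only the flags listed above and builds the command without running it.

```python
import shlex


def build_faerie_command(report_path, output=None, model=None,
                         skip_fact_check=False, skip_insights=False,
                         max_workers=None, verbose=False, json_stdout=False):
    """Build the argument list for `faerie evaluate` from the documented flags."""
    cmd = ["faerie", "evaluate", str(report_path)]
    if output is not None:
        cmd += ["--output", str(output)]
    if model is not None:
        cmd += ["--model", model]
    if skip_fact_check:
        cmd.append("--skip-fact-check")
    if skip_insights:
        cmd.append("--skip-insights")
    if max_workers is not None:
        cmd += ["--max-workers", str(max_workers)]
    if verbose:
        cmd.append("--verbose")
    if json_stdout:
        cmd.append("--json")
    return cmd


command = build_faerie_command(
    "report.md", output="results.json", skip_insights=True, max_workers=3
)
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) once faerie-eval is installed.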

Scoring

The overall score combines four dimensions with exponential penalties for errors:

Dimension          Weight  Description
Citation Accuracy  30%     Do sources support the claims they're cited for?
Factual Accuracy   60%     Are claims independently verifiable as true?
Structure          5%      Is the report well-organized?
Thoroughness       5%      Does it cover the topic in depth?

Penalty system: Each false fact multiplies the score by 0.5, and each unverifiable claim multiplies it by 0.75. A report with 3 false facts and 2 unverifiable claims gets: base_score * 0.5^3 * 0.75^2 = base_score * 0.125 * 0.5625 ≈ base_score * 0.07.
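The arithmetic can be checked with a short sketch. The penalty multipliers and weights come from the description above; treating base_score as a weighted sum of the four dimension scores is an assumption, and the names WEIGHTS, penalty_multiplier, and overall_score are illustrative rather than Faerie's own.

```python
# Weights taken from the scoring table; combining them as a weighted sum
# is an assumption about how base_score is formed.
WEIGHTS = {
    "citation_accuracy": 0.30,
    "factual_accuracy": 0.60,
    "structure": 0.05,
    "thoroughness": 0.05,
}


def penalty_multiplier(false_facts: int, unverifiable_claims: int) -> float:
    """Each false fact halves the score; each unverifiable claim keeps 75% of it."""
    return 0.5 ** false_facts * 0.75 ** unverifiable_claims


def overall_score(dimension_scores: dict, false_facts: int, unverifiable_claims: int) -> float:
    """Weighted base score (each dimension in [0, 1]) times the penalty multiplier."""
    base = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return base * penalty_multiplier(false_facts, unverifiable_claims)


# 3 false facts and 2 unverifiable claims: 0.125 * 0.5625
print(f"{penalty_multiplier(3, 2):.4f}")  # 0.0703
```

Because the penalties compound multiplicatively, a handful of errors dominates the result regardless of how strong the base score is.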

Citation

If you use Faerie in your research, please cite it as:

@software{faerie,
  title = {Faerie: Fact Error Rate Evaluation Framework},
  author = {Chonkie Inc.},
  url = {https://github.com/chonkie-inc/faerie},
  license = {MIT},
  year = {2025-2026}
}

Download files

Source Distribution

faerie_eval-0.1.0.tar.gz (28.3 kB)

Uploaded Source

Built Distribution

faerie_eval-0.1.0-py3-none-any.whl (36.0 kB)

Uploaded Python 3

File details

Details for the file faerie_eval-0.1.0.tar.gz.

File metadata

  • Download URL: faerie_eval-0.1.0.tar.gz
  • Upload date:
  • Size: 28.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for faerie_eval-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       0c79a38b8387b13d21c72653459195929a5db43c6d4a391ced4d529a5bd67197
MD5          984053971914d48402896d814188184e
BLAKE2b-256  2055a2af10da365f9b615ee5125b90d778aa943f8d44f9b6d3ad96be6bc54b3a

File details

Details for the file faerie_eval-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: faerie_eval-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 36.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for faerie_eval-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       7cc7467b9cd828ca43ef213945f438750fe47dfb11757bb1d78af8365386c399
MD5          ae237d8f868293f72ff287994b3d5566
BLAKE2b-256  7e5a379638d969d916c2f1cb8b6034af1998b394f55a5fe4d6f2cb8608230e30
