Fact Error Rate (FaER) evaluation framework for deep research reports

Faerie

Fact Error Rate evaluation for AI research reports.

AI systems can now generate research reports with citations, but how do you know the citations actually support the claims? That a source URL is real? That the facts are correct?

Faerie (Fact Error Rate) is an evaluation framework that answers these questions. Give it a markdown report and it will parse every claim, fetch every cited source, verify citations against source content, independently fact-check claims via web search, and produce a detailed grade card with metrics. The pipeline runs in seven steps (parse, verify sources, extract claims, verify claims, calculate metrics, assess quality, extract insights) and outputs a structured EvaluationReport with scores, verdicts, and recommendations.

Source verification — fetches every cited URL to check accessibility and content
Citation accuracy — checks whether each claim is actually supported by its cited source
Independent fact-checking — verifies claims against the web, not just the cited source
Exponential penalty scoring — false facts halve the score (0.5^n), unverifiable claims cut 25% each (0.75^n)
Quality assessment — LLM judge evaluates report structure and thoroughness
Insight extraction — identifies Nth-order insights that synthesize across multiple facts
Grade card — S+ through F- with ASCII art, per-category assessment, and actionable recommendations

Installation

pip install faerie-eval

Quick Start

CLI

faerie evaluate report.md

faerie evaluate report.md --output results.json --verbose

Python

from faerie import FaerieEvaluator

evaluator = FaerieEvaluator(model="gemini-3-flash-preview", verbose=True)
result = evaluator.evaluate_file("report.md")

print(f"Overall Score: {result.overall_quality_score:.0%}")
print(f"Citation Accuracy: {result.faer_metrics.citation_accuracy_rate:.0%}")
print(f"Factual Accuracy: {result.faer_metrics.fact_accuracy_rate:.0%}")
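
The three metrics printed above can also be used to gate a report, for example in CI. The following is a minimal sketch, assuming only the attribute names shown in the snippet above and that each value is a fraction between 0 and 1. The `failed_checks` helper and the threshold values are hypothetical and not part of Faerie.

```python
from types import SimpleNamespace

# Hypothetical minimum values; tune them for your own reports.
THRESHOLDS = {
    "overall": 0.70,
    "citation_accuracy": 0.80,
    "fact_accuracy": 0.90,
}


def failed_checks(result, thresholds=THRESHOLDS):
    """Return the names of metrics that fall below their threshold.

    `result` is any object exposing the attributes used in the example above.
    """
    observed = {
        "overall": result.overall_quality_score,
        "citation_accuracy": result.faer_metrics.citation_accuracy_rate,
        "fact_accuracy": result.faer_metrics.fact_accuracy_rate,
    }
    return [name for name, value in observed.items() if value < thresholds[name]]


# Stand-in object with the same attribute layout, so the sketch runs without Faerie.
demo = SimpleNamespace(
    overall_quality_score=0.65,
    faer_metrics=SimpleNamespace(citation_accuracy_rate=0.85, fact_accuracy_rate=0.95),
)
print(failed_checks(demo))
```

With a real evaluation, pass the object returned by `evaluator.evaluate_file("report.md")` in place of `demo`.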

Generate + Evaluate

Faerie pairs with Dragen for end-to-end research pipelines. Generate a report, then evaluate it:

# Generate a research report
python examples/deep_research.py "AI agents in 2025"

# Evaluate the generated report
faerie evaluate report.md --verbose

CLI Options

faerie evaluate <report.md> [OPTIONS]

  --output, -o PATH     Save full JSON results to file
  --model, -m MODEL     LLM model for verification (default: gemini-3-flash-preview)
  --skip-fact-check     Skip independent fact verification (faster, citation-only)
  --skip-insights       Skip Nth-order insight extraction
  --max-workers, -w N   Max parallel verification workers (default: 3)
  --verbose, -v         Print detailed progress
  --json                Output full JSON to stdout
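
The options above can also be driven from a script. The following is a rough sketch that assembles a `faerie evaluate` command using only the flags listed above and then loads the saved JSON. The helper names `build_command` and `run_and_load` are hypothetical, and nothing is assumed about the layout of the JSON file.

```python
import json
import subprocess


def build_command(report_path, output_path=None, model=None,
                  skip_fact_check=False, skip_insights=False,
                  max_workers=None, verbose=False):
    """Assemble a `faerie evaluate` argument list from the documented options."""
    cmd = ["faerie", "evaluate", str(report_path)]
    if output_path is not None:
        cmd += ["--output", str(output_path)]
    if model is not None:
        cmd += ["--model", model]
    if skip_fact_check:
        cmd.append("--skip-fact-check")
    if skip_insights:
        cmd.append("--skip-insights")
    if max_workers is not None:
        cmd += ["--max-workers", str(max_workers)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_and_load(report_path, output_path):
    """Run the CLI, then load the JSON it saved (structure not assumed here)."""
    subprocess.run(build_command(report_path, output_path=output_path), check=True)
    with open(output_path, encoding="utf-8") as fh:
        return json.load(fh)


# Citation-only run with more parallel workers; prints the argument list.
print(build_command("report.md", output_path="results.json",
                    skip_fact_check=True, max_workers=5))
```

Passing the command as a list rather than a shell string avoids quoting problems when report paths contain spaces.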

Scoring

The overall score combines four dimensions with exponential penalties for errors:

Dimension	Weight	Description
Citation Accuracy	30%	Do sources support the claims they're cited for?
Factual Accuracy	60%	Are claims independently verifiable as true?
Structure	5%	Is the report well-organized?
Thoroughness	5%	Does it cover the topic in depth?

Penalty system: Each false fact multiplies the score by 0.5, and each unverifiable claim multiplies it by 0.75. A report with 3 false facts and 2 unverifiable claims gets: base_score * 0.5^3 * 0.75^2 = base_score * 0.125 * 0.5625 ≈ base_score * 0.07, so only about 7% of the base score survives.
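
The arithmetic above can be reproduced in a few lines. This is an illustrative sketch, not Faerie's implementation: it assumes the base score is the weighted sum of the four dimensions in the table (the source does not spell out how they are combined), and the function names `base_score` and `penalized_score` are hypothetical.

```python
# Weights taken from the scoring table above.
WEIGHTS = {
    "citation_accuracy": 0.30,
    "factual_accuracy": 0.60,
    "structure": 0.05,
    "thoroughness": 0.05,
}

FALSE_FACT_PENALTY = 0.5      # each false fact halves the score
UNVERIFIABLE_PENALTY = 0.75   # each unverifiable claim cuts the score by 25%


def base_score(dimensions):
    """Weighted sum of per-dimension scores in [0, 1] (an assumed combination rule)."""
    return sum(WEIGHTS[name] * dimensions[name] for name in WEIGHTS)


def penalized_score(base, false_facts, unverifiable):
    """Apply the exponential penalties: base * 0.5^false_facts * 0.75^unverifiable."""
    return base * FALSE_FACT_PENALTY ** false_facts * UNVERIFIABLE_PENALTY ** unverifiable


# The worked example: 3 false facts and 2 unverifiable claims.
multiplier = penalized_score(1.0, false_facts=3, unverifiable=2)
print(f"{multiplier:.4f}")  # 0.0703
```

Because the penalties multiply, a handful of false facts dominates the result regardless of how well the report scores on the other dimensions.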

Citation

If you use Faerie in your research, please cite it as:

@software{faerie,
  title = {Faerie: Fact Error Rate Evaluation Framework},
  author = {Chonkie Inc.},
  url = {https://github.com/chonkie-inc/faerie},
  license = {MIT},
  year = {2025-2026}
}

Project details

This version

0.1.0

Feb 13, 2026

Download files

Source Distribution

faerie_eval-0.1.0.tar.gz (28.3 kB)

Uploaded Feb 13, 2026 Source

Built Distribution

faerie_eval-0.1.0-py3-none-any.whl (36.0 kB)

Uploaded Feb 13, 2026 Python 3

File details

Details for the file faerie_eval-0.1.0.tar.gz.

File metadata

Download URL: faerie_eval-0.1.0.tar.gz
Upload date: Feb 13, 2026
Size: 28.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for faerie_eval-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0c79a38b8387b13d21c72653459195929a5db43c6d4a391ced4d529a5bd67197`
MD5	`984053971914d48402896d814188184e`
BLAKE2b-256	`2055a2af10da365f9b615ee5125b90d778aa943f8d44f9b6d3ad96be6bc54b3a`

File details

Details for the file faerie_eval-0.1.0-py3-none-any.whl.

File metadata

Download URL: faerie_eval-0.1.0-py3-none-any.whl
Upload date: Feb 13, 2026
Size: 36.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for faerie_eval-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7cc7467b9cd828ca43ef213945f438750fe47dfb11757bb1d78af8365386c399`
MD5	`ae237d8f868293f72ff287994b3d5566`
BLAKE2b-256	`7e5a379638d969d916c2f1cb8b6034af1998b394f55a5fe4d6f2cb8608230e30`
