
RAG evaluation system using Ragas with Phoenix/Langfuse tracing


EvalVault

EvalVault is a CLI + Web UI platform that ties the evaluation (Eval) → analysis (Analysis) → tracing (Tracing) → improvement loop for RAG (Retrieval-Augmented Generation) systems into a single workflow.

PyPI · Python 3.12+ · CI · License

English version? See README.en.md.


Quickstart (CLI)

uv sync --extra dev
cp .env.example .env

uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
  --metrics faithfulness,answer_relevancy \
  --profile dev \
  --auto-analyze

Tip: The default store is Postgres+pgvector. To use SQLite, pass --db, or set DB_BACKEND=sqlite together with EVALVAULT_DB_PATH.
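As a rough illustration of the tip above, the following sketch builds an environment that selects the SQLite backend and assembles the Quickstart command as an argument list. The variable names DB_BACKEND and EVALVAULT_DB_PATH come from the tip; the helper functions `sqlite_env` and `quickstart_command` and the database path `data/evalvault.db` are hypothetical and exist only for this example.

```python
from __future__ import annotations

import os
import shlex


def sqlite_env(db_path: str, base: dict[str, str] | None = None) -> dict[str, str]:
    """Return a copy of the environment with the SQLite backend selected."""
    env = dict(os.environ if base is None else base)
    env["DB_BACKEND"] = "sqlite"        # variable name taken from the tip above
    env["EVALVAULT_DB_PATH"] = db_path  # the path value is a hypothetical example
    return env


def quickstart_command(dataset: str) -> list[str]:
    """Build the Quickstart invocation as an argv list (avoids shell quoting)."""
    return [
        "uv", "run", "evalvault", "run", "--mode", "simple", dataset,
        "--metrics", "faithfulness,answer_relevancy",
        "--profile", "dev",
        "--auto-analyze",
    ]


if __name__ == "__main__":
    env = sqlite_env("data/evalvault.db")
    cmd = quickstart_command("tests/fixtures/e2e/insurance_qa_korean.json")
    # Print the command; to execute it, use subprocess.run(cmd, env=env, check=True).
    print(shlex.join(cmd))
```

Passing the environment explicitly keeps the SQLite selection local to one invocation instead of changing the shell session or the .env file.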


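Key features

  • End-to-end evaluation loop: run Eval → Analysis → Tracing → Improvement as a single flow
  • Dataset-centered operation: pass criteria (thresholds) are kept in the dataset
  • Artifacts-first: stores the raw per-module results in structured form, not just the report
  • Optional observability: Phoenix/Langfuse/MLflow are enabled only when needed
  • CLI + Web UI: history, comparison, and reports are unified around the same run_id
  • Regression gate (CI/CD): evalvault regress / ci-gate detects statistical regressions against a baseline and integrates with CI through a stable-schema JSON artifact plus an exit code (the evaluation gate verdict is limited to passed/failed; it does not emit release promote/rollback decisions)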
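To make the regression-gate idea concrete, here is a minimal sketch of a gate that compares per-sample scores from a baseline run and a candidate run, then maps a passed/failed verdict to an exit code. It is not EvalVault's implementation: the one-sided permutation test is a stand-in technique chosen for illustration, and the function names and JSON field names (`verdict`, `p_value`, and so on) are assumptions, not the actual artifact schema.

```python
from __future__ import annotations

import json
import random
from statistics import mean


def regression_p_value(baseline, candidate, n_resamples=2000, seed=0):
    """One-sided permutation test for a drop in the mean score.

    Returns the fraction of random relabelings whose drop
    (first-group mean minus second-group mean) is at least the observed drop.
    """
    rng = random.Random(seed)  # fixed seed keeps the gate reproducible
    observed = mean(baseline) - mean(candidate)
    pooled = list(baseline) + list(candidate)
    n = len(baseline)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if mean(pooled[:n]) - mean(pooled[n:]) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


def gate(baseline, candidate, alpha=0.05):
    """Build a small artifact; field names here are hypothetical."""
    p = regression_p_value(baseline, candidate)
    return {
        "verdict": "failed" if p < alpha else "passed",
        "p_value": p,
        "baseline_mean": mean(baseline),
        "candidate_mean": mean(candidate),
    }


def exit_code(artifact):
    """CI convention: 0 when the gate passes, non-zero otherwise."""
    return 0 if artifact["verdict"] == "passed" else 1


if __name__ == "__main__":
    baseline = [0.90, 0.92, 0.88, 0.91, 0.89, 0.93, 0.90, 0.91]
    candidate = [0.60, 0.62, 0.58, 0.61, 0.59, 0.63, 0.60, 0.61]
    artifact = gate(baseline, candidate)
    print(json.dumps(artifact, indent=2))
    raise SystemExit(exit_code(artifact))
```

The verdict deliberately stops at passed/failed, mirroring the scope described above; deciding whether to promote or roll back a release is left to the CI pipeline that reads the exit code.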

Documentation hub

  • Documentation index: docs/INDEX.md
  • Handbook (textbook style): docs/handbook/INDEX.md
  • External summary: docs/handbook/EXTERNAL.md
  • Operations guide (local/Docker/observability/runbook): docs/handbook/CHAPTERS/04_operations.md
  • Workflows (run/analyze/compare/regress): docs/handbook/CHAPTERS/03_workflows.md
  • Quality/testing/CI: docs/handbook/CHAPTERS/06_quality_and_testing.md
  • Architecture: docs/handbook/CHAPTERS/01_architecture.md
  • Offline/air-gapped (Docker/model cache): docs/guides/OFFLINE_DOCKER.md, docs/guides/OFFLINE_MODELS.md
  • Adapter contract (external tool integration): docs/adapter-contract.md · machine-readable state: .ai-tool-suite/project-state.json · change narrative: docs/development-journal.md
  • Regression gate fixture examples (air-gapped): tests/fixtures/e2e/regression_gate/ (pass/fail/incomplete-provenance)

Note (compatibility): some documents, such as docs/guides/USER_GUIDE.md and docs/guides/DEV_GUIDE.md, are deprecated stubs kept so that old links still resolve; the handbook holds the current content.


Web UI

# API
uv run evalvault serve-api --reload

# Frontend
cd frontend
npm install
npm run dev

Open http://localhost:5173 in a browser, then use Evaluation Studio to check runs, history, and reports.


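Offline/air-gapped

  • Docker image bundle: docs/guides/OFFLINE_DOCKER.md
  • NLP model cache bundle: docs/guides/OFFLINE_MODELS.md

LLMs are managed by the infrastructure inside the air-gapped network; the EvalVault bundle includes only the cache of NLP models used for analysis.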


Contributing

uv run ruff check src/ tests/
uv run ruff format src/ tests/
uv run pytest tests -v

  • Contribution guide: CONTRIBUTING.md
  • Development/testing routine: AGENTS.md, docs/handbook/CHAPTERS/06_quality_and_testing.md

License

EvalVault is licensed under the Apache 2.0 license.
