RAG evaluation system using Ragas with Phoenix/Langfuse tracing

EvalVault

EvalVault is a CLI + Web UI platform that ties the evaluation (Eval) → analysis (Analysis) → tracing (Tracing) → improvement loop for RAG (Retrieval-Augmented Generation) systems into a single workflow.

Requires Python 3.12+.

English version? See README.en.md.


Quickstart (CLI)

uv sync --extra dev
cp .env.example .env

uv run evalvault run --mode simple tests/fixtures/e2e/insurance_qa_korean.json \
  --metrics faithfulness,answer_relevancy \
  --profile dev \
  --auto-analyze

Tip: The default store is Postgres+pgvector. To use SQLite, pass --db, or set DB_BACKEND=sqlite together with EVALVAULT_DB_PATH.
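The environment-variable route from the tip can be sketched from Python, for example when wrapping the CLI in a script. This is a minimal sketch, not part of EvalVault: the helper names `sqlite_env` and `run_quickstart` and the example database path are hypothetical, and only the variable names DB_BACKEND and EVALVAULT_DB_PATH and the command arguments come from this README.

```python
import os
import subprocess


def sqlite_env(db_path, base_env=None):
    """Return a copy of the environment with the SQLite settings applied.

    DB_BACKEND and EVALVAULT_DB_PATH are the variable names given in the tip;
    db_path is whatever SQLite file the caller wants to use.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DB_BACKEND"] = "sqlite"
    env["EVALVAULT_DB_PATH"] = db_path
    return env


def run_quickstart(db_path):
    """Run the Quickstart evaluation with the SQLite environment (hypothetical wrapper)."""
    cmd = [
        "uv", "run", "evalvault", "run", "--mode", "simple",
        "tests/fixtures/e2e/insurance_qa_korean.json",
        "--metrics", "faithfulness,answer_relevancy",
    ]
    # check=False: let the caller inspect the return code instead of raising.
    return subprocess.run(cmd, env=sqlite_env(db_path), check=False)
```

Calling `run_quickstart("./evalvault.db")` (an example path) would launch the same command as the Quickstart, with the two variables set only for that child process rather than for the whole shell session.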


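Key features

  • End-to-end evaluation loop: run Eval → Analysis → Tracing → Improvement as a single flow
  • Dataset-centric operation: pass/fail criteria (thresholds) are kept in the dataset
  • Artifacts-first: stores each module's raw results in structured form, not just the report
  • Optional observability: Phoenix/Langfuse/MLflow are enabled only when needed
  • CLI + Web UI: history, comparison, and reports are unified under the same run_id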
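To illustrate what keeping thresholds in the dataset can mean in practice, here is a minimal sketch. The dataset layout (a `thresholds` mapping), the function name, and the score values are all hypothetical and are not taken from EvalVault's actual schema; only the metric names appear in the Quickstart.

```python
def evaluate_against_thresholds(scores, thresholds):
    """Return {metric: passed} for every metric that has a threshold.

    A metric passes when its score is present and at least the threshold.
    """
    results = {}
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        results[metric] = score is not None and score >= minimum
    return results


# Hypothetical dataset: the pass criteria travel with the data itself.
dataset = {
    "name": "insurance-qa",
    "thresholds": {"faithfulness": 0.8, "answer_relevancy": 0.7},
}

# Made-up scores for one run, for illustration only.
scores = {"faithfulness": 0.85, "answer_relevancy": 0.65}

verdict = evaluate_against_thresholds(scores, dataset["thresholds"])
print(verdict)  # {'faithfulness': True, 'answer_relevancy': False}
```

Storing the criteria next to the data means two runs over the same dataset are judged against the same bar, regardless of which CLI flags were used.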

Documentation hub

  • Documentation index: docs/INDEX.md
  • Handbook (textbook style): docs/handbook/INDEX.md
  • External summary: docs/handbook/EXTERNAL.md
  • Operations guide (local/Docker/observability/runbook): docs/handbook/CHAPTERS/04_operations.md
  • Workflows (run/analyze/compare/regression): docs/handbook/CHAPTERS/03_workflows.md
  • Quality/testing/CI: docs/handbook/CHAPTERS/06_quality_and_testing.md
  • Architecture: docs/handbook/CHAPTERS/01_architecture.md
  • Offline/air-gapped (Docker/model cache): docs/guides/OFFLINE_DOCKER.md, docs/guides/OFFLINE_MODELS.md

Note (compatibility): Some documents, such as docs/guides/USER_GUIDE.md and docs/guides/DEV_GUIDE.md, are deprecated stubs kept so that old links keep working; the handbook has the current content.


Web UI

# API
uv run evalvault serve-api --reload

# Frontend
cd frontend
npm install
npm run dev

Open http://localhost:5173 in a browser, then use Evaluation Studio to check runs, history, and reports.


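Offline / air-gapped environments

  • Docker image bundle: docs/guides/OFFLINE_DOCKER.md
  • NLP model cache bundle: docs/guides/OFFLINE_MODELS.md

LLM models are managed by the infrastructure inside the air-gapped network; the EvalVault bundle includes only the NLP model cache used for analysis.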


Contributing

uv run ruff check src/ tests/
uv run ruff format src/ tests/
uv run pytest tests -v

  • Contribution guide: CONTRIBUTING.md
  • Development/test routine: AGENTS.md, docs/handbook/CHAPTERS/06_quality_and_testing.md

License

EvalVault is licensed under the Apache 2.0 license.
