Streamline policy evaluation workflows with AI-driven analysis and processing that is agnostic to the evaluation framework

Evaluatr

Understanding Evaluation Mapping in the UN Context

UN evaluation work encompasses several interconnected domains:

Quality Check: Assessing evidence quality and methodological rigor in evaluation reports
Mapping/Tagging: Identifying which standardized framework themes are central to each report
Impact Evaluation: Measuring program effectiveness using RCTs, quasi-experimental designs, etc.
Synthesis: Aggregating findings across reports on specific themes/regions to generate insights

Mapping/tagging is a foundational step that identifies which themes from established evaluation frameworks (like IOM’s Strategic Results Framework or the UN Global Compact for Migration) are central to each report. These frameworks provide agreed-upon nomenclature covering all relevant themes, ensuring common terminology across stakeholders and enabling interoperability for UN-wide aggregation and communication.

Rather than extracting evidence for specific themes, mapping creates a curated index that lets evaluators retrieve the most relevant reports for subsequent synthesis work, maximizing both recall (finding all relevant reports) and precision (avoiding irrelevant ones).
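To make the two retrieval metrics concrete, here is a minimal sketch that computes them for one theme query. The report IDs and the `precision_recall` helper are hypothetical and are not part of Evaluatr; precision is the share of retrieved reports that are relevant, and recall is the share of relevant reports that were retrieved.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for a set of retrieved report IDs."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical report IDs, for illustration only.
retrieved = {"r1", "r2", "r3", "r4"}  # reports returned for a theme
relevant = {"r1", "r2", "r5"}         # reports an expert judged relevant

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.50 recall=0.67"
```

A mapping that tags too liberally raises recall at the cost of precision; one that tags too conservatively does the opposite.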

Note: Throughout this documentation, we use “mapping” and “tagging” interchangeably.

The Challenge We Solve

IOM evaluators possess deep expertise in mapping evaluation reports against frameworks like the Strategic Results Framework (SRF), but face significant operational challenges when processing reports that often exceed 150 pages of diverse content across multiple projects and contexts.

The core challenges are:

Time-intensive process: Hundreds of staff-hours required per comprehensive mapping exercise
Individual consistency: Even expert evaluators may categorize the same content differently across sessions
Cross-evaluator consistency: Different evaluators may interpret and map identical content to different framework outputs
Scale vs. thoroughness: Growing volume of evaluation reports creates pressure to choose between speed and comprehensive analysis

What is Evaluatr?

Evaluatr is an AI-powered system that automates mapping evaluation reports against structured frameworks while maintaining interpretability and human oversight. Initially developed for IOM (International Organization for Migration) evaluation reports and the Strategic Results Framework (SRF), it transforms a traditionally manual, time-intensive process into an efficient, transparent workflow.

The system maps evaluation reports against hierarchical frameworks like the SRF (objectives, enablers, cross-cutting priorities, outcomes, outputs, indicators) and connects to broader frameworks like the Sustainable Development Goals (SDGs) for interoperability.

Beyond automation, Evaluatr prioritizes interpretability and human-AI collaboration. Evaluators can inspect the mapping process, audit AI decisions, perform error analysis, and build training datasets over time, which keeps the system aligned with organizational needs through a transparent, auditable methodology.

Key Features

1. Document Preparation Pipeline ✅ Available

Repository Processing: Read and preprocess IOM evaluation report repositories with standardized outputs
Automated Downloads: Batch download of evaluation documents from diverse sources
OCR Processing: Convert scanned PDFs to searchable text using Optical Character Recognition (OCR) technology
Content Enrichment: Fix OCR-corrupted headings and enrich documents with AI-generated image descriptions for high-quality input data

2. AI-Assisted Framework Mapping ✅ Available

Multi-Stage Pipeline: Three-stage mapping process that progressively narrows from broad themes (SRF Enablers, Cross-cutting Priorities, GCM objectives) to specific SRF outputs. Each stage enriches the context for the next. For example, knowing that a report is cross-cutting in nature helps map specific SRF outputs accurately
Cost Optimization: Leverages LLM prompt caching to minimize token usage and API costs during repeated analysis
Command-line Interface: Streamlined pipeline execution through easy-to-use CLI tools (evl_ocr, evl_md_plus, evl_tag)
Transparent Tracing: Complete audit trails of AI decisions stored for human review and evaluation
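The idea of stages enriching context for later stages can be sketched as follows. This is an illustration only, not Evaluatr's implementation: the stage functions, tag labels, and keyword checks are hypothetical stand-ins for the LLM calls the real pipeline makes.

```python
from typing import Callable

# A stage receives the report text plus the tags found by earlier stages.
Stage = Callable[[str, dict[str, list[str]]], list[str]]


def stage1_broad_themes(text: str, context: dict[str, list[str]]) -> list[str]:
    # Placeholder for an LLM call: flag a theme when a keyword appears.
    return ["cross-cutting"] if "gender" in text.lower() else []


def stage2_gcm_objectives(text: str, context: dict[str, list[str]]) -> list[str]:
    return ["border-management"] if "border" in text.lower() else []


def stage3_srf_outputs(text: str, context: dict[str, list[str]]) -> list[str]:
    # Later stages read earlier results to narrow their candidate outputs.
    outputs = []
    if "cross-cutting" in context.get("stage1", []):
        outputs.append("output-linked-to-cross-cutting-priority")
    if "border-management" in context.get("stage2", []):
        outputs.append("output-linked-to-border-management")
    return outputs


def run_pipeline(text: str, stages: dict[str, Stage]) -> dict[str, list[str]]:
    """Run stages in order, passing accumulated results to each one."""
    context: dict[str, list[str]] = {}
    for name, stage in stages.items():
        context[name] = stage(text, context)
    return context


stages = {
    "stage1": stage1_broad_themes,
    "stage2": stage2_gcm_objectives,
    "stage3": stage3_srf_outputs,
}
result = run_pipeline("Gender-sensitive border training for officials", stages)
print(result["stage3"])
```

Keeping each stage's result in a shared dictionary also gives a natural place to record what was decided at each step, which is the kind of information an audit trail needs.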

3. Knowledge Synthesis 📋 Planned

Knowledge Cards: Generate structured summaries for downstream AI tasks like proposal writing and synthesis

Installation & Setup

Tip: We recommend using isolated Python environments. uv provides fast, reliable dependency management for Python projects.

From PyPI (Recommended)

pip install evaluatr

From GitHub

pip install git+https://github.com/franckalbinet/evaluatr.git

Development Installation

# Clone the repository
git clone https://github.com/franckalbinet/evaluatr.git
cd evaluatr

# Install in development mode
pip install -e .

# Make changes in nbs/ directory, then compile:
nbdev_prepare

Note: This project uses nbdev for literate programming; see the Contributing section for more details.

Environment Configuration

Create a .env file in your project root with your API keys:

MISTRAL_API_KEY="your_mistral_api_key"
GEMINI_API_KEY="your_gemini_api_key"
ANTHROPIC_API_KEY="your_anthropic_api_key"

Note: Evaluatr uses lisette, LiteLLM and DSPy for LLM interactions, giving you flexibility to use any compatible language model provider beyond the examples above.
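Before running a long job, it can help to confirm that the keys are actually visible to the process. The sketch below is a hypothetical helper, not part of Evaluatr. It assumes the variables have already been loaded into the process environment (for example by exporting them in your shell), and the key names are simply those from the example above; you only need the ones for the providers you use.

```python
import os
from typing import Mapping

# Key names taken from the .env example above.
EXPECTED_KEYS = ("MISTRAL_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the expected API keys that are unset or empty in `env`."""
    return [key for key in EXPECTED_KEYS if not env.get(key, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All expected API keys are set.")
```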

Quick Start

IOM Workflow (Programmatic)

For IOM evaluators working with the official evaluation repository:

Reading an IOM Evaluation Repository

from evaluatr.readers import IOMRepoReader

# Initialize reader with your Excel file
reader = IOMRepoReader('files/test/eval_repo_iom.xlsx')

# Process the repository
evaluations = reader()

# Each evaluation is a standardized dictionary
# (the loop variable is named `evaluation` to avoid shadowing the built-in eval)
for evaluation in evaluations[:3]:  # Show first 3
    print(f"ID: {evaluation['id']}")
    print(f"Title: {evaluation['meta']['Title']}")
    print(f"Documents: {len(evaluation['docs'])}")
    print("---")

ID: 1a57974ab89d7280988aa6b706147ce1
Title: EX-POST EVALUATION OF THE PROJECT:  NIGERIA: STRENGTHENING REINTEGRATION FOR RETURNEES (SRARP)  - PHASE II
Documents: 2
---
ID: c660e774d14854e20dc74457712b50ec
Title: FINAL EVALUATION OF THE PROJECT: STRENGTHEN BORDER MANAGEMENT AND SECURITY IN MALI AND NIGER THROUGH CAPACITY BUILDING OF BORDER AUTHORITIES AND ENHANCED DIALOGUE WITH BORDER COMMUNITIES
Documents: 2
---
ID: 2cae361c6779b561af07200e3d4e4051
Title: Final Evaluation of the project "SUPPORTING THE IMPLEMENTATION OF AN E RESIDENCE PLATFORM IN CABO VERDE"
Documents: 2
---
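Because each evaluation is a plain dictionary, ordinary Python is enough to filter the repository. The sketch below uses hypothetical records that mirror only the three keys shown above (`id`, `meta`, `docs`); the contents of `docs` are placeholders, and `find_by_title` is an illustrative helper rather than an Evaluatr function.

```python
# Hypothetical records shaped like the reader output: 'id', 'meta', 'docs'.
evaluations = [
    {"id": "a1", "meta": {"Title": "Final Evaluation of Project A"}, "docs": ["report", "brief"]},
    {"id": "b2", "meta": {"Title": "Ex-post Evaluation of Project B"}, "docs": []},
    {"id": "c3", "meta": {"Title": "FINAL EVALUATION OF PROJECT C"}, "docs": ["report"]},
]


def find_by_title(evaluations: list[dict], keyword: str) -> list[dict]:
    """Return evaluations whose title contains `keyword`, ignoring case."""
    needle = keyword.lower()
    return [ev for ev in evaluations if needle in ev["meta"]["Title"].lower()]


# Keep only final evaluations that have at least one attached document.
final_with_docs = [
    ev["id"] for ev in find_by_title(evaluations, "final evaluation") if ev["docs"]
]
print(final_with_docs)  # prints ['a1', 'c3']
```

The case-insensitive match matters here because titles in the repository mix upper case and title case.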

Downloading Evaluation Documents

from evaluatr.downloaders import download_docs
from pathlib import Path

fname = 'files/test/evaluations.json'
base_dir = Path("files/test/pdf_library")
download_docs(fname, base_dir=base_dir, n_workers=0, overwrite=True)

Universal CLI Workflow

Process any evaluation report from PDF to tagged outputs using three streamlined commands.

Example: Given a report at example-report-dir/example-report-file.pdf

Step 1: OCR Processing

evl_ocr example-report --pdf-dir . --output-dir md_library

Step 2: Document Enrichment

evl_md_plus example-report --md-dir md_library

Step 3: Framework Tagging

evl_tag example-report --md-dir md_library
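To run the three steps as one unit, the commands can be chained from Python. This is a minimal sketch: it uses only the commands and flags shown in the steps above, while `build_commands` and `run_workflow` are hypothetical helper names. It defaults to a dry run that prints the commands without executing them.

```python
import subprocess


def build_commands(eval_id: str, pdf_dir: str = ".", md_dir: str = "md_library") -> list[list[str]]:
    """Return the three CLI invocations from the steps above, in order."""
    return [
        ["evl_ocr", eval_id, "--pdf-dir", pdf_dir, "--output-dir", md_dir],
        ["evl_md_plus", eval_id, "--md-dir", md_dir],
        ["evl_tag", eval_id, "--md-dir", md_dir],
    ]


def run_workflow(eval_id: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in build_commands(eval_id):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failing step.
            subprocess.run(cmd, check=True)


run_workflow("example-report")  # dry run: prints the commands only
```

Stopping at the first failure avoids enriching or tagging a report whose OCR step did not complete.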

Detailed CLI Usage

`evl_ocr` - OCR Processing

Convert PDF evaluation reports to structured markdown with extracted images.

Usage:

evl_ocr <eval-id> [OPTIONS]

Options:

--pdf-dir: Directory containing PDF folders (default: ../data/pdf_library)
--output-dir: Output directory for markdown (default: ../data/md_library)
--overwrite: Reprocess if output already exists

Examples:

# Basic usage
evl_ocr example-report

# Custom paths
evl_ocr example-report --pdf-dir ./reports --output-dir ./markdown

# Force reprocess
evl_ocr example-report --overwrite

Output Structure:

md_library/
└── example-report/
    └── example-report-file/
        ├── page_1.md
        ├── page_2.md
        └── img/
            ├── img-0.jpeg
            └── img-1.jpeg
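If you want to read the OCR output back as a single text, note that a plain alphabetical sort would place page_10.md before page_2.md. The sketch below sorts pages numerically; it assumes only the `page_N.md` naming shown in the tree above, and `read_report` is a hypothetical helper, not part of Evaluatr.

```python
import re
from pathlib import Path


def page_number(path: Path) -> int:
    """Extract N from a file named page_N.md."""
    match = re.fullmatch(r"page_(\d+)", path.stem)
    if match is None:
        raise ValueError(f"Unexpected page file name: {path.name}")
    return int(match.group(1))


def read_report(doc_dir: Path) -> str:
    """Concatenate the page_N.md files in numeric page order."""
    pages = sorted(doc_dir.glob("page_*.md"), key=page_number)
    return "\n\n".join(p.read_text(encoding="utf-8") for p in pages)
```

For the layout above, a call would look like `read_report(Path("md_library/example-report/example-report-file"))`.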

`evl_md_plus` - Document Enrichment

Fix markdown heading hierarchy and enrich images with AI-generated descriptions.

Usage:

evl_md_plus <eval-id> [OPTIONS]

Options:

--md-dir: Directory containing markdown folders (default: ../data/md_library)
--overwrite: Reprocess if enhanced/enriched already exists

Examples:

# Basic usage
evl_md_plus example-report

# Force reprocess
evl_md_plus example-report --overwrite

Output: Creates enhanced/ and enriched/ directories with corrected headings and image descriptions.

`evl_tag` - Framework Tagging

Map evaluation reports against established frameworks (SRF, GCM) using AI-assisted analysis.

Usage:

evl_tag <eval-id> [OPTIONS]

Options:

--md-dir: Directory containing markdown folders (default: _data/md_library)
--stages: Comma-separated stages to run (default: 1,2,3)
    Stage 1: SRF Enablers & Cross-cutting Priorities
    Stage 2: GCM Objectives
    Stage 3: SRF Outputs
--force-refresh: Force refresh specific stages (comma-separated: sections,stage1,stage2,stage3)

Examples:

# Run all stages
evl_tag example-report

# Run specific stages only
evl_tag example-report --stages 1,2

# Force refresh certain stages
evl_tag example-report --force-refresh stage1,stage3

# Combined options
evl_tag example-report --stages 2,3 --force-refresh sections
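If you call `evl_tag` from a wrapper script, you may want to validate the comma-separated values before spending API calls. The following is a hypothetical sketch, not how Evaluatr itself parses its arguments; the accepted values are taken from the options listed above.

```python
VALID_STAGES = {1, 2, 3}
VALID_REFRESH = {"sections", "stage1", "stage2", "stage3"}


def parse_stages(value: str) -> list[int]:
    """Parse a value such as '1,2' into a sorted list of stage numbers."""
    stages = sorted({int(part) for part in value.split(",") if part.strip()})
    unknown = [s for s in stages if s not in VALID_STAGES]
    if unknown:
        raise ValueError(f"Unknown stage(s): {unknown}")
    return stages


def parse_refresh(value: str) -> list[str]:
    """Parse a value such as 'stage1,stage3' into a list of refresh targets."""
    names = [part.strip() for part in value.split(",") if part.strip()]
    unknown = [n for n in names if n not in VALID_REFRESH]
    if unknown:
        raise ValueError(f"Unknown refresh target(s): {unknown}")
    return names


print(parse_stages("2,3"))             # prints [2, 3]
print(parse_refresh("stage1,stage3"))  # prints ['stage1', 'stage3']
```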

Output: Results stored in ~/.evaluatr/traces/ with complete audit trails of AI decisions.
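To find the most recent results for review, you can list the contents of the traces directory. The file names and format of the traces are not described here, so this sketch makes no assumptions about them: it only lists whatever files exist, newest first. `latest_traces` is a hypothetical helper, not part of Evaluatr.

```python
from pathlib import Path


def latest_traces(trace_dir: Path = Path.home() / ".evaluatr" / "traces", limit: int = 5) -> list[Path]:
    """Return up to `limit` files under `trace_dir`, newest first."""
    if not trace_dir.is_dir():
        return []  # nothing has been traced yet, or a different location is in use
    files = [p for p in trace_dir.rglob("*") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


for path in latest_traces():
    print(path)
```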

Documentation

Full Documentation: GitHub Pages
Module Notebooks (literate programming with nbdev):
Examples: See the nbs/ directory for Jupyter notebooks

Contributing

Development Philosophy

Evaluatr is built using nbdev, enabling documentation-driven development where code, docs, and tests live together in notebooks.

Adding CLI Commands

We use fastcore.script to create CLI tools. See the nbdev console scripts tutorial for setup details.

Development Setup

We welcome contributions! Here’s how you can help:

Fork the repository

# Install development dependencies
pip install -e .

Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes in the nbs/ directory
Compile with nbdev_prepare
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Dependencies

See settings.ini for the complete list of dependencies. Key packages include:

fastcore & pandas: Core data processing
lisette, litellm & dspy: AI/LLM integration
mistralai: OCR processing

Support

Issues: GitHub Issues
Discussions: GitHub Discussions

Release history

0.7.2: Nov 13, 2025
0.7.0: Oct 18, 2025 (this version)
0.6.4: Oct 18, 2025
0.6.1: Oct 16, 2025
0.5.0: Oct 6, 2025
0.3.1: Sep 23, 2025
0.1.5: Sep 19, 2025
0.1.4: Sep 19, 2025
0.1.3: Sep 18, 2025
0.1.0: Sep 15, 2025
0.0.4: Jul 18, 2025
0.0.3: Jul 18, 2025
0.0.2: Jul 18, 2025
0.0.1: Jul 18, 2025
