Realign is a simulation-based evaluation framework for multi-step agents.

Realign: Evaluation & Experimentation Framework for AI Applications

Realign is an evaluation and experimentation framework for building reliable AI applications through test-driven development. Test and evaluate agent architectures, RAG systems, prompts, and models across hundreds of scenarios specific to your use-case.

🎯 With Realign, you can:

  • Build reliable AI agents and RAG systems with test suites tailored to your use-case
  • Evaluate quality by simulating your agents over hundreds of scenarios in parallel
  • Experiment with 100+ models, prompts, and other parameters to find optimal configurations
  • Detect regressions by integrating test suites with your CI/CD pipeline
  • Track experiments with HoneyHive for cloud-scale analytics, visualization, and reproducibility

💡 What’s unique about Realign

  • YAML-Driven DX: Cleanly manage your agents, evaluator prompts, datasets, and other parameters using easy-to-read YAML config files
  • Composable Evaluators: Automatically evaluate quality using our library of 25+ pre-built evaluators, or create your own using composable building blocks
  • Blazing Fast Execution: Speed up your evaluations with parallel processing and async capabilities, with built-in modules for smart rate limiting
  • Statistical Rigor: Use statistics to test hypotheses and sweep hyperparameters to optimize performance

Quickstart

Installation & Setup

To install the package, run

pip install realign

Set your API keys as environment variables:

export OPENAI_API_KEY="your_openai_key"

or put them in a .env file:

OPENAI_API_KEY="your_openai_key"
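
Before running anything that calls a model, it can help to confirm that the key is visible to your Python process. The helper below is a minimal sketch and not part of Realign; `require_env` is a hypothetical name, and it assumes the variable has already been exported or loaded into the environment.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear error.

    Hypothetical helper for illustration; it only reads os.environ and does
    not load a .env file itself.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("OPENAI_API_KEY")` at the top of a script surfaces a missing key immediately instead of on the first model call.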

Tweet Generator

Let's build an agent that produces content for your brand.

Tweetbot: generates N high-quality tweets concurrently, runs pairwise comparisons of the generated tweets using an LLM judge, aggregates the comparisons into Elo scores, and shows you the best and worst tweets.

Please download and run this code!

Code: tweetbot.py

Config: config.yaml
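
The Elo aggregation step in the Tweetbot description can be sketched in plain Python. This is not the code in tweetbot.py; it is a standard Elo update under assumed settings (initial rating 1000, K-factor 32), and the names `expected_score`, `update_elo`, and `rank_by_elo` are hypothetical. Each comparison is a `(winner, loser)` pair, such as an LLM judge might produce.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Apply one pairwise result in place; total rating is conserved."""
    delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta


def rank_by_elo(items, comparisons, initial: float = 1000.0, k: float = 32.0):
    """Return (item, rating) pairs sorted from best to worst."""
    ratings = {item: initial for item in items}
    for winner, loser in comparisons:
        update_elo(ratings, winner, loser, k)
    return sorted(ratings.items(), key=lambda pair: pair[1], reverse=True)


# Example: tweet "a" wins both of its comparisons, "c" loses both.
ranked = rank_by_elo(["a", "b", "c"], [("a", "b"), ("a", "c"), ("b", "c")])
best, worst = ranked[0][0], ranked[-1][0]
```

Elo ratings depend on the order in which comparisons are applied, so shuffling or repeating the comparison list is a common way to reduce that sensitivity.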

Tutorials

  1. Simple Tweet Bot: Generate tweets with any model using a prompt and template
  2. Generate 10 Tweets in Parallel (Async): Generate tweets concurrently using async
  3. Using Config Files: Set up config files to separate code and config
  4. Set up Evaluators: Set up evaluator functions, new and built-in
  5. Using Realign Evaluators: Use evaluators with configs
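
The concurrency pattern behind the async tutorial can be sketched with the standard library alone. This is an illustration, not Realign's implementation: `generate_tweet` is a hypothetical stand-in for a real model call, and the semaphore is a simple assumed way to cap in-flight requests.

```python
import asyncio


async def generate_tweet(index: int) -> str:
    """Hypothetical stand-in for an LLM call; sleeps to mimic network latency."""
    await asyncio.sleep(0.01)
    return f"tweet #{index}"


async def generate_tweets(n: int, max_concurrency: int = 5) -> list:
    """Run n generations concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(i: int) -> str:
        async with semaphore:
            return await generate_tweet(i)

    # gather preserves the order of its arguments in the returned list
    return await asyncio.gather(*(bounded(i) for i in range(n)))


tweets = asyncio.run(generate_tweets(10))
```

Capping concurrency is a crude form of rate limiting; it bounds simultaneous requests but not requests per minute.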

Concepts

1. @evaluator decorator

Learn how you can set up evaluators, and configure them with

  • wrapping
  • transforming
  • aggregating
  • checking
  • other settings and kwargs

An Evaluator is a function which scores your app's output and checks if the score is within a target range.
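
The definition above (score the output, then check the score against a target range) can be illustrated without Realign itself. The decorator below is a hypothetical sketch named `with_target`; it is not the `@evaluator` decorator, whose actual signature is not shown in this document.

```python
from functools import wraps
from typing import Callable, NamedTuple


class EvalResult(NamedTuple):
    score: float
    passed: bool


def with_target(low: float, high: float) -> Callable:
    """Wrap a scoring function so it also reports whether low <= score <= high."""
    def decorator(score_fn: Callable[[str], float]) -> Callable[[str], EvalResult]:
        @wraps(score_fn)
        def wrapper(output: str) -> EvalResult:
            score = float(score_fn(output))
            return EvalResult(score=score, passed=low <= score <= high)
        return wrapper
    return decorator


@with_target(1, 280)
def tweet_length(output: str) -> int:
    """Score a tweet by its character count."""
    return len(output)


result = tweet_length("Shipping our new feature today!")
```

Here `result.score` is the character count and `result.passed` reports whether it falls inside the 1 to 280 range.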

2. Simulation

A Simulation is a stochastic process that runs N times. It has statistical properties.
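
To make "runs N times" and "statistical properties" concrete, here is a minimal sketch that repeats a noisy run and summarizes the scores. It does not use Realign's Simulation class; `simulate` and `noisy_score` are hypothetical names, and the Gaussian score is an assumed stand-in for an agent run followed by an evaluator.

```python
import random
import statistics


def simulate(run_once, n: int, seed: int = 0) -> dict:
    """Run a stochastic function n times and summarize the resulting scores."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    scores = [run_once(rng) for _ in range(n)]
    stdev = statistics.stdev(scores) if n > 1 else 0.0
    return {
        "n": n,
        "mean": statistics.mean(scores),
        "stdev": stdev,
        "stderr": stdev / n ** 0.5,  # standard error of the mean
    }


def noisy_score(rng: random.Random) -> float:
    """Hypothetical run: a quality score with random variation around 0.7."""
    return rng.gauss(0.7, 0.1)


summary = simulate(noisy_score, n=200)
```

The standard error shrinks as N grows, which is why comparing two configurations usually needs many runs rather than one.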

3. Agents

An LLM agent comprises the settings, instructions, and context given to an LLM to autonomously complete a certain task.

Set them up with

  • agent_name

  • the model settings

    • model: any of 100+ supported providers/models

    • hyperparams: dictionary of OpenAI-type hyperparams

  • the prompt

    • system_prompt: a space for your agent's instructions

    • template: a template with variables marked with double curlies {{var}}

    • template_params: a dictionary mapping the variable names to their actual values

    • json_mode: a boolean flag which will deserialize the JSON response into a Python dict
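
The settings listed above can be pictured as a small data structure. The sketch below reuses the field names from the list, but the class `AgentSettings`, the helpers `render_template` and `parse_response`, and the example model string are all hypothetical; Realign's actual classes and config schema may differ.

```python
import json
import re
from dataclasses import dataclass, field


@dataclass
class AgentSettings:
    """Hypothetical container mirroring the settings listed above."""
    agent_name: str
    model: str
    system_prompt: str
    template: str
    template_params: dict = field(default_factory=dict)
    hyperparams: dict = field(default_factory=dict)
    json_mode: bool = False


def render_template(template: str, params: dict) -> str:
    """Replace each {{var}} with its value from params; fail on unknown names."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"No value provided for template variable '{name}'")
        return str(params[name])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


def parse_response(raw: str, json_mode: bool):
    """With json_mode on, deserialize the raw response into Python objects."""
    return json.loads(raw) if json_mode else raw


agent = AgentSettings(
    agent_name="tweetbot",
    model="openai/gpt-4o-mini",  # assumed example identifier
    system_prompt="You write concise, on-brand tweets.",
    template="Write a tweet about {{topic}} in a {{tone}} tone.",
    template_params={"topic": "coffee", "tone": "playful"},
    hyperparams={"temperature": 0.7},
    json_mode=True,
)
prompt = render_template(agent.template, agent.template_params)
parsed = parse_response('{"tweet": "Rise and grind!"}', agent.json_mode)
```

Raising on a missing variable, rather than leaving `{{var}}` in the prompt, keeps a misconfigured template from silently reaching the model.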

Guides

  • [TODO] how do I evaluate my agent?

  • [TODO] how do I customize my evaluator?

  • [TODO] how do I improve my agent?

  • [TODO] how do I improve my RAG pipeline?

API Reference

coming soon!

Contributing

We welcome contributions from the community to help make Realign better. This guide will help you get started. If you have any questions, please reach out to us on Discord or through a GitHub issue.

Project Overview

Realign is an MIT licensed testing framework for multi-turn AI applications. It simulates user interactions, evaluates AI performance, and generates adversarial test cases.

We particularly welcome contributions in the following areas:

  • Bug fixes

  • Documentation updates, including examples and guides

Getting Started

  1. Fork the repository on GitHub.

  2. Clone your fork locally:

git clone https://github.com/[your-username]/realign.git

cd realign

  3. Set up your development environment:

pip install -r requirements.txt

Development Workflow

  1. Create a new branch for your feature or bug fix:

git checkout -b feature/your-feature-name

  2. We try to follow the Conventional Commits specification. This is not required for feature branches. We merge all PRs into main with a squash merge and a conventional commit message.

  3. Push your branch to your fork:

git push origin feature/your-feature-name

  4. Open a pull request against the main branch of the realign repository.

When opening a pull request:

  • Keep changes small and focused. Avoid mixing refactors with new features.

  • Ensure test coverage for new code or bug fixes.

  • Provide clear instructions on how to reproduce the problem or test the new feature.

  • Be responsive to feedback and be prepared to make changes if requested.

  • Ensure your tests are passing and your code is properly linted.

Don't hesitate to ask for help. We're here to support you. If you're worried about whether your PR will be accepted, please talk to us first (see Getting Help).

Getting Help

If you need help or have questions, you can reach out to us on Discord or open a GitHub issue.

Code of Conduct

We follow the Contributor Covenant Code of Conduct. Please read and adhere to it in all interactions within our community.
