# RedLite

LLM testing on steroids: an opinionated toolset for testing Conversational Language Models.
## Usage

- Install the required dependencies:

  ```shell
  pip install redlite[all]
  ```

- Generate several runs using Python scripting (see the examples, and the Python API section below).

- Review and compare your runs in the built-in web UI:

  ```shell
  redlite server --port <PORT>
  ```
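For example, to serve the UI on port 8888 (an arbitrary choice) and inspect runs in a browser:

```shell
redlite server --port 8888
# the UI should then be reachable at http://localhost:8888
```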
## Python API

```python
import os

from redlite import run, load_dataset
from redlite.openai import OpenAIModel
from redlite.metric import PrefixMetric

# Model under test: OpenAI chat completions, authenticated via env var.
model = OpenAIModel(api_key=os.environ["OPENAI_API_KEY"])

# Benchmark dataset, loaded from the HuggingFace hub.
dataset = load_dataset("hf:innodatalabs/rt-gaia")

# Prefix-match metric, tolerant of case, punctuation, and whitespace.
metric = PrefixMetric(ignore_case=True, ignore_punct=True, strip=True)

# Executes the benchmark and records the run for later review.
run(model=model, dataset=dataset, metric=metric)
```
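Since the review UI is built for comparing runs, you will typically score more than one model against the same dataset and metric. Below is a minimal sketch reusing only the calls shown above; the loop, the `model=` keyword argument on `OpenAIModel`, and the model names are illustrative assumptions, not documented API:

```python
import os

from redlite import run, load_dataset
from redlite.openai import OpenAIModel
from redlite.metric import PrefixMetric

dataset = load_dataset("hf:innodatalabs/rt-gaia")
metric = PrefixMetric(ignore_case=True, ignore_punct=True, strip=True)

# One run per candidate model, all scored on the same dataset and metric,
# so the resulting runs are directly comparable in `redlite server`.
# NOTE: the `model=` keyword and the model names below are assumptions.
for name in ["gpt-4o-mini", "gpt-4o"]:
    model = OpenAIModel(model=name, api_key=os.environ["OPENAI_API_KEY"])
    run(model=model, dataset=dataset, metric=metric)
```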
## Goals

- simple, easy-to-learn API
- lightweight
- only necessary dependencies
- framework-agnostic (PyTorch, TensorFlow, Keras, Flax, JAX)
- basic analytic tools included
## Develop

```shell
python -m venv .venv
. .venv/bin/activate
pip install -e .[dev,all]
```

Make commands:

- test
- test-server
- lint
- wheel
- docs
- black
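A typical pre-commit pass using the targets listed above (the order is a suggestion, not mandated by the Makefile):

```shell
make black   # format the code
make lint    # static checks
make test    # run the test suite
```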
## TODO

- deps cleanup (randomname!)
- review/improve module structure
- automate CI/CD
- write docs
- publish docs automatically (CI/CD)
- web UI styling
- better test server
- tests
- integrations (HF, OpenAI, Anthropic, vLLM)
- fix data format in HF datasets (innodatalabs/rt-* ones) to match standard
- more robust backend API (future-proof)
- better error handling for missing deps
- document which deps we need when
- export to CSV
- upload to Zeno