UpTrain - ML Observability and Retraining Framework
Project description
An open-source framework to evaluate, test and monitor LLM applications
UpTrain is a Python framework that helps ensure your LLM applications perform reliably by letting you check aspects such as correctness, structural integrity, bias, and hallucination. UpTrain can be used to:
- Validate your model's responses and safeguard your users against hallucinations, bias, incorrect output formats, etc.
- Experiment across multiple model providers and prompt templates, and quantify your model's performance.
- Monitor your model's performance in production and protect yourself against unwanted drift.
Key Features 💡
- ChatGPT Grading - Utilize LLMs to grade your model outputs.
- Custom Grading Checks - Write your own custom grading prompts.
- Embeddings Similarity Check - Compute the cosine similarity between prompt and response embeddings.
- Output Validation - Safeguard your users against inappropriate responses.
- Prompt A/B Testing - Experiment across multiple prompts and compare them quantitatively.
- UMAP Visualization and Clustering - Visualize your embedding space using tools like UMAP and t-SNE.
- Hallucination Checks - Use metrics like custom grading, text similarity, and embedding similarity to check for hallucinations.
- Toxic Keywords Checks - Make sure your model outputs are not biased and do not contain toxic keywords.
- Feature Slicing - Built-in pivoting functionality to slice and dice your data and pinpoint low-performing cohorts.
- Realtime Dashboards - Monitor your model's performance in realtime.
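The embedding similarity check boils down to cosine similarity, cos(θ) = (a · b) / (‖a‖ ‖b‖), between the prompt embedding and the response embedding. Below is a dependency-free sketch of that computation. The helper names and the 0.7 pass threshold are hypothetical and not UpTrain defaults; in practice the vectors would come from an embedding model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compute similarity with a zero vector")
    return dot / (norm_a * norm_b)


def passes_similarity_check(
    prompt_emb: list[float], response_emb: list[float], threshold: float = 0.7
) -> bool:
    """Pass a response only if its embedding stays close to the prompt's.

    The 0.7 threshold is a hypothetical default; calibrate it for your embedding model.
    """
    return cosine_similarity(prompt_emb, response_emb) >= threshold
```

A score near 1 means the two vectors point in almost the same direction, and a score near 0 means they are unrelated. How high is "high enough" depends on the embedding model, so the threshold is best tuned on labelled examples.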
Get started 🙌
To run it on your machine, check out the Quickstart tutorial.
Install the package through pip:
pip install uptrain
How to define checks:
Say we want to check whether our model's responses contain any grammatical mistakes.
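# Import paths assume the uptrain 0.3.x package layout; adjust them if your version differs
from uptrain.framework import Check, CheckSet, Settings
from uptrain.operators import GrammarScore, JsonReader, PlotlyChart

# Define your checkset: a list of simple checks and a dataset source.
# API keys are supplied separately through Settings.
checkset = CheckSet(
    checks = [
        Check(
            name = "grammar_score",
            operators = [
                # Score the text in "model_response" and write the result to "grammar_score"
                GrammarScore(
                    col_in_text = "model_response",
                    col_out = "grammar_score"
                ),
            ],
            # Display the resulting scores as a table
            plots = PlotlyChart.Table(title="Grammar scores"),
        ),
    ],
    # Load the dataset to evaluate from a JSON file
    source = JsonReader(fpath = '...')
)

# Pass your OpenAI API key through Settings, then set up and run the checkset
settings = Settings(openai_api_key = '...')
checkset.setup(settings)
checkset.run()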
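To make the data flow of a check concrete, here is a library-free sketch of how we read `col_in_text` and `col_out`: an operator reads one column from each row of the dataset and writes its result to a new column. `toy_grammar_score` and `apply_operator` are hypothetical helpers for illustration only. They are not part of UpTrain, and the toy scorer is a crude capitalization-and-punctuation heuristic, not real grammar checking.

```python
import json


def toy_grammar_score(text: str) -> float:
    """Hypothetical placeholder scorer, NOT UpTrain's GrammarScore.

    Returns 1.0 if the text starts with a capital letter and ends with
    terminal punctuation, otherwise 0.0.
    """
    stripped = text.strip()
    if not stripped:
        return 0.0
    return 1.0 if stripped[0].isupper() and stripped[-1] in ".!?" else 0.0


def apply_operator(rows, score_fn, col_in_text, col_out):
    """Return copies of `rows`, each with a new `col_out` column computed from `col_in_text`."""
    return [{**row, col_out: score_fn(row[col_in_text])} for row in rows]


# Inline JSON stands in for the file a source reader would load.
rows = json.loads(
    '[{"model_response": "The report is ready."},'
    ' {"model_response": "report ready"}]'
)
scored = apply_operator(rows, toy_grammar_score, "model_response", "grammar_score")
for row in scored:
    print(row["model_response"], "->", row["grammar_score"])
```

Returning new row dictionaries instead of mutating the input keeps the original dataset intact, so several operators can read the same source columns.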
Integrations
| Eval Frameworks | LLM Providers | LLM Packages | Serving Frameworks |
|---|---|---|---|
| OpenAI Evals ✅ | GPT-3.5-turbo ✅ | Langchain 🔜 | HuggingFace 🔜 |
| EleutherAI LM Eval 🔜 | GPT-4 ✅ | Llama Index 🔜 | Replicate 🔜 |
| BIG-Bench 🔜 | Claude 🔜 | AutoGPT 🔜 | |
| | Cohere 🔜 | | |
UpTrain in Action
Experimentation
You can use the UpTrain framework to run and compare LLM responses for different prompts, models, LLM chains, etc. Check out the experimentation tutorial to learn more.
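Once each prompt variant has been scored (for example by one of the grading checks), comparing variants is a matter of aggregating those scores. The sketch below ranks variants by mean score in plain Python. `rank_prompts` and the example scores are hypothetical and do not reflect UpTrain's experimentation API.

```python
from statistics import mean


def rank_prompts(scores_by_prompt: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank prompt variants by mean evaluation score, best first.

    Variants with no scores are skipped rather than treated as zero.
    """
    summary = [
        (name, mean(scores))
        for name, scores in scores_by_prompt.items()
        if scores
    ]
    return sorted(summary, key=lambda item: item[1], reverse=True)


# Hypothetical per-response scores for two prompt templates.
scores = {
    "prompt_a": [0.8, 0.6, 0.7],
    "prompt_b": [0.9, 0.85, 0.95],
}
for name, avg in rank_prompts(scores):
    print(f"{name}: {avg:.2f}")
```

A mean over a handful of responses can easily be noise, so collect enough responses per variant (and look at the spread as well as the mean) before picking a winner.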
Validation
You can use the UpTrain Validation Manager to define checks and retry logic, and to validate your LLM responses before showing them to your users. Check out the validation tutorial to learn more.
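The validate-and-retry pattern described above can be sketched in plain Python. This is not the Validation Manager's API: `generate_with_validation`, the example validators, and the fallback are hypothetical, and `generate` stands in for whatever call produces an LLM response.

```python
from typing import Callable, Optional, Sequence


def generate_with_validation(
    generate: Callable[[], str],
    validators: Sequence[Callable[[str], bool]],
    max_attempts: int = 3,
    fallback: Optional[str] = None,
) -> Optional[str]:
    """Call `generate` until a response passes every validator.

    Makes at most `max_attempts` calls and returns `fallback` if none pass.
    """
    for _ in range(max_attempts):
        response = generate()
        if all(is_valid(response) for is_valid in validators):
            return response
    return fallback


# Hypothetical validators: the response is non-empty and at most 200 characters.
validators = [lambda r: bool(r.strip()), lambda r: len(r) <= 200]

# A fake model that returns an empty string first, then a usable answer.
canned = iter(["", "Paris is the capital of France."])
print(generate_with_validation(lambda: next(canned), validators))
```

Returning a fixed fallback after the last failed attempt bounds both latency and API cost, at the price of occasionally showing users a generic message instead of a model answer.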
Monitoring
You can use the UpTrain framework to continuously monitor your model's performance and get real-time insights on how well it is doing on a variety of evaluation metrics. Check out the monitoring tutorial to learn more.
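A simple form of production monitoring is to track a rolling average of a per-response metric and raise an alert when it falls below an acceptable level. The sketch below does exactly that; `RollingMetricMonitor`, its window size, and its threshold are hypothetical. It is a plain threshold alert, not a statistical drift test and not UpTrain's monitoring API.

```python
from collections import deque


class RollingMetricMonitor:
    """Keep a rolling mean of a per-response metric and flag when it drops too low."""

    def __init__(self, window_size: int = 100, alert_threshold: float = 0.8) -> None:
        # deque(maxlen=...) discards the oldest score once the window is full.
        self.window = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    @property
    def rolling_mean(self) -> float:
        return sum(self.window) / len(self.window) if self.window else 0.0

    def record(self, score: float) -> bool:
        """Add a score and return True if the rolling mean is below the threshold."""
        self.window.append(score)
        return self.rolling_mean < self.alert_threshold


monitor = RollingMetricMonitor(window_size=3, alert_threshold=0.5)
for score in [1.0, 1.0, 0.0, 0.0]:
    if monitor.record(score):
        print(f"alert: rolling mean {monitor.rolling_mean:.2f}")
```

A small window reacts quickly but is noisy; a larger one is steadier but slower to notice a real degradation.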
Why UpTrain 🤔?
Large language models are trained on billions of data points and perform well across a wide variety of tasks. One thing they are not good at, however, is being deterministic. Even with the most carefully crafted prompts, a model can misbehave on certain inputs, whether through hallucinations, incorrect output structure, toxic or biased responses, or irrelevant responses, and the range of possible failure modes is vast.
To ensure your LLM applications work reliably and correctly, UpTrain makes it easy for developers to evaluate the responses of their applications on multiple criteria. UpTrain's evaluation framework can be used to:
- Validate (and correct) the model's responses before showing them to the user.
- Get quantitative measures to experiment across multiple prompts, model providers, etc.
- Run unit tests to ensure no buggy prompt or code gets pushed to production.
- Monitor your LLM applications in real time and understand when they go wrong, so you can fix them before users complain.
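As an example of the unit-testing use case, here is a minimal pytest-style test that checks recorded responses against an expected output structure. It assumes, hypothetically, that the application is prompted to return a JSON object with `answer` and `sources` keys; the schema, helper name, and recorded output are placeholders to replace with your own.

```python
import json

# Hypothetical output schema: the keys every response must contain.
REQUIRED_KEYS = {"answer", "sources"}


def has_valid_structure(response: str, required_keys=REQUIRED_KEYS) -> bool:
    """Return True if the response is a JSON object containing every required key."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(required_keys).issubset(parsed.keys())


def test_recorded_responses_are_well_formed():
    # In a real suite these would be recorded outputs of your LLM application.
    recorded = ['{"answer": "Paris", "sources": ["wiki"]}']
    for response in recorded:
        assert has_valid_structure(response), response
```

Running such tests in CI against a fixed set of inputs catches prompt or parsing changes that silently break the output format before they reach production.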
We are constantly working to make UpTrain better. Want a new feature or need any integrations? Feel free to create an issue or contribute directly to the repository.
License 💻
This repo is published under the Apache 2.0 license. We are also working on a hosted offering to make launching eval runs easier; please fill out the waitlist form to get a slot.
Stay Updated ☎️
We are continuously adding tons of features and use cases. Please support us by giving the project a star ⭐!
Provide feedback (the harsher, the better 😉)
We are building UpTrain in public. Help us improve by giving your feedback here.
Contributors 🖥️
We welcome contributions to UpTrain. Please see our contribution guide for details.
Project details
Download files
File details
Details for the file uptrain-0.3.1.tar.gz
File metadata
- Download URL: uptrain-0.3.1.tar.gz
- Upload date:
- Size: 118.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 95df178948794f5e9827c1ea6ffde921cfde9b851a06b1d3fa45508d9b528132 |
MD5 | 02d2d238d06dd923a5e10e4938e7b29f |
BLAKE2b-256 | c401ef1581429ac2dcb7ec32b6fb02ffdf087c7071425e32ab27b55183cb853f |
File details
Details for the file uptrain-0.3.1-py3-none-any.whl
File metadata
- Download URL: uptrain-0.3.1-py3-none-any.whl
- Upload date:
- Size: 165.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.17
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | c3a1bb682937269092475c5ee351c66cb050359f6b702532db42f2a0f6d488e3 |
MD5 | 8da599a633c03afedb1b26d10aa37c80 |
BLAKE2b-256 | 4c5fa2880b7c93d9e0c024b45931dc49c08c85809fe382888edd1ad15f08837e |