UpTrain - ML Observability and Retraining Framework

An open-source framework to evaluate and monitor LLM applications

Try out Evaluations - UpTrain in Action - Slack Community - Feature Request

UpTrain is a Python framework that helps your LLM applications perform reliably by letting you check aspects such as correctness, structural integrity, bias, and hallucination. UpTrain can be used for:

Experimentation

The UpTrain framework can be used to experiment with multiple prompts, model providers, chain configurations, etc. and get quantitative scores to compare them. Check out the experimentation tutorial to learn more.
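As a rough illustration of comparing variants by a quantitative score, the sketch below averages a score per prompt variant and picks the highest. It does not use UpTrain's API: `compare_variants`, `length_score`, and the sample responses are hypothetical stand-ins, and a real setup would plug in an actual evaluation metric.

```python
from statistics import mean
from typing import Callable, Dict, List


def compare_variants(
    outputs_by_variant: Dict[str, List[str]],
    score_fn: Callable[[str], float],
) -> Dict[str, float]:
    """Return the mean score of each variant's responses."""
    return {
        name: mean(score_fn(response) for response in responses)
        for name, responses in outputs_by_variant.items()
    }


def length_score(response: str) -> float:
    """Hypothetical scorer: share of a 20-word budget used, capped at 1.0."""
    return min(len(response.split()) / 20, 1.0)


# Made-up responses collected from two prompt variants
outputs = {
    "prompt_a": ["Paris.", "It is Paris."],
    "prompt_b": ["The capital of France is Paris.", "Paris is the capital of France."],
}

scores = compare_variants(outputs, length_score)
best = max(scores, key=scores.get)  # variant with the highest mean score
print(scores, best)
```

Keeping the scorer as a plain callable means the same comparison loop works whether the score comes from a heuristic like this one or from an LLM-based check.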

Validation

You can use the UpTrain Validation Manager to define checks and retry logic, and to validate your LLM responses before showing them to your users. Check out the tutorial here.
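The general validate-then-retry pattern can be sketched in plain Python as follows. This is not the Validation Manager's interface: `validated_response`, `generate`, and `is_valid` are hypothetical names, and the generator here is a stub that returns canned responses instead of calling a model.

```python
from typing import Callable, Optional


def validated_response(
    generate: Callable[[str], str],
    is_valid: Callable[[str], bool],
    prompt: str,
    max_retries: int = 2,
) -> Optional[str]:
    """Call generate up to max_retries + 1 times.

    Returns the first response that passes is_valid, or None if none does.
    """
    for _ in range(max_retries + 1):
        response = generate(prompt)
        if is_valid(response):
            return response
    return None


# Stub generator: the first attempt is empty, the second one is usable
canned = iter(["", "42"])
result = validated_response(
    generate=lambda prompt: next(canned),
    is_valid=lambda response: response.strip() != "",
    prompt="What is 6 x 7?",
)
print(result)
```

Returning `None` after the last retry leaves the fallback decision (default message, escalation, and so on) to the caller.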

Monitoring

You can use the UpTrain framework to continuously monitor your model's performance and get real-time insights on how well it is doing on a variety of evaluation metrics. Check out the monitoring tutorial to learn more.
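One simple way to picture continuous monitoring is a rolling average over recent evaluation scores with an alert threshold. The sketch below is an assumption-laden illustration, not UpTrain's monitoring API: `RollingScoreMonitor`, the window size, and the threshold are all hypothetical.

```python
from collections import deque


class RollingScoreMonitor:
    """Track the mean of the most recent scores and flag drops below a threshold."""

    def __init__(self, window: int, threshold: float) -> None:
        self.scores = deque(maxlen=window)  # old scores fall out automatically
        self.threshold = threshold

    def record(self, score: float) -> bool:
        """Add a score; return True if the rolling mean is below the threshold."""
        self.scores.append(score)
        return sum(self.scores) / len(self.scores) < self.threshold


monitor = RollingScoreMonitor(window=3, threshold=0.5)
# Made-up stream of per-response scores that degrades over time
alerts = [monitor.record(score) for score in [0.9, 0.8, 0.2, 0.1, 0.3]]
print(alerts)
```

A windowed mean reacts to sustained degradation while ignoring a single low score; the window and threshold would need tuning per metric.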

Get started 🙌

To run UpTrain on your machine, check out the Quickstart tutorial:

Install the package through pip:

pip install uptrain

Note: UpTrain relies on common Python libraries such as openai-evals and sentence-transformers. To make sure all functionality works, use the uptrain-add command to install the full version of the package.

uptrain-add --feature full

How to use UpTrain:

Using UpTrain's built-in evaluation sets:

UpTrain provides a variety of checks, such as response relevance, response completeness, factual accuracy, and retrieved-context quality, which can be accessed with an UpTrain API key. Learn more about these evaluations.

Get your free UpTrain API Key here.

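import polars as pl
# Import paths below are assumed from the package layout; check the docs for your installed version.
from uptrain.framework import Settings
from uptrain.framework.builtins import CheckResponseCompleteness

# One row per question/response pair to evaluate
data = pl.DataFrame({
  "question": ["What is the meaning of life?"],
  "response": ["Who knows 🤔"]
})

check = CheckResponseCompleteness()
output = check.setup(Settings(uptrain_access_token="up-9g....")).run(data)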

Configuring your own evaluation sets:

Say we want to plot a line chart showing whether our model's responses contain any grammatical mistakes or not.

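# Import paths are assumed from the package layout; check the docs for your installed version.
from uptrain.framework import Check, CheckSet, Settings
from uptrain.operators import GrammarScore, JsonReader, LineChart

# Step 1: Choose and create the appropriate operator from UpTrain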
grammar_score = GrammarScore(
  col_in_text = "model_response",       # input column name (from dataset)
  col_out = "grammar_score"             # desired output column name
)

# Step 2: Create a check with the operators and the required plots as arguments 
grammar_check = Check(
  operators = [grammar_score],
  plots = LineChart(y = "grammar_score")
)

# Step 3: Create a CheckSet with the checks and data source as arguments
checkset = CheckSet(
    checks = [grammar_check],
    source = JsonReader(fpath = '...')
)

# Step 4: Set up and run the CheckSet
checkset.setup(Settings(openai_api_key = '...'))
checkset.run(dataset)

Key Features 💡

Integrations

| Eval Frameworks | LLM Providers | LLM Packages | Serving frameworks |
| --- | --- | --- | --- |
| OpenAI Evals ✅ | GPT-3.5-turbo ✅ | Langchain 🔜 | HuggingFace 🔜 |
| EleutherAI LM Eval 🔜 | GPT-4 ✅ | Llama Index 🔜 | Replicate 🔜 |
| BIG-Bench 🔜 | Claude 🔜 | AutoGPT 🔜 | |
| | Cohere 🔜 | | |

Why UpTrain 🤔?

Large language models are trained on billions of data points and perform well across a wide variety of tasks. One thing these models are not good at, however, is being deterministic. Even with the most carefully crafted prompts, a model can misbehave on certain inputs: hallucinations, wrong output structure, toxic or biased responses, or irrelevant responses. The range of possible error modes is vast.

To ensure your LLM applications work reliably and correctly, UpTrain makes it easy for developers to evaluate the responses of their applications on multiple criteria. UpTrain's evaluation framework can be used to:

  1. Validate (and correct) the response of the model before showing it to the user
  2. Get quantitative measures to experiment across multiple prompts, model providers, etc.
  3. Run unit tests so that no buggy prompt or code gets pushed to production
  4. Monitor your LLM applications in real time and understand when they are going wrong in order to fix them before users complain.
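As one illustration of the unit-testing use case in the list above, evaluation scores can gate a release with a simple threshold assertion. This is a minimal sketch, not part of UpTrain: `assert_min_mean_score` and the sample scores are hypothetical, and in practice the scores would come from running checks over a fixed test set.

```python
from typing import Sequence


def assert_min_mean_score(scores: Sequence[float], minimum: float) -> float:
    """Raise AssertionError if the mean score falls below minimum; else return the mean."""
    if not scores:
        raise ValueError("scores must not be empty")
    mean_score = sum(scores) / len(scores)
    if mean_score < minimum:
        raise AssertionError(
            f"mean score {mean_score:.3f} is below the required minimum {minimum:.3f}"
        )
    return mean_score


# Made-up scores for a candidate prompt on a fixed test set
candidate_scores = [0.9, 0.7, 0.8]
mean_score = assert_min_mean_score(candidate_scores, minimum=0.75)
print(f"passed with mean score {mean_score:.2f}")
```

Called from a test runner or CI job, a failing assertion blocks the change before it reaches users.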

We are constantly working to make UpTrain better. Want a new feature or need any integrations? Feel free to create an issue or contribute directly to the repository.

License 💻

This repo is published under the Apache 2.0 license. We are also working on a hosted offering to make starting eval runs easier - please fill out this form to get a waitlist slot.

Stay Updated ☎️

We are continuously adding tons of features and use cases. Please support us by giving the project a star ⭐!

Provide feedback (the harsher the better 😉)

We are building UpTrain in public. Help us improve by giving your feedback here.

Contributors 🖥️

We welcome contributions to UpTrain. Please see our contribution guide for details.
