
UpTrain - ML Observability and Retraining Framework



An open-source framework to evaluate and monitor LLM applications

Try out Evaluations - Self-serve Console - Slack Community - Feature Request - UpTrain in Action


UpTrain is a Python framework that helps ensure your LLM applications perform reliably by letting you check aspects such as correctness, structural integrity, bias, and hallucination. UpTrain can be used for:

Experimentation

The UpTrain framework can be used to experiment across multiple prompts, model providers, chain configurations, etc., and to get quantitative scores to compare them. Check out the experimentation tutorial to learn more.
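
For a flavor of what this looks like in code, here is a minimal sketch (not UpTrain's experimentation API itself): it scores the responses from two prompt variants with the built-in response-completeness check shown in the quickstart below, giving a quantitative score to compare them. The generate_response helper is a hypothetical stand-in for your LLM call, and import paths may differ slightly across versions.

import polars as pl
from uptrain.framework import Settings
from uptrain.framework.builtins import CheckResponseCompleteness

def generate_response(prompt: str) -> str:
    """Hypothetical stand-in for your LLM call (e.g. an OpenAI chat completion)."""
    return "Who knows 🤔"

question = "What is the meaning of life?"
prompts = {
    "terse": f"Answer in one sentence: {question}",
    "detailed": f"Answer thoroughly and explain your reasoning: {question}",
}

# Score each prompt variant's response with the same check and compare the results
check = CheckResponseCompleteness().setup(Settings(uptrain_access_token="up-...."))
for name, prompt in prompts.items():
    data = pl.DataFrame({"question": [question], "response": [generate_response(prompt)]})
    print(name, check.run(data))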


Validation

You can use the UpTrain Validation Manager to define checks and retry logic, and to validate your LLM responses before showing them to your users. Check out the tutorial here.
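
As a rough sketch of the underlying idea (not the actual Validation Manager API), the loop below regenerates a response until it passes a built-in check or the retry budget runs out, and only then returns it to the user. The generate_response helper, the score column name, and the passing threshold are all assumptions for illustration, and the check is assumed to return a table with a per-row score column.

import polars as pl
from uptrain.framework import Settings
from uptrain.framework.builtins import CheckResponseCompleteness

def generate_response(question: str) -> str:
    """Hypothetical stand-in for your LLM call."""
    return "Who knows 🤔"

def validated_response(question: str, max_retries: int = 3) -> str:
    check = CheckResponseCompleteness().setup(Settings(uptrain_access_token="up-...."))
    for _ in range(max_retries):
        response = generate_response(question)
        scores = check.run(pl.DataFrame({"question": [question], "response": [response]}))
        if scores["score_response_completeness"][0] > 0.5:  # assumed column name and threshold
            return response  # good enough to show to the user
    return "Sorry, I could not produce a reliable answer."  # fallback after retries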


Monitoring

You can use the UpTrain framework to continuously monitor your model's performance and get real-time insights on how well it is doing on a variety of evaluation metrics. Check out the monitoring tutorial to learn more.
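
A minimal sketch of the monitoring idea, again reusing the built-in check from the quickstart below rather than UpTrain's own monitoring pipeline: score each batch of logged question/response pairs and raise an alert when the average score drops. The score column name and the alert threshold are assumptions.

import polars as pl
from uptrain.framework import Settings
from uptrain.framework.builtins import CheckResponseCompleteness

check = CheckResponseCompleteness().setup(Settings(uptrain_access_token="up-...."))

def batch_score(batch: pl.DataFrame) -> float:
    """Average completeness score over a batch of logged responses."""
    scores = check.run(batch)
    return scores["score_response_completeness"].mean()  # assumed column name

# A batch of production logs (in practice, read from your logging store)
logged_batch = pl.DataFrame({
    "question": ["What is the meaning of life?", "How do I reset my password?"],
    "response": ["Who knows 🤔", "Click 'Forgot password' on the login page."],
})
if batch_score(logged_batch) < 0.5:  # assumed alert threshold
    print("Alert: response completeness has degraded")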


Get started 🙌

To run it on your machine, check out the Quickstart tutorial:

Install the package through pip:

pip install uptrain

Note: UpTrain uses commonly used Python libraries like openai-evals and sentence-transformers. To make sure all the functionalities work, use the uptrain-add command to install the full version of the package.

uptrain-add --feature full

How to use UpTrain:

Using UpTrain's built-in evaluation sets:

UpTrain provides a variety of checks like response relevance, response completeness, factual accuracy, retrieved-context quality, etc., which can be accessed using UpTrain's API key. To see them in action, you can take a look at the Live Evaluation Demo.

To learn more about these built-in checks, check out the Built-in Checks Documentation.

Get your free UpTrain API Key here.

import polars as pl
from uptrain.framework import Settings
from uptrain.framework.builtins import CheckResponseCompleteness  # import paths may differ slightly across versions

# Data with the user's question and the model's response to evaluate
data = pl.DataFrame({
  "question": ["What is the meaning of life?"],
  "response": ["Who knows 🤔"]
})

# Run the built-in response-completeness check using your UpTrain API key
check = CheckResponseCompleteness()
output = check.setup(Settings(uptrain_access_token="up-9g....")).run(data)

Configuring your own evaluation sets:

Say we want to plot a line chart showing whether our model's responses contain any grammatical mistakes or not.

# Step 0: Imports (import paths may differ slightly across UpTrain versions)
from uptrain.framework import Check, CheckSet, Settings
from uptrain.framework.builtins import CheckResponseRelevance
from uptrain.operators import GrammarScore, LineChart, JsonReader

# Step 1: Choose and create the appropriate operator from UpTrain
grammar_score = GrammarScore(
  col_in_text = "model_response",       # input column name (from dataset)
  col_out = "grammar_score"             # desired output column name
)

# Step 2: Create a check with the operators and the required plots as arguments
grammar_check = Check(
  operators = [grammar_score],
  plots = [LineChart(y = "grammar_score")]
)
# We can also use prebuilt checks like CheckResponseCompleteness, CheckResponseRelevance, etc.
response_relevance_check = CheckResponseRelevance()


# Step 3: Create a CheckSet with the checks and data source as arguments
checkset = CheckSet(
    checks = [grammar_check, response_relevance_check],
    source = JsonReader(fpath = '...')
)

# Step 4: Set up and run the CheckSet (the data is read from the source defined above)
checkset.setup(Settings(openai_api_key = '...'))
checkset.run()

Running evaluations on UpTrain's hosted platform:

To learn how to run evaluations on UpTrain's hosted platform, check out the UpTrain API Client Tutorial.

Key Features 💡

Integrations

Eval Frameworks       | LLM Providers     | LLM Packages   | Serving frameworks
OpenAI Evals ✅       | GPT-3.5-turbo ✅  | Langchain 🔜   | HuggingFace 🔜
EleutherAI LM Eval 🔜 | GPT-4 ✅          | Llama Index 🔜 | Replicate 🔜
BIG-Bench 🔜          | Claude 🔜         | AutoGPT 🔜     |
                      | Cohere 🔜         |                |

Why UpTrain 🤔?

Large language models are trained on billions of data points and perform remarkably well across a wide variety of tasks. But one thing these models are not good at is being deterministic. Even with the most well-crafted prompts, a model can misbehave for certain inputs, be it hallucinations, wrong output structure, or toxic, biased, or irrelevant responses, and the space of error modes can be immense.

To ensure your LLM applications work reliably and correctly, UpTrain makes it easy for developers to evaluate the responses of their applications on multiple criteria. UpTrain's evaluation framework can be used to:

  1. Validate (and correct) the response of the model before showing it to the user
  2. Get quantitative measures to experiment across multiple prompts, model providers, etc.
  3. Do unit testing to ensure no buggy prompt or code gets pushed into production (see the sketch after this list)
  4. Monitor your LLM applications in real-time and understand when they are going wrong in order to fix them before users complain.
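
As an illustration of point 3, a check can be asserted on directly inside a test suite. The pytest-style sketch below reuses the built-in check from the quickstart; the score column name and the threshold are assumptions.

import polars as pl
from uptrain.framework import Settings
from uptrain.framework.builtins import CheckResponseCompleteness

def test_response_completeness():
    data = pl.DataFrame({
        "question": ["What is your return policy?"],
        "response": ["You can return any item within 30 days of purchase."],
    })
    check = CheckResponseCompleteness().setup(Settings(uptrain_access_token="up-...."))
    scores = check.run(data)
    # Fail the test run if completeness drops below the chosen threshold
    assert scores["score_response_completeness"].min() > 0.5  # assumed column name and threshold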

We are constantly working to make UpTrain better. Want a new feature or need any integrations? Feel free to create an issue or contribute directly to the repository.

License 💻

This repo is published under the Apache 2.0 license, and we are committed to adding more functionalities to the UpTrain open-source repo. Upon popular demand, we have also rolled out a no-code self-serve console. For customized onboarding, please book a demo call here.

Stay Updated ☎️

We are continuously adding tons of features and use cases. Please support us by giving the project a star ⭐!

Provide feedback (the harsher the better 😉)

We are building UpTrain in public. Help us improve by giving your feedback here.

Contributors 🖥️

We welcome contributions to UpTrain. Please see our contribution guide for details.

