Skip to main content

Business Evals for LLMs

Project description

paramount

Business Evaluations for LLM Chats - let your agents easily evaluate AI chat accuracy.

Getting Started

  1. Install the package:
pip install paramount
  1. Decorate your AI function:
@paramount.record()
def my_ai_function(message_history, new_question): # Inputs
    # <LLM invocations happen here>
    new_message = {'role': 'user', 'content': new_question}
    updated_history = message_history + [new_message]
    return updated_history  # Outputs.
  1. After my_ai_function(...) has run several times, launch the Paramount UI to evaluate results:
paramount

Your SMEs can now evaluate recordings and track accuracy improvements over time.

Paramount runs completely offline in your private environment.

Usage

After installation, run python example.py for a minimal working example.

Configuration

In order to set up successfully, define which input and output parameters represent the chat list used in the LLM.

This is done via the paramount.toml configuration file that you add in your project root dir.

It will be autogenerated for you with defaults if it doesn't already exist on first run.

[record]
enabled = true
function_url = "http://localhost:9000"  # The url to your LLM API flask app, for replay

[db]
type = "csv" # postgres also available
	[db.postgres]
	connection_string = ""

[api]
endpoint = "http://localhost" # url and port for paramount UI/API
port = 9001
split_by_id = false # In case you have several bots and want to split them by ID
identifier_colname = ""

[ui]  # These are display elements for the UI

# For the table display - define which columns should be shown
meta_cols = ['recorded_at']
input_cols = ['args__message_history', 'args__new_question']  # Matches my_ai_function() example
output_cols = ['1', '2']  # 1 and 2 are indexes for llm_answer and llm_references in example above

# For the chat display - describe how your chat structure is set up. This example uses OpenAI format.
chat_list = "output__1"  # Matches output updated_history. Must be a list of dicts to display chat format
chat_list_role_param = "role"  # Key in list of dicts describing the role in the chat
chat_list_content_param = "content"  # Key in list of dicts describing the content

It is also possible to describe references via config but is not shown here for simplicity.

See paramount.toml.example for more info.

For Developers

The deeper configuration instructions about the client & server can be seen here.

Docker

By using Dockerfile.server, you can containerize and deploy the whole package (including the client).

With Docker, you will need to mount the paramount.toml file dynamically into the container for it to work.

docker build -t paramount-server -f Dockerfile.server . # or make docker-build-server
docker run -dp 9001:9001 paramount-server # or make docker-run-server

License

This project is under GPL License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

paramount-0.4.0.tar.gz (470.9 kB view details)

Uploaded Source

Built Distribution

paramount-0.4.0-py3-none-any.whl (474.9 kB view details)

Uploaded Python 3

File details

Details for the file paramount-0.4.0.tar.gz.

File metadata

  • Download URL: paramount-0.4.0.tar.gz
  • Upload date:
  • Size: 470.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.6

File hashes

Hashes for paramount-0.4.0.tar.gz
Algorithm Hash digest
SHA256 80270edb892a55b7c617f01e380d414d48694100388736f9532c0b41a1577f86
MD5 2747f20bd9452eaaca3cf93e5bf5f04f
BLAKE2b-256 7f8ac55ac161baf103a79e55565ca19a05081d2d76115842ad8290923cd14829

See more details on using hashes here.

File details

Details for the file paramount-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: paramount-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 474.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.6

File hashes

Hashes for paramount-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8cbe3564fdeb67d92d49c1d4de1fcba919c2a13c00376de0947354065e32957b
MD5 f9a636051ac6df13c321c51e561ac6c8
BLAKE2b-256 0341aef66599eda1da2c5f41c00080b67268efcd7e72fbec5196ef1e46c04183

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page