Newberry Metrics

A Python package for tracking and analyzing AWS Bedrock API usage metrics, including costs, latency, and token usage, with an automatically launched dashboard for live visualization.

Features

  • Track API call costs, latency, and token usage (input/output).
  • Automatic Background Dashboard: A Streamlit dashboard is launched as a background process upon TokenEstimator initialization, providing live visualization.
  • Dashboard Features: Displays KPIs (total/average cost & latency), hourly/daily charts, and detailed call logs.
  • Persistent Session Storage: Maintains session-based metrics in a JSON file under ~/.newberry_metrics/sessions/, named with a hash derived from the AWS credentials and region.
  • Support for multiple Bedrock models.
  • Automatic AWS credential handling.
  • Console alerts for configurable cost and latency thresholds.
  • Static method (TokenEstimator.stop_dashboard()) to manually stop the background dashboard process.
  • Duplicate Call Prevention: Ensures metrics for a single Bedrock response are logged only once, even if calculate_prompt_cost is called multiple times with the same response object.

Installation

pip install newberry_metrics

This will also install necessary dependencies like streamlit, pandas, and plotly.

AWS Credential Setup

The package uses the AWS credential chain. Configure your AWS credentials via IAM roles (recommended for EC2), aws configure, or environment variables.
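Before creating a TokenEstimator, it can help to confirm which credential source is likely to be picked up. The following is a minimal sketch using only the standard library. The helper name detect_credential_sources is hypothetical and not part of newberry_metrics; it only looks at the well-known AWS environment variables and the default shared credentials file, does not validate anything, and cannot detect IAM roles.

```python
import os
from pathlib import Path


def detect_credential_sources(env=None, home=None):
    """Return the names of credential sources that appear to be configured.

    Hypothetical helper for illustration only: it does not validate the
    credentials and cannot detect an IAM role attached to an instance.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    sources = []
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        sources.append("environment variables")
    if env.get("AWS_PROFILE"):
        sources.append("named profile")
    if (home / ".aws" / "credentials").is_file():
        sources.append("shared credentials file")
    return sources


if __name__ == "__main__":
    found = detect_credential_sources()
    print("Detected credential sources:", found or "none (an IAM role may still apply)")
```

An empty result does not necessarily mean the package will fail, since an IAM role may still supply credentials at runtime.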

Usage Examples

1. Initialize TokenEstimator & Launch Dashboard

When TokenEstimator is initialized, it automatically launches the Newberry Metrics dashboard in the background if it's not already running. The console will display the dashboard URL (typically http://localhost:8501).

from newberry_metrics import TokenEstimator

model_id = "anthropic.claude-3-5-sonnet-20240620-v1:0"
region = "us-east-1"
cost_alert_threshold = 0.05        # Compared against the total session cost
latency_alert_threshold_ms = 2000  # Compared against each call's latency, in ms

estimator = TokenEstimator(
    model_id=model_id,
    region=region,
    cost_threshold=cost_alert_threshold,              # Optional
    latency_threshold_ms=latency_alert_threshold_ms,  # Optional
)

The dashboard will continue running even if your script finishes. Open the URL in your browser.
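If the URL does not load, a quick check of whether anything is listening on the dashboard port can narrow things down. This is a minimal sketch, assuming the dashboard is on Streamlit's default port 8501 as mentioned above. The function is_port_open is a hypothetical helper, not part of newberry_metrics, and a True result only shows that some process accepts connections on that port, not that it is the Newberry dashboard.

```python
import socket


def is_port_open(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    state = "reachable" if is_port_open() else "not reachable"
    print(f"Port 8501 on localhost is {state}.")
```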

2. Get Model Pricing

costs = estimator.get_model_cost_per_million()
print(f"Input cost/1M tokens: ${costs['input']}, Output cost/1M tokens: ${costs['output']}")
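To show how per-million-token prices translate into the cost of a single call, here is a small worked sketch. The function estimate_call_cost is hypothetical and the prices are illustrative placeholders, not the package's actual pricing data; only the 'input' and 'output' keys come from the example above.

```python
def estimate_call_cost(input_tokens, output_tokens, costs):
    """Estimate the USD cost of one call from per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * costs["input"]
    output_cost = output_tokens / 1_000_000 * costs["output"]
    return input_cost + output_cost


# Illustrative prices only; in practice use the dict returned by
# estimator.get_model_cost_per_million().
example_costs = {"input": 3.0, "output": 15.0}

# 1,000 input tokens -> $0.003, 500 output tokens -> $0.0075
print(f"${estimate_call_cost(1_000, 500, example_costs):.6f}")  # $0.010500
```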

3. Making API Calls & Tracking Metrics

First, get the raw response object from Bedrock using get_response(). Then, pass this object to calculate_prompt_cost() to process it, calculate metrics, and update the session file.

prompt = "Explain Large Language Models simply."
max_tokens_to_generate = 150

# Step 1: Get the raw response object
raw_response_object = estimator.get_response(
    prompt=prompt, 
    max_tokens=max_tokens_to_generate
)

# Step 2: Calculate cost and update metrics
# This also returns detailed info about the call and session.
call_and_session_info = estimator.calculate_prompt_cost(raw_response_object)

print(f"Answer (truncated): {call_and_session_info.get('answer', 'N/A')[:100]}...")
current_call_metrics = call_and_session_info.get('current_call_metrics', {})
print(f"Cost for this call: ${current_call_metrics.get('cost', 0):.6f}")
print(f"Total session cost: ${call_and_session_info.get('total_cost_session', 0):.6f}")

Refresh your dashboard (using its refresh button 🔄) to see the new data. If calculate_prompt_cost is called again with the same raw_response_object, new metrics will not be logged, preventing duplicates.
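The duplicate prevention described above can be pictured as a log keyed by a per-response identifier. The sketch below is not the package's implementation: the CallLog class is hypothetical, and using the RequestId from a boto3 response's ResponseMetadata as the key is an assumption made for illustration.

```python
class CallLog:
    """Records per-call metrics at most once per response identifier."""

    def __init__(self):
        self._seen_ids = set()
        self.calls = []

    def record(self, response, cost):
        """Log the call and return True, or return False if already logged."""
        # Assumption: the response carries a boto3-style ResponseMetadata dict.
        request_id = response["ResponseMetadata"]["RequestId"]
        if request_id in self._seen_ids:
            return False
        self._seen_ids.add(request_id)
        self.calls.append({"request_id": request_id, "cost": cost})
        return True


log = CallLog()
fake_response = {"ResponseMetadata": {"RequestId": "example-request-id"}}
print(log.record(fake_response, 0.0105))  # True: first time this response is seen
print(log.record(fake_response, 0.0105))  # False: duplicate, nothing is logged
print(len(log.calls))                     # 1
```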

4. Using the Dashboard

  • Automatic Launch: Starts in the background with TokenEstimator. URL and PID are printed.
  • Persistence: Runs independently of the launching script.
  • Data Source: Reads from ~/.newberry_metrics/sessions/session_metrics_<CREDENTIAL_HASH>.json.
  • Refresh Button: Manually click the 🔄 button on the dashboard to load the latest metrics after new calls.
  • Shutdown:
    • Programmatically: TokenEstimator.stop_dashboard()
    • Manually: Kill the process using the PID (from console or ~/.newberry_metrics/sessions/.newberry_dashboard.pid).

Example of stopping the dashboard programmatically:

from newberry_metrics import TokenEstimator

TokenEstimator.stop_dashboard()
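For the manual route, the PID file can also be read from Python. This is a minimal sketch, assuming the PID file contains a single decimal process ID (the file format is not documented here). The helpers read_pid and stop_process are hypothetical and not part of newberry_metrics. A stale PID file could point at an unrelated process, so check the PID before sending a signal.

```python
import os
import signal
from pathlib import Path

# Path taken from the Shutdown notes above.
PID_FILE = Path.home() / ".newberry_metrics" / "sessions" / ".newberry_dashboard.pid"


def read_pid(pid_file=PID_FILE):
    """Return the PID stored in pid_file, or None if it is missing or invalid."""
    try:
        # Assumption: the file holds one plain integer.
        return int(Path(pid_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def stop_process(pid):
    """Send SIGTERM to pid; return False if no such process exists."""
    try:
        os.kill(pid, signal.SIGTERM)
        return True
    except ProcessLookupError:
        return False


if __name__ == "__main__":
    pid = read_pid()
    if pid is None:
        print("No dashboard PID file found.")
    elif stop_process(pid):
        print(f"Sent SIGTERM to dashboard process {pid}.")
    else:
        print(f"Process {pid} is not running.")
```

TokenEstimator.stop_dashboard() remains the simpler option when the package is importable.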

5. Retrieve Current Session Metrics Programmatically

current_session_object = estimator.get_session_metrics()
print(f"Total calls: {current_session_object.total_calls}, Total cost: ${current_session_object.total_cost:.6f}")

6. Reset Session Metrics

Resets metrics in the current session's JSON file to zero.

estimator.reset_session_metrics()

Supported Models

Pricing information is included for (but not limited to):

  • amazon.nova-pro-v1:0
  • anthropic.claude-3-haiku-20240307-v1:0
  • anthropic.claude-3-sonnet-20240229-v1:0
  • anthropic.claude-3-opus-20240229-v1:0
  • anthropic.claude-3-5-sonnet-20240620-v1:0
  • meta.llama2-13b-chat-v1
  • meta.llama2-70b-chat-v1
  • ai21.jamba-1-5-large-v1:0
  • cohere.command-r-v1:0
  • cohere.command-r-plus-v1:0
  • mistral.mistral-7b-instruct-v0:2
  • mistral.mixtral-8x7b-instruct-v0:1

Parsing logic for these and other models is in bedrock_models.py.

Session Metrics & Alerting

  • Session File Location: ~/.newberry_metrics/sessions/session_metrics_<CREDENTIAL_HASH>.json. The hash is derived from AWS credentials and region.
  • PID File Location: ~/.newberry_metrics/sessions/.newberry_dashboard.pid.
  • Dashboard Data: The Streamlit dashboard reads from the session JSON file.
  • Metrics Stored: total_cost, average_cost, total_latency, average_latency, total_calls, and a detailed list api_calls (each with timestamp, cost, latency, input_tokens, output_tokens, call_counter).
  • Alerting: A console warning is printed when the total session cost exceeds cost_threshold or when an individual call's latency exceeds latency_threshold_ms.
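The threshold rules can be illustrated against data shaped like the stored metrics. This is a sketch under assumptions: check_thresholds is a hypothetical helper, the sample JSON is hand-written using the field names listed above, and latency is assumed to be stored in milliseconds. The package itself reports alerts as calls are made; this only shows the comparison logic after the fact.

```python
import json


def check_thresholds(session, cost_threshold=None, latency_threshold_ms=None):
    """Return warning strings for a session dict shaped like the stored metrics."""
    warnings = []
    # Cost is compared against the session total.
    if cost_threshold is not None and session["total_cost"] > cost_threshold:
        warnings.append(
            f"Total session cost ${session['total_cost']:.6f} "
            f"exceeds ${cost_threshold:.6f}"
        )
    # Latency is compared per call (assumed to be in milliseconds).
    if latency_threshold_ms is not None:
        for call in session["api_calls"]:
            if call["latency"] > latency_threshold_ms:
                warnings.append(
                    f"Call {call['call_counter']} latency {call['latency']:.0f} ms "
                    f"exceeds {latency_threshold_ms} ms"
                )
    return warnings


# Hand-written sample; real data lives in the session JSON file.
sample = json.loads("""
{
  "total_cost": 0.06,
  "total_calls": 2,
  "api_calls": [
    {"call_counter": 1, "cost": 0.02, "latency": 850.0},
    {"call_counter": 2, "cost": 0.04, "latency": 2400.0}
  ]
}
""")

for message in check_thresholds(sample, cost_threshold=0.05, latency_threshold_ms=2000):
    print(message)
```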

Requirements

  • Python >= 3.10
  • boto3
  • streamlit
  • pandas
  • plotly

License

This project is licensed under the MIT License.


Note: This package is actively maintained. Please ensure you are using the latest version for new features and model support.
