A Python package for tracking Bedrock API usage metrics (cost, latency, tokens) with DynamoDB storage and alerting.

Newberry Metrics

A Python package for tracking and analyzing AWS Bedrock API usage metrics, including costs, latency, and token usage, with an automatically launched dashboard for live visualization.

Latest Version: 0.1.8

Features

  • Track API call costs, latency, and token usage (input/output).
  • Automatic Streamlit dashboard for live visualization, launched as a background process.
  • Dashboard displays KPIs (total/average cost & latency), hourly/daily charts, and detailed call logs.
  • Maintain session-based metrics in a local JSON file, uniquely identified by AWS credentials.
  • Support for multiple Bedrock models.
  • Automatic AWS credential handling.
  • Console alerts for configurable cost and latency thresholds.
  • Method to manually stop the background dashboard process.

Installation

pip install newberry_metrics

The dashboard needs Streamlit, pandas, and plotly. If they were not installed automatically as dependencies, install them as well:

pip install streamlit pandas plotly

AWS Credential Setup

The package uses the AWS credential chain to authenticate with AWS services. You can set up credentials in one of the following ways:

1. Using IAM Role (Recommended for EC2)

  • Attach an IAM role to your EC2 instance with Bedrock permissions.
  • No additional configuration needed.

2. Using AWS CLI

aws configure

This will create a credentials file at ~/.aws/credentials.

3. Using Environment Variables

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=your_region
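
If you use the environment-variable option, a quick check like the one below can confirm that all three variables are visible to your Python process before you create a TokenEstimator. This is a minimal sketch; `missing_aws_env_vars` is a hypothetical helper, not part of newberry_metrics. A non-empty result does not mean authentication will fail, since an IAM role or ~/.aws/credentials may still supply credentials.

```python
import os

# The three variables named in the environment-variable option above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_env_vars(env=None):
    """Return the names of the required variables that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_env_vars()
if missing:
    print("Not set in this environment:", ", ".join(missing))
else:
    print("All three AWS environment variables are set.")
```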

Usage Examples

1. Initialize TokenEstimator & Launch Dashboard

When you initialize TokenEstimator, it will automatically attempt to launch the Newberry Metrics dashboard as a background process if it's not already running. The dashboard URL (typically http://localhost:8501) and its Process ID (PID) will be printed to your console.

from newberry_metrics import TokenEstimator
import json # For printing examples

# Initialize with your model ID and AWS region
model_id = "anthropic.claude-3-haiku-20240307-v1:0"
region = "us-east-1" # Specify your AWS region

# Optional: Define alert thresholds
cost_alert_threshold = 0.05  # Alert if total session cost exceeds $0.05
latency_alert_threshold_ms = 2000 # Alert if any single call takes > 2000ms

estimator = TokenEstimator(
    model_id=model_id,
    region=region,
    cost_threshold=cost_alert_threshold,      # Optional
    latency_threshold_ms=latency_alert_threshold_ms # Optional
)

# The dashboard should now be running in the background.
# Check your console for the URL and PID.
# Open the URL in your browser to see live metrics as you make calls.
# The dashboard will continue running even if this script finishes.
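
Because the dashboard is only launched when it is not already running, it can be useful to check from your own script whether something is already listening on the default port. The sketch below uses only the standard library; `is_port_in_use` is a hypothetical helper, and it is an assumption that a plain TCP connection test resembles what the package does internally.

```python
import socket


def is_port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP connection to host:port succeeds.

    Hypothetical helper: it only shows that *something* is listening,
    not that the listener is the Newberry Metrics dashboard.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


# 8501 is the default dashboard port mentioned above.
print("Port 8501 in use:", is_port_in_use(8501))
```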

2. Get Model Pricing

Retrieve the cost per million tokens for the initialized model.

costs = estimator.get_model_cost_per_million()
print(f"Input cost per million tokens: ${costs['input']}")
print(f"Output cost per million tokens: ${costs['output']}")
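
To see how per-million-token prices translate into the cost of one call, the arithmetic can be sketched as follows. `estimate_call_cost` is a hypothetical helper, and the prices in `example_costs` are placeholders rather than real Bedrock pricing; only the `input` and `output` keys come from the example above.

```python
def estimate_call_cost(input_tokens, output_tokens, costs):
    """Estimate the dollar cost of one call.

    `costs` is a dict with 'input' and 'output' prices per million tokens,
    shaped like the result of get_model_cost_per_million() shown above.
    Hypothetical helper; the package may compute cost differently.
    """
    input_cost = input_tokens / 1_000_000 * costs["input"]
    output_cost = output_tokens / 1_000_000 * costs["output"]
    return input_cost + output_cost


# Placeholder prices for illustration only.
example_costs = {"input": 1.0, "output": 2.0}
print(f"Estimated cost: ${estimate_call_cost(1_000, 500, example_costs):.6f}")
```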

3. Making API Calls & Tracking Metrics

Use the get_response method to make calls to the Bedrock model. This method automatically tracks metrics (cost, latency, token counts), updates the session JSON file, and checks for alerts. The dashboard will reflect these updates upon refresh.

prompt = "Explain the concept of Large Language Models in simple terms."
max_tokens_to_generate = 150

response_data = estimator.get_response(prompt=prompt, max_tokens=max_tokens_to_generate)

# The response_data contains details about the current call and the updated session totals.
print("\n--- API Call Response & Metrics ---")
print(f"Model's Answer (truncated): {response_data.get('answer', 'N/A')[:100]}...")

current_call = response_data.get('current_call_metrics', {})
print("\nMetrics for this Call:")
print(f"  Cost: ${current_call.get('cost', 0):.6f}")
print(f"  Latency: {current_call.get('latency', 0):.3f}s")
print(f"  Input Tokens: {current_call.get('input_tokens', 0)}")
print(f"  Output Tokens: {current_call.get('output_tokens', 0)}")

print("\nUpdated Session Totals:")
print(f"  Total Session Cost: ${response_data.get('total_cost_session', 0):.6f}")
print(f"  Average Session Cost: ${response_data.get('average_cost_session', 0):.6f}")
print(f"  Total Calls in Session: {response_data.get('total_calls_session', 0)}")

# Make another call
prompt_2 = "What are some key applications of LLMs?"
response_data_2 = estimator.get_response(prompt=prompt_2, max_tokens=200)
# ... inspect response_data_2 ...
# Refresh your dashboard in the browser to see the new data.

4. Using the Dashboard

  • Automatic Launch: The dashboard starts as a background process when TokenEstimator is initialized (if not already running on port 8501). The URL (default: http://localhost:8501) and its PID are printed to the console.
  • Persistent Process: The dashboard runs independently and will continue to run even after the Python script that launched it has exited.
  • Live Data: The dashboard reads data from the session_metrics_<CREDENTIAL_HASH>.json file.
  • Refresh: Use the refresh button (🔄) on the dashboard to load the latest data from the JSON file after new API calls are made.
  • Features:
    • Key Performance Indicators (KPIs): Average/Total Cost, Average/Total Latency.
    • Charts: Hourly or Daily views for Cost, Latency, and Input/Output Token Distribution.
    • Detailed Table: A paginated table showing metrics for each individual API call in the session.
  • Shutdown: To stop the dashboard, you can:
    • Call TokenEstimator.stop_dashboard() from any Python script where TokenEstimator is accessible.
    • Manually kill the process using the PID printed when the dashboard was launched. The PID is also written to a .newberry_dashboard.pid file in the package directory.

# Example of stopping the dashboard
# from newberry_metrics import TokenEstimator # If in a new script/session

# TokenEstimator.stop_dashboard()
# print("Attempted to stop the Newberry Metrics dashboard.")

5. Retrieve Current Session Metrics Programmatically

You can get the complete metrics object for the current session at any time.

# from dataclasses import asdict # For printing example

current_session_object = estimator.get_session_metrics()
print("\n--- Full Session Metrics Object ---")
print(f"Total calls so far: {current_session_object.total_calls}")
print(f"Total session cost: ${current_session_object.total_cost:.6f}")
print(f"Average session latency: {current_session_object.average_latency:.3f}s")
# print(json.dumps(asdict(current_session_object), indent=2)) # For full details

6. Reset Session Metrics

Reset the tracked metrics for the current session (identified by AWS credentials) back to zero in the session_metrics_*.json file.

estimator.reset_session_metrics()
print("Session metrics have been reset. Refresh the dashboard to see the changes.")

7. Stopping the Dashboard Manually

If you need to stop the dashboard process, you can use the static method TokenEstimator.stop_dashboard(). This method will attempt to find the dashboard's PID from a .newberry_dashboard.pid file (created when the dashboard starts) and terminate the process.

from newberry_metrics import TokenEstimator

# Call this from any Python environment where TokenEstimator is available
TokenEstimator.stop_dashboard()

If stop_dashboard() is unable to terminate the process, or if the PID file is missing/corrupt, you may need to manually kill the process using its PID (which was printed to the console when the dashboard started).
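
The manual fallback can be sketched with the standard library alone. `read_pid` and `terminate_pid` are hypothetical helpers, and two things here are assumptions: that the PID file holds a single integer as text, and the file location, which you should point at the package directory.

```python
import os
import signal
from pathlib import Path


def read_pid(pid_file):
    """Return the PID stored in pid_file, or None if it is missing or corrupt."""
    try:
        return int(Path(pid_file).read_text().strip())
    except (OSError, ValueError):
        return None


def terminate_pid(pid):
    """Send SIGTERM to pid. Return False if no such process exists."""
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return False
    return True


# Assumption: adjust this path to the newberry_metrics package directory.
pid_file = Path(".newberry_dashboard.pid")
pid = read_pid(pid_file)
if pid is None:
    print("No usable PID file found; use the PID printed at launch instead.")
elif terminate_pid(pid):
    print(f"Sent SIGTERM to process {pid}.")
else:
    print(f"Process {pid} is not running.")
```

Check that the PID still belongs to the dashboard before terminating it, since the operating system can reuse PIDs after a process exits.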

Supported Models

The package includes pricing information for the following Bedrock models (primarily in us-east-1). Ensure the model ID you use matches one of these or that its pricing and payload/response parsing logic is available in bedrock_models.py.

  • amazon.nova-pro-v1:0
  • amazon.nova-micro-v1:0
  • anthropic.claude-3-sonnet-20240229-v1:0
  • anthropic.claude-3-haiku-20240307-v1:0
  • anthropic.claude-3-opus-20240229-v1:0
  • meta.llama2-13b-chat-v1
  • meta.llama2-70b-chat-v1
  • ai21.jamba-1-5-large-v1:0
  • cohere.command-r-v1:0
  • cohere.command-r-plus-v1:0
  • mistral.mistral-7b-instruct-v0:2
  • mistral.mixtral-8x7b-instruct-v0:1

Pricing is based on us-east-1 and may vary in other regions. Token counting and payload structure depend on bedrock_models.py.

Session Metrics & Alerting

The package automatically tracks and persists session metrics.

  • Session File: A unique JSON file named session_metrics_<CREDENTIAL_HASH>.json is created in the directory where the script is run (or where TokenEstimator is initialized). The <CREDENTIAL_HASH> is derived from the AWS credentials and region.
  • Dashboard Source: The Streamlit dashboard (app.py) reads data directly from this JSON file.

Metrics stored in the JSON and displayed on the dashboard include:

  • total_cost, average_cost
  • total_latency, average_latency
  • total_calls
  • api_calls: A detailed list (List[APICallMetrics]) for each call, including its timestamp, cost, latency, input/output tokens, and call counter.
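
The relationship between the per-call list and the totals and averages can be sketched as below. `summarize_session` is a hypothetical helper, and it is an assumption that each entry in `api_calls` uses the keys `cost` and `latency`; inspect your own session_metrics_<CREDENTIAL_HASH>.json to confirm the actual layout.

```python
import json


def summarize_session(session):
    """Recompute totals and averages from a session dict's 'api_calls' list.

    Hypothetical helper; the key names are assumed from the field list above.
    """
    calls = session.get("api_calls", [])
    total_calls = len(calls)
    total_cost = sum(call.get("cost", 0.0) for call in calls)
    total_latency = sum(call.get("latency", 0.0) for call in calls)
    return {
        "total_calls": total_calls,
        "total_cost": total_cost,
        # Averages are defined as 0.0 for an empty session to avoid dividing by zero.
        "average_cost": total_cost / total_calls if total_calls else 0.0,
        "total_latency": total_latency,
        "average_latency": total_latency / total_calls if total_calls else 0.0,
    }


# Made-up sample data standing in for the contents of a session file.
sample = json.loads(
    '{"api_calls": [{"cost": 0.001, "latency": 1.2}, {"cost": 0.003, "latency": 0.8}]}'
)
print(summarize_session(sample))
```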

Alerting: If cost_threshold (e.g., 0.10 for $0.10) or latency_threshold_ms (e.g., 1500.0 for 1500 ms) is provided during TokenEstimator initialization, a warning is printed to the console when:

  • The total cost for the current session exceeds cost_threshold.
  • The latency of an individual API call exceeds latency_threshold_ms.
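
The two rules above can be expressed as a small check. This is a sketch of the described behavior, not the package's code: `check_alerts` and its message wording are hypothetical, and treating an omitted threshold as "no alert" is an assumption.

```python
def check_alerts(total_session_cost, call_latency_ms,
                 cost_threshold=None, latency_threshold_ms=None):
    """Return a list of warning strings for any threshold that is exceeded.

    Hypothetical helper: cost is compared against the session total,
    latency against the single most recent call.
    """
    warnings = []
    if cost_threshold is not None and total_session_cost > cost_threshold:
        warnings.append(
            f"Session cost ${total_session_cost:.6f} exceeds ${cost_threshold:.6f}"
        )
    if latency_threshold_ms is not None and call_latency_ms > latency_threshold_ms:
        warnings.append(
            f"Call latency {call_latency_ms:.0f} ms exceeds {latency_threshold_ms:.0f} ms"
        )
    return warnings


# Example values chosen to trigger both warnings.
for message in check_alerts(0.12, 1800.0, cost_threshold=0.10, latency_threshold_ms=1500.0):
    print("WARNING:", message)
```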

Requirements

  • Python >= 3.10
  • boto3 for AWS Bedrock integration
  • streamlit for the dashboard
  • pandas for data manipulation in the dashboard
  • plotly for charts in the dashboard

License

This project is licensed under the MIT License.


Note: This package is actively maintained. Please ensure you are using the latest version for new features and model support.
