Skip to main content

A Python package for tracking Bedrock API usage metrics (cost, latency, tokens) with DynamoDB storage and alerting.

Project description

Newberry Metrics

A Python package for tracking and analyzing AWS Bedrock API usage metrics, including costs and latency.

Latest Version: 0.1.4

Features

  • Track API call costs and latency
  • Monitor token usage (input and output)
  • Maintain session-based metrics
  • Support for multiple Bedrock models
  • Automatic AWS credential handling
  • Detailed latency tracking and analysis

Installation

pip install newberry_metrics

AWS Credential Setup

The package uses the AWS credential chain to authenticate with AWS services. You can set up credentials in one of the following ways:

1. Using IAM Role (Recommended for EC2)

  • Attach an IAM role to your EC2 instance with Bedrock permissions
  • No additional configuration needed
  • The code will automatically use the instance's IAM role credentials

2. Using AWS CLI

aws configure

This will create a credentials file at ~/.aws/credentials with your access key and secret key.

3. Using Environment Variables

export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=your_region

Usage Examples

1. Initialize TokenEstimator

Initialize the tracker with a specific Bedrock model ID. You can also optionally provide cost and latency thresholds for automatic alerting.

from newberry_metrics import TokenEstimator
import json # For printing examples

# Initialize with your model ID
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

# Optional: Define alert thresholds
cost_alert_threshold = 0.05  # Alert if total session cost exceeds $0.05
latency_alert_threshold_ms = 2000 # Alert if any single call takes > 2000ms

estimator = TokenEstimator(
    model_id=model_id,
    cost_threshold=cost_alert_threshold,      # Optional
    latency_threshold_ms=latency_alert_threshold_ms # Optional
)

2. Get Model Pricing

Retrieve the cost per million tokens for the initialized model.

costs = estimator.get_model_cost_per_million()
print(f"Input cost per million: ${costs['input']}")
print(f"Output cost per million: ${costs['output']}")

3. Making API Calls & Tracking Metrics

The _invoke_bedrock method (though marked private) handles calling the Bedrock model. Crucially, it automatically updates and saves the session metrics (cost, latency, token counts) after each call.

If the thresholds set during initialization are exceeded (total cost or current call latency), alerts will be printed to the console during this step.

# Make an API call - session metrics are automatically tracked & alerts checked
prompt = "Explain the concept of Large Language Models."
# Note: Using the private _invoke_bedrock method here as shown in main.py example
# A public wrapper might be added in future versions.
response = estimator._invoke_bedrock(prompt, max_tokens=200)

current_call_metrics = response['SessionMetrics']['current_call']

answer = response['SessionMetrics'].get('answer', '(Answer not found)') # Answer is also included
print(f"\n--- Single Call Results ---")
print(f"Answer: {answer[:100]}...") # Print truncated answer
print(f"Cost (This Call): ${current_call_metrics['cost']:.6f}")
print(f"Latency (This Call): {current_call_metrics['latency']:.3f}s")
print(f"Tokens (In/Out): {current_call_metrics['input_tokens']}/{current_call_metrics['output_tokens']}")


# Get session metrics from the response
session_metrics = response['SessionMetrics']
print(f"Total session cost: ${session_metrics['total_cost']}")
print(f"Average cost: ${session_metrics['average_cost']}")
print(f"Total latency: {session_metrics['total_latency']} seconds")
print(f"Average latency: {session_metrics['average_latency']} seconds")
print(f"Total calls: {session_metrics['total_calls']}")

# Access the session metrics updated by this call (contained within the response)
session_metrics_after_call = response['SessionMetrics']
print(f"Latest total session cost: ${session_metrics_after_call['total_cost']:.6f}")
print(f"Cost of current call: ${session_metrics_after_call['current_call']['cost']:.6f}")

4. Retrieve Current Session Metrics

You can get the complete metrics object for the current session (reflecting all calls made so far) at any time. This reads the latest state managed by the estimator.

from dataclasses import asdict # For printing example

current_metrics = estimator.get_session_metrics()
print(f"Total calls so far: {current_metrics.total_calls}")
print(f"Average latency: {current_metrics.average_latency:.3f}s")

5. Reset Session Metrics

Reset the tracked metrics for the current session (identified by AWS credentials) back to zero in the persistent store (DynamoDB).

estimator.reset_session_metrics()
print("Session metrics have been reset.")

6. Visualize Session Metrics (Optional)

Generate bar charts summarizing the cost and average latency of the session's API calls, grouped either hourly or daily. Requires matplotlib and pandas to be installed.

# Ensure you have data first by making some calls
# response = estimator._invoke_bedrock("Another prompt...")

try:
    # Show plots grouped by hour (default)
    print("Displaying hourly metrics plot...")
    estimator.visualize_metrics(time_interval='hourly')

    # Or, show plots grouped by day
    print("Displaying daily metrics plot...")
    estimator.visualize_metrics(time_interval='daily')

    # Or, save the plots to files instead of showing them
    # Creates 'session_plots_hourly.png' and 'session_plots_daily.png'
    # print("Saving metrics plots...")
    # estimator.visualize_metrics(time_interval='hourly', save_path='session_plots')
    # estimator.visualize_metrics(time_interval='daily', save_path='session_plots')

except ImportError:
    print("Please install matplotlib and pandas to use visualization: pip install matplotlib pandas")
except Exception as e:
    print(f"An error occurred during visualization: {e}")

2.  **Update the "Requirements" section to:**

```markdown
## Requirements
- Python >= 3.10
- `boto3` for AWS Bedrock integration
- `matplotlib` for visualization
- `pandas` for visualization

These changes add the documentation for the visualize_metrics method and ensure the necessary dependencies are listed. Remember to update the version in setup.py if you release this.

Optional: Analyzing a Specific Response (Advanced)

If you have a raw response object from _invoke_bedrock (or elsewhere), you can calculate its specific cost/latency independently using these helper methods. Note that this calculation is already performed internally by _invoke_bedrock during the tracking process.

Supported Models

The package includes pricing information for the following Bedrock models (primarily in us-east-1). Ensure the model ID you use matches one of these or add its pricing to get_model_cost_per_million if needed.

  • amazon.nova-pro-v1:0 ($0.003/$0.012 per 1K tokens)
  • amazon.nova-micro-v1:0 ($0.000035/$0.00014 per 1K tokens)
  • anthropic.claude-3-sonnet-20240229-v1:0 ($0.003/$0.015 per 1K tokens)
  • anthropic.claude-3-haiku-20240307-v1:0 ($0.00025/$0.00125 per 1K tokens)
  • anthropic.claude-3-opus-20240229-v1:0 ($0.015/$0.075 per 1K tokens)
  • meta.llama2-13b-chat-v1 ($0.00075/$0.001 per 1K tokens)
  • meta.llama2-70b-chat-v1 ($0.00195/$0.00256 per 1K tokens)
  • ai21.jamba-1-5-large-v1:0 ($0.0125/$0.0125 per 1K tokens)
  • cohere.command-r-v1:0 ($0.0005/$0.0015 per 1K tokens)
  • cohere.command-r-plus-v1:0 ($0.003/$0.015 per 1K tokens)
  • mistral.mistral-7b-instruct-v0:2 ($0.0002/$0.0006 per 1K tokens)
  • mistral.mixtral-8x7b-instruct-v0:1 ($0.0007/$0.0021 per 1K tokens) (Pricing based on us-east-1, may vary in other regions)

Session Metrics & Alerting

The package automatically tracks and persists session metrics using Amazon DynamoDB. A dedicated table (default name: BedrockSessionMetrics) is required in your AWS account in the specified region. Each session's data is stored as an item keyed by a unique hash derived from the AWS credentials and region used.

Metrics stored include:

  • total_cost: Cumulative cost for the session.
  • average_cost: Average cost per call in the session.
  • total_latency: Cumulative latency (in seconds) for the session.
  • average_latency: Average latency per call in the session.
  • total_calls: Total number of API calls made in the session.
  • api_calls: A detailed list (List[APICallMetrics]) of each individual API call containing its timestamp, cost, latency, token counts, and call number within the session.

Alerting: If cost_threshold (float, e.g., 0.10 for $0.10) or latency_threshold_ms (float, e.g., 1500.0 for 1500ms) are provided during TokenEstimator initialization, the package will automatically print warning messages to the console if:

  • The total cost for the current session exceeds the cost_threshold after an API call.
  • The latency of any individual API call exceeds the latency_threshold_ms.

Recent Updates

Requirements

  • Python >= 3.10
  • boto3 for AWS Bedrock integration
  • matplotlib for visualization
  • pandas for visualization

Contact & Support

License

This project is licensed under the MIT License.


Note: This package is actively maintained and regularly updated with new features and model support.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

newberry_metrics-0.1.4.tar.gz (14.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

newberry_metrics-0.1.4-py3-none-any.whl (11.8 kB view details)

Uploaded Python 3

File details

Details for the file newberry_metrics-0.1.4.tar.gz.

File metadata

  • Download URL: newberry_metrics-0.1.4.tar.gz
  • Upload date:
  • Size: 14.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for newberry_metrics-0.1.4.tar.gz
Algorithm Hash digest
SHA256 849f186928554854df8fe10a385b1bfbe16c60cc66232c71ca9141797b783be2
MD5 64dfc8be2ff275eb3d830881e12d9223
BLAKE2b-256 0b48eca2593f2d97fd6af4eb25770f609c37fc8e0435557fb3b6f8779c62364f

See more details on using hashes here.

File details

Details for the file newberry_metrics-0.1.4-py3-none-any.whl.

File metadata

File hashes

Hashes for newberry_metrics-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 148f321d916e48015ab8be9695cd00fb5318dfcea5902cef50f200e3f0ee4f46
MD5 1fba587d4cbae6561f41e484f4b5e790
BLAKE2b-256 f6d8fa8df55016d97cddb0cdc2c89b6ff4951d94f26772f4dbdabc937d8c0d8d

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page