A lightweight Python library and CLI to estimate OpenAI embedding costs.

Embed Cost Estimator

Installation

Install from PyPI:

pip install embedding-cost-estimator

Basic CLI Usage (Rough Estimate)

Run a quick rough estimate using a simple chars/4 heuristic:

embed-cost --chunks <NUM_CHUNKS> --chars <AVG_CHARS_PER_CHUNK> [--model <MODEL>]

# --chunks, -n   Number of chunks (required)
# --chars, -c    Average characters per chunk (default: 500)
# --model, -m    Embedding model choice (default: text-embedding-ada-002)

CLI Options

| Option | Shortcut | Type | Default | Description |
|---|---|---|---|---|
| --chunks | -n | integer | required | Number of chunks for rough estimate |
| --chars | -c | integer | 500 | Average characters per chunk |
| --model | -m | choice | text-embedding-ada-002 | Embedding model to use (see MODEL_RATES) |
| --help | | flag | | Show this help message and exit |
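To make the option table concrete, here is a minimal sketch of a parser that accepts the same flags, types, and defaults. It uses the standard library's argparse purely for illustration; which CLI framework the package actually uses is not stated here. `build_parser` and `KNOWN_MODELS` are hypothetical names, and the model list contains only the two models mentioned in this README.

```python
import argparse

# Model names taken from the examples in this README. The real set of choices
# lives in the library's MODEL_RATES and may contain more entries (assumption).
KNOWN_MODELS = ["text-embedding-ada-002", "text-embedding-3-small"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented embed-cost options."""
    parser = argparse.ArgumentParser(
        prog="embed-cost",
        description="Estimate OpenAI embedding costs.",
    )
    parser.add_argument("--chunks", "-n", type=int, required=True,
                        help="Number of chunks for rough estimate")
    parser.add_argument("--chars", "-c", type=int, default=500,
                        help="Average characters per chunk")
    parser.add_argument("--model", "-m", choices=KNOWN_MODELS,
                        default="text-embedding-ada-002",
                        help="Embedding model to use")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["-n", "1000", "--model", "text-embedding-3-small"])
print(args.chunks, args.chars, args.model)
```

Because --chars and --model have defaults, only --chunks has to be supplied on the command line.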

Examples:

1. Default model, custom sizes

embed-cost --chunks 1000 --chars 500
# Estimated embedding cost: $0.050000

2. Using a different model

embed-cost --chunks 500 --chars 300 --model text-embedding-3-small
# Estimated embedding cost: $0.003000
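The arithmetic behind these two outputs can be sketched as follows. The per-1K-token rates below are back-derived from the example outputs above (they are the values that reproduce $0.050000 and $0.003000 under a chars/4 token estimate); they are assumptions for illustration, not values read from the library's MODEL_RATES, and the library may round token counts differently. `rough_cost` and `ASSUMED_RATES_PER_1K` are hypothetical names.

```python
# Assumed USD rates per 1,000 tokens, inferred from the example outputs above.
# They are NOT taken from the library's MODEL_RATES.
ASSUMED_RATES_PER_1K = {
    "text-embedding-ada-002": 0.0004,
    "text-embedding-3-small": 0.00008,
}


def rough_cost(num_chunks: int, avg_chars: int, model: str) -> float:
    """Estimate cost by approximating tokens as total characters / 4."""
    est_tokens = num_chunks * avg_chars / 4
    return est_tokens / 1000 * ASSUMED_RATES_PER_1K[model]


# 1000 chunks * 500 chars / 4 = 125,000 tokens
print(f"${rough_cost(1000, 500, 'text-embedding-ada-002'):.6f}")
# 500 chunks * 300 chars / 4 = 37,500 tokens
print(f"${rough_cost(500, 300, 'text-embedding-3-small'):.6f}")
```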

Python API

You can call estimate_embedding_cost() in two mutually-exclusive ways:

1. Rough estimate

Pass a chunk count and an average chunk size; tokens are approximated with a simple chars/4 heuristic:

from embed_cost import estimate_embedding_cost

cost = estimate_embedding_cost(
    num_chunks=250,
    chunk_size_chars=400,
    model="text-embedding-3-small",
)

print(f"Rough cost: ${cost:.6f}")

2. Precise mode

Pass your list of text chunks to get exact token counts via tiktoken:

from embed_cost import estimate_embedding_cost

# your pre-chunked list of text segments
chunked_docs = [
    "First chunk of text…",
    "Second chunk of text…",
    # …etc…
]

cost = estimate_embedding_cost(
    chunk_texts=chunked_docs,
    model="text-embedding-ada-002",
)
print(f"Precise cost: ${cost:.6f}")
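For readers curious what "exact token counts via tiktoken" amounts to, here is a rough sketch of the idea, not the library's actual implementation. `make_tiktoken_counter` and `precise_cost` are hypothetical helpers, and the rate is passed in by the caller rather than looked up, since the library's real rates are not listed here. `tiktoken.encoding_for_model` and `Encoding.encode` are real tiktoken calls; tiktoken is imported lazily so the cost function itself has no third-party dependency.

```python
from typing import Callable, Iterable


def make_tiktoken_counter(model: str) -> Callable[[str], int]:
    """Return a function that counts tokens for `model` using tiktoken."""
    import tiktoken  # third-party: pip install tiktoken

    encoding = tiktoken.encoding_for_model(model)
    return lambda text: len(encoding.encode(text))


def precise_cost(
    chunk_texts: Iterable[str],
    rate_per_1k: float,
    count_tokens: Callable[[str], int],
) -> float:
    """Sum the token count of every chunk and price it per 1,000 tokens."""
    total_tokens = sum(count_tokens(text) for text in chunk_texts)
    return total_tokens / 1000 * rate_per_1k
```

With tiktoken installed, a call might look like `precise_cost(chunked_docs, rate, make_tiktoken_counter("text-embedding-ada-002"))`, where `rate` is whatever per-1K-token price you want to assume. Passing the counter as an argument also makes it easy to swap in a different tokenizer.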

Note: You must pass either num_chunks (for a rough estimate) or chunk_texts (for a precise one), but not both. Omitting both, or passing a non-positive num_chunks, raises a ValueError.
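The argument rules in the note can be sketched as a small validation helper. This is a hypothetical illustration (`check_mode` is not part of the library), and raising ValueError when both arguments are given is an assumption; the note only confirms ValueError for the omitted-both and non-positive cases.

```python
from typing import Optional, Sequence


def check_mode(
    num_chunks: Optional[int] = None,
    chunk_texts: Optional[Sequence[str]] = None,
) -> str:
    """Return "rough" or "precise", or raise ValueError for invalid input."""
    if num_chunks is not None and chunk_texts is not None:
        # Assumption: the both-given case is rejected with ValueError as well.
        raise ValueError("pass num_chunks or chunk_texts, not both")
    if chunk_texts is not None:
        return "precise"
    if num_chunks is None:
        raise ValueError("pass either num_chunks or chunk_texts")
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    return "rough"


print(check_mode(num_chunks=250))
print(check_mode(chunk_texts=["First chunk", "Second chunk"]))
```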

Example

1. Exact Token Count in Code

from embed_cost import estimate_embedding_cost

# assuming your document is already split:
chunked = ["Lorem ipsum…", "Dolor sit amet…"]
cost = estimate_embedding_cost(
    chunk_texts=chunked,
)
print(cost)  # e.g. 0.000320

Contributing

We welcome contributions!

  1. Fork the repo and create a feature branch.

  2. Run tests and lint locally:

poetry install            # or pip install -e .
poetry run pytest -q      # or pytest -q
poetry run flake8 src tests
poetry run black --check .
  3. Open a pull request against main.

  4. Maintain 100% test coverage for new code and adhere to Black/Flake8 style.

Please see CONTRIBUTING.md for more details.

License

MIT © Pragasen Naicker
