Embed Cost Estimator

A lightweight Python library and CLI to estimate OpenAI embedding costs.

Installation

Install from PyPI:

pip install embedding-cost-estimator

Basic CLI Usage (Rough Estimate)

Run a quick rough estimate using a simple chars/4 heuristic:

embed-cost --chunks <NUM_CHUNKS> --chars <AVG_CHARS_PER_CHUNK> [--model <MODEL>]

# --chunks, -n  Number of chunks (required)
# --chars, -c   Average characters per chunk (default: 500)
# --model, -m   Embedding model choice (default: text-embedding-ada-002)

CLI Options

Option    Shortcut  Type     Default                 Description
--chunks  -n        integer  required                Number of chunks for rough estimate
--chars   -c        integer  500                     Average characters per chunk
--model   -m        choice   text-embedding-ada-002  Embedding model to use (see MODEL_RATES)
--help              flag                             Show this help message and exit

Examples:

1. Default model, custom sizes

embed-cost --chunks 1000 --chars 500
# Estimated embedding cost: $0.050000

2. Using a different model

embed-cost --chunks 500 --chars 300 --model text-embedding-3-small
# Estimated embedding cost: $0.003000
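
The rough-mode figures above can be checked by hand. The following is a minimal sketch of the chars/4 arithmetic, not the library's implementation. The helper name rough_cost is hypothetical, and the per-1K-token rates are assumptions backed out of the two example outputs rather than read from MODEL_RATES. It also assumes the token estimate is not rounded.

```python
def rough_cost(num_chunks: int, chars_per_chunk: int, rate_per_1k_tokens: float) -> float:
    """Hypothetical helper: approximate cost assuming ~4 characters per token."""
    estimated_tokens = num_chunks * chars_per_chunk / 4
    return estimated_tokens / 1000 * rate_per_1k_tokens


# Rates are inferred from the example outputs above, not taken from the library.
print(f"${rough_cost(1000, 500, 0.0004):.6f}")   # $0.050000 (example 1)
print(f"${rough_cost(500, 300, 0.00008):.6f}")   # $0.003000 (example 2)
```

Because the heuristic only needs a chunk count and an average length, it is useful for budgeting before any text has been chunked.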

Python API

You can call estimate_embedding_cost() in one of two mutually exclusive ways:

1. Rough estimate

Pass num_chunks and chunk_size_chars for a rough estimate based on the chars/4 heuristic:

from embed_cost import estimate_embedding_cost

cost = estimate_embedding_cost(
    num_chunks=250,
    chunk_size_chars=400,
    model="text-embedding-3-small",
)

print(f"Rough cost: ${cost:.6f}")

2. Precise mode

For exact token counts via tiktoken, pass your list of text chunks as chunk_texts:

from embed_cost import estimate_embedding_cost

# your pre-chunked list of text segments
chunked_docs = [
    "First chunk of text…",
    "Second chunk of text…",
    # …etc…
]

cost = estimate_embedding_cost(
    chunk_texts=chunked_docs,
    model="text-embedding-ada-002",
)
print(f"Precise cost: ${cost:.6f}")
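
To make the idea behind precise mode concrete, here is a minimal sketch of how exact token counts could be turned into a cost. It is an illustration under stated assumptions, not the library's code: precise_cost and tiktoken_counter are hypothetical names, and the rate is a caller-supplied parameter rather than a value from MODEL_RATES. The only tiktoken calls used are encoding_for_model() and Encoding.encode().

```python
from typing import Callable, Iterable


def precise_cost(
    chunk_texts: Iterable[str],
    count_tokens: Callable[[str], int],
    rate_per_1k_tokens: float,
) -> float:
    """Hypothetical helper: sum the token count of every chunk and price the total."""
    total_tokens = sum(count_tokens(text) for text in chunk_texts)
    return total_tokens / 1000 * rate_per_1k_tokens


def tiktoken_counter(model: str) -> Callable[[str], int]:
    """Build a token counter for `model` (requires `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the sketch loads without tiktoken installed

    encoding = tiktoken.encoding_for_model(model)
    return lambda text: len(encoding.encode(text))


# Usage, assuming tiktoken is installed and an illustrative rate of your choosing:
# counter = tiktoken_counter("text-embedding-ada-002")
# print(precise_cost(["First chunk", "Second chunk"], counter, rate_per_1k_tokens=0.0004))
```

Passing the counter in as a function keeps the pricing arithmetic separate from the tokenizer, so the same helper can be exercised with a stand-in counter in tests.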

Note: You must pass either num_chunks (for a rough estimate) or chunk_texts (for a precise one), but not both. Omitting both, or giving a non-positive num_chunks, raises a ValueError.
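
The argument rule in the note can be sketched as a small check. This is a hypothetical illustration of the contract, not the library's source: select_mode is an invented name, and raising ValueError when both arguments are given is an assumption, since the note only documents the error for omitting both or for a non-positive num_chunks.

```python
from typing import Optional, Sequence


def select_mode(
    num_chunks: Optional[int] = None,
    chunk_texts: Optional[Sequence[str]] = None,
) -> str:
    """Hypothetical check: return which estimation mode the arguments select."""
    if num_chunks is not None and chunk_texts is not None:
        # Assumption: the note says "not both" but does not name the exception.
        raise ValueError("pass num_chunks or chunk_texts, not both")
    if chunk_texts is not None:
        return "precise"
    if num_chunks is None or num_chunks <= 0:
        # Documented: omitting both, or a non-positive num_chunks, is an error.
        raise ValueError("num_chunks must be a positive integer")
    return "rough"


print(select_mode(num_chunks=250))          # rough
print(select_mode(chunk_texts=["a", "b"]))  # precise
```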

Example

1. Exact Token Count in Code

from embed_cost import estimate_embedding_cost

# assuming your document is already split:
chunked = ["Lorem ipsum…", "Dolor sit amet…"]
# model is omitted, so the default model is used
cost = estimate_embedding_cost(
    chunk_texts=chunked,
)
print(f"{cost:.6f}")  # e.g. 0.000320

Contributing

We welcome contributions!

  1. Fork the repo and create a feature branch.

  2. Run tests and lint locally:

poetry install            # or pip install -e .
poetry run pytest -q      # or pytest -q
poetry run flake8 src tests
poetry run black --check .
  3. Open a pull request against main.

  4. Maintain 100% test coverage for new code and adhere to Black/Flake8 style.

Please see CONTRIBUTING.md for more details.

License

MIT © Pragasen Naicker
