Embed Cost Estimator
A lightweight Python library and CLI to estimate OpenAI embedding costs.
Installation
Install from PyPI:
pip install embedding-cost-estimator
Basic CLI Usage (Rough Estimate)
Run a quick rough estimate using a simple chars/4 heuristic (roughly four characters per token):
embed-cost --chunks <NUM_CHUNKS> --chars <AVG_CHARS_PER_CHUNK> [--model <MODEL>]
# --chunks, -n   Number of chunks (required)
# --chars, -c    Average characters per chunk (default: 500)
# --model, -m    Embedding model choice (default: text-embedding-ada-002)
CLI Options
| Option | Shortcut | Type | Default | Description |
|---|---|---|---|---|
| --chunks | -n | integer | required | Number of chunks for rough estimate |
| --chars | -c | integer | 500 | Average characters per chunk |
| --model | -m | choice | text-embedding-ada-002 | Embedding model to use (see MODEL_RATES) |
| --help | - | flag | - | Show this help message and exit |
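The option table maps naturally onto an argparse declaration. The following is a minimal sketch, not the package's actual CLI source: `build_parser` and `KNOWN_MODELS` are hypothetical names, and the choice list holds only the two models this README mentions, whereas the real list comes from MODEL_RATES.

```python
import argparse

# Assumption: only the two models named in this README.
# The real CLI takes its choices from MODEL_RATES.
KNOWN_MODELS = ["text-embedding-ada-002", "text-embedding-3-small"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the option table."""
    parser = argparse.ArgumentParser(
        prog="embed-cost",
        description="Estimate OpenAI embedding costs.",
    )
    parser.add_argument("--chunks", "-n", type=int, required=True,
                        help="Number of chunks for rough estimate")
    parser.add_argument("--chars", "-c", type=int, default=500,
                        help="Average characters per chunk")
    parser.add_argument("--model", "-m", choices=KNOWN_MODELS,
                        default="text-embedding-ada-002",
                        help="Embedding model to use")
    # argparse adds --help / -h automatically.
    return parser


# Only --chunks is required; the other options fall back to their defaults.
args = build_parser().parse_args(["-n", "1000"])
print(args.chunks, args.chars, args.model)
```

With only `-n 1000` supplied, the parsed namespace carries the defaults from the table (500 characters, text-embedding-ada-002).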
Examples:
1. Default model, custom sizes
embed-cost --chunks 1000 --chars 500
# Estimated embedding cost: $0.050000
2. Using a different model
embed-cost --chunks 500 --chars 300 --model text-embedding-3-small
# Estimated embedding cost: $0.003000
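To see where these figures come from, the chars/4 heuristic can be worked through by hand. The sketch below is an illustration, not the library's code: `rough_cost` is a hypothetical helper, and the two per-1K-token rates are back-derived from the sample outputs above rather than read from MODEL_RATES, so treat them as assumptions.

```python
def rough_cost(num_chunks: int, chars_per_chunk: int, usd_per_1k_tokens: float) -> float:
    """Estimate cost assuming roughly four characters per token."""
    est_tokens = num_chunks * chars_per_chunk / 4
    return est_tokens / 1000 * usd_per_1k_tokens


# Assumed rates, inferred from the example outputs (not taken from MODEL_RATES):
ADA_002_RATE = 0.0004    # $0.05 over 125,000 estimated tokens
SMALL_3_RATE = 0.00008   # $0.003 over 37,500 estimated tokens

# Example 1: 1000 chunks * 500 chars / 4 = 125,000 tokens
print(f"${rough_cost(1000, 500, ADA_002_RATE):.6f}")  # $0.050000

# Example 2: 500 chunks * 300 chars / 4 = 37,500 tokens
print(f"${rough_cost(500, 300, SMALL_3_RATE):.6f}")   # $0.003000
```

Because the estimate is linear in both chunk count and chunk size, doubling either one doubles the cost.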
Python API
You can call estimate_embedding_cost() in two mutually exclusive ways:
1. Rough estimate (simple chars/4 heuristic):
from embed_cost import estimate_embedding_cost
cost = estimate_embedding_cost(
    num_chunks=250,
    chunk_size_chars=400,
    model="text-embedding-3-small",
)
print(f"Rough cost: ${cost:.6f}")
2. Precise mode (exact token counts via tiktoken):
Pass your list of text chunks to have the tokens counted exactly with tiktoken instead of estimated.
from embed_cost import estimate_embedding_cost
# your pre-chunked list of text segments
chunked_docs = [
    "First chunk of text…",
    "Second chunk of text…",
    # …etc…
]
cost = estimate_embedding_cost(
    chunk_texts=chunked_docs,
    model="text-embedding-ada-002",
)
print(f"Precise cost: ${cost:.6f}")
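Conceptually, precise mode replaces the chars/4 guess with a real token count per chunk. The sketch below shows one way this could work and is not the library's implementation: `precise_cost`, `tiktoken_encoder`, and `whitespace_encode` are hypothetical names, and the rate passed in is a placeholder. The tokenizer is pluggable so the example runs without tiktoken installed.

```python
from typing import Callable, Iterable, Sequence


def precise_cost(
    chunk_texts: Iterable[str],
    encode: Callable[[str], Sequence[int]],
    usd_per_1k_tokens: float,
) -> float:
    """Sum the exact token count of every chunk and price the total."""
    total_tokens = sum(len(encode(text)) for text in chunk_texts)
    return total_tokens / 1000 * usd_per_1k_tokens


def tiktoken_encoder(model: str) -> Callable[[str], Sequence[int]]:
    """Return tiktoken's encode function for a model (needs `pip install tiktoken`)."""
    import tiktoken  # imported lazily so the sketch loads without it

    return tiktoken.encoding_for_model(model).encode


def whitespace_encode(text: str) -> Sequence[int]:
    """Stand-in tokenizer for the demo: one token per whitespace-separated word."""
    return list(range(len(text.split())))


chunks = ["First chunk of text", "Second chunk of text"]
# Placeholder rate; the real per-model rates live in MODEL_RATES.
cost = precise_cost(chunks, whitespace_encode, usd_per_1k_tokens=0.0004)
print(f"Cost with stand-in tokenizer: ${cost:.6f}")
```

For real counts, `tiktoken_encoder("text-embedding-ada-002")` can be passed in place of `whitespace_encode`.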
[!NOTE] You must pass either num_chunks (for a rough estimate) or chunk_texts (for a precise one), but not both. Omitting both, or passing a non-positive num_chunks, raises a ValueError.
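The argument rules in this note can be sketched as a small validation helper. This is an illustration only, not the library's code: `check_args` is a hypothetical name, and raising ValueError when both arguments are supplied is an assumption, since the note states only that the two must not be combined.

```python
from typing import Optional, Sequence


def check_args(
    num_chunks: Optional[int] = None,
    chunk_texts: Optional[Sequence[str]] = None,
) -> str:
    """Return the mode that applies, or raise ValueError for invalid input."""
    if num_chunks is not None and chunk_texts is not None:
        # Assumption: combining both arguments is rejected with ValueError.
        raise ValueError("Pass either num_chunks or chunk_texts, not both.")
    if chunk_texts is not None:
        return "precise"
    if num_chunks is None:
        raise ValueError("Pass num_chunks or chunk_texts.")
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive.")
    return "rough"


print(check_args(num_chunks=250))          # rough
print(check_args(chunk_texts=["a", "b"]))  # precise
```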
Example
1. Exact Token Count in Code
from embed_cost import estimate_embedding_cost
# assuming your document is already split:
chunked = ["Lorem ipsum…", "Dolor sit amet…"]  # …plus the rest of your chunks
cost = estimate_embedding_cost(
    chunk_texts=chunked,
)
print(cost) # e.g. 0.000320
Contributing
We welcome contributions!
- Fork the repo and create a feature branch.
- Run tests and lint locally:
  poetry install # or pip install -e .
  poetry run pytest -q # or pytest -q
  poetry run flake8 src tests
  poetry run black --check .
- Open a pull request against main.
- Maintain 100% test coverage for new code and adhere to Black/Flake8 style.
Please see CONTRIBUTING.md for more details.
License
MIT © Pragasen Naicker