gpt-batch-manager
Tools for splitting jobs across multiple OpenAI batches
The OpenAI Batch API is great for submitting batch jobs, but it has some strict batch queue token limits:
- gpt-4o: 90,000 tokens (Tier 1), 1.35M (Tier 2)
- gpt-4o-mini: 2,000,000 tokens (Tier 1), 20M (Tier 2)
If you want to submit larger batches than your current tier allows, you'll have to split the work into multiple batches. This project helps you do that by providing two tools:
gpt-batch-splitter: Takes one or more JSONL files and splits them into shards that fit within your tier's batch queue limit.
gpt-batch-manager: Submits multiple batches to OpenAI, one at a time.
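For reference, each line of an input JSONL file is a single request in the format the OpenAI Batch API expects; the custom_id and message content below are illustrative:

```
{"custom_id": "request-1", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello!"}]}}
```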
Usage
Install:
pip install gpt-batch-manager
To use the batch manager, you'll need to set an OPENAI_API_KEY environment variable. The best way to do this is by creating a .env file in your working directory. See the OpenAI Python API docs for details.
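The .env file only needs one line; the key value below is a placeholder:

```
OPENAI_API_KEY=sk-your-key-here
```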
Batch Splitter
Say your tasks are in largefile1.jsonl and largefile2.jsonl. You want to submit them to the OpenAI batch API for gpt-4o-mini and you're on Tier 1, so you need to make them fit within a 2M token limit.
To produce shards with fewer than 2M tokens, run:
gpt-batch-splitter 1900000 largefile1.jsonl largefile2.jsonl
This will estimate the number of tokens in each request and divvy them up accordingly. It will produce output files like:
shard-000-of-079.jsonl
shard-001-of-079.jsonl
...
shard-078-of-079.jsonl
Since the splitter has to estimate the number of tokens in each request, it's best to use a number somewhat below the limit.
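The greedy packing idea behind the splitter can be sketched as follows. This is a simplified illustration, not the package's actual code: the function names are made up, and the character-count heuristic stands in for whatever token estimator the real tool uses.

```python
import json

def estimate_tokens(request: dict) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # (Illustrative only; a real splitter would use an actual tokenizer.)
    return len(json.dumps(request)) // 4

def split_into_shards(requests, max_tokens):
    """Greedily pack requests into shards, each under max_tokens."""
    shards, current, current_tokens = [], [], 0
    for req in requests:
        tokens = estimate_tokens(req)
        if current and current_tokens + tokens > max_tokens:
            shards.append(current)
            current, current_tokens = [], 0
        current.append(req)
        current_tokens += tokens
    if current:
        shards.append(current)
    return shards
```

Because the estimate is approximate, passing a budget below the true limit (as in the 1,900,000 example above) leaves a safety margin.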
Batch Manager
To stay within your batch limit, you have to submit batch files one-by-one. This is what gpt-batch-manager does. Pass it a set of JSONL files:
gpt-batch-manager shard-???-of-???.jsonl
Make sure you have an OPENAI_API_KEY environment variable set (see above). This will report the status of each batch as it progresses. Output will eventually appear in:
shard-000-of-079.output.jsonl
shard-001-of-079.output.jsonl
...
This process is fully resumable. You can Ctrl-C it at any time and it will pick up where it left off. It stores state in /tmp/batch-status.json. If something goes wrong, you could try deleting that file to reset.
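The resume logic can be sketched roughly like this. It is a hypothetical illustration of checkpointing to a JSON state file: the function names and state schema are assumptions, not the package's actual internals.

```python
import json
import os

def load_state(path="/tmp/batch-status.json"):
    """Load previously saved batch state, or start fresh."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {}

def save_state(state, path="/tmp/batch-status.json"):
    """Persist state after every step so a Ctrl-C loses nothing."""
    with open(path, "w") as f:
        json.dump(state, f)

def next_shard(shards, state):
    """Return the first shard that hasn't completed yet, or None."""
    for shard in shards:
        if state.get(shard, {}).get("status") != "completed":
            return shard
    return None
```

Deleting the state file simply makes load_state start from scratch, which is why removing /tmp/batch-status.json resets the run.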
Project details
File details
Details for the file gpt_batch_manager-1.0.1.tar.gz.
File metadata
- Download URL: gpt_batch_manager-1.0.1.tar.gz
- Upload date:
- Size: 9.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.9 Darwin/23.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | e3dd418432e1ad9410fac6644039a09cea83f342c6b1a58bbf3bce5c2b457556
MD5 | d025f8e3bc9411da2e07030480f82a6e
BLAKE2b-256 | 96f5fa0a5c581f0dc1418550224245c3cf1d07117e5c3630c37792c148b7f113
File details
Details for the file gpt_batch_manager-1.0.1-py3-none-any.whl.
File metadata
- Download URL: gpt_batch_manager-1.0.1-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.9 Darwin/23.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2ad09eb7900ccdab2c64387ef099ec1ddb4a5756d04eb9843a65bb5449c38f5c
MD5 | 63813a1f70e194f712963041b6936ca9
BLAKE2b-256 | bda9ce24dab897fbea35817179701a82d3fc77479f0779814780ed4f830eef26