
gpt-batch-manager

Tools for splitting jobs across multiple OpenAI batches

The OpenAI Batch API is great for submitting batch jobs, but it enforces strict limits on how many tokens you can have enqueued at once:

  • gpt-4o: 90,000 tokens (Tier 1), 1,350,000 (Tier 2)
  • gpt-4o-mini: 2,000,000 tokens (Tier 1), 20,000,000 (Tier 2)

If you want to submit larger batches than your current tier allows, you'll have to split the work into multiple batches. This project helps you do that by providing two tools:

  1. gpt-batch-splitter: Take one or more JSONL files and split them into shards that fit within your tier's batch queue limit.
  2. gpt-batch-manager: Submit multiple batches to OpenAI, one at a time.

Usage

Install:

pip install gpt-batch-manager

To use the batch manager, you'll need to set an OPENAI_API_KEY environment variable. The best way to do this is by creating a .env file in your working directory. See the OpenAI Python API docs for details.
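For example, a minimal .env file needs just one line (the key shown is a placeholder):

OPENAI_API_KEY=sk-your-key-here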

Batch Splitter

Say your tasks are in largefile1.jsonl and largefile2.jsonl. You want to submit them to the OpenAI batch API for gpt-4o-mini and you're on Tier 1, so you need to make them fit within a 2M token limit.
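Each line in these files should be a standard Batch API request object, for example (the custom_id and body here are illustrative):

{"custom_id": "task-0", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Summarize this article..."}]}}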

To produce shards with fewer than 2M tokens, run:

gpt-batch-splitter 1900000 largefile1.jsonl largefile2.jsonl

This will estimate the number of tokens in each request and divvy them up accordingly. It will produce output files like:

shard-000-of-079.jsonl
shard-001-of-079.jsonl
...
shard-078-of-079.jsonl

Since the splitter has to estimate the number of tokens in each request, it's best to use a number somewhat below the limit.
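To get a feel for why this is an estimate, here's a rough sketch of how per-request token counts can be computed with tiktoken. The helper below is illustrative, not the splitter's actual code:

import json

import tiktoken

def estimate_request_tokens(line: str, model: str = "gpt-4o-mini") -> int:
    """Rough prompt-token count for one Batch API request line."""
    enc = tiktoken.encoding_for_model(model)
    body = json.loads(line)["body"]
    # Sum the tokens in each chat message. This undercounts slightly:
    # role markers and message separators add a few tokens per message,
    # which is one reason to leave headroom below the hard limit.
    return sum(len(enc.encode(m["content"])) for m in body.get("messages", []))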

Batch Manager

To stay within your batch queue limit, you have to submit the files one at a time, waiting for each batch to finish before starting the next. This is what gpt-batch-manager does. Pass it a set of JSONL files:

gpt-batch-manager shard-???-of-???.jsonl

Make sure you have an OPENAI_API_KEY environment variable set (see above). This will report the status of each batch as it progresses. Output will eventually appear in:

shard-000-of-079.output.jsonl
shard-001-of-079.output.jsonl
...
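Under the hood, each submission is a standard upload / create / poll cycle. Here's a sketch of one iteration using the openai Python SDK; the polling interval and error handling are simplified relative to what the tool actually does:

import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_one_batch(path: str) -> None:
    # Upload the shard, then enqueue it as a batch job.
    batch_file = client.files.create(file=open(path, "rb"), purpose="batch")
    batch = client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
    # Poll until the batch reaches a terminal state.
    while batch.status not in ("completed", "failed", "expired", "cancelled"):
        time.sleep(60)
        batch = client.batches.retrieve(batch.id)
    # Write the results next to the input shard.
    if batch.status == "completed":
        content = client.files.content(batch.output_file_id)
        with open(path.replace(".jsonl", ".output.jsonl"), "wb") as f:
            f.write(content.read())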

This process is fully resumable: you can Ctrl-C it at any time and it will pick up where it left off. State is stored in /tmp/batch-status.json; if something goes wrong, try deleting that file to reset.
