Quick CLI to test OpenAI-compatible API endpoints.

openai-tests

Stop hand-building one-off cURL probes for every OpenAI-compatible endpoint.

Quickstart | See It Work | Checks | Recipes | Documentation

openai-tests is a small CLI for proving whether an API really behaves like an OpenAI endpoint. It sends known-good requests, checks the response shape and content, compares related API surfaces, and can print the exact redacted HTTP exchange when something looks wrong.

If you only need one raw request, curl is still perfect. If you are validating a gateway, proxy, hosted model, local server, or OpenAI-compatible deployment more than once, this gives you repeatable smoke tests instead of a folder full of hand-edited JSON bodies.

Quickstart

git clone https://github.com/donadiosolutions/openai-tests.git
cd openai-tests
uv sync --all-groups

Run the fastest useful check against OpenAI:

export OPENAI_API_KEY="sk-..."
uv run openai-tests text-simple --model gpt-4.1-mini

Or point the same check at any compatible endpoint:

export OPENAI_TESTS_API_KEY="your-token"
uv run openai-tests text-simple \
  --base-url https://your-openai-compatible-service.example \
  --model your-model

Base URLs may include /v1 or omit it. Both https://example.test and https://example.test/v1 work.
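
Internally, that flexibility amounts to normalizing the base URL before any request is built. A minimal sketch of such normalization (`normalize_base_url` is an illustrative name, not the project's actual helper):

```python
def normalize_base_url(base_url: str) -> str:
    """Return a base URL with exactly one trailing /v1 segment and no trailing slash."""
    trimmed = base_url.rstrip("/")
    if trimmed.endswith("/v1"):
        return trimmed
    return trimmed + "/v1"

# Both spellings resolve to the same API root.
print(normalize_base_url("https://example.test"))      # https://example.test/v1
print(normalize_base_url("https://example.test/v1/"))  # https://example.test/v1
```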

[!IMPORTANT] Trust surface:

  • Endpoint tests read API keys from CLI flags or environment variables, send HTTP requests only to the configured --base-url, and redact Authorization in verbose output.
  • asr-simple uses checked-in MP3 fixtures by default and runs espeak-ng only when you supply custom text through --expected-transcript without --audio-file; synthesized WAV files are written to a temporary directory and removed after the run.
  • uv sync installs project dependencies into the local environment; uv run poe socket also runs npm ci from the checked-in lockfile for the pinned Socket CLI.
  • To remove a local checkout, delete the repository directory and any generated .venv or node_modules directories.

See It Work

$ uv run openai-tests text-simple --model gpt-4.1-mini
/v1/chat/completions: PASSED
Question: What is the capital of France?
Response: Paris is the capital of France.

/v1/responses: PASSED
Question: What is the capital of France?
Response: Paris is the capital of France.

Overall: PASSED

That run did more than check for HTTP 200. It asked the same simple question through both /v1/chat/completions and /v1/responses, extracted text from each response, verified the text was usable, and would have warned if the responses endpoint echoed important parameters differently.

When you need to inspect the actual payloads, add --verbose:

uv run openai-tests text-simple \
  --base-url https://your-openai-compatible-service.example \
  --model your-model \
  --verbose

Verbose mode prints the request URL, headers, JSON body, response status, response headers, and raw response body. Bearer tokens are redacted.
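
Redaction of this kind can be sketched as a simple header filter (illustrative only; `redact_headers` is a hypothetical helper, and the project's real redaction may differ in detail):

```python
def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Replace bearer-token values so verbose output never leaks secrets."""
    redacted = {}
    for name, value in headers.items():
        if name.lower() == "authorization":
            redacted[name] = "Bearer [REDACTED]"
        else:
            redacted[name] = value
    return redacted

print(redact_headers({"Authorization": "Bearer sk-secret", "Accept": "application/json"}))
# {'Authorization': 'Bearer [REDACTED]', 'Accept': 'application/json'}
```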

Checks

list-models
  Exercises: GET /v1/models
  Catches: malformed model-list responses, missing required fields, non-JSON responses, HTTP failures

text-simple
  Exercises: /v1/chat/completions and /v1/responses
  Catches: empty text, incompatible response shapes, parameter mismatches, unexpected tool-call-like output

asr-simple
  Exercises: /v1/chat/completions with audio input and /v1/audio/transcriptions
  Catches: missing transcripts, wrong transcript content, streaming/non-streaming shape issues, metadata mismatches

Each module is intentionally small. The point is not to benchmark model quality. The point is to answer: "Can this endpoint accept the same request shape my OpenAI client will send, and can I trust the response shape I get back?"

Recipes

List available models

uv run openai-tests list-models \
  --base-url https://api.openai.com

Output is a schema check plus the returned model IDs:

/v1/models: PASSED
Models:
- gpt-4.1-mini
- gpt-4.1
- gpt-4o-transcribe

Overall: PASSED

Compare chat completions and responses

uv run openai-tests text-simple \
  --base-url https://api.openai.com \
  --model gpt-4.1-mini

Use separate models when a provider routes the two APIs differently:

uv run openai-tests text-simple \
  --model gpt-4.1-mini \
  --responses-model gpt-4.1

Test speech recognition

uv run openai-tests asr-simple \
  --base-url https://api.openai.com \
  --model gpt-4o-audio-preview

If the transcriptions endpoint needs a different model than chat completions, pass --transcriptions-model explicitly.

By default, asr-simple sends two checked-in MP3 fixtures:

1. The NATO spelling alphabet, Alpha through Zulu
2. The quick brown fox jumps over the lazy dog

To test your own fixture:

uv run openai-tests asr-simple \
  --audio-file ./speech.wav \
  --audio-format wav \
  --expected-transcript \
  "Alpha Bravo Charlie Delta Echo Foxtrot Golf Hotel India Juliet"

To synthesize custom spoken text on demand with espeak-ng, omit --audio-file and provide only the transcript text:

uv run openai-tests asr-simple \
  --expected-transcript "Please transcribe this sentence exactly."

Pass provider-specific knobs

Optional API parameters stay unset until you pass them. JSON values can be inline or loaded from a file with @path.

uv run openai-tests text-simple \
  --responses-metadata-json '{"suite":"compatibility-smoke"}' \
  --responses-temperature 0
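
The @path convention can be sketched as a small loader (`load_json_argument` is a hypothetical name for illustration, not the project's actual function):

```python
import json
from pathlib import Path

def load_json_argument(raw: str):
    """Parse a JSON flag value; a leading '@' means 'read the JSON from this file'."""
    if raw.startswith("@"):
        raw = Path(raw[1:]).read_text(encoding="utf-8")
    return json.loads(raw)

print(load_json_argument('{"suite": "compatibility-smoke"}'))  # {'suite': 'compatibility-smoke'}
```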

Boolean parameters use paired flags, so you can distinguish "unset" from explicit true or false:

uv run openai-tests text-simple --responses-store
uv run openai-tests text-simple --no-responses-store
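
In Python's argparse, this tri-state pattern falls out of `BooleanOptionalAction` with a `None` default; the following is a sketch of the idea, not the project's actual parser code:

```python
import argparse

parser = argparse.ArgumentParser()
# default=None preserves the tri-state: unset vs. explicit True/False.
parser.add_argument("--responses-store", action=argparse.BooleanOptionalAction, default=None)

print(parser.parse_args([]).responses_store)                        # None (unset: omit from request)
print(parser.parse_args(["--responses-store"]).responses_store)     # True
print(parser.parse_args(["--no-responses-store"]).responses_store)  # False
```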

Status Labels

PASSED: the endpoint returned a usable response and no warnings were produced.
PARTIAL SUCCESS: the endpoint returned usable content, but a warning suggests compatibility drift.
FAILED: the request failed, the response shape was invalid, or the content check did not pass.

The CLI exits with 0 only when all checked endpoints pass. It exits with 1 for failures or partial successes, and 2 for local configuration errors such as invalid JSON arguments.
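
That contract can be summarized as a small mapping (a sketch; `exit_code` is an illustrative name, not the project's actual function):

```python
def exit_code(results, config_error=False):
    """Map run outcomes onto the CLI's documented exit-code contract."""
    if config_error:
        return 2  # local misuse, e.g. invalid JSON arguments
    if all(status == "PASSED" for status in results):
        return 0
    return 1  # any FAILED or PARTIAL SUCCESS result

print(exit_code(["PASSED", "PASSED"]))           # 0
print(exit_code(["PASSED", "PARTIAL SUCCESS"]))  # 1
```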

For ASR checks, each endpoint result also prints a word error rate summary in the form WER: <percent> (<errors>/<reference words>). The default acceptance rule allows the transcript to pass when either the expected-word threshold is met or the WER stays below 15%. Common NATO-style spelling variants such as viktor, whisky, charly, romeu, uniforme, yanke, and zooloo are normalized before scoring.
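
The standard word error rate is the word-level Levenshtein edit distance divided by the reference word count. A self-contained sketch of that computation (without the project's NATO-variant normalization):

```python
def word_error_rate(reference: str, hypothesis: str) -> tuple[float, int, int]:
    """Word-level Levenshtein distance over the reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    errors = dp[len(ref)][len(hyp)]
    return errors / max(len(ref), 1), errors, len(ref)

wer, errors, ref_words = word_error_rate("the quick brown fox", "the quick brown box")
print(f"WER: {wer:.0%} ({errors}/{ref_words})")  # WER: 25% (1/4)
```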

Configuration

Common options:

--api-key: falls back to OPENAI_API_KEY, then OPENAI_TESTS_API_KEY; default is no Authorization header
--base-url: falls back to OPENAI_BASE_URL, then OPENAI_TESTS_BASE_URL; default is https://api.openai.com
--model: falls back to OPENAI_MODEL, then OPENAI_TESTS_MODEL; default is module-specific
--timeout: no environment fallback; default is 30 seconds
--verbose: no environment fallback; off by default
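
The flag-then-environment fallback order can be sketched as follows (`resolve_option` is a hypothetical helper, not the project's actual code):

```python
import os

def resolve_option(flag_value, env_names, default=None):
    """CLI flag wins; otherwise try each environment variable in order; else the default."""
    if flag_value is not None:
        return flag_value
    for name in env_names:
        value = os.environ.get(name)
        if value:
            return value
    return default

os.environ.pop("OPENAI_BASE_URL", None)
os.environ["OPENAI_TESTS_BASE_URL"] = "https://example.test"
print(resolve_option(None, ["OPENAI_BASE_URL", "OPENAI_TESTS_BASE_URL"],
                     "https://api.openai.com"))  # https://example.test
```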

The live integration runner also loads OPENAI_API_KEY from a repository-root .env file before falling back to the inherited environment.

How It Works

The CLI keeps request construction explicit and inspectable. Modules use direct HTTP requests from the Python standard library rather than an SDK, so the payloads stay close to the API surface being tested.

  • Required endpoint fields receive conservative defaults.
  • Optional endpoint fields remain None until the user passes them.
  • None values are pruned before JSON or multipart requests are sent.
  • String-or-object API parameters expose both plain string flags and -json flags.
  • Full HTTP exchanges are captured for verbose output.
  • Secrets are redacted before printing.
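
The None-pruning step can be sketched as a small recursive filter (illustrative, not the project's actual code):

```python
def prune_none(payload):
    """Drop None-valued dict entries recursively so unset options never reach the wire."""
    if isinstance(payload, dict):
        return {k: prune_none(v) for k, v in payload.items() if v is not None}
    if isinstance(payload, list):
        return [prune_none(v) for v in payload]
    return payload

body = {"model": "your-model", "temperature": None,
        "messages": [{"role": "user", "content": "hi"}]}
print(prune_none(body))
# {'model': 'your-model', 'messages': [{'role': 'user', 'content': 'hi'}]}
```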

The module registry lives in src/openai_tests/registry.py. New endpoint checks belong under src/openai_tests/test_modules/ and are documented under docs/.
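
A registry of this shape can be as simple as a name-to-module mapping; the following is an illustrative sketch of the pattern, not the contents of registry.py:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class TestModule:
    name: str
    description: str
    run: Callable[..., str]  # returns "PASSED", "PARTIAL SUCCESS", or "FAILED"

REGISTRY: dict[str, TestModule] = {}

def register(module: TestModule) -> None:
    """Make a module addressable as a CLI subcommand by its name."""
    REGISTRY[module.name] = module

register(TestModule("list-models", "GET /v1/models schema check",
                    run=lambda **kwargs: "PASSED"))
print(sorted(REGISTRY))  # ['list-models']
```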

Development and CI

Run the standard local checks before merging changes:

uv run poe fmt
uv run poe check
uv run poe socket

uv run poe check runs formatting checks, Ruff linting, type checking, actionlint, unit tests, live OpenAI integration tests, coverage validation, and pre-commit hooks. The repository requires 100% line and branch coverage.

uv run poe socket installs the pinned Socket CLI from package-lock.json, generates CycloneDX manifests, and runs an authenticated read-only Socket scan preflight. It requires SOCKET_API_KEY, SOCKET_API_TOKEN, or SOCKET_CLI_API_TOKEN.

GitHub Actions runs the unit and integration jobs in parallel, then a validate job succeeds only when both have passed. Socket's GitHub App publishes separate required dependency-security checks.

Documentation

FAQ

Can I use it against a local service with no auth?

Yes. If no API key is provided through --api-key, OPENAI_API_KEY, or OPENAI_TESTS_API_KEY, no Authorization header is sent.

Is this a replacement for a full API conformance suite?

No. It is a focused smoke-test tool. It is meant to catch obvious request/response incompatibilities quickly and repeatedly.

Why not use the OpenAI SDK?

The tests deliberately use direct HTTP requests so the request body, endpoint URL, response status, and raw response are easy to inspect.

License

MIT. See LICENSE.
