Unified multimodal multi-provider Python toolkit for text, audio, image, and video generation.
easy-ai-clients
easy-ai-clients is a typed, multimodal Python library that wraps multiple AI providers behind a single public API for:
- Text generation — chat, instructions, reasoning, tool use, batch, optional image inputs (vision)
- Speech transcription — audio to text with word timings and speaker diarization
- Speech synthesis — text to spoken audio
- Music generation — text prompt to instrumental audio
- Image generation — text to image
- Image transformation — prompt + input image to new image
- Image composition — prompt + base image + reference image to a new image
- Image editing — inpainting with optional mask
- Video generation — text/image/audio to video
- Lip sync — animate a face image with an audio track
All operations are available in synchronous and asynchronous variants. Provider names are resolved through a flexible alias system, so "gemini", "google", and "google-ai" all map to the same adapter.
Requirements
- Python 3.12 or higher
- Dependencies installed automatically: httpx, pydantic, Pillow
Install
pip install easy-ai-clients
For local development (includes test and lint tools):
pip install -e ".[dev]"
Checking the version
import easy_ai_clients
print(easy_ai_clients.__version__)
Configuration
Credentials are resolved lazily, in this order:
- Explicit credentials={...} values passed to a call or to EasyAiClient(...).
- Environment variables from the current process.
easy-ai-clients does not auto-load .env files, keeping imports side-effect free.
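The lookup order above can be pictured with a minimal standard-library sketch. The function name resolve_credential and its parameters are hypothetical and only illustrate the "explicit first, environment second" rule; the library's internal implementation may differ.

```python
import os
from typing import Mapping, Optional


def resolve_credential(
    name: str,
    explicit: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty value for `name` (hypothetical helper).

    Explicit credentials win; the process environment is the fallback.
    """
    env = os.environ if environ is None else environ
    for source in (explicit or {}, env):
        value = source.get(name)
        if value:
            return value
    return None  # nothing configured for this name
```

Because the lookup only runs when a value is requested, nothing is read at import time, which matches the lazy resolution described above.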
Environment variable setup
cp .env.example .env
# Edit .env and fill only the providers you plan to use
export OPENAI_API_KEY="sk-..."
export GOOGLE_API_KEY="..."
If your application uses python-dotenv, load it in the application entrypoint — not inside the library:
from dotenv import load_dotenv
load_dotenv()
See docs/configuration.md for full details and per-call credential override patterns.
Quickstart
Text generation
from easy_ai_clients.text import generate

result = generate(
    provider="openai",
    instructions="Write a short slogan about rain in one sentence.",
    credentials={"OPENAI_API_KEY": "sk-..."},
)
print(result.text)
print(result.cost_usd)
Text generation with image input (vision)
Pass one or more images via input_images to describe them, transcribe text in the image, or answer questions grounded in visual content. Accepts local paths, raw bytes, or base64 strings.
from easy_ai_clients.text import generate

result = generate(
    provider="openai",
    instructions="Describe the image in one concise paragraph.",
    input_images=["/path/to/photo.jpg"],
    credentials={"OPENAI_API_KEY": "sk-..."},
)
print(result.text)
Vision-capable providers include openai, google, anthropic, and any provider whose selected model supports image inputs.
Speech transcription
from easy_ai_clients.audio import transcribe

result = transcribe(
    provider="deepgram",
    audio="/path/to/recording.mp3",
    credentials={"DEEPGRAM_API_KEY": "..."},
)
print(result.text)
for word in result.words:
    print(word.text, word.start_seconds, word.end_seconds)
for segment in result.speaker_segments:
    print(segment.speaker, segment.text)
Speech synthesis
import base64

from easy_ai_clients.audio import synthesize

result = synthesize(
    provider="elevenlabs",
    text="Hello, this is a test of text-to-speech synthesis.",
    credentials={"ELEVENLABS_API_KEY": "..."},
)
audio_bytes = base64.b64decode(result.audio_base64)
with open("output.mp3", "wb") as f:
    f.write(audio_bytes)
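The decode-and-write steps above recur for every result that carries base64 data (audio_base64, image_base64). A small helper can wrap them; the sketch below uses only the standard library, and save_base64 is a hypothetical name, not part of easy-ai-clients.

```python
import base64
from pathlib import Path
from typing import Union


def save_base64(data: str, path: Union[str, Path]) -> Path:
    """Decode a base64 string and write the raw bytes to `path`.

    Hypothetical convenience helper; returns the path that was written.
    """
    target = Path(path)
    target.write_bytes(base64.b64decode(data))
    return target
```

With such a helper, the last three lines of the example would shrink to a single call such as save_base64(result.audio_base64, "output.mp3").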
Music generation
import base64

from easy_ai_clients.audio import compose

result = compose(
    provider="stability",
    prompt="Upbeat lo-fi hip-hop, 80 BPM, mellow piano and drums",
    duration_seconds=30,
    credentials={"STABILITY_API_KEY": "..."},
)
audio_bytes = base64.b64decode(result.audio_base64)
with open("music.mp3", "wb") as f:
    f.write(audio_bytes)
Image generation
import base64

from easy_ai_clients.image import generate

result = generate(
    provider="openai",
    prompt="A cinematic lighthouse in a storm, dramatic lighting",
    credentials={"OPENAI_API_KEY": "sk-..."},
)
image_bytes = base64.b64decode(result.image_base64)
with open("output.png", "wb") as f:
    f.write(image_bytes)
Image transformation
Takes an existing image and transforms it according to a prompt.
import base64

from easy_ai_clients.image import transform

result = transform(
    provider="stability",
    prompt="Turn this photo into a watercolor painting",
    image="/path/to/photo.jpg",
    strength=0.75,
    credentials={"STABILITY_API_KEY": "..."},
)
image_bytes = base64.b64decode(result.image_base64)
with open("transformed.png", "wb") as f:
    f.write(image_bytes)
Image composition
Combines a base image with a reference image using a prompt. Use it to apply the style, pose, or scene of the reference image to the subject of the base image. Typical prompts: "render the person of image 1 in the pose of image 2", "image 1 in the style of image 2", "place the subject of image 1 in the scene of image 2".
import base64

from easy_ai_clients.image import compose

result = compose(
    provider="google",
    prompt="Render the person of image 1 in the painting style of image 2",
    image="/path/to/subject.jpg",
    reference_image="/path/to/style_reference.jpg",
    credentials={"GOOGLE_API_KEY": "..."},
)
image_bytes = base64.b64decode(result.image_base64)
with open("composed.png", "wb") as f:
    f.write(image_bytes)
Image editing
Edits specific regions of an image using a prompt. Pass a mask to limit edits to the masked area.
import base64

from easy_ai_clients.image import edit

result = edit(
    provider="openai",
    prompt="Replace the sky with a dramatic sunset",
    image="/path/to/photo.png",
    mask="/path/to/mask.png",  # black pixels = edited (inpainted), white pixels = preserved
    credentials={"OPENAI_API_KEY": "sk-..."},
)
image_bytes = base64.b64decode(result.image_base64)
with open("edited.png", "wb") as f:
    f.write(image_bytes)
Video generation
Video files are written directly to a local path. Long-running jobs are polled until completion.
from easy_ai_clients.video import generate

result = generate(
    provider="runway",
    prompt="A slow pan across a misty mountain valley at dawn",
    output_path="output.mp4",
    credentials={"RUNWAYML_API_SECRET": "..."},
)
print(result.output_path)  # pathlib.Path to the written file
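Polling a long-running job until completion generally follows the pattern sketched below. This is a generic illustration, not the library's code: poll_until_done, the get_status callback, and the status strings "succeeded" and "failed" are all assumptions made for the example.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],
    timeout_seconds: float = 900.0,
    interval_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call `get_status` repeatedly until it reports a terminal state.

    Hypothetical helper: the terminal state names are assumed, and a
    TimeoutError is raised once the deadline passes without one.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("succeeded", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds}s")
        sleep(interval_seconds)  # wait before asking the provider again
```

Injecting the sleep function keeps the loop easy to test without real waiting.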
To generate a video driven by both a reference image and an audio track:
from easy_ai_clients.video import generate

result = generate(
    provider="heygen",
    image="/path/to/avatar.jpg",
    audio="/path/to/voiceover.mp3",
    output_path="avatar_video.mp4",
    credentials={"HEYGEN_API_KEY": "..."},
)
Lip sync
Animate a still image or avatar portrait using a supplied audio track.
from easy_ai_clients.video import lipsync

result = lipsync(
    provider="heygen",
    image="/path/to/face.jpg",
    audio="/path/to/speech.mp3",
    output_path="lipsync.mp4",
    credentials={"HEYGEN_API_KEY": "..."},
)
print(result.output_path)
Stateful client
Use EasyAiClient to share credentials and default settings across many calls:
from easy_ai_clients import EasyAiClient

client = EasyAiClient(
    credentials={"OPENAI_API_KEY": "sk-..."},
    timeout_seconds=90,
    job_timeout_seconds=900,
    max_retries=4,
)

text_result = client.text.generate(
    provider="openai",
    instructions="Summarize this in 3 bullets.",
    context={"topic": "multimodal AI libraries"},
)

image_result = client.image.generate(
    provider="openai",
    prompt="Abstract geometric art in blue and gold",
)

composed = client.image.compose(
    provider="google",
    prompt="Put the person of image 1 in the scene of image 2",
    image="/path/to/subject.jpg",
    reference_image="/path/to/scene.jpg",
)
Async example
Every helper has an _async variant. Combine them with asyncio.gather for concurrent requests:
import asyncio

from easy_ai_clients.image import generate_async
from easy_ai_clients.text import generate_async as text_generate_async


async def main() -> None:
    text_task = text_generate_async(
        provider="openai",
        instructions="Write a one-sentence product description for a smart umbrella.",
        credentials={"OPENAI_API_KEY": "sk-..."},
    )
    image_task = generate_async(
        provider="openai",
        prompt="A smart umbrella with solar panels and an LED display",
        credentials={"OPENAI_API_KEY": "sk-..."},
    )
    text_result, image_result = await asyncio.gather(text_task, image_task)
    print(text_result.text)
    print(image_result.image_base64[:32])


asyncio.run(main())
Public API
easy_ai_clients.text
from easy_ai_clients.text import batch_generate, batch_generate_async, generate, generate_async
from easy_ai_clients.models import TextGenerationRequest, TextGenerationResult
| Function | Description |
|---|---|
| generate(provider, instructions, ...) | Generate text from a single prompt. Accepts input_images for vision. |
| generate_async(...) | Async variant. |
| batch_generate(requests, ...) | Run multiple requests to the same provider concurrently. |
| batch_generate_async(...) | Async variant. |
easy_ai_clients.audio
from easy_ai_clients.audio import (
    compose, compose_async,
    synthesize, synthesize_async,
    transcribe, transcribe_async,
)
from easy_ai_clients.models import (
    MusicGenerationRequest, MusicGenerationResult,
    SpeechSynthesisRequest, SpeechSynthesisResult,
    SpeechTranscriptionRequest, SpeechTranscriptionResult,
)
| Function | Description |
|---|---|
| transcribe(provider, audio, ...) | Transcribe audio to text with word timings and speaker diarization. |
| synthesize(provider, text, ...) | Synthesize speech audio from text. |
| compose(provider, prompt, ...) | Generate instrumental music from a text prompt. |
easy_ai_clients.image
from easy_ai_clients.image import (
    compose, compose_async,
    edit, edit_async,
    generate, generate_async,
    transform, transform_async,
)
from easy_ai_clients.models import (
    ImageCompositionRequest,
    ImageEditRequest,
    ImageGenerationRequest,
    ImageResult,
    ImageTransformationRequest,
)
| Function | Description |
|---|---|
| generate(provider, prompt, ...) | Generate an image from a text prompt. |
| transform(provider, prompt, image, ...) | Transform an input image guided by a prompt. |
| compose(provider, prompt, image, reference_image, ...) | Combine a base image with a reference image using a prompt. |
| edit(provider, prompt, image, ...) | Edit an image region with an optional mask. |
easy_ai_clients.video
from easy_ai_clients.video import generate, generate_async, lipsync, lipsync_async
from easy_ai_clients.models import LipSyncRequest, VideoGenerationRequest, VideoResult
| Function | Description |
|---|---|
| generate(provider, output_path, ...) | Generate a video. Accepts optional prompt, image, and audio. |
| lipsync(provider, image, audio, output_path, ...) | Animate a face image with a supplied audio track. |
Exceptions
from easy_ai_clients.exceptions import (
    ConfigurationError,
    EasyAiClientError,
    IncompatibleParameterError,
    InvalidParameterError,
    InvalidProviderResponseError,
    JobFailedError,
    MissingCredentialError,
    PricingUnavailableError,
    ProviderTimeoutError,
    TemporaryDownloadError,
    UnsupportedModelError,
    UnsupportedProviderError,
)
See docs/errors.md for descriptions and handling patterns.
Provider matrix
Text
| Provider | Env var |
|---|---|
| openai | OPENAI_API_KEY |
| groq | GROQ_API_KEY |
| together | TOGETHER_API_KEY |
| fireworks | FIREWORKS_API_KEY |
| deepseek | DEEPSEEK_API_KEY |
| openrouter | OPENROUTER_API_KEY |
| xai | XAI_API_KEY |
| mistral | MISTRAL_API_KEY |
| anthropic | ANTHROPIC_API_KEY |
| google | GOOGLE_API_KEY |
| cohere | COHERE_API_KEY |
| perplexity | PERPLEXITY_API_KEY |
| deepinfra | DEEPINFRA_API_KEY |
| huggingface | HUGGINGFACE_API_KEY |
Audio — transcription
| Provider | Env var |
|---|---|
| deepgram | DEEPGRAM_API_KEY |
| assemblyai | ASSEMBLYAI_API_KEY |
| speechmatics | SPEECHMATICS_API_KEY |
| revai | REVAI_API_KEY |
Audio — synthesis
| Provider | Env var |
|---|---|
| cartesia | CARTESIA_API_KEY |
| azure | AZURE_SPEECH_API_KEY, AZURE_SPEECH_REGION |
| hume | HUME_API_KEY |
| elevenlabs | ELEVENLABS_API_KEY |
| murf | MURF_API_KEY |
Audio — music
| Provider | Env var |
|---|---|
| google | GOOGLE_API_KEY |
| elevenlabs | ELEVENLABS_API_KEY |
| stability | STABILITY_API_KEY |
| beatoven | BEATOVEN_API_KEY |
| loudly | LOUDLY_API_KEY |
Image — generate
| Provider | Env var |
|---|---|
| openai | OPENAI_API_KEY |
| google | GOOGLE_API_KEY |
| bfl | BFL_API_KEY |
| ideogram | IDEOGRAM_API_KEY |
| stability | STABILITY_API_KEY |
| hedra | HEDRA_API_KEY |
Image — transform
| Provider | Env var |
|---|---|
| openai | OPENAI_API_KEY |
| google | GOOGLE_API_KEY |
| bfl | BFL_API_KEY |
| ideogram | IDEOGRAM_API_KEY |
| stability | STABILITY_API_KEY |
Image — compose
| Provider | Env var |
|---|---|
| google | GOOGLE_API_KEY |
| bfl | BFL_API_KEY |
Image — edit
| Provider | Env var |
|---|---|
| openai | OPENAI_API_KEY |
| google | GOOGLE_API_KEY |
| bfl | BFL_API_KEY |
| ideogram | IDEOGRAM_API_KEY |
| stability | STABILITY_API_KEY |
Video — generate without audio
| Provider | Env var |
|---|---|
| runway | RUNWAYML_API_SECRET |
| luma | LUMA_API_KEY |
| fal | FAL_KEY |
| hedra | HEDRA_API_KEY |
Video — generate with audio
| Provider | Env var |
|---|---|
| google | GOOGLE_API_KEY |
| heygen | HEYGEN_API_KEY |
| did | DID_API_KEY |
| hedra | HEDRA_API_KEY |
Video — lip sync
| Provider | Env var |
|---|---|
| heygen | HEYGEN_API_KEY |
| did | DID_API_KEY |
| hedra | HEDRA_API_KEY |
Error handling
easy-ai-clients uses typed exceptions so your code can handle failures intentionally.
from easy_ai_clients.exceptions import MissingCredentialError, UnsupportedProviderError
from easy_ai_clients.text import generate

try:
    result = generate(provider="openai", instructions="Write one sentence.")
except MissingCredentialError as exc:
    # exc.provider: the canonical provider name
    # exc.env_vars: tuple of env var names that must be set
    print(f"Missing credentials for {exc.provider}: {exc.env_vars}")
except UnsupportedProviderError as exc:
    print(f"Unknown provider: {exc}")
A MissingCredentialError for one provider does not affect other providers. Package imports and unrelated operations remain fully usable.
See docs/errors.md for the full exception hierarchy and handling recommendations.
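Since a credential failure is scoped to one provider, an application can try several providers in order and use the first one that is configured. The sketch below shows one possible shape for that; first_configured is a hypothetical helper, and the exception type to skip is passed in so the block stays self-contained.

```python
from typing import Callable, Optional, Sequence, Type, TypeVar

T = TypeVar("T")


def first_configured(
    providers: Sequence[str],
    call: Callable[[str], T],
    skip_on: Type[Exception],
) -> T:
    """Try each provider in order, skipping those that raise `skip_on`.

    Hypothetical helper: re-raises the last skipped error if none succeed.
    """
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return call(provider)
        except skip_on as exc:
            last_error = exc  # remember why this provider was skipped
    if last_error is None:
        raise ValueError("providers must not be empty")
    raise last_error
```

With the library, skip_on would presumably be MissingCredentialError and call something like lambda p: generate(provider=p, instructions="Write one sentence.").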
Provider aliases
Many providers accept common aliases:
| Alias | Resolves to |
|---|---|
| gemini, google-ai, imagen, veo, lyria | google |
| flux, black-forest-labs | bfl |
| mistralai | mistral |
| runwayml, runway-ml | runway |
| lumalabs, luma-dream-machine | luma |
| hey-gen | heygen |
| d-id | did |
| eleven-labs | elevenlabs |
| assembly-ai | assemblyai |
| pplx | perplexity |
| deep-infra | deepinfra |
| hugging-face, hf | huggingface |
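Conceptually, alias resolution is a dictionary lookup that falls back to the name itself. The sketch below uses a few entries from the table; ALIASES and resolve_provider are hypothetical names, and the trimming and lower-casing step is an assumption rather than documented behavior.

```python
# A subset of the alias table above, keyed by alias (illustrative only).
ALIASES = {
    "gemini": "google",
    "google-ai": "google",
    "flux": "bfl",
    "runwayml": "runway",
    "hf": "huggingface",
}


def resolve_provider(name: str) -> str:
    """Map an alias to its canonical provider name (hypothetical helper).

    Names that are not aliases are returned normalized but otherwise unchanged.
    """
    key = name.strip().lower()  # assumed normalization, not confirmed by the docs
    return ALIASES.get(key, key)
```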
Additional docs
- docs/configuration.md: credential resolution, python-dotenv integration, per-call overrides
- docs/providers.md: complete credential tables for every provider and operation
- docs/errors.md: exception hierarchy, handling patterns, retry guidance
Contributing
See CONTRIBUTING.md for development setup, running tests, and the release process.