TransAI
AI library and helpers (Python/Poetry/Typer - LM Studio & llama.cpp).
- Primary use case: Python API/interface with local AI models
- Works with: local AI models via LM Studio or llama.cpp
- Status: stable
- License: Apache-2.0
Since version 1.0.0 it is a PyPI package: https://pypi.org/project/transai/
License
Copyright 2025 Daniel Balparda balparda@github.com
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Third-party notices
This project depends on third-party software. Key runtime dependencies:
- transcrypto (Apache-2.0) — CLI modules, logging, utilities
- llama-cpp-python (MIT) — llama.cpp Python bindings
- lmstudio — LM Studio client library
- Pillow (MIT-CMU) — image processing
- pydantic (MIT) — data validation and JSON schema
See pyproject.toml for the full dependency list.
Installation
To use in your project:
pip3 install transai
and then import the library:
from transai.core import ai, lms, llama
from transai.utils import images
For the CLI tool, after installation just run:
transai --help
Supported platforms
- OS: Linux, macOS, Windows (wherever `llama-cpp-python` and `lmstudio` are supported)
- Architectures: x86_64, arm64
- Python: 3.12+
Known dependencies (Prerequisites)
- python 3.12+ — documentation
- transcrypto 2.5+ — CLI modules, logging, humanization, config management, etc. — documentation
- Pillow 12.2+ — image processing and format conversion
- pydantic 2.12+ — data validation and JSON schema generation
- llama-cpp-python 0.3.20+ — llama.cpp Python bindings for local GGUF model inference
- lmstudio 1.5+ — LM Studio client library for the LM Studio API
- rich — terminal output formatting (via transcrypto)
- typer — CLI framework (via transcrypto)
What TransAI is
TransAI is a Python library and CLI tool that provides a unified interface for running local AI models through two backends:
- LM Studio (`LMStudioWorker`): connects to a running LM Studio server on localhost via the `lmstudio` client library. This is the recommended and default backend.
- llama.cpp (`LlamaWorker`): loads GGUF model files directly into memory using `llama-cpp-python`. Useful when you want full control without running an LM Studio server.
Both backends share the same abstract interface (AIWorker), so you can swap backends without changing your application code. Models can be queried with plain text prompts or with structured output (Pydantic models), vision models can process images, and tool-capable models can call Python functions.
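This backend-swap pattern is standard Python abstract-base-class design. The following is a minimal, self-contained sketch of the idea only; the `Echo*Worker` classes and the single-method interface are hypothetical simplifications, and the real `AIWorker` interface is richer:

```python
import abc


class AIWorkerSketch(abc.ABC):
  """Simplified stand-in for transai's AIWorker interface (illustration only)."""

  @abc.abstractmethod
  def ModelCall(self, user_prompt: str) -> str:
    """Query the model and return its text response."""


class EchoLMStudioWorker(AIWorkerSketch):
  """Pretend LM Studio backend: just echoes, tagged with its backend name."""

  def ModelCall(self, user_prompt: str) -> str:
    return f'[lmstudio] {user_prompt}'


class EchoLlamaWorker(AIWorkerSketch):
  """Pretend llama.cpp backend: same interface, different implementation."""

  def ModelCall(self, user_prompt: str) -> str:
    return f'[llama.cpp] {user_prompt}'


def Ask(worker: AIWorkerSketch, prompt: str) -> str:
  # application code depends only on the abstract interface,
  # so either backend can be plugged in without changes here
  return worker.ModelCall(prompt)


print(Ask(EchoLMStudioWorker(), 'hi'))  # [lmstudio] hi
print(Ask(EchoLlamaWorker(), 'hi'))     # [llama.cpp] hi
```

Because `Ask()` only knows about the abstract type, swapping backends is a one-line change at the call site, which is exactly the property the shared `AIWorker` interface gives you.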
What TransAI is not
- Not a cloud AI service — it only works with local models
- Not a model downloader — you must have models available locally (via LM Studio or as GGUF files)
- Not a training framework — inference only
- Not a high-level agent framework — it provides the low-level model interface layer
Key concepts and terminology
- AIWorker: abstract base class defining the interface for loading and querying AI models
- LMStudioWorker: concrete worker that connects to a local LM Studio server
- LlamaWorker: concrete worker that loads GGUF files directly via llama.cpp
- AIModelConfig: TypedDict with all model loading parameters (context, temperature, GPU, seed, etc.)
- Model ID: a string identifying the model, typically in the format `model-name@quantization` (e.g., `qwen3-8b@Q8_0`); should match what you would use with `lms get <model_id>` or `https://huggingface.co/<model_id>`
- GGUF: the quantized model file format used by llama.cpp
- CLIP projector: a companion model file enabling vision capabilities in multi-modal models
- Speculative decoding: a technique for faster inference by generating multiple tokens in parallel
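To make the `AIModelConfig` concept concrete, here is an illustrative (not authoritative) reconstruction of the defaults-plus-overrides pattern that `MakeAIModelConfig()` implements. The field names come from this README; the default values below are hypothetical, chosen to mirror the CLI defaults documented later, and the real defaults in `transai` may differ:

```python
from typing import Any, TypedDict


class AIModelConfigSketch(TypedDict, total=False):
  """Illustrative subset of the AIModelConfig fields named in this README."""
  model_id: str
  vision: bool
  temperature: float
  seed: int
  context: int
  gpu_ratio: float


# hypothetical defaults, for illustration only
_DEFAULTS: AIModelConfigSketch = {
    'model_id': 'qwen3-8b@Q8_0',
    'vision': False,
    'temperature': 0.15,
    'context': 32768,
    'gpu_ratio': 0.80,
}


def MakeConfigSketch(**overrides: Any) -> AIModelConfigSketch:
  """Start from the defaults and apply only the caller's overrides."""
  # TypedDicts are plain dicts at runtime, so a dict merge does the job
  config: AIModelConfigSketch = {**_DEFAULTS, **overrides}
  return config


cfg = MakeConfigSketch(temperature=0.5, vision=True)
print(cfg['temperature'], cfg['context'])  # 0.5 32768
```

The point is that callers only name the fields they care about and every other field still arrives populated, which is why the library examples below always go through the convenience constructor.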
Known limitations
- LM Studio backend requires a running LM Studio server on localhost (127.0.0.1)
- llama.cpp backend requires GGUF model files on disk
- Vision support in llama.cpp depends on CLIP projector file availability and supported architectures (Qwen2-VL, MiniCPM, Llama3-Vision, Moondream, NanoLLava, Obsidian, Llava)
- No telemetry, no network calls beyond localhost (LM Studio server)
Library API usage
Loading a model
transai.core.ai exposes a convenience constructor MakeAIModelConfig(**overrides) which
returns a fully-populated AIModelConfig TypedDict with sensible defaults.
from transai.core import ai, lms, llama
# --- Using LM Studio ---
with lms.LMStudioWorker() as worker:
  config, metadata = worker.LoadModel(ai.MakeAIModelConfig(
      model_id='qwen3-vl-32b-instruct@Q8_0',
      vision=True,
      temperature=0.5,  # only override the ones you care about!
      # all other fields will have sensible defaults; currently also supported are:
      # seed, context, gpu_ratio, gpu_layers, use_mmap, fp16, flash, spec_tokens, kv_cache
  ))
  # ... use worker.ModelCall() ...

# --- Using llama.cpp ---
import pathlib

with llama.LlamaWorker(pathlib.Path('~/.lmstudio/models/')) as worker:
  config, metadata = worker.LoadModel(ai.MakeAIModelConfig(
      model_id='qwen3-8b@Q8_0',
      # ... same config field possibilities ...
  ))
  # ... use worker.ModelCall() ...
Querying a model (text)
response: str = worker.ModelCall(
    model_id='qwen3-8b@Q8_0',
    system_prompt='You are a helpful assistant.',
    user_prompt='What is the capital of France?',
    output_format=str,
)
print(response)  # "The capital of France is Paris."
Querying a model (structured JSON)
To get a structured object back from the model, just create a pydantic.BaseModel class as shown below. Make sure to add docstrings and pydantic.Field descriptions to the fields, as all of this information (names, types, descriptions) is sent to the model.
import pydantic


class CityInfo(pydantic.BaseModel):
  """City information."""

  city: str = pydantic.Field(description='city name')
  country: str = pydantic.Field(description='country name')
  population: int = pydantic.Field(description='city population')
  districts: list[str] = pydantic.Field(description='list of city district names')


result: CityInfo = worker.ModelCall(
    model_id='qwen3-8b@Q8_0',
    system_prompt='Extract city information: its country, population, and list of districts.',
    user_prompt='Tell me about Paris, France.',
    output_format=CityInfo,
)
print(result.city)        # "Paris"
print(result.population)  # 2161000
Vision models (images)
import pathlib

response: str = worker.ModelCall(
    model_id='qwen3-vl-32b-instruct@Q8_0',
    system_prompt='Describe what you see.',
    user_prompt='What is in this image?',
    output_format=str,
    images=[pathlib.Path('photo.jpg')],  # or raw bytes, or file path string
)
Images are automatically resized to fit within 1024px (longest edge) before being sent to the model.
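The fit-within-1024px rule is plain aspect-ratio arithmetic. A self-contained sketch of the dimension calculation (the actual resize code inside `transai` may differ in rounding details; `fit_longest_edge` is a hypothetical helper name):

```python
def fit_longest_edge(width: int, height: int, max_edge: int = 1024) -> tuple[int, int]:
  """Scale (width, height) down so the longest edge is at most max_edge."""
  longest = max(width, height)
  if longest <= max_edge:
    return (width, height)  # already small enough: never upscale
  scale = max_edge / longest
  # round, and clamp to at least 1 pixel per side for degenerate aspect ratios
  return (max(1, round(width * scale)), max(1, round(height * scale)))


print(fit_longest_edge(2048, 1536))  # (1024, 768)
print(fit_longest_edge(800, 600))    # (800, 600)
```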
Tool use (function calling)
Pass Python callables (or fully-qualified dotted names) as tools. The model may invoke them during the conversation and TransAI handles the execution round-trip automatically:
import math


def celsius_to_fahrenheit(celsius: float) -> float:
  """Convert Celsius to Fahrenheit.

  Args:
    celsius: temperature in Celsius

  Returns:
    temperature in Fahrenheit
  """
  return celsius * 9 / 5 + 32


# tools must be a list of callables; the model may call them zero or more times
response: str = worker.ModelCall(
    model_id='qwen3-8b@Q8_0',
    system_prompt='You are a helpful assistant.',
    user_prompt='What is 23°C in Fahrenheit? Also, what is the GCD of 48 and 36?',
    output_format=str,
    tools=[celsius_to_fahrenheit, math.gcd],
)
Image utilities
The transai.utils.images module provides helpers for image preprocessing:
from transai.utils import images
# Resize an image for vision models (max 1024px, returns PNG bytes)
png_bytes: bytes = images.ResizeImageForVision(raw_image_bytes)
# Extract frames from an animated image (GIF, APNG, etc.)
for frame_png in images.AnimationFrames(animated_gif_bytes):
  # each frame is PNG bytes, resized to max 336px
  pass
AI Guide
Model suggestions as of April 2026. Just an opinion, not to be taken too seriously. Do your own tests.
Vision Models
These models can process images.
| Model Flag Value | Size | Type | Tool? | Reason? | Comment |
|---|---|---|---|---|---|
| `qwen3-vl-32b-instruct@Q8_0` | 36GB | llm/qwen3vl/GGUF | Y | | Very good, slow. |
| `qwen3-vl-32b-instruct@F16` | 67GB | llm/qwen3vl/GGUF | Y | | `--fp16` - Very good, slow. Q8_0 version is faster-ish and still very good. |
| `qwen3.5-35b-a3b@Q8_0` * | 38GB | llm/qwen35moe/GGUF | Y | Y | Decent, slow. |
| `zai-org/glm-4.6v-flash@8bit` * | 12GB | llm/glm4v/MLX | Y | Y | Decent, slow. |
Blind Models
These models cannot process images (blind).
| Model Flag Value | Size | Type | Tool? | Reason? | Comment |
|---|---|---|---|---|---|
| `qwen3-8b@Q8_0` | 8.7GB | llm/qwen3/GGUF | Y | | Good, medium-speed. |
| `gpt-oss-20b@MXFP4` * | 12GB | llm/gpt_oss/MLX | Y | Y | Poor, slow. |
| `zai-org/glm-4.7-flash@8bit` * | 32GB | llm/glm4v/MLX | Y | Y | Good, inconsistent. |
CLI Interface
Quick start
Query a local AI model via LM Studio (server must be running):
transai query "What is the capital of France?"
Query using the llama.cpp backend (direct GGUF loading, no server needed):
transai --no-lms --root ~/.lmstudio/models/ query "Give me an onion soup recipe."
Query with tool use (pass fully-qualified Python callable names; model calls them automatically):
transai query --tools math.gcd --tools os.getcwd "What is the GCD of 48 and 36? Also what is my current directory?"
Global flags
| Flag | Description | Default |
|---|---|---|
| `--help` | Show help | off |
| `--version` | Show version and exit | off |
| `-v, -vv, -vvv, --verbose` | Verbosity (nothing=ERROR, `-v`=WARNING, `-vv`=INFO, `-vvv`=DEBUG) | ERROR |
| `--color`/`--no-color` | Force enable/disable colored output (respects `NO_COLOR` env var if not provided) | `--color` |
| `-r, --root` | Local models root directory (only needed for `--no-lms`) | LM Studio default if it exists |
| `--lms`/`--no-lms` | Use LM Studio backend vs llama.cpp backend | `--lms` |
| `-m, --model` | Model to load (e.g., `qwen3-8b@Q8_0`) | `qwen3-8b@Q8_0` |
| `-t, --tokens` | Speculative decoding tokens (2-200) | disabled |
| `-s, --seed` | Random seed for reproducibility | random |
| `--context` | Max context tokens (16-16777216) | 32768 |
| `-x, --temperature` | Sampling temperature (0.0-2.0) | 0.15 |
| `-g, --gpu` | GPU ratio (0.1-1.0) | 0.80 |
| `--gpu-layers` | GPU layers to offload (-1 = as many as possible) | -1 |
| `--fp16`/`--no-fp16` | FP16 precision mode | `--no-fp16` |
| `--mmap`/`--no-mmap` | Memory-mapped file loading | `--mmap` |
| `--flash`/`--no-flash` | Flash attention | `--flash` |
| `--kv-cache` | KV-cache precision type (GGML type, 4-128) | model default |
CLI Commands Documentation
The CLI documentation in transai.md is auto-generated (by `make docs` or `make ci`).
Color and formatting
Rich provides color output in logging and CLI output. The app:
- Respects the `NO_COLOR` environment variable
- Has a `--color`/`--no-color` flag: if given, it overrides the `NO_COLOR` environment variable
- Defaults to colored output if neither the environment variable nor the flag is given
To control color see Rich's markup conventions.
Project Design
Architecture overview
TransAI uses an abstract base class pattern for backend abstraction:
CLI (transai.py + cli/query.py)
│
├─ LMStudioWorker (core/lms.py) ──▶ LM Studio server (localhost)
│
└─ LlamaWorker (core/llama.py) ──▶ GGUF files on disk

Both workers implement AIWorker (core/ai.py) and rely on the
image utilities (utils/images.py) for preprocessing.
- `AIWorker` defines `LoadModel()` and `ModelCall()` as the public interface
- `LMStudioWorker` and `LlamaWorker` implement `_Load()` and `_Call()` internally
- The CLI layer (`transai.py`, `cli/query.py`) orchestrates configuration and delegates to workers
- Image preprocessing is handled by `utils/images.py`
Modules
| Module | Responsibility |
|---|---|
| `transai.py` | CLI app definition, global options, `TransAIConfig` dataclass |
| `cli/query.py` | `query` command implementation |
| `core/ai.py` | `AIWorker` abstract base class, `AIModelConfig`, shared constants and types |
| `core/lms.py` | `LMStudioWorker` — LM Studio backend implementation |
| `core/llama.py` | `LlamaWorker` — llama.cpp backend implementation (GGUF loading, CLIP detection, vision handlers) |
| `utils/images.py` | Image resizing for vision models, animation frame extraction |
Development Instructions
File structure
.
├── CHANGELOG.md ⟸ latest changes/releases
├── LICENSE
├── Makefile
├── transai.md ⟸ auto-generated CLI doc (by `make docs` or `make ci`)
├── poetry.lock ⟸ maintained by Poetry, do not manually edit
├── pyproject.toml ⟸ most important configurations live here
├── README.md ⟸ this documentation
├── SECURITY.md ⟸ security policy
├── requirements.txt
├── .pre-commit-config.yaml ⟸ pre-submit configs
├── .github/
│ ├── copilot-instructions.md ⟸ GitHub Copilot project-specific instructions
│ ├── dependabot.yaml ⟸ Github dependency update pipeline
│ └── workflows/
│ ├── ci.yaml ⟸ Github CI pipeline
│ └── codeql.yaml ⟸ Github security scans and code quality pipeline
├── .vscode/
│ └── settings.json ⟸ VSCode configs
├── scripts/
│ └── make_test_images.py ⟸ helper script for generating test images
├── src/
│ └── transai/
│ ├── __init__.py ⟸ version and package metadata
│ ├── __main__.py ⟸ `python -m transai` entry point
│ ├── transai.py ⟸ main CLI app entry point (Run(), Main())
│ ├── py.typed ⟸ PEP 561 marker for type stubs
│ ├── cli/
│ │ └── query.py ⟸ `transai query` command implementation
│ ├── core/
│ │ ├── ai.py ⟸ AIWorker abstract base class, AIModelConfig, shared types
│ │ ├── llama.py ⟸ LlamaWorker (llama.cpp backend)
│ │ └── lms.py ⟸ LMStudioWorker (LM Studio backend)
│ └── utils/
│ └── images.py ⟸ image preprocessing for vision models
├── tests/ ⟸ unit tests
│ ├── transai_test.py
│ ├── cli/
│ │ └── query_test.py
│ ├── core/
│ │ ├── ai_test.py
│ │ ├── llama_test.py
│ │ └── lms_test.py
│ └── utils/
│ └── images_test.py
└── tests_integration/
└── test_installed_cli.py ⟸ integration tests (wheel build + install)
Development Setup
Install Python
On Linux:
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install git python3 python3-dev python3-venv build-essential software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.12
On Mac:
brew update
brew upgrade
brew cleanup -s
brew install git python@3.12
Install Poetry (recommended: pipx)
Install pipx (if you don't have it):
python3 -m pip install --user pipx
python3 -m pipx ensurepath
If you previously had Poetry installed, but not through pipx, make sure to remove it first: `brew uninstall poetry` (Mac) / `sudo apt-get remove python3-poetry` (Linux). You should install Poetry with pipx and configure it to create `.venv/` locally. This keeps Poetry isolated from project virtual environments, and the Python used by those environments is isolated from the Python used by Poetry itself. Do:
pipx install poetry
poetry --version
If you will use PyPI to publish:
poetry config pypi-token.pypi <TOKEN> # add your personal PyPI project token, if any
Make sure .venv is local
This project expects a project-local virtual environment at ./.venv (VSCode settings assume it).
poetry config virtualenvs.in-project true
Get the repository
git clone https://github.com/balparda/transai.git transai
cd transai
Create environment and install dependencies
From the repository root:
poetry env use python3.12 # creates the .venv with the correct Python version
poetry sync # sync env to project's poetry.lock file
poetry env info # no-op: just to check that environment looks good
poetry check # no-op: make sure all pyproject.toml fields are being used correctly
poetry run transai --help # simple test if everything loaded OK
make ci # should pass OK on clean repo
To activate and use the environment do:
poetry env activate # (optional) will print activation command for environment, but you can just use:
source .venv/bin/activate # because .venv SHOULD BE LOCAL
...
pytest -vvv # for example, or other commands you want to execute in-environment
...
deactivate # to close environment
Optional: VSCode setup
This repo ships a .vscode/settings.json configured to:
- use `./.venv/bin/python`
- run `pytest`
- use Ruff as formatter
- disable deprecated pylint/flake8 integrations
- configure Google-style docstrings via autoDocstring
- use Code Spell Checker
Recommended VSCode extensions:
- Python (`ms-python.python`)
- Python Environments (`ms-python.vscode-python-envs`)
- Python Debugger (`ms-python.debugpy`)
- Pylance (`ms-python.vscode-pylance`)
- Mypy Type Checker (`ms-python.mypy-type-checker`)
- Ruff (`charliermarsh.ruff`)
- autoDocstring – Python Docstring Generator (`njpwerner.autodocstring`)
- Code Spell Checker (`streetsidesoftware.code-spell-checker`)
- markdownlint (`davidanson.vscode-markdownlint`)
- Markdown All in One (`yzhang.markdown-all-in-one`) - helps maintain this `README.md` table of contents
- Markdown Preview Enhanced (`shd101wyy.markdown-preview-enhanced`, optional)
- GitHub Copilot (`github.copilot`) - AI assistant; reads `.github/copilot-instructions.md` for project-specific coding conventions (indentation, naming, workflow)
Testing
Unit tests / Coverage
make test # plain test run, no integration tests
make integration # run the integration tests
poetry run pytest -vvv # verbose test run, includes integration tests
make cov # coverage run, equivalent to: poetry run pytest --cov=src --cov-report=term-missing
A test can be marked with a "tag" by just adding a decorator:
import pytest

@pytest.mark.slow
def test_foo_method() -> None:
  """Test."""
  ...
These tags are defined in pyproject.toml, in section [tool.pytest.ini_options.markers]:
| Tag | Meaning |
|---|---|
| `slow` | test is slow (> 1s) |
| `flaky` | AVOID! — test is known to be flaky |
| `stochastic` | test is capable of failing (even if very unlikely) |
| `integration` | integration test (wheel build + install) |
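As a reference, a marker registration of this kind typically looks like the fragment below in pyproject.toml. This is an illustrative sketch; the exact wording of the marker descriptions in this project's pyproject.toml may differ:

```toml
[tool.pytest.ini_options]
markers = [
  "slow: test is slow (> 1s)",
  "flaky: AVOID! test is known to be flaky",
  "stochastic: test is capable of failing (even if very unlikely)",
  "integration: integration test (wheel build + install)",
]
```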
You can use them to filter tests:
poetry run pytest -vvv -m slow # run only the slow tests
You can find the slowest tests by running:
poetry run pytest -vvv -q --durations=20
You can search for flaky tests by running make flakes, which runs all tests 100 times.
Instrumenting your code
You can instrument your code to find bottlenecks:
$ source .venv/bin/activate
$ which transai
/path/to/.venv/bin/transai # <== place this in the command below:
$ pyinstrument -r html -o output1.html -- /path/to/.venv/bin/transai <your-cli-command> <your-cli-flags>
$ deactivate
This will save a file output1.html to the project directory with the timings for all method calls. Make sure to clean up these HTML files later.
Integration / e2e tests
Integration tests validate packaging and the installed console script by:
- building a wheel from the repository
- installing that wheel into a fresh temporary virtualenv
- running the installed console script(s) to verify behavior (e.g., `--version` and basic commands)
The canonical integration test is tests_integration/test_installed_cli.py. Tests in this suite are marked with pytest.mark.integration.
Run the integration tests with:
make integration # or: poetry run pytest -m integration -q
Linting / formatting / static analysis
make lint # equivalent to: poetry run ruff check .
make fmt # equivalent to: poetry run ruff format .
To check formatting without rewriting:
poetry run ruff format --check .
Type checking
make type # equivalent to: poetry run mypy src tests tests_integration
(Pyright is primarily for editor-time; MyPy is what CI enforces.)
Versioning and releases
Versioning scheme
This project follows a pragmatic versioning approach:
- Patch: bug fixes / docs / small improvements.
- Minor: new features or non-breaking changes.
- Major: breaking API changes.
See: CHANGELOG.md
Updating versions
Bump project version (patch/minor/major)
Poetry can bump versions:
# bump the version!
poetry version minor # updates 1.0.0 to 1.1.0, for example
# or:
poetry version patch # updates 1.0.0 to 1.0.1
# or:
poetry version <version-number>
# (also updates `pyproject.toml` and `poetry.lock`)
This updates [project].version in pyproject.toml. Remember to also update src/transai/__init__.py to match (this repo gets/prints __version__ from there)!
Update dependency versions
The project has a Dependabot config file in .github/dependabot.yaml that weekly (defaulting to Tuesdays) scans both the GitHub Actions and the project dependencies and creates PRs to update them.
To update the poetry.lock file to more recent versions, run poetry update; it will ignore the current lock, update dependencies, and rewrite the poetry.lock file. If you have cache problems, poetry cache clear PyPI --all will clean it.
To add a new dependency you should do:
poetry add "pkg>=1.2.3" # regenerates lock, updates env (adds dep to prod code)
poetry add -G dev "pkg>=1.2.3" # adds dep to dev code ("group" dev)
# also remember: "pkg@^1.2.3" = latest 1.* ; "pkg@~1.2.3" = latest 1.2.* ; "pkg@1.2.3" exact
Keep tool versions aligned. Remember to check your diffs before submitting (especially poetry.lock) to avoid surprises!
Exporting the requirements.txt file
This project does not generate requirements.txt automatically (Poetry uses poetry.lock). If you need a requirements.txt for Docker/legacy tooling, use Poetry's export plugin (poetry-plugin-export) by simply running:
make req # or: poetry export --format requirements.txt --without-hashes --output requirements.txt
CI and docs
Make sure to run make docs or even better make ci. Both will update the CLI markdown docs and requirements.txt automatically.
Git tag and commit
Publish to GIT, including a TAG:
git commit -a -m "release version 1.0.0"
git tag 1.0.0
git push
git push --tags
Publish to PyPI
If you already have your PyPI token registered with Poetry (see Install Poetry) then just:
poetry build
poetry publish
Remember to update CHANGELOG.md.
Security
Please refer to the security policy in SECURITY.md for supported versions and how to report vulnerabilities.
The project has a CodeQL config file in .github/workflows/codeql.yaml that weekly (defaulting to Fridays) scans the project for code quality and security issues. It also runs on all commits. GitHub security issues will be opened in the project if anything is found.