Unified AI model serving framework with API streaming support

These details have not been verified by PyPI

Project description

isA_Model - AI Model Serving & Training Platform

Operators: see docs/PRODUCTION_READINESS.md for the component-by-component status matrix (what's actually deployed vs Helm-only vs planned).

A comprehensive Python platform for AI model serving, training, and optimization. It provides a unified interface to multiple AI providers, intelligent model selection, LLM caching, multi-modal capabilities, and Lightning-based training workflows.

Current Version: 0.6.0

Architecture Overview
Core Components
Installation
Quick Start
AI Model Serving
Lightning Training
Multi-Modal Services
Examples
Documentation
Development

Architecture Overview

┌────────────────────────────────────────────────────────────┐
│                     isA_Model Platform                     │
│                                                            │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────┐ │
│  │  Model Serving  │  │  Lightning      │  │  Core       │ │
│  │                 │  │  Training       │  │  Services   │ │
│  │ • Multi-Provider│  │                 │  │ • Config    │ │
│  │ • LLM Caching   │  │ • APO/GRPO      │  │ • Discovery │ │
│  │ • Tool Calling  │  │ • Closed-Loop   │  │ • Logging   │ │
│  │ • Multi-Modal   │  │ • Custom        │  │ • Events    │ │
│  └─────────────────┘  └─────────────────┘  └─────────────┘ │
└────────────────────────────────────────────────────────────┘

Core Components

1. AI Model Serving (`isa_model/inference/`)

Multi-Provider Support: OpenAI, Replicate, Ollama, Cerebras, OpenRouter
Intelligent Caching: Production-grade LLM caching with Redis backend
Tool Calling: OpenAI-compatible function calling interface
Multi-Modal: Text, Vision, Audio, Video, Embeddings
Streaming Support: Real-time streaming for all providers

2. Lightning Training (`isa_model/training/lightning/`)

Algorithm Framework: APO, GRPO, Closed-Loop, Custom algorithms
Data Pipeline: Automated trace collection and conversion
Job Management: RESTful API for training lifecycle
Event-Driven: NATS-based coordination and monitoring
Storage Abstraction: Memory and PostgreSQL backends

3. Core Services (`isa_model/core/`)

Configuration: Environment-based config management
Discovery: Consul-based service registration
Logging: Structured logging with Loki integration
Pricing: Cost tracking and optimization
Database: PostgreSQL gRPC client abstraction

4. Deployment (`isa_model/deployment/`)

Kubernetes: Production-ready K8s manifests
Docker: Multi-stage Dockerfiles for all components
Modal: Serverless deployment support
Triton: NVIDIA Triton Inference Server integration

Installation

Basic Installation

pip install isa_model

Installation with Optional Dependencies

# Cloud API providers (OpenAI, Replicate, Cerebras, Modal)
pip install isa_model[cloud]

# Local inference (PyTorch + transformers)
pip install isa_model[local]

# Audio processing
pip install isa_model[audio]

# Vision processing
pip install isa_model[vision]

# LangChain integration
pip install isa_model[langchain]

# Monitoring (MLflow, Prometheus, Redis)
pip install isa_model[monitoring]

# Full installation (all features)
pip install isa_model[all]

# Optimized for staging/production
pip install isa_model[staging]

Quick Start

Using the Async Client (Recommended)

The AsyncISAModel client provides an OpenAI-compatible interface:

from isa_model.inference_client import AsyncISAModel
import asyncio

async def main():
    async with AsyncISAModel(base_url="http://localhost:8082") as client:
        # Simple chat
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello!"}]
        )
        print(response.choices[0].message.content)

        # Streaming chat
        stream = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Count to 5"}],
            stream=True
        )
        async for chunk in stream:
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())

Using AIFactory (Direct Service Access)

For more control, use the AIFactory to get service instances:

from isa_model.inference.ai_factory import AIFactory

factory = AIFactory.get_instance()

# Use OpenAI with API key
llm = factory.get_llm(
    model_name="gpt-4o-mini", 
    provider="openai", 
    api_key="your-openai-api-key-here"
)

# Use local Ollama model (no API key needed)
llm = factory.get_llm(model_name="llama3.1", provider="ollama")
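
Hard-coding a key as in the snippet above is fine for a quick test. A common alternative is to read it from the environment. The following is a minimal sketch, assuming the key is stored in `OPENAI_API_KEY` (the variable name used under Environment Setup); `require_env` is a hypothetical helper and not part of isa_model.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail loudly if it is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (commented out because it needs isa_model installed and a real key):
# llm = factory.get_llm(
#     model_name="gpt-4o-mini",
#     provider="openai",
#     api_key=require_env("OPENAI_API_KEY"),
# )
```

Failing at startup with a clear message is usually easier to debug than an authentication error returned by the provider later on.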

Core Features

Multi-Modal AI Services

LLM (Text Generation): OpenAI (GPT-4, GPT-4o-mini), Ollama (Llama, Qwen), Cerebras, OpenRouter (DeepSeek-R1)
Vision: Image analysis (GPT-4o, ISA OmniParser), Image generation (DALL-E, Flux, Nano-Banana)
Audio: Speech-to-Text (Whisper, GPT-4o-transcribe), Text-to-Speech (OpenAI TTS, Replicate)
Video: Text-to-Video (ByteDance Seedance-1-Pro)
Embeddings: Text embeddings (OpenAI, Ollama), Document reranking (Jina Reranker v2)

Intelligent Features

Smart Model Selection: Automatically choose the best model based on task and input
LLM Caching: Two-layer cache (streaming + non-streaming); cached non-streaming responses are reported at roughly 100x faster (see Performance Gains under LLM Caching)
Tool Calling: Function calling with OpenAI-compatible interface
Streaming Support: Real-time streaming for all text generation
Format Negotiation: Supports both OpenAI dict and LangChain message formats

Enterprise Features

Cost Tracking: Automatic cost calculation and tracking
Graceful Degradation: Cache failures don't break requests
Feature Flags: Environment-based feature control
Monitoring: Redis-backed metrics, hit rate tracking
Multi-Provider: Easy provider switching without code changes
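
To illustrate how feature flags and graceful degradation can fit together, here is a minimal sketch. It is not the isa_model implementation: `cached_call` and `BrokenCache` are hypothetical names, and treating the literal string `true` as "enabled" is an assumption. Only the `ENABLE_LLM_CACHE` variable name comes from this README.

```python
import os
from typing import Callable, Optional


class BrokenCache:
    """Stand-in cache whose every operation fails, used to exercise the fallback."""

    def get(self, key: str) -> Optional[str]:
        raise ConnectionError("cache unavailable")

    def set(self, key: str, value: str) -> None:
        raise ConnectionError("cache unavailable")


def cached_call(cache, key: str, call_llm: Callable[[], str]) -> str:
    """Return a cached answer if possible; never let a cache error fail the request."""
    enabled = os.environ.get("ENABLE_LLM_CACHE", "false").lower() == "true"
    if enabled:
        try:
            hit = cache.get(key)
            if hit is not None:
                return hit
        except Exception:
            pass  # cache read failed: fall through to the model
    answer = call_llm()
    if enabled:
        try:
            cache.set(key, answer)
        except Exception:
            pass  # cache write failed: the answer is still returned
    return answer


os.environ["ENABLE_LLM_CACHE"] = "true"
print(cached_call(BrokenCache(), "k", lambda: "fresh answer"))
```

Even though every cache operation raises, the caller still receives the model's answer, which is the behavior the "Graceful Degradation" item describes.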

API Client Usage

Comprehensive Example

See docs/guidance/examples/model_client_examples_async.py for complete examples covering:

import asyncio

from isa_model.inference_client import AsyncISAModel


async def main():
    async with AsyncISAModel(base_url="http://localhost:8082") as client:
        # 1. Simple chat
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Hello!"}]
        )

        # 2. Streaming chat
        stream = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Tell a story"}],
            stream=True
        )
        async for chunk in stream:
            # Skip chunks whose delta carries no text (content is None)
            if chunk.choices[0].delta.content:
                print(chunk.choices[0].delta.content, end="", flush=True)

        # 3. JSON mode (structured output)
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Generate a person profile"}],
            response_format={"type": "json_object"}
        )

        # 4. Function calling
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "What's the weather?"}],
            tools=[{
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get weather for a location",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {"type": "string"}
                        }
                    }
                }
            }]
        )

        # 5. Vision analysis
        vision = await client.vision.completions.create(
            image="https://example.com/image.jpg",
            prompt="Describe this image",
            model="gpt-4o-mini",
            provider="openai"
        )

        # 6. Image generation
        image = await client.images.generate(
            prompt="A beautiful sunset over mountains",
            model="dall-e-3",
            size="1024x1024",
            provider="openai"
        )

        # 7. Embeddings
        embedding = await client.embeddings.create(
            input="This is a test sentence",
            model="text-embedding-3-small"
        )

        # 8. Speech-to-Text
        transcription = await client.audio.transcriptions.create(
            file="audio.wav",
            model="gpt-4o-mini-transcribe"
        )


# `async with` and `await` are only valid inside a coroutine, so run one
asyncio.run(main())

Client Test Results

Async Client: 11/11 examples passed (100% success rate)
Sync Client: 5/9 attempted (streaming and TTS have limitations)

Recommendation: Always use AsyncISAModel for production workloads.

LLM Caching

NEW in v0.5.7: Production-grade LLM inference caching with Phase 2 implementation complete.

Features

Streaming Cache with Replay: 15ms per chunk delay for natural streaming feel
Non-Streaming Cache: Instant responses (~5ms vs 500ms)
Temperature-Based TTL: Smart expiration (temp=0 → 24h, temp=0.3 → 1h, temp=0.7 → 5min)
Graceful Degradation: Cache failure = automatic pass-through to LLM
Real-time Monitoring: Hit rate, replay stats, time saved tracking
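
The "Streaming Cache with Replay" idea can be sketched as an async generator that yields stored chunks with a short pause between them. This is an illustration under stated assumptions, not the project's code: `replay_cached_stream` and `collect` are hypothetical names, and the cached response is assumed to be a plain list of text chunks. The 15 ms delay is the figure quoted above.

```python
import asyncio
from typing import AsyncIterator, List


async def replay_cached_stream(chunks: List[str], delay_s: float = 0.015) -> AsyncIterator[str]:
    """Yield cached chunks one at a time, pausing between them to mimic live streaming."""
    for chunk in chunks:
        yield chunk
        await asyncio.sleep(delay_s)


async def collect(chunks: List[str]) -> str:
    """Consume the replayed stream and join the chunks back into one string."""
    parts = []
    async for chunk in replay_cached_stream(chunks):
        parts.append(chunk)
    return "".join(parts)


print(asyncio.run(collect(["Hel", "lo", "!"])))
```

With a fixed per-chunk delay, a response cached as N chunks takes about 15 × N ms to replay, so a cached stream saves cost but is intentionally not instantaneous.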

Quick Setup

# Enable cache
export ENABLE_LLM_CACHE=true
export REDIS_HOST=localhost
export REDIS_PORT=50055

# Start service
python -m isa_model.serving.api.main

Performance Gains

| Scenario | First Request | Cached Request | Speedup | Cost Saving |
| --- | --- | --- | --- | --- |
| Non-streaming chat | 500ms | 5ms | 100x | 100% |
| Streaming chat | 3000ms | 2500ms | 1.2x | 100% |
| Code generation | 2000ms | 8ms | 250x | 100% |

Expected Savings (40% hit rate, 1000 req/day):

Daily: $0.40
Monthly: $12
Annual: $144

For high-traffic systems (100K req/day): $1,200/month savings
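
The savings figures above are consistent with roughly $0.001 saved per cache hit (400 hits/day × $0.001 = $0.40/day), a 30-day month, and a 12-month year. That per-request cost is inferred from the numbers and is not a price stated by the project. A small sketch for plugging in your own traffic and cost, with `cache_savings` as a hypothetical helper:

```python
def cache_savings(requests_per_day: int, hit_rate: float, cost_per_request: float) -> dict:
    """Estimate spend avoided by cache hits, using a 30-day month and a 12-month year."""
    daily = requests_per_day * hit_rate * cost_per_request
    monthly = daily * 30
    return {
        "daily": round(daily, 2),
        "monthly": round(monthly, 2),
        "annual": round(monthly * 12, 2),
    }


# $0.001 per request is inferred from the figures above, not a published price.
print(cache_savings(1_000, 0.40, 0.001))
print(cache_savings(100_000, 0.40, 0.001))
```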

Cache Management

# Get cache statistics
curl http://localhost:8082/api/v1/cache/stats

# Invalidate model cache (when model updates)
curl -X POST http://localhost:8082/api/v1/cache/invalidate/openai/gpt-4o-mini

# Clear all cache
curl -X POST http://localhost:8082/api/v1/cache/clear
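
The same endpoints can be called from Python with only the standard library. This is a sketch, assuming the server listens on `localhost:8082` as in the curl commands and replies with JSON (the response format is an assumption); `build_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8082"  # same host and port as the curl examples


def build_request(path: str, method: str = "GET") -> urllib.request.Request:
    """Build (but do not send) a request for one of the cache endpoints."""
    return urllib.request.Request(f"{BASE_URL}{path}", method=method)


def send(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the body, assuming the server replies with JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


stats_request = build_request("/api/v1/cache/stats")
invalidate_request = build_request("/api/v1/cache/invalidate/openai/gpt-4o-mini", method="POST")
clear_request = build_request("/api/v1/cache/clear", method="POST")

# With the server running: print(send(stats_request))
```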

See docs/CACHE_QUICKSTART.md for complete documentation.

DeepSeek-R1 Reasoning Model

NEW in v0.5.7: Support for DeepSeek-R1, a powerful reasoning model that shows its thought process.

Features

Visible Reasoning: See the model's thinking with show_reasoning=True
Streaming Tool Calling: Call tools while streaming reasoning process
Token Tracking: Separate tracking for reasoning tokens vs completion tokens
Cost Optimization: Reasoning tokens charged at input token rate ($0.55/1M)
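
As a worked example of the reasoning-token rate quoted above, the sketch below computes the cost of the reasoning tokens alone. `reasoning_cost` is a hypothetical helper, and the completion-token rate is not given in this README, so it is left out.

```python
REASONING_RATE_PER_MILLION = 0.55  # USD per 1M reasoning tokens, from the feature list above


def reasoning_cost(reasoning_tokens: int) -> float:
    """Return the USD cost of the reasoning tokens alone."""
    return reasoning_tokens * REASONING_RATE_PER_MILLION / 1_000_000


# 20,000 reasoning tokens at $0.55 per million
print(f"${reasoning_cost(20_000):.4f}")
```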

Basic Usage

import asyncio

from isa_model.inference.ai_factory import AIFactory


async def main():
    factory = AIFactory()
    llm = factory.get_llm(provider="openrouter", model_name="deepseek-r1")

    # Without reasoning (only final answer)
    response = await llm.ainvoke("If 2x + 5 = 11, what is x?", show_reasoning=False)

    # With reasoning (see thought process)
    response = await llm.ainvoke("If 2x + 5 = 11, what is x?", show_reasoning=True)
    # Output includes [思考: ...] tags showing reasoning steps ("思考" means "thinking")

    # Get token usage
    usage = llm.get_last_token_usage()
    print(f"Reasoning tokens: {usage['reasoning_tokens']}")
    print(f"Completion tokens: {usage['completion_tokens']}")


# `await` is only valid inside a coroutine, so run one
asyncio.run(main())

Streaming with Reasoning

async for chunk in llm.astream("Calculate 15 × 23", show_reasoning=True):
    if chunk.startswith('[思考:') and chunk.endswith(']'):
        # Reasoning chunk ("思考" means "thinking"): strip the 4-character
        # "[思考:" prefix and the closing "]", then print it in gray
        reasoning = chunk[4:-1]
        print(f"\033[90m{reasoning}\033[0m", end="", flush=True)
    else:
        # Normal content
        print(chunk, end="", flush=True)

See docs/guidance/examples/deepseek_r1_reasoning_example.py and docs/guidance/deepseek-r1.md for complete examples.

Multi-Modal Services

Speech-to-Text (4 Models)

# Basic transcription (fastest, cheapest)
transcription = await client.audio.transcriptions.create(
    file="audio.wav",
    model="gpt-4o-mini-transcribe"  # NEW default model
)

# High quality transcription
transcription = await client.audio.transcriptions.create(
    file="audio.wav",
    model="gpt-4o-transcribe"  # Highest quality
)

# With speaker diarization
transcription = await client.audio.transcriptions.create(
    file="audio.wav",
    model="gpt-4o-transcribe-diarize",
    enable_diarization=True,
    response_format="diarized_json"
)
# Returns: segments with speaker labels, timestamps

# Legacy Whisper model
transcription = await client.audio.transcriptions.create(
    file="audio.wav",
    model="whisper-1"  # Legacy
)

Video Generation

# Text-to-Video with ByteDance Seedance-1-Pro
response = await client._underlying_client.invoke(
    input_data="The sun rises slowly between tall buildings...",
    task="generate",
    service_type="video_generation",
    provider="replicate",
    model="seedance-1-pro",
    duration=5,
    fps=24,
    resolution="1080p",
    aspect_ratio="16:9"
)

Multi-Image Input

# Google Nano-Banana (Multi-Image Style Transfer)
response = await client._underlying_client.invoke(
    input_data="Make the sheets in the style of the logo",
    task="img2img",
    service_type="image_generation",
    provider="replicate",
    model="nano-banana",
    init_image=[
        "https://example.com/image1.png",
        "https://example.com/image2.png"
    ],
    aspect_ratio="match_input_image"
)

ISA Proprietary Services

# ISA OmniParser - UI Detection
vision = await client.vision.completions.create(
    image="https://example.com/ui-screenshot.jpg",
    prompt="Detect UI elements",
    model="isa-omniparser-ui-detection",
    provider="isa"
)

# ISA Jina Reranker v2 - Document Reranking
response = await client._underlying_client.invoke(
    input_data="What is machine learning?",
    task="rerank",
    service_type="embedding",
    provider="isa",
    model="isa-jina-reranker-v2-service",
    documents=[
        "Machine learning is a subset of AI...",
        "Python is a programming language...",
        "Neural networks are computational models..."
    ]
)

Tool Calling

OpenAI-Compatible Function Calling

import asyncio
import json

from isa_model.inference_client import AsyncISAModel

WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"}
            },
            "required": ["location"]
        }
    }
}


def get_weather(location: str) -> dict:
    """Placeholder tool implementation; replace with a real weather lookup."""
    return {"location": location, "forecast": "unknown"}


async def main():
    async with AsyncISAModel() as client:
        # Request with tool
        response = await client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
            tools=[WEATHER_TOOL]
        )

        # Check if tool was called
        if response.choices[0].message.tool_calls:
            tool_call = response.choices[0].message.tool_calls[0]
            print(f"Tool: {tool_call.function.name}")
            print(f"Args: {tool_call.function.arguments}")

            # Execute tool (your implementation); arguments arrive as a JSON string
            args = json.loads(tool_call.function.arguments)
            result = get_weather(**args)

            # Continue conversation with tool result
            messages = [
                {"role": "user", "content": "What's the weather in Tokyo?"},
                {
                    "role": "assistant",
                    "tool_calls": [{
                        "id": tool_call.id,
                        "type": "function",
                        "function": {
                            "name": tool_call.function.name,
                            "arguments": tool_call.function.arguments
                        }
                    }]
                },
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": json.dumps(result)
                }
            ]

            final = await client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages
            )
            print(final.choices[0].message.content)


# `async with` and `await` are only valid inside a coroutine, so run one
asyncio.run(main())

Streaming Tool Calling (DeepSeek-R1)

# Tool calls appear at the end of stream in delta.tool_calls
stream = await client.chat.completions.create(
    model="deepseek-r1",
    provider="openrouter",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[WEATHER_TOOL],
    stream=True,
    show_reasoning=True
)

tool_calls = []
async for chunk in stream:
    delta = chunk.choices[0].delta
    
    # Collect reasoning and content
    if delta.content:
        print(delta.content, end="", flush=True)
    
    # Collect tool calls (appear at end)
    if delta.tool_calls:
        tool_calls.extend(delta.tool_calls)

# Execute tools after stream completes
for tc in tool_calls:
    args = json.loads(tc.function.arguments)
    result = execute_tool(**args)
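
The loop above calls a single `execute_tool` regardless of which tool was requested. One way to route each call by name is a small registry, sketched below. `TOOL_REGISTRY`, `run_tool_calls`, and the stub `get_weather` are hypothetical; the sketch only assumes tool-call objects expose `id`, `function.name`, and `function.arguments`, as used in the examples above.

```python
import json
from types import SimpleNamespace
from typing import Any, Callable, Dict, List


def get_weather(location: str) -> dict:
    """Placeholder tool; a real implementation would query a weather service."""
    return {"location": location, "forecast": "unknown"}


# Map tool names (as declared in the `tools` list) to Python callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(tool_calls: List[Any]) -> List[dict]:
    """Execute each call and return `tool` role messages ready to append to the chat."""
    messages = []
    for call in tool_calls:
        handler = TOOL_REGISTRY.get(call.function.name)
        if handler is None:
            result = {"error": f"unknown tool: {call.function.name}"}
        else:
            result = handler(**json.loads(call.function.arguments))
        messages.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}
        )
    return messages


# Fake tool call shaped like the objects used above, so the sketch runs offline.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Tokyo"}'),
)
print(run_tool_calls([fake_call]))
```

Returning an error payload for unknown tools, rather than raising, lets the model see the failure and recover in the next turn.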

See docs/guidance/examples/tool_call_streaming_example.py for complete agent loop implementation.

Examples

All runnable examples are in docs/guidance/examples/:

model_client_examples_async.py: Comprehensive async client examples (11/11 passed)
- Simple chat, streaming, multiple providers
- JSON mode, function calling
- Vision, embeddings, image generation
- Format negotiation, error handling
- Speech-to-Text, ISA services
model_client_examples_sync.py: Sync client (basic usage only, has limitations)
deepseek_r1_reasoning_example.py: DeepSeek-R1 reasoning examples
- Basic math, complex problems
- Streaming with reasoning
- Code generation, multi-turn chat
tool_call_streaming_example.py: Tool calling examples
- Basic streaming tool calls
- Complete agent loop
- DeepSeek-R1 reasoning + tools
nano_banana_example.py: Multi-image style transfer
seedance_video_example.py: Text-to-video generation

See docs/guidance/examples/README.md for detailed documentation.

Documentation

Comprehensive documentation is available in the docs/ directory:

docs/
├── overview/           → Project vision, goals, architecture
├── research/           → Research findings and exploration
├── domain/             → Domain concepts and knowledge models
├── prd/                → Product requirements documents
├── design/             → Technical design specifications
└── guidance/           → Developer guides and tutorials
    └── examples/       → Runnable Python scripts

Getting Started

Quick Start: Get started in 5 minutes
LLM Services: Text generation and chat
Tool Calling: Function calling guide
Providers: Configure model providers
Caching: Cache optimization
DeepSeek R1: Reasoning model with tool calls

Project Documentation

Project Overview: Vision, goals, architecture
Product Requirements: Feature specifications
Technical Design: System design documents

Development

Installing for Development

git clone <repository-url>
cd isA_Model

# Install with all dependencies
pip install -e ".[all]"

# Or install with specific extras
pip install -e ".[cloud,langchain,dev]"

Environment Setup

For local development, copy the example deployment env file into a gitignored local override:

cp deployment/environments/dev.env.example deployment/environments/dev.env
# or create deployment/environments/dev.local.env instead

Then fill in your local secrets, for example:

OPENAI_API_KEY=your-openai-key
REPLICATE_API_TOKEN=your-replicate-token
INTERNAL_SERVICE_SECRET=your-local-internal-secret

Running the Server

# Start the FastAPI server
python -m isa_model.serving.api.main

# Or with uvicorn
uvicorn isa_model.serving.api.fastapi_server:app --host 0.0.0.0 --port 8082

Running Tests

# Run async client examples (recommended)
python docs/guidance/examples/model_client_examples_async.py

# Run specific tests
python tests/test_stt_models.py

# Run cache tests
bash tests/cache_test.sh

Building and Publishing

# Update version in pyproject.toml
# Current version: 0.6.0

# Build the package
python -m build

# Upload to PyPI
python -m twine upload dist/isa_model-0.6.0* --username __token__ --password "$PYPI_API_TOKEN"

What's New in v0.5.7

LLM Caching (Phase 2 Complete)

Streaming Cache + Replay: Natural streaming feel with 15ms/chunk delay
Non-Streaming Cache: 100x speedup for deterministic queries
Temperature-Based TTL: Smart caching based on output randomness
Real-time Monitoring: Hit rate tracking, time saved statistics
Production Ready: Feature flags, graceful degradation, zero-impact deployment

DeepSeek-R1 Support

Visible Reasoning: See model's thought process with show_reasoning=True
Streaming Tool Calling: Function calling with reasoning visibility
Token Tracking: Separate reasoning token counting and cost tracking
Agent Loop Support: Complete multi-turn conversation with tools

Enhanced Multi-Modal

Speech-to-Text: 4 models (Whisper, gpt-4o-mini-transcribe, gpt-4o-transcribe, gpt-4o-transcribe-diarize)
Video Generation: ByteDance Seedance-1-Pro text-to-video
Multi-Image Input: Google Nano-Banana style transfer
ISA Services: OmniParser UI detection, Jina Reranker v2

Client Improvements

100% Pass Rate: AsyncISAModel client (11/11 examples)
Format Negotiation: OpenAI dict + LangChain message support
Better Error Handling: Informative error messages and graceful failures
Resource Cleanup: Proper context manager support

Infrastructure

Consul Integration: Service discovery and dynamic routing
Redis Caching: Production-grade caching backend
Monitoring: Comprehensive metrics and logging
Feature Flags: Environment-based feature control

Supported Providers

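| Provider | LLM | Vision | Audio | Image Gen | Video | Embeddings |
| --- | --- | --- | --- | --- | --- | --- |
| OpenAI | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ |
| Replicate | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ |
| Ollama | ✅ | ✅ | ❌ | ❌ | ❌ | ✅ |
| Cerebras | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| OpenRouter | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ |
| ISA | ❌ | ✅ | ❌ | ❌ | ❌ | ✅ |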

Note: OpenRouter provider includes DeepSeek-R1 reasoning model.
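
The support matrix can also be kept as data for quick lookups when choosing a provider. The sketch below transcribes the table by hand; the capability keys (`llm`, `image_gen`, and so on) and `providers_for` are hypothetical labels for illustration, not identifiers from isa_model.

```python
# Capabilities per provider, transcribed from the Supported Providers table.
CAPABILITIES = {
    "openai": {"llm", "vision", "audio", "image_gen", "embeddings"},
    "replicate": {"llm", "vision", "audio", "image_gen", "video"},
    "ollama": {"llm", "vision", "embeddings"},
    "cerebras": {"llm"},
    "openrouter": {"llm"},
    "isa": {"vision", "embeddings"},
}


def providers_for(capability: str) -> list:
    """Return the providers that support `capability`, sorted alphabetically."""
    return sorted(name for name, caps in CAPABILITIES.items() if capability in caps)


print(providers_for("video"))
print(providers_for("embeddings"))
```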

Cost Optimization

LLM Caching Benefits

With 40% cache hit rate on 1,000 requests/day:

Daily savings: $0.40
Monthly savings: $12
Annual savings: $144

For high-traffic production (100K req/day):

Monthly savings: $1,200+

Model Selection Strategy

Development/Testing: Use gpt-4o-mini or ollama (local, free)
Production: Cache with temperature=0 for deterministic queries
Creative Tasks: Use higher temperature, shorter TTL
Code Generation: Cache aggressively (24h TTL for temp=0)
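
The temperature-based TTL tiers listed under LLM Caching (temp=0 → 24h, temp=0.3 → 1h, temp=0.7 → 5min) can be sketched as a step function. Only those three points come from this README; how values in between are handled is an assumption here, and `ttl_seconds` is a hypothetical name.

```python
def ttl_seconds(temperature: float) -> int:
    """Map sampling temperature to a cache TTL in seconds (lower temperature, longer TTL)."""
    if temperature <= 0.0:
        return 24 * 60 * 60  # temp=0 -> 24h
    if temperature <= 0.3:
        return 60 * 60  # assumed: anything up to 0.3 -> 1h
    return 5 * 60  # assumed: anything above 0.3 (including 0.7) -> 5min


for temp in (0.0, 0.3, 0.7):
    print(temp, ttl_seconds(temp))
```

The rationale is that low-temperature outputs are close to deterministic, so a stored answer stays representative for longer.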

Architecture

isa_model/
├── client.py                  # Unified ISAModelClient
├── inference_client.py        # OpenAI-compatible client
├── inference/
│   ├── ai_factory.py         # Service factory
│   ├── services/             # Service implementations
│   │   ├── llm/             # LLM services
│   │   ├── vision/          # Vision services
│   │   ├── audio/           # Audio services (STT/TTS)
│   │   ├── img/             # Image generation
│   │   ├── video/           # Video generation
│   │   └── embedding/       # Embedding services
│   └── cache/               # LLM caching layer
├── serving/
│   └── api/                 # FastAPI server
├── core/
│   ├── config/              # Configuration management
│   ├── models/              # Model registry
│   └── services/            # Core services
└── deployment/              # Kubernetes, Docker configs

Roadmap

Phase 3: Semantic Caching (Planned)

Embedding-based similarity matching
Cache hits even with different wording
Target: 60-80% hit rate (vs 40% exact match)
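
Since this phase is only planned, the following is a generic sketch of embedding-based matching rather than the project's design. It assumes each cached entry stores an embedding vector next to its response and treats a cosine similarity at or above a threshold as a hit; `semantic_lookup`, the 0.9 threshold, and the toy two-dimensional vectors are all illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_lookup(
    query_vec: Sequence[float],
    entries: List[Tuple[Sequence[float], str]],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the most similar cached response, or None if nothing reaches the threshold."""
    best_score, best_response = 0.0, None
    for vec, response in entries:
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_score, best_response = score, response
    return best_response if best_score >= threshold else None


entries = [([1.0, 0.0], "answer about cats"), ([0.0, 1.0], "answer about dogs")]
print(semantic_lookup([0.9, 0.1], entries))  # close to the first entry: hit
print(semantic_lookup([0.7, 0.7], entries))  # equally far from both: miss
```

The threshold trades hit rate against the risk of returning an answer to a question that only looks similar.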

Future Features

Cache warming on model updates
Distributed locking for multi-instance consistency
Per-user cache namespaces
A/B testing framework
Advanced cost analytics

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Make your changes with tests
Submit a pull request

See CONTRIBUTING.md for detailed guidelines.

Support

Documentation: See docs/ directory
Examples: See docs/guidance/examples/ directory
Issues: Open an issue on GitHub
Discussions: GitHub Discussions

Acknowledgments

Built with:

FastAPI for high-performance API serving
Redis for production-grade caching
OpenAI SDK compatibility layer
LangChain integration support
Comprehensive provider ecosystem

Ready to get started? Check out docs/guidance/examples/ for comprehensive usage examples!

Release history

0.6.0: Jun 7, 2026 (this version)
0.5.7: Oct 30, 2025
0.5.6: Oct 29, 2025
0.5.5: Oct 29, 2025
0.5.4: Oct 25, 2025
0.5.3: Oct 23, 2025
0.5.2: Oct 21, 2025
0.5.1: Oct 21, 2025
0.4.7: Oct 12, 2025
0.4.6: Oct 11, 2025
0.4.5: Oct 10, 2025
0.4.4: Oct 10, 2025
0.4.3: Oct 10, 2025
0.4.0: Jul 28, 2025
0.3.91: Jul 2, 2025
0.3.9: Jul 2, 2025
0.3.8: Jul 2, 2025
0.3.7: Jul 2, 2025
0.3.6: Jul 2, 2025
0.3.5: Jun 29, 2025
0.3.4: Jun 22, 2025
0.3.3: Jun 22, 2025
0.3.2: Jun 22, 2025
0.3.1: Jun 22, 2025
0.3.0: Jun 21, 2025
0.2.9: Jun 21, 2025
0.2.8: Jun 21, 2025
0.0.8: Jun 21, 2025

File details

Details for the file isa_model-0.6.0.tar.gz.

File metadata

Download URL: isa_model-0.6.0.tar.gz
Upload date: Jun 7, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for isa_model-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`e223f28e56d22767224873f6a94ae25ff0b5ea08b0ab5b147be218c7fdd7dc1c`
MD5	`3e616cb04c51138dfddc4a3a62db0a5e`
BLAKE2b-256	`e91461447da312fd3bdf25e33a32e237212fbdeadb0c2df7e48d24409e68d2e4`

File details

Details for the file isa_model-0.6.0-py3-none-any.whl.

File metadata

Download URL: isa_model-0.6.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 1.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for isa_model-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d4bccb779ce81c3e37985c31b7e77d34fbcfb6cbdba9f4a8df1445da74f12df0`
MD5	`f3c056b1a2aaf896208a07cf1ddfa314`
BLAKE2b-256	`44957e21baf4ab1f041548e5c4b5d6fa930c9cc73c05705c850e63ac7063f5b3`
