
A Python module for efficient multi-model AI inference with memory management


Multi-AI

A powerful Python module for managing and running multiple AI models across tasks such as text generation, image analysis, speech synthesis, and audio transcription.

Features

  • Vision Models: Extract text from images using Qwen-VL
  • Text Models: Generate text using Qwen-Text
  • Speech Models: Convert text to speech using Zonos TTS
  • Audio Models: Transcribe audio using Qwen-Audio
  • Memory Management: Efficient handling of model loading and GPU memory
  • CLI Interface: Easy-to-use command-line interface for all operations

Installation

# Clone the repository
git clone https://github.com/yourusername/multi-ai.git
cd multi-ai

# Install the package
pip install -e .

Usage

Command Line Interface

The module provides a CLI for easy access to all features:

# List available models
multi-ai list

# Generate text
multi-ai generate qwen-text "Write a poem about nature"

# Extract text from image
multi-ai generate qwen-vl "Extract all text from this image" --image path/to/image.png

# Convert text to speech
multi-ai tts "Hello, world!" --output output.wav

# Transcribe audio
multi-ai transcribe path/to/audio.wav

Python API

Basic Usage

from multi_ai import ModelManager

# Initialize the model manager
manager = ModelManager(device="cuda")  # or "cpu"

# Load and use a model
model = manager.load_model("qwen-text")
response = model.generate("Write a poem about nature")
print(response)

# Unload the model when done
manager.unload_model("qwen-text")

Vision Text Extraction

from multi_ai import ModelManager

# Initialize manager
manager = ModelManager(device="cuda")

# Load vision model
vision_model = manager.load_model("qwen-vl")

# Extract text from image
extracted_text = vision_model.generate_with_image(
    "path/to/image.png",
    "Extract and output all text from this image, preserving the original formatting."
)
print(extracted_text)

# Clean up
manager.unload_model("qwen-vl")

Text-to-Speech Generation

from multi_ai import ModelManager

# Initialize manager
manager = ModelManager(device="cuda")

# Load TTS model
tts_model = manager.load_model("zonos-tts")

# Generate speech
tts_model.generate_speech(
    "Hello, this is a test of the text-to-speech system.",
    "output.wav"
)

# Clean up
manager.unload_model("zonos-tts")

Audio Transcription

from multi_ai import ModelManager

# Initialize manager (use CPU for audio model)
manager = ModelManager(device="cpu")

# Load audio model
audio_model = manager.load_model("qwen-audio")

# Transcribe audio
transcription = audio_model.generate_with_audio(
    "path/to/audio.wav",
    "Transcribe the following audio accurately."
)
print(transcription)

# Clean up
manager.unload_model("qwen-audio")

Complete Pipeline Example

Here's an example of a complete pipeline that extracts text from an image, converts it to speech, and then transcribes it back:

import os
import re
from multi_ai import ModelManager
import torch
import gc

def split_into_sentences(text):
    """Split text into sentences, handling common abbreviations."""
    # Shield periods that do not end a sentence behind a sentinel so the
    # split below cannot break on them.
    text = re.sub(r'\b([A-Z])\. ', r'\1<prd> ', text)  # initials, e.g. "J. Smith"
    text = re.sub(r'\b(Mr|Mrs|Dr|Ms|Prof|vs|etc|e\.g|i\.e)\. ', r'\1<prd> ', text)
    sentences = re.split(r'(?<=[.!?])\s+', text)
    # Restore the shielded periods.
    return [s.strip().replace('<prd>', '.') for s in sentences if s.strip()]

def process_pipeline(image_path):
    # Initialize manager
    device = "cuda" if torch.cuda.is_available() else "cpu"
    manager = ModelManager(device=device)
    
    try:
        # Step 1: Extract text from image
        print("Extracting text from image...")
        vision_model = manager.load_model("qwen-vl")
        extracted_text = vision_model.generate_with_image(
            image_path,
            "Extract and output all text from this image, preserving the original formatting."
        )
        manager.unload_model("qwen-vl")
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            gc.collect()
        
        # Step 2: Generate speech
        print("Generating speech...")
        tts_model = manager.load_model("zonos-tts")
        sentences = split_into_sentences(extracted_text)
        
        os.makedirs("output_audio", exist_ok=True)
        audio_files = []
        for i, sentence in enumerate(sentences):
            output_file = f"output_audio/sentence_{i}.wav"
            tts_model.generate_speech(sentence, output_file)
            audio_files.append(output_file)
        manager.unload_model("zonos-tts")
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            gc.collect()
        
        # Step 3: Transcribe audio
        print("Transcribing audio...")
        manager.device = "cpu"  # Use CPU for audio model
        audio_model = manager.load_model("qwen-audio")
        
        transcribed_text = ""
        for audio_file in audio_files:
            transcription = audio_model.generate_with_audio(
                audio_file,
                "Transcribe the following audio accurately."
            )
            transcribed_text += transcription + " "
        manager.unload_model("qwen-audio")
        
        # Clean up audio files
        for audio_file in audio_files:
            os.remove(audio_file)
        os.rmdir("output_audio")
        
        return extracted_text, transcribed_text
        
    finally:
        manager.clear_all_models()

# Run the pipeline
extracted, transcribed = process_pipeline("path/to/image.png")
print("\nExtracted text:", extracted)
print("\nTranscribed text:", transcribed)

Memory Management

The module includes efficient memory management features:

from multi_ai import ModelManager
import torch
import gc

# Initialize manager
manager = ModelManager(device="cuda")

try:
    # Load and use models
    model1 = manager.load_model("qwen-text")
    # ... use model1 ...
    manager.unload_model("qwen-text")
    
    # Clear CUDA cache between models
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        gc.collect()
    
    model2 = manager.load_model("qwen-vl")
    # ... use model2 ...
    manager.unload_model("qwen-vl")
    
finally:
    # Clean up all models
    manager.clear_all_models()
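The load/unload/cache-clear pattern above can be wrapped in a context manager so cleanup runs even when generation raises. This is a sketch, not part of the multi_ai API; it assumes only the `load_model`/`unload_model` methods shown in this README and skips the CUDA calls when torch is unavailable:

```python
import gc
from contextlib import contextmanager

try:
    import torch
except ImportError:
    torch = None  # cache clearing is skipped without torch

@contextmanager
def loaded_model(manager, name):
    """Load `name`, yield the model, and always unload it and clear caches."""
    model = manager.load_model(name)
    try:
        yield model
    finally:
        manager.unload_model(name)
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
        gc.collect()

# Usage (with the manager from above):
#     with loaded_model(manager, "qwen-text") as model:
#         print(model.generate("Hello"))
#     # the model is unloaded here, even if generate() raised
```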

Model Configuration

The module supports various model configurations:

  • Qwen-VL: Vision-language model for image analysis
  • Qwen-Text: Text generation model
  • Zonos-TTS: Text-to-speech model
  • Qwen-Audio: Audio transcription model

Each model can be configured with specific parameters:

# Example: Configure model with specific parameters
model = manager.load_model(
    "qwen-text",
    max_tokens=1000,
    temperature=0.7,
    top_p=0.9
)

Error Handling

The module includes comprehensive error handling:

from multi_ai import ModelManager, ModelError

manager = ModelManager(device="cuda")

try:
    model = manager.load_model("qwen-text")
    response = model.generate("Hello")
except ModelError as e:
    print(f"Error loading or using model: {e}")
finally:
    manager.clear_all_models()
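Building on this, a small helper can fall back from GPU to CPU when a load fails. This is an illustrative sketch, not part of the multi_ai API; it assumes only the constructor and methods shown in this README, and catches `Exception` generically (with the real library, catching `ModelError` would be tighter):

```python
def load_with_fallback(manager_cls, name, devices=("cuda", "cpu")):
    """Try each device in order; return (manager, model) for the first that loads."""
    last_err = None
    for device in devices:
        manager = manager_cls(device=device)
        try:
            return manager, manager.load_model(name)
        except Exception as err:  # with multi_ai, prefer: except ModelError
            manager.clear_all_models()
            last_err = err
    raise last_err
```

With the real library this would be called as `load_with_fallback(ModelManager, "qwen-text")`.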

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
