
A Python module for efficient multi-model AI inference with memory management


Multi-AI

A powerful Python module for managing and running multiple AI models across tasks such as text generation, image analysis, speech synthesis, and audio transcription.

Features

  • Vision Models: Extract text from images using Qwen-VL
  • Text Models: Generate text using Qwen-Text
  • Speech Models: Convert text to speech using Zonos TTS
  • Audio Models: Transcribe audio using Qwen-Audio
  • Memory Management: Efficient handling of model loading and GPU memory
  • CLI Interface: Easy-to-use command-line interface for all operations

Installation

# Clone the repository
git clone https://github.com/yourusername/multi-ai.git
cd multi-ai

# Install the package
pip install -e .

Usage

Command Line Interface

The module provides a CLI for easy access to all features:

# List available models
multi-ai list

# Generate text
multi-ai generate qwen-text "Write a poem about nature"

# Extract text from image
multi-ai generate qwen-vl "Extract all text from this image" --image path/to/image.png

# Convert text to speech
multi-ai tts "Hello, world!" --output output.wav

# Transcribe audio
multi-ai transcribe path/to/audio.wav

Python API

Basic Usage

from multi_ai import ModelManager

# Initialize the model manager
manager = ModelManager(device="cuda")  # or "cpu"

# Load and use a model
model = manager.load_model("qwen-text")
response = model.generate("Write a poem about nature")
print(response)

# Unload the model when done
manager.unload_model("qwen-text")

Vision Text Extraction

from multi_ai import ModelManager

# Initialize manager
manager = ModelManager(device="cuda")

# Load vision model
vision_model = manager.load_model("qwen-vl")

# Extract text from image
extracted_text = vision_model.generate_with_image(
    "path/to/image.png",
    "Extract and output all text from this image, preserving the original formatting."
)
print(extracted_text)

# Clean up
manager.unload_model("qwen-vl")

Text-to-Speech Generation

from multi_ai import ModelManager

# Initialize manager
manager = ModelManager(device="cuda")

# Load TTS model
tts_model = manager.load_model("zonos-tts")

# Generate speech
tts_model.generate_speech(
    "Hello, this is a test of the text-to-speech system.",
    "output.wav"
)

# Clean up
manager.unload_model("zonos-tts")

Audio Transcription

from multi_ai import ModelManager

# Initialize manager (use CPU for audio model)
manager = ModelManager(device="cpu")

# Load audio model
audio_model = manager.load_model("qwen-audio")

# Transcribe audio
transcription = audio_model.generate_with_audio(
    "path/to/audio.wav",
    "Transcribe the following audio accurately."
)
print(transcription)

# Clean up
manager.unload_model("qwen-audio")

Complete Pipeline Example

Here's an example of a complete pipeline that extracts text from an image, converts it to speech, and then transcribes it back:

import os
import re
from multi_ai import ModelManager
import torch
import gc

def split_into_sentences(text):
    """Split text into sentences, handling common abbreviations."""
    # Shield periods that do not end a sentence behind a sentinel so the
    # split below cannot break on them.
    text = re.sub(r'\b([A-Z])\. ', r'\1<prd> ', text)  # initials, e.g. "J. Smith"
    text = re.sub(r'\b(Mr|Mrs|Dr|Ms|Prof|vs|etc|e\.g|i\.e)\. ', r'\1<prd> ', text)
    sentences = re.split(r'(?<=[.!?])\s+', text)
    # Restore the shielded periods.
    return [s.strip().replace('<prd>', '.') for s in sentences if s.strip()]

def process_pipeline(image_path):
    # Initialize manager
    device = "cuda" if torch.cuda.is_available() else "cpu"
    manager = ModelManager(device=device)
    
    try:
        # Step 1: Extract text from image
        print("Extracting text from image...")
        vision_model = manager.load_model("qwen-vl")
        extracted_text = vision_model.generate_with_image(
            image_path,
            "Extract and output all text from this image, preserving the original formatting."
        )
        manager.unload_model("qwen-vl")
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            gc.collect()
        
        # Step 2: Generate speech
        print("Generating speech...")
        tts_model = manager.load_model("zonos-tts")
        sentences = split_into_sentences(extracted_text)
        
        os.makedirs("output_audio", exist_ok=True)
        audio_files = []
        for i, sentence in enumerate(sentences):
            output_file = f"output_audio/sentence_{i}.wav"
            tts_model.generate_speech(sentence, output_file)
            audio_files.append(output_file)
        manager.unload_model("zonos-tts")
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            gc.collect()
        
        # Step 3: Transcribe audio
        print("Transcribing audio...")
        manager.device = "cpu"  # Use CPU for audio model
        audio_model = manager.load_model("qwen-audio")
        
        transcribed_text = ""
        for audio_file in audio_files:
            transcription = audio_model.generate_with_audio(
                audio_file,
                "Transcribe the following audio accurately."
            )
            transcribed_text += transcription + " "
        manager.unload_model("qwen-audio")
        
        # Clean up audio files
        for audio_file in audio_files:
            os.remove(audio_file)
        os.rmdir("output_audio")
        
        return extracted_text, transcribed_text
        
    finally:
        manager.clear_all_models()

# Run the pipeline
extracted, transcribed = process_pipeline("path/to/image.png")
print("\nExtracted text:", extracted)
print("\nTranscribed text:", transcribed)

Memory Management

The module includes efficient memory management features:

from multi_ai import ModelManager
import torch
import gc

# Initialize manager
manager = ModelManager(device="cuda")

try:
    # Load and use models
    model1 = manager.load_model("qwen-text")
    # ... use model1 ...
    manager.unload_model("qwen-text")
    
    # Clear CUDA cache between models
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        gc.collect()
    
    model2 = manager.load_model("qwen-vl")
    # ... use model2 ...
    manager.unload_model("qwen-vl")
    
finally:
    # Clean up all models
    manager.clear_all_models()
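The load/unload/cache-clear pattern above can be wrapped in a context manager so cleanup runs even when generation raises. This is a sketch, not part of the multi_ai API; it assumes only the `load_model`/`unload_model` methods shown in this README and skips the CUDA calls when torch is unavailable:

```python
import gc
from contextlib import contextmanager

try:
    import torch
except ImportError:
    torch = None  # cache clearing is skipped without torch

@contextmanager
def loaded_model(manager, name):
    """Load `name`, yield the model, and always unload it and clear caches."""
    model = manager.load_model(name)
    try:
        yield model
    finally:
        manager.unload_model(name)
        if torch is not None and torch.cuda.is_available():
            torch.cuda.empty_cache()
        gc.collect()

# Usage (with the manager from above):
#     with loaded_model(manager, "qwen-text") as model:
#         print(model.generate("Hello"))
#     # the model is unloaded here, even if generate() raised
```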

Model Configuration

The module supports various model configurations:

  • Qwen-VL: Vision-language model for image analysis
  • Qwen-Text: Text generation model
  • Zonos-TTS: Text-to-speech model
  • Qwen-Audio: Audio transcription model

Each model can be configured with specific parameters:

# Example: Configure model with specific parameters
model = manager.load_model(
    "qwen-text",
    max_tokens=1000,
    temperature=0.7,
    top_p=0.9
)

Error Handling

The module includes comprehensive error handling:

from multi_ai import ModelManager, ModelError

manager = ModelManager(device="cuda")

try:
    model = manager.load_model("qwen-text")
    response = model.generate("Hello")
except ModelError as e:
    print(f"Error loading or using model: {e}")
finally:
    manager.clear_all_models()
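Building on this, a small helper can fall back from GPU to CPU when a load fails. This is an illustrative sketch, not part of the multi_ai API; it assumes only the constructor and methods shown in this README, and catches `Exception` generically (with the real library, catching `ModelError` would be tighter):

```python
def load_with_fallback(manager_cls, name, devices=("cuda", "cpu")):
    """Try each device in order; return (manager, model) for the first that loads."""
    last_err = None
    for device in devices:
        manager = manager_cls(device=device)
        try:
            return manager, manager.load_model(name)
        except Exception as err:  # with multi_ai, prefer: except ModelError
            manager.clear_all_models()
            last_err = err
    raise last_err
```

With the real library this would be called as `load_with_fallback(ModelManager, "qwen-text")`.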

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
