A lightweight AI inference server for running models locally or in Google Colab

🚀 LocalLab

LocalLab is a lightweight AI inference server for running language models on your local machine or in Google Colab. It lets developers and researchers run models on their own hardware, with features such as dynamic model loading, memory optimizations, and real-time system monitoring to make the most of available resources.

What Problem Does LocalLab Solve?

  • Local Inference: Run advanced language models without relying on expensive cloud services.
  • Optimized Performance: Reduce memory use with techniques such as quantization, attention slicing, and CPU offloading.
  • Seamless Deployment: Easily switch between local deployment and Google Colab, leveraging ngrok for public accessibility.
  • Effective Resource Management: Automatically monitor and manage CPU, RAM, and GPU usage to ensure smooth operation.

System Requirements

Minimum Requirements

Component   Local Deployment   Google Colab
RAM         4GB                Free tier (12GB)
CPU         2 cores            2 cores
Python      3.8+               3.8+
Storage     2GB free           -
GPU         Optional           Available in free tier

Recommended Requirements

Component   Local Deployment   Google Colab
RAM         8GB+               Pro tier (24GB)
CPU         4+ cores           Pro tier (4 cores)
Python      3.9+               3.9+
Storage     5GB+ free          -
GPU         CUDA-compatible    Pro tier GPU
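
The local-deployment minimums listed above can be checked before installing. The following is a minimal sketch using only the Python standard library. It is not part of LocalLab, and the function name `check_minimum_requirements` is hypothetical. It covers Python version, CPU cores, and free storage, and skips RAM and GPU because the standard library has no portable way to query them. It also assumes "2GB" means 2 × 10^9 bytes.

```python
import os
import shutil
import sys

# Thresholds copied from the "Minimum Requirements" table (local deployment).
MIN_PYTHON = (3, 8)
MIN_CPU_CORES = 2
MIN_FREE_STORAGE_GB = 2  # assumption: decimal gigabytes


def check_minimum_requirements(path="."):
    """Return a dict mapping each check name to True (met) or False (not met).

    RAM and GPU are intentionally not checked here.
    """
    free_gb = shutil.disk_usage(path).free / 1e9
    return {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        # os.cpu_count() can return None, which is treated as 0 cores.
        "cpu_cores": (os.cpu_count() or 0) >= MIN_CPU_CORES,
        "storage": free_gb >= MIN_FREE_STORAGE_GB,
    }


if __name__ == "__main__":
    for name, ok in check_minimum_requirements().items():
        print(f"{name}: {'ok' if ok else 'below minimum'}")
```

Swapping in the recommended thresholds (Python 3.9, 4 cores, 5GB) only requires changing the three constants.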

Key Features

  • Multiple Model Support: Pre-configured models along with the ability to load custom ones on demand.
  • Advanced Optimizations: Support for FP16, INT8, and INT4 quantization, Flash Attention, and attention slicing.
  • Comprehensive Logging System: Colorized console output with server status tracking, request monitoring, and performance metrics.
  • Robust Resource Monitoring: Real-time insights into system performance and resource usage.
  • Flexible Client Libraries: Comprehensive clients available for both Python and Node.js.
  • Google Colab Friendly: Dedicated workflow for deploying via Google Colab with public URL access.
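
To give a sense of what the quantization levels above mean for memory, here is a rough back-of-the-envelope sketch. It estimates the size of the model weights alone from the parameter count and the bytes stored per weight. The helper `weight_memory_gb` is hypothetical and not a LocalLab API, and FP32 is included only as an unquantized baseline. Real memory use is higher, since activations, the attention cache, and quantization metadata are not counted.

```python
# Bytes needed to store one weight at each precision.
# FP32 is the unquantized baseline; the others are named in the feature list.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_memory_gb(num_params, precision):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 2.7e9  # microsoft/phi-2 has roughly 2.7 billion parameters
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gb(params, precision):.2f} GB")
```

Under these assumptions, a 2.7-billion-parameter model needs about 5.4 GB for FP16 weights and about 1.35 GB at INT4, which suggests why quantization matters on machines near the 4GB minimum.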

Unique Visual Overview

Below is a high-level diagram of LocalLab's architecture.

graph TD
    A["User"] --> B["LocalLab Client (Python/Node.js)"]
    B --> C["LocalLab Server"]
    C --> D["Model Manager"]
    D --> E["Hugging Face Models"]
    C --> F["Optimizations"]
    C --> G["Resource Monitoring"]

Google Colab Workflow

sequenceDiagram
    participant U as "User (Colab)"
    participant S as "LocalLab Server"
    participant N as "Ngrok Tunnel"
    U->>S: Run start_server(use_ngrok=True)
    S->>N: Establish public tunnel
    N->>U: Return public URL
    U->>S: Connect via public URL

Documentation & Usage Guides

For full documentation and detailed guides, please visit our documentation page.

Get Started

  1. Installation:

    pip install locallab
    
  2. Starting the Server Locally:

    from locallab import start_server
    start_server()
    
  3. Starting the Server on Google Colab:

    !pip install locallab
    
    # Set up your ngrok auth token (REQUIRED for public access)
    # Get your free token from: https://dashboard.ngrok.com/get-started/your-authtoken
    import os
    os.environ["NGROK_AUTH_TOKEN"] = "your_token_here"
    
    # Optional: Configure model and optimizations
    os.environ["HUGGINGFACE_MODEL"] = "microsoft/phi-2"  # Choose your preferred model
    os.environ["LOCALLAB_ENABLE_QUANTIZATION"] = "true"  # Enable model optimizations
    
    # Start the server with ngrok for public access
    from locallab import start_server
    start_server(use_ngrok=True)  # Creates a public URL accessible from anywhere
    
  4. Connecting your Client:

    from locallab.client import LocalLabClient
    
    # Use the ngrok URL displayed in the output above
    client = LocalLabClient("https://xxxx-xxx-xxx-xxx.ngrok.io")
    
    # Test the connection
    response = client.generate("Hello, how are you?")
    print(response)
    
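
Because the Colab step requires an ngrok auth token, it can help to catch a missing or placeholder token before starting the server. The following is a minimal sketch of such a check. The function `find_env_problems` is hypothetical and not part of LocalLab, and whether LocalLab performs its own validation is not stated here. Only the environment variable name and the placeholder string are taken from the Colab snippet in step 3.

```python
import os

# Placeholder value shown in the Colab snippet; it is not a real token.
PLACEHOLDER_TOKEN = "your_token_here"


def find_env_problems(env=None, use_ngrok=True):
    """Return a list of human-readable problems; an empty list means none found.

    `env` defaults to os.environ and can be any mapping, which makes the
    check easy to try without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    token = env.get("NGROK_AUTH_TOKEN", "").strip()
    # The token only matters when a public ngrok URL is requested.
    if use_ngrok and (not token or token == PLACEHOLDER_TOKEN):
        problems.append("NGROK_AUTH_TOKEN is missing or still the placeholder")
    return problems


if __name__ == "__main__":
    for problem in find_env_problems():
        print("Problem:", problem)
```

In a notebook, this could run just before `start_server(use_ngrok=True)`, with the server started only when the returned list is empty.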

Join the Community


LocalLab is designed to bring the power of advanced language models directly to your workspace—efficiently, flexibly, and affordably. Give it a try and revolutionize your AI projects!

This version

0.2.3
