sageLLM backend provider abstraction and mock implementation

sagellm-backend

Hardware abstraction layer for the sageLLM inference engine. It provides a unified backend interface for CUDA, Ascend, and other accelerators.

Features

  • Unified Hardware Abstraction: Single API for multiple hardware backends
  • Mock Backend: Test without real hardware
  • CUDA Support: Native CUDA backend implementation
  • HuggingFace Integration: Pre-configured engine for HF Transformers
  • Capability Matrix: Hardware capability discovery and validation

Installation

pip install isagellm-backend

Quick Start

git clone git@github.com:intellistream/sagellm-backend.git
cd sagellm-backend
./quickstart.sh

# Run tests
pytest tests/ -v

Usage Examples

Basic Backend Usage

from sagellm_backend import MockBackendProvider, DType

# Create backend
backend = MockBackendProvider(
    supported_dtypes=[DType.FP16, DType.BF16],
    has_collective=True,
)

# Query capabilities
cap = backend.capability()
print(cap.supported_dtypes)

# Allocate KV block
block = backend.kv_block_alloc(128, DType.FP16)

HuggingFace CUDA Engine

from sagellm_backend.engine.hf_cuda import HFCudaEngine, HFCudaEngineConfig

# Standard configuration
config = HFCudaEngineConfig(
    engine_id="hf-001",
    model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    device="cuda:0",
    dtype="float16",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    trust_remote_code=False,
    max_new_tokens=256,
    max_batch_size=8,
)
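engine = HFCudaEngine(config)
# `await` is only valid inside a coroutine; run this from an async function.
await engine.start()

# Run inference (`request` is built by the caller; its construction is not shown here)
response = await engine.execute(request)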

For testing without a GPU:

# Mock mode
mock_config = HFCudaEngineConfig(
    engine_id="hf-mock",
    model_path="mock-model",
    device="cuda:0",
    dtype="float16",
    device_map="auto",
    load_in_8bit=False,
    load_in_4bit=False,
    trust_remote_code=False,
    max_new_tokens=64,
    max_batch_size=2,
    mock_mode=True,
)
mock_engine = HFCudaEngine(mock_config)
await mock_engine.start()
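
The engine snippets above use `await`, so they must run inside a coroutine. The sketch below shows one way to drive the start-then-execute sequence from a synchronous script or test with `asyncio.run`. `StandInEngine`, `run_once`, and `run_sync` are hypothetical names introduced for illustration; the stand-in only imitates the async `start()` / `execute(request)` shape shown above and is not the real `HFCudaEngine`.

```python
import asyncio


class StandInEngine:
    """Hypothetical engine with the same async start/execute shape."""

    def __init__(self):
        self.started = False

    async def start(self):
        self.started = True

    async def execute(self, request):
        if not self.started:
            raise RuntimeError("engine not started")
        # Echo the request back; a real engine would return a response object.
        return {"echo": request}


async def run_once(engine, request):
    """Start the engine, then execute a single request."""
    await engine.start()
    return await engine.execute(request)


def run_sync(engine, request):
    """Blocking entry point for code that has no running event loop."""
    return asyncio.run(run_once(engine, request))
```

In principle, passing an `HFCudaEngine` built with `mock_mode=True` to `run_sync` would follow the same pattern, assuming a suitable request object is available.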

Extending with New Backends

# Create the provider in the providers/ directory.
# Assumes CapabilityDescriptor and DType are already imported.
class AscendBackendProvider:
    def capability(self) -> CapabilityDescriptor:
        return CapabilityDescriptor(
            supported_dtypes=[DType.FP16, DType.BF16, DType.INT8],
            # ...
        )

    # Implement other interface methods...

# Register via an entry point in pyproject.toml (this part is TOML, not Python):
[project.entry-points."sagellm.backends"]
ascend_cann = "sagellm_backend.providers.ascend:create_ascend_backend"

License

Proprietary
