EnergonAI: An Inference System for Large Transformer Models

Project description

A service framework for large-scale model inference, Energon-AI has the following characteristics:

  • Parallelism for large-scale models: With tensor-parallel operations, a pipeline-parallel wrapper, distributed checkpoint loading, and customized CUDA kernels, EnergonAI enables efficient parallel inference for large-scale models.

  • Pre-built large models: Pre-built implementations are provided for popular models such as OPT. They support the cache technique for generation tasks and distributed parameter loading.

  • Engine encapsulation: An abstraction layer called the engine encapsulates single instance multiple devices (SIMD) execution behind remote procedure calls, so that it behaves like single instance single device (SISD) execution.

  • An online service system: Based on FastAPI, users can quickly launch a web service for distributed inference. The online service is specially optimized for generation tasks: it adopts both left padding and bucket batching to improve efficiency.
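To make the left padding and bucket batching ideas concrete, here is a minimal, library-free sketch. It is not EnergonAI's implementation: the names `PAD_ID`, `left_pad`, and `bucket_batches`, as well as the bucket width and batch size, are hypothetical and chosen only for illustration. Requests of similar length are grouped into the same bucket so that little padding is wasted, and padding is placed on the left so that the last real token of every sequence sits in the final column, where generation appends new tokens.

```python
from collections import defaultdict

PAD_ID = 0  # hypothetical padding token id, an assumption for this sketch


def left_pad(seqs, pad_id=PAD_ID):
    """Pad every sequence on the left to the longest length in the batch."""
    width = max(len(s) for s in seqs)
    padded = [[pad_id] * (width - len(s)) + list(s) for s in seqs]
    # Attention mask: 0 marks padding, 1 marks real tokens.
    mask = [[0] * (width - len(s)) + [1] * len(s) for s in seqs]
    return padded, mask


def bucket_batches(seqs, bucket_width=4, max_batch=2):
    """Group sequences of similar length, then cut each group into batches."""
    buckets = defaultdict(list)
    for s in seqs:
        # Lengths 1..bucket_width share bucket 0, the next range shares bucket 1, etc.
        buckets[(len(s) - 1) // bucket_width].append(s)
    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        for i in range(0, len(group), max_batch):
            batches.append(left_pad(group[i:i + max_batch]))
    return batches


requests = [[5, 6], [7, 8, 9], [1, 2, 3, 4, 5, 6], [9], [4, 4, 4, 4, 4]]
for tokens, mask in bucket_batches(requests):
    print(tokens, mask)
```

With these inputs the short requests (lengths 1 to 3) end up in batches separate from the long ones (lengths 5 and 6), so no short request is padded out to length 6.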

Models trained with Colossal-AI can be easily transferred to Energon-AI. Single-device models require manual coding work to introduce tensor parallelism and pipeline parallelism.
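As a rough illustration of what introducing tensor parallelism into a single-device layer involves, the sketch below splits the weight matrix of a linear layer by columns across simulated devices and checks that the concatenated partial results match the unsplit computation. It uses plain Python lists and does not use the Energon-AI or Colossal-AI APIs; `matmul`, `split_columns`, and `column_parallel_linear` are hypothetical helper names, and the concatenation step stands in for the all-gather that a real multi-device setup would perform.

```python
def matmul(x, w):
    """Multiply a (rows x k) matrix by a (k x cols) matrix, both nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, parts):
    """Split the columns of w into `parts` equal shards, one per simulated device."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must divide evenly across devices"
    step = cols // parts
    return [[row[i * step:(i + 1) * step] for row in w] for i in range(parts)]


def column_parallel_linear(x, w, parts):
    """Each shard computes its slice of the output; slices are then concatenated."""
    partial = [matmul(x, shard) for shard in split_columns(w, parts)]
    # Stand-in for an all-gather: join the per-device slices along the column axis.
    return [sum((p[r] for p in partial), []) for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
assert column_parallel_linear(x, w, parts=2) == matmul(x, w)
print(column_parallel_linear(x, w, parts=2))
```

Each simulated device holds only its own shard of the weights, which is what lets a layer that is too large for one device be spread over several.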

Installation

pip install energonai

GitHub Repo

https://github.com/hpcaitech/EnergonAI


Source Distribution

energon-0.0.1.tar.gz (63.2 kB)

File details

Details for the file energon-0.0.1.tar.gz.

File metadata

  • Download URL: energon-0.0.1.tar.gz
  • Upload date:
  • Size: 63.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.6.0 importlib_metadata/4.8.2 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.6.8

File hashes

Hashes for energon-0.0.1.tar.gz

  • SHA256: 936a81d8f38ef91f69fead7b8441c0080950eded54ff788112a2a36fbd1ff3bc
  • MD5: f5a5db7d5eed7647e29bd9b76ffbba8b
  • BLAKE2b-256: 3a4a60c424d6511d2362fdb7e62857bc1e955df2a4adae43b6300bdbfd39b850
