EnergonAI: An Inference System for Large Transformer Models

Energon-AI

A service framework for large-scale model inference, Energon-AI has the following characteristics:

  • Parallelism for large-scale models: With tensor parallel operations, a pipeline parallel wrapper, distributed checkpoint loading, and customized CUDA kernels, EnergonAI enables efficient parallel inference for large-scale models.
  • Pre-built large models: Pre-built implementations are provided for popular models such as OPT. They support the cache technique for generation tasks and distributed parameter loading.
  • Engine encapsulation: An abstraction layer called the engine encapsulates single instance multiple devices (SIMD) execution behind remote procedure calls, so that it behaves like single instance single device (SISD) execution.
  • An online service system: Based on FastAPI, users can quickly launch a web service for distributed inference. The online service is specially optimized for generation tasks: it adopts both left padding and bucket batching to improve efficiency.
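
To make the last point concrete, here is a minimal pure-Python sketch of left padding combined with bucket batching. It is an illustration only, not EnergonAI's implementation: the names `PAD_ID`, `left_pad`, and `bucket_batches`, the bucket width, and the batch size are all hypothetical. The idea is that requests of similar length are grouped so little padding is wasted, and padding goes on the left so the newest token of every row sits in the last column during generation.

```python
from collections import defaultdict
from typing import Dict, List

PAD_ID = 0  # hypothetical padding token id, chosen only for this sketch


def left_pad(seqs: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Pad every sequence on the left to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [[pad_id] * (width - len(s)) + s for s in seqs]


def bucket_batches(seqs: List[List[int]], bucket_size: int = 4,
                   max_batch: int = 2) -> List[List[List[int]]]:
    """Group sequences of similar length, then left-pad each group."""
    buckets: Dict[int, List[List[int]]] = defaultdict(list)
    for s in seqs:
        # Lengths 1-4 fall into bucket 0, lengths 5-8 into bucket 1, and so on.
        buckets[(len(s) - 1) // bucket_size].append(s)
    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        # Cut each bucket into batches of at most max_batch requests.
        for i in range(0, len(group), max_batch):
            batches.append(left_pad(group[i:i + max_batch]))
    return batches


requests = [[5, 6], [7, 8, 9], [1, 2, 3, 4, 5, 6], [3], [4, 4, 4, 4, 4]]
batches = bucket_batches(requests)
for batch in batches:
    print(batch)
```

In this sketch the short requests never share a batch with the six-token request, so the padding added per batch stays small.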

Models trained with Colossal-AI can be transferred to Energon-AI easily. Single-device models require manual coding work to introduce tensor parallelism and pipeline parallelism.
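
As a rough picture of what introducing tensor parallelism into a single-device model involves, the sketch below splits the weight matrix of one linear layer by columns, lets each simulated device compute its slice of the output, and concatenates the slices. It is plain Python with no GPUs and does not use EnergonAI or Colossal-AI APIs; `matmul`, `split_columns`, and `column_parallel_linear` are hypothetical helper names.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain dense matrix product x @ w."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w into `parts` equal column blocks, one per simulated device."""
    cols = len(w[0])
    assert cols % parts == 0, "column count must be divisible by the number of shards"
    step = cols // parts
    return [[row[i:i + step] for row in w] for i in range(0, cols, step)]


def column_parallel_linear(x: Matrix, w: Matrix, parts: int) -> Matrix:
    """Each shard computes x @ w_shard; concatenating the slices rebuilds x @ w."""
    partials = [matmul(x, shard) for shard in split_columns(w, parts)]
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
full = matmul(x, w)
sharded = column_parallel_linear(x, w, parts=2)
print(full == sharded)
```

In a real deployment each shard would live on its own GPU and the concatenation would be a collective communication step; the sketch only shows that the sharded computation reproduces the single-device result.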

Installation

There are three ways to install energonai.

  • Install from PyPI
$ pip install energonai
  • Install from source
$ git clone git@github.com:hpcaitech/EnergonAI.git
$ cd EnergonAI
$ pip install -r requirements.txt
$ pip install .
  • Use Docker
$ docker pull hpcaitech/energon-ai:latest

Build an online OPT service in 5 minutes

  1. Download the OPT model: To launch the distributed inference service quickly, you can download the checkpoint of OPT-125M here. Details on loading other model sizes are available here.

  2. Launch an HTTP service: To launch a service, we need to provide Python scripts that describe the model type and related configurations and that start an HTTP service. An OPT example is in EnergonAI/examples/opt.
    The entry point of the service is the bash script server.sh. The service configuration is in opt_config.py, which defines the model type, the checkpoint file path, the parallel strategy, and the HTTP settings. You can adapt it to your own case. For example, set the model class to opt_125M and set the correct checkpoint path as follows. Set the tensor parallelism degree to the number of GPUs you have.

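        model_class = opt_125M          # model type to serve
        checkpoint = 'your_file_path'   # path to the downloaded checkpoint
        tp_init_size = 1                # tensor parallelism degree; set to your number of GPUs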
    

    Now, we can launch a service:

        bash server.sh
    

    Then open http://[ip]:[port]/docs in your browser and try it out!
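
Besides the browser page, a FastAPI application also publishes its schema at /openapi.json by default, so the available routes can be listed from a script. The sketch below assumes the service is reachable over plain HTTP and uses 127.0.0.1 and port 8000 purely as placeholders; replace them with the values from your own opt_config.py. The helper names `base_url`, `list_routes`, and `fetch_routes` are hypothetical.

```python
import json
from typing import Dict, List
from urllib.request import urlopen


def base_url(ip: str, port: int) -> str:
    """Build the service root URL (plain HTTP is an assumption of this sketch)."""
    return f"http://{ip}:{port}"


def list_routes(spec: Dict) -> List[str]:
    """Return 'METHOD path' strings from an OpenAPI document."""
    routes = []
    for path, methods in sorted(spec.get("paths", {}).items()):
        for method in sorted(methods):
            routes.append(f"{method.upper()} {path}")
    return routes


def fetch_routes(ip: str, port: int, timeout: float = 5.0) -> List[str]:
    """Download the schema that FastAPI serves at /openapi.json and list its routes."""
    with urlopen(base_url(ip, port) + "/openapi.json", timeout=timeout) as resp:
        return list_routes(json.load(resp))


if __name__ == "__main__":
    # Placeholder address and port: use the ones your service was started with.
    try:
        for route in fetch_routes("127.0.0.1", 8000):
            print(route)
    except (OSError, ValueError) as err:
        print(f"service not reachable or schema unreadable: {err}")
```

Listing the routes this way shows which endpoints the running service actually exposes before you write a client against it.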

Publication

You can find technical details in our blog and manuscript:

Build an online OPT service using Colossal-AI in 5 minutes

EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models

@misc{du2022energonai, 
      title={EnergonAI: An Inference System for 10-100 Billion Parameter Transformer Models}, 
      author={Jiangsu Du and Ziming Liu and Jiarui Fang and Shenggui Li and Yongbin Li and Yutong Lu and Yang You},
      year={2022},
      eprint={2209.02341},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Contributing

If you are interested in contributing to the project, please refer to Contributing for guidance.

Thanks so much!
