A toolkit for big model inference

Project description

BMInference

English | 简体中文

BMInference (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).

  • Low Resource. Instead of requiring large-scale GPU clusters, the package lets you run inference for large-scale pretrained language models on a personal computer!
  • Open. Model parameters and configurations are publicly released, so you don't need to access a PLM through online APIs; just run it on your own computer!
  • Green. Run pretrained language models with fewer machines and GPUs, and with lower energy consumption.

Demo

Here we provide an online demo built with this package and CPM2.

Install

  • From source: python setup.py install

  • From docker: docker build . -f docker/base.Dockerfile
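
  • From pip: the package is published on PyPI as bminference, so pip install bminference should also work.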

Here we list the minimum and recommended configurations for running BMInference.

        Minimum Configuration          Recommended Configuration
Memory  16GB                           24GB
GPU     NVIDIA GeForce GTX 1060 6GB    NVIDIA Tesla V100 16GB
PCI-E   PCI-E 3.0 x16                  PCI-E 3.0 x16

Quick Start

Here we provide an easy script for using BMInference.

First, import a model from the model base (e.g. CPM1, CPM2, EVA2).

import bigmodels

# Load the CPM2 model from the model base.
cpm2 = bigmodels.models.CPM2()

Then define the text, using the <span> token to mark each blank to fill in. (The sample below is a Chinese news passage about Universal Beijing Resort's tiered single-day ticket pricing; the blanks cover the tier names and one missing price.)

text = "北京环球度假区相关负责人介绍,北京环球影城指定单日门票将采用<span>制度,即推出淡季日、平季日、旺季日和特定日门票。<span>价格为418元,<span>价格为528元,<span>价格为638元,<span>价格为<span>元。北京环球度假区将提供90天滚动价格日历,以方便游客提前规划行程。"

Use the generate function to obtain the results, then replace each <span> token with the corresponding generated text.

for result in cpm2.generate(text,
    top_p=1.0,
    top_n=10,
    temperature=0.9,
    frequency_penalty=0,
    presence_penalty=0
):
    # Each result fills the next <span> blank in order; highlight the
    # generated text in green with ANSI escape codes.
    value = result["text"]
    text = text.replace("<span>", "\033[0;32m" + value + "\033[0m", 1)
print(text)

Finally, you get the predicted text with all blanks filled. For more examples, see the examples folder.
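
The sampling keyword arguments follow common decoding conventions: top_p for nucleus sampling, top_n for truncating the candidate set, temperature for randomness, and frequency/presence penalties for discouraging repetition. As a minimal sketch, assuming these parameters behave in the usual way, near-deterministic output can be requested by narrowing the candidate set:

# A hedged variation on the example above: keep only the single most likely
# token and lower the temperature for near-greedy decoding. Only keyword
# arguments already shown in this README are used; exact semantics may
# differ across versions.
for result in cpm2.generate(text,
    top_p=1.0,
    top_n=1,          # keep only the top candidate at each step
    temperature=0.1,  # flatten randomness toward the argmax
    frequency_penalty=0,
    presence_penalty=0
):
    print(result["text"])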

Performance

Here we report the speed of the CPM2 encoder and decoder measured on different platforms. You can also run benchmark/cpm2/encoder.py and benchmark/cpm2/decoder.py to test the speed on your own machine!

GPU                         Encoder Speed (tokens/s)    Decoder Speed (tokens/s)
NVIDIA GeForce GTX 1060     533                         1.6
NVIDIA GeForce GTX 1080Ti   1200                        12
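
If you just want a rough end-to-end number without the benchmark scripts, the public generate call from the Quick Start can be timed directly. This is a minimal sketch, assuming generate yields one result per <span> blank as in the example above; it reports characters per second, which is not directly comparable to the token rates in the table:

import time
import bigmodels

# Load the model once; loading time is excluded from the measurement.
cpm2 = bigmodels.models.CPM2()

# Any prompt with <span> blanks will do; a short one is used here.
text = "北京环球影城指定单日门票将采用<span>制度,即推出<span>门票。"

start = time.time()
chars = 0
for result in cpm2.generate(text,
    top_p=1.0,
    top_n=10,
    temperature=0.9,
    frequency_penalty=0,
    presence_penalty=0
):
    chars += len(result["text"])
elapsed = time.time() - start

print("%.1f chars/s" % (chars / elapsed))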

Contributing

For the user community and contributing guidelines, see the links in the project repository.

License

The package is released under the Apache 2.0 License.

Download files

Download the file for your platform.

Source Distribution

bminference-0.0.2.tar.gz (29.5 kB)

File details

Details for the file bminference-0.0.2.tar.gz.

File metadata

  • Download URL: bminference-0.0.2.tar.gz
  • Upload date:
  • Size: 29.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.10

File hashes

Hashes for bminference-0.0.2.tar.gz
Algorithm     Hash digest
SHA256        d4c9e2f43acdcaa01c082211e8d9bccfca05f3667a1cc559c7156f8cae7057b2
MD5           65dad39b1b4cd68af1095b63e37ece2b
BLAKE2b-256   6179c79ac962f71a2607615146ee8ba8a197184421661e28f6b39a717a9c5150
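
To verify a downloaded archive against the SHA256 digest above, a few lines of standard-library Python suffice (a minimal sketch; it assumes the sdist has been saved in the current directory):

import hashlib

# Compare the local file's SHA256 digest with the published one.
with open("bminference-0.0.2.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

expected = "d4c9e2f43acdcaa01c082211e8d9bccfca05f3667a1cc559c7156f8cae7057b2"
print("OK" if digest == expected else "hash mismatch!")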
