BMInference
A toolkit for big model inference
BMInference (Big Model Inference) is a low-resource inference package for large-scale pretrained language models (PLMs).
- Low Resource. Instead of requiring a large-scale GPU cluster, the package lets you run inference for large-scale pretrained language models on a personal computer!
- Open. Model parameters and configurations are all publicly released; there is no need to access a PLM through online APIs, just run it on your own machine!
- Green. Run pretrained language models with fewer machines and GPUs, and with less energy consumption.
Demo
Here we provide an online demo built on the package with CPM2.
Install
- From source:
  python setup.py install
- From docker:
  docker build . -f docker/base.Dockerfile
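After installing, a quick sanity check is to import the package (a minimal sketch; it assumes the package imports as bigmodels, matching the Quick Start below):

import bigmodels

# The import should succeed and the model classes should be reachable.
print(bigmodels.models.CPM2)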
Here we list the minimum and recommended configurations for running BMInference.
| | Minimum Configuration | Recommended Configuration |
|---|---|---|
| Memory | 16GB | 24GB |
| GPU | NVIDIA GeForce GTX 1060 6GB | NVIDIA Tesla V100 16GB |
| PCI-E | PCI-E 3.0 x16 | PCI-E 3.0 x16 |
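To check whether your GPU meets these requirements, you can query it through nvidia-smi; the sketch below assumes the NVIDIA driver is installed and nvidia-smi is on your PATH:

import subprocess

# Query the GPU name and total memory, e.g. "NVIDIA GeForce GTX 1060, 6144 MiB".
info = subprocess.check_output([
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader",
]).decode()
print(info)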
Quick Start
Here we provide an easy script for using BMInference.
First, import a model from the model base (e.g., CPM1, CPM2, EVA2).
import bigmodels

# Load the CPM2 model from the model base.
cpm2 = bigmodels.models.CPM2()
Then define the text, using the <span> token to denote each blank to fill in.

# A Chinese news snippet about Universal Beijing Resort's tiered single-day
# ticket pricing; each <span> marks a blank for the model to fill in.
text = "北京环球度假区相关负责人介绍,北京环球影城指定单日门票将采用<span>制度,即推出淡季日、平季日、旺季日和特定日门票。<span>价格为418元,<span>价格为528元,<span>价格为638元,<span>价格为<span>元。北京环球度假区将提供90天滚动价格日历,以方便游客提前规划行程。"
Use the generate function to obtain the results, then replace the <span> tokens with them.
# Generate one prediction for each <span> blank and splice it back into
# the text, highlighted in green with ANSI escape codes.
for result in cpm2.generate(text,
                            top_p=1.0,
                            top_n=10,
                            temperature=0.9,
                            frequency_penalty=0,
                            presence_penalty=0):
    value = result["text"]
    text = text.replace("<span>", "\033[0;32m" + value + "\033[0m", 1)
print(text)
Finally, you can get the predicted text. For more examples, see the examples folder.
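The sampling parameters follow common conventions: top_p applies nucleus sampling, top_n restricts sampling to the n most likely tokens, and temperature controls randomness. If you want the completed text without terminal color codes, a small wrapper like the one below works (a hypothetical helper, not part of the package; it only uses the generate call shown above):

def fill_blanks(model, text):
    # Replace each <span> in `text` with the model's generated fill.
    for result in model.generate(text,
                                 top_p=1.0,
                                 top_n=10,
                                 temperature=0.9,
                                 frequency_penalty=0,
                                 presence_penalty=0):
        text = text.replace("<span>", result["text"], 1)
    return text

For example, fill_blanks(cpm2, text) returns the fully filled-in string.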
Performance
Here we report the speeds of the CPM2 encoder and decoder measured on different platforms. You can also run benchmark/cpm2/encoder.py and benchmark/cpm2/decoder.py to test the speed on your machine!
| GPU | Encoder Speed (tokens/s) | Decoder Speed (tokens/s) |
|---|---|---|
| NVIDIA GeForce GTX 1060 | 533 | 1.6 |
| NVIDIA GeForce GTX 1080Ti | 1200 | 12 |
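If you only need a rough end-to-end number, you can also time generation directly. The sketch below measures characters per second rather than tokens per second (the tokenizer is not exposed in this example), so treat the result as a loose approximation:

import time

start = time.time()
fills = [result["text"] for result in cpm2.generate(text,
                                                    top_p=1.0,
                                                    top_n=10,
                                                    temperature=0.9,
                                                    frequency_penalty=0,
                                                    presence_penalty=0)]
elapsed = time.time() - start
# Characters per second as a rough proxy for decoding throughput.
chars = sum(len(f) for f in fills)
print(f"{chars / elapsed:.1f} chars/s over {elapsed:.1f} s")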
Contributing
See the user community and the contributing guidelines for how to get involved.
License
The package is released under the Apache 2.0 License.