Fast inference engine for Transformer models

Project description

CTranslate2

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

The project implements a custom runtime that applies many performance optimization techniques, such as weight quantization, layer fusion, and batch reordering, to accelerate Transformer models and reduce their memory usage on CPU and GPU.
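
To make one of these techniques concrete, the sketch below illustrates the general idea behind batch reordering: sorting examples by length before batching so that each batch needs less padding. It is a simplified, pure-Python illustration and not CTranslate2's internal implementation; the helper names `make_batches` and `padding_tokens` are hypothetical.

```python
from typing import List, Sequence


def make_batches(
    examples: Sequence[List[str]], batch_size: int, sort_by_length: bool
) -> List[List[int]]:
    """Group example indices into batches, optionally sorting by length first."""
    order = list(range(len(examples)))
    if sort_by_length:
        # Longest first, so examples of similar length end up in the same batch.
        order.sort(key=lambda i: len(examples[i]), reverse=True)
    return [order[i : i + batch_size] for i in range(0, len(order), batch_size)]


def padding_tokens(examples: Sequence[List[str]], batches: List[List[int]]) -> int:
    """Count the padding positions needed to make every batch rectangular."""
    total = 0
    for batch in batches:
        longest = max(len(examples[i]) for i in batch)
        total += sum(longest - len(examples[i]) for i in batch)
    return total


# Toy inputs with lengths 1, 6, 2 and 5 (hypothetical data).
examples = [["a"], ["a"] * 6, ["a"] * 2, ["a"] * 5]

unsorted_batches = make_batches(examples, batch_size=2, sort_by_length=False)
sorted_batches = make_batches(examples, batch_size=2, sort_by_length=True)

print("padding without reordering:", padding_tokens(examples, unsorted_batches))
print("padding with reordering:", padding_tokens(examples, sorted_batches))
```

Because the batches hold indices rather than the examples themselves, results can be mapped back to the original input order after decoding.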

The following model types are currently supported:

  • Encoder-decoder models: Transformer base/big, M2M-100, NLLB, BART, mBART, Pegasus, T5, Whisper
  • Decoder-only models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, MPT, Llama, Mistral, Gemma, CodeGen, GPTBigCode, Falcon
  • Encoder-only models: BERT, DistilBERT, XLM-RoBERTa

Compatible models should first be converted into an optimized model format. The library includes converters for multiple frameworks.

The project is production-oriented and comes with backward compatibility guarantees, but it also includes experimental features related to model compression and inference acceleration.
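
As an illustration of the conversion step, here is a minimal sketch that wraps the converter for Hugging Face Transformers models. It assumes the `ctranslate2.converters.TransformersConverter` class and that the `transformers` and `torch` packages are installed alongside `ctranslate2`; the wrapper name `convert_transformers_model` and its default arguments are hypothetical choices made for this example.

```python
def convert_transformers_model(model_name, output_dir, quantization="int8"):
    """Convert a Hugging Face Transformers model to the CTranslate2 format.

    Returns the path of the output directory.
    """
    # Imported lazily so that this file can be loaded even when the optional
    # conversion dependencies are not installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_name)
    # force=True overwrites the output directory if it already exists.
    return converter.convert(output_dir, quantization=quantization, force=True)


# Example call (assumption: any supported model name or local path works here;
# running it downloads the model if it is not cached):
# convert_transformers_model("Helsinki-NLP/opus-mt-en-de", "opus-mt-en-de-ct2")
```

The resulting directory is what the `Translator` or `Generator` classes expect as their model path.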

Key features

  • Fast and efficient execution on CPU and GPU
    Execution is significantly faster and requires fewer resources than general-purpose deep learning frameworks on supported models and tasks, thanks to many advanced optimizations: layer fusion, padding removal, batch reordering, in-place operations, caching mechanisms, etc.
  • Quantization and reduced precision
    The model serialization and computation support weights with reduced precision: 16-bit floating points (FP16), 16-bit brain floating points (BF16), 16-bit integers (INT16), 8-bit integers (INT8) and AWQ quantization (INT4).
  • Multiple CPU architectures support
    The project supports x86-64 and AArch64/ARM64 processors and integrates multiple backends that are optimized for these platforms: Intel MKL, oneDNN, OpenBLAS, Ruy, and Apple Accelerate.
  • Automatic CPU detection and code dispatch
    One binary can include multiple backends (e.g. Intel MKL and oneDNN) and instruction set architectures (e.g. AVX, AVX2) that are automatically selected at runtime based on the CPU information.
  • Parallel and asynchronous execution
    Multiple batches can be processed in parallel and asynchronously using multiple GPUs or CPU cores.
  • Dynamic memory usage
    The memory usage changes dynamically depending on the request size while still meeting performance requirements thanks to caching allocators on both CPU and GPU.
  • Lightweight on disk
    Quantization can make the models 4 times smaller on disk with minimal accuracy loss.
  • Simple integration
    The project has few dependencies and exposes simple APIs in Python and C++ to cover most integration needs.
  • Configurable and interactive decoding
    Advanced decoding features allow autocompleting a partial sequence and returning alternatives at a specific location in the sequence.
  • Tensor parallelism for distributed inference
    Very large models can be split across multiple GPUs. Follow the documentation to set up the required environment.
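
For the parallel and asynchronous execution feature listed above, the following sketch submits several batches without blocking and collects the results afterwards. It assumes the `inter_threads` and `intra_threads` options of `ctranslate2.Translator` and the `asynchronous=True` flag of `translate_batch`; the function name `translate_batches_async` and the thread counts are hypothetical.

```python
def translate_batches_async(model_dir, batches, inter_threads=2, intra_threads=2):
    """Translate several batches concurrently on CPU and return the best hypotheses.

    `batches` is a list of batches; each batch is a list of token lists.
    """
    # Imported lazily so that the file loads even without ctranslate2 installed.
    import ctranslate2

    translator = ctranslate2.Translator(
        model_dir,
        device="cpu",
        inter_threads=inter_threads,  # number of batches processed in parallel
        intra_threads=intra_threads,  # number of threads used for each batch
    )

    # With asynchronous=True the call returns immediately with one
    # future-like object per input example.
    pending = [translator.translate_batch(batch, asynchronous=True) for batch in batches]

    # result() blocks until the corresponding translation is available.
    return [
        [async_result.result().hypotheses[0] for async_result in async_results]
        for async_results in pending
    ]
```

Submitting all batches before calling `result()` is what lets the workers overlap; waiting on each batch right after submitting it would serialize the work again.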

Some of these features are difficult to achieve with standard deep learning frameworks and are the motivation for this project.
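
The "4 times smaller" figure follows from storing each weight in 8 bits instead of 32. The sketch below shows a simplified symmetric INT8 quantization with a single scale in pure Python. It is only meant to illustrate the arithmetic: the actual quantization scheme used by the library may differ (for example in how scales are chosen), and the small overhead of storing the scale is ignored. The function names and sample weights are hypothetical.

```python
from array import array
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[array, float]:
    """Map floats to signed 8-bit integers using one shared scale."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    scale = 127.0 / max_abs
    return array("b", (round(w * scale) for w in weights)), scale


def dequantize(quantized: array, scale: float) -> List[float]:
    """Recover approximate float values from the quantized integers."""
    return [q / scale for q in quantized]


weights = [0.5, -1.27, 0.02, 0.9, -0.33, 1.0, 0.0, -0.75]

float32_storage = array("f", weights)  # 4 bytes per weight
int8_storage, scale = quantize_int8(weights)  # 1 byte per weight

float32_bytes = float32_storage.itemsize * len(float32_storage)
int8_bytes = int8_storage.itemsize * len(int8_storage)
size_ratio = float32_bytes / int8_bytes

# The rounding error of each weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, dequantize(int8_storage, scale)))

print("size ratio:", size_ratio)
print("max absolute error:", max_error)
```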

Installation and usage

CTranslate2 can be installed with pip:

pip install ctranslate2

The Python module is used to convert models and can translate or generate text with a few lines of code:

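import ctranslate2

# Translate a batch of tokenized inputs with a converted translation model.
translator = ctranslate2.Translator(translation_model_path)
translator.translate_batch(tokens)

# Generate continuations from a batch of start tokens with a converted generation model.
generator = ctranslate2.Generator(generation_model_path)
generator.generate_batch(start_tokens)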

See the documentation for more information and examples.
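
The two calls above return result objects rather than plain token lists. As a minimal sketch of how the output might be read, the helpers below extract the first hypothesis or sequence for each input. They assume the `hypotheses` attribute of translation results and the `sequences` attribute of generation results; the helper names, the `device` default and the `max_length` value are hypothetical.

```python
def translate_tokens(model_dir, batch, device="cpu"):
    """Return the best translation (a list of tokens) for each input in `batch`."""
    # Imported lazily so that the file loads even without ctranslate2 installed.
    import ctranslate2

    translator = ctranslate2.Translator(model_dir, device=device)
    results = translator.translate_batch(batch)
    # Each result holds one or more hypotheses; the first one is the best.
    return [result.hypotheses[0] for result in results]


def generate_tokens(model_dir, start_tokens, max_length=32, device="cpu"):
    """Return the first generated sequence (a list of tokens) for each prompt."""
    import ctranslate2

    generator = ctranslate2.Generator(model_dir, device=device)
    results = generator.generate_batch(start_tokens, max_length=max_length)
    return [result.sequences[0] for result in results]
```

Both functions expect inputs that are already tokenized, since the library operates on tokens and leaves tokenization to the caller.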

Benchmarks

We translate the En->De test set newstest2014 with multiple models:

  • OpenNMT-tf WMT14: a base Transformer trained with OpenNMT-tf on the WMT14 dataset (4.5M lines)
  • OpenNMT-py WMT14: a base Transformer trained with OpenNMT-py on the WMT14 dataset (4.5M lines)
  • OPUS-MT: a base Transformer trained with Marian on all OPUS data available on 2020-02-26 (81.9M lines)

The benchmark reports the number of target tokens generated per second (higher is better). The results are aggregated over multiple runs. See the benchmark scripts for more details and to reproduce these numbers.

Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings.

CPU

| | Tokens per second | Max. memory | BLEU |
| --- | --- | --- | --- |
| OpenNMT-tf WMT14 model | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 209.2 | 2653MB | 26.93 |
| OpenNMT-py WMT14 model | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 275.8 | 2012MB | 26.77 |
| - int8 | 323.3 | 1359MB | 26.72 |
| CTranslate2 3.6.0 | 658.8 | 849MB | 26.77 |
| - int16 | 733.0 | 672MB | 26.82 |
| - int8 | 860.2 | 529MB | 26.78 |
| - int8 + vmap | 1126.2 | 598MB | 26.64 |
| OPUS-MT model | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 147.3 | 2332MB | 27.90 |
| Marian 1.11.0 | 344.5 | 7605MB | 27.93 |
| - int16 | 330.2 | 5901MB | 27.65 |
| - int8 | 355.8 | 4763MB | 27.27 |
| CTranslate2 3.6.0 | 525.0 | 721MB | 27.92 |
| - int16 | 596.1 | 660MB | 27.53 |
| - int8 | 696.1 | 516MB | 27.65 |

Executed with 4 threads on a c5.2xlarge Amazon EC2 instance equipped with an Intel(R) Xeon(R) Platinum 8275CL CPU.
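
To read the CPU table in relative terms, the short sketch below computes throughput and memory ratios against the OpenNMT-py baseline for the OpenNMT-py WMT14 model. The numbers are copied from the table above; the function name `relative_to` and the choice of baseline are assumptions made for this illustration, and the ratios only hold for this benchmark configuration.

```python
from typing import Dict, Tuple

# (tokens per second, max. memory in MB) for the OpenNMT-py WMT14 model on CPU.
cpu_results: Dict[str, Tuple[float, float]] = {
    "OpenNMT-py 3.0.4": (275.8, 2012),
    "CTranslate2 3.6.0": (658.8, 849),
    "CTranslate2 3.6.0 int8": (860.2, 529),
}


def relative_to(
    results: Dict[str, Tuple[float, float]], baseline: str
) -> Dict[str, Tuple[float, float]]:
    """Return (throughput ratio, memory ratio) of each system versus the baseline."""
    base_speed, base_memory = results[baseline]
    return {
        name: (speed / base_speed, memory / base_memory)
        for name, (speed, memory) in results.items()
    }


ratios = relative_to(cpu_results, "OpenNMT-py 3.0.4")
for name, (speed_ratio, memory_ratio) in ratios.items():
    print(f"{name}: {speed_ratio:.2f}x throughput, {memory_ratio:.2f}x memory")
```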

GPU

| | Tokens per second | Max. GPU memory | Max. CPU memory | BLEU |
| --- | --- | --- | --- | --- |
| OpenNMT-tf WMT14 model | | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 1483.5 | 3031MB | 3122MB | 26.94 |
| OpenNMT-py WMT14 model | | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 1795.2 | 2973MB | 3099MB | 26.77 |
| FasterTransformer 5.3 | 6979.0 | 2402MB | 1131MB | 26.77 |
| - float16 | 8592.5 | 1360MB | 1135MB | 26.80 |
| CTranslate2 3.6.0 | 6634.7 | 1261MB | 953MB | 26.77 |
| - int8 | 8567.2 | 1005MB | 807MB | 26.85 |
| - float16 | 10990.7 | 941MB | 807MB | 26.77 |
| - int8 + float16 | 8725.4 | 813MB | 800MB | 26.83 |
| OPUS-MT model | | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 1022.9 | 4097MB | 2109MB | 27.90 |
| Marian 1.11.0 | 3241.0 | 3381MB | 2156MB | 27.92 |
| - float16 | 3962.4 | 3239MB | 1976MB | 27.94 |
| CTranslate2 3.6.0 | 5876.4 | 1197MB | 754MB | 27.92 |
| - int8 | 7521.9 | 1005MB | 792MB | 27.79 |
| - float16 | 9296.7 | 909MB | 814MB | 27.90 |
| - int8 + float16 | 8362.7 | 813MB | 766MB | 27.90 |

Executed with CUDA 11 on a g5.xlarge Amazon EC2 instance equipped with an NVIDIA A10G GPU (driver version: 510.47.03).

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • ctranslate2-4.5.0-cp312-cp312-win_amd64.whl (19.5 MB): CPython 3.12, Windows x86-64
  • ctranslate2-4.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.8 MB): CPython 3.12, manylinux (glibc 2.17+) x86-64
  • ctranslate2-4.5.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (17.5 MB): CPython 3.12, manylinux (glibc 2.17+) ARM64
  • ctranslate2-4.5.0-cp312-cp312-macosx_11_0_arm64.whl (1.4 MB): CPython 3.12, macOS 11.0+ ARM64
  • ctranslate2-4.5.0-cp311-cp311-win_amd64.whl (19.5 MB): CPython 3.11, Windows x86-64
  • ctranslate2-4.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.6 MB): CPython 3.11, manylinux (glibc 2.17+) x86-64
  • ctranslate2-4.5.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (17.4 MB): CPython 3.11, manylinux (glibc 2.17+) ARM64
  • ctranslate2-4.5.0-cp311-cp311-macosx_11_0_arm64.whl (1.4 MB): CPython 3.11, macOS 11.0+ ARM64
  • ctranslate2-4.5.0-cp310-cp310-win_amd64.whl (19.5 MB): CPython 3.10, Windows x86-64
  • ctranslate2-4.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.4 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • ctranslate2-4.5.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (17.2 MB): CPython 3.10, manylinux (glibc 2.17+) ARM64
  • ctranslate2-4.5.0-cp310-cp310-macosx_11_0_arm64.whl (1.4 MB): CPython 3.10, macOS 11.0+ ARM64
  • ctranslate2-4.5.0-cp39-cp39-win_amd64.whl (19.5 MB): CPython 3.9, Windows x86-64
  • ctranslate2-4.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.4 MB): CPython 3.9, manylinux (glibc 2.17+) x86-64
  • ctranslate2-4.5.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (17.1 MB): CPython 3.9, manylinux (glibc 2.17+) ARM64
  • ctranslate2-4.5.0-cp39-cp39-macosx_11_0_arm64.whl (1.4 MB): CPython 3.9, macOS 11.0+ ARM64
  • ctranslate2-4.5.0-cp38-cp38-win_amd64.whl (19.5 MB): CPython 3.8, Windows x86-64
  • ctranslate2-4.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (38.4 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64
  • ctranslate2-4.5.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (17.2 MB): CPython 3.8, manylinux (glibc 2.17+) ARM64
  • ctranslate2-4.5.0-cp38-cp38-macosx_11_0_arm64.whl (1.4 MB): CPython 3.8, macOS 11.0+ ARM64

File details

Hashes for ctranslate2-4.5.0-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 a16a784ec7924166bdf3e86754feda0441f04d9851fc3412f34f1e2de7cbd51b
MD5 959e1b320bb4ebdf1fce11f9a7bd27ac
BLAKE2b-256 6697e50a97b0025baac851ce68928ee51ceadc9f0f9e0b9b543dd32da56d5571

Hashes for ctranslate2-4.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 de3c5877fce31a0fcf3b5edbc8d4e6e22fd94a86c6b49680740ef41130efffc1
MD5 833e50acd572e3de0e4e32f92e984ebe
BLAKE2b-256 e2f03be15ad93c44cf60cd014f8e6f9ee604fc992b671451e480fae40f79ef87

Hashes for ctranslate2-4.5.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 c158f2ada6e3347388ad13c69e4a6a729ba40c035a400dd447995950ecf5e62f
MD5 9255ab47779391cba2ecf12f525cefb1
BLAKE2b-256 cc463615f9bdb9bc18f05b4371bb974befc380b73f6ba415e813e9d7ac0c2fb5

Hashes for ctranslate2-4.5.0-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1a0509f172edc994aec6870fe0a90c799d85fd7ddf564059d25b60932ab2e2c4
MD5 e16d8e4650d81fdadbd4a9c2a8de7857
BLAKE2b-256 3054d65d3ae24ffd82581e4b0823960d81cfe753dd8f118cf9ef2106632e1909

Hashes for ctranslate2-4.5.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 253993fbbe20cd7e2602de81e6159b259dadb47b9b59486d928396bd4a4ecdaa
MD5 db464fab2ad1aa42ab349b2dbe418dfc
BLAKE2b-256 573e75b99791ab4a89bf79236f30ec1e42a73e51aeaf29c88edd4800cc4f9e3c

Hashes for ctranslate2-4.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 89db5b18dfc7f7bf84cafaf7cc36e885aafcaeac936977eefd3e4768fd7b2879
MD5 b52bff8a771412c8df5f37ccfa43e582
BLAKE2b-256 8190014e110c5c0877f65d65a5cd05d448f589cf9efef426f5709f5e931fc812

Hashes for ctranslate2-4.5.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 4c56ccf1aa723ba85f4ea56b4d945dc7d2ea7f074b5eb716c85be0c8e0311c24
MD5 fa0571bb77f7ac6bc3e924cbdaf99211
BLAKE2b-256 636b3ae6dc7ac3126fdbeab5ef1b93dd752869dbc1e129c051d64ee9390531c7

Hashes for ctranslate2-4.5.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1bc072da977abdd4b09f0d50a45de745818a247608aa3f2865ef9a579ff11851
MD5 89378aa11f08264786ea6256d6723d7a
BLAKE2b-256 7e4f3b409614fe15c517d3db03c436efdaead805c7a8740b23df3cad9e6a126d

Hashes for ctranslate2-4.5.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 5d9ec0a201d3c33ada1bb00929b3ff3d80642b34ca0d94465556dfa197d127c4
MD5 616036b34b41a51d065217cd46bbeb5e
BLAKE2b-256 e903b4235aa4951330510c431b084d9b71d3bafe9bf0849fbcac397c8e863fc0

Hashes for ctranslate2-4.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b97ee9b15f75f84c35827df97ebe9c676f96c2e5118a2ed4d3efcf3c3e04a599
MD5 da168e83cdcebc05d2240e2b98feb01d
BLAKE2b-256 bcb53c3c4c91149d50d8a4f9c40390d5914f70078996c8840c2358f6a4f56bd6

Hashes for ctranslate2-4.5.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 5328ec73b430ba1a99a85bc3b038291e7bbedc0c9987b354b3c8ca395a3b7e06
MD5 3b8286c2f85fc8fd7fb06f82fb18d023
BLAKE2b-256 aebdc8e2da2d56aa1a2f5304165d3c89bacb297ab7b1bbe137e3118f531a837d

Hashes for ctranslate2-4.5.0-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 241da685f8f7cb10b7afceeb3d879f778b56e6a1d55fc2964ddc949c80c9c7bb
MD5 cfa65e734fe0942588e7f8ba63526bf4
BLAKE2b-256 c967ffa1fcda2c8265a710d34b12b6bf6d6ce904d8cad99a88479c0d3561505b

Hashes for ctranslate2-4.5.0-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 ccbccbdddb02e7c3b24666f2bc52cd475ca666fda8a317d23a97645eafd66dbe
MD5 4f317c38baf07bc505a7fc3f8ef14845
BLAKE2b-256 b13081f057f5958cc3a09312f549fa71003f756f4b83fcd70aa88d26beb8d72a

Hashes for ctranslate2-4.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0af82185aa961869362c06ce33443b5207237790233b1614ccf92307a671aa72
MD5 be75ea31a5e9a29d39005453341fa24f
BLAKE2b-256 bce0416b18d376411d405cf51db6a017ffce92fbb2b6558fc763bb1d6de874b8

Hashes for ctranslate2-4.5.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 f790e77458b83e109a743d0f07e9e5c023208314f5c824c26d1e3ebc62a12f71
MD5 201b8e7e27ccfc918d6121e84423daa7
BLAKE2b-256 2381999f3009040966512d28de0032d436308cdb4504bc0ec1e67e1b6419c398

Hashes for ctranslate2-4.5.0-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d9e8120817c51515175ab163655dc14b4e21eb381d7196fd43b843b0d50efaf1
MD5 0ba6bb04b3314fa3317a65032bc6d1d4
BLAKE2b-256 768b3ff26ad00b47f02c165486794eb8058d00ac7b2dcdc3bbbd6a0b809fcd52

Hashes for ctranslate2-4.5.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 5924e9adeff8b30ca0851e0f5ff13639d08e47d1219d27f615c0936a3cdedb57
MD5 c73042896f70033c323aa99a533f5e90
BLAKE2b-256 66e2537b6e0443150aa4a7604edf84c8126c1be57d96bad8fbffc7e3e5560495

Hashes for ctranslate2-4.5.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 45a45dabca3f9d8eb718685a792f9a7fc10af7362d318271181f16ebf54669b8
MD5 4a74611a81d55ae55825e605780b3a4c
BLAKE2b-256 da104fd138a830e9b5ac0b78a69a3b7c002e90e8623b41b68ef12c08d4839549

Hashes for ctranslate2-4.5.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 919a5feab74f33694b66c0a5637f07ba7cf4995af87d960aca50e4cbe53b4054
MD5 7b0e67411cd3b10f14110044624371f8
BLAKE2b-256 c8fa6fbe61746f2b80427621cd7e3fa0b2bc4ed1f1290488f318c25bdd6f72f3

Hashes for ctranslate2-4.5.0-cp38-cp38-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 7c221153ecdda81e24679a07f0b577926879325a0347a89f8afaf2593641cb9b
MD5 e03d3cfb482f6cbc9ef65d822d68c40c
BLAKE2b-256 dde7ef64d334036e3134db6336c3f283276d7803ab2e48dc91a7ddc94546ebc2
