Fast inference engine for Transformer models

Project description

CTranslate2

CTranslate2 is a C++ and Python library for efficient inference with Transformer models.

The project implements a custom runtime that applies many performance optimization techniques, such as weight quantization, layer fusion, and batch reordering, to accelerate Transformer models and reduce their memory usage on CPU and GPU.
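Batch reordering can be illustrated with a minimal sketch. It assumes (this is not taken from CTranslate2's source) that reordering means sorting the examples of a batch by length so that sequences of similar size are packed together, which cuts the number of padding tokens, and then restoring the caller's original order. The helpers `reorder_by_length`, `restore_order`, and `padding_cost` are hypothetical.

```python
from typing import List, Sequence, Tuple


def reorder_by_length(batch: Sequence[Sequence[str]]) -> Tuple[List[Sequence[str]], List[int]]:
    """Sort examples by decreasing length and return the permutation used."""
    order = sorted(range(len(batch)), key=lambda i: len(batch[i]), reverse=True)
    return [batch[i] for i in order], order


def restore_order(results: Sequence[object], order: Sequence[int]) -> List[object]:
    """Put results computed on the reordered batch back in the original order."""
    restored: List[object] = [None] * len(order)
    for sorted_pos, original_index in enumerate(order):
        restored[original_index] = results[sorted_pos]
    return restored


def padding_cost(batch: Sequence[Sequence[str]], max_batch_size: int) -> int:
    """Count pad tokens needed when the batch is split into fixed-size chunks."""
    pads = 0
    for start in range(0, len(batch), max_batch_size):
        chunk = batch[start:start + max_batch_size]
        longest = max(len(seq) for seq in chunk)
        pads += sum(longest - len(seq) for seq in chunk)
    return pads


if __name__ == "__main__":
    batch = [["a"] * 2, ["b"] * 9, ["c"] * 3, ["d"] * 8]
    reordered, order = reorder_by_length(batch)
    print(padding_cost(batch, 2), padding_cost(reordered, 2))  # 12 2
    # Pretend each example's "result" is its length, then restore the order.
    print(restore_order([len(seq) for seq in reordered], order))  # [2, 9, 3, 8]
```

Sorting only changes which examples share a sub-batch; the inverse permutation guarantees the caller receives results in the order it submitted them.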

The following model types are currently supported:

  • Encoder-decoder models: Transformer base/big, M2M-100, NLLB, BART, mBART, Pegasus, T5, Whisper
  • Decoder-only models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, MPT, Llama, Mistral, Gemma, CodeGen, GPTBigCode, Falcon, Qwen2
  • Encoder-only models: BERT, DistilBERT, XLM-RoBERTa

Compatible models must first be converted into an optimized model format. The library includes converters for multiple frameworks: OpenNMT-py, OpenNMT-tf, Fairseq, Marian, OPUS-MT, and Transformers.

The project is production-oriented and comes with backward compatibility guarantees, but it also includes experimental features related to model compression and inference acceleration.

Key features

  • Fast and efficient execution on CPU and GPU
    Execution is significantly faster and requires fewer resources than general-purpose deep learning frameworks on supported models and tasks, thanks to many advanced optimizations: layer fusion, padding removal, batch reordering, in-place operations, caching mechanisms, etc.
  • Quantization and reduced precision
    The model serialization and computation support weights with reduced precision: 16-bit floating point (FP16), 16-bit brain floating point (BF16), 16-bit integers (INT16), 8-bit integers (INT8), and AWQ quantization (INT4).
  • Multiple CPU architectures support
    The project supports x86-64 and AArch64/ARM64 processors and integrates multiple backends that are optimized for these platforms: Intel MKL, oneDNN, OpenBLAS, Ruy, and Apple Accelerate.
  • Automatic CPU detection and code dispatch
    One binary can include multiple backends (e.g. Intel MKL and oneDNN) and instruction set architectures (e.g. AVX, AVX2) that are automatically selected at runtime based on the CPU information.
  • Parallel and asynchronous execution
    Multiple batches can be processed in parallel and asynchronously using multiple GPUs or CPU cores.
  • Dynamic memory usage
    The memory usage changes dynamically depending on the request size while still meeting performance requirements thanks to caching allocators on both CPU and GPU.
  • Lightweight on disk
    Quantization can make the models 4 times smaller on disk with minimal accuracy loss.
  • Simple integration
    The project has few dependencies and exposes simple APIs in Python and C++ to cover most integration needs.
  • Configurable and interactive decoding
    Advanced decoding features allow autocompleting a partial sequence and returning alternatives at a specific location in the sequence.
  • Tensor parallelism for distributed inference
    Very large models can be split across multiple GPUs. Follow the documentation to set up the required environment.
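The tensor parallelism item above can be illustrated with a minimal sketch of one common scheme: splitting a linear layer's weight matrix column-wise. Plain Python lists stand in for GPUs, and the function names are hypothetical. The sketch shows why the output is unchanged (each shard computes a slice of the output columns, and the slices are concatenated); it does not describe how CTranslate2 actually partitions its layers.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w."""
    return [
        [sum(a * w[k][j] for k, a in enumerate(row)) for j in range(len(w[0]))]
        for row in x
    ]


def split_columns(w: Matrix, num_shards: int) -> List[Matrix]:
    """Split w column-wise into num_shards contiguous blocks, one per device."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("columns must divide evenly across shards")
    size = cols // num_shards
    return [[row[s * size:(s + 1) * size] for row in w] for s in range(num_shards)]


def column_parallel_linear(x: Matrix, w: Matrix, num_shards: int) -> Matrix:
    """Each shard computes x @ w_shard; partial outputs are concatenated per row."""
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, num_shards)]
    result: Matrix = []
    for i in range(len(x)):
        row: List[float] = []
        for out in partial_outputs:
            row.extend(out[i])
        result.append(row)
    return result


if __name__ == "__main__":
    x = [[1.0, 2.0]]
    w = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
    print(matmul(x, w))  # [[11.0, 14.0, 17.0, 20.0]]
    print(column_parallel_linear(x, w, 2) == matmul(x, w))  # True
```

Each shard only needs its own slice of the weights, which is what lets a model too large for one GPU be spread over several.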

Some of these features are difficult to achieve with standard deep learning frameworks and are the motivation for this project.
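The "Quantization and reduced precision" and "Lightweight on disk" items above rest on simple arithmetic: an FP32 weight takes 4 bytes and an INT8 weight takes 1, hence the roughly 4x size reduction. The sketch below assumes a symmetric per-row INT8 scheme in which each row's largest absolute value maps to 127; it illustrates the general technique and is not CTranslate2's quantization code. `quantize_row` and `dequantize_row` are hypothetical names.

```python
from array import array
from typing import List, Tuple


def quantize_row(row: List[float]) -> Tuple[array, float]:
    """Symmetric INT8 quantization of one weight row (hypothetical helper).

    The scale maps the largest absolute value in the row to 127, and every
    weight is rounded to the nearest integer in [-127, 127].
    """
    max_abs = max(abs(x) for x in row)
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    q = array("b", (max(-127, min(127, round(x * scale))) for x in row))
    return q, scale


def dequantize_row(q: array, scale: float) -> List[float]:
    """Recover approximate float weights from INT8 values and the row scale."""
    return [v / scale for v in q]


if __name__ == "__main__":
    weights = [0.4, -1.0, 0.25, 0.0]
    fp32 = array("f", weights)  # 4 bytes per weight
    q, scale = quantize_row(weights)  # 1 byte per weight, plus one scale per row
    print(list(q), scale)  # [51, -127, 32, 0] 127.0
    print(fp32.itemsize * len(fp32), q.itemsize * len(q))  # 16 4
    print(dequantize_row(q, scale))
```

The rounding error per weight is at most half a quantization step (0.5 / scale), which is why accuracy loss is typically small when weights within a row have a similar range.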

Installation and usage

CTranslate2 can be installed with pip:

pip install ctranslate2

The Python module is used to convert models and can translate or generate text with a few lines of code:

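import ctranslate2

# Translate a batch of tokenized source sentences with a converted translation model.
translator = ctranslate2.Translator(translation_model_path)
translator.translate_batch(tokens)

# Generate text from a batch of tokenized prompts with a converted decoder-only model.
generator = ctranslate2.Generator(generation_model_path)
generator.generate_batch(start_tokens)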

See the documentation for more information and examples.

Benchmarks

We translate the En->De test set newstest2014 with multiple models:

  • OpenNMT-tf WMT14: a base Transformer trained with OpenNMT-tf on the WMT14 dataset (4.5M lines)
  • OpenNMT-py WMT14: a base Transformer trained with OpenNMT-py on the WMT14 dataset (4.5M lines)
  • OPUS-MT: a base Transformer trained with Marian on all OPUS data available on 2020-02-26 (81.9M lines)

The benchmark reports the number of target tokens generated per second (higher is better). The results are aggregated over multiple runs. See the benchmark scripts for more details and to reproduce these numbers.
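As a worked example of the metric, throughput for one run is the number of generated target tokens divided by the elapsed time. The benchmark scripts define the exact aggregation; the sketch below assumes, hypothetically, that each run's throughput is computed separately and the median across runs is reported.

```python
from statistics import median
from typing import Sequence, Tuple


def tokens_per_second(num_target_tokens: int, elapsed_seconds: float) -> float:
    """Throughput of one run: generated target tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return num_target_tokens / elapsed_seconds


def aggregate_runs(runs: Sequence[Tuple[int, float]]) -> float:
    """Aggregate (tokens, seconds) runs; using the median is an assumed choice."""
    return median(tokens_per_second(tokens, seconds) for tokens, seconds in runs)


if __name__ == "__main__":
    runs = [(60000, 100.0), (60000, 90.0), (60000, 120.0)]
    print(aggregate_runs(runs))  # 600.0
```

A median is less sensitive than a mean to one slow warm-up run, which is why it is a common choice for this kind of measurement.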

Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings.

CPU

| Engine | Tokens per second | Max. memory (MB) | BLEU |
|---|---|---|---|
| OpenNMT-tf WMT14 model | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 209.2 | 2653 | 26.93 |
| OpenNMT-py WMT14 model | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 275.8 | 2012 | 26.77 |
| - int8 | 323.3 | 1359 | 26.72 |
| CTranslate2 3.6.0 | 658.8 | 849 | 26.77 |
| - int16 | 733.0 | 672 | 26.82 |
| - int8 | 860.2 | 529 | 26.78 |
| - int8 + vmap | 1126.2 | 598 | 26.64 |
| OPUS-MT model | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 147.3 | 2332 | 27.90 |
| Marian 1.11.0 | 344.5 | 7605 | 27.93 |
| - int16 | 330.2 | 5901 | 27.65 |
| - int8 | 355.8 | 4763 | 27.27 |
| CTranslate2 3.6.0 | 525.0 | 721 | 27.92 |
| - int16 | 596.1 | 660 | 27.53 |
| - int8 | 696.1 | 516 | 27.65 |

Executed with 4 threads on a c5.2xlarge Amazon EC2 instance equipped with an Intel(R) Xeon(R) Platinum 8275CL CPU.

GPU

| Engine | Tokens per second | Max. GPU memory (MB) | Max. CPU memory (MB) | BLEU |
|---|---|---|---|---|
| OpenNMT-tf WMT14 model | | | | |
| OpenNMT-tf 2.31.0 (with TensorFlow 2.11.0) | 1483.5 | 3031 | 3122 | 26.94 |
| OpenNMT-py WMT14 model | | | | |
| OpenNMT-py 3.0.4 (with PyTorch 1.13.1) | 1795.2 | 2973 | 3099 | 26.77 |
| FasterTransformer 5.3 | 6979.0 | 2402 | 1131 | 26.77 |
| - float16 | 8592.5 | 1360 | 1135 | 26.80 |
| CTranslate2 3.6.0 | 6634.7 | 1261 | 953 | 26.77 |
| - int8 | 8567.2 | 1005 | 807 | 26.85 |
| - float16 | 10990.7 | 941 | 807 | 26.77 |
| - int8 + float16 | 8725.4 | 813 | 800 | 26.83 |
| OPUS-MT model | | | | |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 1022.9 | 4097 | 2109 | 27.90 |
| Marian 1.11.0 | 3241.0 | 3381 | 2156 | 27.92 |
| - float16 | 3962.4 | 3239 | 1976 | 27.94 |
| CTranslate2 3.6.0 | 5876.4 | 1197 | 754 | 27.92 |
| - int8 | 7521.9 | 1005 | 792 | 27.79 |
| - float16 | 9296.7 | 909 | 814 | 27.90 |
| - int8 + float16 | 8362.7 | 813 | 766 | 27.90 |

Executed with CUDA 11 on a g5.xlarge Amazon EC2 instance equipped with an NVIDIA A10G GPU (driver version: 510.47.03).
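Relative speedups can be read off the tables with simple ratios. The sketch below copies a few throughput values from the CPU and GPU tables and divides them within the same table; as noted above, these ratios only hold for the benchmark configuration, and comparing rows across the two tables would not be meaningful.

```python
from typing import Dict

# Target tokens per second, copied from the CPU and GPU benchmark tables above.
CPU_TOKENS_PER_SEC: Dict[str, float] = {
    "OpenNMT-py 3.0.4": 275.8,
    "CTranslate2 3.6.0": 658.8,
    "CTranslate2 3.6.0 int8": 860.2,
}
GPU_TOKENS_PER_SEC: Dict[str, float] = {
    "OpenNMT-py 3.0.4": 1795.2,
    "CTranslate2 3.6.0": 6634.7,
    "CTranslate2 3.6.0 float16": 10990.7,
}


def speedup(table: Dict[str, float], baseline: str, candidate: str) -> float:
    """Ratio of candidate throughput to baseline throughput within one table."""
    return table[candidate] / table[baseline]


if __name__ == "__main__":
    for name, table, candidate in (
        ("CPU", CPU_TOKENS_PER_SEC, "CTranslate2 3.6.0 int8"),
        ("GPU", GPU_TOKENS_PER_SEC, "CTranslate2 3.6.0 float16"),
    ):
        ratio = speedup(table, "OpenNMT-py 3.0.4", candidate)
        print(f"{name}: {candidate} is {ratio:.2f}x OpenNMT-py 3.0.4")
```

On the OpenNMT-py WMT14 model this works out to about 3.12x on CPU (INT8) and 6.12x on GPU (FP16).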


Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

ctranslate2_arty-4.6.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (38.0 MB)

Uploaded for CPython 3.14t, manylinux: glibc 2.17+ x86-64

ctranslate2_arty-4.6.2-cp314-cp314t-macosx_11_0_x86_64.whl (11.9 MB)

Uploaded for CPython 3.14t, macOS 11.0+ x86-64

ctranslate2_arty-4.6.2-cp314-cp314t-macosx_11_0_arm64.whl (1.3 MB)

Uploaded for CPython 3.14t, macOS 11.0+ ARM64

ctranslate2_arty-4.6.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (38.0 MB)

Uploaded for CPython 3.14, manylinux: glibc 2.17+ x86-64

ctranslate2_arty-4.6.2-cp314-cp314-macosx_11_0_x86_64.whl (11.9 MB)

Uploaded for CPython 3.14, macOS 11.0+ x86-64

ctranslate2_arty-4.6.2-cp314-cp314-macosx_11_0_arm64.whl (1.3 MB)

Uploaded for CPython 3.14, macOS 11.0+ ARM64

ctranslate2_arty-4.6.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (38.0 MB)

Uploaded for CPython 3.13, manylinux: glibc 2.17+ x86-64

ctranslate2_arty-4.6.2-cp313-cp313-macosx_11_0_x86_64.whl (11.9 MB)

Uploaded for CPython 3.13, macOS 11.0+ x86-64

ctranslate2_arty-4.6.2-cp313-cp313-macosx_11_0_arm64.whl (1.3 MB)

Uploaded for CPython 3.13, macOS 11.0+ ARM64

ctranslate2_arty-4.6.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (38.0 MB)

Uploaded for CPython 3.12, manylinux: glibc 2.17+ x86-64

ctranslate2_arty-4.6.2-cp312-cp312-macosx_11_0_x86_64.whl (11.9 MB)

Uploaded for CPython 3.12, macOS 11.0+ x86-64

ctranslate2_arty-4.6.2-cp312-cp312-macosx_11_0_arm64.whl (1.3 MB)

Uploaded for CPython 3.12, macOS 11.0+ ARM64

ctranslate2_arty-4.6.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (37.8 MB)

Uploaded for CPython 3.11, manylinux: glibc 2.17+ x86-64

ctranslate2_arty-4.6.2-cp311-cp311-macosx_11_0_x86_64.whl (11.9 MB)

Uploaded for CPython 3.11, macOS 11.0+ x86-64

ctranslate2_arty-4.6.2-cp311-cp311-macosx_11_0_arm64.whl (1.3 MB)

Uploaded for CPython 3.11, macOS 11.0+ ARM64

ctranslate2_arty-4.6.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (37.6 MB)

Uploaded for CPython 3.10, manylinux: glibc 2.17+ x86-64

ctranslate2_arty-4.6.2-cp310-cp310-macosx_11_0_x86_64.whl (11.9 MB)

Uploaded for CPython 3.10, macOS 11.0+ x86-64

ctranslate2_arty-4.6.2-cp310-cp310-macosx_11_0_arm64.whl (1.2 MB)

Uploaded for CPython 3.10, macOS 11.0+ ARM64

ctranslate2_arty-4.6.2-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (37.6 MB)

Uploaded for CPython 3.9, manylinux: glibc 2.17+ x86-64

ctranslate2_arty-4.6.2-cp39-cp39-macosx_11_0_x86_64.whl (11.9 MB)

Uploaded for CPython 3.9, macOS 11.0+ x86-64

ctranslate2_arty-4.6.2-cp39-cp39-macosx_11_0_arm64.whl (1.2 MB)

Uploaded for CPython 3.9, macOS 11.0+ ARM64
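Each wheel file name encodes its compatibility tags following the PEP 427 layout `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. For example, the `cp314t` ABI tag marks the free-threaded CPython 3.14 build, and the Linux wheels carry two dot-separated platform tags (`manylinux2014` is an alias of `manylinux_2_17`). The minimal parser below is a sketch for illustration with hypothetical names; in practice the `packaging` library provides `packaging.utils.parse_wheel_filename`, which also validates and normalizes the fields.

```python
from typing import NamedTuple, Optional, Tuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: Tuple[str, ...]
    abi_tags: Tuple[str, ...]
    platform_tags: Tuple[str, ...]


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its dash-separated fields.

    Each tag field may hold several dot-separated tags (a compressed tag set).
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename}")
    return WheelName(
        distribution,
        version,
        build,
        tuple(python_tag.split(".")),
        tuple(abi_tag.split(".")),
        tuple(platform_tag.split(".")),
    )


if __name__ == "__main__":
    wheel = parse_wheel_filename(
        "ctranslate2_arty-4.6.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl"
    )
    print(wheel.abi_tags)  # ('cp314t',)
    print(wheel.platform_tags)  # ('manylinux2014_x86_64', 'manylinux_2_17_x86_64')
```

pip performs this kind of matching automatically, picking the wheel whose tags match the running interpreter and platform.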

File details

Details for the file ctranslate2_arty-4.6.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp314-cp314t-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 ca1c208e3b9f0e1a593f356a8cf4ead8bc80b94de336cdd4c01afed8e2eb1dd1
MD5 4dd986960c19ab5679d8f0d660440554
BLAKE2b-256 22783b9362d64333e734c190b34f6a103e9b386e620421b2cd07056a3733316b

File details

Details for the file ctranslate2_arty-4.6.2-cp314-cp314t-macosx_11_0_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp314-cp314t-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 ebee39f4afd35c4507de722f80980152e63786b61e3ac9a65d387d7bfc371577
MD5 678531ee46c44a258f32e64f0795c841
BLAKE2b-256 a32166fffffb6d1e1a23139698525ae39edf6b073d22122395738022d647dd48

File details

Details for the file ctranslate2_arty-4.6.2-cp314-cp314t-macosx_11_0_arm64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp314-cp314t-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f0a34f614a87ebb7fd78f9d25da1a5b68f047daa699967c36976717b694c59c9
MD5 e04014c3c2eabeb0c852e5d8b659fbe3
BLAKE2b-256 ee94ce08d9eb88877ff946cb5607bdba5a24118d4ce7640f9d6757e4ad9a6f78

File details

Details for the file ctranslate2_arty-4.6.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp314-cp314-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 2c9664357f0f4c0b66051fe0fc0eef5379445f689363897edc03f81fe9df131c
MD5 502d66e781028cca12adb1e6b6e2d35d
BLAKE2b-256 27df627d0a23460ea5c4f04667e8a251c9c3e1678b4301c30576d1576a2dedeb

File details

Details for the file ctranslate2_arty-4.6.2-cp314-cp314-macosx_11_0_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp314-cp314-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 7d62e1a0e4546aa3432601d2e074c26c7657cd5ff18a41cb0f493801fc4f5843
MD5 a2cccbe9a6f8fd25bea1a785e5975e9b
BLAKE2b-256 01aeb957f4224cc9bafdba9eaeb058532b79ddbb6e14340bc357f7630a09b7ef

File details

Details for the file ctranslate2_arty-4.6.2-cp314-cp314-macosx_11_0_arm64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1770390e8f4c471df7a949bc5bce26dd748bf28fb410efd4265fab1ab0878019
MD5 98528ba1ca3287520216f2bdd00ad83f
BLAKE2b-256 b2ae7b97a329c0625fa318371119407e203effa02c41e84f917ce38d7ec43028

File details

Details for the file ctranslate2_arty-4.6.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp313-cp313-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 4ac9d799f029475957028c05ac7451db24e652e8e2f8444c0ae8b92b7be68fb7
MD5 c9ee41ec092c79d7c1312daaaccb58e9
BLAKE2b-256 377bcfd7b1afc74202bb17c05b44f97b22027a2b67fc038687e5b5492fe292cc

File details

Details for the file ctranslate2_arty-4.6.2-cp313-cp313-macosx_11_0_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp313-cp313-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 009ec8fe3e96c07df25fc0b47e21b3b90f297cf02ce9dc6e20ad2dd344188c2f
MD5 13123a0466b95043e3becea7ed5568bb
BLAKE2b-256 b4ebc37a56408db911b71e69891db116ae38c4d4fa8bcefad2215c5a4bc39acf

File details

Details for the file ctranslate2_arty-4.6.2-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a38c823c9c8fe967fb24d9d0d6845f0f11bda7974340c301a1d218858b0cf2d4
MD5 1f07832eb72430c1b174a3502370705a
BLAKE2b-256 3663722c74122a45acac7eb3081657b5e468b74cc08603681716c86f9d727d5d

File details

Details for the file ctranslate2_arty-4.6.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 13883fa7ce53edc68a46e537dbd1d6495a710d6af30369c526c135660a476c21
MD5 330ba2a5ac3ceb181605fa2bdb94a169
BLAKE2b-256 47c77cf979581a070b37d201ebd2b29f08556146d380a019ddb160bae88fe998

File details

Details for the file ctranslate2_arty-4.6.2-cp312-cp312-macosx_11_0_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp312-cp312-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 30c188e4c1477b190c97db2e2c01d13b0987814c5e4670cbceb19dea8cb12335
MD5 d7654acff6f66f7661abe6e9d744d43a
BLAKE2b-256 a2461aa5960ca3af389b6b9cef22dafcd8ed88bda4ef8c8424a1645739bcef4f

File details

Details for the file ctranslate2_arty-4.6.2-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 46ebd4ee6c3549881c27b3194396ae23abccb07c89bef7f25b64b542f5f1d6d6
MD5 b80f8ac450000eb5b33b480535e06d50
BLAKE2b-256 f96184692ae56fb5fe3d5e3f56c1fc6ba7c444cc2557737f6bc471129404b72e

File details

Details for the file ctranslate2_arty-4.6.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 0b30efc0ba8c2ff71cc050ab42d67a6247d25ca4569b49b682f319dbd4adacd8
MD5 29df365d8621cd52e3a3c35436bbbfdc
BLAKE2b-256 f062c5931dc774147bc57e399a7596c25198ae64c4c0a348834dc97e93177c15

File details

Details for the file ctranslate2_arty-4.6.2-cp311-cp311-macosx_11_0_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp311-cp311-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 78bc22b68858c0bf588eb597663ff18df5226250b1b5cbb7dfad14d2e4e71608
MD5 bce1ca5428b7c61983e27c068ae31828
BLAKE2b-256 002b477423afb1fcb9c254f469597ccd7b69c32647f91351995c4fa08d34af3b

File details

Details for the file ctranslate2_arty-4.6.2-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d4d72e3c0ddb052fddfa0764c6caf22da293f57401e705c779be13f920def0db
MD5 7e89dcd5526a4a3410a36f14a7b2dda0
BLAKE2b-256 37eb37c63d82d8cbba9c22b80eaa09335e29f2fa97762570e57aca89bd6da041

File details

Details for the file ctranslate2_arty-4.6.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 2abeecc721e31d98558c7c4cc44d9c6442a3dd3b51ff29554e472160e1c5d268
MD5 e7164d911df7250d1835d84690bc4f89
BLAKE2b-256 9ab6d9d669149d28c0f6343ee2b337d09a422dc21495734daec08945acec6ba0

File details

Details for the file ctranslate2_arty-4.6.2-cp310-cp310-macosx_11_0_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp310-cp310-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 b616965ae7523d10e9bd999e1958aa55b8c3c327131761e1279c3c35481a0efd
MD5 cbc0c234b22c079cde92aa75eabbfbec
BLAKE2b-256 f33e819cd20082b683416b05942cd330b3a146de70f5a78af67b4c63ec801bee

File details

Details for the file ctranslate2_arty-4.6.2-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5315b01eb417cef267b5579cb831eced24dee02aefdccd95d8d4b12b7463b756
MD5 5a130fa23dc311a358668ac443ae9312
BLAKE2b-256 2daa8339554a4e035ba37319e26448d36c5b9d7a03aad3ceaf20c8838f955fa4

File details

Details for the file ctranslate2_arty-4.6.2-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 96958c46993a2a892d3dd6f7f9a9f621e8b2b146879048949e45f5c9baf0c8cf
MD5 4802fa43706b62b923acf42bfc1d77f8
BLAKE2b-256 aba557068ea3979bc3ce1f6abc176fb0965761b27916d181e1a9ba1e7726c8e4

File details

Details for the file ctranslate2_arty-4.6.2-cp39-cp39-macosx_11_0_x86_64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp39-cp39-macosx_11_0_x86_64.whl
Algorithm Hash digest
SHA256 1257595611ce752bd27900bc3660b06ecf71fecfd6de28bc30fde1c0a176e3b1
MD5 304481d9947c2f4135d10b7b61b9906e
BLAKE2b-256 0c0a1ea111b93e8b096c30e3de91529de1a15b9728da77cc72ac85a822331f24

File details

Details for the file ctranslate2_arty-4.6.2-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for ctranslate2_arty-4.6.2-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 a0cc2d3b384909a79f091d7b395145be35c2badc08fecc71f2284ddd364e0a2b
MD5 b3f64f89780450c86b56d557daef5839
BLAKE2b-256 48a147d193bcd5822699065de2e66200fe946ca5af2d72a16d388964d8414f37
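A downloaded wheel can be checked against the SHA256 digests listed above by hashing the file in chunks and comparing hex digests. The sketch below uses only the standard library; `sha256_of_file` and `verify_wheel` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory usage."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Union[str, Path], expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    wheel = "ctranslate2_arty-4.6.2-cp39-cp39-macosx_11_0_arm64.whl"
    if Path(wheel).exists():
        print(verify_wheel(wheel, "a0cc2d3b384909a79f091d7b395145be35c2badc08fecc71f2284ddd364e0a2b"))
```

pip can also enforce these digests itself through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).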
