Automatically shard your large model across multiple GPUs; works without torch.distributed
tensor_parallel
Run your PyTorch model on multiple GPUs from plain Python:
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from tensor_parallel import tensor_parallel # <- interface for automatic optimal backend selection
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model = tensor_parallel(model, ["cuda:0", "cuda:1"]) # <- magic happens here
# only half of the model is placed on each GPU, reducing the memory footprint twofold
inputs = tokenizer("Translate from English to German: How are you?", return_tensors="pt")["input_ids"].to("cuda:0")
outputs = model.generate(inputs, num_beams=5)
print(tokenizer.decode(outputs[0])) # Wie sind Sie?
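If you want to confirm the memory split yourself, the sketch below compares per-device allocation after wrapping the model. It reuses the model and device list from the example above; torch.cuda.memory_allocated is standard PyTorch, and the "roughly half on each GPU" expectation is only an approximation.

import torch
from transformers import T5ForConditionalGeneration
from tensor_parallel import tensor_parallel

model = T5ForConditionalGeneration.from_pretrained("t5-small")
model = tensor_parallel(model, ["cuda:0", "cuda:1"])  # shards weights across both devices

# each GPU should now hold roughly half of the model's parameters
for device in ["cuda:0", "cuda:1"]:
    allocated_mib = torch.cuda.memory_allocated(device) / 2**20
    print(f"{device}: {allocated_mib:.1f} MiB allocated")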
Installation
The recommended way to install this package is with pip:
pip install tensor_parallel
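A quick sanity check after installing (just an illustrative snippet, not part of the library's API) is to import the package and make sure enough CUDA devices are visible for the two-GPU example above:

import torch
import tensor_parallel  # should import without errors after pip install

# the quick-start example shards the model across cuda:0 and cuda:1
if torch.cuda.device_count() < 2:
    print("fewer than 2 GPUs visible; the example above needs cuda:0 and cuda:1")
else:
    print("ready: found", torch.cuda.device_count(), "CUDA devices")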
Code style
We use black and isort for all pull requests.
Before committing your code, simply run black . && isort . and you will be fine.