Fast and easy LLM serving.
Project description
mistral.rs PyO3 Bindings: mistralrs
mistralrs is a Python package which provides an API for mistral.rs. We build mistralrs with the maturin build manager.
Installation from PyPI
- Install Rust: https://rustup.rs/
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  source $HOME/.cargo/env
- mistralrs depends on the openssl library. To install it on Ubuntu:
  sudo apt install libssl-dev
  sudo apt install pkg-config
- Install it!
  - CUDA
    pip install mistralrs-cuda
  - Metal
    pip install mistralrs-metal
  - Apple Accelerate
    pip install mistralrs-accelerate
  - Intel MKL
    pip install mistralrs-mkl
  - Without accelerators
    pip install mistralrs
Every variant installs the same mistralrs Python package. The suffix on the package name passed to pip only controls which acceleration features are activated.
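As a small illustration of the mapping above, the following sketch picks the pip package for a given accelerator. The helper name pip_install_command and the accelerator keys are hypothetical and exist only for this example; the package names are the ones listed above.

```python
# Hypothetical helper: maps an accelerator name to the pip package listed above.
# The dictionary keys are illustrative labels, not part of the mistralrs API.
PIP_PACKAGES = {
    "cuda": "mistralrs-cuda",
    "metal": "mistralrs-metal",
    "accelerate": "mistralrs-accelerate",
    "mkl": "mistralrs-mkl",
    None: "mistralrs",  # no accelerator
}


def pip_install_command(accelerator=None):
    """Return the pip command (as an argument list) for the given accelerator."""
    try:
        package = PIP_PACKAGES[accelerator]
    except KeyError:
        raise ValueError(f"unknown accelerator: {accelerator!r}") from None
    return ["pip", "install", package]
```

The returned list can be joined with spaces for display or passed to subprocess.run. Whichever package is chosen, the import name in Python stays mistralrs.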
Installation from source
- Install required packages
  - openssl (e.g., sudo apt install libssl-dev)
  - pkg-config (e.g., sudo apt install pkg-config)
- Install Rust: https://rustup.rs/
  curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  source $HOME/.cargo/env
- Set the HF token correctly (skip this step if the token is already set, if your model is not gated, or if you want to use the token_source parameters in Python or on the command line).
  mkdir ~/.cache/huggingface
  touch ~/.cache/huggingface/token
  echo <HF_TOKEN_HERE> > ~/.cache/huggingface/token
- Download the code
  git clone https://github.com/EricLBuehler/mistral.rs.git
  cd mistral.rs
- cd into the correct directory for building mistralrs:
  cd mistralrs-pyo3
- Install maturin, our Rust + Python build system. Maturin requires a Python virtual environment such as venv or conda to be active. The mistralrs package will be installed into that environment.
  pip install maturin[patchelf]
- Install mistralrs
  Install mistralrs by executing the following in this directory. Features such as cuda or flash-attn may be specified with the --features argument, just as they would be for cargo run.
  The base build command is:
  maturin develop -r
  - To build for CUDA:
    maturin develop -r --features cuda
  - To build for CUDA with flash attention:
    maturin develop -r --features "cuda flash-attn"
  - To build for Metal:
    maturin develop -r --features metal
  - To build for Accelerate:
    maturin develop -r --features accelerate
  - To build for MKL:
    maturin develop -r --features mkl
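The token and build steps above can be sketched in Python, for example as part of a setup script. This is a minimal sketch, not part of mistralrs: the function names write_hf_token and maturin_develop_command are hypothetical, and the token path and build flags are taken directly from the steps above.

```python
from pathlib import Path


def write_hf_token(token, cache_dir=None):
    """Write the token to <cache_dir>/token, creating the directory if needed.

    Defaults to ~/.cache/huggingface, the location used in the steps above.
    """
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "huggingface"
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    token_path = cache_dir / "token"
    # Like `echo`, end the file with a single trailing newline.
    token_path.write_text(token.strip() + "\n")
    return token_path


def maturin_develop_command(features=()):
    """Build the `maturin develop -r` argument list, optionally with features."""
    command = ["maturin", "develop", "-r"]
    if features:
        # Multiple features are passed as one space-separated argument,
        # matching the quoted form: --features "cuda flash-attn"
        command += ["--features", " ".join(features)]
    return command
```

To run the build from a script, the list can be handed to subprocess.run(maturin_develop_command(["cuda", "flash-attn"]), check=True) from inside the mistralrs-pyo3 directory, with the virtual environment active.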
Please find API docs here and the type stubs here, which are another great form of documentation.
We also provide a cookbook here!
Example
from mistralrs import ModelKind, MistralLoader, ChatCompletionRequest

# Load a GGUF-quantized Mistral model.
kind = ModelKind.QuantizedGGUF
loader = MistralLoader(
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    kind=kind,
    no_kv_cache=False,
    repeat_last_n=64,
    quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)
runner = loader.load()

# Send a single-turn chat completion request and print the response.
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        frequency_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res)
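When sending several prompts with the same sampling settings, it can help to build the request arguments in one place. The sketch below is a hypothetical helper (chat_request_kwargs is not part of mistralrs). It only assembles a dictionary using the field names and default values from the example above, and it assumes ChatCompletionRequest accepts those fields as keyword arguments, as the example does.

```python
def chat_request_kwargs(
    prompt,
    model="mistral",
    max_tokens=256,
    temperature=0.1,
    top_p=0.1,
    frequency_penalty=1.0,
):
    """Assemble keyword arguments for a single-turn chat request.

    Field names and defaults mirror the example above; nothing here
    imports or calls mistralrs.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "frequency_penalty": frequency_penalty,
        "top_p": top_p,
        "temperature": temperature,
    }
```

With the runner from the example, a request could then be sent as runner.send_chat_completion_request(ChatCompletionRequest(**chat_request_kwargs("Tell me a story about the Rust type system."))).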
Source Distribution
File details
Details for the file mistralrs_cuda-0.1.5.tar.gz.
File metadata
- Download URL: mistralrs_cuda-0.1.5.tar.gz
- Upload date:
- Size: 162.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.8.18
File hashes
Algorithm | Hash digest
---|---
SHA256 | 702c57025cd119b7162559e42b849869a2bb9234f9398d65fa75df847443a31b
MD5 | 6f978aebafc616aaf2258d454475fc3f
BLAKE2b-256 | 27a34d78c0ad3deb2b0b15f9c1792301239ef797b05d3db5597ce809d0b581a6