Fast and easy LLM serving.

mistral.rs PyO3 Bindings: mistralrs

mistralrs is a Python package that provides an API for mistral.rs. We build mistralrs with the maturin build manager.

Installation from PyPI

  1. Install Rust: https://rustup.rs/

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  2. mistralrs depends on the openssl library.

    To install it on Ubuntu:

    sudo apt install libssl-dev
    sudo apt install pkg-config

  3. Install it!

  • CUDA

    pip install mistralrs-cuda

  • Metal

    pip install mistralrs-metal

  • Apple Accelerate

    pip install mistralrs-accelerate

  • Intel MKL

    pip install mistralrs-mkl

  • Without accelerators

    pip install mistralrs

Every variant installs the same mistralrs Python package. The suffix of the pip package name only controls which features are enabled.
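
Because every variant exposes the same mistralrs module, it can be unclear afterwards which variant pip installed. The sketch below uses only the standard library to look up the distribution names from the pip commands above. The function name installed_mistralrs_distributions is hypothetical and is not part of mistralrs.

```python
from importlib import metadata

# Distribution names taken from the pip commands above.
CANDIDATES = (
    "mistralrs",
    "mistralrs-cuda",
    "mistralrs-metal",
    "mistralrs-accelerate",
    "mistralrs-mkl",
)


def installed_mistralrs_distributions(candidates=CANDIDATES):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # This variant is not installed in the current environment.
            continue
    return found


if __name__ == "__main__":
    found = installed_mistralrs_distributions()
    if found:
        for name, version in found.items():
            print(f"{name} {version}")
    else:
        print("No mistralrs distribution is installed in this environment.")
```

If more than one name is reported, two variants were installed into the same environment. Uninstalling all but one is probably the safest way to know which features are active.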

Installation from source

  1. Install required packages

    • openssl (e.g., sudo apt install libssl-dev)
    • pkg-config (e.g., sudo apt install pkg-config)
  2. Install Rust: https://rustup.rs/

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  3. Set your Hugging Face token (skip this step if the token is already set, if your model is not gated, or if you prefer to use the token_source parameters in Python or on the command line).

    mkdir -p ~/.cache/huggingface
    touch ~/.cache/huggingface/token
    echo <HF_TOKEN_HERE> > ~/.cache/huggingface/token
    
  4. Download the code

    git clone https://github.com/EricLBuehler/mistral.rs.git
    cd mistral.rs
    
  5. cd into the correct directory for building mistralrs: cd mistralrs-pyo3

  6. Install maturin, our Rust + Python build system. Maturin requires a Python virtual environment such as venv or conda to be active; the mistralrs package will be installed into that environment.

    pip install "maturin[patchelf]"
    
  7. Install mistralrs by running the following in this directory. Features such as cuda or flash-attn can be specified with the --features argument, just as they would be for cargo run.

    The base build command is:

    maturin develop -r
    
    • To build for CUDA:
    maturin develop -r --features cuda
    
    • To build for CUDA with flash attention:
    maturin develop -r --features "cuda flash-attn"
    
    • To build for Metal:
    maturin develop -r --features metal
    
    • To build for Accelerate:
    maturin develop -r --features accelerate
    
    • To build for MKL:
    maturin develop -r --features mkl
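
The source-build steps above can be sanity-checked before running maturin. The following is a minimal sketch using only the standard library. It checks that the tools from the steps are on PATH, that a venv or conda environment appears to be active, and that the token file from step 3 exists, then assembles the build command for a chosen feature set. All function names are hypothetical helpers, not part of mistralrs or maturin, and the virtual-environment check is a heuristic.

```python
import os
import shutil
import sys
from pathlib import Path


def in_virtual_env():
    """Heuristic: True when a venv or conda environment appears to be active."""
    return sys.prefix != sys.base_prefix or "CONDA_PREFIX" in os.environ


def missing_tools(tools=("cargo", "pkg-config", "maturin")):
    """Return the tools from the steps above that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def hf_token_present(path=None):
    """True when the token file from step 3 exists and is non-empty."""
    if path is None:
        path = Path.home() / ".cache" / "huggingface" / "token"
    path = Path(path)
    return path.is_file() and path.read_text().strip() != ""


def maturin_command(features=()):
    """Build the `maturin develop -r` argument list for the given feature names."""
    command = ["maturin", "develop", "-r"]
    if features:
        # Mirrors the quoted, space-separated form used in step 7.
        command += ["--features", " ".join(features)]
    return command


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_env())
    print("missing tools:", missing_tools() or "none")
    print("HF token file present:", hf_token_present())
    print("build command:", maturin_command(["cuda", "flash-attn"]))
```

The command is returned as a list so it can be passed directly to subprocess.run from inside the mistralrs-pyo3 directory without shell quoting.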
    

Please find API docs here and the type stubs here, which are another great form of documentation.

We also provide a cookbook here!

Example

from mistralrs import ModelKind, MistralLoader, ChatCompletionRequest

# Select a GGUF-quantized model.
kind = ModelKind.QuantizedGGUF
loader = MistralLoader(
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    kind=kind,
    no_kv_cache=False,
    repeat_last_n=64,
    quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)

# Load the model and obtain a runner for sending requests.
runner = loader.load()

# Send a single-turn chat completion request and print the response.
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        frequency_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res)
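
The messages argument in the example is a list of role/content dictionaries. For multi-turn prompts, a small helper can build that list from plain strings. This is a sketch in plain Python: build_messages is a hypothetical name, not part of mistralrs, and it assumes the model's chat template expects strictly alternating user and assistant turns that end on a user turn.

```python
def build_messages(turns):
    """Convert alternating user/assistant strings into role/content dicts.

    turns[0] is a user message, turns[1] an assistant reply, and so on.
    The conversation must end on a user turn so the model has something to answer.
    """
    if not turns:
        raise ValueError("at least one user message is required")
    if len(turns) % 2 == 0:
        raise ValueError("the conversation must end with a user message")
    roles = ("user", "assistant")
    return [
        {"role": roles[index % 2], "content": text}
        for index, text in enumerate(turns)
    ]


if __name__ == "__main__":
    messages = build_messages(
        [
            "Tell me a story about the Rust type system.",
            "Once upon a time, a borrow checker guarded a small village.",
            "Continue the story, but make it shorter.",
        ]
    )
    for message in messages:
        print(message["role"], "->", message["content"])
```

The returned list has the same shape as the messages value in the example above, so it could be passed in its place.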
