Skip to main content

Fast and easy LLM serving.

Project description

mistral.rs PyO3 Bindings: mistralrs

mistralrs is a Python package which provides an API for mistral.rs. We build mistralrs with the maturin build manager.

Installation from PyPi

  1. Install Rust: https://rustup.rs/

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  2. mistralrs depends on the openssl library.

To install it on Ubuntu:

sudo apt install libssl-dev
sudo apt install pkg-config
  1. Install it!
  • CUDA

    pip install mistralrs-cuda

  • Metal

    pip install mistralrs-metal

  • Apple Accelerate

    pip install mistralrs-accelerate

  • Intel MKL

    pip install mistralrs-mkl

  • Without accelerators

    pip install mistralrs

All installations will install the mistralrs package. The suffix on the package installed by pip only controls the feature activation.

Installation from source

  1. Install required packages

    • openssl (ex., sudo apt install libssl-dev)
    • pkg-config (ex., sudo apt install pkg-config)
  2. Install Rust: https://rustup.rs/

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  3. Set HF token correctly (skip if already set or your model is not gated, or if you want to use the token_source parameters in Python or the command line.)

    mkdir ~/.cache/huggingface
    touch ~/.cache/huggingface/token
    echo <HF_TOKEN_HERE> > ~/.cache/huggingface/token
    
  4. Download the code

    git clone https://github.com/EricLBuehler/mistral.rs.git
    cd mistral.rs
    
  5. cd into the correct directory for building mistralrs: cd mistralrs-pyo3

  6. Install maturin, our Rust + Python build system: Maturin requires a Python virtual environment such as venv or conda to be active. The mistralrs package will be installed into that environment.

    pip install maturin[patchelf]
    
  7. Install mistralrs Install mistralrs by executing the following in this directory where features such as cuda or flash-attn may be specified with the --features argument just like they would be for cargo run.

    The base build command is:

    maturin develop -r
    
    • To build for CUDA:
    maturin develop -r --features cuda
    
    • To build for CUDA with flash attention:
    maturin develop -r --features "cuda flash-attn"
    
    • To build for Metal:
    maturin develop -r --features metal
    
    • To build for Accelerate:
    maturin develop -r --features accelerate
    
    • To build for MKL:
    maturin develop -r --features mkl
    

Please find API docs here and the type stubs here, which are another great form of documentation.

We also provide a cookbook here!

Example

from mistralrs import ModelKind, MistralLoader, ChatCompletionRequest

kind = ModelKind.QuantizedGGUF
loader = MistralLoader(
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    kind=kind,
    no_kv_cache=False,
    repeat_last_n=64,
    quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)
runner = loader.load()
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        frequency_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mistralrs_metal-0.1.5.tar.gz (162.6 kB view details)

Uploaded Source

File details

Details for the file mistralrs_metal-0.1.5.tar.gz.

File metadata

  • Download URL: mistralrs_metal-0.1.5.tar.gz
  • Upload date:
  • Size: 162.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.8.18

File hashes

Hashes for mistralrs_metal-0.1.5.tar.gz
Algorithm Hash digest
SHA256 ba0da16aec8572c28517c5bce060b16e0ee09ea852116476e28cefeca64c9f90
MD5 07adce583af5340e3a7b3d1ba8ba389e
BLAKE2b-256 c19485e8a69431272f01d3d506c29bf592c96c4754bb3de2bb6bb2a6bdcfaf92

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page