Fast and easy LLM serving.

mistral.rs PyO3 Bindings: mistralrs

mistralrs is a Python package which provides an API for mistral.rs. We build mistralrs with the maturin build manager.

Installation from PyPI

  1. Install Rust: https://rustup.rs/

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  2. mistralrs depends on the OpenSSL library. To install it, along with pkg-config, on Ubuntu:

    sudo apt install libssl-dev
    sudo apt install pkg-config

  3. Install the mistralrs package that matches your accelerator:

  • CUDA

    pip install mistralrs-cuda

  • Metal

    pip install mistralrs-metal

  • Apple Accelerate

    pip install mistralrs-accelerate

  • Intel MKL

    pip install mistralrs-mkl

  • Without accelerators

    pip install mistralrs

Every variant installs the same mistralrs Python package. The suffix on the pip package name only controls which features are activated.
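
As a rough illustration of the note above, the sketch below maps an accelerator name to the pip package listed in this section and checks whether the mistralrs module can be found in the current environment. The helper names `pip_package_for` and `is_importable` are hypothetical and are not part of mistralrs. The only assumption carried over from this README is that every variant is imported as `mistralrs`.

```python
import importlib.util

# Mapping taken from the install commands listed above.
PIP_PACKAGES = {
    "cuda": "mistralrs-cuda",
    "metal": "mistralrs-metal",
    "accelerate": "mistralrs-accelerate",
    "mkl": "mistralrs-mkl",
    None: "mistralrs",  # no accelerator
}


def pip_package_for(accelerator=None):
    """Return the pip package name for an accelerator (hypothetical helper)."""
    try:
        return PIP_PACKAGES[accelerator]
    except KeyError:
        raise ValueError(f"unknown accelerator: {accelerator!r}") from None


def is_importable(module_name="mistralrs"):
    """Check whether a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("pip install", pip_package_for("cuda"))
print("mistralrs importable:", is_importable())
```

Whichever package was installed, the check looks for the same module name, since the suffix only selects features.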

Installation from source

  1. Install required packages

    • openssl (e.g., sudo apt install libssl-dev)
    • pkg-config (e.g., sudo apt install pkg-config)
  2. Install Rust: https://rustup.rs/

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  3. Set your Hugging Face token. Skip this step if the token is already set, if your model is not gated, or if you plan to use the token_source parameters in Python or on the command line.

    mkdir -p ~/.cache/huggingface
    touch ~/.cache/huggingface/token
    echo "<HF_TOKEN_HERE>" > ~/.cache/huggingface/token
    
  4. Download the code

    git clone https://github.com/EricLBuehler/mistral.rs.git
    cd mistral.rs
    
  5. cd into the correct directory for building mistralrs: cd mistralrs-pyo3

  6. Install maturin, our Rust + Python build system. Maturin requires an active Python virtual environment, such as venv or conda. The mistralrs package will be installed into that environment.

    pip install "maturin[patchelf]"
    
  7. Install mistralrs by running maturin in this directory. Features such as cuda or flash-attn can be specified with the --features argument, just as they would be for cargo run.

    The base build command is:

    maturin develop -r
    
    • To build for CUDA:
    maturin develop -r --features cuda
    
    • To build for CUDA with flash attention:
    maturin develop -r --features "cuda flash-attn"
    
    • To build for Metal:
    maturin develop -r --features metal
    
    • To build for Accelerate:
    maturin develop -r --features accelerate
    
    • To build for MKL:
    maturin develop -r --features mkl
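
The build commands above differ only in their --features value, so a script can assemble them from a list of feature names. The following is a minimal sketch. `maturin_develop_command` is a hypothetical helper, not part of maturin or mistralrs, and it only builds the argument list without running anything.

```python
import shlex


def maturin_develop_command(features=()):
    """Build the argument list for `maturin develop -r` (hypothetical helper)."""
    cmd = ["maturin", "develop", "-r"]
    if features:
        # Several features go into one space-separated argument,
        # matching `--features "cuda flash-attn"` above.
        cmd += ["--features", " ".join(features)]
    return cmd


cmd = maturin_develop_command(["cuda", "flash-attn"])
print(shlex.join(cmd))
# To run it from the mistralrs-pyo3 directory inside the active environment:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run without a shell avoids quoting problems when more than one feature is given.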
    

Please find API docs here and the type stubs here, which are another great form of documentation.

We also provide a cookbook here!

Example

from mistralrs import ModelKind, MistralLoader, ChatCompletionRequest

# Select a GGUF-quantized model and point the loader at the base and quantized repos.
kind = ModelKind.QuantizedGGUF
loader = MistralLoader(
    model_id="mistralai/Mistral-7B-Instruct-v0.1",
    kind=kind,
    no_kv_cache=False,
    repeat_last_n=64,
    quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
)

# Load the model and obtain a runner.
runner = loader.load()

# Send a single-turn chat completion request and print the response.
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        frequency_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res)
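
The messages argument in the example is a list of role/content dictionaries. For longer conversations, a small helper can build that list from alternating user and assistant turns. This is only a sketch: `build_messages` is a hypothetical helper, not part of mistralrs, and it assumes the model's chat template accepts alternating "user" and "assistant" roles, which this README does not confirm.

```python
def build_messages(turns):
    """Turn alternating user/assistant strings into role/content dicts.

    Hypothetical helper: the first string is treated as a user turn,
    the second as an assistant turn, and so on.
    """
    roles = ("user", "assistant")
    return [
        {"role": roles[i % 2], "content": text}
        for i, text in enumerate(turns)
    ]


messages = build_messages([
    "Tell me a story about the Rust type system.",
    "Once upon a time, there was a borrow checker...",
    "Make it shorter.",
])
print(messages)
```

The resulting list has the same shape as the messages value passed to ChatCompletionRequest in the example above.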
