Fast and easy LLM serving.

mistral.rs PyO3 Bindings: mistralrs

mistralrs is a Python package which provides an API for mistral.rs. We build mistralrs with the maturin build manager.

Installation from PyPI

  1. Install Rust: https://rustup.rs/

    Example on Ubuntu:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  2. mistralrs depends on the openssl library.

    Example on Ubuntu:

    sudo apt install libssl-dev
    sudo apt install pkg-config
    
  3. Install it!
  • CUDA

    pip install mistralrs-cuda -v

  • Metal

    pip install mistralrs-metal -v

  • Apple Accelerate

    pip install mistralrs-accelerate -v

  • Intel MKL

    pip install mistralrs-mkl -v

  • Without accelerators

    pip install mistralrs -v

All of these commands install the same mistralrs Python package. The suffix on the package name given to pip only controls which features are activated.
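As a rough illustration of the naming scheme above, the sketch below maps an accelerator label to the pip package listed in this section and assembles the matching install command. The helper names (`pip_package_for`, `pip_install_command`) and the dictionary keys are hypothetical and exist only for this example; only the package names themselves come from the list above.

```python
import sys
from typing import List

# Keys are illustrative labels (an assumption of this sketch);
# values are the package names listed in this section.
PIP_PACKAGES = {
    "cuda": "mistralrs-cuda",
    "metal": "mistralrs-metal",
    "accelerate": "mistralrs-accelerate",
    "mkl": "mistralrs-mkl",
    "none": "mistralrs",
}


def pip_package_for(backend: str = "none") -> str:
    """Return the pip package name for the given accelerator label."""
    try:
        return PIP_PACKAGES[backend]
    except KeyError:
        expected = ", ".join(sorted(PIP_PACKAGES))
        raise ValueError(
            f"unknown backend {backend!r}; expected one of: {expected}"
        ) from None


def pip_install_command(backend: str = "none") -> List[str]:
    """Build (but do not run) the pip command for the given backend."""
    return [sys.executable, "-m", "pip", "install", pip_package_for(backend), "-v"]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(pip_install_command("cuda")))
```

Whichever package is chosen, the import in Python code stays `import mistralrs`.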

Installation from source

  1. Install required packages

    • openssl (Example on Ubuntu: sudo apt install libssl-dev)
    • pkg-config (Example on Ubuntu: sudo apt install pkg-config)
  2. Install Rust: https://rustup.rs/

    Example on Ubuntu:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  3. Set your Hugging Face token (skip this step if the token is already set, if your model is not gated, or if you plan to use the token_source parameters in Python or on the command line).

    Example on Ubuntu:

    mkdir -p ~/.cache/huggingface
    touch ~/.cache/huggingface/token
    echo "<HF_TOKEN_HERE>" > ~/.cache/huggingface/token
    
  4. Download the code

    git clone https://github.com/EricLBuehler/mistral.rs.git
    cd mistral.rs
    
  5. cd into the correct directory for building mistralrs: cd mistralrs-pyo3

  6. Install maturin, our Rust + Python build system. Maturin requires a Python virtual environment such as venv or conda to be active; the mistralrs package will be installed into that environment.

    pip install "maturin[patchelf]"
    
  7. Install mistralrs by running one of the following commands in this directory. Features such as cuda or flash-attn may be specified with the --features argument, just as they would be for cargo run.

    The base build command is:

    maturin develop -r
    
    • To build for CUDA:
    maturin develop -r --features cuda
    
    • To build for CUDA with flash attention:
    maturin develop -r --features "cuda flash-attn"
    
    • To build for Metal:
    maturin develop -r --features metal
    
    • To build for Accelerate:
    maturin develop -r --features accelerate
    
    • To build for MKL:
    maturin develop -r --features mkl
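
The build commands in step 7 differ only in the value passed to --features, so they can be assembled programmatically. The following is a minimal sketch under that assumption; the function name `maturin_develop_command` is hypothetical, and the code only constructs the argument list without running maturin.

```python
import shlex
from typing import Iterable, List


def maturin_develop_command(features: Iterable[str] = ()) -> List[str]:
    """Return the argv for `maturin develop -r`, optionally with --features."""
    cmd = ["maturin", "develop", "-r"]
    feature_list = [f for f in features if f]
    if feature_list:
        # Several features are passed as one space-separated argument,
        # matching `--features "cuda flash-attn"` in the list above.
        cmd += ["--features", " ".join(feature_list)]
    return cmd


if __name__ == "__main__":
    # Show the shell form of each variant from step 7 without executing it.
    for feats in ([], ["cuda"], ["cuda", "flash-attn"], ["metal"]):
        print(shlex.join(maturin_develop_command(feats)))
```

To actually build, the returned list could be handed to `subprocess.run(cmd, check=True)` from inside the mistralrs-pyo3 directory with the virtual environment active.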
    

Please find API docs here and the type stubs here, which are another great form of documentation.

We also provide a cookbook here!

Example

from mistralrs import Runner, Which, ChatCompletionRequest

# Load a GGUF-quantized model. The tokenizer comes from the original model
# repository; the quantized weights come from a separate GGUF repository.
runner = Runner(
    which=Which.GGUF(
        tok_model_id="mistralai/Mistral-7B-Instruct-v0.1",
        quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
        tokenizer_json=None,
        repeat_last_n=64,
    )
)

# Send a single-turn chat completion request with sampling parameters.
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
# Print the generated reply, then the usage information for the request.
print(res.choices[0].message.content)
print(res.usage)
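
The request above carries a single user message. As a hedged sketch of how a multi-turn exchange might be layered on the same calls, the helper below keeps a running message history and resends it with each new prompt. The names `user`, `assistant`, and `chat_turn` are hypothetical; the sketch assumes that assistant replies can be passed back as `{"role": "assistant", ...}` dictionaries in the same shape as the user message above, and it reuses only the `ChatCompletionRequest` arguments and `send_chat_completion_request` call shown in the example.

```python
from typing import Dict, List

Message = Dict[str, str]


def user(content: str) -> Message:
    """Build a user message in the same shape as the example above."""
    return {"role": "user", "content": content}


def assistant(content: str) -> Message:
    """Build an assistant message (assumed to use the same dict shape)."""
    return {"role": "assistant", "content": content}


def chat_turn(runner, history: List[Message], prompt: str, max_tokens: int = 256) -> str:
    """Send `prompt` together with prior turns and record both sides in `history`."""
    # Imported lazily so the message helpers work without mistralrs installed.
    from mistralrs import ChatCompletionRequest

    res = runner.send_chat_completion_request(
        ChatCompletionRequest(
            model="mistral",
            messages=history + [user(prompt)],
            max_tokens=max_tokens,
            presence_penalty=1.0,
            top_p=0.1,
            temperature=0.1,
        )
    )
    reply = res.choices[0].message.content
    # Only extend the history once the request has succeeded.
    history.extend([user(prompt), assistant(reply)])
    return reply
```

With the `runner` from the example, usage would look like `history = []` followed by repeated `chat_turn(runner, history, "...")` calls; each call sees all earlier turns.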

Download files

Source Distribution

mistralrs_cuda-0.1.19.tar.gz (220.1 kB view details)

Uploaded Source

File details

Details for the file mistralrs_cuda-0.1.19.tar.gz.

File metadata

  • Download URL: mistralrs_cuda-0.1.19.tar.gz
  • Upload date:
  • Size: 220.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.8.18

File hashes

Hashes for mistralrs_cuda-0.1.19.tar.gz
Algorithm Hash digest
SHA256 e36b4049372c2dfc0bc839b0a636b9973f8b841446e95201815aab842e54ac7d
MD5 0e313e3eed4295223f84baea7165f642
BLAKE2b-256 8a572c864fa1ca1152e240d3b599dc29d741945bff322908f4f5f60e42307eaf
