Fast and easy LLM serving.

mistral.rs PyO3 Bindings: mistralrs

mistralrs is a Python package which provides an API for mistral.rs. We build mistralrs with the maturin build manager.

Installation from PyPI

  1. Install Rust: https://rustup.rs/

    Example on Ubuntu:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  2. mistralrs depends on the openssl library.

    Example on Ubuntu:

    sudo apt install libssl-dev
    sudo apt install pkg-config
    
  3. Install it!
  • CUDA

    pip install mistralrs-cuda -v

  • Metal

    pip install mistralrs-metal -v

  • Apple Accelerate

    pip install mistralrs-accelerate -v

  • Intel MKL

    pip install mistralrs-mkl -v

  • Without accelerators

    pip install mistralrs -v

Every variant installs the same mistralrs Python package. The suffix on the pip package name only controls which features are activated.
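
The naming rule can be sketched as a small lookup. This is an illustrative helper only: `pip_package_for`, `pip_command_for`, and the backend keys are hypothetical names chosen for this example and are not part of mistralrs.

```python
# Hypothetical helper (not part of mistralrs): maps an accelerator choice
# to the pip package name listed in the installation steps.
PACKAGE_SUFFIXES = {
    "cuda": "-cuda",
    "metal": "-metal",
    "accelerate": "-accelerate",
    "mkl": "-mkl",
    "none": "",  # build without accelerators
}


def pip_package_for(backend: str) -> str:
    """Return the pip package name for the given backend key."""
    if backend not in PACKAGE_SUFFIXES:
        raise ValueError(f"unknown backend: {backend!r}")
    return "mistralrs" + PACKAGE_SUFFIXES[backend]


def pip_command_for(backend: str) -> list:
    """Return the pip command as an argument list, e.g. for subprocess.run."""
    return ["pip", "install", pip_package_for(backend), "-v"]


if __name__ == "__main__":
    print(" ".join(pip_command_for("cuda")))
```

Whichever package name is chosen, the import in Python stays `import mistralrs`.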

Installation from source

  1. Install required packages

    • openssl (Example on Ubuntu: sudo apt install libssl-dev)
    • pkg-config (Example on Ubuntu: sudo apt install pkg-config)
  2. Install Rust: https://rustup.rs/

    Example on Ubuntu:

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source $HOME/.cargo/env
    
  3. Set your Hugging Face (HF) token. Skip this step if the token is already set, if your model is not gated, or if you want to use the token_source parameters in Python or on the command line.

    Example on Ubuntu:

    mkdir -p ~/.cache/huggingface
    touch ~/.cache/huggingface/token
    echo "<HF_TOKEN_HERE>" > ~/.cache/huggingface/token
    
  4. Download the code

    git clone https://github.com/EricLBuehler/mistral.rs.git
    cd mistral.rs
    
  5. cd into the correct directory for building mistralrs: cd mistralrs-pyo3

  6. Install maturin, our Rust + Python build system. Maturin requires a Python virtual environment such as venv or conda to be active; the mistralrs package will be installed into that environment.

    pip install "maturin[patchelf]"
    
  7. Install mistralrs by executing the following in this directory. Features such as cuda or flash-attn may be specified with the --features argument, just as they would be for cargo run.

    The base build command is:

    maturin develop -r
    
    • To build for CUDA:
    maturin develop -r --features cuda
    
    • To build for CUDA with flash attention:
    maturin develop -r --features "cuda flash-attn"
    
    • To build for Metal:
    maturin develop -r --features metal
    
    • To build for Accelerate:
    maturin develop -r --features accelerate
    
    • To build for MKL:
    maturin develop -r --features mkl
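
The build commands above all follow one pattern: the base command plus an optional space-separated feature list. A minimal sketch of that pattern, assuming only the flags shown above; `maturin_command` is a hypothetical helper for illustration, not part of maturin or mistralrs.

```python
import shlex


def maturin_command(features=()):
    """Build the argument list for `maturin develop -r` with optional features.

    `features` is an iterable of feature names such as "cuda" or "flash-attn".
    Multiple features are joined with spaces into a single argument, matching
    the quoted form used on the command line.
    """
    cmd = ["maturin", "develop", "-r"]
    features = list(features)
    if features:
        cmd += ["--features", " ".join(features)]
    return cmd


if __name__ == "__main__":
    # Print a shell-safe version of the CUDA + flash attention build command.
    print(shlex.join(maturin_command(["cuda", "flash-attn"])))
```

The returned list can be passed to `subprocess.run(..., check=True)` from inside the mistralrs-pyo3 directory with the virtual environment active.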
    

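Step 3 above (writing the Hugging Face token file) can also be done from Python. This is a minimal sketch, assuming the same ~/.cache/huggingface/token location used in the shell example; `write_hf_token` is a hypothetical helper and not part of mistralrs.

```python
from pathlib import Path

# Default location taken from the shell example in step 3.
DEFAULT_HF_CACHE = Path.home() / ".cache" / "huggingface"


def write_hf_token(token: str, cache_dir: Path = DEFAULT_HF_CACHE) -> Path:
    """Write `token` to <cache_dir>/token and return the file path.

    The directory is created if it does not exist, and an existing token
    file is overwritten, mirroring `echo ... > token`.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    token_path = cache_dir / "token"
    # Strip stray whitespace and end with a newline, as `echo` would.
    token_path.write_text(token.strip() + "\n")
    return token_path
```

Call it as `write_hf_token("<HF_TOKEN_HERE>")` with your own token in place of the placeholder.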
Please find API docs here and the type stubs here, which are another great form of documentation.

We also provide a cookbook here!

Example

from mistralrs import Runner, Which, ChatCompletionRequest

# Load a GGUF-quantized model.
runner = Runner(
    which=Which.GGUF(
        tok_model_id="mistralai/Mistral-7B-Instruct-v0.1",
        quantized_model_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        quantized_filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    )
)

# Send a chat completion request, then print the reply and the token usage.
res = runner.send_chat_completion_request(
    ChatCompletionRequest(
        model="mistral",
        messages=[
            {"role": "user", "content": "Tell me a story about the Rust type system."}
        ],
        max_tokens=256,
        presence_penalty=1.0,
        top_p=0.1,
        temperature=0.1,
    )
)
print(res.choices[0].message.content)
print(res.usage)

Download files

Source Distribution

mistralrs_mkl-0.3.1.tar.gz (441.7 kB)

Uploaded Source

File details

Details for the file mistralrs_mkl-0.3.1.tar.gz.

File metadata

  • Download URL: mistralrs_mkl-0.3.1.tar.gz
  • Upload date:
  • Size: 441.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.18

File hashes

Hashes for mistralrs_mkl-0.3.1.tar.gz
Algorithm    Hash digest
SHA256       81fa59fdcb2794f85d5420dfc2807a2712da1362eb49ddaff383325a953b3acc
MD5          401a0f5c92d98610ec1cabe364a660b0
BLAKE2b-256  15591cdb5815be7e946d9832e754a8e524765c0ff68ac5dbbd4f06a5f72ac53f
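
A downloaded archive can be checked against these digests with Python's standard hashlib module. The sketch below is illustrative: `file_digests` and `matches_sha256` are hypothetical helper names, and the local file path is whatever you saved the download as.

```python
import hashlib

# SHA256 digest listed above for mistralrs_mkl-0.3.1.tar.gz.
EXPECTED_SHA256 = "81fa59fdcb2794f85d5420dfc2807a2712da1362eb49ddaff383325a953b3acc"


def file_digests(path, chunk_size=1 << 20):
    """Return the SHA256, MD5, and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_sha256(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return file_digests(path)["SHA256"] == expected.lower()
```

For example, `matches_sha256("mistralrs_mkl-0.3.1.tar.gz")` should return True for an intact download of that file.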
