Skip to main content

No project description provided

Project description

Mistral Inference

Open In Colab

This repository contains minimal code to run our 7B, 8x7B and 8x22B models.

Blog 7B: https://mistral.ai/news/announcing-mistral-7b/
Blog 8x7B: https://mistral.ai/news/mixtral-of-experts/
Blog 8x22B: https://mistral.ai/news/mixtral-8x22b/
Blog Codestral 22B: https://mistral.ai/news/codestral Blog Codestral Mamba 7B: https://mistral.ai/news/codestral-mamba/ Blog Mathstral 7B: https://mistral.ai/news/mathstral/

Discord: https://discord.com/invite/mistralai
Documentation: https://docs.mistral.ai/
Guardrailing: https://docs.mistral.ai/usage/guardrailing

Installation

Note: You will use a GPU to install mistral-inference, as it currently requires xformers to be installed and xformers itself needs a GPU for installation.

PyPI

pip install mistral-inference

Local

cd $HOME && git clone https://github.com/mistralai/mistral-inference
cd $HOME/mistral-inference && poetry install .

Model download

Name Download md5sum
7B Instruct https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar 80b71fcb6416085bcb4efad86dfb4d52
8x7B Instruct https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar (Updated model coming soon!) 8e2d3930145dc43d3084396f49d38a3f
8x22 Instruct https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-Instruct-v0.3.tar 471a02a6902706a2f1e44a693813855b
7B Base https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-v0.3.tar 0663b293810d7571dad25dae2f2a5806
8x7B Updated model coming soon! -
8x22B https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-v0.3.tar a2fa75117174f87d1197e3a4eb50371a
Codestral 22B https://models.mistralcdn.com/codestral-22b-v0-1/codestral-22B-v0.1.tar 1ea95d474a1d374b1d1b20a8e0159de3
Mathstral 7B https://models.mistralcdn.com/mathstral-7b-v0-1/mathstral-7B-v0.1.tar 5f05443e94489c261462794b1016f10b
Codestral-Mamba 7B https://models.mistralcdn.com/codestral-mamba-7b-v0-1/codestral-mamba-7B-v0.1.tar d3993e4024d1395910c55db0d11db163

Note:

  • Important:
  • All of the listed models above support function calling. For example, Mistral 7B Base/Instruct v3 is a minor update to Mistral 7B Base/Instruct v2, with the addition of function calling capabilities.
  • The "coming soon" models will include function calling as well.
  • You can download the previous versions of our models from our docs.

Create a local folder to store models

export MISTRAL_MODEL=$HOME/mistral_models
mkdir -p $MISTRAL_MODEL

Download any of the above links and extract the content, e.g.:

export M7B_DIR=$MISTRAL_MODEL/7B_instruct
wget https://models.mistralcdn.com/mistral-7b-v0-3/mistral-7B-Instruct-v0.3.tar
mkdir -p $M7B_DIR
tar -xf mistral-7B-Instruct-v0.3.tar -C $M7B_DIR

or

export M8x7B_DIR=$MISTRAL_MODEL/8x7b_instruct
wget https://models.mistralcdn.com/mixtral-8x7b-v0-1/Mixtral-8x7B-v0.1-Instruct.tar
mkdir -p $M8x7B_DIR
tar -xf Mixtral-8x7B-v0.1-Instruct.tar -C $M8x7B_DIR

Usage

The following sections give an overview of how to run the model from the Command-line interface (CLI) or directly within Python.

CLI

  • Demo

To test that a model works in your setup, you can run the mistral-demo command. The 7B models can be tested on a single GPU as follows:

mistral-demo $M7B_DIR

Large models, such 8x7B and 8x22B have to be run in a multi-GPU setup. For these models, you can use the following command:

torchrun --nproc-per-node 2 --no-python mistral-demo $M8x7B_DIR

Note: Change --nproc-per-node to more GPUs if available.

  • Chat

To interactively chat with the models, you can make use of the mistral-chat command.

mistral-chat $M7B_DIR --instruct

For large models, you can make use of torchrun.

torchrun --nproc-per-node 2 --no-python mistral-chat $M8x7B_DIR --instruct

Note: Change --nproc-per-node to more GPUs if necessary (e.g. for 8x22B).

  • Chat with Codestral

To use Codestral as a coding assistant you can run the following command using mistral-chat. Make sure $M22B_CODESTRAL is set to a valid path to the downloaded codestral folder, e.g. $HOME/mistral_models/Codestral-22B-v0.1

mistral-chat $M22B_CODESTRAL --instruct --max_tokens 256

If you prompt it with "Write me a function that computes fibonacci in Rust", the model should generate something along the following lines:

Sure, here's a simple implementation of a function that computes the Fibonacci sequence in Rust. This function takes an integer `n` as an argument and returns the `n`th Fibonacci number.

fn fibonacci(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        _ => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn main() {
    let n = 10;
    println!("The {}th Fibonacci number is: {}", n, fibonacci(n));
}

This function uses recursion to calculate the Fibonacci number. However, it's not the most efficient solution because it performs a lot of redundant calculations. A more efficient solution would use a loop to iteratively calculate the Fibonacci numbers.

You can continue chatting afterwards, e.g. with "Translate it to Python".

  • Chat with Codestral-Mamba

To use Codestral-Mamba as a coding assistant you can run the following command using mistral-chat. Make sure $7B_CODESTRAL_MAMBA is set to a valid path to the downloaded codestral-mamba folder, e.g. $HOME/mistral_models/mamba-codestral-7B-v0.1.

You then need to additionally install the following packages:

pip install packaging mamba-ssm causal-conv1d transformers

before you can start chatting:

mistral-chat $7B_CODESTRAL_MAMBA --instruct --max_tokens 256
  • Chat with Mathstral

To use Mathstral as an assistant you can run the following command using mistral-chat. Make sure $7B_MATHSTRAL is set to a valid path to the downloaded codestral folder, e.g. $HOME/mistral_models/mathstral-7B-v0.1

mistral-chat $7B_MATHSTRAL --instruct --max_tokens 256

If you prompt it with "Albert likes to surf every week. Each surfing session lasts for 4 hours and costs $20 per hour. How much would Albert spend in 5 weeks?", the model should answer with the correct calculation.

You can then continue chatting afterwards, e.g. with "How much would he spend in a year?".

Python

  • Instruction Following:
from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file("./mistral_7b_instruct/tokenizer.model.v3")  # change to extracted tokenizer file
model = Transformer.from_folder("./mistral_7b_instruct")  # change to extracted model dir

completion_request = ChatCompletionRequest(messages=[UserMessage(content="Explain Machine Learning to me in a nutshell.")])

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
  • Function Calling:
from mistral_common.protocol.instruct.tool_calls import Function, Tool

completion_request = ChatCompletionRequest(
    tools=[
        Tool(
            function=Function(
                name="get_current_weather",
                description="Get the current weather",
                parameters={
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA",
                        },
                        "format": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"],
                            "description": "The temperature unit to use. Infer this from the users location.",
                        },
                    },
                    "required": ["location", "format"],
                },
            )
        )
    ],
    messages=[
        UserMessage(content="What's the weather like today in Paris?"),
        ],
)

tokens = tokenizer.encode_chat_completion(completion_request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=64, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)
  • Fill-in-the-middle (FIM):

Make sure to have mistral-common >= 1.2.0 installed:

pip install --upgrade mistral-common

You can simulate a code completion in-filling as follows.

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.tokens.instruct.request import FIMRequest

tokenizer = MistralTokenizer.from_model("codestral-22b")
model = Transformer.from_folder("./mistral_22b_codestral")

prefix = """def add("""
suffix = """    return sum"""

request = FIMRequest(prompt=prefix, suffix=suffix)

tokens = tokenizer.encode_fim(request).tokens

out_tokens, _ = generate([tokens], model, max_tokens=256, temperature=0.0, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.decode(out_tokens[0])

middle = result.split(suffix)[0].strip()
print(middle)

One-file-ref

If you want a self-contained implementation, look at one_file_ref.py, or run it with

python -m one_file_ref $M7B_DIR

which should give something along the following lines:

This is a test of the emergency broadcast system. This is only a test.

If this were a real emergency, you would be told what to do.

This is a test
=====================
This is another test of the new blogging software. I’m not sure if I’m going to keep it or not. I’m not sure if I’m going to keep
=====================
This is a third test, mistral AI is very good at testing. 🙂

This is a third test, mistral AI is very good at testing. 🙂

This
=====================

Note: To run self-contained implementations, you need to do a local installation.

Test

To run logits equivalence:

python -m pytest tests

Deployment

The deploy folder contains code to build a vLLM image with the required dependencies to serve the Mistral AI model. In the image, the transformers library is used instead of the reference implementation. To build it:

docker build deploy --build-arg MAX_JOBS=8

Instructions to run the image can be found in the official documentation.

Model platforms

References

[1]: LoRA: Low-Rank Adaptation of Large Language Models, Hu et al. 2021

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mistral_inference-1.2.0.tar.gz (25.1 kB view details)

Uploaded Source

Built Distribution

mistral_inference-1.2.0-py3-none-any.whl (25.4 kB view details)

Uploaded Python 3

File details

Details for the file mistral_inference-1.2.0.tar.gz.

File metadata

  • Download URL: mistral_inference-1.2.0.tar.gz
  • Upload date:
  • Size: 25.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for mistral_inference-1.2.0.tar.gz
Algorithm Hash digest
SHA256 67bc0c0f3ef5658a6ec94eaf3ee36ef6892c09a91e92ca0d512138cf712b09b3
MD5 a2df1f6acc93e5b090704c9f9175d78f
BLAKE2b-256 53272366f8dab92e874a29742453573b641c11dc9ba3eb03bb1ce13688ac2e6a

See more details on using hashes here.

File details

Details for the file mistral_inference-1.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mistral_inference-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ba9cad66761d3a0731063f960e813724ba4ba3b494b4ddea3c7d0c13e4c918e9
MD5 cf41105d01791bd87f0f883762273685
BLAKE2b-256 97981c9eb6e8e65abd30a558b683f96985d03f916632a4d4af96c6ef3ba001fb

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page