Python bindings for https://github.com/openvinotoolkit/openvino.genai

Project description

OpenVINO™ GenAI Library

OpenVINO™ GenAI is a flavor of OpenVINO™ that simplifies running inference of generative AI models. It hides the complexity of the generation process and minimizes the amount of code required.

Install OpenVINO™ GenAI

The OpenVINO™ GenAI flavor is available for installation via Archive and PyPI distributions. To install OpenVINO™ GenAI, refer to the Install Guide.
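For example, installing from PyPI is a single pip command (a minimal sketch; a virtual environment is recommended, and the Install Guide covers the archive distribution and other options):

python -m pip install openvino-genai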

To build OpenVINO™ GenAI library from source, refer to the Build Instructions.

Usage

Prerequisites

  1. Installed OpenVINO™ GenAI

    If OpenVINO GenAI is installed via the archive distribution or built from source, you will need to install additional Python dependencies (e.g. optimum-cli for simplified model downloading and exporting):

    # (Optional) Clone OpenVINO GenAI repository if it does not exist
    git clone --recursive https://github.com/openvinotoolkit/openvino.genai.git
    cd openvino.genai
    # Install python dependencies
    python -m pip install ./thirdparty/openvino_tokenizers/[transformers] --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/pre-release
    python -m pip install --upgrade-strategy eager -r ./samples/cpp/requirements.txt
    
  2. A model in OpenVINO IR format

    Download and convert a model with optimum-cli:

    optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"
    

LLMPipeline is the main object used for decoding. You can construct it directly from the folder with the converted model. It will automatically load the main model, tokenizer, detokenizer, and the default generation configuration.

Python

A simple example:

import openvino_genai as ov_genai

# Path to the folder with the converted model, e.g. produced by the optimum-cli command above
model_path = "TinyLlama-1.1B-Chat-v1.0"
pipe = ov_genai.LLMPipeline(model_path, "CPU")
print(pipe.generate("The Sun is yellow because"))

Calling generate with custom generation config parameters, e.g. config for grouped beam search:

import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path, "CPU")

result = pipe.generate("The Sun is yellow because", max_new_tokens=30, num_groups=3, group_size=5, diversity_penalty=1.5)
print(result)

output:

'it is made up of carbon atoms. The carbon atoms are arranged in a linear pattern, which gives the yellow color. The arrangement of carbon atoms in'

A simple chat in Python:

import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(model_path)

config = {'max_new_tokens': 100, 'num_groups': 3, 'group_size': 5, 'diversity_penalty': 1.5}
pipe.set_generation_config(config)

pipe.start_chat()
while True:
    print('question:')
    prompt = input()
    if prompt == 'Stop!':
        break
    print(pipe(prompt))
pipe.finish_chat()
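
Streaming is also available from Python. The sketch below assumes the Python bindings accept a streamer callback in generate, mirroring the C++ streaming examples later on this page; the callback receives decoded text chunks and returns False to continue generation:

import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline(model_path, "CPU")

def streamer(subword):
    # Print each decoded chunk as soon as it is produced.
    print(subword, end='', flush=True)
    # Returning False means generation should continue.
    return False

pipe.generate("The Sun is yellow because", streamer=streamer, max_new_tokens=100)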

A test that compares the generated outputs with Hugging Face is also available.
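
The sketch below illustrates the idea of such a comparison (it is not the project's actual test; it assumes the transformers package is installed, that the model folder converted above is present, and that greedy outputs match exactly, which may not hold for every model, precision, or conversion):

import openvino_genai as ov_genai
from transformers import AutoModelForCausalLM, AutoTokenizer

prompt = "The Sun is yellow because"
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Reference completion from Hugging Face transformers (greedy decoding).
hf_tokenizer = AutoTokenizer.from_pretrained(model_id)
hf_model = AutoModelForCausalLM.from_pretrained(model_id)
inputs = hf_tokenizer(prompt, return_tensors="pt")
generated = hf_model.generate(**inputs, max_new_tokens=30, do_sample=False)
hf_text = hf_tokenizer.decode(generated[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Completion from OpenVINO GenAI (greedy decoding is assumed to be the default here).
pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-Chat-v1.0", "CPU")
ov_text = pipe.generate(prompt, max_new_tokens=30)

print("HF:", hf_text)
print("OV:", ov_text)
assert hf_text.strip() == str(ov_text).strip(), "Outputs differ"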

C++

A simple example:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because");
}

Using group beam search decoding:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 256;
    config.num_groups = 3;
    config.group_size = 5;
    config.diversity_penalty = 1.0f;

    std::cout << pipe.generate("The Sun is yellow because", config);
}

A simple chat in C++ using grouped beam search decoding:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string prompt;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    
    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_groups = 3;
    config.group_size = 5;
    config.diversity_penalty = 1.0f;
    
    pipe.start_chat();
    for (;;) {
        std::cout << "question:\n";
        std::getline(std::cin, prompt);
        if (prompt == "Stop!")
            break;

        std::cout << "answer:\n";
        auto answer = pipe(prompt, config);
        std::cout << answer << std::endl;
    }
    pipe.finish_chat();
}

Streaming example with lambda function:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
        
    auto streamer = [](std::string word) {
        std::cout << word << std::flush;
        // The returned flag indicates whether generation should be stopped;
        // false means continue generation.
        return false;
    };
    std::cout << pipe.generate("The Sun is yellow because", streamer);
}

Streaming with a custom class:

#include "openvino/genai/streamer_base.hpp"
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

class CustomStreamer: public ov::genai::StreamerBase {
public:
    bool put(int64_t token) {
        bool stop_flag = false;
        /*
        custom decoding/tokens processing code
        tokens_cache.push_back(token);
        std::string text = m_tokenizer.decode(tokens_cache);
        ...
        */
        // Return whether generation should be stopped; if true, generation stops.
        return stop_flag;
    }

    void end() {
        /* custom finalization */
    }
};

int main(int argc, char* argv[]) {
    CustomStreamer custom_streamer;

    std::string model_path = argv[1];
    ov::genai::LLMPipeline pipe(model_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because", custom_streamer);
}

How It Works

For information on how OpenVINO™ GenAI works, refer to the How It Works Section.

Supported Models

For a list of supported models, refer to the Supported Models Section.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

openvino_genai-2024.2.0.0-305-cp312-cp312-win_amd64.whl (794.4 kB): CPython 3.12, Windows x86-64
openvino_genai-2024.2.0.0-305-cp312-cp312-manylinux_2_31_aarch64.whl (984.9 kB): CPython 3.12, manylinux: glibc 2.31+ ARM64
openvino_genai-2024.2.0.0-305-cp312-cp312-manylinux_2_17_x86_64.whl (1.1 MB): CPython 3.12, manylinux: glibc 2.17+ x86-64
openvino_genai-2024.2.0.0-305-cp311-cp311-win_amd64.whl (794.1 kB): CPython 3.11, Windows x86-64
openvino_genai-2024.2.0.0-305-cp311-cp311-manylinux_2_31_aarch64.whl (985.9 kB): CPython 3.11, manylinux: glibc 2.31+ ARM64
openvino_genai-2024.2.0.0-305-cp311-cp311-manylinux_2_17_x86_64.whl (1.1 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
openvino_genai-2024.2.0.0-305-cp311-cp311-macosx_11_0_arm64.whl (2.5 MB): CPython 3.11, macOS 11.0+ ARM64
openvino_genai-2024.2.0.0-305-cp311-cp311-macosx_10_15_x86_64.whl (2.9 MB): CPython 3.11, macOS 10.15+ x86-64
openvino_genai-2024.2.0.0-305-cp310-cp310-win_amd64.whl (792.9 kB): CPython 3.10, Windows x86-64
openvino_genai-2024.2.0.0-305-cp310-cp310-manylinux_2_31_aarch64.whl (984.5 kB): CPython 3.10, manylinux: glibc 2.31+ ARM64
openvino_genai-2024.2.0.0-305-cp310-cp310-manylinux_2_17_x86_64.whl (1.1 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
openvino_genai-2024.2.0.0-305-cp310-cp310-macosx_11_0_arm64.whl (2.5 MB): CPython 3.10, macOS 11.0+ ARM64
openvino_genai-2024.2.0.0-305-cp310-cp310-macosx_10_15_x86_64.whl (2.9 MB): CPython 3.10, macOS 10.15+ x86-64
openvino_genai-2024.2.0.0-305-cp39-cp39-win_amd64.whl (793.2 kB): CPython 3.9, Windows x86-64
openvino_genai-2024.2.0.0-305-cp39-cp39-manylinux_2_31_aarch64.whl (984.9 kB): CPython 3.9, manylinux: glibc 2.31+ ARM64
openvino_genai-2024.2.0.0-305-cp39-cp39-manylinux_2_17_x86_64.whl (1.1 MB): CPython 3.9, manylinux: glibc 2.17+ x86-64
openvino_genai-2024.2.0.0-305-cp39-cp39-macosx_11_0_arm64.whl (2.5 MB): CPython 3.9, macOS 11.0+ ARM64
openvino_genai-2024.2.0.0-305-cp39-cp39-macosx_10_15_x86_64.whl (2.9 MB): CPython 3.9, macOS 10.15+ x86-64
openvino_genai-2024.2.0.0-305-cp38-cp38-win_amd64.whl (793.1 kB): CPython 3.8, Windows x86-64
openvino_genai-2024.2.0.0-305-cp38-cp38-manylinux_2_31_aarch64.whl (984.3 kB): CPython 3.8, manylinux: glibc 2.31+ ARM64
openvino_genai-2024.2.0.0-305-cp38-cp38-manylinux_2_17_x86_64.whl (1.1 MB): CPython 3.8, manylinux: glibc 2.17+ x86-64
openvino_genai-2024.2.0.0-305-cp38-cp38-macosx_11_0_arm64.whl (2.5 MB): CPython 3.8, macOS 11.0+ ARM64
openvino_genai-2024.2.0.0-305-cp38-cp38-macosx_10_15_x86_64.whl (2.9 MB): CPython 3.8, macOS 10.15+ x86-64
