Library of the most popular Generative AI model pipelines, optimized execution methods, and samples

OpenVINO™ GenAI Library

OpenVINO™ GenAI is a flavor of OpenVINO™ that simplifies running inference of generative AI models. It hides the complexity of the generation process and minimizes the amount of code required.

Install OpenVINO™ GenAI

NOTE: Please make sure that you follow the version compatibility rules; refer to the OpenVINO™ GenAI Dependencies section for more information.

The OpenVINO™ GenAI flavor is available for installation via Archive and PyPI distributions. To install OpenVINO™ GenAI, refer to the Install Guide.
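For the PyPI distribution, installation reduces to a single pip command (shown here for a default environment; see the Install Guide for archive and platform-specific options):

```shell
# Installs the latest released OpenVINO GenAI wheel together with its
# pinned openvino and openvino-tokenizers dependencies
python -m pip install openvino-genai
```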

To build OpenVINO™ GenAI library from source, refer to the Build Instructions.

OpenVINO™ GenAI Dependencies

OpenVINO™ GenAI depends on OpenVINO and OpenVINO Tokenizers.

When installing OpenVINO™ GenAI from PyPI, the same versions of OpenVINO and OpenVINO Tokenizers are used (e.g. openvino==2024.3.0 and openvino-tokenizers==2024.3.0.0 are installed for openvino-genai==2024.3.0). If you update one of the dependency packages (e.g. pip install openvino --pre --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly), the versions might become incompatible due to ABI differences, and running OpenVINO GenAI can result in errors (e.g. ImportError: libopenvino.so.2430: cannot open shared object file: No such file or directory). With package versions in the format <MAJOR>.<MINOR>.<PATCH>.<REVISION>, only the <REVISION> part of the full version can vary while preserving ABI compatibility; changing the <MAJOR>, <MINOR>, or <PATCH> parts might break the ABI.
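Following the rule above, a quick sanity check can compare installed versions before importing the library. The helper below is illustrative, not part of OpenVINO GenAI: it treats two versions as ABI-compatible when their <MAJOR>.<MINOR>.<PATCH> prefixes match.

```python
def abi_compatible(version_a: str, version_b: str) -> bool:
    """Illustrative check: versions are considered ABI-compatible when their
    <MAJOR>.<MINOR>.<PATCH> prefixes match; only <REVISION> may differ."""
    prefix_a = version_a.split(".")[:3]
    prefix_b = version_b.split(".")[:3]
    return prefix_a == prefix_b

# openvino-genai 2024.3.0 works with openvino-tokenizers 2024.3.0.0 ...
print(abi_compatible("2024.3.0", "2024.3.0.0"))  # True
# ... but a nightly 2024.4 dependency may break the ABI
print(abi_compatible("2024.3.0", "2024.4.0.0"))  # False
```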

GenAI, Tokenizers, and OpenVINO wheels for Linux on PyPI are compiled with _GLIBCXX_USE_CXX11_ABI=0 to cover a wider range of platforms. In contrast, C++ archive distributions for Ubuntu are compiled with _GLIBCXX_USE_CXX11_ABI=1. It is not possible to mix different Application Binary Interfaces (ABIs) because doing so results in a link error. This incompatibility prevents the use of, for example, OpenVINO from C++ archive distributions alongside GenAI from PyPI.

If you want to try OpenVINO GenAI with different dependency versions (rather than the prebuilt packages distributed as archives or Python wheels), build the OpenVINO GenAI library from source.

Usage

Prerequisites

  1. Installed OpenVINO™ GenAI

    To use OpenVINO GenAI with models that are already in OpenVINO format, no additional Python dependencies are needed. To convert models with optimum-cli and to run the examples, install the dependencies in ./samples/requirements.txt:

    # (Optional) Clone OpenVINO GenAI repository if it does not exist
    git clone --recursive https://github.com/openvinotoolkit/openvino.genai.git
    cd openvino.genai
    # Install python dependencies
    python -m pip install ./thirdparty/openvino_tokenizers/[transformers] --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
    python -m pip install --upgrade-strategy eager -r ./samples/requirements.txt
    
  2. A model in OpenVINO IR format

    Download and convert a model with optimum-cli:

    optimum-cli export openvino --model "TinyLlama/TinyLlama-1.1B-Chat-v1.0" --trust-remote-code "TinyLlama-1.1B-Chat-v1.0"
    

LLMPipeline is the main object used for decoding. You can construct it straight away from the folder with the converted model. It will automatically load the main model, tokenizer, detokenizer and default generation configuration.

Python

A simple example:

import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(models_path, "CPU")
print(pipe.generate("The Sun is yellow because", max_new_tokens=100))

Calling generate with custom generation config parameters, e.g. config for grouped beam search:

import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(models_path, "CPU")

result = pipe.generate("The Sun is yellow because", max_new_tokens=100, num_beam_groups=3, num_beams=15, diversity_penalty=1.5)
print(result)

output:

'it is made up of carbon atoms. The carbon atoms are arranged in a linear pattern, which gives the yellow color. The arrangement of carbon atoms in'

Note: The chat_template from tokenizer_config.json or from tokenizer/detokenizer model will be automatically applied to the prompt at the generation stage. If you want to disable it, you can do it by calling pipe.get_tokenizer().set_chat_template("").

A simple chat in Python:

import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(models_path)

config = {'max_new_tokens': 100, 'num_beam_groups': 3, 'num_beams': 15, 'diversity_penalty': 1.5}
pipe.set_generation_config(config)

pipe.start_chat()
while True:
    print('question:')
    prompt = input()
    if prompt == 'Stop!':
        break
    print(pipe(prompt, max_new_tokens=200))
pipe.finish_chat()

A test to compare with Hugging Face outputs is available in the repository.

C++

A simple example:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string models_path = argv[1];
    ov::genai::LLMPipeline pipe(models_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(256));
}

Using group beam search decoding:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string models_path = argv[1];
    ov::genai::LLMPipeline pipe(models_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 256;
    config.num_beam_groups = 3;
    config.num_beams = 15;
    config.diversity_penalty = 1.0f;

    std::cout << pipe.generate("The Sun is yellow because", config);
}

A simple chat in C++ using grouped beam search decoding:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string prompt;

    std::string models_path = argv[1];
    ov::genai::LLMPipeline pipe(models_path, "CPU");

    ov::genai::GenerationConfig config;
    config.max_new_tokens = 100;
    config.num_beam_groups = 3;
    config.num_beams = 15;
    config.diversity_penalty = 1.0f;

    pipe.start_chat();
    while (true) {
        std::cout << "question:\n";
        std::getline(std::cin, prompt);
        if (prompt == "Stop!")
            break;

        std::cout << "answer:\n";
        auto answer = pipe(prompt, config);
        std::cout << answer << std::endl;
    }
    pipe.finish_chat();
}

Streaming example with lambda function:

#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

int main(int argc, char* argv[]) {
    std::string models_path = argv[1];
    ov::genai::LLMPipeline pipe(models_path, "CPU");

    auto streamer = [](std::string word) {
        std::cout << word << std::flush;
        // The returned status indicates whether generation should continue (RUNNING) or stop.
        return ov::genai::StreamingStatus::RUNNING;
    };
    std::cout << pipe.generate("The Sun is yellow because", ov::genai::streamer(streamer), ov::genai::max_new_tokens(200));
}

Streaming with a custom class:

C++ template for a streamer.

#include "openvino/genai/streamer_base.hpp"
#include "openvino/genai/llm_pipeline.hpp"
#include <iostream>

class CustomStreamer: public ov::genai::StreamerBase {
public:
    bool put(int64_t token) {
        // Custom decoding/token-processing logic.

        // Return whether generation should be stopped; if true, generation stops.
        return false;
    };

    void end() {
        // Custom finalization logic.
    };
};

int main(int argc, char* argv[]) {
    CustomStreamer custom_streamer;

    std::string models_path = argv[1];
    ov::genai::LLMPipeline pipe(models_path, "CPU");
    std::cout << pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(15), ov::genai::streamer(custom_streamer));
}

Python template for a streamer.

import openvino_genai as ov_genai

class CustomStreamer(ov_genai.StreamerBase):
    def __init__(self):
        super().__init__()
        # Initialization logic.

    def put(self, token_id) -> bool:
        # Custom decoding/token-processing logic.

        # Return whether generation should be stopped; if True, generation stops.
        return False

    def end(self):
        # Custom finalization logic.
        pass

pipe = ov_genai.LLMPipeline(models_path, "CPU")
custom_streamer = CustomStreamer()

pipe.generate("The Sun is yellow because", max_new_tokens=15, streamer=custom_streamer)

For a fully implemented iterable CustomStreamer, refer to the multinomial_causal_lm sample.

Continuous batching with LLMPipeline:

To activate continuous batching, provide an additional property to the LLMPipeline config: ov::genai::scheduler_config. This property contains the SchedulerConfig struct.

#include "openvino/genai/llm_pipeline.hpp"

int main(int argc, char* argv[]) {
    std::string models_path = argv[1];

    ov::genai::SchedulerConfig scheduler_config;
    // fill other fields in scheduler_config with custom data if required
    scheduler_config.cache_size = 1;    // minimal possible KV cache size in GB, adjust as required

    ov::genai::LLMPipeline pipe(models_path, "CPU", ov::genai::scheduler_config(scheduler_config));
}

Performance Metrics

openvino_genai.PerfMetrics (referred to as PerfMetrics for simplicity) is a structure that holds performance metrics for each generate call. PerfMetrics holds fields with mean and standard deviation values for the following metrics:

  • Time To the First Token (TTFT), ms
  • Time per Output Token (TPOT), ms/token
  • Generate total duration, ms
  • Tokenization duration, ms
  • Detokenization duration, ms
  • Throughput, tokens/s

and:

  • Load time, ms
  • Number of generated tokens
  • Number of tokens in the input prompt

Performance metrics are stored in the perf_metrics field of either DecodedResults or EncodedResults. In addition to the fields mentioned above, PerfMetrics has a member raw_metrics of type openvino_genai.RawPerfMetrics (referred to as RawPerfMetrics for simplicity) that contains raw values for the durations of each batch of new token generation, tokenization durations, detokenization durations, and more. These raw metrics are accessible if you wish to calculate your own statistical values such as median or percentiles. However, since mean and standard deviation values are usually sufficient, we will focus on PerfMetrics.

import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(models_path, "CPU")
result = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
perf_metrics = result.perf_metrics

print(f'Generate duration: {perf_metrics.get_generate_duration().mean:.2f}')
print(f'TTFT: {perf_metrics.get_ttft().mean:.2f} ms')
print(f'TPOT: {perf_metrics.get_tpot().mean:.2f} ms/token')
print(f'Throughput: {perf_metrics.get_throughput().mean:.2f} tokens/s')
#include "openvino/genai/llm_pipeline.hpp"
#include <iomanip>
#include <iostream>

int main(int argc, char* argv[]) {
    std::string models_path = argv[1];
    ov::genai::LLMPipeline pipe(models_path, "CPU");
    auto result = pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(20));
    auto perf_metrics = result.perf_metrics;

    std::cout << std::fixed << std::setprecision(2);
    std::cout << "Generate duration: " << perf_metrics.get_generate_duration().mean << " ms" << std::endl;
    std::cout << "TTFT: " << perf_metrics.get_ttft().mean << " ms" << std::endl;
    std::cout << "TPOT: " << perf_metrics.get_tpot().mean << " ms/token" << std::endl;
    std::cout << "Throughput: " << perf_metrics.get_throughput().mean << " tokens/s" << std::endl;
}

output:

Generate duration: 76.28 ms
TTFT: 42.58 ms
TPOT: 3.80 ms/token

Note: If the input prompt is just a string, the generate function returns only a string without perf_metrics. To obtain perf_metrics, provide the prompt as a list with at least one element or call generate with encoded inputs.

Accumulating metrics

Several perf_metrics objects can be added to each other. In that case, raw_metrics are concatenated and mean/std values are recalculated. This accumulates statistics from several generate() calls.

#include "openvino/genai/llm_pipeline.hpp"
#include <iomanip>
#include <iostream>

int main(int argc, char* argv[]) {
    std::string models_path = argv[1];
    ov::genai::LLMPipeline pipe(models_path, "CPU");
    auto result_1 = pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(20));
    auto result_2 = pipe.generate("The Sun is yellow because", ov::genai::max_new_tokens(20));
    auto perf_metrics = result_1.perf_metrics + result_2.perf_metrics;

    std::cout << std::fixed << std::setprecision(2);
    std::cout << "Generate duration: " << perf_metrics.get_generate_duration().mean << " ms" << std::endl;
    std::cout << "TTFT: " << perf_metrics.get_ttft().mean << " ms" << std::endl;
    std::cout << "TPOT: " << perf_metrics.get_tpot().mean << " ms/token" << std::endl;
    std::cout << "Throughput: " << perf_metrics.get_throughput().mean << " tokens/s" << std::endl;
}
import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(models_path, "CPU")
res_1 = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
res_2 = pipe.generate(["Why Sky is blue because"], max_new_tokens=20)
perf_metrics = res_1.perf_metrics + res_2.perf_metrics

print(f'Generate duration: {perf_metrics.get_generate_duration().mean:.2f}')
print(f'TTFT: {perf_metrics.get_ttft().mean:.2f} ms')
print(f'TPOT: {perf_metrics.get_tpot().mean:.2f} ms/token')
print(f'Throughput: {perf_metrics.get_throughput().mean:.2f} tokens/s')

Using raw performance metrics

In addition to mean and standard deviation values, the perf_metrics object has a raw_metrics field. This field stores raw data, including:

  • Timestamps for each batch of generated tokens
  • Batch sizes for each timestamp
  • Tokenization durations
  • Detokenization durations
  • Other relevant metrics

These metrics can be used for more fine-grained analysis, such as calculating exact median values, percentiles, etc. Below are a few examples of how to use raw metrics.

Getting timestamps for each generated token:

import openvino_genai as ov_genai
pipe = ov_genai.LLMPipeline(models_path, "CPU")
result = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
perf_metrics = result.perf_metrics
raw_metrics = perf_metrics.raw_metrics

print(f'Generate duration: {perf_metrics.get_generate_duration().mean:.2f}')
print(f'Throughput: {perf_metrics.get_throughput().mean:.2f} tokens/s')
print(f'Timestamps: {" ms, ".join(f"{i:.2f}" for i in raw_metrics.m_new_token_times)}')

Getting pure inference time without tokenization and detokenization duration:

import openvino_genai as ov_genai
import numpy as np
pipe = ov_genai.LLMPipeline(models_path, "CPU")
result = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
perf_metrics = result.perf_metrics
print(f'Generate duration: {perf_metrics.get_generate_duration().mean:.2f} ms')

raw_metrics = perf_metrics.raw_metrics
generate_duration = np.array(raw_metrics.generate_durations)
tok_detok_duration = np.array(raw_metrics.tokenization_durations) + np.array(raw_metrics.detokenization_durations)
pure_inference_duration = np.sum(generate_duration - tok_detok_duration) / 1000 # in milliseconds
print(f'Pure Inference duration: {pure_inference_duration:.2f} ms')

Example of using raw metrics to calculate median value of generate duration:

import openvino_genai as ov_genai
import numpy as np
pipe = ov_genai.LLMPipeline(models_path, "CPU")
result = pipe.generate(["The Sun is yellow because"], max_new_tokens=20)
perf_metrics = result.perf_metrics
raw_metrics = perf_metrics.raw_metrics

print(f'Generate duration: {perf_metrics.get_generate_duration().mean:.2f}')
print(f'Throughput: {perf_metrics.get_throughput().mean:.2f} tokens/s')
durations = np.array(raw_metrics.m_new_token_times[1:]) - np.array(raw_metrics.m_new_token_times[:-1])
print(f'Median from token to token duration: {np.median(durations):.2f} ms')
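The same token-to-token computation also works without NumPy; the standard library is enough once the timestamps are extracted. The timestamp values below are synthetic, for illustration only:

```python
from statistics import median

# Synthetic m_new_token_times-style timestamps in ms (illustrative values)
token_times = [100.0, 104.2, 107.9, 112.1, 115.8]

# Durations between consecutive tokens
durations = [b - a for a, b in zip(token_times, token_times[1:])]
print(f'Median from token to token duration: {median(durations):.2f} ms')
```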

For more examples of how metrics are used, please refer to the Python benchmark_genai.py and C++ benchmark_genai samples.

Structured Output generation

OpenVINO™ GenAI supports structured output generation, which allows you to generate outputs in a structured format such as JSON, regex, or according to EBNF (Extended Backus–Naur form) grammar.

Below is a minimal example that demonstrates how to use OpenVINO™ GenAI to generate structured JSON output for a single item type (e.g., person). This example uses a Pydantic schema to define the structure and constraints of the generated output.

import json
from openvino_genai import LLMPipeline, GenerationConfig, StructuredOutputConfig
from pydantic import BaseModel, Field

# Define the schema for a person
class Person(BaseModel):
    name: str = Field(pattern=r"^[A-Z][a-z]{1,20}$")
    surname: str = Field(pattern=r"^[A-Z][a-z]{1,20}$")
    age: int
    city: str

pipe = LLMPipeline(models_path, "CPU")

config = GenerationConfig()
config.max_new_tokens = 100
# If backend is not specified, it will use the default backend which is "xgrammar" for the moment.
config.structured_output_config = StructuredOutputConfig(json_schema=json.dumps(Person.model_json_schema()), backend="xgrammar")

# Generate structured output
result = pipe.generate("Generate a JSON for a person.", config)
print(json.loads(result))

This will generate a JSON object matching the Person schema, for example:

{
  "name": "John",
  "surname": "Doe",
  "age": 30,
  "city": "Dublin"
}

Note:
Structured output enforcement guarantees correct JSON formatting, but does not ensure the factual correctness or sensibility of the content. The model may generate implausible or nonsensical data, such as {"name": "John", "age": 200000} or {"model": "AbrakaKadabra9999######4242"}. These are valid JSONs but may not make sense. For best results, use the latest or fine-tuned models for this task to improve the quality and relevance of the generated output.
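Because formatting is guaranteed but content is not, it is worth validating the parsed result before using it. Below is a minimal stdlib-only sketch; the field bounds are illustrative choices, not part of the schema:

```python
import json

def plausible_person(raw: str) -> bool:
    """Parse a generated Person JSON and apply illustrative sanity bounds."""
    data = json.loads(raw)
    required = {"name", "surname", "age", "city"}
    if set(data) != required:
        return False
    # Structured output guarantees the shape; plausibility checks are on us.
    return isinstance(data["age"], int) and 0 <= data["age"] <= 130

print(plausible_person('{"name": "John", "surname": "Doe", "age": 30, "city": "Dublin"}'))    # True
print(plausible_person('{"name": "John", "surname": "Doe", "age": 200000, "city": "Dublin"}'))  # False
```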

Tokenization

OpenVINO™ GenAI provides a way to tokenize and detokenize text using the ov::genai::Tokenizer class. The Tokenizer is a high-level abstraction over the OpenVINO Tokenizers library.

It can be initialized from a path or an in-memory IR representation, or obtained from the ov::genai::LLMPipeline object.

// Initialize from the path
#include "openvino/genai/llm_pipeline.hpp"
auto tokenizer = ov::genai::Tokenizer(models_path);

// Get an instance of Tokenizer from LLMPipeline.
ov::genai::LLMPipeline pipe(models_path, "CPU");
auto tokenizer = pipe.get_tokenizer();
import openvino_genai as ov_genai
tokenizer = ov_genai.Tokenizer(models_path)

# Or from LLMPipeline.
pipe = ov_genai.LLMPipeline(models_path, "CPU")
tokenizer = pipe.get_tokenizer()

Tokenizer has encode and decode methods which support the following arguments: add_special_tokens, skip_special_tokens, pad_to_max_length, and max_length.

To disable adding special tokens, do the following in C++:

auto tokens = tokenizer.encode("The Sun is yellow because", ov::genai::add_special_tokens(false));

In Python:

tokens = tokenizer.encode("The Sun is yellow because", add_special_tokens=False)

The encode method returns a TokenizedInputs object containing input_ids and attention_mask, both stored as ov::Tensor. Since ov::Tensor requires fixed-length sequences, padding is applied to match the longest sequence in a batch, ensuring a uniform shape. The resulting sequence is also truncated to max_length. If this value is not defined by the user, it is taken from the IR.

Both padding and max_length can be controlled by the user. If pad_to_max_length is set to true, the sequences are padded to max_length instead of to the longest sequence in the batch.
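The two modes can be summarized with a plain-Python sketch of the padding semantics (this illustrates the behavior described above, not the library implementation; pad_id is an assumed placeholder):

```python
def pad_batch(sequences, pad_to_max_length=False, max_length=1024, pad_id=0):
    """Illustrative padding semantics: pad to the longest sequence by default,
    or to max_length when pad_to_max_length is set; always truncate to max_length."""
    truncated = [seq[:max_length] for seq in sequences]
    target = max_length if pad_to_max_length else max(len(seq) for seq in truncated)
    return [seq + [pad_id] * (target - len(seq)) for seq in truncated]

batch = [[1, 2, 3, 4, 5, 6], [1]]
print([len(row) for row in pad_batch(batch)])                    # [6, 6] - padded to longest
print([len(row) for row in pad_batch(batch, pad_to_max_length=True,
                                     max_length=8)])             # [8, 8] - padded to max_length
```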

Below are examples of how padding can be controlled, in C++:

#include "openvino/genai/llm_pipeline.hpp"
auto tokenizer = ov::genai::Tokenizer(models_path);
std::vector<std::string> prompts = {"The Sun is yellow because", "The"};

// Since the prompts are shorter than the maximal length (which is taken from the IR), it will not affect the shape.
// The resulting shape is defined by the length of the longest token sequence.
// Equivalent of HuggingFace hf_tokenizer.encode(prompt, padding="longest", truncation=True)
auto tokens = tokenizer.encode({"The Sun is yellow because", "The"});
// or, equivalently
tokens = tokenizer.encode({"The Sun is yellow because", "The"}, ov::genai::pad_to_max_length(false));
// out_shape: [2, 6]

// The resulting tokens tensor will be padded to 1024; sequences which exceed this length will be truncated.
// Equivalent of HuggingFace hf_tokenizer.encode(prompt, padding="max_length", truncation=True, max_length=1024)
tokens = tokenizer.encode({"The Sun is yellow because",
                           "The",
                           std::string(2000, 'n')}, ov::genai::pad_to_max_length(true), ov::genai::max_length(1024));
// out_shape: [3, 1024]

// For single string prompts truncation and padding are also applied.
tokens = tokenizer.encode({"The Sun is yellow because"}, ov::genai::pad_to_max_length(true), ov::genai::max_length(128));
// out_shape: [1, 128]

In Python:

import openvino_genai as ov_genai

tokenizer = ov_genai.Tokenizer(models_path)
prompts = ["The Sun is yellow because", "The"]

# Since the prompts are shorter than the maximal length (which is taken from the IR), it will not affect the shape.
# The resulting shape is defined by the length of the longest token sequence.
# Equivalent of HuggingFace hf_tokenizer.encode(prompt, padding="longest", truncation=True)
tokens = tokenizer.encode(["The Sun is yellow because", "The"])
# or is equivalent to
tokens = tokenizer.encode(["The Sun is yellow because", "The"], pad_to_max_length=False)
print(tokens.input_ids.shape)
# out_shape: [2, 6]

# Resulting tokens tensor will be padded to 1024, sequences which exceed this length will be truncated.
# Equivalent of HuggingFace hf_tokenizer.encode(prompt, padding="max_length", truncation=True, max_length=1024)
tokens = tokenizer.encode(["The Sun is yellow because",
                           "The",
                           "The longest string ever" * 2000], pad_to_max_length=True, max_length=1024)
print(tokens.input_ids.shape)
# out_shape: [3, 1024]

# For single string prompts truncation and padding are also applied.
tokens = tokenizer.encode("The Sun is yellow because", pad_to_max_length=True, max_length=128)
print(tokens.input_ids.shape)
# out_shape: [1, 128]

How It Works

For information on how OpenVINO™ GenAI works, refer to the How It Works Section.

Supported Models

For a list of supported models, refer to the Supported Models page.

Debug Log

For using debug log, refer to DEBUG Log.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

openvino_genai-2025.4.0.0-1899-cp314-cp314t-macosx_11_0_arm64.whl (3.4 MB view details)

Uploaded CPython 3.14tmacOS 11.0+ ARM64

openvino_genai-2025.4.0.0-1899-cp314-cp314t-macosx_10_15_x86_64.whl (3.9 MB view details)

Uploaded CPython 3.14tmacOS 10.15+ x86-64

openvino_genai-2025.4.0.0-1899-cp314-cp314-win_amd64.whl (2.7 MB view details)

Uploaded CPython 3.14Windows x86-64

openvino_genai-2025.4.0.0-1899-cp314-cp314-macosx_11_0_arm64.whl (3.3 MB view details)

Uploaded CPython 3.14macOS 11.0+ ARM64

openvino_genai-2025.4.0.0-1899-cp314-cp314-macosx_10_15_x86_64.whl (3.7 MB view details)

Uploaded CPython 3.14macOS 10.15+ x86-64

openvino_genai-2025.4.0.0-1899-cp313-cp313-win_amd64.whl (2.7 MB view details)

Uploaded CPython 3.13Windows x86-64

openvino_genai-2025.4.0.0-1899-cp313-cp313-manylinux_2_31_aarch64.whl (3.9 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.31+ ARM64

openvino_genai-2025.4.0.0-1899-cp313-cp313-macosx_11_0_arm64.whl (3.3 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

openvino_genai-2025.4.0.0-1899-cp313-cp313-macosx_10_15_x86_64.whl (3.7 MB view details)

Uploaded CPython 3.13macOS 10.15+ x86-64

openvino_genai-2025.4.0.0-1899-cp312-cp312-win_amd64.whl (2.7 MB view details)

Uploaded CPython 3.12Windows x86-64

openvino_genai-2025.4.0.0-1899-cp312-cp312-manylinux_2_31_aarch64.whl (3.9 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.31+ ARM64

openvino_genai-2025.4.0.0-1899-cp312-cp312-macosx_11_0_arm64.whl (3.3 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

openvino_genai-2025.4.0.0-1899-cp312-cp312-macosx_10_15_x86_64.whl (3.7 MB view details)

Uploaded CPython 3.12macOS 10.15+ x86-64

openvino_genai-2025.4.0.0-1899-cp311-cp311-win_amd64.whl (2.7 MB view details)

Uploaded CPython 3.11Windows x86-64

openvino_genai-2025.4.0.0-1899-cp311-cp311-manylinux_2_31_aarch64.whl (3.9 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.31+ ARM64

openvino_genai-2025.4.0.0-1899-cp311-cp311-macosx_11_0_arm64.whl (3.3 MB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

openvino_genai-2025.4.0.0-1899-cp311-cp311-macosx_10_15_x86_64.whl (3.7 MB view details)

Uploaded CPython 3.11macOS 10.15+ x86-64

openvino_genai-2025.4.0.0-1899-cp310-cp310-win_amd64.whl (2.7 MB view details)

Uploaded CPython 3.10Windows x86-64

openvino_genai-2025.4.0.0-1899-cp310-cp310-manylinux_2_31_aarch64.whl (3.9 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.31+ ARM64

openvino_genai-2025.4.0.0-1899-cp310-cp310-macosx_11_0_arm64.whl (3.3 MB view details)

Uploaded CPython 3.10macOS 11.0+ ARM64

openvino_genai-2025.4.0.0-1899-cp310-cp310-macosx_10_15_x86_64.whl (3.7 MB view details)

Uploaded CPython 3.10macOS 10.15+ x86-64

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp314-cp314t-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp314-cp314t-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 893934407aa0119220397719b7f82be8b412f02fe5f3a19b0cbe174f18c91b7f
MD5 51e38531b1a54d1f778cd6261d8ba52f
BLAKE2b-256 345c56ba4694e534b0d04f74b04ffab4238d2fd4bed3f33ca9c5c5a1fa8be916

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp314-cp314t-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp314-cp314t-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 9c001391d40de8a190d6df33321c71b90745e7066b524e7da09f7fa3ede0b997
MD5 e31bf0381d225966aa95e641b08ee7b7
BLAKE2b-256 8f9593736dbd321512c5d429383d16cfbf7a381983b547f8d8ea81f71be0672a

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp314-cp314t-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp314-cp314t-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 cd4c6f48e1e6347b16948383ac86bef8da63890bd272d96d4d7b0260402378dc
MD5 d2290fe8e4c4503c1a2a173153c75ae9
BLAKE2b-256 d47d318e0420fcd2a8191ae03f9f4ef18dd08416318c85cf25d6d164015a2502

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp314-cp314-win_amd64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp314-cp314-win_amd64.whl
Algorithm Hash digest
SHA256 661c63165cd598b542fda939f073f91cdadc156daca13a7b8c2c786ed7226d0a
MD5 3f68d42cdcd87210f07673dc356c1424
BLAKE2b-256 9b1d5a4a50f4765300cbe3fd32dd571f1aabdd1b85682232424ac84da6dc460c

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp314-cp314-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp314-cp314-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9695888c2f816b662d6330a64fb64d90b25ff1a199bef39c85f3a5e28c3ac321
MD5 cfda1e87b4e590888d000152e0781f3f
BLAKE2b-256 35d2dc345e87e84a9554ad36d83a86c07643087ad094ff1278e548aab0c7d713

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp314-cp314-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp314-cp314-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 10252fa958eeafce96ed939f355570aad7f9253e0a0edfcd45be136c19797670
MD5 d19818481c43be638dacdc90c474e42e
BLAKE2b-256 6e20b98c3a7af30551547d76dd515c6972a4d144483a1f47edd0f07332403e1e

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp314-cp314-macosx_10_15_x86_64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp314-cp314-macosx_10_15_x86_64.whl
Algorithm Hash digest
SHA256 acd38201fe72b779c4dedfe8d31722097c4d6ed68a24ff393c9e3545935e569d
MD5 7fc0208d641b2dd349030f98734c7214
BLAKE2b-256 9cca4679ac92d6dddc7cc102064dc51a957d1b8941480424cbd6e85d6afaf305

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 0d655132350ed8ff1d1f2d8a985e9944059e2d0476393f1c336074f433af721f
MD5 b5a230999aa32905615e904e6d475536
BLAKE2b-256 784e08f3624c6144a68d9528c02b5ee4222c2323df006cc6cca0175ec65ccba6

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp313-cp313-manylinux_2_31_aarch64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp313-cp313-manylinux_2_31_aarch64.whl
Algorithm Hash digest
SHA256 b3a5b497e0033c8e1f08d74c123777a43112ef1ee8017e3eb453e9aee5780e63
MD5 49e4c4c5a701a76bab179506bd879451
BLAKE2b-256 02c64004cc84f272792ef0c69a33e5fb1c4bc644b16c4c06c60918d6e80e3fd8

See more details on using hashes here.

File details

Details for the file openvino_genai-2025.4.0.0-1899-cp313-cp313-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for openvino_genai-2025.4.0.0-1899-cp313-cp313-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6428747abbac624ae54771b6291b07a3a386a579c5af421b6198843d352d9004
MD5 c0bc7373a3b4cfe263d36fda5c2a6d18
BLAKE2b-256 f61e27f7637722b69e6e02f61486907d3caf1623ea3f035be1d7a6646093207a

Details for the file openvino_genai-2025.4.0.0-1899-cp313-cp313-macosx_11_0_arm64.whl:
  SHA256       3a09c8d60b768b82c1959865fda1dfff952f41443164a6e787f739d395336cc9
  MD5          f401fd9322b68cd4d7f31d0a1b64e27e
  BLAKE2b-256  8db795670acebee9c4d0cc9814abe7a14ec58790c9b0f25f9fc4e4bac7e4f8f4

Details for the file openvino_genai-2025.4.0.0-1899-cp313-cp313-macosx_10_15_x86_64.whl:
  SHA256       010b8d2a8a19a4b4baa94471d5d04e701480151096c980638f18d901af875bd0
  MD5          4865cf37a27364065c07ff23484ec60d
  BLAKE2b-256  50039bbbd2bf4042f9e9f22d8b9fad879430daed183bfc7e36bcd04d36147152

Details for the file openvino_genai-2025.4.0.0-1899-cp312-cp312-win_amd64.whl:
  SHA256       7eb5140bc5b41e8f73e15a65e18e779d64920ba6f90a8ad5caf51e2031f1be68
  MD5          292ec53207d5b9033f78b76db8f3fbaf
  BLAKE2b-256  b9fe475f3a822ad749d19ae2a0f242e35fbd3145eece83d0d7a1402d2fa2cdd5

Details for the file openvino_genai-2025.4.0.0-1899-cp312-cp312-manylinux_2_31_aarch64.whl:
  SHA256       a0fe0d5d919a4d5ec5979e4ff1bd06c54e4f81897b727c11a5be209ad0b2bfff
  MD5          4e1e7771a55c5c11c566c0d0e9f9b939
  BLAKE2b-256  24c5033c3516c55aeba0b4e87152263401378a4525685f47d8b9296bd18600ec

Details for the file openvino_genai-2025.4.0.0-1899-cp312-cp312-manylinux2014_x86_64.whl:
  SHA256       3b91ffe11fa5309060bb5f05bc1fa82a9766f49d91c74680d7ebe378187a9eb5
  MD5          b3f85380aa1302b9ba232589a6fa47b2
  BLAKE2b-256  e59c96b72999973c98be6314c4cbab6ffc5e5cbc00e43c213c2ca9df629f3199

Details for the file openvino_genai-2025.4.0.0-1899-cp312-cp312-macosx_11_0_arm64.whl:
  SHA256       4fdfe63d2bb8b6ba6445f99855f19c38f3e0a4b53aba6284dc30afec34a6397d
  MD5          74aed0cd365d51aaab032ce1aea46804
  BLAKE2b-256  8186f02617313ae129d91d5a9277aab8a58ae24e66d358491c891ccb2628ac4f

Details for the file openvino_genai-2025.4.0.0-1899-cp312-cp312-macosx_10_15_x86_64.whl:
  SHA256       774324607182c54f05db9898b4874bf679d12c4943a798e62f3f8a6298fec44c
  MD5          495ee146a300e16dcf9da8e8d76cafa2
  BLAKE2b-256  b6b61bd92f76d34bbb7ca8650e200a750552a1334e0d0bd816782c50ed2b13e9

Details for the file openvino_genai-2025.4.0.0-1899-cp311-cp311-win_amd64.whl:
  SHA256       5e1c9fc7b173921398078db04095d8a5b54c6d8ec1bee4658caaf74db420ca60
  MD5          daaeca34d77b04fe4f33ac153b153a40
  BLAKE2b-256  a8d2d178821ecbeda9ad51546b3c51a22ffb050ecae33689fd6d769571dc6957

Details for the file openvino_genai-2025.4.0.0-1899-cp311-cp311-manylinux_2_31_aarch64.whl:
  SHA256       675908be83859c63680127dab69a7bed6bfd5d3d033f7d55d7b6e71b457b0706
  MD5          e730cdf8c7fdc0667683719e1de05bbe
  BLAKE2b-256  9aa7bd1ac9335fba02d24c30fa2bd5004dd46339d10eeaed458e61608238db29

Details for the file openvino_genai-2025.4.0.0-1899-cp311-cp311-manylinux2014_x86_64.whl:
  SHA256       8c44e6a764468784faca4ab6684589f4d5df5fac489100cc31372ec2829fa681
  MD5          6aa23228b2510481977b53cdf4614621
  BLAKE2b-256  e06e1160eaffe7ec29f41da37e863c5f9d16acb1fd0494eb36d7211daacbdfda

Details for the file openvino_genai-2025.4.0.0-1899-cp311-cp311-macosx_11_0_arm64.whl:
  SHA256       bdf8f089d43dda6372ed16f76db7d2a60399b87a340594bd63a44c8aebec34ff
  MD5          f500499513c0ad440482c61b4af79209
  BLAKE2b-256  e116ff81398ada299c6958a904d33c14bcc866207fceaa2c5805a94977ee6fc9

Details for the file openvino_genai-2025.4.0.0-1899-cp311-cp311-macosx_10_15_x86_64.whl:
  SHA256       9263d15a64477a2f4a15fbffa687c0a79574c4316eb1cf9247b4f1398701e3bb
  MD5          eecf78966a04c6718f235f18d67c4004
  BLAKE2b-256  9306681a973acb7049eb8a672594d0046b5836019341d719f19c763be1ded8af

Details for the file openvino_genai-2025.4.0.0-1899-cp310-cp310-win_amd64.whl:
  SHA256       93dc9ef34853326798ccdbbe29005681e7b5e194f6a5d3960515ba6a799b936e
  MD5          20d5c0ef2c11cb99c52d62d516a6926f
  BLAKE2b-256  574cc088b4961cf826f746d31ddcca41b38d41fcbc0954d70eea1855acbb9a61

Details for the file openvino_genai-2025.4.0.0-1899-cp310-cp310-manylinux_2_31_aarch64.whl:
  SHA256       454e2176628fe06a3f510bb5773087d904f501efffdd700323af1f40c5d155ee
  MD5          25aea8c95400fa66602582af2d9a28b3
  BLAKE2b-256  c690f9e03c9eea15937054152794809b3afcbb03532db2c01277a512a00ff546

Details for the file openvino_genai-2025.4.0.0-1899-cp310-cp310-manylinux2014_x86_64.whl:
  SHA256       c6f3b67089099e3b3db4cf9b53600fa93ba12db20911f6e9688381ba2135d755
  MD5          d55468063e3aa35b7494c0314451d9cd
  BLAKE2b-256  9ed4ca169aabdbf97b2e9f947500f66a8158d8af377910da39f608e85505150d

Details for the file openvino_genai-2025.4.0.0-1899-cp310-cp310-macosx_11_0_arm64.whl:
  SHA256       6bbf6ead3067d8efa0ab29479b215ed1ae19e8c4dfdf0993ecaee20aac7d2e78
  MD5          d14b76a22ab8342bfef7866a33867bcb
  BLAKE2b-256  169b5f390c7fd3a02958df8013f1c680b011752c5f89c764a74cb98b1444bb1e

Details for the file openvino_genai-2025.4.0.0-1899-cp310-cp310-macosx_10_15_x86_64.whl:
  SHA256       7b8f01579dbc92ada1717b2c33a6da0da88988faefa4aed9047078d3f9839f7e
  MD5          916fde4457c58a38fec79b6f8bf34a41
  BLAKE2b-256  486a745b62bf8af2492ac10c9763d0b9c85f10a41bf34425d41a75d59e534ec3
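Each wheel above is published with SHA256, MD5, and BLAKE2b-256 digests so a download can be verified before installation. A minimal sketch of checking a file against its published SHA256 digest (the helper name `sha256_of` is illustrative, not part of any API):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Compare against the digest from the listing above, e.g. for the
# cp312 Windows wheel (uncomment after downloading the file):
# expected = "7eb5140bc5b41e8f73e15a65e18e779d64920ba6f90a8ad5caf51e2031f1be68"
# assert sha256_of("openvino_genai-2025.4.0.0-1899-cp312-cp312-win_amd64.whl") == expected
```

pip can also enforce digests at install time via hash-checking mode: pin the package in a requirements file with `--hash=sha256:<digest>` and install with `pip install --require-hashes -r requirements.txt`.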
