Overview

This package provides a vectorized interface for the OpenAI API, enabling you to process multiple inputs with a single API call instead of sending requests one by one. This approach helps reduce latency and simplifies your code.

Additionally, it integrates effortlessly with Pandas DataFrames and Apache Spark UDFs, making it easy to incorporate into your data processing pipelines.
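To make the idea of vectorization concrete, here is a minimal sketch of one way several inputs could be packed into a single request and unpacked again. It is not the package's actual implementation: `pack_inputs`, `unpack_outputs`, `predict_batch`, and the JSON layout are hypothetical names chosen for illustration, and `fake_model` stands in for a real API call.

```python
import json
from typing import Callable, List


def pack_inputs(inputs: List[str]) -> str:
    """Serialize inputs as a JSON array of {"id", "text"} objects (assumed layout)."""
    return json.dumps([{"id": i, "text": text} for i, text in enumerate(inputs)])


def unpack_outputs(payload: str, expected: int) -> List[str]:
    """Restore the answers to input order using their ids."""
    by_id = {item["id"]: item["text"] for item in json.loads(payload)}
    missing = [i for i in range(expected) if i not in by_id]
    if missing:
        raise ValueError(f"missing answers for ids: {missing}")
    return [by_id[i] for i in range(expected)]


def predict_batch(call_model: Callable[[str], str], inputs: List[str]) -> List[str]:
    """Send all inputs in one request and return the answers in input order."""
    return unpack_outputs(call_model(pack_inputs(inputs)), len(inputs))


def fake_model(payload: str) -> str:
    # Stand-in for a real API call: upper-cases each text and returns the
    # answers in reverse order, to show that the ids restore the input order.
    items = json.loads(payload)
    answers = [{"id": item["id"], "text": item["text"].upper()} for item in reversed(items)]
    return json.dumps(answers)


print(predict_batch(fake_model, ["panda", "rabbit", "koala"]))
```

Tagging each input with an id means the caller does not have to rely on the model returning answers in the same order as the inputs.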

Features

- Vectorized API requests for processing multiple inputs at once.
- Seamless integration with Pandas DataFrames.
- A UDF builder for Apache Spark.
- Compatibility with multiple OpenAI clients, including Azure OpenAI.

Requirements

Python 3.10 or higher

Installation

Install the package with:

pip install openaivec

To uninstall the package:

pip uninstall openaivec

Basic Usage

import os
from openai import OpenAI
from openaivec import VectorizedOpenAI


# Initialize the vectorized client with your system message and parameters
client = VectorizedOpenAI(
    client=OpenAI(...),
    temperature=0.0,
    top_p=1.0,
    model_name="<your-model-name>",
    system_message="Please answer only with 'xx family' and do not output anything else."
)

result = client.predict(["panda", "rabbit", "koala"])
print(result)  # Expected output: ['bear family', 'rabbit family', 'koala family']

Using with Pandas DataFrame

import pandas as pd

df = pd.DataFrame({"name": ["panda", "rabbit", "koala"]})

df.assign(
    kind=lambda df: client.predict(df.name)
)

Example output:

name	kind
panda	bear family
rabbit	rabbit family
koala	koala family
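Columns in real data often contain repeated values. The following is a minimal sketch, under the assumption that the prediction function maps a list of strings to a list of the same length, of how each distinct value could be sent only once and the answers mapped back to every row. `predict_unique` is a hypothetical helper, not part of openaivec, and `fake_predict` stands in for `client.predict`.

```python
from typing import Callable, Dict, Iterable, List


def predict_unique(
    predict: Callable[[List[str]], List[str]],
    values: Iterable[str],
) -> List[str]:
    """Call predict once per distinct value and map the results back to every row."""
    rows = list(values)
    unique = list(dict.fromkeys(rows))  # distinct values in first-seen order
    predicted = predict(unique)
    if len(predicted) != len(unique):
        raise ValueError("predict returned a different number of results than inputs")
    answers: Dict[str, str] = dict(zip(unique, predicted))
    return [answers[value] for value in rows]


# Stand-in for client.predict so the sketch runs without an API key.
def fake_predict(batch: List[str]) -> List[str]:
    return [f"{word} family" for word in batch]


print(predict_unique(fake_predict, ["panda", "rabbit", "panda", "koala", "rabbit"]))
```

With a DataFrame, the column could be passed as `df["name"].tolist()` and the returned list assigned back as a new column.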

Using with Apache Spark UDF

Below is an example showing how to create UDFs for Apache Spark using the provided UDFBuilder. This configuration is intended for use with Azure OpenAI.

from openaivec.spark import UDFBuilder

udf = UDFBuilder(
    api_key="<your-api-key>",
    api_version="2024-10-21",
    endpoint="https://<your_resource_name>.openai.azure.com",
    model_name="<your_deployment_name>"
)

# Register a UDF that extracts the flavor from a product name
spark.udf.register("parse_taste", udf.completion("""
- Extract flavor-related information from the product name. Return only the concise flavor name with no extra text.
- Minimize unnecessary adjectives related to the flavor.
    - Example:
        - Hokkaido Milk → Milk
        - Uji Matcha → Matcha
"""))

# Register a UDF that extracts the product type from a product name
spark.udf.register("parse_product", udf.completion("""
- Extract the type of food from the product name. Return only the food category with no extra text.
- Example output:
    - Smoothie
    - Milk Tea
    - Protein Bar
"""))

You can then use the UDFs in your Spark SQL queries as follows:

SELECT id,
       product_name,
       parse_taste(product_name)   AS taste,
       parse_product(product_name) AS product
FROM product_names;

Example output:

id	product_name	taste	product
4414732714624	Cafe Mocha Smoothie (Trial Size)	Mocha	Smoothie
4200162318339	Dark Chocolate Tea (New Product)	Chocolate	Tea
4920122084098	Cafe Mocha Protein Bar (Trial Size)	Mocha	Protein Bar
4468864478874	Dark Chocolate Smoothie (On Sale)	Chocolate	Smoothie
4036242144725	Uji Matcha Tea (New Product)	Matcha	Tea
4847798245741	Hokkaido Milk Tea (Trial Size)	Milk	Milk Tea
4449574211957	Dark Chocolate Smoothie (Trial Size)	Chocolate	Smoothie
4127044426148	Fruit Mix Tea (Trial Size)	Fruit	Tea
...	...	...	...

Building Prompts

Building a prompt is a crucial step in using LLMs. In particular, providing a few examples in the prompt can significantly improve an LLM's performance, a technique known as "few-shot learning." A few-shot prompt typically consists of a purpose, cautions, and examples.

FewShotPromptBuilder is a class that helps you build a few-shot prompt through a simple interface.

Basic Usage

FewShotPromptBuilder requires only a purpose, cautions, and examples; its build method returns the rendered prompt in XML format.

Here is an example:

from openaivec.prompt import FewShotPromptBuilder

prompt: str = (
    FewShotPromptBuilder()
    .purpose("Return the smallest category that includes the given word")
    .caution("Never use proper nouns as categories")
    .example("Apple", "Fruit")
    .example("Car", "Vehicle")
    .example("Tokyo", "City")
    .example("Keiichi Sogabe", "Musician")
    .example("America", "Country")
    .build()
)
print(prompt)

The output will be:

<Prompt>
    <Purpose>Return the smallest category that includes the given word</Purpose>
    <Cautions>
        <Caution>Never use proper nouns as categories</Caution>
    </Cautions>
    <Examples>
        <Example>
            <Source>Apple</Source>
            <Result>Fruit</Result>
        </Example>
        <Example>
            <Source>Car</Source>
            <Result>Vehicle</Result>
        </Example>
        <Example>
            <Source>Tokyo</Source>
            <Result>City</Result>
        </Example>
        <Example>
            <Source>Keiichi Sogabe</Source>
            <Result>Musician</Result>
        </Example>
        <Example>
            <Source>America</Source>
            <Result>Country</Result>
        </Example>
    </Examples>
</Prompt>
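For readers curious how a prompt of this shape can be produced, the sketch below renders the same element structure with the standard library's `xml.etree.ElementTree`. It only illustrates the output format; it is not how FewShotPromptBuilder is implemented, and `render_prompt` is a hypothetical function name.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def render_prompt(purpose: str, cautions: List[str], examples: List[Tuple[str, str]]) -> str:
    """Render a purpose, cautions, and (source, result) examples as indented XML."""
    root = ET.Element("Prompt")
    ET.SubElement(root, "Purpose").text = purpose

    cautions_el = ET.SubElement(root, "Cautions")
    for caution in cautions:
        ET.SubElement(cautions_el, "Caution").text = caution

    examples_el = ET.SubElement(root, "Examples")
    for source, result in examples:
        example_el = ET.SubElement(examples_el, "Example")
        ET.SubElement(example_el, "Source").text = source
        ET.SubElement(example_el, "Result").text = result

    ET.indent(root, space="    ")  # pretty-print in place (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


print(
    render_prompt(
        "Return the smallest category that includes the given word",
        ["Never use proper nouns as categories"],
        [("Apple", "Fruit"), ("Car", "Vehicle")],
    )
)
```

Building the tree with ElementTree rather than string concatenation means characters such as `&` and `<` in the inputs are escaped automatically.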

Improve with OpenAI

For most analysts, it can be challenging to write a prompt entirely free of contradictions, ambiguities, or redundancies. FewShotPromptBuilder provides an improve method that refines the prompt using OpenAI's API.

The improve method tries to eliminate contradictions, ambiguities, and redundancies in the prompt, repeating the process up to max_iter times.

from openai import OpenAI
from openaivec.prompt import FewShotPromptBuilder

client = OpenAI(...)
model_name = "<your-model-name>"
improved_prompt: str = (
    FewShotPromptBuilder()
    .purpose("Return the smallest category that includes the given word")
    .caution("Never use proper nouns as categories")
    # These examples contain contradictions, ambiguities, or redundancies
    .example("Apple", "Fruit")
    .example("Apple", "Technology")
    .example("Apple", "Company")
    .example("Apple", "Color")
    .example("Apple", "Animal")
    # Improve the prompt with OpenAI's API; max_iter is the maximum number of improvement iterations.
    .improve(client, model_name, max_iter=5)
    .build()
)
print(improved_prompt)

The result is an improved prompt with a refined purpose, revised cautions, and additional examples:

<Prompt>
    <Purpose>Classify a given word into its most relevant category by considering its context and potential meanings.
        The input is a word accompanied by context, and the output is the appropriate category based on that context.
        This is useful for disambiguating words with multiple meanings, ensuring accurate understanding and
        categorization.
    </Purpose>
    <Cautions>
        <Caution>Ensure the context of the word is clear to avoid incorrect categorization.</Caution>
        <Caution>Be aware of words with multiple meanings and provide the most relevant category.</Caution>
        <Caution>Consider the possibility of new or uncommon contexts that may not fit traditional categories.</Caution>
    </Cautions>
    <Examples>
        <Example>
            <Source>Apple (as a fruit)</Source>
            <Result>Fruit</Result>
        </Example>
        <Example>
            <Source>Apple (as a tech company)</Source>
            <Result>Technology</Result>
        </Example>
        <Example>
            <Source>Java (as a programming language)</Source>
            <Result>Technology</Result>
        </Example>
        <Example>
            <Source>Java (as an island)</Source>
            <Result>Geography</Result>
        </Example>
        <Example>
            <Source>Mercury (as a planet)</Source>
            <Result>Astronomy</Result>
        </Example>
        <Example>
            <Source>Mercury (as an element)</Source>
            <Result>Chemistry</Result>
        </Example>
        <Example>
            <Source>Bark (as a sound made by a dog)</Source>
            <Result>Animal Behavior</Result>
        </Example>
        <Example>
            <Source>Bark (as the outer covering of a tree)</Source>
            <Result>Botany</Result>
        </Example>
        <Example>
            <Source>Bass (as a type of fish)</Source>
            <Result>Aquatic Life</Result>
        </Example>
        <Example>
            <Source>Bass (as a low-frequency sound)</Source>
            <Result>Music</Result>
        </Example>
    </Examples>
</Prompt>
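One kind of contradiction, the same source paired with several different results, can also be spotted locally before any API call is made. The following is a minimal sketch of such a check; `find_conflicting_examples` is a hypothetical helper and not part of openaivec.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def find_conflicting_examples(examples: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return each source that is paired with more than one distinct result."""
    results_by_source: Dict[str, List[str]] = defaultdict(list)
    for source, result in examples:
        if result not in results_by_source[source]:
            results_by_source[source].append(result)
    return {source: results for source, results in results_by_source.items() if len(results) > 1}


examples = [
    ("Apple", "Fruit"),
    ("Apple", "Technology"),
    ("Apple", "Company"),
    ("Car", "Vehicle"),
]
print(find_conflicting_examples(examples))
# {'Apple': ['Fruit', 'Technology', 'Company']}
```

A check like this only catches exact duplicates of the source text; subtler ambiguities still need a review by a person or a model.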

Using with Microsoft Fabric

Microsoft Fabric is a unified, cloud-based analytics platform that seamlessly integrates data engineering, warehousing, and business intelligence to simplify the journey from raw data to actionable insights.

This section provides instructions on how to integrate and use vectorize-openai within Microsoft Fabric. Follow these steps:

Create an Environment in Microsoft Fabric:
- In Microsoft Fabric, click New item in your workspace.
- Select Environment to create a new environment for Apache Spark.
- Choose a name for the environment, e.g. openai-environment.
- Figure: Creating a new Environment in Microsoft Fabric.
Add openaivec to the Environment from the Public Library:
- Once your environment is set up, go to the Custom Library section within that environment.
- Click Add from PyPI and search for the latest version of openaivec.
- Save and publish to apply the changes.
- Figure: Adding openaivec from PyPI to the Public Library.
Use the Environment from a Notebook:
- Open a notebook within Microsoft Fabric.
- Select the environment you created in the previous steps.
- Figure: Using a custom environment from a notebook.
- In the notebook, import and use openaivec.spark.UDFBuilder as you normally would. For example:
```
from openaivec.spark import UDFBuilder

udf = UDFBuilder(
    api_key="<your-api-key>",
    api_version="2024-10-21",
    endpoint="https://<your-resource-name>.openai.azure.com",
    model_name="<your-deployment-name>"
)
```

Following these steps allows you to successfully integrate and use vectorize-openai within Microsoft Fabric.
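In a shared notebook you may prefer not to paste the API key into a cell. The sketch below collects the four UDFBuilder arguments shown above from environment variables instead. The variable names (`AZURE_OPENAI_API_KEY` and so on) and the helper `udf_builder_kwargs` are assumptions made for illustration; use whatever secret mechanism your workspace provides.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names, one per UDFBuilder argument.
ENV_NAMES = {
    "api_key": "AZURE_OPENAI_API_KEY",
    "api_version": "AZURE_OPENAI_API_VERSION",
    "endpoint": "AZURE_OPENAI_ENDPOINT",
    "model_name": "AZURE_OPENAI_DEPLOYMENT",
}


def udf_builder_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the UDFBuilder keyword arguments, failing early if any are missing."""
    missing = [name for name in ENV_NAMES.values() if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {arg: env[name] for arg, name in ENV_NAMES.items()}


# Demonstration with an explicit mapping instead of the real environment.
demo_env = {
    "AZURE_OPENAI_API_KEY": "<your-api-key>",
    "AZURE_OPENAI_API_VERSION": "2024-10-21",
    "AZURE_OPENAI_ENDPOINT": "https://<your-resource-name>.openai.azure.com",
    "AZURE_OPENAI_DEPLOYMENT": "<your-deployment-name>",
}
print(sorted(udf_builder_kwargs(demo_env)))
```

The resulting dictionary could then be passed as `UDFBuilder(**udf_builder_kwargs())`.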

Contributing

We welcome contributions to this project! If you would like to contribute, please follow these guidelines:

- Fork the repository and create your branch from main.
- If you've added code that should be tested, add tests.
- Ensure the test suite passes.
- Make sure your code lints.

Installing Dependencies

To install the necessary dependencies for development, run:

poetry install

Code Formatting

To reformat the code, use the following command:

poetry run black ./openaivec

Linting

To check for linting issues, use the following command:

poetry run flake8 ./openaivec

Community

Join our Discord community for developers: https://discord.gg/vbb83Pgn

openaivec 0.3.2
