
Azure GenAI Utils

This repository contains a set of Python utilities for working with Azure GenAI, designed for hackathons, workshops, and other events where you need to get started quickly.

Requirements

  • Azure Subscription
  • Azure AI Foundry
  • Bing Search API Key
  • Python 3.8 or later
  • .env file: Copy (or rename) .env.sample to .env and update the values to match your account:
    AZURE_OPENAI_ENDPOINT=xxxxx
    AZURE_OPENAI_API_KEY=xxxxx
    OPENAI_API_VERSION=2024-12-01-preview
    AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o-mini
    
    # Optional; required only for LangSmith (LangChain) tracing
    LANGCHAIN_TRACING_V2=false
    LANGCHAIN_ENDPOINT=https://api.smith.langchain.com
    LANGCHAIN_API_KEY=xxxxx
    LANGCHAIN_PROJECT="YOUR-PROJECT"
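
As a quick sanity check, the required settings can be validated after loading the .env file. The helper below is illustrative only (it is not part of azure-genai-utils); the variable names are taken from the sample above:

```python
import os

# Names taken from the sample .env above.
REQUIRED_VARS = [
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
]


def missing_vars(env, required=REQUIRED_VARS):
    """Return the names of required settings that are unset or blank."""
    return [name for name in required if not env.get(name, "").strip()]


# After load_dotenv(), check the process environment:
missing = missing_vars(os.environ)
if missing:
    print("Missing required settings:", ", ".join(missing))
```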
    

Installation

PyPI

  • pip install azure-genai-utils

From Source

  • python setup.py install (or, with recent pip/setuptools, pip install . — direct setup.py invocation is deprecated)

Usage

Azure OpenAI Test

from azure_genai_utils.aoai import AOAI
aoai = AOAI()
aoai.test_api_call()

PDF RAG Chain

from azure_genai_utils.rag.pdf import PDFRetrievalChain

pdf_path = "[YOUR-PDF-PATH]"

pdf = PDFRetrievalChain(
    source_uri=[pdf_path],
    loader_type="PDFPlumber",
    model_name="gpt-4o-mini",
    embedding_name="text-embedding-3-large",
    chunk_size=500,
    chunk_overlap=50,
).create_chain()

question = "[YOUR-QUESTION]"
docs = pdf.retriever.invoke(question)
results = pdf.chain.invoke({"chat_history": "", "question": question, "context": docs})
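
The chunk_size and chunk_overlap parameters control how the PDF text is split before embedding. The sketch below is a naive character-level illustration of that sliding-window idea, not the package's actual splitter:

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list:
    """Split text into windows of chunk_size characters, each
    overlapping the previous window by chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [
        text[i : i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]


# 1,000 characters with chunk_size=500 / chunk_overlap=50:
chunks = chunk_text("x" * 1000)
print(len(chunks))  # 3
```

Smaller chunks give more precise retrieval hits at the cost of context; the overlap keeps sentences that straddle a boundary retrievable from both sides.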

Bing Search

Please make sure to set the following environment variables in your .env file:

BING_SUBSCRIPTION_KEY=xxxxx
from azure_genai_utils.tools import BingSearch
from dotenv import load_dotenv

# You need to add BING_SUBSCRIPTION_KEY=xxxx in .env file
load_dotenv()

# Basic usage
bing = BingSearch(max_results=2, locale="ko-KR")
results = bing.invoke("Microsoft AutoGen")
print(results)

# Include news search results and format output
bing = BingSearch(
    max_results=2,
    locale="ko-KR",
    include_news=True,
    include_entity=False,
    format_output=True,
)
results = bing.invoke("Microsoft AutoGen")
print(results)

LangGraph Example (Bing Search + Azure GenAI)

from typing import Annotated
from typing_extensions import TypedDict
from langchain_openai import AzureChatOpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from azure_genai_utils.tools import BingSearch
from dotenv import load_dotenv

load_dotenv()

class State(TypedDict):
    messages: Annotated[list, add_messages]

llm = AzureChatOpenAI(model="gpt-4o-mini")
tool = BingSearch(max_results=3, format_output=False)
tools = [tool]
llm_with_tools = llm.bind_tools(tools)

def chatbot(state: State):
    answer = llm_with_tools.invoke(state["messages"])
    return {"messages": [answer]}

def route_tools(state: State):
    if messages := state.get("messages", []):
        ai_message = messages[-1]
    else:
        raise ValueError(f"No messages found in input state to tool_edge: {state}")

    if hasattr(ai_message, "tool_calls") and len(ai_message.tool_calls) > 0:
        return "tools"

    return END

graph_builder = StateGraph(State)
graph_builder.add_node("chatbot", chatbot)
tool_node = ToolNode(tools=[tool])
graph_builder.add_node("tools", tool_node)

graph_builder.add_conditional_edges(
    source="chatbot",
    path=route_tools,
    path_map={"tools": "tools", END: END},
)

graph_builder.add_edge("tools", "chatbot")
graph_builder.add_edge(START, "chatbot")
graph = graph_builder.compile()

# Test
inputs = {"messages": [("user", "Microsoft AutoGen")]}

for event in graph.stream(inputs, stream_mode="values"):
    for key, value in event.items():
        print(f"\n==============\nSTEP: {key}\n==============\n")
        print(value[-1])
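
Stripped of the LangGraph types, the conditional edge above makes one decision: route to the tool node when the last AI message carries tool calls, otherwise stop. A minimal stand-alone sketch of that logic (END and FakeMessage here are local illustrations, not package APIs):

```python
END = "__end__"  # local sentinel standing in for langgraph.graph.END


def route(last_message) -> str:
    """Return 'tools' if the message requested any tool calls, else END."""
    tool_calls = getattr(last_message, "tool_calls", None)
    return "tools" if tool_calls else END


class FakeMessage:
    """Hypothetical stand-in for an AI chat message."""

    def __init__(self, tool_calls=None):
        self.tool_calls = tool_calls or []


print(route(FakeMessage([{"name": "bing_search"}])))  # tools
print(route(FakeMessage()))  # __end__
```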

Synthetic Data Generation

from azure_genai_utils.synthetic import (
    QADataGenerator,
    CustomQADataGenerator,
    QAType,
    generate_qas,
)

input_batch = [
    "The quick brown fox jumps over the lazy dog.",
    "What is the capital of France?",
]

model_config = {
    "deployment": "gpt-4o-mini",
    "model": "gpt-4o-mini",
    "max_tokens": 256,
}

try:
    qa_generator = QADataGenerator(model_config=model_config)
    # qa_generator = CustomQADataGenerator(
    #     model_config=model_config, templates_dir=f"./azure_genai_utils/synthetic/prompt_templates/ko"
    # )
    task = generate_qas(
        input_texts=input_batch,
        qa_generator=qa_generator,
        qa_type=QAType.LONG_ANSWER,
        num_questions=2,
        concurrency=3,
    )
except Exception as e:
    print(f"Error generating QAs: {e}")

Azure Custom Speech

Please make sure to set the following environment variables in your .env file:

AZURE_AI_SPEECH_REGION=xxxxx
AZURE_AI_SPEECH_API_KEY=xxxxx
from azure_genai_utils.stt.stt_generator import CustomSpeechToTextGenerator

# Initialize the CustomSpeechToTextGenerator
stt = CustomSpeechToTextGenerator(
    custom_speech_lang="Korean",
    synthetic_text_file="cc_support_expressions.jsonl",
    train_output_dir="synthetic_data_train",
    train_output_dir_aug="synthetic_data_train_aug",
    eval_output_dir="synthetic_data_eval",
)

### Training set
# Generate synthetic text
topic = "Call center QnA related expected spoken utterances"
content = stt.generate_synthetic_text(
    topic=topic, num_samples=2, model_name="gpt-4o-mini"
)
stt.save_synthetic_text(output_dir="plain_text")

# Generate synthetic wav files for training
train_tts_voice_list = [
    "ko-KR-InJoonNeural",
    "zh-CN-XiaoxiaoMultilingualNeural",
    "en-GB-AdaMultilingualNeural",
]
stt.generate_synthetic_wav(
    mode="train", tts_voice_list=train_tts_voice_list, delete_old_data=True
)

# Augment the train data (Optional)
stt.augment_wav_files(num_augments=4)
# Package the train data to be used in the training pipeline
stt.package_trainset(use_augmented_data=True)

### Evaluation set
# Generate synthetic wav files for evaluation
eval_tts_voice_list = ["ko-KR-YuJinNeural"]
stt.generate_synthetic_wav(
    mode="eval", tts_voice_list=eval_tts_voice_list, delete_old_data=True
)
# Package the eval data to be used in the evaluation pipeline
stt.package_evalset(eval_dataset_dir="eval_dataset")

License Summary

This sample code is provided under the Apache 2.0 license. See the LICENSE file.
