A utility package for AI development
StructAI
StructAI is a utility library for accelerating LLM application development, including multi-agent systems. It provides a toolkit for LLM interaction (structured outputs, context management, and parallel execution) that streamlines development workflows and eases the deployment of scalable, production-ready AI systems.
⚙️ Installation
Recommended for most users. Installs the latest stable release from PyPI.
pip install structai
For development. Installs StructAI in editable mode from source, enabling live code changes.
git clone https://github.com/black-yt/structai.git
cd structai
pip install -e .
Note: Before using LLM-related features, please ensure you have set the necessary environment variables:
export LLM_API_KEY="your-api-key"
export LLM_BASE_URL="your-api-base-url"
Note: To use the PDF parsing functions (such as read_pdf), apply for an API token at MinerU and add it to your environment variables:
export MINERU_TOKEN="your-mineru-api-key"
📚 StructAI Library Documentation
Skill
structai_skill
Returns a comprehensive documentation string for the StructAI library in Markdown format. This is useful for providing context to LLMs about the available tools in this library.
-
Args:
- None
-
Returns:
- (str): The documentation string.
-
Example:
from structai import structai_skill
docs = structai_skill()
print(docs)
LLMs/vLLMs
LLMAgent Class
A powerful wrapper class for interacting with OpenAI-compatible LLM APIs. It handles retries, timeouts, and structured output validation.
initialization
-
Args:
- api_key (str, optional): API key. Defaults to os.environ["LLM_API_KEY"].
- api_base (str, optional): Base URL. Defaults to os.environ["LLM_BASE_URL"].
- model_version (str, optional): Model identifier. Default 'gpt-4.1-mini'.
- system_prompt (str, optional): Default system prompt. Default 'You are a helpful assistant.'.
- max_tokens (int, optional): Maximum number of tokens to generate. Default None.
- temperature (float, optional): Sampling temperature. Default 0.
- http_client (httpx.Client, optional): Custom httpx client.
- headers (dict, optional): Custom request headers.
- time_limit (int, optional): Timeout in seconds. Default 300 (5 minutes).
- max_try (int, optional): Default maximum number of attempts per call. Default 1.
- use_responses_api (bool, optional): Whether to use the Responses API format. Default False.
-
Returns:
- (LLMAgent): LLMAgent instance.
-
Example:
from structai import LLMAgent
agent = LLMAgent()
__call__
Sends a query to the LLM with built-in validation, parsing, and retry logic.
-
Args:
- query (str): The main input text or prompt to be sent to the LLM.
- system_prompt (str, optional): The system instruction. Overrides the default if provided.
- return_example (str | list | dict, optional): A template defining the expected structure and type of the response.
  - None or str (default): Returns the raw response string.
  - list: Expects a JSON list string. Validates element types if example elements are provided.
  - dict: Expects a JSON object string. Validates keys (supports fuzzy matching).
- max_try (int, optional): Maximum number of attempts. Defaults to the instance's max_try.
- wait_time (float, optional): Time in seconds to wait between retries. Default 0.0.
- n (int, optional): Number of completion choices. Default 1.
- max_tokens (int, optional): Overrides the instance's max_tokens.
- temperature (float, optional): Overrides the instance's temperature.
- image_paths (list[str], optional): List of local image paths for multimodal models.
- history (list[dict], optional): Conversation history, e.g. [{"role": "user", "content": "..."}, ...].
- use_responses_api (bool, optional): Overrides the instance setting.
- list_len (int, optional): Validation: enforces an exact list length.
- list_min (int | float, optional): Validation: enforces a minimum value for list elements.
- list_max (int | float, optional): Validation: enforces a maximum value for list elements.
- check_keys (bool, optional): Validation: whether to validate dict keys. Default True.
-
Returns:
- (str | list | dict): The parsed response from the LLM.
- If n > 1, returns a list of results.
- Returns None if all retries fail.
-
Example:
# Basic usage
response = agent("Generate a random number.", n=3, temperature=1)
# Output: ["Sure! Here's a random number for you: 738", "Sure! Here's a random number: 7382", "Sure! Here's a random number: 487."]
# Enforce the output format (List, Dict, or specific types) using `return_example`. Note that the output format needs to be explicitly specified in the prompt.
numbers = agent(
"Generate 3 random numbers, for example, [1, 2, 3].",
return_example=[1],
list_len=3
)
# Output: [10, 42, 7]
profile = agent(
"Create a user profile for Alice, for example, {'name': 'Alice', 'age': 1, 'city': 'shanghai'}.",
return_example={"name": "str", "age": 1, "city": "str"}
)
# Output: {'name': 'Alice', 'age': 25, 'city': 'New York'}
# Multimodal input for vision models
description = agent(
"Describe these images",
image_paths=["path/to/image_1.jpg", "path/to/image_2.jpg"]
)
# Memory context
history = [
{"role": "user", "content": "My name is Bob."},
{"role": "assistant", "content": "Hello Bob."}
]
answer = agent(
"What is my name?",
history=history,
)
# Output: 'Your name is Bob.'
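The validate-and-retry loop described for __call__ can be pictured with the minimal sketch below. It is not StructAI's implementation: query_with_retries, matches_example, and the call_llm callable are hypothetical names, parsing is plain json.loads, and dict validation uses exact key matching instead of the fuzzy matching StructAI documents.

```python
import json
import time
from typing import Any, Callable, Optional


def matches_example(parsed: Any, example: Any, list_len: Optional[int] = None) -> bool:
    """Return True if `parsed` has the same shape and types as `example`."""
    if isinstance(example, list):
        if not isinstance(parsed, list):
            return False
        if list_len is not None and len(parsed) != list_len:
            return False
        # Check element types against the first example element, if any
        return not example or all(isinstance(x, type(example[0])) for x in parsed)
    if isinstance(example, dict):
        # Exact key match (hypothetical simplification of fuzzy matching)
        return isinstance(parsed, dict) and set(parsed) == set(example)
    return isinstance(parsed, type(example))


def query_with_retries(
    call_llm: Callable[[str], str],
    prompt: str,
    return_example: Any = None,
    max_try: int = 1,
    wait_time: float = 0.0,
    list_len: Optional[int] = None,
) -> Any:
    """Call `call_llm` until its reply parses and matches `return_example`."""
    for attempt in range(max_try):
        raw = call_llm(prompt)
        if return_example is None or isinstance(return_example, str):
            return raw  # raw-string mode: no parsing or validation
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = None
        if parsed is not None and matches_example(parsed, return_example, list_len):
            return parsed
        if attempt < max_try - 1:
            time.sleep(wait_time)  # pause before the next attempt
    return None  # every attempt failed validation
```

Returning None after the last failed attempt mirrors the documented behavior of __call__ when all retries fail.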
messages_to_responses_input
Converts standard Chat Completions messages format (list of dicts) to the input format required by the Responses API.
-
Args:
messages(list[dict]): List of message dictionaries with 'role' and 'content'.
-
Returns:
- (tuple): A tuple containing (system_prompt_content, input_blocks).
-
Example:
from structai import messages_to_responses_input
messages = [{"role": "user", "content": "Hello"}]
system_prompt, input_blocks = messages_to_responses_input(messages)
extract_text_outputs
Extracts the text content from an LLM API response object (supports both Chat Completions and Responses API formats).
-
Args:
result(object): The response object from the LLM API.
-
Returns:
- (list[str]): A list of extracted text outputs.
-
Example:
from structai import extract_text_outputs
# Assuming 'response' is the object returned by the OpenAI client
texts = extract_text_outputs(response)
print(texts[0])
print_messages
Print chat messages with colored labels and text.
-
Args:
- messages (list): List of message dictionaries with role and content.
- user_color (str, optional): Color for the user's message text and label background. Default cyan.
- ai_color (str, optional): Color for the assistant's message text and label background. Default yellow.
- label_text_color (str, optional): Color for the label text (User and Assistant). Default grey.
-
Returns:
- None
-
Example:
from structai import print_messages
messages = [
{"role": "user", "content": "My name is Bob."},
{"role": "assistant", "content": "Hello Bob."}
]
print_messages(messages)
Concurrent
multi_thread
Executes a function concurrently for each item in inp_list using a thread pool.
-
Args:
- inp_list (list[dict]): A list of dictionaries, each containing the keyword arguments for one call of function.
- function (callable): The function to execute.
- max_workers (int, optional): The maximum number of threads. Default 40.
- use_tqdm (bool, optional): Whether to show a progress bar. Default True.
-
Returns:
- (list): A list of results corresponding to the input list order.
-
Example:
from structai import multi_thread
def square(x):
    return x * x
inputs = [{"x": i} for i in range(10)]
results = multi_thread(inputs, square, max_workers=4)
print(results)  # [0, 1, 4, 9, ...]
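Conceptually, multi_thread maps each keyword-argument dict onto function and returns the results in input order. A minimal sketch with the standard library's ThreadPoolExecutor follows; run_threaded is a hypothetical name, the progress bar is omitted, and exceptions raised by function simply propagate, which may differ from StructAI's error handling.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_threaded(
    inp_list: List[Dict[str, Any]],
    function: Callable[..., Any],
    max_workers: int = 40,
) -> List[Any]:
    """Call function(**kwargs) for every dict in inp_list on a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(function, **kwargs) for kwargs in inp_list]
        # Collect in submission order so results line up with inp_list
        return [future.result() for future in futures]
```

Threads suit I/O-bound work such as LLM API calls; for CPU-bound work the process-based variant below avoids the GIL.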
multi_process
Executes a function concurrently for each item in inp_list using a process pool. Ideal for CPU-bound tasks.
-
Args:
- inp_list (list[dict]): A list of dictionaries, each containing the keyword arguments for one call of function.
- function (callable): The function to execute.
- max_workers (int, optional): The maximum number of processes. Default 40.
- use_tqdm (bool, optional): Whether to show a progress bar. Default True.
-
Returns:
- (list): A list of results corresponding to the input list order.
-
Example:
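from structai import multi_process
# 'heavy_computation' must be defined at the top level so it can be pickled.
def heavy_computation(n):
    return sum(range(n))
# The __main__ guard is required where workers are started with "spawn" (Windows, macOS).
if __name__ == "__main__":
    inputs = [{"n": 1000} for _ in range(5)]
    results = multi_process(inputs, heavy_computation)
    print(results)  # [499500, 499500, 499500, 499500, 499500]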
I/O
load_file
Automatically reads a file based on its extension.
-
Args:
path(str): The path to the file to be read.
-
Returns:
- (Any): The content of the file, parsed into an appropriate Python object:
  - .json -> dict or list
  - .jsonl -> list of dicts
  - .csv, .parquet, .xlsx -> pandas.DataFrame
  - .txt, .md, .py -> str
  - .pkl -> unpickled object
  - .npy -> numpy.ndarray
  - .pt -> torch object
  - .png, .jpg, .jpeg -> PIL.Image.Image
-
Example:
from structai import load_file
# Load a JSON file
data = load_file("config.json")
# Load a CSV file as a pandas DataFrame
df = load_file("data.csv")
# Load an image
image = load_file("photo.jpg")
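Extension-based dispatch, as load_file describes it, can be sketched for the text formats using only the standard library. load_text_formats is a hypothetical helper, not part of StructAI; the DataFrame, NumPy, torch, and image branches are left out because they need third-party packages.

```python
import json
from pathlib import Path
from typing import Any


def load_text_formats(path: str) -> Any:
    """Read `path` into a Python object chosen by its file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    if suffix == ".jsonl":
        with open(path, encoding="utf-8") as f:
            # One JSON document per non-empty line
            return [json.loads(line) for line in f if line.strip()]
    if suffix in {".txt", ".md", ".py"}:
        return Path(path).read_text(encoding="utf-8")
    raise ValueError(f"unsupported extension in this sketch: {suffix!r}")
```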
save_file
Automatically saves data to a file based on the extension. Creates necessary directories if they don't exist.
-
Args:
- data (Any): The data object to save.
- path (str): The destination file path.
-
Returns:
- None
-
Example:
from structai import save_file
data = {"key": "value"}
# Save as JSON
save_file(data, "output.json")
# Save as Pickle
save_file(data, "backup.pkl")
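The two behaviors save_file documents, creating missing parent directories and choosing a serializer from the extension, look roughly like the minimal sketch below. save_by_extension is a hypothetical name, it covers only .json, .jsonl, and .pkl, and the JSON formatting options are assumptions rather than StructAI's settings.

```python
import json
import os
import pickle
from typing import Any


def save_by_extension(data: Any, path: str) -> None:
    """Serialize `data` according to the extension of `path`."""
    parent = os.path.dirname(path)
    if parent:
        # Create any missing directories first
        os.makedirs(parent, exist_ok=True)
    ext = os.path.splitext(path)[1].lower()
    if ext == ".json":
        with open(path, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
    elif ext == ".jsonl":
        with open(path, "w", encoding="utf-8") as f:
            for record in data:  # one JSON document per line
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
    elif ext == ".pkl":
        with open(path, "wb") as f:
            pickle.dump(data, f)
    else:
        raise ValueError(f"unsupported extension in this sketch: {ext!r}")
```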
read_pdf
Processes PDF file(s) by uploading them to MinerU for parsing, downloading the results, and loading the extracted content (text and images) into memory.
-
Args:
path(str | list[str]): A single file path (str) or a list of file paths (list[str]) pointing to the PDF files to be processed.
-
Returns:
- (dict | list[dict | None] | None):
  - If path is a single string, returns a dictionary containing the parsed data, or None if processing failed.
  - If path is a list, returns a list where each element is either a dictionary (success) or None (failure).
  - The result dictionary has the following structure:
{
    "path": str,                    # The original path of the PDF file.
    "text": str,                    # The full extracted text content in Markdown format.
    "img_paths": list[str],         # A list of absolute file paths to the extracted images.
    "imgs": list[PIL.Image.Image],  # PIL Image objects corresponding to the images in img_paths.
}
-
Example:
from structai import read_pdf
# Process a single PDF
result = read_pdf("paper.pdf")
if result:
    print(result["text"][:100])
    print(f"Found {len(result['imgs'])} images")
# Process multiple PDFs
results = read_pdf(["doc1.pdf", "doc2.pdf"])
encode_image
Encodes a PIL Image object into a base64 string.
-
Args:
image_obj(PIL.Image.Image): The image object to encode.
-
Returns:
- (str): The base64 encoded string.
-
Example:
from PIL import Image
from structai import encode_image
img = Image.new("RGB", (64, 64), color="red")  # any PIL image works
b64_str = encode_image(img)
get_all_file_paths
Recursively retrieves all file paths in a directory that match a given suffix.
-
Args:
- directory (str): The root directory to search.
- suffix (str, optional): The file suffix to filter by (e.g., '.py'). Default '' (matches all files).
- filter_func (callable, optional): A function that takes a file path and returns True to include it. Default None.
- absolute (bool, optional): Whether to return absolute paths. Default True.
-
Returns:
- (list[str]): A list of matching file paths.
-
Example:
from structai import get_all_file_paths
# Get all Python files in the current directory
py_files = get_all_file_paths(".", suffix=".py")
print(py_files)
# Get relative paths of all files, excluding those in 'test' directory
files = get_all_file_paths(
".",
filter_func=lambda p: "test" not in p,
absolute=False
)
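A recursive search with the same four parameters can be sketched with os.walk. find_files is a hypothetical name; treating relative paths as relative to directory, and sorting the result for a deterministic order, are assumptions of this sketch rather than documented StructAI behavior.

```python
import os
from typing import Callable, List, Optional


def find_files(
    directory: str,
    suffix: str = "",
    filter_func: Optional[Callable[[str], bool]] = None,
    absolute: bool = True,
) -> List[str]:
    """Recursively collect files under `directory` whose names end with `suffix`."""
    matches: List[str] = []
    for root, _dirs, names in os.walk(directory):
        for name in names:
            if not name.endswith(suffix):  # "" matches every file
                continue
            path = os.path.join(root, name)
            if filter_func is not None and not filter_func(path):
                continue
            matches.append(os.path.abspath(path) if absolute else os.path.relpath(path, directory))
    return sorted(matches)
```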
print_once
Prints a message to stdout only once during the entire program execution. Useful for logging warnings or info inside loops.
-
Args:
msg(str): The message to print.
-
Returns:
- None
-
Example:
from structai import print_once
for i in range(10):
    print_once("Starting processing...")  # printed only once
make_print_once
Creates and returns a local function that prints a message only once. This is useful if you need a "print once" behavior scoped to a specific function or instance rather than globally.
-
Args:
- None
-
Returns:
- (callable): A function inner(msg) that behaves like print_once.
-
Example:
from structai import make_print_once
logger1 = make_print_once()
logger2 = make_print_once()
logger1("Hello") # Prints "Hello"
logger1("Hello") # Does nothing
logger2("World") # Prints "World"
logger2("World") # Does nothing
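Both helpers can be built on a closure that remembers which messages it has already printed. This sketch assumes deduplication is per message, so a second, different message still prints; make_print_once_sketch and print_once_sketch are hypothetical names, not StructAI's code.

```python
from typing import Callable, Set


def make_print_once_sketch() -> Callable[[str], None]:
    """Return a printer that emits each distinct message at most once."""
    printed: Set[str] = set()  # private to this closure

    def inner(msg: str) -> None:
        if msg not in printed:
            printed.add(msg)
            print(msg)

    return inner


# A module-level instance gives process-wide "print once" behavior
print_once_sketch = make_print_once_sketch()
```

Each call to make_print_once_sketch creates a fresh set, which is why two loggers created this way do not suppress each other's messages.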
String Processing
extract_markdown_images
Parses Markdown text to extract paths of embedded images.
-
Args:
text(str): The Markdown content string to analyze.
-
Returns:
- (list[str]): A list of image file paths extracted from the Markdown text.
-
Example:
from structai import extract_markdown_images
md_text = "Here is an image: ![](images/img1.jpg)"
images = extract_markdown_images(md_text)
print(images) # ['images/img1.jpg']
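Markdown images use the form ![alt](path "optional title"), so one regular expression covers the common case. This is a hedged sketch, not StructAI's parser: extract_image_paths is a hypothetical name, and it does not handle reference-style images, HTML img tags, or paths containing spaces or parentheses.

```python
import re
from typing import List

# ![alt text](path) with an optional "title"; group 1 captures the path
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def extract_image_paths(text: str) -> List[str]:
    """Return image paths in the order they appear in `text`."""
    return IMAGE_PATTERN.findall(text)
```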
sanitize_text
Sanitizes text by keeping only ASCII English characters, digits, and common punctuation. Removes control characters and ANSI codes.
-
Args:
text(str): The text to sanitize.
-
Returns:
- (str): The sanitized text.
-
Example:
from structai import sanitize_text
clean = sanitize_text("Hello \x1b[31mWorld\x1b[0m!")
print(clean) # 'Hello [31mWorld[0m!'
filter_excessive_repeats
Identifies sequences where a single character or a two-character substring repeats at least the specified threshold times and removes them entirely from the string.
-
Args:
- text (str): The input string.
- threshold (int, optional): The number of consecutive repetitions at which a run is removed; shorter runs are kept. Default 5.
-
Returns:
- (str): The processed string with excessive repetitions removed.
-
Example:
from structai import filter_excessive_repeats
clean = filter_excessive_repeats("Helloooooo World", threshold=5)
print(clean) # "Hell World"
clean = filter_excessive_repeats("Hello\\b\\b World", threshold=2)
print(clean) # "Heo World"
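The rule above maps onto a regular expression with a backreference: capture a one- or two-character unit, then require at least threshold - 1 further consecutive copies. The sketch below (filter_repeats_sketch is a hypothetical name, and StructAI may implement this differently) reproduces both documented examples.

```python
import re


def filter_repeats_sketch(text: str, threshold: int = 5) -> str:
    """Delete runs where a 1- or 2-character unit repeats >= threshold times."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    # (.{1,2}) captures the unit; \1{n,} demands n more consecutive copies
    pattern = re.compile(r"(.{1,2})\1{%d,}" % (threshold - 1), re.DOTALL)
    return pattern.sub("", text)
```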
cutoff_text
Truncate and sanitize a string so that its final length is guaranteed to be <= l. The function applies a series of progressively stronger transformations:
- Sanitize the text with sanitize_text.
- Reduce repetitions with filter_excessive_repeats.
- If the text is still too long, keep a head and a tail segment and insert a separator in the middle.
- Apply a final hard cutoff as a safety net.
-
Args:
- s (str): Input string to be processed. May contain invalid Unicode, excessive repetition, or arbitrarily long content.
- l (int): Maximum allowed length of the returned string. Must be greater than 9. Defaults to 20_000.
-
Returns:
- (str): A processed string whose length is guaranteed to be less than or equal to l.
-
Example:
from structai import cutoff_text
s = cutoff_text("aaaaaaasdddddfdf", l=10)
print(s) # "sfdf"
s = cutoff_text("asdfjsdjgofgofdkmsdlfmldmsgkgnfkdsfagfsdafdsfskfn", 22)
print(s) # "asdfjsd\n\n...\n\ndsfskfn"
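The head-and-tail step can be sketched on its own. The separator (a blank line, an ellipsis, and a blank line) and the even split between head and tail are inferred from the second example's output, not taken from StructAI's source; head_tail_cut is a hypothetical name, and the sanitize and repeat-filtering steps are skipped here.

```python
def head_tail_cut(s: str, l: int = 20_000, sep: str = "\n\n...\n\n") -> str:
    """Keep the start and end of `s` so the result has at most `l` characters."""
    if len(s) <= l:
        return s
    keep = (l - len(sep)) // 2  # characters kept on each side
    if keep <= 0:
        return s[:l]  # too small for a separator: plain hard cutoff
    # Final slice is a safety net; the joined string is already <= l
    return (s[:keep] + sep + s[-keep:])[:l]
```

With l=22 the separator takes 7 characters, leaving 7 on each side, which matches the 21-character output shown in the example above.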
str2dict
Robustly converts a string representation of a dictionary to a Python dict. It handles common formatting errors and uses json_repair as a fallback.
-
Args:
s(str): The string representation of a dictionary.
-
Returns:
- (dict): The parsed dictionary.
-
Example:
from structai import str2dict
d = str2dict("{'a': 1, 'b': 2}")
print(d['a']) # 1
str2list
Robustly converts a string representation of a list to a Python list.
-
Args:
s(str): The string representation of a list.
-
Returns:
- (list): The parsed list.
-
Example:
from structai import str2list
l = str2list("[1, 2, 3]")
print(len(l)) # 3
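A common way to get this robustness is to try strict JSON first and fall back to Python literal syntax. The sketch below, which can stand in for either str2dict or str2list, swaps StructAI's json_repair fallback for the standard library's ast.literal_eval, so it only handles input that is valid JSON or a valid Python literal. parse_literal is a hypothetical name, and stripping a surrounding Markdown code fence is an assumption about typical LLM output.

```python
import ast
import json
from typing import Any


def parse_literal(s: str, expected_type: type) -> Any:
    """Parse `s` as JSON, then as a Python literal, and check its type."""
    text = s.strip()
    if text.startswith("```"):
        # Drop a surrounding Markdown code fence and an optional json label
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):].strip()
    for parser in (json.loads, ast.literal_eval):
        try:
            value = parser(text)
        except (ValueError, SyntaxError, TypeError):
            continue  # try the next, more permissive parser
        if isinstance(value, expected_type):
            return value
    raise ValueError(f"could not parse a {expected_type.__name__} from the input")
```

For example, parse_literal("{'a': 1, 'b': 2}", dict) fails as JSON because of the single quotes and succeeds through ast.literal_eval.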
remove_tag
Removes specified tags from a string, replacing them with a separator (default newline).
-
Args:
- s (str): The input string.
- tags (list[str], optional): A list of tags to remove. Default ["<think>", "</think>", "<answer>", "</answer>"].
- r (str, optional): The replacement string. Default "\n".
-
Returns:
- (str): The cleaned string.
-
Example:
from structai import remove_tag
clean_text = remove_tag("<think>...</think> Answer")
# Output: "...\n Answer"
parse_think_answer
Parses a string containing Chain-of-Thought tags (<think>...</think> and <answer>...</answer>) and returns the content of both.
-
Args:
text(str): The input text containing the tags.
-
Returns:
- (tuple): A tuple (think_content, answer_content).
-
Example:
from structai import parse_think_answer
raw_text = "<think>Step 1...</think><answer>42</answer>"
think, answer = parse_think_answer(raw_text)
print(f"Reasoning: {think}") # Reasoning: Step 1...
print(f"Result: {answer}") # Result: 42
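Tag parsing of this kind is usually a pair of non-greedy regular expressions. In this sketch parse_think_answer_sketch is a hypothetical name; returning None for a missing tag pair and stripping surrounding whitespace are assumptions, since the behavior for malformed input is not documented above.

```python
import re
from typing import Optional, Tuple

# Non-greedy (.*?) stops at the first closing tag; DOTALL lets it span lines
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def parse_think_answer_sketch(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (think_content, answer_content); None where a tag pair is missing."""
    think = THINK_RE.search(text)
    answer = ANSWER_RE.search(text)
    return (
        think.group(1).strip() if think else None,
        answer.group(1).strip() if answer else None,
    )
```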
extract_within_tags
Extracts the substring found between two specific tags.
-
Args:
- content (str): The text to search within.
- start_tag (str, optional): The opening tag. Default '<answer>'.
- end_tag (str, optional): The closing tag. Default '</answer>'.
- default_return (Any, optional): The value to return if the tags are not found. Default None.
-
Returns:
- (str | Any): The extracted content string, or default_return if not found.
-
Example:
from structai import extract_within_tags
text = "Result: <json>{...}</json>"
json_str = extract_within_tags(text, "<json>", "</json>")
# Output: "{...}"
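A minimal sketch of the same lookup with str.find follows. extract_within_tags_sketch is a hypothetical name, and pairing the first start tag with the first end tag after it is an assumption about how StructAI resolves multiple matches.

```python
from typing import Any


def extract_within_tags_sketch(
    content: str,
    start_tag: str = "<answer>",
    end_tag: str = "</answer>",
    default_return: Any = None,
) -> Any:
    """Return the text between the first start_tag and the next end_tag."""
    start = content.find(start_tag)
    if start == -1:
        return default_return
    start += len(start_tag)  # skip past the opening tag itself
    end = content.find(end_tag, start)
    if end == -1:
        return default_return
    return content[start:end]
```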
Network Service
add_no_proxy_if_private
Checks if the hostname in the URL is a private IP address. If so, it adds it to the no_proxy environment variable to bypass proxies.
-
Args:
url(str): The URL to check.
-
Returns:
- None
-
Example:
from structai import add_no_proxy_if_private
add_no_proxy_if_private("http://192.168.1.100:8080/v1")
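The check described above can be sketched with urllib.parse and the standard ipaddress module. add_no_proxy_if_private_sketch is a hypothetical name; it recognizes only literal IP hosts (no DNS lookup) and updates only the lowercase no_proxy variable, which may differ from StructAI's exact behavior.

```python
import ipaddress
import os
from urllib.parse import urlparse


def add_no_proxy_if_private_sketch(url: str) -> None:
    """Append the URL's host to no_proxy if it is a private IP literal."""
    host = urlparse(url).hostname
    if not host:
        return
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return  # a DNS name, not an IP literal; this sketch does not resolve it
    if not ip.is_private:
        return
    entries = [e.strip() for e in os.environ.get("no_proxy", "").split(",") if e.strip()]
    if host not in entries:  # avoid duplicate entries on repeated calls
        entries.append(host)
        os.environ["no_proxy"] = ",".join(entries)
```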
run_server
Starts a FastAPI server that proxies requests to an OpenAI-compatible LLM provider, configured through the LLM_BASE_URL and LLM_API_KEY environment variables.
-
Args:
- host (str, optional): The host to bind to. Default "0.0.0.0".
- port (int, optional): The port to bind to. Default 8001.
-
Returns:
- None (Runs indefinitely until stopped).
-
Example:
from structai import run_server
if __name__ == "__main__":
    run_server()
Time Limit
timeout_limit
A decorator that enforces a maximum execution time on a function. Raises TimeoutError if the limit is exceeded.
-
Args:
timeout(float | None): Maximum allowed execution time in seconds.
-
Returns:
- (decorator): A decorator function that wraps the target function.
-
Example:
from structai import timeout_limit
import time
@timeout_limit(timeout=2.0)
def task():
    time.sleep(5)
# This raises TimeoutError after about 2 seconds
try:
    task()
except TimeoutError:
    print("task() exceeded the 2-second limit")
run_with_timeout
Runs a function with a specified timeout without using a decorator.
-
Args:
- func (callable): The function to run.
- args (tuple, optional): Positional arguments for the function. Default ().
- kwargs (dict, optional): Keyword arguments for the function. Default None.
- timeout (float | None): Maximum allowed execution time in seconds.
-
Returns:
- (Any): The return value of the function.
-
Example:
from structai import run_with_timeout
def task(x):
    return x * 2
result = run_with_timeout(task, args=(10,), timeout=1.0)
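Both helpers can be sketched with a daemon worker thread and Thread.join(timeout). This is not necessarily how StructAI enforces the limit (a signal-based approach is another common option), and run_with_timeout_sketch and timeout_limit_sketch are hypothetical names. One caveat applies to any thread-based version: Python cannot kill the worker, so a timed-out function keeps running in the background.

```python
import functools
import threading
from typing import Any, Callable, Dict, Optional, Tuple


def run_with_timeout_sketch(
    func: Callable[..., Any],
    args: Tuple[Any, ...] = (),
    kwargs: Optional[Dict[str, Any]] = None,
    timeout: Optional[float] = None,
) -> Any:
    """Run func in a daemon thread and wait at most `timeout` seconds."""
    kwargs = kwargs or {}
    outcome: Dict[str, Any] = {}

    def target() -> None:
        try:
            outcome["value"] = func(*args, **kwargs)
        except BaseException as exc:  # re-raised in the caller's thread below
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)  # join(None) waits indefinitely
    if worker.is_alive():
        # The worker cannot be stopped; it keeps running in the background
        raise TimeoutError(f"{getattr(func, '__name__', 'function')} exceeded {timeout} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome.get("value")


def timeout_limit_sketch(timeout: Optional[float]) -> Callable:
    """Decorator form built on run_with_timeout_sketch."""
    def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            return run_with_timeout_sketch(func, args, kwargs, timeout)
        return wrapper
    return decorator
```

With the task defined above, run_with_timeout_sketch(task, args=(10,), timeout=1.0) returns 20.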
Download files
File details
Details for the file structai-0.1.15.tar.gz.
File metadata
- Download URL: structai-0.1.15.tar.gz
- Upload date:
- Size: 40.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 657caccabfad564ebfae617b2da438abd22d2ef41d47081df33152a993d60043 |
| MD5 | 857a75edadab19dd82059a384f92446f |
| BLAKE2b-256 | ed59f3e37b02aa9f019fe29b874bb12caca30a5d18e631b21edde3f0cb2e789d |
File details
Details for the file structai-0.1.15-py3-none-any.whl.
File metadata
- Download URL: structai-0.1.15-py3-none-any.whl
- Upload date:
- Size: 38.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ff8bb13278919d0465e67bcac7f48891feb6ca59838103f95e02011ba4a20db |
| MD5 | 275500b607d6dd4a383f09c91ac542ef |
| BLAKE2b-256 | 5e795d96d93987ecc655c69b459ff4af2dcf1af1e482b2b2d4da577e69128fdc |