A new package designed to parse and structure information from unstructured text. It takes textual input and utilizes advanced language models with pattern matching to extract specific data points, en
Project description
textstructparser 🔍
Parse and structure information from unstructured text using advanced language models.
Overview
textstructparser is a Python package designed to extract structured data from unstructured text using pattern matching and large language models (LLMs). It ensures consistent, machine-readable output for downstream applications like data processing, NLP pipelines, or automation workflows.
By default, it uses ChatLLM7 (via langchain_llm7), but developers can easily integrate their preferred LLM (e.g., OpenAI, Anthropic, Google) for flexibility.
Installation
pip install textstructparser
Features
✅ Flexible LLM Integration – Works with any BaseChatModel from LangChain.
✅ Pattern-Based Extraction – Uses regex constraints for structured output.
✅ Default LLM7 Support – Free-tier rate limits sufficient for most use cases.
✅ Customizable – Override defaults with your own API keys or LLMs.
Usage Examples
Basic Usage (Default LLM7)
from textstructparser import textstructparser
user_input = "Extract dates and names from this text: John Doe visited New York on 2023-10-15."
response = textstructparser(user_input)
print(response) # Returns structured data matching the predefined pattern
Using a Custom LLM (OpenAI)
from langchain_openai import ChatOpenAI
from textstructparser import textstructparser
llm = ChatOpenAI(model="gpt-4")
response = textstructparser(user_input, llm=llm)
Using a Custom LLM (Anthropic)
from langchain_anthropic import ChatAnthropic
from textstructparser import textstructparser
llm = ChatAnthropic(model="claude-2")
response = textstructparser(user_input, llm=llm)
Using a Custom LLM (Google Vertex AI)
from langchain_google_genai import ChatGoogleGenerativeAI
from textstructparser import textstructparser
llm = ChatGoogleGenerativeAI(model="gemini-pro")
response = textstructparser(user_input, llm=llm)
Passing a Custom API Key (LLM7)
from textstructparser import textstructparser
# Option 1: Via environment variable
import os
os.environ["LLM7_API_KEY"] = "your_api_key_here"
# Option 2: Directly in function call
response = textstructparser(user_input, api_key="your_api_key_here")
Parameters
| Parameter | Type | Description |
|---|---|---|
user_input |
str |
The unstructured text to parse. |
api_key |
Optional[str] |
LLM7 API key (defaults to LLM7_API_KEY env var). |
llm |
Optional[BaseChatModel] |
Custom LangChain LLM (defaults to ChatLLM7). |
How It Works
- Input Processing: Takes raw text and applies a predefined regex pattern for structured extraction.
- LLM Integration: Uses
llmatch(fromllmatch_messages) to query the LLM with system/human prompts. - Output Validation: Ensures extracted data matches the regex pattern before returning.
Default LLM (LLM7)
- Provider: LLM7
- Free Tier: Sufficient for most use cases (rate limits apply).
- Upgrade: Pass a custom API key via
api_keyorLLM7_API_KEYenv var.
Get a free API key: https://token.llm7.io/
Custom LLM Support
For OpenAI, Anthropic, Google, or other LangChain-compatible LLMs, simply pass your model instance:
from langchain_openai import ChatOpenAI
from textstructparser import textstructparser
llm = ChatOpenAI(model="gpt-3.5-turbo")
response = textstructparser(user_input, llm=llm)
Contributing
Contributions are welcome! Please open an issue or submit a PR: 🔗 GitHub Issues
License
MIT – See LICENSE for details.
Author
👤 Eugene Evstafev 📧 hi@euegne.plus 🔗 LinkedIn 🐙 GitHub
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file textstructparser-2025.12.20202423.tar.gz.
File metadata
- Download URL: textstructparser-2025.12.20202423.tar.gz
- Upload date:
- Size: 4.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
56a396473cc5f1ac4e10d2493f689fe635002051329bd5832fd99f4b7b6f07bf
|
|
| MD5 |
228f281c404df976b4d65784f7ba2578
|
|
| BLAKE2b-256 |
baa68e56c7a91af93c8f350a78651d5d875cd68b8f3d3104308eb7ca0ee7c4be
|
File details
Details for the file textstructparser-2025.12.20202423-py3-none-any.whl.
File metadata
- Download URL: textstructparser-2025.12.20202423-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4daef9ac9083221e48c19f9cc5373fde9ef0b29254273379fe6905e7bb0101cd
|
|
| MD5 |
693a890088b449c0db7adb9848eca1bf
|
|
| BLAKE2b-256 |
c1baf73c6c417e4be9d04fe01692a0c462f0988b0787e61c3c9754efd5afae64
|