textstructparser 🔍

Parse and structure information from unstructured text using advanced language models.


Overview

textstructparser is a Python package designed to extract structured data from unstructured text using pattern matching and large language models (LLMs). It ensures consistent, machine-readable output for downstream applications like data processing, NLP pipelines, or automation workflows.

By default, it uses ChatLLM7 (via langchain_llm7), but developers can easily integrate their preferred LLM (e.g., OpenAI, Anthropic, Google) for flexibility.


Installation

pip install textstructparser

Features

✅ Flexible LLM Integration – Works with any BaseChatModel from LangChain.
✅ Pattern-Based Extraction – Uses regex constraints for structured output.
✅ Default LLM7 Support – Free-tier rate limits sufficient for most use cases.
✅ Customizable – Override defaults with your own API keys or LLMs.


Usage Examples

Basic Usage (Default LLM7)

from textstructparser import textstructparser

user_input = "Extract dates and names from this text: John Doe visited New York on 2023-10-15."
response = textstructparser(user_input)
print(response)  # Returns structured data matching the predefined pattern

Using a Custom LLM (OpenAI)

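from langchain_openai import ChatOpenAI
from textstructparser import textstructparser

user_input = "Extract dates and names from this text: John Doe visited New York on 2023-10-15."

# Pass any LangChain chat model instance through the llm parameter
llm = ChatOpenAI(model="gpt-4")
response = textstructparser(user_input, llm=llm)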

Using a Custom LLM (Anthropic)

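from langchain_anthropic import ChatAnthropic
from textstructparser import textstructparser

user_input = "Extract dates and names from this text: John Doe visited New York on 2023-10-15."

# Pass any LangChain chat model instance through the llm parameter
llm = ChatAnthropic(model="claude-2")
response = textstructparser(user_input, llm=llm)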

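Using a Custom LLM (Google Generative AI)

from langchain_google_genai import ChatGoogleGenerativeAI
from textstructparser import textstructparser

user_input = "Extract dates and names from this text: John Doe visited New York on 2023-10-15."

# ChatGoogleGenerativeAI comes from the langchain_google_genai package (Gemini API)
llm = ChatGoogleGenerativeAI(model="gemini-pro")
response = textstructparser(user_input, llm=llm)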

Passing a Custom API Key (LLM7)

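import os

from textstructparser import textstructparser

user_input = "Extract dates and names from this text: John Doe visited New York on 2023-10-15."

# Option 1: Via environment variable (read when no api_key argument is given)
os.environ["LLM7_API_KEY"] = "your_api_key_here"
response = textstructparser(user_input)

# Option 2: Directly in function call
response = textstructparser(user_input, api_key="your_api_key_here")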
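
The two options above imply a precedence: an explicit api_key argument is used if given, otherwise the LLM7_API_KEY environment variable. A minimal sketch of that lookup follows. The helper name resolve_api_key is hypothetical and not part of textstructparser; the package's actual internals are not shown in this README.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None,
                    env_var: str = "LLM7_API_KEY") -> Optional[str]:
    """Hypothetical helper (not part of textstructparser).

    Prefer an explicitly passed key; otherwise fall back to the
    environment variable. Returns None when neither is set.
    """
    if api_key:
        return api_key
    # Treat an empty environment value the same as an unset one.
    return os.environ.get(env_var) or None
```

Resolving the key in one place like this avoids hard-coding credentials in calling code, since the environment variable can be set by the deployment rather than the script.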

Parameters

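| Parameter  | Type                    | Description                                        |
|------------|-------------------------|----------------------------------------------------|
| user_input | str                     | The unstructured text to parse.                    |
| api_key    | Optional[str]           | LLM7 API key (defaults to LLM7_API_KEY env var).   |
| llm        | Optional[BaseChatModel] | Custom LangChain LLM (defaults to ChatLLM7).       |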

How It Works

  1. Input Processing: Takes raw text and applies a predefined regex pattern for structured extraction.
  2. LLM Integration: Uses llmatch (from llmatch_messages) to query the LLM with system/human prompts.
  3. Output Validation: Ensures extracted data matches the regex pattern before returning.
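
The three steps above can be sketched as a small query-and-validate loop. This is an illustration under stated assumptions, not the package's implementation: the pattern, the prompt wording, the retry count, and the function name extract_with_validation are all hypothetical, and a plain callable stands in for the LLM so that the sketch does not depend on the llmatch API.

```python
import re
from typing import Callable, List, Pattern

# Hypothetical pattern; the package's real predefined pattern is not shown here.
RESULT_PATTERN = re.compile(r"<result>(.*?)</result>", re.DOTALL)


def extract_with_validation(ask_llm: Callable[[str], str],
                            text: str,
                            pattern: Pattern[str] = RESULT_PATTERN,
                            max_attempts: int = 3) -> List[str]:
    """Query the model and accept only replies that match the pattern."""
    # Step 1: build a prompt that asks for output in the expected shape.
    prompt = (
        "Extract the requested data and wrap each item in "
        "<result>...</result> tags.\n\n" + text
    )
    for _ in range(max_attempts):
        # Step 2: query the model (any callable mapping prompt -> reply).
        reply = ask_llm(prompt)
        # Step 3: validate the reply against the regex before returning.
        matches = [m.strip() for m in pattern.findall(reply)]
        if matches:
            return matches
    raise ValueError(f"No reply matched the pattern after {max_attempts} attempts")
```

Retrying on a failed match is one simple way to get consistent, machine-readable output from a model whose replies vary; a caller could instead return an empty list or surface the raw reply.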

Default LLM (LLM7)

  • Provider: LLM7
  • Free Tier: Sufficient for most use cases (rate limits apply).
  • Upgrade: Pass a custom API key via api_key or LLM7_API_KEY env var.

Get a free API key: https://token.llm7.io/


Custom LLM Support

For OpenAI, Anthropic, Google, or other LangChain-compatible LLMs, simply pass your model instance:

from langchain_openai import ChatOpenAI
from textstructparser import textstructparser

user_input = "Extract dates and names from this text: John Doe visited New York on 2023-10-15."

llm = ChatOpenAI(model="gpt-3.5-turbo")
response = textstructparser(user_input, llm=llm)

Contributing

Contributions are welcome! Please open an issue or submit a PR: 🔗 GitHub Issues


License

MIT – See LICENSE for details.


Author

👤 Eugene Evstafev 📧 hi@euegne.plus 🔗 LinkedIn 🐙 GitHub

