textract-io

Structured Text Extraction for Scientific & Factual Data

textract_io is a Python package that extracts structured key information from scientific or factual text. It uses pattern matching and retry mechanisms to make responses more reliable, which suits generating summaries, extracting data, or categorizing text based on user prompts. It is intended for turning text already extracted from multimedia sources into concise, structured output for research, reporting, or database entry.


🚀 Features

  • Pattern-based extraction: Uses regex patterns to enforce structured output.
  • LLM7 integration: Defaults to ChatLLM7 (from langchain_llm7) for extraction tasks.
  • Flexible LLM support: Easily swap with any LangChain-compatible LLM (OpenAI, Anthropic, Google, etc.).
  • Error handling: Robust retry logic and clear error messages.
  • Environment-aware: Uses LLM7_API_KEY from environment variables or direct API key input.
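
The pattern-plus-retry idea in the feature list can be illustrated with a minimal sketch. This is not the package's internal code: `extract_with_retries`, `fake_llm`, and the `<item>` tag pattern are hypothetical names chosen for the example, and the stand-in function replaces a real LLM call so the snippet runs offline.

```python
import re
from typing import Callable, List


def extract_with_retries(
    generate: Callable[[str], str],
    prompt: str,
    pattern: str,
    max_attempts: int = 3,
) -> List[str]:
    """Call `generate` until its output contains a match for `pattern`."""
    compiled = re.compile(pattern, re.DOTALL)
    for _ in range(max_attempts):
        raw = generate(prompt)
        matches = compiled.findall(raw)
        if matches:
            # One capture group, so findall returns a list of strings.
            return [m.strip() for m in matches]
    raise RuntimeError(
        f"No output matched the pattern after {max_attempts} attempts"
    )


# Hypothetical stand-in for an LLM: the first reply is malformed,
# the second contains the tagged items.
_replies = iter(["no tags here", "<item>pH 7.4</item><item>37 C</item>"])


def fake_llm(prompt: str) -> str:
    return next(_replies)


result = extract_with_retries(fake_llm, "Extract conditions", r"<item>(.*?)</item>")
print(result)  # ['pH 7.4', '37 C']
```

The first reply fails the pattern check and triggers a second call, which is the behavior the retry logic is meant to provide.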

📦 Installation

pip install textract_io

🔧 Usage

Basic Usage (Default LLM7)

from textract_io import textract_io

response = textract_io(user_input="Your text here...")
print(response)  # List of extracted data matching the pattern

Custom LLM Integration

Replace the default ChatLLM7 with any LangChain-compatible LLM (e.g., OpenAI, Anthropic, Google):

OpenAI Example

from langchain_openai import ChatOpenAI
from textract_io import textract_io

llm = ChatOpenAI()
response = textract_io(user_input="Your text here...", llm=llm)

Anthropic Example

from langchain_anthropic import ChatAnthropic
from textract_io import textract_io

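# ChatAnthropic requires a model name; this one is an example, substitute a model available to your account.
llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")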
response = textract_io(user_input="Your text here...", llm=llm)

Google Generative AI Example

from langchain_google_genai import ChatGoogleGenerativeAI
from textract_io import textract_io

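# ChatGoogleGenerativeAI requires a model name; this one is an example, substitute a model available to your account.
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")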
response = textract_io(user_input="Your text here...", llm=llm)

🔑 API Key Configuration

  • Default: Uses LLM7_API_KEY from environment variables.
  • Manual Override: Pass the API key directly:
    response = textract_io(user_input="Your text...", api_key="your_llm7_api_key")
    
  • Free API Key: Register at LLM7 Token to get started.
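
The precedence described above (an explicit key overrides the environment variable) might look like the following sketch. `resolve_api_key` is a hypothetical helper, not a function exported by textract_io, and what the package does when no key is found is not documented here, so the sketch simply returns None in that case.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit key if given, otherwise LLM7_API_KEY from `env`."""
    if api_key:
        return api_key
    return env.get("LLM7_API_KEY")


# An explicit key wins over the environment.
print(resolve_api_key("explicit", env={"LLM7_API_KEY": "from-env"}))  # explicit
# With no explicit key, the environment variable is used.
print(resolve_api_key(env={"LLM7_API_KEY": "from-env"}))  # from-env
# With neither, this sketch returns None.
print(resolve_api_key(env={}))  # None
```

Passing `env` as a parameter keeps the helper easy to test without modifying the real process environment.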

📝 Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| user_input | str | The input text to process. |
| api_key | Optional[str] | LLM7 API key (defaults to the LLM7_API_KEY environment variable). |
| llm | Optional[BaseChatModel] | Custom LangChain LLM (e.g., ChatOpenAI, ChatAnthropic). Defaults to ChatLLM7. |

📊 Rate Limits

  • LLM7 Free Tier: Sufficient for most use cases.
  • Upgrade: Use your own API key or environment variable for higher limits.

🔄 Error Handling

  • If extraction fails, raises RuntimeError with a descriptive message.
  • Retries internally to improve reliability.
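
Because a failed extraction surfaces as a RuntimeError, calling code may want to catch it and decide what to do next. The sketch below is one possible caller-side wrapper; `extract_or_none` and `always_fails` are hypothetical names, and the extraction function is passed in as an argument so the example runs without network access.

```python
import time
from typing import Callable, List, Optional


def extract_or_none(
    extract: Callable[[str], List[str]],
    text: str,
    attempts: int = 2,
    delay_seconds: float = 0.0,
) -> Optional[List[str]]:
    """Call `extract`, retrying on RuntimeError; return None if every attempt fails."""
    for attempt in range(attempts):
        try:
            return extract(text)
        except RuntimeError as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
            if attempt + 1 < attempts:
                time.sleep(delay_seconds)
    return None


# Hypothetical stand-in that always raises, to exercise the failure path.
def always_fails(text: str) -> List[str]:
    raise RuntimeError("extraction failed")


print(extract_or_none(always_fails, "some text"))  # None
```

With the real package, the same wrapper could be called as `extract_or_none(lambda t: textract_io(user_input=t), "Your text here...")`.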

📜 License

MIT License (see LICENSE).


📢 Support & Issues

For bugs or feature requests, open an issue on GitHub.


👤 Author

Eugene Evstafev (@chigwell) 📧 hi@euegne.plus

