
An integration package connecting Tzafon and LangChain


🦜 langchain-tzafon

An integration package connecting Tzafon and LangChain.

langchain-tzafon lets you use Tzafon's headless browser infrastructure as a Document Loader in your LangChain applications. It renders web pages (including JavaScript-driven content) and extracts clean text or raw HTML for your LLM pipelines.


✨ Features

  • Headless Browser Rendering: Powered by Tzafon's cloud-based browser instances.
  • JavaScript Support: Handles single-page applications (SPAs) and dynamically loaded content.
  • Sync & Async Support: Provides both lazy_load and alazy_load for high-performance applications.
  • Configurable Extraction: Choose between clean text content and the full HTML source.
  • Seamless Integration: Fully compatible with LangChain's BaseLoader interface.

🚀 Installation

pip install langchain-tzafon

🔑 Configuration

To use this package, you need a Tzafon API Key.

  1. Sign up or log in at tzafon.ai to get your API key.
  2. Set it as an environment variable (recommended):
export TZAFON_API_KEY="your_api_key_here"

Alternatively, you can pass the API key directly when initializing the loader.
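The lookup order described above (an explicitly passed key first, then the TZAFON_API_KEY environment variable) can be sketched as a small helper. This is a hypothetical illustration of the documented behavior, not the package's own code; the name resolve_api_key and the choice to raise ValueError when no key is found are assumptions.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: prefer an explicit key, else fall back to TZAFON_API_KEY."""
    # An explicitly passed key takes precedence over the environment.
    if api_key:
        return api_key
    env_key = os.environ.get("TZAFON_API_KEY")
    if env_key:
        return env_key
    # Assumption: failing loudly is preferable to sending unauthenticated requests.
    raise ValueError(
        "No Tzafon API key found. Pass api_key=... or set TZAFON_API_KEY."
    )
```

In application code you would not need this helper: pass the key straight to the loader, for example TzafonLoader(urls=["https://example.com"], api_key="your_api_key_here").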


📖 Usage

Basic Usage (Text Extraction)

By default, TzafonLoader extracts the visible text from the <body> of the page.

from langchain_tzafon import TzafonLoader

# Initialize with one or more URLs
loader = TzafonLoader(urls=["https://example.com"])

# Load documents
documents = loader.load()

for doc in documents:
    print(f"Content from {doc.metadata['url']}:")
    print(doc.page_content[:200])
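TzafonLoader performs the extraction for you after the page has rendered in Tzafon's browser. To make "visible text from the <body>" concrete, here is a rough stand-in that uses only Python's standard-library html.parser on an already-rendered HTML string. It is an approximation under stated assumptions (it skips script, style and noscript contents but ignores CSS visibility), not how langchain-tzafon implements extraction. It can also serve as a starting point for post-processing documents loaded with text_content=False.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text inside <body>, skipping <script>, <style> and <noscript> contents."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.skip_depth = 0  # > 0 while inside a tag whose text should be ignored
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits in <body> and outside skipped tags.
        if self.in_body and not self.skip_depth:
            text = data.strip()
            if text:
                self.chunks.append(text)


def visible_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```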

Async Loading

For better performance when handling multiple URLs, use the asynchronous alazy_load() method:

import asyncio
from langchain_tzafon import TzafonLoader

async def main():
    loader = TzafonLoader(urls=[
        "https://example.com",
        "https://tzafon.ai"
    ])
    
    async for doc in loader.alazy_load():
        print(f"Loaded {doc.metadata['url']}")

if __name__ == "__main__":
    asyncio.run(main())
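This README does not say whether TzafonLoader renders several URLs concurrently internally. If you need to fan out work yourself (for example across several loaders), the general asyncio pattern is to await the jobs together with asyncio.gather, optionally capped by a semaphore. The sketch below uses a hypothetical fake_render coroutine in place of a real Tzafon call so it runs offline; render_all and max_concurrency are likewise illustrative names.

```python
import asyncio
import time


async def fake_render(url: str, delay: float = 0.1) -> str:
    """Hypothetical stand-in for one remote page render; it only sleeps."""
    await asyncio.sleep(delay)
    return f"content of {url}"


async def render_all(urls: list[str], max_concurrency: int = 5) -> list[str]:
    # Cap in-flight requests so a long URL list does not flood the backend.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(url: str) -> str:
        async with semaphore:
            return await fake_render(url)

    # gather runs the coroutines concurrently and returns results in input order.
    return list(await asyncio.gather(*(bounded(u) for u in urls)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(render_all(["https://example.com", "https://tzafon.ai"]))
    print(results, f"{time.perf_counter() - start:.2f}s")
```

Because the sleeps overlap, the total time is close to one render rather than the sum of all of them.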

Loading Raw HTML

If you need the full HTML structure for custom parsing:

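from langchain_tzafon import TzafonLoader

# A single URL string is accepted as well as a list
loader = TzafonLoader(
    urls="https://example.com",
    text_content=False,  # False returns the raw HTML instead of extracted text
)
documents = loader.load()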
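As one example of custom parsing, the following sketch collects absolute link targets from raw HTML using only the standard library. The names LinkCollector and extract_links are illustrative and not part of langchain-tzafon. With the loader above you could call extract_links(doc.page_content, doc.metadata["url"]), assuming page_content holds the raw HTML when text_content=False, as documented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute href targets from <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links such as "/about" against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html: str, base_url: str) -> list[str]:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```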

🛠️ API Reference

TzafonLoader

  • urls (str | List[str]): A single URL or a list of URLs to load.
  • api_key (Optional[str]): Your Tzafon API key. Defaults to the TZAFON_API_KEY environment variable.
  • text_content (bool): If True (default), extracts the visible text. If False, returns the raw HTML.
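Because urls accepts either a single string or a list, a loader has to normalize the argument before iterating over it. The sketch below shows one way to do that. The function normalize_urls and its TypeError are hypothetical and are not part of langchain-tzafon's API.

```python
from typing import List, Union


def normalize_urls(urls: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept one URL or a list, always return a list."""
    # Check for str first: a bare string is itself iterable and would
    # otherwise be split into single characters.
    if isinstance(urls, str):
        return [urls]
    url_list = list(urls)
    if not all(isinstance(u, str) for u in url_list):
        raise TypeError("urls must be a string or a list of strings")
    return url_list
```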

🧪 Development

This project uses uv for dependency management and pytest for testing.

Running Tests

uv run pytest

📄 License

This project is licensed under the MIT License - see the LICENSE file for details (if available).



Download files


Source Distribution

langchain_tzafon-1.0.0.tar.gz (53.8 kB)

Uploaded Source

Built Distribution


langchain_tzafon-1.0.0-py3-none-any.whl (6.4 kB)

Uploaded Python 3

File details

Details for the file langchain_tzafon-1.0.0.tar.gz.

File metadata

  • Download URL: langchain_tzafon-1.0.0.tar.gz
  • Upload date:
  • Size: 53.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.20

File hashes

Hashes for langchain_tzafon-1.0.0.tar.gz
Algorithm Hash digest
SHA256 b0050c14a3464078b9ad30bfb19354ca090bdd2afb61dbf0e5b9d326d2906247
MD5 2fb373900531ae255e581fa726d4bc51
BLAKE2b-256 5453f4d56046a710d36acf982e7f75a2dd31c05217191265224483d4dc56bf27


File details

Details for the file langchain_tzafon-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for langchain_tzafon-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 75bedd9a3357673ac82d88055ef753273593cea07b557f61982674cae03cbb9b
MD5 b902501bb128619ec15df34918607279
BLAKE2b-256 187423f80344b62c2019cbe573a2d954e3636fb3578f91e7b87dac59e8914ab8
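The digests above let you confirm that a downloaded file matches what was published. pip can enforce this automatically in hash-checking mode (--require-hashes). To check a file by hand, a minimal standard-library sketch follows. The dictionary only restates the SHA256 values listed above, and the function names are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for langchain-tzafon 1.0.0.
EXPECTED_SHA256 = {
    "langchain_tzafon-1.0.0.tar.gz":
        "b0050c14a3464078b9ad30bfb19354ca090bdd2afb61dbf0e5b9d326d2906247",
    "langchain_tzafon-1.0.0-py3-none-any.whl":
        "75bedd9a3357673ac82d88055ef753273593cea07b557f61982674cae03cbb9b",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path) -> bool:
    """Return True only if the file's SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"No published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, verify_download(Path("langchain_tzafon-1.0.0.tar.gz")) should return True for an untampered download.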

