🦜 langchain-tzafon

An integration package connecting Tzafon and LangChain.

langchain-tzafon lets you use Tzafon's headless browser infrastructure as a Document Loader in your LangChain applications. It renders web pages, including JavaScript-driven content, and extracts either clean text or raw HTML for your LLM pipelines.


✨ Features

  • Headless Browser Rendering: Powered by Tzafon's cloud-based browser instances.
  • JavaScript Support: Naturally handles SPAs and dynamically loaded content.
  • Sync & Async Support: Provides both lazy_load and alazy_load, so documents can be streamed from synchronous or asynchronous code.
  • Configurable Extraction: Choice between clean text content or full source HTML.
  • Seamless Integration: Fully compatible with LangChain's BaseLoader interface.

🚀 Installation

pip install langchain-tzafon

Note: This package requires Playwright for connecting to the remote browser.


🔑 Configuration

To use this package, you need a Tzafon API Key.

  1. Sign up or log in at tzafon.ai to get your API key.
  2. Set it as an environment variable (recommended):
export TZAFON_API_KEY="your_api_key_here"

Alternatively, you can pass the API key directly when initializing the loader.
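
The two options above (an explicit key, or the TZAFON_API_KEY environment variable) can be combined in a small helper that fails early when neither is present. This is only a sketch: resolve_api_key is a hypothetical name and not part of langchain-tzafon; only the environment variable name and the api_key argument come from this README.

```python
import os
from typing import Optional


def resolve_api_key(explicit: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to TZAFON_API_KEY.

    Hypothetical helper for illustration; not part of langchain-tzafon.
    """
    key = explicit or os.environ.get("TZAFON_API_KEY")
    if not key:
        raise RuntimeError(
            "No Tzafon API key found: pass one explicitly or set TZAFON_API_KEY."
        )
    return key


# Usage (requires langchain-tzafon to be installed):
# from langchain_tzafon import TzafonLoader
# loader = TzafonLoader(urls=["https://example.com"], api_key=resolve_api_key())
```

Checking for the key yourself gives a clear error message before any browser session is requested.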


📖 Usage

Basic Usage (Text Extraction)

By default, TzafonLoader extracts the visible text from the <body> of the page, which is ideal for LLM processing.

from langchain_tzafon import TzafonLoader

# Initialize with one or more URLs
loader = TzafonLoader(urls=["https://example.com"])

# Load documents
documents = loader.load()

for doc in documents:
    print(f"Content from {doc.metadata['url']}:")
    print(doc.page_content[:200])
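
If you only need the first few documents, you can pull them from lazy_load() instead of calling load(). The sketch below uses a hypothetical helper, take_documents, that works with any object exposing lazy_load(). Whether TzafonLoader fetches each URL only when its document is requested is an assumption here, not something this README states.

```python
from itertools import islice
from typing import Any, List


def take_documents(loader: Any, limit: int) -> List[Any]:
    """Return at most `limit` documents, pulled one at a time from lazy_load().

    Hypothetical helper for illustration; not part of langchain-tzafon.
    """
    return list(islice(loader.lazy_load(), limit))


# Usage (requires langchain-tzafon to be installed):
# from langchain_tzafon import TzafonLoader
# loader = TzafonLoader(urls=["https://example.com", "https://tzafon.ai"])
# first_only = take_documents(loader, 1)
```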

Async Loading

For better performance when handling multiple URLs, use the asynchronous loader:

import asyncio
from langchain_tzafon import TzafonLoader

async def main():
    loader = TzafonLoader(urls=[
        "https://example.com",
        "https://tzafon.ai"
    ])
    
    async for doc in loader.alazy_load():
        print(f"Loaded {doc.metadata['url']}")

if __name__ == "__main__":
    asyncio.run(main())
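
This README does not say whether alazy_load() fetches several URLs in parallel. If you want explicit control over concurrency, one option is to create one loader per URL and cap the number running at once with a semaphore. The helper below is a sketch under that assumption: load_concurrently and its make_loader parameter are hypothetical names, and whether your Tzafon plan permits several simultaneous browser sessions is something to verify separately.

```python
import asyncio
from typing import Any, Callable, List, Sequence


async def load_concurrently(
    urls: Sequence[str],
    make_loader: Callable[[str], Any],
    max_concurrency: int = 3,
) -> List[Any]:
    """Load each URL with its own loader, at most `max_concurrency` at a time.

    Hypothetical helper for illustration; not part of langchain-tzafon.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def load_one(url: str) -> List[Any]:
        async with semaphore:
            loader = make_loader(url)
            return [doc async for doc in loader.alazy_load()]

    per_url = await asyncio.gather(*(load_one(u) for u in urls))
    # Flatten while preserving the order of the input URLs.
    return [doc for docs in per_url for doc in docs]


# Usage (requires langchain-tzafon to be installed):
# from langchain_tzafon import TzafonLoader
# docs = asyncio.run(
#     load_concurrently(
#         ["https://example.com", "https://tzafon.ai"],
#         lambda url: TzafonLoader(urls=url),
#     )
# )
```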

Loading Raw HTML

If you need the full HTML structure for custom parsing:

loader = TzafonLoader(
    urls="https://example.com",
    text_content=False  # Set to False for raw HTML
)
documents = loader.load()
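
As one example of custom parsing, the raw HTML in page_content can be passed to Python's standard html.parser module. The sketch below collects link targets. LinkCollector and extract_links are illustrative names, not part of langchain-tzafon.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return every non-empty href found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Usage with documents loaded using text_content=False:
# for doc in documents:
#     print(doc.metadata["url"], extract_links(doc.page_content))
```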

🛠️ API Reference

TzafonLoader

Argument      Type                   Description
urls          str | List[str]        A single URL or a list of URLs to load.
api_key       Optional[str]          Your Tzafon API key. Defaults to TZAFON_API_KEY env var.
text_content  bool                   If True (default), extracts visible text. If False, returns raw HTML.
kind          "browser" | "desktop"  The type of environment to use. Defaults to "browser".

📄 License

This project is licensed under the MIT License.
