🦜 langchain-tzafon
An integration package connecting Tzafon and LangChain.
langchain-tzafon allows you to seamlessly use Tzafon's headless browser infrastructure as a Document Loader in your LangChain applications. It handles complex web page rendering (including JavaScript) and extracts clean text or raw HTML for your LLM pipelines.
✨ Features
- Headless Browser Rendering: Powered by Tzafon's cloud-based browser instances.
- JavaScript Support: Handles single-page applications (SPAs) and dynamically loaded content.
- Sync & Async Support: Provides both lazy_load and alazy_load for high-performance applications.
- Configurable Extraction: Choose between clean text content and full source HTML.
- Seamless Integration: Fully compatible with LangChain's BaseLoader interface.
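LangChain's BaseLoader contract centers on lazy_load, a generator of Document objects. In langchain_core, load() is essentially list(self.lazy_load()). The sketch below mimics that shape in plain Python so it runs without LangChain installed. The Document dataclass and SketchLoader class are hypothetical stand-ins, not TzafonLoader's actual code, and the placeholder content replaces real browser rendering.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Document:
    """Stand-in for langchain_core's Document (assumption: only these two fields)."""
    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


class SketchLoader:
    """Hypothetical loader illustrating the lazy_load/load split of BaseLoader."""

    def __init__(self, urls: List[str]) -> None:
        self.urls = urls

    def lazy_load(self) -> Iterator[Document]:
        # Yield one Document per URL so callers can stream results
        # instead of holding every page in memory at once.
        for url in self.urls:
            yield Document(page_content=f"<content of {url}>", metadata={"url": url})

    def load(self) -> List[Document]:
        # Eager variant: simply materializes the lazy iterator.
        return list(self.lazy_load())


loader = SketchLoader(["https://example.com", "https://tzafon.ai"])
docs = loader.load()
print([d.metadata["url"] for d in docs])
```

Because load() is derived from lazy_load(), a loader only has to implement the streaming path to support both.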
🚀 Installation
pip install langchain-tzafon
🔑 Configuration
To use this package, you need a Tzafon API key.
- Sign up or log in at tzafon.ai to get your API key.
- Set it as an environment variable (recommended):
export TZAFON_API_KEY="your_api_key_here"
Alternatively, you can pass the API key directly when initializing the loader.
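The API reference states that api_key defaults to the TZAFON_API_KEY environment variable. The following is a minimal sketch of that precedence. It assumes that an explicitly passed key wins and that a missing key raises an error. Both behaviors are assumptions, and resolve_api_key is a hypothetical helper, not part of langchain-tzafon.

```python
import os
from typing import Optional


def resolve_api_key(api_key: Optional[str] = None) -> str:
    """Hypothetical helper: explicit argument first, then TZAFON_API_KEY."""
    # An explicitly passed key always takes precedence over the environment.
    if api_key:
        return api_key
    env_key = os.environ.get("TZAFON_API_KEY")
    if env_key:
        return env_key
    # Fail early with a clear message instead of sending an unauthenticated request.
    raise ValueError("No Tzafon API key found: pass api_key=... or set TZAFON_API_KEY.")


os.environ["TZAFON_API_KEY"] = "key-from-env"
print(resolve_api_key())                # falls back to the environment variable
print(resolve_api_key("explicit-key"))  # explicit argument takes precedence
```

With the real loader, the explicit form uses the documented api_key argument: TzafonLoader(urls=["https://example.com"], api_key="your_api_key_here").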
📖 Usage
Basic Usage (Text Extraction)
By default, TzafonLoader extracts the visible text from the <body> of the page.
from langchain_tzafon import TzafonLoader

# Initialize with one or more URLs
loader = TzafonLoader(urls=["https://example.com"])

# Load documents
documents = loader.load()

for doc in documents:
    print(f"Content from {doc.metadata['url']}:")
    print(doc.page_content[:200])
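TzafonLoader performs text extraction on a browser-rendered page. The sketch below only approximates what "visible text from the <body>" means. It applies Python's standard html.parser to a hard-coded page, keeps text inside <body>, and skips <script> and <style> contents. VisibleTextParser and visible_text are hypothetical names, and this is not the package's implementation. For example, it does not account for CSS-hidden elements.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextParser(HTMLParser):
    """Collects text inside <body>, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.in_body = False
        self.skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text inside <body> and outside script/style.
        if self.in_body and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """<html><head><title>Demo</title></head>
<body><h1>Hello</h1><script>var x = 1;</script><p>Rendered text.</p></body></html>"""
print(visible_text(page))  # Hello Rendered text.
```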
Async Loading
For better performance when handling multiple URLs, use the asynchronous loader:
import asyncio

from langchain_tzafon import TzafonLoader


async def main():
    loader = TzafonLoader(urls=[
        "https://example.com",
        "https://tzafon.ai",
    ])
    async for doc in loader.alazy_load():
        print(f"Loaded {doc.metadata['url']}")


if __name__ == "__main__":
    asyncio.run(main())
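This README does not document how alazy_load schedules its requests. The sketch below illustrates why async helps with multiple URLs. All fetches start at once, and each result is yielded as soon as it finishes, so total time is roughly that of the slowest page rather than the sum of all pages. The fake_fetch coroutine (which sleeps instead of rendering) and alazy_load_sketch are hypothetical, and the per-URL delays are made up for the demo.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_fetch(url: str, delay: float) -> Dict[str, str]:
    """Stand-in for a remote browser render; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return {"url": url, "page_content": f"<content of {url}>"}


async def alazy_load_sketch(urls: Dict[str, float]) -> AsyncIterator[Dict[str, str]]:
    # Schedule every fetch concurrently, then yield results in completion order.
    tasks = [asyncio.create_task(fake_fetch(u, d)) for u, d in urls.items()]
    for next_done in asyncio.as_completed(tasks):
        yield await next_done


async def main() -> List[str]:
    # Map of URL -> simulated render time in seconds (made-up values).
    urls = {"https://example.com": 0.2, "https://tzafon.ai": 0.1}
    order = []
    async for doc in alazy_load_sketch(urls):
        order.append(doc["url"])
    return order


order = asyncio.run(main())
print(order)  # ['https://tzafon.ai', 'https://example.com']
```

The faster page arrives first even though it was listed second, which is the benefit of consuming results in completion order.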
Loading Raw HTML
If you need the full HTML structure for custom parsing:
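from langchain_tzafon import TzafonLoader

loader = TzafonLoader(
    urls="https://example.com",  # a single URL string is also accepted
    text_content=False,  # False returns raw HTML instead of visible text
)
documents = loader.load()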
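The following sketch shows one kind of custom parsing that raw HTML enables: collecting every link on a page with Python's standard html.parser and resolving relative hrefs with urllib.parse.urljoin. In practice, the input would be documents[0].page_content and the base URL would be documents[0].metadata["url"]. A literal string is used here so the example runs offline. LinkCollector is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects absolute href targets from <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


# Stand-in for documents[0].page_content loaded with text_content=False.
raw_html = '<body><a href="/docs">Docs</a> <a href="https://tzafon.ai">Home</a></body>'
collector = LinkCollector("https://example.com")
collector.feed(raw_html)
print(collector.links)  # ['https://example.com/docs', 'https://tzafon.ai']
```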
🛠️ API Reference
TzafonLoader
| Argument | Type | Description |
|---|---|---|
| urls | str or List[str] | A single URL or a list of URLs to load. |
| api_key | Optional[str] | Your Tzafon API key. Defaults to the TZAFON_API_KEY environment variable. |
| text_content | bool | If True (default), extracts visible text. If False, returns raw HTML. |
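The urls argument accepts either a single URL or a list. A loader that accepts both usually normalizes the input to a list before iterating. The following sketch shows that pattern. normalize_urls is a hypothetical helper, not a function exported by langchain-tzafon.

```python
from typing import List, Union


def normalize_urls(urls: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: accept a single URL or a list; always return a list."""
    # Check for str first: iterating a bare string would yield single characters.
    if isinstance(urls, str):
        return [urls]
    return list(urls)


print(normalize_urls("https://example.com"))
print(normalize_urls(["https://example.com", "https://tzafon.ai"]))
```

The isinstance check is the important design choice. Without it, list("https://example.com") would silently produce one "URL" per character.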
🧪 Development
This project uses uv for dependency management and pytest for testing.
Running Tests
uv run pytest
📄 License
This project is licensed under the MIT License - see the LICENSE file for details (if available).
Download files
Source Distribution
Built Distribution
File details
Details for the file langchain_tzafon-1.0.0.tar.gz.
File metadata
- Download URL: langchain_tzafon-1.0.0.tar.gz
- Upload date:
- Size: 53.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0050c14a3464078b9ad30bfb19354ca090bdd2afb61dbf0e5b9d326d2906247 |
| MD5 | 2fb373900531ae255e581fa726d4bc51 |
| BLAKE2b-256 | 5453f4d56046a710d36acf982e7f75a2dd31c05217191265224483d4dc56bf27 |
File details
Details for the file langchain_tzafon-1.0.0-py3-none-any.whl.
File metadata
- Download URL: langchain_tzafon-1.0.0-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 75bedd9a3357673ac82d88055ef753273593cea07b557f61982674cae03cbb9b |
| MD5 | b902501bb128619ec15df34918607279 |
| BLAKE2b-256 | 187423f80344b62c2019cbe573a2d954e3636fb3578f91e7b87dac59e8914ab8 |
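The published digests let you confirm that a downloaded file is intact. The following sketch uses Python's hashlib to compute a SHA256 digest in streaming chunks and compares it with the wheel's SHA256 value from the table above. It assumes the wheel sits in the current directory under its published file name. The sha256_of and verify function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected_sha256.lower()


if __name__ == "__main__":
    wheel = Path("langchain_tzafon-1.0.0-py3-none-any.whl")
    expected = "75bedd9a3357673ac82d88055ef753273593cea07b557f61982674cae03cbb9b"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode (--require-hashes).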