LangChain integration for WebCrawlerAPI
WebCrawlerAPI LangChain Integration
WebCrawlerAPI is a website-to-LLM data API. It converts websites and web pages into markdown or cleaned content.
No subscription required.
This package provides a LangChain integration for WebCrawlerAPI, so you can use its web crawling capabilities in a LangChain document processing pipeline.
Installation
Get your API key first
pip install webcrawlerapi-langchain
Usage
Basic Loading
from webcrawlerapi_langchain import WebCrawlerAPILoader
# Initialize the loader
loader = WebCrawlerAPILoader(
    url="https://example.com",
    api_key="your-api-key",
    scrape_type="markdown",
    items_limit=10
)
# Load documents
documents = loader.load()
# Use documents in your LangChain pipeline
for doc in documents:
    print(doc.page_content[:100])
    print(doc.metadata)
Async Loading
# Async loading
documents = await loader.aload()
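A bare `await` only works inside a coroutine (or an async-aware REPL or notebook), so in a plain script the call above needs an event loop. The following is a minimal sketch of that pattern. `StubLoader` and `StubDocument` are hypothetical stand-ins that only mimic the shape of `aload()` so the example runs without an API key or network access; they are not part of this package.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    # Hypothetical stand-in for a LangChain Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubLoader:
    # Hypothetical stand-in for WebCrawlerAPILoader; performs no crawling.
    async def aload(self):
        await asyncio.sleep(0)  # yield to the event loop, as a real request would
        return [StubDocument("# Example Domain", {"source": "https://example.com"})]


async def main(loader):
    # `await` is valid here because main() is a coroutine.
    documents = await loader.aload()
    for doc in documents:
        print(doc.page_content[:100])
    return documents


# asyncio.run() creates the event loop, runs main() to completion, and closes the loop.
documents = asyncio.run(main(StubLoader()))
```

In real use, you would presumably pass the `loader` created in Basic Loading to `main()` instead of the stub.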
Lazy Loading
# Lazy loading
for doc in loader.lazy_load():
    print(doc.page_content[:100])
Async Lazy Loading
# Async lazy loading
async for doc in loader.alazy_load():
    print(doc.page_content[:100])
Configuration
The loader accepts the following parameters:
- url: The URL to crawl
- api_key: Your WebCrawlerAPI API key
- scrape_type: Type of scraping (html, cleaned, markdown)
- items_limit: Maximum number of pages to crawl
- whitelist_regexp: Regex pattern for URL whitelist
- blacklist_regexp: Regex pattern for URL blacklist
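Before spending crawl quota, it can help to check locally which URLs a pair of patterns would keep. The sketch below rests on assumptions that this page does not state: that patterns are matched anywhere in the URL, that a blacklist match always excludes a URL, and that a whitelist, when given, must match. The service's actual rules may differ. `is_allowed` is a hypothetical helper, not part of the package.

```python
import re


def is_allowed(url, whitelist_regexp=None, blacklist_regexp=None):
    # Assumed semantics (not confirmed by the service): blacklist wins,
    # and a whitelist, when provided, must match somewhere in the URL.
    if blacklist_regexp and re.search(blacklist_regexp, url):
        return False
    if whitelist_regexp and not re.search(whitelist_regexp, url):
        return False
    return True


urls = [
    "https://example.com/blog/post-1",
    "https://example.com/blog/tag/python",
    "https://example.com/pricing",
]

# Keep blog pages but skip tag listing pages.
kept = [
    u for u in urls
    if is_allowed(u, whitelist_regexp=r"/blog/", blacklist_regexp=r"/blog/tag/")
]
print(kept)
```

Once the patterns behave as intended on sample URLs, the same strings can be passed as `whitelist_regexp` and `blacklist_regexp` to the loader.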
Links
If you need help with the integration, feel free to contact us.
License
MIT License
File details
Details for the file webcrawlerapi_langchain-0.1.1.tar.gz.
File metadata
- Download URL: webcrawlerapi_langchain-0.1.1.tar.gz
- Upload date:
- Size: 4.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c846ee8fba565d524de91242f49172d08aa51b1308ed426370f410abb5c0292 |
| MD5 | a72c84c7a629cdf002a7bb48c99d9e11 |
| BLAKE2b-256 | 173ff92b54f6466d3ee2e2d2cc134d33725a40b2dcec82d618c1372b5e2191eb |
File details
Details for the file webcrawlerapi_langchain-0.1.1-py3-none-any.whl.
File metadata
- Download URL: webcrawlerapi_langchain-0.1.1-py3-none-any.whl
- Upload date:
- Size: 5.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e1d46fcc6f78e75f5b55d58fd79d853570a7d567d609d37aa0b82e3f1970dcea |
| MD5 | 73048d17d12a74b28497626bc4ac0795 |
| BLAKE2b-256 | c61e89b9712370a2a55d51dbe9ee96a59c08c09412c8a8bff24fa5da508d011e |