
LangChain integration for WebCrawlerAPI


WebCrawlerAPI LangChain Integration

WebCrawlerAPI is a website-to-LLM data API. It converts websites and web pages into markdown or cleaned content.

No subscription required.

This package provides a LangChain integration for WebCrawlerAPI, so you can use its web crawling capabilities in a LangChain document processing pipeline.

Installation

Get your API key first

pip install webcrawlerapi-langchain

Usage

Basic Loading

from webcrawlerapi_langchain import WebCrawlerAPILoader

# Initialize the loader
loader = WebCrawlerAPILoader(
    url="https://example.com",
    api_key="your-api-key",
    scrape_type="markdown",
    items_limit=10
)

# Load documents
documents = loader.load()

# Use documents in your LangChain pipeline
for doc in documents:
    print(doc.page_content[:100])
    print(doc.metadata)
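
To keep the API key out of source code, the loader arguments can be assembled from the environment. The following is a minimal sketch, not part of the package: the helper `loader_kwargs` and the environment variable name `WEBCRAWLERAPI_API_KEY` are assumptions made for illustration. Only the parameter names come from the example above.

```python
import os


def loader_kwargs(url, env=None, scrape_type="markdown", items_limit=10):
    """Build keyword arguments for WebCrawlerAPILoader (hypothetical helper)."""
    env = os.environ if env is None else env
    # WEBCRAWLERAPI_API_KEY is an assumed variable name, not one the package defines.
    api_key = env.get("WEBCRAWLERAPI_API_KEY")
    if not api_key:
        raise RuntimeError("Set WEBCRAWLERAPI_API_KEY before creating the loader")
    return {
        "url": url,
        "api_key": api_key,
        "scrape_type": scrape_type,
        "items_limit": items_limit,
    }
```

With the key exported in the shell, the loader could then be created as `WebCrawlerAPILoader(**loader_kwargs("https://example.com"))`.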

Async Loading

# Async loading (run this inside an async function or an async REPL)
documents = await loader.aload()
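
Because `await` is only valid inside a coroutine, a script that is otherwise synchronous needs an event loop to drive `aload()`. The sketch below shows one way to do that with the standard library. The helper names `load_documents` and `load_documents_sync` are hypothetical, and the code only assumes that the loader exposes an `aload()` coroutine as shown above.

```python
import asyncio


async def load_documents(loader):
    """Await the loader's aload() coroutine and return its documents."""
    return await loader.aload()


def load_documents_sync(loader):
    """Run load_documents from synchronous code (hypothetical helper)."""
    # asyncio.run starts an event loop, runs the coroutine, and closes the loop.
    # It raises RuntimeError if an event loop is already running in this thread.
    return asyncio.run(load_documents(loader))
```

In environments that already run an event loop, such as a notebook, call `await load_documents(loader)` directly instead of the synchronous wrapper.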

Lazy Loading

# Lazy loading
for doc in loader.lazy_load():
    print(doc.page_content[:100])
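
Lazy loading yields documents one at a time, so the consumer can stop early without building the full list. A minimal sketch using `itertools.islice` follows. The helper `first_documents` is hypothetical, and whether stopping early also reduces the number of pages the service crawls is not stated here, so `items_limit` remains the documented way to cap a crawl.

```python
from itertools import islice


def first_documents(loader, n):
    """Return at most n documents from loader.lazy_load() (hypothetical helper)."""
    # islice stops pulling from the iterator once n items have been taken.
    return list(islice(loader.lazy_load(), n))
```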

Async Lazy Loading

# Async lazy loading (run this inside an async function or an async REPL)
async for doc in loader.alazy_load():
    print(doc.page_content[:100])
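
The same early-stop idea works with the async iterator. The sketch below collects a bounded number of documents. The helper `first_documents_async` is hypothetical and only assumes that `alazy_load()` returns an async iterator, as in the snippet above.

```python
async def first_documents_async(loader, n):
    """Collect at most n documents from loader.alazy_load() (hypothetical helper)."""
    documents = []
    if n <= 0:
        return documents
    async for doc in loader.alazy_load():
        documents.append(doc)
        if len(documents) >= n:
            break  # stop iterating once enough documents were collected
    return documents
```

From synchronous code this could be driven with `asyncio.run(first_documents_async(loader, 5))`.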

Configuration

The loader accepts the following parameters:

  • url: The URL to crawl
  • api_key: Your WebCrawlerAPI API key
  • scrape_type: Type of scraping (html, cleaned, markdown)
  • items_limit: Maximum number of pages to crawl
  • whitelist_regexp: Regex pattern for URL whitelist
  • blacklist_regexp: Regex pattern for URL blacklist
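
Before starting a crawl, it can help to check locally which URLs a pair of patterns would keep. The sketch below is only a sanity check under stated assumptions: it uses Python's `re.search` semantics and applies the whitelist before the blacklist. The regex engine and matching rules the service actually uses are not described here and may differ. The helper `preview_urls` is hypothetical.

```python
import re


def preview_urls(urls, whitelist_regexp=None, blacklist_regexp=None):
    """Return the URLs that pass both patterns under re.search semantics."""
    allow = re.compile(whitelist_regexp) if whitelist_regexp else None
    deny = re.compile(blacklist_regexp) if blacklist_regexp else None
    kept = []
    for url in urls:
        if allow is not None and not allow.search(url):
            continue  # not matched by the whitelist
        if deny is not None and deny.search(url):
            continue  # matched by the blacklist
        kept.append(url)
    return kept
```

For example, `preview_urls(sample_urls, whitelist_regexp=r"/blog/", blacklist_regexp=r"/tag/")` keeps blog pages while dropping tag listings from a list of sample URLs.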

Links

If you need help with the integration, feel free to contact us.

License

MIT License
