
LangChain integration for WebCrawlerAPI


WebCrawlerAPI LangChain Integration

WebCrawlerAPI is a website-to-LLM data API. It converts websites and individual webpages into markdown or cleaned content.

No subscription required.

This package provides a LangChain integration for WebCrawlerAPI, so you can use its web crawling capabilities in a LangChain document processing pipeline.

Installation

Get your API key first, then install the package:

pip install webcrawlerapi-langchain

Usage

Basic Loading

from webcrawlerapi_langchain import WebCrawlerAPILoader

# Initialize the loader
loader = WebCrawlerAPILoader(
    url="https://example.com",
    api_key="your-api-key",
    scrape_type="markdown",
    items_limit=10
)

# Load documents
documents = loader.load()

# Use documents in your LangChain pipeline
for doc in documents:
    print(doc.page_content[:100])
    print(doc.metadata)

Async Loading

# Async loading (call this from inside an async function)
documents = await loader.aload()
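Because `await` only works inside a coroutine, a script needs an event loop to drive `aload()`. The sketch below shows one way to do that with `asyncio.run`. `FakeLoader` and `load_all` are hypothetical stand-ins so the example runs without an API key or network access; with the real package you would pass a `WebCrawlerAPILoader` instance instead.

```python
import asyncio


class FakeLoader:
    """Hypothetical stand-in for WebCrawlerAPILoader (no network calls)."""

    async def aload(self):
        await asyncio.sleep(0)  # pretend to wait on the crawl
        return ["first page", "second page"]


async def load_all(loader):
    # Works with any object that exposes an `aload()` coroutine.
    return await loader.aload()


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
documents = asyncio.run(load_all(FakeLoader()))
print(len(documents))
```

Inside an application that already runs an event loop (for example a notebook or an async web framework), call `await loader.aload()` directly rather than `asyncio.run`.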

Lazy Loading

# Lazy loading
for doc in loader.lazy_load():
    print(doc.page_content[:100])
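A lazy iterator is useful when you only want the first few documents. The sketch below uses `itertools.islice` to stop early. `fake_lazy_load` and `take` are hypothetical names introduced for illustration; whether the real loader fetches pages incrementally or all at once before yielding is not stated here, so treat the memory benefit as an assumption.

```python
from itertools import islice


def fake_lazy_load():
    """Hypothetical stand-in for loader.lazy_load(): yields one item at a time."""
    for i in range(1, 1001):
        yield f"page {i}"


def take(iterable, n):
    # islice stops pulling from the iterator once n items have been produced.
    return list(islice(iterable, n))


first_three = take(fake_lazy_load(), 3)
print(first_three)
```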

Async Lazy Loading

# Async lazy loading (call this from inside an async function)
async for doc in loader.alazy_load():
    print(doc.page_content[:100])
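The same early-stop idea applies to the async iterator. The sketch below collects at most `limit` items from an async generator and then breaks out of the loop. `fake_alazy_load` and `collect` are hypothetical helpers standing in for `loader.alazy_load()`, so the example runs without the package installed.

```python
import asyncio


async def fake_alazy_load():
    """Hypothetical stand-in for loader.alazy_load()."""
    for i in range(1, 6):
        await asyncio.sleep(0)  # yield control, as a real network call would
        yield f"page {i}"


async def collect(async_iterable, limit):
    # Gather up to `limit` items, then stop iterating.
    items = []
    if limit <= 0:
        return items
    async for item in async_iterable:
        items.append(item)
        if len(items) >= limit:
            break
    return items


first_two = asyncio.run(collect(fake_alazy_load(), 2))
print(first_two)
```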

Configuration

The loader accepts the following parameters:

  • url: The URL to crawl
  • api_key: Your WebCrawlerAPI API key
  • scrape_type: Type of scraping (html, cleaned, markdown)
  • items_limit: Maximum number of pages to crawl
  • whitelist_regexp: Regex pattern for URL whitelist
  • blacklist_regexp: Regex pattern for URL blacklist
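The parameters above can be checked locally before a crawl is started. The sketch below is a hypothetical helper, `build_loader_kwargs`, that is not part of the package: it validates `scrape_type` against the three listed values, rejects a non-positive `items_limit`, and makes sure the regex patterns compile. It checks patterns with Python's `re` module, which is an assumption; the service may use a different regex dialect.

```python
import re

# The three scrape types listed in the configuration section.
ALLOWED_SCRAPE_TYPES = {"html", "cleaned", "markdown"}


def build_loader_kwargs(url, api_key, scrape_type, items_limit,
                        whitelist_regexp=None, blacklist_regexp=None):
    """Hypothetical helper: validate settings and return loader keyword arguments."""
    if scrape_type not in ALLOWED_SCRAPE_TYPES:
        raise ValueError(f"scrape_type must be one of {sorted(ALLOWED_SCRAPE_TYPES)}")
    if items_limit < 1:
        raise ValueError("items_limit must be at least 1")

    kwargs = {
        "url": url,
        "api_key": api_key,
        "scrape_type": scrape_type,
        "items_limit": items_limit,
    }
    patterns = {
        "whitelist_regexp": whitelist_regexp,
        "blacklist_regexp": blacklist_regexp,
    }
    for name, pattern in patterns.items():
        if pattern is not None:
            re.compile(pattern)  # raises re.error if the pattern is malformed
            kwargs[name] = pattern
    return kwargs


kwargs = build_loader_kwargs(
    "https://example.com", "your-api-key", "markdown", 10,
    whitelist_regexp=r".*/blog/.*",
)
print(sorted(kwargs))
```

The resulting dictionary could then be passed on as `WebCrawlerAPILoader(**kwargs)`.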

Links

If you need help with the integration, feel free to contact us.

License

MIT License
