A LangChain integration tool that provides reliable web scraping capabilities at any scale using ZenRows' Universal Scraper API

langchain-zenrows

The langchain-zenrows integration tool enables LangChain agents to scrape and access web content at any scale using ZenRows' enterprise-grade infrastructure.

Whether you need to scrape JavaScript-heavy single-page applications, bypass anti-bot systems, access geo-restricted content, or extract structured data at scale, this integration provides the tools and reliability needed for modern AI applications.

Installation
Usage
API Reference
Features
License

Installation

pip install langchain-zenrows

Usage

To use the ZenRows Universal Scraper with LangChain, you'll need a ZenRows API key. You can sign up for free at ZenRows.

For more comprehensive examples and use cases, see the examples/ folder.

Basic Usage

import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

# Initialize the tool
scraper = ZenRowsUniversalScraper()

# Scrape a simple webpage
result = scraper.invoke({"url": "https://httpbin.io/html"})
print(result)

Advanced Usage with Parameters

import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Scrape with JavaScript rendering and premium proxies
result = scraper.invoke({
    "url": "https://www.scrapingcourse.com/ecommerce/",
    "js_render": True,
    "premium_proxy": True,
    "proxy_country": "us",
    "response_type": "markdown",
    "wait": 2000  # Wait 2 seconds after page load
})

print(result)

See the API Reference section below for more parameters and details on customizing scraping requests.
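
Because the request is a plain dictionary, it can be assembled and checked before it is sent. The helper below is a minimal sketch and not part of langchain-zenrows: `build_scrape_params` and its `wait_ms` argument are hypothetical names, and the checks only encode what the parameter table states (a two-letter country code, a wait expressed in milliseconds). Lowercasing the country code is a choice made here to match the example above.

```python
from typing import Any, Dict, Optional


def build_scrape_params(
    url: str,
    *,
    js_render: bool = False,
    premium_proxy: bool = False,
    proxy_country: Optional[str] = None,
    response_type: Optional[str] = None,
    wait_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a request dict for scraper.invoke(), omitting unset options."""
    params: Dict[str, Any] = {"url": url}
    if js_render:
        params["js_render"] = True
    if premium_proxy:
        params["premium_proxy"] = True
    if proxy_country is not None:
        # The parameter table describes proxy_country as a two-letter code.
        if len(proxy_country) != 2 or not proxy_country.isalpha():
            raise ValueError("proxy_country must be a two-letter country code")
        params["proxy_country"] = proxy_country.lower()
    if response_type is not None:
        params["response_type"] = response_type
    if wait_ms is not None:
        if wait_ms < 0:
            raise ValueError("wait must be a non-negative number of milliseconds")
        params["wait"] = wait_ms
    return params


params = build_scrape_params(
    "https://www.scrapingcourse.com/ecommerce/",
    js_render=True,
    premium_proxy=True,
    proxy_country="US",
    response_type="markdown",
    wait_ms=2000,
)
# result = scraper.invoke(params)  # same request as the example above
```

Failing on a malformed value locally avoids spending a request on input the API would reject anyway.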

Using with LangChain Agents

from langchain_zenrows import ZenRowsUniversalScraper
from langchain_openai import ChatOpenAI  # or your preferred LLM
from langgraph.prebuilt import create_react_agent
import os

# Set your ZenRows and OpenAI API keys
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"
os.environ["OPENAI_API_KEY"] = "<YOUR_OPEN_AI_API_KEY>"


# Initialize components
llm = ChatOpenAI(model="gpt-4o-mini")
zenrows_tool = ZenRowsUniversalScraper()

# Create agent
agent = create_react_agent(llm, [zenrows_tool])

# Use the agent
result = agent.invoke(
    {
        "messages": "Scrape https://news.ycombinator.com/ and list the top 3 stories with title, points, comments, username, and time."
    }
)

print("Agent Response:")
for message in result["messages"]:
    print(message.content)
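
The loop above prints every message in the conversation, which typically includes the prompt and the raw scraped content as well as the agent's reply. When only the final reply is wanted, the last message with text can be selected instead. This is a minimal sketch: `final_answer` is a hypothetical helper, and it assumes the list is in chronological order and that each item exposes a `.content` attribute, as the loop above implies. Stand-in objects replace real agent output so the snippet runs on its own.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def final_answer(messages: Iterable[Any]) -> Optional[str]:
    """Return the content of the last message that has non-empty text content."""
    for message in reversed(list(messages)):
        content = getattr(message, "content", None)
        if isinstance(content, str) and content.strip():
            return content
    return None


# Stand-in objects with a .content attribute, used instead of real agent output.
demo_messages = [
    SimpleNamespace(content="List the top 3 stories."),
    SimpleNamespace(content=""),  # e.g. a tool-call message with no text
    SimpleNamespace(content="<scraped page content>"),
    SimpleNamespace(content="Here are the top 3 stories."),
]
print(final_answer(demo_messages))  # Here are the top 3 stories.

# With a real run: print(final_answer(result["messages"]))
```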

CSS Extraction

Extract specific data using CSS selectors:

import json
import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Extract specific elements
css_selector = json.dumps({
    "title": "h1",
    "paragraphs": "p"
})

result = scraper.invoke({
    "url": "https://httpbin.io/html",
    "css_extractor": css_selector
})
print(result)
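
The extracted data usually needs to be read back into Python before it is useful. The sketch below assumes the tool returns a JSON object, as text or already parsed, keyed by the names given in `css_extractor`, with a single string for one match and a list for several. That shape is an assumption, not something this README states, and `parse_extraction` is a hypothetical helper. The sample string is made up for illustration.

```python
import json
from typing import Any, Dict, List


def parse_extraction(raw: Any) -> Dict[str, List[str]]:
    """Normalize a css_extractor result into {field: [values, ...]}."""
    data = json.loads(raw) if isinstance(raw, str) else raw
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object keyed by extractor field names")
    normalized: Dict[str, List[str]] = {}
    for field, value in data.items():
        if value is None:
            normalized[field] = []
        elif isinstance(value, list):
            normalized[field] = [str(item) for item in value]
        else:
            normalized[field] = [str(value)]
    return normalized


# Hypothetical response text, not real output from the API.
sample = '{"title": "Example heading", "paragraphs": ["First.", "Second."]}'
fields = parse_extraction(sample)
print(fields["title"])       # ['Example heading']
print(fields["paragraphs"])  # ['First.', 'Second.']
```

Normalizing every field to a list means downstream code does not have to branch on whether a selector matched one element or many.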

Premium Proxy with Geo-targeting

Access geo-restricted content:

import os
from langchain_zenrows import ZenRowsUniversalScraper

# Set your ZenRows API key
os.environ["ZENROWS_API_KEY"] = "<YOUR_ZENROWS_API_KEY>"

scraper = ZenRowsUniversalScraper()

# Check your IP location
result = scraper.invoke({
    "url": "https://httpbin.io/ip",
    "premium_proxy": True,
    "proxy_country": "us"
})
print(result)  # Shows the US IP being used
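
To use the reported address in code rather than only print it, the response can be parsed. This sketch assumes the endpoint answers with a JSON object holding an `origin` field, optionally suffixed with a port; that shape is an assumption about httpbin-style services, and `extract_origin_ip` is a hypothetical helper. The sample uses an address from a documentation-only range.

```python
import ipaddress
import json


def extract_origin_ip(raw: str) -> str:
    """Pull the IP address out of an httpbin-style {"origin": ...} response."""
    origin = json.loads(raw)["origin"]
    # Strip a trailing ":port" only for the IPv4 "host:port" form.
    host = origin.rsplit(":", 1)[0] if origin.count(":") == 1 else origin
    # ip_address() raises ValueError if the remainder is not a valid address.
    return str(ipaddress.ip_address(host))


# Hypothetical response body, not real output.
sample = '{"origin": "203.0.113.7:51234"}'
print(extract_origin_ip(sample))  # 203.0.113.7
```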

API Reference

ZenRowsUniversalScraper

Main tool class for web scraping with ZenRows.

Parameters:

zenrows_api_key (str, optional): Your ZenRows API key. If not provided, looks for ZENROWS_API_KEY environment variable.
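
The lookup order described above can be mirrored in application code to fail early with a clear message when no key is configured. This is a sketch of that documented behavior, not the library's own implementation; `resolve_api_key` is a hypothetical helper.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else ZENROWS_API_KEY from the environment."""
    key = explicit_key or os.environ.get("ZENROWS_API_KEY")
    if not key:
        raise RuntimeError(
            "No ZenRows API key found: pass one explicitly or set ZENROWS_API_KEY"
        )
    return key


# scraper = ZenRowsUniversalScraper(zenrows_api_key=resolve_api_key())
```

Reading the key from the environment or a secrets manager keeps it out of source files, unlike the placeholder assignments in the usage examples.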

Input Schema:

For complete parameter documentation and details, see the official ZenRows API Reference.

Parameter	Type	Description
`url`	str	Required. The URL to scrape
`js_render`	bool	Enable JavaScript rendering with a headless browser. Essential for modern web apps, SPAs, and sites with dynamic content (default: False)
`js_instructions`	str	Execute custom JavaScript on the page to interact with elements, scroll, click buttons, or manipulate content
`premium_proxy`	bool	Use residential IPs to bypass anti-bot protection. Essential for accessing protected sites (default: False)
`proxy_country`	str	Set the country of the IP used for the request. Use for accessing geo-restricted content. Two-letter country code
`session_id`	int	Maintain the same IP for multiple requests for up to 10 minutes. Essential for multi-step processes
`custom_headers`	dict	Include custom headers in your request to mimic browser behavior
`wait_for`	str	Wait for a specific CSS selector to appear in the DOM before returning content
`wait`	int	Wait a fixed number of milliseconds after page load
`block_resources`	str	Block specific resources (images, fonts, etc.) from loading to speed up scraping
`response_type`	str	Convert HTML to other formats. Options: "markdown", "plaintext", "pdf"
`css_extractor`	str	Extract specific elements using CSS selectors (JSON format)
`autoparse`	bool	Automatically extract structured data from HTML (default: False)
`screenshot`	str	Capture an above-the-fold screenshot of the page (default: "false")
`screenshot_fullpage`	str	Capture a full-page screenshot (default: "false")
`screenshot_selector`	str	Capture a screenshot of a specific element using a CSS selector
`screenshot_format`	str	Choose between "png" (default) and "jpeg" formats for screenshots
`screenshot_quality`	int	For JPEG format, set quality from 1 to 100. Lower values reduce file size but decrease quality
`original_status`	bool	Return the original HTTP status code from the target page (default: False)
`allowed_status_codes`	str	Return the content even if the target page fails with one of the specified status codes. Useful for debugging or when you need content from error pages
`json_response`	bool	Capture network requests in JSON format, including XHR or Fetch data. Ideal for intercepting API calls made by the web page (default: False)
`outputs`	str	Specify which data types to extract from the scraped HTML. Accepted values: emails, phone_numbers, headings, images, audios, videos, links, menus, hashtags, metadata, tables, favicon
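
For editor autocompletion and static type checking, part of the table above can be written down as a typed dictionary. This is a hypothetical sketch covering only a subset of the parameters: `ScrapeRequest` and `join_outputs` are not part of langchain-zenrows, and joining several `outputs` values with commas is an assumption, since the table gives only the type `str` and the accepted names.

```python
from typing import Iterable, TypedDict

# Accepted values copied from the `outputs` row of the table above.
ACCEPTED_OUTPUTS = frozenset({
    "emails", "phone_numbers", "headings", "images", "audios", "videos",
    "links", "menus", "hashtags", "metadata", "tables", "favicon",
})


class _RequiredFields(TypedDict):
    url: str


class ScrapeRequest(_RequiredFields, total=False):
    """Hypothetical typed view of a subset of the input schema."""
    js_render: bool
    premium_proxy: bool
    proxy_country: str
    wait: int
    wait_for: str
    response_type: str
    css_extractor: str
    outputs: str


def join_outputs(names: Iterable[str]) -> str:
    """Validate names against the accepted list and join them with commas."""
    selected = list(names)
    unknown = sorted(set(selected) - ACCEPTED_OUTPUTS)
    if unknown:
        raise ValueError(f"unsupported outputs value(s): {', '.join(unknown)}")
    return ",".join(selected)


request: ScrapeRequest = {
    "url": "https://httpbin.io/html",
    "outputs": join_outputs(["headings", "links"]),
}
print(request["outputs"])  # headings,links
```

A `TypedDict` is an ordinary `dict` at runtime, so `request` can be passed to `scraper.invoke()` unchanged.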

Features

JavaScript Rendering: Scrape modern SPAs and dynamic content
Anti-Bot Bypass: Bypass sophisticated bot detection systems
Geo-Targeting: Access region-specific content from 190+ countries
Multiple Output Formats: HTML, Markdown, Plaintext, PDF, Screenshots
CSS Extraction: Target specific data with CSS selectors
Structured Data Extraction: Automatically extract emails, phone numbers, links, and other data types
Session Management: Maintain consistent sessions across requests
Wait Conditions: Smart waiting for dynamic content
Premium Proxies: 55M+ residential IPs for maximum success rates

License

langchain-zenrows is distributed under the terms of the MIT license.


This version

0.1.0

Jun 16, 2025

Download files

Source Distribution

langchain_zenrows-0.1.0.tar.gz (9.8 kB view details)

Uploaded Jun 16, 2025 Source

Built Distribution

langchain_zenrows-0.1.0-py3-none-any.whl (9.7 kB view details)

Uploaded Jun 16, 2025 Python 3

File details

Details for the file langchain_zenrows-0.1.0.tar.gz.

File metadata

Download URL: langchain_zenrows-0.1.0.tar.gz
Upload date: Jun 16, 2025
Size: 9.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.4

File hashes

Hashes for langchain_zenrows-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5824abe9e057a966dd75f5fbdec095b039e2930a96d62bdb22d65163058c27cc`
MD5	`edf43fd06410619988f9f5a5a06ad645`
BLAKE2b-256	`b88ee1c27ce1b24c5658abec4515d9cd9d5ebb50dd6e991d107ccc12d6be1ef4`

File details

Details for the file langchain_zenrows-0.1.0-py3-none-any.whl.

File metadata

Download URL: langchain_zenrows-0.1.0-py3-none-any.whl
Upload date: Jun 16, 2025
Size: 9.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.4

File hashes

Hashes for langchain_zenrows-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`dd940f912a1085eae970bfed9c96f17e20bd6c25021abdf5ede54d05411cb145`
MD5	`bb62a8b6413cac5ca9d60aae6a7ce8de`
BLAKE2b-256	`c4518c0b61489d9d81c4e92a9a15942596453b2ef1ab80a4d9158e2a110ec6e3`
