

LangChain Olostep Integration

A powerful LangChain/LangGraph integration for the Olostep web scraping API. Build intelligent agents that can scrape, analyze, and extract data from any website.

Features

  • Web Scraping: Extract content from any website with JavaScript rendering support
  • Batch Processing: Scrape up to 100,000 URLs in parallel
  • AI-Powered Q&A: Ask questions about websites and get intelligent answers
  • Data Extraction: Extract specific fields using AI-powered mapping
  • Multiple Formats: Support for Markdown, HTML, JSON, and plain text
  • Specialized Parsers: Use custom parsers for specific websites (e.g., Amazon, LinkedIn)
  • Location-Specific: Scrape with country-specific settings
  • LangGraph Ready: Perfect for building complex AI agent workflows

Installation

pip install langchain-olostep

Setup

Set your Olostep API key:

export OLOSTEP_API_KEY="your_olostep_api_key_here"

Get your API key from https://olostep.com/dashboard
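
If a shell export is inconvenient (for example, in a notebook), you can also set the key from Python before using the tools. This uses only the standard library:

```python
import os

# Set the key programmatically; equivalent to the shell export above
os.environ["OLOSTEP_API_KEY"] = "your_olostep_api_key_here"
```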

Quick Start

Basic Web Scraping

from langchain_olostep import scrape_website
import asyncio

# Scrape a website
content = asyncio.run(scrape_website.ainvoke({
    "url": "https://example.com",
    "format": "markdown"
}))

print(content)
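
To fetch a handful of pages concurrently without the batch endpoint, plain asyncio.gather works with the tool's ainvoke. The helper below is generic asyncio, not a package API:

```python
import asyncio

async def gather_pages(fetch, urls):
    """Run one async fetch per URL concurrently; return {url: result}."""
    results = await asyncio.gather(*(fetch(u) for u in urls))
    return dict(zip(urls, results))

# Usage (assumes OLOSTEP_API_KEY is set):
# pages = asyncio.run(gather_pages(
#     lambda u: scrape_website.ainvoke({"url": u, "format": "markdown"}),
#     ["https://example.com", "https://example.org"]))
```

For more than a few dozen URLs, prefer scrape_batch, which parallelizes on the server side.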

With LangChain Agent

from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI
from langchain_olostep import scrape_website, scrape_with_answer

# Create agent with Olostep tools
tools = [scrape_website, scrape_with_answer]
llm = ChatOpenAI(model="gpt-4o-mini")

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

# Use the agent
result = agent.run("""
Scrape https://example.com and tell me:
1. What is the main content about?
2. Extract any contact information
""")

print(result)

With LangGraph

from langgraph.graph import StateGraph, END
from langchain_olostep import scrape_website, scrape_batch
from langchain_openai import ChatOpenAI

# Build a research agent workflow
workflow = StateGraph(dict)

def scrape_node(state):
    urls = state["urls"]
    result = scrape_batch.invoke({"urls": urls})
    return {"scraped_data": result}

workflow.add_node("scrape", scrape_node)
workflow.set_entry_point("scrape")
workflow.add_edge("scrape", END)
# ... add more nodes and edges, then compile with workflow.compile()

Available Tools

1. scrape_website

Scrape content from any website.

from langchain_olostep import scrape_website

result = await scrape_website.ainvoke({
    "url": "https://example.com",
    "format": "markdown",  # markdown, html, json, or text
    "country": "US",  # Optional: country code for location-specific content
    "wait_before_scraping": 2000,  # Optional: wait time in ms for JS rendering
    "parser": "@olostep/amazon-product"  # Optional: specialized parser
})

Perfect for:

  • Extracting article content
  • Scraping dynamic websites
  • Bypassing anti-scraping measures
  • Getting clean, formatted content
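
Scrapes can fail transiently (timeouts, rate limits), so it is often worth retrying. The wrapper below is a generic asyncio sketch, not part of langchain-olostep:

```python
import asyncio

async def with_retries(coro_factory, attempts=3, delay=1.0):
    """Call an async factory up to `attempts` times, sleeping between tries."""
    for attempt in range(attempts):
        try:
            return await coro_factory()
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(delay)

# Usage (assumes OLOSTEP_API_KEY is set):
# content = asyncio.run(with_retries(
#     lambda: scrape_website.ainvoke({"url": "https://example.com"})))
```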

2. scrape_batch

Scrape multiple URLs in parallel.

from langchain_olostep import scrape_batch

urls = [
    "https://example1.com",
    "https://example2.com",
    "https://example3.com"
]

result = await scrape_batch.ainvoke({
    "urls": urls,
    "format": "markdown"
})

Perfect for:

  • Competitive analysis
  • Large-scale data collection
  • Building datasets
  • Monitoring multiple sources
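
For very large URL lists, you may prefer to submit them in fixed-size chunks rather than one giant call. A generic chunking helper (not part of the package) might look like:

```python
def chunked(items, size):
    """Yield fixed-size slices of a list (the last slice may be shorter)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# e.g. submit 1,000 URLs at a time:
# for batch in chunked(urls, 1000):
#     result = await scrape_batch.ainvoke({"urls": batch, "format": "markdown"})
```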

3. scrape_with_answer

Ask questions about website content and get AI-powered answers.

from langchain_olostep import scrape_with_answer

result = await scrape_with_answer.ainvoke({
    "url": "https://company.com",
    "question": "What is the company's main product and its pricing?"
})

Perfect for:

  • Research and information extraction
  • Competitive intelligence
  • Lead generation
  • Content analysis

4. scrape_with_map

Extract specific fields using AI-powered mapping.

from langchain_olostep import scrape_with_map

result = await scrape_with_map.ainvoke({
    "url": "https://store.com/product/123",
    "fields": ["product_name", "price", "rating", "description"]
})

Perfect for:

  • Structured data extraction
  • Product information gathering
  • Contact details extraction
  • E-commerce data collection

Examples

Example 1: Research Agent

from langchain_olostep import scrape_website, scrape_with_answer
from langchain.agents import initialize_agent, AgentType
from langchain_openai import ChatOpenAI

tools = [scrape_website, scrape_with_answer]
llm = ChatOpenAI(model="gpt-4o-mini")

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION
)

# Research a topic
result = agent.run("""
Research the latest developments in AI by:
1. Scraping https://openai.com/blog
2. Extracting key announcements
3. Summarizing the findings
""")

Example 2: Competitive Analysis

import asyncio
from langchain_olostep import scrape_batch, scrape_with_map

# Scrape competitor websites
competitors = [
    "https://competitor1.com/pricing",
    "https://competitor2.com/pricing",
    "https://competitor3.com/pricing"
]

async def analyze_competitors():
    batch_result = await scrape_batch.ainvoke({"urls": competitors})

    # Extract pricing information
    for url in competitors:
        pricing = await scrape_with_map.ainvoke({
            "url": url,
            "fields": ["pricing_tiers", "features", "prices"]
        })
        print(f"Competitor: {url}")
        print(f"Pricing: {pricing}")

asyncio.run(analyze_competitors())

Example 3: Content Monitoring

from langchain_olostep import scrape_website
import asyncio
import schedule
import time

def monitor_website():
    # schedule calls synchronous functions, so drive the async tool with asyncio.run
    content = asyncio.run(scrape_website.ainvoke({
        "url": "https://important-site.com",
        "format": "markdown"
    }))
    
    # Check for changes, send alerts, etc.
    # ... your logic here

# Run every hour
schedule.every().hour.do(monitor_website)

while True:
    schedule.run_pending()
    time.sleep(1)
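
The "check for changes" step above can be sketched by hashing each snapshot and comparing it to the previous one. The helper below is a hypothetical sketch using only the standard library, not a package API:

```python
import hashlib

def content_changed(previous_hash, content):
    """Return (changed, new_hash) for a page snapshot versus the last seen hash."""
    new_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    changed = previous_hash is not None and previous_hash != new_hash
    return changed, new_hash

# The first run establishes a baseline; later runs flag differences
changed, last_hash = content_changed(None, "# Page v1")
```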

Example 4: LangGraph Research Workflow

See the complete example in the examples directory.

from langgraph.graph import StateGraph, END
from langchain_olostep import scrape_website, scrape_with_answer

# Define your research workflow
workflow = StateGraph(dict)

# Add nodes for different stages (plan_research, scrape_content, analyze_data,
# and generate_report are node functions defined in the full example)
workflow.add_node("plan", plan_research)
workflow.add_node("scrape", scrape_content)
workflow.add_node("analyze", analyze_data)
workflow.add_node("report", generate_report)

# Connect the nodes
workflow.set_entry_point("plan")
workflow.add_edge("plan", "scrape")
workflow.add_edge("scrape", "analyze")
workflow.add_edge("analyze", "report")
workflow.add_edge("report", END)

# Compile and run
agent = workflow.compile()
result = agent.invoke({"query": "Research AI developments"})

Advanced Features

JavaScript Rendering

Handle dynamic websites that load content via JavaScript:

result = await scrape_website.ainvoke({
    "url": "https://dynamic-site.com",
    "wait_before_scraping": 3000  # Wait 3 seconds
})

Location-Specific Scraping

Get content as it appears in different countries:

result = await scrape_website.ainvoke({
    "url": "https://example.com",
    "country": "GB"  # Scrape as viewed from UK
})

Specialized Parsers

Use pre-built parsers for specific websites:

# Amazon product parser
product = await scrape_website.ainvoke({
    "url": "https://amazon.com/product/xyz",
    "parser": "@olostep/amazon-product"
})

# LinkedIn profile parser
profile = await scrape_website.ainvoke({
    "url": "https://linkedin.com/in/username",
    "parser": "@olostep/linkedin-profile"
})

Multiple Output Formats

Get content in different formats:

# Get markdown for readability
markdown = await scrape_website.ainvoke({
    "url": "https://example.com",
    "format": "markdown"
})

# Get JSON for structured data
json_data = await scrape_website.ainvoke({
    "url": "https://example.com",
    "format": "json"
})

# Get HTML for full page structure
html = await scrape_website.ainvoke({
    "url": "https://example.com",
    "format": "html"
})

Configuration

Environment Variables

  • OLOSTEP_API_KEY: Your Olostep API key (required)

Tool Parameters

All tools accept an optional api_key parameter:

result = await scrape_website.ainvoke({
    "url": "https://example.com",
    "api_key": "your_api_key_here"  # Override environment variable
})

Use Cases

Research & Analysis

  • Market research
  • Competitive intelligence
  • Academic research
  • News monitoring

Data Collection

  • Building datasets
  • Product information gathering
  • Price monitoring
  • Contact information extraction

AI Agents

  • Research assistants
  • Data extraction bots
  • Content analyzers
  • Web automation agents

Business Intelligence

  • Competitor tracking
  • Lead generation
  • Market analysis
  • Trend monitoring

Getting Started

  1. Install the package

    pip install langchain-olostep
    
  2. Get your API key

    • Sign up at olostep.com
    • Get your API key from the dashboard
  3. Set your API key

    export OLOSTEP_API_KEY="your_key_here"
    
  4. Try the examples: check out the examples directory in the repository


Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.


Why Olostep?

  • Reliable: Handle JavaScript rendering, anti-scraping measures, and dynamic content
  • Fast: Parallel processing for batch operations
  • Accurate: AI-powered extraction for precise data gathering
  • Flexible: Multiple formats, parsers, and configuration options
  • Scalable: From single URLs to 100,000+ URLs in batch

Changelog

0.2.0

  • Complete redesign focusing on Olostep's core features
  • Added scrape_with_answer for AI-powered Q&A
  • Added scrape_with_map for structured data extraction
  • Removed confusing document loader terminology
  • Improved tool descriptions and examples
  • Added comprehensive LangGraph example

0.1.0

  • Initial release
