ScraperAPI LlamaIndex Tools Integration
This tool connects to ScraperAPI, a web scraping API that handles proxies, browsers, and CAPTCHAs, enabling your LlamaIndex agent to scrape web pages and extract structured data from Amazon, Google, eBay, Walmart, and Redfin.
Installation
```bash
pip install llama-index-tools-scraperapi
```
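Every snippet below reads the API key from the `SCRAPERAPI_API_KEY` environment variable. A tiny helper like the one sketched here (not part of the package) can fail fast with a clearer message when the variable is unset:

```python
import os


def load_scraperapi_key(env_var: str = "SCRAPERAPI_API_KEY") -> str:
    """Return the ScraperAPI key from the environment, or fail with a clear error."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before constructing ScraperAPIToolSpec"
        )
    return key
```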
Usage
```python
import asyncio
import os

from llama_index.tools.scraperapi import ScraperAPIToolSpec
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.llms.openai import OpenAI


async def main():
    scraper_tool = ScraperAPIToolSpec(
        api_key=os.environ["SCRAPERAPI_API_KEY"],
    )
    agent = FunctionAgent(
        tools=scraper_tool.to_tool_list(),
        llm=OpenAI(model="gpt-4.1"),
    )
    response = await agent.run(
        "Scrape https://example.com and summarize the content"
    )
    print(response)


asyncio.run(main())
```
Scrape a Web Page
```python
import os

from llama_index.tools.scraperapi import ScraperAPIToolSpec

tool = ScraperAPIToolSpec(api_key=os.environ["SCRAPERAPI_API_KEY"])

# Returns markdown content by default
content = tool.scrape("https://example.com")
print(content)

# Get plain text instead
content = tool.scrape("https://example.com", output_format="text")

# Enable JS rendering for dynamic pages
content = tool.scrape("https://example.com", render=True)
```
Amazon
```python
# Product details by ASIN
product = tool.amazon_product(asin="B07FTKQ97Q")

# Search products
results = tool.amazon_search(query="wireless headphones")

# All seller offers for a product
offers = tool.amazon_offers(asin="B07FTKQ97Q")
```
Google
```python
# Web search (structured SERP)
results = tool.google_search(query="Python web scraping tutorial")

# Shopping results
shopping = tool.google_shopping(query="laptop")

# News articles
news = tool.google_news(query="AI", tbs="w")  # past week

# Maps / places search
places = tool.google_maps_search(query="pizza", latitude=40.7128, longitude=-74.0060)

# Job listings
jobs = tool.google_jobs(query="python developer", gl="us")
```
eBay
```python
# Product details
product = tool.ebay_product(product_id="166619046796")

# Search with filters
results = tool.ebay_search(query="vintage watch", sort_by="price_lowest", condition="used")
```
Walmart
```python
# Product details
product = tool.walmart_product(product_id="5253396052")

# Search
results = tool.walmart_search(query="laptop")

# Browse category
items = tool.walmart_category(category="3944_1089430_37807")

# Product reviews
reviews = tool.walmart_reviews(product_id="5253396052", sort="helpful")
```
Redfin
```python
# Search listings
listings = tool.redfin_search(url="https://www.redfin.com/city/30749/CA/San-Francisco")

# Agent details
agent = tool.redfin_agent(url="https://www.redfin.com/real-estate-agents/agent-name")

# For-sale listing
listing = tool.redfin_forsale(url="https://www.redfin.com/CA/San-Francisco/123-Main-St")

# For-rent listing
rental = tool.redfin_forrent(url="https://www.redfin.com/CA/San-Francisco/456-Oak-Ave")
```
Geo-targeted Scraping
```python
tool = ScraperAPIToolSpec(
    api_key=os.environ["SCRAPERAPI_API_KEY"],
    country_code="uk",
)

# All requests will use UK proxies by default
content = tool.scrape("https://example.co.uk")

# Override per request
content = tool.scrape("https://example.de", country_code="de")
```
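The override semantics can be pictured as a dict merge where per-request arguments win over constructor defaults. The sketch below only illustrates that behavior; it is not the library's actual internals:

```python
def build_request_params(url, defaults, **overrides):
    """Merge constructor defaults with per-request overrides (overrides win).

    Illustrative only; the real ScraperAPIToolSpec may assemble requests differently.
    """
    merged = {**defaults, **{k: v for k, v in overrides.items() if v is not None}}
    merged["url"] = url
    return merged


defaults = {"country_code": "uk", "render": False}

# No override: the default country code applies
build_request_params("https://example.co.uk", defaults)

# Per-request override: country_code becomes "de"
build_request_params("https://example.de", defaults, country_code="de")
```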
Available Tools
Scraping:
- scrape: Scrape any web page and return content as markdown, text, or JSON.

Amazon (Structured Data):
- amazon_product: Get product details by ASIN.
- amazon_search: Search Amazon products.
- amazon_offers: Get all seller offers for a product.

Google (Structured Data):
- google_search: Google SERP search results.
- google_shopping: Google Shopping product results.
- google_news: Google News articles.
- google_maps_search: Google Maps places search.
- google_jobs: Google Jobs listings.

eBay (Structured Data):
- ebay_product: Get product details by product ID.
- ebay_search: Search eBay listings.

Redfin (Structured Data):
- redfin_search: Search Redfin listings.
- redfin_agent: Get agent profile details.
- redfin_forsale: Get for-sale listing details.
- redfin_forrent: Get for-rent listing details.

Walmart (Structured Data):
- walmart_product: Get product details by product ID.
- walmart_search: Search Walmart products.
- walmart_category: Browse a Walmart category.
- walmart_reviews: Get product reviews.
Error Handling
All API errors raise ScraperAPIError, so you can handle them specifically:
```python
import os

from llama_index.tools.scraperapi import ScraperAPIToolSpec, ScraperAPIError

tool = ScraperAPIToolSpec(api_key=os.environ["SCRAPERAPI_API_KEY"])

try:
    result = tool.scrape("https://example.com")
except ScraperAPIError as e:
    print(f"Scraping failed: {e}")
```
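Scraping errors are often transient (timeouts, blocked requests), so a retry wrapper with exponential backoff can be useful. The helper below is generic Python, not part of the package; when using it with the tool, pass ScraperAPIError as the exception type:

```python
import time


def call_with_retry(fn, *args, retries=3, base_delay=1.0, exc_types=(Exception,), **kwargs):
    """Call fn, retrying on the given exception types with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except exc_types:
            if attempt == retries - 1:
                raise  # out of retries; surface the last error
            time.sleep(base_delay * (2 ** attempt))


# With the tool (sketch):
# content = call_with_retry(tool.scrape, "https://example.com", exc_types=(ScraperAPIError,))
```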
Configuration
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | required | ScraperAPI key |
| render | bool | False | Enable JS rendering by default |
| country_code | str | None | Default geo-targeting country code |
| device_type | str | None | "desktop" or "mobile" |
| timeout | int | 70 | Request timeout in seconds |