Firecrawl Tools
A comprehensive collection of async tools for web scraping, searching, and data extraction using the Firecrawl API. Built with LangChain for seamless integration with AI applications.
Features
- URL Scraping: Extract content from single URLs with multiple format options
- Web Search: Search the web and optionally scrape search results
- Website Mapping: Discover all indexed URLs on a website
- Structured Data Extraction: Extract specific information using LLM capabilities
- Deep Research: Conduct comprehensive web research with intelligent crawling
- Website Crawling: Asynchronous crawling of entire websites
- Crawl Status Monitoring: Track and manage crawl jobs
Installation
pip install firecrawl-tools
Quick Start
import asyncio
from firecrawl_tools import FirecrawlTools
async def main():
    # Initialize with your API key
    tools = FirecrawlTools(api_key="your_firecrawl_api_key")
    # Get individual tools
    scrape_tool = await tools.get_scrape_tool()
    search_tool = await tools.get_search_tool()
    # Use the tools
    content = await scrape_tool.ainvoke({
        "url": "https://example.com",
        "formats": ["markdown"],
        "only_main_content": True
    })
    print(content)
# "await" is only valid inside a coroutine, so run the example through asyncio.run()
asyncio.run(main())
Available Tools
1. URL Scraping Tool
Extract content from a single URL with advanced options.
scrape_tool = await tools.get_scrape_tool()
result = await scrape_tool.ainvoke({
    "url": "https://example.com",
    "formats": ["markdown", "html"],
    "only_main_content": True,
    "wait_for": 2000,
    "mobile": False
})
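Because the tools are async, several URLs can be scraped concurrently. The following is a minimal sketch, assuming only that the scrape tool exposes `ainvoke` as shown above. `StubScrapeTool`, `scrape_many`, and `max_concurrency` are hypothetical names introduced for illustration; the stub stands in for the real tool so the example runs without an API key.

```python
import asyncio


class StubScrapeTool:
    """Hypothetical stand-in for the real scrape tool; it only mimics ainvoke()."""

    async def ainvoke(self, payload):
        await asyncio.sleep(0)  # yield control, as a real network call would
        return f"content of {payload['url']}"


async def scrape_many(tool, urls, max_concurrency=3):
    """Scrape several URLs concurrently, capping the number of in-flight requests."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def scrape_one(url):
        async with semaphore:
            return await tool.ainvoke({"url": url, "formats": ["markdown"]})

    # gather() returns results in the same order as the input URLs
    return await asyncio.gather(*(scrape_one(url) for url in urls))


if __name__ == "__main__":
    pages = asyncio.run(
        scrape_many(
            StubScrapeTool(),
            ["https://example.com/a", "https://example.com/b"],
        )
    )
    print(pages)
```

With the real library, the object returned by `tools.get_scrape_tool()` would take the place of `StubScrapeTool()`. The semaphore keeps the number of simultaneous requests bounded, which may help with API rate limits.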
2. Web Search Tool
Search the web and optionally extract content from results.
search_tool = await tools.get_search_tool()
results = await search_tool.ainvoke({
    "query": "Python web scraping",
    "limit": 5,
    "scrape_options": {
        "formats": ["markdown"],
        "onlyMainContent": True
    }
})
3. Website Mapping Tool
Discover all indexed URLs on a website.
map_tool = await tools.get_map_tool()
urls = await map_tool.ainvoke({
    "url": "https://example.com",
    "include_subdomains": True,
    "limit": 100
})
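A mapped site often yields more URLs than you need, so a small post-processing step can narrow the list before scraping. This sketch assumes the map tool's result can be treated as a list of URL strings, which the README does not confirm. `filter_by_path` is a hypothetical helper, not part of the library.

```python
from urllib.parse import urlparse


def filter_by_path(urls, path_prefix):
    """Keep URLs whose path starts with path_prefix, dropping exact duplicates."""
    seen = set()
    kept = []
    for url in urls:
        if url in seen:
            continue
        seen.add(url)
        if urlparse(url).path.startswith(path_prefix):
            kept.append(url)
    return kept


if __name__ == "__main__":
    # ASSUMPTION: sample data shaped like a list of URL strings
    sample = [
        "https://example.com/blog/a",
        "https://example.com/about",
        "https://example.com/blog/a",
        "https://example.com/blog/b",
    ]
    print(filter_by_path(sample, "/blog/"))
```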
4. Structured Data Extraction Tool
Extract specific information using LLM capabilities.
extract_tool = await tools.get_extract_tool()
data = await extract_tool.ainvoke({
    "urls": ["https://example.com"],
    "prompt": "Extract all product names and prices",
    "schema": {
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"}
                    }
                }
            }
        }
    }
})
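LLM-based extraction may not always return data in the requested shape, so it can be worth checking the result before using it. The sketch below is a deliberately tiny checker for the schema subset used above (`object`, `array`, `string`, `properties`, `items`). It is not a full JSON Schema validator, and it assumes the extracted data is available as a plain Python dict, which the README does not specify. `conforms` is a hypothetical helper.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "products": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                },
            },
        }
    },
}

# Only the JSON Schema types that appear in the example schema
_TYPES = {"object": dict, "array": list, "string": str}


def conforms(value, schema):
    """Return True if value matches the small JSON Schema subset handled here."""
    expected = _TYPES.get(schema.get("type"))
    if expected is not None and not isinstance(value, expected):
        return False
    if isinstance(value, dict):
        for key, sub_schema in schema.get("properties", {}).items():
            # Properties are treated as optional; only present keys are checked
            if key in value and not conforms(value[key], sub_schema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True


if __name__ == "__main__":
    # ASSUMPTION: sample data standing in for the tool's output
    good = {"products": [{"name": "Widget", "price": "$5"}]}
    bad = {"products": [{"name": "Widget", "price": 5}]}
    print(conforms(good, SCHEMA), conforms(bad, SCHEMA))
```

For production use, a maintained JSON Schema validator would be a more robust choice than this hand-rolled check.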
5. Deep Research Tool
Conduct comprehensive web research.
research_tool = await tools.get_research_tool()
analysis = await research_tool.ainvoke({
    "query": "Latest developments in AI",
    "max_depth": 3,
    "time_limit": 120,
    "max_urls": 50
})
6. Website Crawling Tool
Crawl entire websites asynchronously.
crawl_tool = await tools.get_crawl_tool()
job_id = await crawl_tool.ainvoke({
    "url": "https://example.com",
    "max_depth": 2,
    "limit": 100,
    "allow_external_links": False
})
7. Crawl Status Tool
Check the status of crawl jobs.
status_tool = await tools.get_status_tool()
status = await status_tool.ainvoke({
    "crawl_id": "your_crawl_job_id"
})
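Since crawling is asynchronous, a caller typically polls the status tool until the job finishes. The following sketch shows one way to do that. It assumes the status result is a dict with a `"status"` string and that `"completed"`, `"failed"`, and `"cancelled"` are the terminal values; neither is confirmed by this README. `wait_for_crawl` and `make_stub_checker` are hypothetical names, and the stub replaces the real tool so the example runs offline.

```python
import asyncio

# ASSUMPTION: these status strings are illustrative, not confirmed by the library
TERMINAL_STATES = {"completed", "failed", "cancelled"}


async def wait_for_crawl(check_status, crawl_id, interval=2.0, max_checks=30):
    """Poll check_status until the job reaches a terminal state or checks run out."""
    for _ in range(max_checks):
        status = await check_status({"crawl_id": crawl_id})
        if status.get("status") in TERMINAL_STATES:
            return status
        await asyncio.sleep(interval)
    raise TimeoutError(f"crawl {crawl_id} did not finish after {max_checks} checks")


def make_stub_checker(states):
    """Build a fake status checker that reports the given states in order."""
    remaining = iter(states)

    async def check(payload):
        return {"status": next(remaining)}

    return check


if __name__ == "__main__":
    checker = make_stub_checker(["scraping", "scraping", "completed"])
    print(asyncio.run(wait_for_crawl(checker, "job-1", interval=0)))
```

With the real library, `status_tool.ainvoke` could be passed as `check_status`, provided its return value matches the assumed shape.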
ReAct Agent Integration
Firecrawl Tools work seamlessly with LangChain's ReAct agents, allowing you to build intelligent applications that automatically choose the right tool for each task.
Basic ReAct Agent Setup
import asyncio
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from firecrawl_tools import FirecrawlTools
async def create_react_agent():
    # Initialize Firecrawl tools
    tools = FirecrawlTools(api_key="your_firecrawl_api_key")
    tools_dict = await tools.get_tools_dict()
    tool_list = list(tools_dict.values())
    # Initialize OpenAI LLM
    llm = ChatOpenAI(
        openai_api_key="your_openai_api_key",
        temperature=0,
        model="gpt-4o-mini"
    )
    # Create ReAct agent
    agent = initialize_agent(
        tool_list,
        llm,
        agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
        verbose=True,
        max_iterations=5,
        handle_parsing_errors=True,
    )
    return agent
async def main():
    # Use the agent
    agent = await create_react_agent()
    result = await agent.ainvoke(
        "Find the main topic of https://example.com and summarize it in 2 sentences."
    )
    print(result)
# "await" is only valid inside a coroutine, so run the example through asyncio.run()
asyncio.run(main())
Example Queries
The ReAct agent can handle various natural language queries:
- "What are the latest news headlines on cricbuzz.com?"
- "Extract all product names and prices from https://example.com"
- "Search for information about Python web scraping and provide a summary."
- "Map all URLs on https://example.com and list the top 5 pages."
The agent automatically chooses the appropriate Firecrawl tool (scrape, search, extract, map, etc.) based on the query.
Complete Example
See examples/react_agent_example.py for a complete working example with multiple queries and error handling.
Configuration
You can supply the API key through an environment variable or pass it directly to the constructor:
# Option 1: set an environment variable (shell)
export FIRECRAWL_API_KEY="your_api_key"
# Option 2: pass the key directly (Python)
tools = FirecrawlTools(api_key="your_api_key")
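If you prefer to make the key lookup explicit in your own code, a small helper can combine both options. This is a sketch only: `resolve_api_key` is a hypothetical helper, and the precedence it applies (explicit argument first, then `FIRECRAWL_API_KEY`) is an assumption of this example, not documented library behavior.

```python
import os


def resolve_api_key(explicit_key=None, env=None):
    """Return the explicit key if given, else FIRECRAWL_API_KEY from the environment."""
    env = os.environ if env is None else env
    key = explicit_key or env.get("FIRECRAWL_API_KEY")
    if not key:
        raise RuntimeError("Set FIRECRAWL_API_KEY or pass api_key explicitly")
    return key


if __name__ == "__main__":
    # A plain dict stands in for os.environ so the example has no side effects
    print(resolve_api_key(env={"FIRECRAWL_API_KEY": "key-from-env"}))
```

The resolved value could then be passed as `FirecrawlTools(api_key=resolve_api_key())`, which fails early with a clear message when no key is configured.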
Error Handling
Each tool raises a ToolException with a descriptive message when a request fails:
from langchain_core.tools import ToolException
try:
    result = await scrape_tool.ainvoke({"url": "https://example.com"})
except ToolException as e:
    print(f"Error: {e}")
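Some failures, such as timeouts or rate limits, may be transient, in which case retrying with a growing delay can help. The sketch below illustrates exponential backoff around `ainvoke`. Whether a given error is worth retrying is an assumption you should check for your own use case. `invoke_with_retry` and `FlakyTool` are hypothetical names, and the local `ToolException` class is a stand-in so the example runs without LangChain installed.

```python
import asyncio


class ToolException(Exception):
    """Stand-in for langchain_core.tools.ToolException; import the real one in practice."""


async def invoke_with_retry(tool, payload, attempts=3, base_delay=1.0):
    """Call tool.ainvoke(payload), retrying on ToolException with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return await tool.ainvoke(payload)
        except ToolException:
            if attempt == attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


class FlakyTool:
    """Hypothetical tool that fails a fixed number of times before succeeding."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def ainvoke(self, payload):
        self.calls += 1
        if self.calls <= self.failures:
            raise ToolException("temporary failure")
        return "ok"


if __name__ == "__main__":
    tool = FlakyTool(failures=2)
    result = asyncio.run(
        invoke_with_retry(tool, {"url": "https://example.com"}, base_delay=0)
    )
    print(result, tool.calls)
```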
Contributing
We welcome contributions! Please see our Contributing Guide for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Documentation: https://github.com/ichbineshan/firecrawl-tools-py
- Issues: https://github.com/ichbineshan/firecrawl-tools-py/issues
- Discussions: https://github.com/ichbineshan/firecrawl-tools-py/discussions
Changelog
See CHANGELOG.md for a list of changes and version history.
File details
Details for the file firecrawl_tools-0.1.1.tar.gz.
File metadata
- Download URL: firecrawl_tools-0.1.1.tar.gz
- Upload date:
- Size: 22.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1808a5dedeb7eed6afd468fcf70b98f6f80d3037fe5de30c6a4dc0ae50e3e10f |
| MD5 | a2f8b6d2278edc64ff92b6c86f841b5b |
| BLAKE2b-256 | 9f9c238eeb7ac5f1c7a5235e3b9afe4a49d5aa4faae5bd26a19fdcb5d5209341 |
File details
Details for the file firecrawl_tools-0.1.1-py3-none-any.whl.
File metadata
- Download URL: firecrawl_tools-0.1.1-py3-none-any.whl
- Upload date:
- Size: 15.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 842302b9bff4cdf837f0bd24937fe5f3c7d04be06475aeffb499d7566af6ceb5 |
| MD5 | f795f6b10b71c69eadc48288ffec01c4 |
| BLAKE2b-256 | c161da498377cce800913771c065b6b9105c70be5c5bcf6ecfc5d30cd0eeb768 |