
Firecrawl Tools

A comprehensive collection of async tools for web scraping, searching, and data extraction using the Firecrawl API. Built with LangChain for seamless integration with AI applications.

Features

  • URL Scraping: Extract content from single URLs with multiple format options
  • Web Search: Search the web and optionally scrape search results
  • Website Mapping: Discover all indexed URLs on a website
  • Structured Data Extraction: Extract specific information using LLM capabilities
  • Deep Research: Conduct comprehensive web research with intelligent crawling
  • Website Crawling: Asynchronous crawling of entire websites
  • Crawl Status Monitoring: Track and manage crawl jobs

Installation

pip install firecrawl-tools

Quick Start

import asyncio
from firecrawl_tools import FirecrawlTools

async def main():
    # Initialize with your API key
    tools = FirecrawlTools(api_key="your_firecrawl_api_key")

    # Get individual tools
    scrape_tool = await tools.get_scrape_tool()
    search_tool = await tools.get_search_tool()

    # Use the tools
    content = await scrape_tool.ainvoke({
        "url": "https://example.com",
        "formats": ["markdown"],
        "only_main_content": True
    })
    print(content)

asyncio.run(main())

Available Tools

1. URL Scraping Tool

Extract content from a single URL with advanced options.

scrape_tool = await tools.get_scrape_tool()
result = await scrape_tool.ainvoke({
    "url": "https://example.com",
    "formats": ["markdown", "html"],
    "only_main_content": True,
    "wait_for": 2000,
    "mobile": False
})

2. Web Search Tool

Search the web and optionally extract content from results.

search_tool = await tools.get_search_tool()
results = await search_tool.ainvoke({
    "query": "Python web scraping",
    "limit": 5,
    "scrape_options": {
        "formats": ["markdown"],
        "onlyMainContent": True
    }
})

3. Website Mapping Tool

Discover all indexed URLs on a website.

map_tool = await tools.get_map_tool()
urls = await map_tool.ainvoke({
    "url": "https://example.com",
    "include_subdomains": True,
    "limit": 100
})

4. Structured Data Extraction Tool

Extract specific information using LLM capabilities.

extract_tool = await tools.get_extract_tool()
data = await extract_tool.ainvoke({
    "urls": ["https://example.com"],
    "prompt": "Extract all product names and prices",
    "schema": {
        "type": "object",
        "properties": {
            "products": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "string"}
                    }
                }
            }
        }
    }
})
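The schema above follows standard JSON Schema conventions, so the extracted payload should mirror its shape. A minimal sanity check for results of that shape (the helper name and sample data are illustrative, not part of the firecrawl-tools API):

```python
def matches_product_schema(data: dict) -> bool:
    """Check that `data` has the {"products": [{"name", "price"}, ...]} shape."""
    products = data.get("products")
    if not isinstance(products, list):
        return False
    return all(
        isinstance(p, dict)
        and isinstance(p.get("name"), str)
        and isinstance(p.get("price"), str)
        for p in products
    )

sample = {"products": [{"name": "Widget", "price": "$9.99"}]}
print(matches_product_schema(sample))            # True
print(matches_product_schema({"products": "x"})) # False
```

A check like this is useful as a guard before feeding LLM-extracted data into downstream code, since extraction output is not guaranteed to conform to the requested schema.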

5. Deep Research Tool

Conduct comprehensive web research.

research_tool = await tools.get_research_tool()
analysis = await research_tool.ainvoke({
    "query": "Latest developments in AI",
    "max_depth": 3,
    "time_limit": 120,
    "max_urls": 50
})

6. Website Crawling Tool

Crawl entire websites asynchronously.

crawl_tool = await tools.get_crawl_tool()
job_id = await crawl_tool.ainvoke({
    "url": "https://example.com",
    "max_depth": 2,
    "limit": 100,
    "allow_external_links": False
})

7. Crawl Status Tool

Check the status of crawl jobs.

status_tool = await tools.get_status_tool()
status = await status_tool.ainvoke({
    "crawl_id": "your_crawl_job_id"
})
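Since crawling returns a job ID, a common pattern is to poll the status tool until the job finishes. A generic sketch of that loop (the `"status"` key and the `"completed"`/`"failed"` values are assumptions about the status payload, not confirmed above; the demo uses a stub so it runs without an API key):

```python
import asyncio

async def wait_for_crawl(check_status, poll_interval=1.0, timeout=60.0):
    """Poll an async status callable until it reports a terminal state.

    `check_status` is assumed to return a dict with a "status" key.
    """
    elapsed = 0.0
    while elapsed < timeout:
        result = await check_status()
        if result.get("status") in ("completed", "failed"):
            return result
        await asyncio.sleep(poll_interval)
        elapsed += poll_interval
    raise TimeoutError("Crawl did not finish in time")

# Demo with a stub that completes on the third poll.
async def demo():
    calls = {"n": 0}
    async def fake_status():
        calls["n"] += 1
        return {"status": "completed" if calls["n"] >= 3 else "scraping"}
    return await wait_for_crawl(fake_status, poll_interval=0.01)

print(asyncio.run(demo()))  # {'status': 'completed'}
```

In real use, `check_status` would be a closure that calls `status_tool.ainvoke({"crawl_id": job_id})`.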

ReAct Agent Integration

Firecrawl Tools work seamlessly with LangChain's ReAct agents, allowing you to build intelligent applications that automatically choose the right tool for each task.

Basic ReAct Agent Setup

import asyncio
from langchain_openai import ChatOpenAI
from langchain.agents import initialize_agent, AgentType
from firecrawl_tools import FirecrawlTools

async def create_react_agent():
    # Initialize Firecrawl tools
    tools = FirecrawlTools(api_key="your_firecrawl_api_key")
    tools_dict = await tools.get_tools_dict()
    tool_list = list(tools_dict.values())
    
    # Initialize OpenAI LLM
    llm = ChatOpenAI(
        openai_api_key="your_openai_api_key",
        temperature=0,
        model="gpt-4o-mini"
    )
    
    # Create ReAct agent
    agent = initialize_agent(
        tool_list,
        llm,
        agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
        verbose=True,
        max_iterations=5,
        handle_parsing_errors=True,
    )
    
    return agent

# Use the agent
async def main():
    agent = await create_react_agent()
    result = await agent.ainvoke(
        "Find the main topic of https://example.com and summarize it in 2 sentences."
    )
    print(result)

asyncio.run(main())

Example Queries

The ReAct agent can handle various natural language queries:

  • "What are the latest news headlines on cricbuzz.com?"
  • "Extract all product names and prices from https://example.com"
  • "Search for information about Python web scraping and provide a summary."
  • "Map all URLs on https://example.com and list the top 5 pages."

The agent automatically chooses the appropriate Firecrawl tool (scrape, search, extract, map, etc.) based on the query.
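To make that selection concrete, here is a toy, keyword-based illustration of the routing decision. This is not LangChain's actual mechanism (the agent reasons over each tool's description via the LLM); it only sketches the query-to-tool mapping described above:

```python
def choose_tool(query: str) -> str:
    """Naive keyword routing, loosely mimicking how an agent maps a query to a tool."""
    q = query.lower()
    if "extract" in q:
        return "extract"
    if "map" in q:
        return "map"
    if "search" in q:
        return "search"
    return "scrape"  # default: fetch the page content

print(choose_tool("Extract all product names and prices from https://example.com"))  # extract
print(choose_tool("Map all URLs on https://example.com"))                            # map
print(choose_tool("What are the latest news headlines on cricbuzz.com?"))            # scrape
```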

Complete Example

See examples/react_agent_example.py for a complete working example with multiple queries and error handling.

Configuration

You can configure the tools using environment variables or by passing configuration directly:

# Using environment variable
export FIRECRAWL_API_KEY="your_api_key"

# Or pass configuration directly
tools = FirecrawlTools(api_key="your_api_key")
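A typical precedence rule is that an explicitly passed key wins over the environment variable. A sketch of that resolution logic (`resolve_api_key` is a hypothetical helper for illustration, not part of the firecrawl-tools API):

```python
import os

def resolve_api_key(explicit=None):
    """Return the explicit key if given, else fall back to FIRECRAWL_API_KEY."""
    key = explicit or os.environ.get("FIRECRAWL_API_KEY")
    if not key:
        raise ValueError("No Firecrawl API key provided")
    return key

os.environ["FIRECRAWL_API_KEY"] = "env_key"
print(resolve_api_key())          # env_key
print(resolve_api_key("direct"))  # direct
```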

Error Handling

All tools include error handling and raise a ToolException with a descriptive message when a request fails:

from langchain_core.tools import ToolException

try:
    result = await scrape_tool.ainvoke({"url": "https://example.com"})
except ToolException as e:
    print(f"Error: {e}")
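Because failures surface as ToolException, a thin retry wrapper can make agent pipelines more resilient to transient errors. A sketch under that assumption, using a stand-in exception class so it runs without LangChain installed:

```python
import asyncio

class ToolException(Exception):
    """Stand-in for langchain_core.tools.ToolException, for a self-contained demo."""

async def with_retries(coro_fn, attempts=3, delay=0.01):
    """Call an async function, retrying on ToolException up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return await coro_fn()
        except ToolException:
            if attempt == attempts:
                raise
            await asyncio.sleep(delay)

# Demo: fails twice, then succeeds on the third call.
state = {"calls": 0}
async def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ToolException("transient error")
    return "ok"

print(asyncio.run(with_retries(flaky)))  # ok
```

In practice you would pass a closure over the real tool call, e.g. `lambda: scrape_tool.ainvoke({"url": "https://example.com"})`, and import ToolException from langchain_core.tools instead of defining it.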

Contributing

We welcome contributions! Please see our Contributing Guide for details.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a list of changes and version history.
