
This tool leverages Firecrawl to generate concise summaries of web pages directly from their URLs. Firecrawl processes the content of the provided website, extracting key insights and metadata to deliver a brief, focused summary.

Project description

website_firecrawl_service - MCP Server

🔍 Internet research just got smarter! This MCP server turns any website into structured, relevant content based on your queries.

It combines Firecrawl's mapping, selection, and scraping features with GPT-4o for smart URL filtering, so it behaves like an AI research assistant that knows exactly what you're looking for. It works with Claude or any other MCP-compatible client.

Agentic Web Scraping Architecture

An agentic web scraping system powered by Firecrawl: Map → Select → Scrape → Extract
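The four stages can be sketched as a small pipeline. This is a minimal illustration, not the project's code: `run_pipeline` and the injected callables `map_site`, `select_urls`, and `scrape_page` are hypothetical names standing in for the Firecrawl and LLM calls.

```python
from typing import Callable, Dict, List


def run_pipeline(
    base_url: str,
    query: str,
    map_site: Callable[[str], List[str]],
    select_urls: Callable[[str, List[str]], List[str]],
    scrape_page: Callable[[str], str],
    max_links: int = 100,
) -> Dict[str, str]:
    """Map -> Select -> Scrape -> Extract, with every stage injected (hypothetical)."""
    # Map: discover candidate URLs, capped at max_links.
    candidates = map_site(base_url)[:max_links]
    # Select: keep only URLs judged relevant, ignoring any the selector made up.
    known = set(candidates)
    selected = [url for url in select_urls(query, candidates) if url in known]
    # Scrape + Extract: fetch each selected page and keep non-empty text.
    results: Dict[str, str] = {}
    for url in selected:
        text = scrape_page(url).strip()
        if text:
            results[url] = text
    return results
```

Passing the stages in as functions keeps the sketch testable with fakes and makes no claim about how the real server calls Firecrawl or the model.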


Features

  • Efficient Web Crawling: Crawls websites using the Firecrawl API with customizable link limits and intelligent URL selection
  • Intelligent URL Selection: Uses GPT-4o to select the most relevant URLs based on user queries
  • Smart Content Processing: Extracts and cleans HTML content, providing readable text output
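
As an illustration of the content-processing step, here is one way to turn HTML into readable text using only the standard library. It is a sketch under assumptions: `TextExtractor` and `html_to_text` are hypothetical names, and the project may clean HTML differently.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```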

Setup

  1. Create a .env file with the following required environment variables:
    FIRECRAWL_API_KEY=your_firecrawl_api_key
    OPENAI_API_KEY=your_openai_api_key
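
A startup check for these two variables might look like the sketch below. It assumes the .env file has already been loaded into the process environment (for example by a dotenv loader); `missing_env_vars` is a hypothetical helper, not part of the package.

```python
import os

REQUIRED_VARS = ("FIRECRAWL_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_env(env=None):
    """Raise a clear error if any required variable is missing."""
    missing = missing_env_vars(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```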
    

Usage

The server exposes a single tool:

website_firecrawl

Description: Crawls a website and returns relevant content based on a query.

Parameters:

  • query (string): The search query to filter relevant content
  • base_url (string): The target website URL to crawl
  • max_links (integer, optional): Maximum number of links to process (default: 100)
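
For reference, MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for this tool; `build_tool_call` is a hypothetical helper and the example values are made up.

```python
import json


def build_tool_call(query, base_url, max_links=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for website_firecrawl."""
    arguments = {"query": query, "base_url": base_url}
    if max_links is not None:
        # Leave max_links out to let the server apply its default of 100.
        arguments["max_links"] = max_links
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "website_firecrawl", "arguments": arguments},
    }


request = build_tool_call("pricing plans", "https://example.com", max_links=20)
print(json.dumps(request, indent=2))
```

In practice an MCP client such as Claude constructs this message itself; the sketch only shows which arguments end up in the call.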

Technical Details

  • Built using the MCP (Model Context Protocol) server framework
  • Implements retry logic with exponential backoff for API calls
  • Integrates with LangSmith for tracing and monitoring
  • Implements singleton patterns for API clients to manage resources efficiently
  • Uses Pydantic for robust data validation and serialization:
    • WebsiteCrawlArgs: Validates input parameters for the crawling service
    • CrawlerModel: Handles URL selection and justification
    • Page: Structures metadata and content from crawled pages
  • Structured OpenAI Integration:
    • Uses OpenAI's beta chat completions with parsing
    • Automatically validates and converts JSON responses to Pydantic models
    • Ensures type safety and data validation for AI-generated content
  • Jinja2 Template System:
    • Modular prompt management using template inheritance
    • Dynamic prompt generation based on user queries and context
    • Separate system and user prompt templates for clear separation of concerns
    • Easy maintenance and updates of prompt structures
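
Two of the patterns listed above, retry with exponential backoff and a singleton client, can be sketched with the standard library alone. The names `retry_with_backoff` and `get_client` and the default delays are assumptions for illustration; the project's actual implementation may differ.

```python
import functools
import time


def retry_with_backoff(max_attempts=3, base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Retry the wrapped call, waiting base_delay * factor**n between attempts."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:  # real code would catch narrower error types
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    sleep(base_delay * factor ** attempt)

        return wrapper

    return decorator


@functools.lru_cache(maxsize=None)
def get_client():
    """Singleton accessor: the object is built on first use, then reused."""
    return object()  # placeholder for a real API client
```

The `sleep` parameter is injectable so the delays can be observed in tests without actually waiting.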


File details

Details for the file iflow_mcp_lgesuellip_website_firecrawl_service-0.1.0.tar.gz.

File metadata

  • Download URL: iflow_mcp_lgesuellip_website_firecrawl_service-0.1.0.tar.gz
  • Upload date:
  • Size: 256.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for iflow_mcp_lgesuellip_website_firecrawl_service-0.1.0.tar.gz
  • SHA256: 6d1816cc7f477e572078310d7d51ad5a6978fb817eb1f672fe812d9e3be607dc
  • MD5: 1d8df2ce9242e2d5e5173a4e948b95b0
  • BLAKE2b-256: a4954ea528a275b558119c2dd05efeb79999753b123d086065980600664d42e5


File details

Details for the file iflow_mcp_lgesuellip_website_firecrawl_service-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: iflow_mcp_lgesuellip_website_firecrawl_service-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Debian GNU/Linux","version":"13","id":"trixie","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for iflow_mcp_lgesuellip_website_firecrawl_service-0.1.0-py3-none-any.whl
  • SHA256: b5eb5449dde2ef875017dab039a4698ea3cc0b7c0d3b9a9ca4488436c1cf7d46
  • MD5: e4225cf2564bed28adfddaee9c3f6438
  • BLAKE2b-256: e7b4e698aee380bb6701ad3caff6471627b0da2c18e845862d923a2a329c2bc9

