SpiderForce4AI Python Wrapper (FireCrawl, Jina AI, Crawl4AI Alternative)
A Python wrapper for the SpiderForce4AI HTML-to-Markdown conversion service: a comprehensive package for web content crawling, HTML-to-Markdown conversion, and AI-powered (LLM) post-processing with real-time webhook notifications. Built for seamless integration with the SpiderForce4AI service.
Prerequisites
Important: To use this wrapper, you must have the SpiderForce4AI service running. For full installation and deployment instructions, visit: https://github.com/petertamai/SpiderForce4AI
Features
- HTML to Markdown conversion
- Advanced content extraction with custom selectors
- Parallel and async crawling support
- Sitemap processing
- Automatic retry mechanism
- Real-time webhook notifications for each processed URL
- AI-powered post-extraction processing
- Post-extraction webhook integration
- Detailed progress tracking
- Customizable reporting
Installation
pip install spiderforce4ai
Quick Start with Webhooks
from spiderforce4ai import SpiderForce4AI, CrawlConfig
from pathlib import Path
# Initialize crawler with your SpiderForce4AI service URL (required)
spider = SpiderForce4AI("http://localhost:3004")
# Configure crawling options with webhook support
config = CrawlConfig(
    target_selector="article",                 # Optional: target element to extract
    remove_selectors=[".ads", ".navigation"],  # Optional: elements to remove
    max_concurrent_requests=5,                 # Optional: default is 1
    save_reports=True,                         # Optional: default is False
    # Webhook configuration for real-time notifications
    webhook_url="https://webhook.site/your-unique-webhook-id",  # Required for webhooks
    webhook_headers={                          # Optional custom headers
        "Authorization": "Bearer your-token",
        "Content-Type": "application/json"
    }
)
# Crawl a sitemap with webhook notifications
results = spider.crawl_sitemap_server_parallel("https://petertam.pro/sitemap.xml", config)
Core Components
Configuration Options
The CrawlConfig class accepts the following parameters:
Mandatory Parameters for Webhook Integration
- webhook_url: (str) Webhook endpoint URL (required only if you want webhook notifications)
Optional Parameters
Content Selection:
- target_selector: (str) CSS selector for the main content to extract
- remove_selectors: (List[str]) List of CSS selectors to remove from content
- remove_selectors_regex: (List[str]) List of regex patterns for element removal
Processing:
- max_concurrent_requests: (int) Number of parallel requests (default: 1)
- request_delay: (float) Delay between requests in seconds (default: 0.5)
- timeout: (int) Request timeout in seconds (default: 30)
Output:
- output_dir: (Path) Directory to save output files (default: "spiderforce_reports")
- save_reports: (bool) Whether to save crawl reports (default: False)
- report_file: (Path) Custom report file location (generated if None)
- combine_to_one_markdown: (str) 'full' or 'metadata_headers' (the latter is handy for SEO) to combine content into a single file
- combined_markdown_file: (Path) Custom combined file location (generated if None)
Webhook:
- webhook_url: (str) Webhook endpoint URL
- webhook_timeout: (int) Webhook timeout in seconds (default: 10)
- webhook_headers: (Dict[str, str]) Custom webhook headers
- webhook_payload_template: (str) Custom webhook payload template
Post-Extraction Processing:
- post_extraction_agent: (Dict) Configuration for LLM post-processing
- post_extraction_agent_save_to_file: (str) Path to save extraction results
- post_agent_transformer_function: (Callable) Custom transformer function for webhook payloads
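For orientation, here is a sketch of a config that touches most of these options. All parameter names come from the list above; the selector, regex, and URL values are placeholders, and the post-extraction parameters are demonstrated in the sections that follow.
from pathlib import Path
from spiderforce4ai import CrawlConfig

config = CrawlConfig(
    # Content selection
    target_selector="article",
    remove_selectors=[".ads", ".navigation"],
    remove_selectors_regex=[r"^ad-slot-\d+$"],  # placeholder pattern
    # Processing
    max_concurrent_requests=3,
    request_delay=0.5,
    timeout=30,
    # Output
    output_dir=Path("spiderforce_reports"),
    save_reports=True,
    report_file=Path("spiderforce_reports/report.json"),
    combine_to_one_markdown="full",
    combined_markdown_file=Path("spiderforce_reports/combined.md"),
    # Webhook
    webhook_url="https://webhook.site/your-unique-id",
    webhook_timeout=10,
    webhook_headers={"Authorization": "Bearer your-token"}
)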
When Are Webhooks Triggered?
SpiderForce4AI provides webhook notifications at multiple points in the crawling process:
- URL Processing Completion: Triggered after each URL is processed (whether successful or failed)
- Post-Extraction Completion: Triggered after LLM post-processing of each URL (if configured)
- Custom Transformation: You can implement your own webhook logic in the transformer function
The webhook payload contains detailed information about the processed URL, including:
- The URL that was processed
- Processing status (success/failed)
- Extracted markdown content (for successful requests)
- Error details (for failed requests)
- Timestamp of processing
- Post-extraction results (if LLM processing is enabled)
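If you want to inspect these payloads during development, a minimal local receiver is easy to stand up. This sketch uses Flask, which is not a dependency of this package (install it separately); the field names follow the default payload documented under "Webhook Payload Structure" below.
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook", methods=["POST"])
def receive_webhook():
    # Parse the JSON payload sent by SpiderForce4AI
    payload = request.get_json(force=True)
    if payload.get("status") == "success":
        print(f"OK  {payload['url']}: {len(payload.get('markdown') or '')} chars of markdown")
    else:
        print(f"ERR {payload['url']}: {payload.get('error')}")
    return {"received": True}, 200

if __name__ == "__main__":
    app.run(port=8000)  # then set webhook_url="http://localhost:8000/webhook"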
Complete Webhook Integration Examples
Basic Webhook Integration
This example sends a webhook notification after each URL is processed:
from spiderforce4ai import SpiderForce4AI, CrawlConfig
from pathlib import Path
# Initialize crawler
spider = SpiderForce4AI("http://localhost:3004")
# Configure with basic webhook support
config = CrawlConfig(
    # Content selection
    target_selector="article",
    remove_selectors=[".ads", ".navigation"],
    # Processing
    max_concurrent_requests=5,
    request_delay=0.5,
    # Output
    output_dir=Path("content_output"),
    save_reports=True,
    # Webhook configuration
    webhook_url="https://webhook.site/your-unique-id",
    webhook_headers={
        "Authorization": "Bearer your-token",
        "Content-Type": "application/json"
    }
)
# Crawl URLs with webhook notifications
urls = ["https://petertam.pro/about", "https://petertam.pro/contact"]
results = spider.crawl_urls_server_parallel(urls, config)
Custom Webhook Payload Template
You can customize the webhook payload format:
from spiderforce4ai import SpiderForce4AI, CrawlConfig
# Initialize crawler
spider = SpiderForce4AI("http://localhost:3004")
# Configure with custom webhook payload
config = CrawlConfig(
    target_selector="article",
    max_concurrent_requests=3,
    # Webhook with custom payload template
    webhook_url="https://webhook.site/your-unique-id",
    webhook_payload_template='''{
        "page": {
            "url": "{url}",
            "status": "{status}",
            "processed_at": "{timestamp}"
        },
        "content": {
            "markdown": "{markdown}",
            "error": "{error}"
        },
        "metadata": {
            "service": "SpiderForce4AI",
            "version": "2.6.7",
            "client_id": "your-client-id"
        }
    }'''
)
# Crawl with custom webhook payload
results = spider.crawl_url("https://petertam.pro", config)
Advanced: Post-Extraction Webhooks
This example demonstrates how to use webhooks with the AI post-processing feature:
from spiderforce4ai import SpiderForce4AI, CrawlConfig
import requests
from pathlib import Path
# Define extraction template structure
extraction_template = """
{
"Title": "Extract the main title of the content",
"MetaDescription": "Extract a search-friendly meta description (under 160 characters)",
"KeyPoints": ["Extract 3-5 key points from the content"],
"Categories": ["Extract relevant categories for this content"],
"ReadingTimeMinutes": "Estimate reading time in minutes"
}
"""
# Define a custom webhook function
def post_extraction_webhook(extraction_result):
    """Send extraction results to a webhook and return transformed data."""
    # Add custom fields or transform the data as needed
    payload = {
        "url": extraction_result.get("url", ""),
        "title": extraction_result.get("Title", ""),
        "description": extraction_result.get("MetaDescription", ""),
        "key_points": extraction_result.get("KeyPoints", []),
        "categories": extraction_result.get("Categories", []),
        "reading_time": extraction_result.get("ReadingTimeMinutes", ""),
        "processed_at": extraction_result.get("timestamp", "")
    }
    # Send to webhook (example using a different webhook than the main one)
    try:
        response = requests.post(
            "https://webhook.site/your-extraction-webhook-id",
            json=payload,
            headers={
                "Authorization": "Bearer extraction-token",
                "Content-Type": "application/json"
            },
            timeout=10
        )
        print(f"Extraction webhook sent: Status {response.status_code}")
    except Exception as e:
        print(f"Extraction webhook error: {str(e)}")
    # Return the transformed data (will be stored in result.extraction_result)
    return payload
# Initialize crawler
spider = SpiderForce4AI("http://localhost:3004")
# Configure with post-extraction and webhooks
config = CrawlConfig(
    # Basic crawling settings
    target_selector="article",
    remove_selectors=[".ads", ".navigation", ".comments"],
    max_concurrent_requests=5,
    # Regular webhook for crawl results
    webhook_url="https://webhook.site/your-crawl-webhook-id",
    webhook_headers={
        "Authorization": "Bearer crawl-token",
        "Content-Type": "application/json"
    },
    # Post-extraction LLM processing
    post_extraction_agent={
        "model": "gpt-4-turbo",  # Or another compatible model
        "api_key": "your-api-key-here",
        "max_tokens": 1000,
        "temperature": 0.3,
        "response_format": "json_object",  # Request JSON response format
        "messages": [
            {
                "role": "system",
                "content": f"Extract the following information from the content. Return ONLY valid JSON, no explanations:\n\n{extraction_template}"
            },
            {
                "role": "user",
                "content": "{here_markdown_content}"  # Will be replaced with actual content
            }
        ]
    },
    # Save combined extraction results
    post_extraction_agent_save_to_file="extraction_results.json",
    # Custom function to transform and send extraction webhook
    post_agent_transformer_function=post_extraction_webhook
)
# Crawl a sitemap with both regular and post-extraction webhooks
results = spider.crawl_sitemap_server_parallel("https://petertam.pro/sitemap.xml", config)
Real-World Example: SEO Analyzer with Webhooks
This complete example crawls a site and uses Mistral AI to extract SEO-focused data, sending custom webhooks at each step:
from spiderforce4ai import SpiderForce4AI, CrawlConfig
from pathlib import Path
import requests
# Define your extraction template
extraction_template = """
{
"Title": "Extract the main title of the content (ideally under 60 characters for SEO)",
"MetaDescription": "Extract the meta description of the content (ideally under 160 characters for SEO)",
"CanonicalUrl": "Extract the canonical URL of the content",
"Headings": {
"H1": "Extract the main H1 heading",
"H2": ["Extract all H2 headings (for better content structure)"],
"H3": ["Extract all H3 headings (for better content structure)"]
},
"KeyPoints": [
"Point 1 (focus on primary keywords)",
"Point 2 (focus on secondary keywords)",
"Point 3 (focus on user intent)"
],
"CallToAction": "Extract current call to actions (e.g., Sign up now for exclusive offers!)"
}
"""
# Define custom transformer function
def post_webhook(post_extraction_agent_response):
    """Transform and send extracted SEO-focused data to a webhook."""
    payload = {
        "url": post_extraction_agent_response.get("url", ""),
        "Title": post_extraction_agent_response.get("Title", ""),
        "MetaDescription": post_extraction_agent_response.get("MetaDescription", ""),
        "CanonicalUrl": post_extraction_agent_response.get("CanonicalUrl", ""),
        "Headings": post_extraction_agent_response.get("Headings", {}),
        "KeyPoints": post_extraction_agent_response.get("KeyPoints", []),
        "CallToAction": post_extraction_agent_response.get("CallToAction", "")
    }
    # Send webhook with custom headers
    try:
        response = requests.post(
            "https://webhook.site/your-extraction-webhook-id",
            json=payload,
            headers={
                "Authorization": "Bearer token",
                "X-Custom-Header": "value",
                "Content-Type": "application/json"
            },
            timeout=10
        )
        # Log the response for debugging
        if response.status_code == 200:
            print(f"✅ Webhook sent successfully for {payload['url']}")
        else:
            print(f"❌ Failed to send webhook. Status code: {response.status_code}")
    except Exception as e:
        print(f"❌ Webhook error: {str(e)}")
    return payload  # Return transformed data
# Initialize crawler
spider = SpiderForce4AI("http://localhost:3004")
# Configure post-extraction agent
post_extraction_agent = {
    "model": "mistral/mistral-large-latest",  # Or any model compatible with litellm
    "messages": [
        {
            "role": "system",
            "content": f"Based on the provided markdown content, extract the following information. Return ONLY valid JSON, no comments or explanations:\n\n{extraction_template}"
        },
        {
            "role": "user",
            "content": "{here_markdown_content}"  # Placeholder for markdown content
        }
    ],
    "api_key": "your-mistral-api-key",  # Replace with your actual API key
    "max_tokens": 8000,
    "response_format": "json_object"  # Request JSON format from the API
}
# Create crawl configuration
config = CrawlConfig(
    # Basic crawl settings
    save_reports=True,
    max_concurrent_requests=5,
    remove_selectors=[
        ".header-bottom",
        "#mainmenu",
        ".headerfix",
        ".header",
        ".menu-item",
        ".wpcf7",
        ".followus-section"
    ],
    output_dir=Path("reports"),
    # Crawl results webhook
    webhook_url="https://webhook.site/your-crawl-webhook-id",
    webhook_headers={
        "Authorization": "Bearer crawl-token",
        "Content-Type": "application/json"
    },
    # Add post-extraction configuration
    post_extraction_agent=post_extraction_agent,
    post_extraction_agent_save_to_file="combined_extraction.json",
    post_agent_transformer_function=post_webhook
)
# Run the crawler with parallel processing
results = spider.crawl_sitemap_parallel(
    "https://petertam.pro/sitemap.xml",
    config
)
# Print summary
successful = len([r for r in results if r.status == "success"])
extracted = len([r for r in results if hasattr(r, 'extraction_result') and r.extraction_result])
print(f"\n📊 Crawling complete:")
print(f" - {len(results)} URLs processed")
print(f" - {successful} successfully crawled")
print(f" - {extracted} with AI extraction")
print(f" - Reports saved to: {config.report_file}")
print(f" - Extraction data saved to: {config.post_extraction_agent_save_to_file}")
AI-Powered Post-Processing with Webhook Integration
The package includes a powerful AI post-processing system through the PostExtractionAgent class with integrated webhook capabilities.
Post-Extraction Configuration
from spiderforce4ai import SpiderForce4AI, CrawlConfig, PostExtractionConfig, PostExtractionAgent
import requests
# Define a custom webhook transformer
def transform_and_notify(result):
    """Process extraction results and send them to an external system."""
    requests.post(
        "https://your-api.com/analyze",
        json=result,
        headers={"Authorization": "Bearer token"}
    )
    return result  # Return data for storage
# Initialize crawler
spider = SpiderForce4AI("http://localhost:3004")
# Configure post-extraction processing with webhooks
config = CrawlConfig(
    # Basic crawl settings
    target_selector="article",
    max_concurrent_requests=5,
    # Standard crawl webhook
    webhook_url="https://webhook.site/your-crawl-webhook-id",
    # Post-extraction LLM configuration
    post_extraction_agent={
        "model": "gpt-4-turbo",
        "api_key": "your-api-key-here",
        "max_tokens": 1000,
        "temperature": 0.7,
        "base_url": "https://api.openai.com/v1",
        "messages": [
            {
                "role": "system",
                "content": "Extract key information from the following content."
            },
            {
                "role": "user",
                "content": "Please analyze the following content:\n\n{here_markdown_content}"
            }
        ]
    },
    # Save extraction results to file
    post_extraction_agent_save_to_file="extraction_results.json",
    # Custom webhook transformer
    post_agent_transformer_function=transform_and_notify
)
# Crawl with post-processing and webhooks
results = spider.crawl_urls_server_parallel(["https://petertam.pro"], config)
Crawling Methods
All methods support webhook integration automatically when configured:
1. Single URL Processing
# Synchronous with webhook
result = spider.crawl_url("https://petertam.pro", config)
# Asynchronous with webhook
async def crawl():
    result = await spider.crawl_url_async("https://petertam.pro", config)
2. Multiple URLs
urls = ["https://petertam.pro/about", "https://petertam.pro/contact"]
# Server-side parallel with webhooks (recommended)
results = spider.crawl_urls_server_parallel(urls, config)
# Client-side parallel with webhooks
results = spider.crawl_urls_parallel(urls, config)
# Asynchronous with webhooks
async def crawl():
    results = await spider.crawl_urls_async(urls, config)
3. Sitemap Processing
# Server-side parallel with webhooks (recommended)
results = spider.crawl_sitemap_server_parallel("https://petertam.pro/sitemap.xml", config)
# Client-side parallel with webhooks
results = spider.crawl_sitemap_parallel("https://petertam.pro/sitemap.xml", config)
# Asynchronous with webhooks
async def crawl():
    results = await spider.crawl_sitemap_async("https://petertam.pro/sitemap.xml", config)
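The async variants return coroutines, so they must be driven by an event loop. A minimal driver sketch (method names as documented above):
import asyncio
from spiderforce4ai import SpiderForce4AI, CrawlConfig

spider = SpiderForce4AI("http://localhost:3004")
config = CrawlConfig(webhook_url="https://webhook.site/your-unique-id")

async def main():
    # Any of the *_async methods shown above can be awaited here
    return await spider.crawl_sitemap_async("https://petertam.pro/sitemap.xml", config)

results = asyncio.run(main())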
Webhook Payload Structure
The default webhook payload includes:
{
    "url": "https://petertam.pro/about",
    "status": "success",
    "markdown": "# About\n\nThis is the about page content...",
    "error": null,
    "timestamp": "2025-02-27T12:30:45.123456",
    "config": {
        "target_selector": "article",
        "remove_selectors": [".ads", ".navigation"]
    },
    "extraction_result": {
        "Title": "About Peter Tam",
        "MetaDescription": "Professional developer with expertise in AI and web technologies",
        "KeyPoints": ["Over 10 years experience", "Specializes in AI integration", "Full-stack development"]
    }
}
For failed requests:
{
    "url": "https://petertam.pro/missing-page",
    "status": "failed",
    "markdown": null,
    "error": "HTTP 404: Not Found",
    "timestamp": "2025-02-27T12:31:23.456789",
    "config": {
        "target_selector": "article",
        "remove_selectors": [".ads", ".navigation"]
    },
    "extraction_result": null
}
Custom Webhook Transformation
You can transform webhook data with a custom function:
import requests

def transform_webhook_data(result):
    """Custom transformer for webhook data."""
    # Extract only the needed fields
    transformed = {
        "url": result.get("url"),
        "title": result.get("Title"),
        "description": result.get("MetaDescription"),
        "processed_at": result.get("timestamp")
    }
    # Add custom calculations
    if "raw_content" in result:
        transformed["word_count"] = len(result["raw_content"].split())
        transformed["reading_time_min"] = max(1, transformed["word_count"] // 200)
    # Send to external systems if needed
    requests.post("https://your-analytics-api.com/log", json=transformed)
    return transformed  # This will be stored in result.extraction_result
Smart Retry Mechanism
The package provides a sophisticated retry system with webhook notifications:
import asyncio

# Retry behavior with webhook notifications
config = CrawlConfig(
    max_concurrent_requests=5,
    request_delay=1.0,
    webhook_url="https://webhook.site/your-webhook-id",
    # The webhook is called for both initial attempts and retries
)
results = asyncio.run(spider.crawl_urls_async(urls, config))
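After the crawl finishes you can separate the URLs that still failed after retries. A short sketch: the status attribute is used the same way in the summary example earlier, while the url and error fields are assumed to mirror the webhook payload above.
# Inspect what still failed after retries (url/error assumed to mirror the payload fields)
failed = [r for r in results if r.status != "success"]
for r in failed:
    print(f"Gave up on {r.url}: {r.error}")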
Report Generation
The package can generate detailed reports of crawling operations:
config = CrawlConfig(
    save_reports=True,
    report_file=Path("custom_report.json"),
    output_dir=Path("content"),
    webhook_url="https://webhook.site/your-webhook-id"
)
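The report is written as JSON, so it can be post-processed with the standard library. The exact report schema isn't documented here, so a safe first step is to load the file (named in the config above) and inspect its structure:
import json
from pathlib import Path

report = json.loads(Path("custom_report.json").read_text())
# Peek at the top-level structure before relying on specific keys
print(json.dumps(report, indent=2)[:500])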
Combined Content Output
You can combine multiple pages into a single Markdown file:
config = CrawlConfig(
    combine_to_one_markdown="full",
    combined_markdown_file=Path("all_pages.md"),
    webhook_url="https://webhook.site/your-webhook-id"
)
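Once the crawl finishes, the combined file is ordinary Markdown and can be consumed directly, for example as LLM context (assuming the file name from the config above):
from pathlib import Path

combined = Path("all_pages.md").read_text(encoding="utf-8")
print(f"Combined markdown: {len(combined.splitlines())} lines")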
Progress Tracking
The package provides rich progress tracking with detailed statistics:
Fetching sitemap from https://petertam.pro/sitemap.xml...
Found 23 URLs in sitemap
[━━━━━━━━━━━━━━━━━━━━━━━━━━━━] 100% • 23/23 URLs
Retrying failed URLs: 3 (13.0% failed)
[━━━━━━━━━━━━━━━━━━━━━━━━━━━━] 100% • 3/3 retries
Starting post-extraction processing...
[⠋] Post-extraction processing... • 0/20 0% • 00:00:00
Crawling Summary:
Total URLs processed: 23
Initial failures: 3 (13.0%)
Final results:
✓ Successful: 22
✗ Failed: 1
Retry success rate: 2/3 (66.7%)
Advanced Error Handling with Webhooks
import logging
import requests
from datetime import datetime

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    filename='spider_log.txt'
)

# Define error handling webhook function
def error_handler(extraction_result):
    """Handle errors and send notifications."""
    url = extraction_result.get('url', 'unknown')
    # Check for errors
    if 'error' in extraction_result and extraction_result['error']:
        # Log the error
        logging.error(f"Error processing {url}: {extraction_result['error']}")
        # Send error notification webhook
        try:
            requests.post(
                "https://webhook.site/your-error-webhook-id",
                json={
                    "url": url,
                    "error": extraction_result['error'],
                    "timestamp": datetime.now().isoformat(),
                    "severity": "high" if "404" in extraction_result['error'] else "medium"
                },
                headers={"X-Error-Alert": "true"}
            )
        except Exception as e:
            logging.error(f"Failed to send error webhook: {e}")
    # Always return the original result to ensure data is preserved
    return extraction_result
# Add to config
config = CrawlConfig(
    # Regular webhook for all crawl results
    webhook_url="https://webhook.site/your-regular-webhook-id",
    # Error handling webhook through the transformer function
    post_agent_transformer_function=error_handler
)
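This config then plugs into any of the crawl methods; note that the transformer receives post-extraction results, so the earlier examples pair it with post_extraction_agent. For example:
spider = SpiderForce4AI("http://localhost:3004")
results = spider.crawl_urls_server_parallel(
    ["https://petertam.pro/about", "https://petertam.pro/missing-page"],
    config
)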
Requirements
- Python 3.11+
- Running SpiderForce4AI service
- Internet connection
Dependencies
- aiohttp
- asyncio
- rich
- aiofiles
- httpx
- litellm
- pydantic
- requests
- pandas
- numpy
- openai
License
MIT License
Credits
Created by Piotr Tamulewicz
File details
Details for the file spiderforce4ai-2.7.tar.gz (source distribution).
File metadata
- Size: 28.0 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 4620e412b1ce2715f71d09611764170171513b04448ac8971957bf623706e34e |
| MD5 | 0ef89b9a9d4013958e2138f0891723b5 |
| BLAKE2b-256 | 3b99c71f8c991922517aa57a8959f226b1d05c0b84c6cccda3eadf482b5353d4 |
File details
Details for the file spiderforce4ai-2.7-py3-none-any.whl (built distribution, Python 3).
File metadata
- Size: 20.5 kB
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a3e2c685a6c856f107bc70ade6a21ef58da1fa4f5094afa327376bc818ae1c1 |
| MD5 | 6161e7e80ecf95c8f4701238e865b222 |
| BLAKE2b-256 | 019bb4a62538e372f7a78109b39659e69a871ffbd99787a49751f22284816c15 |