Scrapy pipeline & extensions for AP Cloudy (logs, stats, requests, items)

Project description

apcloudy-pipeline

apcloudy-pipeline is a Scrapy integration that sends items, requests, logs, and spider statistics to the AP Cloudy backend using HMAC-based authentication.


Features

  • 📦 Item forwarding using Scrapy Item Pipeline
  • 🌐 Request and response logging
  • 📊 Spider statistics reporting
  • 🧾 Spider, user, and Scrapy internal log forwarding
  • 🔐 HMAC-secured API communication

Installation

pip install apcloudy-pipeline

Configuration

Add the following settings to your Scrapy project's settings.py file:

APCLOUDY_API_URL = "http://localhost:8000/api/v1/"
APCLOUDY_API_KEY = "api_test_1234567890"
APCLOUDY_SECRET_KEY = "secret_test_1234567890"
JOB_ID = 123
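
The package's actual signing scheme is not documented here, so the following is only a minimal sketch of how HMAC-based request signing with an API key and secret key commonly works. The header names (X-Api-Key, X-Timestamp, X-Signature), the message layout, and the choice of SHA-256 are assumptions made for illustration, not the package's confirmed protocol.

```python
import hashlib
import hmac
import json
import time
from typing import Optional


def _canonical_message(payload: dict, timestamp: str) -> bytes:
    # Serialize deterministically so client and server sign identical bytes.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return f"{timestamp}.{body}".encode("utf-8")


def sign_payload(payload: dict, api_key: str, secret_key: str,
                 timestamp: Optional[int] = None) -> dict:
    """Build hypothetical auth headers for a JSON payload (assumed scheme)."""
    ts = str(int(time.time()) if timestamp is None else timestamp)
    signature = hmac.new(
        secret_key.encode("utf-8"),
        _canonical_message(payload, ts),
        hashlib.sha256,
    ).hexdigest()
    return {"X-Api-Key": api_key, "X-Timestamp": ts, "X-Signature": signature}


def verify_signature(payload: dict, headers: dict, secret_key: str) -> bool:
    """Recompute the signature the way a server might and compare safely."""
    expected = hmac.new(
        secret_key.encode("utf-8"),
        _canonical_message(payload, headers["X-Timestamp"]),
        hashlib.sha256,
    ).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, headers["X-Signature"])
```

Because the secret key never leaves the client, only the signature travels with the request. In a real project the key values would more safely come from environment variables than from literals committed in settings.py.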

Item Pipeline (Required)

The item pipeline is required to send scraped items to the backend.

ITEM_PIPELINES = {
    "apcloudy_pipeline.pipelines.APCloudyItemPipeline": 300,
}
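
For readers unfamiliar with how a Scrapy item pipeline forwards data, here is a minimal sketch of the general pattern. It is not the source of APCloudyItemPipeline. The class name BufferedForwardingPipeline, the send callable, and the batching behaviour are all hypothetical, and items are assumed to be dict-like.

```python
from typing import Any, Callable, Dict, List


class BufferedForwardingPipeline:
    """Hypothetical pipeline that hands items to `send` in batches."""

    def __init__(self, send: Callable[[List[Dict[str, Any]]], None],
                 batch_size: int = 50):
        self.send = send              # stand-in for an HTTP call to a backend
        self.batch_size = batch_size
        self.buffer: List[Dict[str, Any]] = []

    def process_item(self, item, spider):
        # Scrapy calls this once per scraped item; assumes a dict-like item.
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.batch_size:
            self.flush()
        # Return the item so lower-priority pipelines still receive it.
        return item

    def close_spider(self, spider):
        # Scrapy calls this when the spider finishes; send any remainder.
        self.flush()

    def flush(self):
        if self.buffer:
            self.send(self.buffer)
            self.buffer = []
```

Scrapy runs the enabled pipelines in ascending order of the number given in ITEM_PIPELINES, so a value of 300 places the forwarding step after any pipelines with lower numbers.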

Extensions (Optional)

Enable the following extensions if you want to send requests, logs, and spider statistics.

EXTENSIONS = {
    "apcloudy_pipeline.request_logger.APCloudyRequestLogger": 400,
    "apcloudy_pipeline.extensions.APCloudyLoggingExtension": 510,
    "apcloudy_pipeline.extensions.APCloudyStatsExtension": 520,
}

Extensions Overview

  • APCloudyRequestLogger: Captures request and response metadata such as URL, HTTP method, status code, timing, and fingerprint.

  • APCloudyLoggingExtension: Sends spider logs, user logs, Scrapy internal logs, and exception tracebacks to the backend.

  • APCloudyStatsExtension: Sends final spider statistics when the crawl finishes.
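
Log forwarding of the kind described for APCloudyLoggingExtension is typically built on a standard-library logging handler. The sketch below shows that general idea only. BufferingLogHandler and its record layout are hypothetical, and how the real extension captures or ships logs is not documented here.

```python
import logging


class BufferingLogHandler(logging.Handler):
    """Hypothetical handler that turns log records into plain dicts."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []  # a real forwarder would send these to a backend

    def emit(self, record):
        entry = {
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
            "time": record.created,
        }
        if record.exc_info:
            # Render the traceback as text so it can be serialized.
            entry["traceback"] = logging.Formatter().formatException(
                record.exc_info
            )
        self.records.append(entry)
```

Attaching such a handler to the root logger would capture spider logs, user logs, and Scrapy's internal logs alike, since they all flow through Python's logging module.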

Download files

Source Distribution

apcloudy_pipeline-0.1.0.tar.gz (6.7 kB)

Uploaded Source

Built Distribution

apcloudy_pipeline-0.1.0-py3-none-any.whl (8.4 kB)

Uploaded Python 3

File details

Details for the file apcloudy_pipeline-0.1.0.tar.gz.

File metadata

  • Download URL: apcloudy_pipeline-0.1.0.tar.gz
  • Size: 6.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for apcloudy_pipeline-0.1.0.tar.gz
Algorithm Hash digest
SHA256 82353e9ad1b0e5c9b71e058943308142da59fa9a84ac811e119c957137d9e5b9
MD5 1ce46375e34debb3be849d634ab88d24
BLAKE2b-256 fe7b92977b58a282caf7d0e3f5684649cb4f20a41e00c57c9fdf6fdc43789fac

File details

Details for the file apcloudy_pipeline-0.1.0-py3-none-any.whl.

File hashes

Hashes for apcloudy_pipeline-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d4aba63b1c362c2d4807a379c469cd3cc1cedf49cab6e256aa16cc9b4af74160
MD5 ccace1f062f9b920b189d7e74239a391
BLAKE2b-256 480a665688ffb761a99a1c1d49c2e73c7480dbe91ec48f96a9afbadcc542e23d
