Pre-scraping intelligence tool — scan a website before scraping it

scrapalyser

Pre-scraping intelligence tool. Scan any website before writing a single line of scraper code.

Install

# Core (curl_cffi engine)
pip install scrapalyser

# With Playwright support
pip install "scrapalyser[playwright]"   # quoted so shells such as zsh do not expand the brackets
playwright install chromium

Usage

import scrapalyser

# Simple scan — returns a dict
report = scrapalyser.scan("https://example.com")
print(report)

# Full options
report = scrapalyser.scan(
    url="https://example.com",
    output="txt",           # "json" (default) or "txt"
    lang="fr",              # "en" (default), "fr", "es", "br"
    save="report.txt",      # optional — write output to file
    engine="curl",          # "curl" (default) or "playwright"
    headless=True,          # True (default) or False — playwright only
    screenshot="shot.png",  # optional — playwright only
)

Example output

JSON

{
  "url": "https://example.com",
  "scanned_at": "2026-04-30T12:32:00Z",
  "status_code": 200,
  "engine": "curl",
  "antibot": {
    "detected": true,
    "name": "Cloudflare Turnstile"
  },
  "technology": {
    "type": "React"
  },
  "js_required": true,
  "api": {
    "detected": true,
    "endpoints": [
      "https://api.example.com/v1/search"
    ]
  },
  "robots_txt": {
    "found": true,
    "url": "https://example.com/robots.txt"
  },
  "sitemap": {
    "found": true,
    "url": "https://example.com/sitemap.xml"
  },
  "login_wall": {
    "detected": false,
    "type": null
  }
}
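
As an illustration of how the report might be consumed, the sketch below turns a dict shaped like the JSON above into a few planning hints. The function name `recommend_strategy` and the wording of the hints are hypothetical and not part of scrapalyser; only the keys shown in the example output are assumed to exist.

```python
def recommend_strategy(report: dict) -> list[str]:
    """Derive rough scraping hints from a scrapalyser-style report dict.

    Hypothetical helper: it only reads keys that appear in the example
    JSON output and tolerates missing ones.
    """
    hints = []

    # Prefer a discovered API over HTML parsing when endpoints were found.
    api = report.get("api", {})
    if api.get("detected") and api.get("endpoints"):
        hints.append("consider the detected API: " + ", ".join(api["endpoints"]))

    # js_required tells us whether a plain HTTP client is likely sufficient.
    if report.get("js_required"):
        hints.append("JavaScript required: use a browser engine")
    else:
        hints.append("plain HTTP requests are probably enough")

    antibot = report.get("antibot", {})
    if antibot.get("detected"):
        hints.append(f"anti-bot detected: {antibot.get('name')}")

    if report.get("login_wall", {}).get("detected"):
        hints.append("login wall detected: authentication needed")

    return hints


# Sample input copied from the example output above (trimmed to the keys used).
sample_report = {
    "url": "https://example.com",
    "antibot": {"detected": True, "name": "Cloudflare Turnstile"},
    "js_required": True,
    "api": {"detected": True, "endpoints": ["https://api.example.com/v1/search"]},
    "login_wall": {"detected": False, "type": None},
}

hints = recommend_strategy(sample_report)
for hint in hints:
    print(hint)
```

In real use the input would presumably be the dict returned by `scrapalyser.scan(...)` with the default JSON output.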

TXT (French)

═══════════════════════════════════════════
        SCRAPALYSER RAPPORT
        https://example.com
        Scanné le : 2026-04-30T12:32:00Z
        Status : 200
═══════════════════════════════════════════

[ANTI-BOT]
  ⚠️  Détecté : Cloudflare Turnstile

[TECHNOLOGIE]
  🖥️  Type : React

[JAVASCRIPT]
  ⚠️  JS requis : Oui → utiliser Playwright/Selenium

[API DÉTECTÉES]
  ✅  https://api.example.com/v1/search

[ROBOTS.TXT]
  ✅  https://example.com/robots.txt

[SITEMAP]
  ✅  https://example.com/sitemap.xml

[LOGIN WALL]
  ✅  Aucun login requis

═══════════════════════════════════════════

Features

🛡️ Anti-bot detection — Cloudflare, DataDome, PerimeterX, Akamai, Kasada, reCAPTCHA, hCaptcha
🖥️ Technology detection — React, Vue, Angular, Next.js, Nuxt, Svelte, WordPress, Shopify, Drupal, Joomla, Wix, Webflow
⚡ JS requirement detection — know before you write whether requests is enough or Playwright is needed
🌐 API endpoint discovery — via CSP headers, inline scripts, and XHR/Fetch interception (Playwright mode)
🤖 robots.txt parser — URL extraction with common path variants
🗺️ Sitemap detection — via robots.txt or direct probe
🔐 Login wall detection — form, redirect, button, OAuth
📸 Screenshot — capture the page as seen by the browser (Playwright mode)
🌍 Multi-language output — fr, en, es, br
📄 JSON & TXT export — machine-readable or human-readable
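
To give an idea of why CSP headers can reveal API endpoints, here is a minimal sketch that lists the absolute URLs allowed under the `connect-src` directive of a `Content-Security-Policy` header. It is not scrapalyser's implementation; the function name `sources_from_csp` and the sample header are made up for illustration.

```python
from urllib.parse import urlsplit


def sources_from_csp(csp_header: str, directive: str = "connect-src") -> list[str]:
    """Return the http(s) URL sources listed under one CSP directive.

    Hypothetical helper: keywords such as 'self', wildcards and bare
    schemes are skipped because they do not name a concrete host.
    """
    for part in csp_header.split(";"):
        tokens = part.split()
        if not tokens or tokens[0].lower() != directive:
            continue
        found = []
        for token in tokens[1:]:
            pieces = urlsplit(token)
            # Keep only sources with both an http(s) scheme and a host.
            if pieces.scheme in ("http", "https") and pieces.netloc:
                found.append(token)
        return found
    return []


# Made-up header value for demonstration.
csp = (
    "default-src 'self'; "
    "connect-src 'self' https://api.example.com https://cdn.example.com; "
    "img-src *"
)
print(sources_from_csp(csp))
```

Hosts that a page is allowed to connect to are only candidates; whether any of them is a usable API still has to be checked by hand.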

Engines

Feature	`curl`	`playwright`
Speed	Fast	Slower
JS execution	❌	✅
XHR/Fetch API detection	❌	✅
Screenshot	❌	✅
Bot detection bypass	Partial	Better with `headless=False`
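
Given the trade-offs in the table, one possible pattern is to scan with the fast `curl` engine first and rerun with `playwright` only when the report says JavaScript is required. The sketch below assumes `scan` accepts the `url` and `engine` keyword arguments shown under Usage and returns a dict; `scan_with_fallback` and `fake_scan` are hypothetical names, and the fake stands in for `scrapalyser.scan` so the example runs offline.

```python
from typing import Any, Callable


def scan_with_fallback(url: str, scan: Callable[..., dict[str, Any]]) -> dict[str, Any]:
    """Scan with the curl engine, then retry with playwright if JS is required."""
    report = scan(url=url, engine="curl")
    if report.get("js_required"):
        # The curl engine cannot execute JS, so rescan in a real browser.
        report = scan(url=url, engine="playwright")
    return report


def fake_scan(url: str, engine: str = "curl") -> dict[str, Any]:
    """Offline stand-in for scrapalyser.scan: pretends every site needs JS."""
    return {"url": url, "engine": engine, "js_required": True}


result = scan_with_fallback("https://example.com", fake_scan)
print(result["engine"])
```

With the real library installed (including the Playwright extra), the call would presumably be `scan_with_fallback(url, scrapalyser.scan)`.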

Contributing

1. Fork the repo
2. Create a feature branch (git checkout -b feat/my-feature)
3. Commit your changes
4. Open a pull request

License

MIT — see LICENSE.

This version: 0.1.1 (Apr 30, 2026)

File details

Details for the file scrapalyser-0.1.1.tar.gz.

File metadata

Download URL: scrapalyser-0.1.1.tar.gz
Upload date: Apr 30, 2026
Size: 14.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for scrapalyser-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`da6d8d912750e888a1166eb2dc4d98ed203dc3b4f1ddffc0f06eadd57cc1b4d4`
MD5	`fdf5ab2e67eed5483b88be9c4e776351`
BLAKE2b-256	`a0b6a4183740a41c74cf50c3187d611963c437fd2af94b052e2106993fe68ce6`

File details

Details for the file scrapalyser-0.1.1-py3-none-any.whl.

File metadata

Download URL: scrapalyser-0.1.1-py3-none-any.whl
Upload date: Apr 30, 2026
Size: 16.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for scrapalyser-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1f6c7e9bd4e1f8a3d80e1f64d5d2b454941ee9ae18fd97ac5e6dd661858c7635`
MD5	`8e2db172aacdeb5091c83cdf38e7e92c`
BLAKE2b-256	`3d345f952b42b610a1507327175c8cbfde913a6e8608ee428350692b30fefff4`
