Site-quality signals for a deployed URL or a local static-site dir — accessibility, structure, SEO, link health, framework detection, validity

site-analyser

Site-quality signals for a deployed URL or a local static-site directory — the lens-family member that reads a rendered/deployed website rather than its source files.

code-analyser reads .html/.css/.js source files in isolation. site-analyser instead crawls a site (live or local), parses each page, and returns accessibility, structure, SEO, link-health, framework, validity, and perf signals. The core is pure Python, so it is always pip-installable. When Lighthouse or the W3C Nu HTML Checker is on PATH, site-analyser uses it for deeper signals; when a tool is absent, it degrades gracefully.

Install

pip install site-analyser

Optional deeper signals (no Python install needed — site-analyser shells out if it finds them):

# Lighthouse (perf / a11y / SEO / best-practices scores) — needs Node + Chrome
npx -y lighthouse --version

# W3C Nu HTML Checker (deep HTML/CSS validation) — needs Node + Java
npx -y vnu-jar --help
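
To see in advance which optional tools would be picked up, a PATH lookup is usually enough. The following is a minimal sketch, not site-analyser's own detection logic: the function name find_optional_tools and the executable names lighthouse, vnu, and npx are assumptions chosen for illustration.

```python
import shutil


def find_optional_tools(names=("lighthouse", "vnu", "npx")):
    """Map each tool name to its resolved path, or None if it is not on PATH.

    The default names are illustrative assumptions; site-analyser may look
    for different executables.
    """
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_optional_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A None entry only means the executable was not found on PATH; it says nothing about whether Node, Chrome, or Java are installed.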

Use

Python:

from site_analyser import SiteAnalyser

result = SiteAnalyser().analyse(url="https://example.com")          # live URL
result = SiteAnalyser().analyse(path="./build")                     # local static-site dir
print(result.overall.page_count)                                    # 12
print(result.overall.accessibility.alt_text_coverage)               # 0.94
print(result.overall.broken_links)                                  # []
print(result.overall.frameworks_detected)                           # {'bootstrap': ['/index.html']}
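
One way to use these fields is as a simple pass/fail gate, for example in CI. This is a hedged sketch: quality_gate and its threshold are hypothetical helpers, and the SimpleNamespace object only stands in for result.overall, mirroring the attribute names shown above.

```python
from types import SimpleNamespace


def quality_gate(overall, min_alt_coverage=0.9):
    """Return a list of human-readable failures; an empty list means pass.

    `overall` is any object exposing the attributes used in the example
    above (accessibility.alt_text_coverage, broken_links).
    """
    failures = []
    coverage = overall.accessibility.alt_text_coverage
    if coverage < min_alt_coverage:
        failures.append(f"alt-text coverage {coverage:.2f} < {min_alt_coverage:.2f}")
    if overall.broken_links:
        failures.append(f"{len(overall.broken_links)} broken link(s)")
    return failures


# Stand-in for result.overall so the sketch runs without crawling anything.
sample = SimpleNamespace(
    page_count=12,
    accessibility=SimpleNamespace(alt_text_coverage=0.94),
    broken_links=[],
)
print(quality_gate(sample))  # []
```

With a real analysis, the same function would be called as quality_gate(result.overall).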

CLI:

site-analyser https://example.com                # human summary
site-analyser ./build --json                     # raw JSON of a local dir
site-analyser https://example.com --max-pages 20
site-analyser ./build --no-external              # skip Lighthouse/vnu even if available
site-analyser serve                              # HTTP API on port 8012
site-analyser manifest                           # capability manifest

HTTP (site-analyser serve on port 8012):

curl -X POST http://localhost:8012/analyse \
  -H 'content-type: application/json' \
  -d '{"url": "https://example.com", "max_pages": 10}'

curl http://localhost:8012/health
curl http://localhost:8012/manifest

Like git-analyser, the API takes a JSON body (not a multipart upload) — exactly one of url or path must be set.
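
The same request can be built from Python with the standard library. This is a sketch under assumptions: build_analyse_request is a hypothetical helper, the client-side check simply mirrors the "exactly one of url or path" rule, and the response is assumed to be JSON.

```python
import json
import urllib.request


def build_analyse_request(url=None, path=None, max_pages=None,
                          base="http://localhost:8012"):
    """Build a POST request for /analyse with a JSON body.

    Raises ValueError unless exactly one of url or path is given.
    """
    if (url is None) == (path is None):
        raise ValueError("set exactly one of url or path")
    payload = {"url": url} if url is not None else {"path": path}
    if max_pages is not None:
        payload["max_pages"] = max_pages
    return urllib.request.Request(
        f"{base}/analyse",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


req = build_analyse_request(url="https://example.com", max_pages=10)
print(req.get_method(), req.full_url)  # POST http://localhost:8012/analyse
```

With site-analyser serve running, urllib.request.urlopen(req) sends the request; nothing is sent by the sketch itself.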

Signals

Per page and rolled up across the site:

  • Crawl — internal-link discovery, page count, broken links (internal HEAD checks).
  • Structure — semantic HTML (<header>/<nav>/<main>/<footer>), heading hierarchy (missing <h1>, skipped levels, depth).
  • Accessibility (WCAG heuristics) — alt-text coverage, form-label coverage, lang on <html>, ARIA usage, skip-link, image count, doc-language coverage.
  • SEO — <title>, meta description, viewport, canonical, OpenGraph coverage.
  • Tech — framework detection (Bootstrap, Tailwind, Bulma, Materialize, React, Vue, Angular, jQuery, Svelte), CDN-link detection, inline style= / <style> / <script> / on*= counts.
  • Validity — HTML parse-error count (html5lib); deep W3C via vnu if present.
  • Perf — page weight (bytes), HTTP status (URL mode); real Lighthouse scores if present.
  • External tools (when on PATH) — Lighthouse categories (perf/a11y/SEO/best-practices), vnu HTML/CSS error/warning counts. Absence is reported (not an error).
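
To make two of the heuristics above concrete, here is a minimal sketch of alt-text coverage and heading-hierarchy checks using only the standard library. It is not site-analyser's implementation: the class and function names are hypothetical, and it assumes an image counts as covered when it has an alt attribute at all (including an empty one) and that a page with no images has coverage 1.0.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collect image and heading facts from one HTML document."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.images_with_alt = 0
        self.heading_levels = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
            if "alt" in dict(attrs):
                self.images_with_alt += 1
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))


def page_signals(html):
    parser = PageSignals()
    parser.feed(html)
    parser.close()
    levels = parser.heading_levels
    # A skip is a jump of more than one level downwards, e.g. h1 -> h3.
    skipped = [(a, b) for a, b in zip(levels, levels[1:]) if b > a + 1]
    coverage = parser.images_with_alt / parser.images if parser.images else 1.0
    return {
        "alt_text_coverage": coverage,
        "missing_h1": 1 not in levels,
        "skipped_levels": skipped,
    }


html = '<h1>Title</h1><img src="a.png" alt="A"><img src="b.png"><h3>Sub</h3>'
print(page_signals(html))
# {'alt_text_coverage': 0.5, 'missing_h1': False, 'skipped_levels': [(1, 3)]}
```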

The family

Part of the lens analyser family.

What you want                                       Use
Source-file metrics on .html/.css/.js               code-analyser
The rendered/deployed site (URL or build dir)       site-analyser (this)
Source-history process signals                      git-analyser
Any file → right engine                             auto-analyser

Limits

  • Pure-Python a11y/perf signals are heuristics; Lighthouse remains the gold standard for the full picture. Make it available for the deep numbers (running npx lighthouse --version is a quick check that it works).
  • The crawler follows same-origin links only, up to max_pages (default 10).
  • Broken-link checking does internal HEAD requests; external links are not fetched by default.
  • v1 doesn't handle JS-rendered content — pages are parsed from the raw HTML response.
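
The same-origin and max_pages limits can be illustrated with a small breadth-first traversal. This is a sketch of the general technique, not the package's crawler: crawl_order and same_origin are hypothetical names, and get_links is a stand-in callable that replaces fetching and parsing a page.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit


def same_origin(a, b):
    """True when both URLs share scheme and host:port."""
    pa, pb = urlsplit(a), urlsplit(b)
    return (pa.scheme, pa.netloc) == (pb.scheme, pb.netloc)


def crawl_order(start, get_links, max_pages=10):
    """Visit pages breadth-first, same-origin only, stopping at max_pages."""
    visited = []
    queued = {start}
    queue = deque([start])
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for href in get_links(page):
            # Resolve relative links and drop #fragments before comparing.
            target = urldefrag(urljoin(page, href)).url
            if same_origin(start, target) and target not in queued:
                queued.add(target)
                queue.append(target)
    return visited


# In-memory stand-in for a site: page URL -> hrefs found on that page.
site = {
    "https://example.com/": ["/a", "https://other.org/x", "/b#top"],
    "https://example.com/a": ["/", "/c"],
}
print(crawl_order("https://example.com/", lambda u: site.get(u, []), max_pages=3))
# ['https://example.com/', 'https://example.com/a', 'https://example.com/b']
```

The external link to other.org is never queued, and /c is discovered but not visited because the page budget is already spent.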

License

MIT
