Site-quality signals for a deployed URL or a local static-site dir — accessibility, structure, SEO, link health, framework detection, validity
site-analyser
Site-quality signals for a deployed URL or a local static-site directory — the lens-family member that reads a rendered/deployed website rather than its source files.
code-analyser reads .html/.css/.js source files in isolation; this one crawls a site (live or local), parses each page, and returns accessibility / structure / SEO / link-health / framework / validity / perf signals. The core is pure Python (always pip-installable). Lighthouse and the W3C Nu HTML Checker are optional and add deeper signals when they are on PATH, with graceful degradation otherwise.
Install
pip install site-analyser
Optional deeper signals (no Python install needed — site-analyser shells out if it finds them):
# Lighthouse (perf / a11y / SEO / best-practices scores) — needs Node + Chrome
npx -y lighthouse --version
# W3C Nu HTML Checker (deep HTML/CSS validation) — needs Node + Java
npx -y vnu-jar --help
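This README doesn't document how site-analyser finds these tools. As a minimal sketch, assuming discovery is a plain lookup of executable names on `PATH`, graceful degradation can be as simple as recording `None` for a missing tool instead of raising. The executable names and the `detect_external_tools` / `availability_report` helpers are hypothetical. Because `npx -y` can fetch Lighthouse on demand, a real check might also accept `npx` alone.

```python
import shutil


def detect_external_tools(names=("lighthouse", "vnu", "npx")):
    """Return {tool: absolute path or None}; a missing tool is recorded, not raised."""
    # shutil.which searches PATH and returns None when the executable is absent.
    return {name: shutil.which(name) for name in names}


def availability_report(found):
    """Turn the lookup result into a human-readable status per tool."""
    return {
        name: ("available" if path else "not found on PATH")
        for name, path in found.items()
    }


if __name__ == "__main__":
    for tool, status in availability_report(detect_external_tools()).items():
        print(f"{tool}: {status}")
```

Returning a status instead of raising keeps a missing tool a reportable condition rather than a failure.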
Use
Python:
from site_analyser import SiteAnalyser
result = SiteAnalyser().analyse(url="https://example.com") # live URL
result = SiteAnalyser().analyse(path="./build") # local static-site dir
print(result.overall.page_count) # 12
print(result.overall.accessibility.alt_text_coverage) # 0.94
print(result.overall.broken_links) # []
print(result.overall.frameworks_detected) # {'bootstrap': ['/index.html']}
CLI:
site-analyser https://example.com # human summary
site-analyser ./build --json # raw JSON of a local dir
site-analyser https://example.com --max-pages 20
site-analyser ./build --no-external # skip Lighthouse/vnu even if available
site-analyser serve # HTTP API on port 8012
site-analyser manifest # capability manifest
HTTP (site-analyser serve on port 8012):
curl -X POST http://localhost:8012/analyse \
-H 'content-type: application/json' \
-d '{"url": "https://example.com", "max_pages": 10}'
curl http://localhost:8012/health
curl http://localhost:8012/manifest
Like git-analyser, the API takes a JSON body (not a multipart upload); exactly one of url or path must be set.
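A client can enforce the exactly-one rule before it sends anything. The sketch below builds the request body and posts it with the standard library. `build_analyse_body` and `post_analyse` are hypothetical helpers, and only the `url`, `path`, and `max_pages` fields from the curl example above are assumed. The response schema isn't documented here, so the reply comes back as a plain dict.

```python
import json
import urllib.request


def build_analyse_body(url=None, path=None, max_pages=None):
    """Build the JSON body for POST /analyse; exactly one of url/path may be set."""
    if (url is None) == (path is None):
        raise ValueError("set exactly one of 'url' or 'path'")
    body = {"url": url} if url is not None else {"path": path}
    if max_pages is not None:
        body["max_pages"] = max_pages
    return body


def post_analyse(body, base="http://localhost:8012"):
    """POST the body to a running `site-analyser serve` and decode the JSON reply."""
    req = urllib.request.Request(
        f"{base}/analyse",
        data=json.dumps(body).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `post_analyse(build_analyse_body(url="https://example.com", max_pages=10))` sends the same request as the curl call.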
Signals
Per page and rolled up across the site:
- Crawl: internal-link discovery, page count, broken links (internal HEAD checks).
- Structure: semantic HTML (<header>/<nav>/<main>/<footer>), heading hierarchy (missing <h1>, skipped levels, depth).
- Accessibility (WCAG heuristics): alt-text coverage, form-label coverage, lang on <html>, ARIA usage, skip-link, image count, doc-language coverage.
- SEO: <title>, meta description, viewport, canonical, OpenGraph coverage.
- Tech: framework detection (Bootstrap, Tailwind, Bulma, Materialize, React, Vue, Angular, jQuery, Svelte), CDN-link detection, inline style= / <style> / <script> / on*= counts.
- Validity: HTML parse-error count (html5lib); deep W3C validation via vnu if present.
- Perf: page weight (bytes), HTTP status (URL mode); real Lighthouse scores if present.
- External tools (when on PATH): Lighthouse categories (perf/a11y/SEO/best-practices), vnu HTML/CSS error/warning counts. Absence is reported (not an error).
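Here is a minimal standard-library sketch of four per-page heuristics from the list above: alt-text coverage, lang on <html>, skipped heading levels or a missing <h1>, and presence of <title> and a meta description. It is not site-analyser's implementation, which isn't shown here. The `PageSignals` class and its attribute names are hypothetical. The sketch counts an empty `alt=""` as present, because decorative images legitimately use it.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collect a few structure / a11y / SEO heuristics from one HTML page."""

    def __init__(self):
        super().__init__()
        self.images = 0
        self.images_with_alt = 0
        self.html_lang = None
        self.headings = []  # heading levels in document order, e.g. [1, 2, 4]
        self.has_title = False
        self.has_meta_description = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        attrs = dict(attrs)
        if tag == "img":
            self.images += 1
            # alt="" counts as present: decorative images may use an empty alt.
            if "alt" in attrs:
                self.images_with_alt += 1
        elif tag == "html":
            self.html_lang = attrs.get("lang") or None
        elif tag == "title":
            self.has_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.has_meta_description = True
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.headings.append(int(tag[1]))

    def alt_text_coverage(self):
        # A page with no images is treated as fully covered.
        return self.images_with_alt / self.images if self.images else 1.0

    def skipped_heading_levels(self):
        # A skip is a jump more than one level deeper, e.g. <h2> straight to <h4>.
        return [(a, b) for a, b in zip(self.headings, self.headings[1:]) if b > a + 1]

    def missing_h1(self):
        return 1 not in self.headings
```

The site-level roll-up here is an assumption, not documented behaviour. One option is to sum the raw counts across pages (total images with alt over total images) instead of averaging per-page ratios, so image-heavy pages carry proportional weight.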
The family
Part of the lens analyser family.
| What you want | Use |
|---|---|
| Source-file metrics on .html/.css/.js | code-analyser |
| The rendered/deployed site (URL or build dir) | site-analyser (this) |
| Source-history process signals | git-analyser |
| Any file → right engine | auto-analyser |
Limits
- Pure-Python a11y/perf signals are heuristics; Lighthouse remains the gold standard for the full picture. Install it (so that npx lighthouse --version works) for the deep numbers.
- The crawler follows same-origin links only, up to max_pages (default 10).
- Broken-link checking does internal HEAD requests; external links are not fetched by default.
- v1 doesn't handle JS-rendered content; pages are parsed from the raw HTML response.
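Taken together, the crawl limits above describe a breadth-first walk restricted to one origin and capped at max_pages. A minimal sketch follows, assuming a caller-supplied `fetch_links(url)` function (hypothetical) that returns the raw hrefs found on a page. The crawler itself does no network I/O, renders no JavaScript, and keeps off-origin links such as `https://other.org/...` or `mailto:` out of the frontier.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlsplit


def same_origin(a, b):
    """Origins match when scheme, host and port match (default ports not normalised)."""
    sa, sb = urlsplit(a), urlsplit(b)
    return (sa.scheme, sa.hostname, sa.port) == (sb.scheme, sb.hostname, sb.port)


def crawl(start_url, fetch_links, max_pages=10):
    """Breadth-first same-origin crawl; returns visited URLs in visit order."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        page = queue.popleft()
        visited.append(page)
        for href in fetch_links(page):
            # Resolve relative links and drop #fragments so /a and /a#top dedupe.
            absolute, _ = urldefrag(urljoin(page, href))
            if same_origin(absolute, start_url) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Keeping fetching outside `crawl` lets the same loop drive either HTTP requests (URL mode) or file reads from a build directory.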
License
MIT
File details
Details for the file site_analyser-0.1.0.tar.gz.
File metadata
- Download URL: site_analyser-0.1.0.tar.gz
- Upload date:
- Size: 21.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c77c4d9462c7069db581a422f2f5c7fdb3ca6e93d0808eccd4bd53f712edf537 |
| MD5 | c105b8d0ef3b529fac2eb77e35c90544 |
| BLAKE2b-256 | 03f420b0d60f275a011b4d4f30c5581f943db6378b31470ec3550c3e33e5a046 |
File details
Details for the file site_analyser-0.1.0-py3-none-any.whl.
File metadata
- Download URL: site_analyser-0.1.0-py3-none-any.whl
- Upload date:
- Size: 21.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3a5936c48e81eebf556e95bb89bf6cf5a1d765d3007d040ea007d01336c10641 |
| MD5 | dccf95852bf425a0ab583734e2bb580b |
| BLAKE2b-256 | 5c4c816384982dbbd9b874b9300a7307e7720ef9faf5d77a805ce52b5579a5db |