SEO crawl-audit Streamlit dashboard for nginx-ingress Googlebot logs
seo-log-auditor
Streamlit dashboard that ingests a 30-day Grafana/Loki export of nginx-ingress Googlebot logs and runs seven SEO crawl-audit techniques on it:
- Crawl budget distribution – how hits are spread across page types vs how URLs are spread across page types.
- Orphan pages – URLs Googlebot is hitting that are not in your sitemap.
- Status-code waste – non-200 ratio per page type, redirect chains.
- Stale high-value pages – sitemap URLs not crawled in the last N days.
- Performance – page-size vs latency vs hit frequency.
- Bot verification – verified Googlebot vs spoofed User-Agents.
- Parameter traps – paths with an explosive number of query-string variants.
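As an illustration of the last technique, a parameter trap can be spotted by counting distinct query strings per path. The following is only a minimal sketch using the standard library, not the package's actual implementation; the function name `find_parameter_traps`, the `min_variants` threshold, and the sample URLs are all hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_parameter_traps(urls, min_variants=3):
    """Return {path: distinct_query_count} for paths at or above the threshold.

    Hypothetical helper for illustration only.
    """
    variants = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.query:  # only URLs that carry a query string count as variants
            variants[parts.path].add(parts.query)
    return {
        path: len(queries)
        for path, queries in variants.items()
        if len(queries) >= min_variants
    }


# Made-up sample of crawled URLs.
sample_urls = [
    "/shoes?color=red",
    "/shoes?color=blue",
    "/shoes?color=red&size=9",
    "/shoes?color=red",  # repeat hit, same variant
    "/about",
    "/blog?page=2",
]

print(find_parameter_traps(sample_urls))  # {'/shoes': 3}
```

A real threshold would likely be far higher than 3 and depends on how many hits the log contains.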
Quick start
Zero-install via uv:
uvx seo-log-auditor
Or install into the current environment:
pip install seo-log-auditor
seo-log-auditor
Either command launches the dashboard at http://localhost:8501. To pass flags through to Streamlit:
seo-log-auditor --server.port 9000 --server.headless true
Then in the sidebar:
- Upload your Grafana/Loki export (.json, .csv, or .txt).
- Paste your sitemap URL (a sitemap index works too).
- Optionally upload a page_patterns.yaml (see src/seo_log_auditor/config/page_patterns.example.yaml).
Exporting logs from Grafana
Use this LogQL query in Grafana Explore against your Loki datasource:
{app="ingress-nginx"} |= "Googlebot"
Set the time range to last 30 days and download as JSON (best fidelity)
or CSV. Plain .txt (one log line per row) also works.
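To give a feel for what the plain-text path involves, here is a rough sketch of turning one access-log line into a record. It assumes the line begins with the usual nginx fields (client IP, timestamp, request, status, bytes, referer, User-Agent) and ignores anything after them; your ingress may use a custom log format, and the package's own parser may work differently. `LINE_RE`, `parse_line`, and the sample line are made up for illustration.

```python
import re
from datetime import datetime

# Assumed line prefix: ip - user [time] "request" status bytes "referer" "user agent"
LINE_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    row = match.groupdict()
    row["status"] = int(row["status"])
    row["bytes"] = int(row["bytes"])
    row["time"] = datetime.strptime(row["time"], "%d/%b/%Y:%H:%M:%S %z")
    return row


# Made-up sample line; trailing upstream fields are ignored by the regex.
sample = (
    '66.249.66.1 - - [10/Oct/2024:13:55:36 +0000] '
    '"GET /products/a?color=red HTTP/1.1" 200 5123 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" '
    '312 0.042 [default-web-80] [] 10.0.0.5:8080 5123 0.041 200 abc123'
)

row = parse_line(sample)
print(row["url"], row["status"], row["time"].isoformat())
```

Lines that do not match return None, so malformed rows can be counted and skipped rather than crashing the load.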
Page-pattern config
src/seo_log_auditor/config/page_patterns.example.yaml is the starting point.
First match wins. Edit it for your URL structure and re-upload via the
sidebar. Tagging traps like paginated and faceted as their own page types
makes leakage immediately visible on the Crawl Budget page.
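The "first match wins" rule and its effect on the crawl-budget view can be sketched roughly as follows. The rule names and regexes below are hypothetical and written as a Python list rather than in the YAML schema of the example file, which is not reproduced here; the sample hits are made up.

```python
import re
from collections import Counter

# Hypothetical ordered rules: trap patterns come first so they win over
# the broader section patterns listed after them.
RULES = [
    ("faceted", re.compile(r"\?.*(color|size|sort)=")),
    ("paginated", re.compile(r"[?&]page=\d+")),
    ("product", re.compile(r"^/products/")),
    ("blog", re.compile(r"^/blog/")),
]


def classify(url, rules=RULES, default="other"):
    """Return the page type of the first rule whose regex matches the URL."""
    for page_type, pattern in rules:
        if pattern.search(url):
            return page_type
    return default


# Made-up Googlebot hits (one entry per request).
hits = [
    "/products/a", "/products/a", "/products/b",
    "/products/a?color=red", "/products/a?color=blue", "/products/a?sort=price",
    "/blog/x?page=2",
    "/blog/x",
]

hit_counts = Counter(classify(u) for u in hits)       # share of crawl hits
url_counts = Counter(classify(u) for u in set(hits))  # share of distinct URLs

for page_type in sorted(hit_counts):
    hit_pct = 100 * hit_counts[page_type] / len(hits)
    url_pct = 100 * url_counts[page_type] / len(set(hits))
    print(f"{page_type:10s} hits={hit_pct:5.1f}%  urls={url_pct:5.1f}%")
```

Because `/products/a?color=red` matches both the faceted and the product rule, the ordering decides its type. Listing trap patterns first keeps faceted hits from being counted as product crawl.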
Development
git clone https://github.com/hitensangani/seo-log-auditor.git
cd seo-log-auditor
uv sync --extra dev
uv run streamlit run src/seo_log_auditor/app.py
Project layout
src/seo_log_auditor/
app.py # Streamlit entry, sidebar uploads, KPI overview
cli.py # `seo-log-auditor` console script
pages/ # Multipage app, one file per technique
config/ # Example page-pattern rules
parsers.py # Loki JSON / CSV / nginx-text -> DataFrame
classify.py # Regex-based URL -> page_type
sitemap.py # Fetch + parse sitemap.xml (incl. index)
verify_bot.py # Google IP ranges + cached reverse DNS
analysis/ # One module per technique
tests/ # pytest fixtures + unit tests
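For the reverse-DNS half of what verify_bot.py is described as doing, a common approach is a reverse lookup on the client IP, a check that the hostname belongs to a Google domain, and a forward lookup to confirm it resolves back to the same IP. The sketch below is an assumption about the general technique, not a copy of the module; the function names and the injectable `reverse`/`forward` resolvers are hypothetical, and the IP-range check is left out.

```python
import socket
from functools import lru_cache

# Hostname suffixes assumed to indicate Google crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com", ".googleusercontent.com")


def is_verified_googlebot(ip, reverse=None, forward=None):
    """Reverse-then-forward DNS check; resolvers are injectable for testing."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    # gethostbyname_ex only returns IPv4 addresses.
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False


@lru_cache(maxsize=4096)
def cached_is_verified_googlebot(ip):
    """Cache results so each distinct IP triggers at most one DNS round trip."""
    return is_verified_googlebot(ip)


# Offline usage with stub resolvers (no network needed).
print(is_verified_googlebot(
    "66.249.66.1",
    reverse=lambda addr: "crawl-66-249-66-1.googlebot.com",
    forward=lambda host: ["66.249.66.1"],
))  # True
```

Caching matters here because a 30-day export usually repeats a small set of client IPs many times.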
Running tests
uv run pytest
Roadmap
- Internal-link-depth correlation (a deeper layer for technique 4, stale high-value pages), which needs a Screaming Frog or Sitebulb export as input.
- Direct Loki API streaming so you don't need to download files.
- SQLite cache for week-over-week comparisons.
License
MIT — see LICENSE.