html-intersection
Fix canonical links, FLAGS, and RO↔EN cross-references across mirrored HTML directories.
What it does
- Ensures each file's <link rel="canonical" ...> matches its exact filename (case-sensitive)
- Ensures the FLAGS links for RO (+40) and EN (+1) match the canonical in the same file
- Synchronizes cross-references between ro/ and en/ files so the pair points to each other (RO↔EN)
- Detects and reports unresolved cases:
  - invalid links (pointing to non-existent files)
  - pairs with no common links in FLAGS (all four links different)
  - unmatched RO/EN files that remain without a valid pair
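The unresolved cases listed above can be illustrated with a small classifier. This is only a sketch under stated assumptions, not the library's implementation: `FlagLinks`, `classify_pair`, the status strings, and the idea of representing each file by its two FLAGS URLs are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlagLinks:
    """The two FLAGS URLs found in one file (hypothetical container)."""
    ro_url: str  # href of the +40 link
    en_url: str  # href of the +1 link


def classify_pair(ro: FlagLinks, en: FlagLinks, existing_urls: set[str]) -> str:
    """Return 'invalid', 'no-common', 'partial', or 'ok' for a candidate RO/EN pair."""
    links = [ro.ro_url, ro.en_url, en.ro_url, en.en_url]
    # Invalid link: any URL that does not correspond to a known file.
    if any(url not in existing_urls for url in links):
        return "invalid"
    ro_agrees = ro.ro_url == en.ro_url
    en_agrees = ro.en_url == en.en_url
    if ro_agrees and en_agrees:
        return "ok"
    # Neither side agrees: all four links are different.
    if not ro_agrees and not en_agrees:
        return "no-common"
    return "partial"
```

A pair classified as `partial` still shares one link, so it is the kind of case a sync step could plausibly repair, whereas `no-common` gives no anchor to start from and can only be reported.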
Inspired by the step-by-step process of the Intersection scripts and packaged as a PyPI library (page style similar to the html package on PyPI).
Install
pip install html-intersection
Quick start
from html_intersection.core import repair_all

# Raw strings keep backslashes literal, so single backslashes are enough.
repair_all(
    ro_directory=r"E:\path\to\site\ro",
    en_directory=r"E:\path\to\site\en",
    base_url="https://neculaifantanaru.com",
)
CLI usage
html-intersection repair --ro-dir "E:\path\to\site\ro" --en-dir "E:\path\to\site\en" --base-url https://neculaifantanaru.com
Commands:
- repair (runs all 3 steps)
- fix-canonicals
- fix-flags
- sync
- scan (prints detected RO↔EN pairs; add --report to include invalid links, mismatches, unmatched files)
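For readers who want to see how such a command layout fits together, here is a minimal argparse sketch. It is a hypothetical reconstruction based only on the commands and flags named above; `build_parser` is an invented name and the package's real CLI code may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per documented command (illustrative only)."""
    parser = argparse.ArgumentParser(prog="html-intersection")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("repair", "fix-canonicals", "fix-flags", "sync", "scan"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--ro-dir", required=True)
        cmd.add_argument("--en-dir", required=True)
        cmd.add_argument("--base-url", required=True)
        if name == "scan":
            # Only scan takes the extra reporting switch.
            cmd.add_argument("--report", action="store_true")
    return parser


args = build_parser().parse_args(
    ["scan", "--ro-dir", "ro", "--en-dir", "en",
     "--base-url", "https://example.com", "--report"]
)
print(args.command, args.report)
```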
Python API
- fix_canonicals(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)
- fix_flags_match_canonical(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)
- sync_cross_references(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)
- repair_all(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)
Examples
- Basic repair
from html_intersection.core import repair_all

repair_all(
    ro_directory=r"E:\site\ro",
    en_directory=r"E:\site\en",
    base_url="https://neculaifantanaru.com",
)
- Dry run (no writes)
from html_intersection.core import repair_all

repair_all(
    ro_directory=r"E:\site\ro",
    en_directory=r"E:\site\en",
    base_url="https://neculaifantanaru.com",
    dry_run=True,  # report what would change without writing files
)
- CLI one step at a time
html-intersection fix-canonicals --ro-dir "E:\site\ro" --en-dir "E:\site\en" --base-url https://neculaifantanaru.com
html-intersection fix-flags --ro-dir "E:\site\ro" --en-dir "E:\site\en" --base-url https://neculaifantanaru.com
html-intersection sync --ro-dir "E:\site\ro" --en-dir "E:\site\en" --base-url https://neculaifantanaru.com
# Scan with detailed report
html-intersection scan --ro-dir "E:\site\ro" --en-dir "E:\site\en" --base-url https://neculaifantanaru.com --report
How the logic works (3 steps)
- Canonicals: set canonical to exact file name (case-sensitive); RO → https://.../<name>.html, EN → https://.../en/<Name>.html.
- FLAGS = canonical in the same file: RO uses cunt_code="+40"; EN uses cunt_code="+1".
- Cross-references RO↔EN: in ro/<name>.html the +1 link points to the paired en/<Name>.html; in en/<Name>.html the +40 link points to the paired ro/<name>.html.
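The three steps can be sketched on in-memory strings as follows. This is an illustration under assumptions, not the library's code: the helper names (`canonical_url`, `set_canonical`, `set_flag`), the file names, and the exact markup shape (a `rel` attribute followed by `href`, and `cunt_code` followed by `href`) are hypothetical, and real pages may order attributes differently.

```python
import re


def canonical_url(base_url: str, filename: str, lang: str) -> str:
    """Build the expected canonical URL; EN pages are assumed to live under /en/."""
    prefix = "/en/" if lang == "en" else "/"
    return base_url.rstrip("/") + prefix + filename


def set_canonical(html: str, url: str) -> str:
    """Step 1: point <link rel="canonical"> at the given URL."""
    pattern = r'(<link\s+rel="canonical"\s+href=")[^"]*(")'
    return re.sub(pattern, lambda m: m.group(1) + url + m.group(2), html)


def set_flag(html: str, code: str, url: str) -> str:
    """Steps 2 and 3: point the FLAGS link with the given cunt_code at the URL."""
    pattern = r'(<a\s+cunt_code="' + re.escape(code) + r'"\s+href=")[^"]*(")'
    return re.sub(pattern, lambda m: m.group(1) + url + m.group(2), html)


base = "https://example.com"
ro_url = canonical_url(base, "despre.html", "ro")
en_url = canonical_url(base, "About.html", "en")

ro_html = (
    '<link rel="canonical" href="x" />'
    '<a cunt_code="+40" href="x"></a><a cunt_code="+1" href="x"></a>'
)
ro_html = set_canonical(ro_html, ro_url)    # step 1: canonical = own file name
ro_html = set_flag(ro_html, "+40", ro_url)  # step 2: own flag = own canonical
ro_html = set_flag(ro_html, "+1", en_url)   # step 3: cross-reference to the EN pair
```

The EN file would get the mirror-image treatment: its canonical and `+1` link point to `en_url`, and its `+40` link points to `ro_url`.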
Notes on robustness
- The matching for +40 and +1 accepts both "+40" and "\+40" (and similarly for +1).
- Accidental ...html.html is normalized to ...html when comparing and fixing.
- scan --report surfaces invalid links, mismatched pairs with no common links, and files left unmatched.
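One way to express the first two robustness rules is with a pair of regular expressions. The sketch below is an assumption about how such matching could be written; `FLAG_RE` and `normalize_html_suffix` are illustrative names, not identifiers from the package.

```python
import re

# Matches cunt_code="+40" as well as the escaped form cunt_code="\+40"
# (and likewise for +1); the optional backslash is the only difference.
FLAG_RE = re.compile(r'cunt_code="\\?\+(40|1)"')


def normalize_html_suffix(url: str) -> str:
    """Collapse an accidental repeated '.html.html' ending into a single '.html'."""
    return re.sub(r"(?:\.html)+$", ".html", url)
```

Normalizing before comparison means that `page.html.html` and `page.html` are treated as the same target, so a doubled extension does not show up as a false mismatch.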
Windows install and build
# Create and activate venv
py -m venv .venv
.\.venv\Scripts\Activate.ps1
# Install build tooling
py -m pip install --upgrade pip build twine
# Build the wheel and sdist
py -m build
# Upload to TestPyPI (recommended first)
$env:TWINE_USERNAME = "__token__"
$env:TWINE_PASSWORD = "pypi-<YOUR_TESTPYPI_TOKEN>"
py -m twine upload --repository testpypi dist/*
# Upload to PyPI (when ready)
$env:TWINE_USERNAME = "__token__"
$env:TWINE_PASSWORD = "pypi-<YOUR_PYPI_TOKEN>"
py -m twine upload dist/*
Notes
- Files are written UTF-8; the reader tries utf-8, latin1, cp1252, iso-8859-1.
- You can pass backup_ext=".bak" to keep a backup of modified files.
- The library aims to follow the precise, case-sensitive flow in your instructions.
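The read and write behaviour described in these notes could look roughly like the sketch below. The function names and the backup naming scheme (appending the extension to the full file name) are assumptions for illustration; only the encoding order and the `backup_ext` parameter name come from the notes above.

```python
import shutil
from pathlib import Path
from typing import Optional

ENCODINGS = ("utf-8", "latin1", "cp1252", "iso-8859-1")


def read_text_with_fallback(path: Path) -> str:
    """Try each encoding in order and return the first successful decode."""
    data = path.read_bytes()
    for encoding in ENCODINGS:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort; not expected to be reached with the list above.
    return data.decode("utf-8", errors="replace")


def write_text_with_backup(path: Path, text: str, backup_ext: Optional[str] = None) -> None:
    """Optionally copy the original to <name><backup_ext>, then write UTF-8."""
    if backup_ext and path.exists():
        shutil.copy2(path, path.with_name(path.name + backup_ext))
    path.write_text(text, encoding="utf-8")
```

Note that in Python `latin1` can decode any byte sequence, so with this ordering the entries after it act only as a safety net.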
References
- Diacritice project structure reference: https://github.com/me-suzy/Diacritice-Proiect---pypi-org
- PyPI html package page style reference: https://pypi.org/project/html/
File details
Details for the file html_intersection-0.2.0.tar.gz.
File metadata
- Download URL: html_intersection-0.2.0.tar.gz
- Upload date:
- Size: 10.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b0a81891116a4b3ee46f956e69c23d2a6abf793d2c4642a5a9956ca142e9527d |
| MD5 | b11870c479c0071d8d9bda33a67afe4b |
| BLAKE2b-256 | 5542418526a4fd90d4689b983b8a9f2d12f561bb1ca22b6d7b60a448ecdf3fc9 |
File details
Details for the file html_intersection-0.2.0-py3-none-any.whl.
File metadata
- Download URL: html_intersection-0.2.0-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 098b6b95a4f8f48824c11cba4b66b131ba767e98e367d1d61b1dfcc96fd20b0a |
| MD5 | f12666a234fb3c7340950c91fa94e5bc |
| BLAKE2b-256 | 4f5b1e964093ca58380e202f01fae884b918ce512ab962c0ea8ade1974524f71 |