html-intersection

Fix canonical links, FLAGS, and RO↔EN cross-references across mirrored HTML directories.

What it does

  • Ensures each file's <link rel="canonical" ...> matches its exact filename (case-sensitive)
  • Ensures the FLAGS links for RO (+40) and EN (+1) match the canonical in the same file
  • Synchronizes cross-references between ro/ and en/ files so that each pair points to each other (RO↔EN)
  • Detects and reports unresolved cases:
    • invalid links (pointing to non-existent files)
    • pairs with no common links in FLAGS (all four links different)
    • unmatched RO/EN files that remain without a valid pair
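As an illustration of the "invalid links" check, here is a minimal sketch, not the library's actual code. It assumes that RO pages live at the site root and EN pages under /en/ (as described in the logic section), and that a link is invalid when its file name is absent from the matching directory listing. The helper names link_target and is_invalid_link are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def link_target(url: str) -> tuple[str, str]:
    """Return (language, file name) for a site URL.

    Assumption: EN pages sit under /en/, everything else is RO.
    """
    parts = PurePosixPath(urlparse(url).path).parts  # e.g. ('/', 'en', 'Page.html')
    name = parts[-1] if parts else ""
    lang = "en" if len(parts) >= 2 and parts[-2] == "en" else "ro"
    return lang, name


def is_invalid_link(url: str, ro_files: set[str], en_files: set[str]) -> bool:
    """True when the URL points to a file that does not exist locally.

    The comparison is case-sensitive, so 'page.html' does not match 'Page.html'.
    """
    lang, name = link_target(url)
    pool = en_files if lang == "en" else ro_files
    return name not in pool
```

In practice the two sets would be built from the directory listings of ro/ and en/.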

Inspired by the step-by-step process of the original Intersection scripts and packaged as a PyPI library (in a style similar to the html package on PyPI).

Install

pip install html-intersection

Quick start

from html_intersection.core import repair_all

repair_all(
    ro_directory=r"E:\path\to\site\ro",
    en_directory=r"E:\path\to\site\en",
    base_url="https://neculaifantanaru.com",
)

CLI usage

html-intersection repair --ro-dir "E:\path\to\site\ro" --en-dir "E:\path\to\site\en" --base-url https://neculaifantanaru.com

Commands:

  • repair (runs all 3 steps)
  • fix-canonicals
  • fix-flags
  • sync
  • scan (prints detected RO↔EN pairs; add --report to include invalid links, mismatches, unmatched files)

Python API

  • fix_canonicals(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)
  • fix_flags_match_canonical(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)
  • sync_cross_references(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)
  • repair_all(ro_directory, en_directory, base_url, dry_run=False, backup_ext=None)

Examples

  1. Basic repair
from html_intersection.core import repair_all

repair_all(
    ro_directory=r"E:\site\ro",
    en_directory=r"E:\site\en",
    base_url="https://neculaifantanaru.com",
)
  2. Dry run (no writes)
from html_intersection.core import repair_all

repair_all(
    ro_directory=r"E:\site\ro",
    en_directory=r"E:\site\en",
    base_url="https://neculaifantanaru.com",
    dry_run=True,
)
  3. CLI, one step at a time
html-intersection fix-canonicals --ro-dir "E:\site\ro" --en-dir "E:\site\en" --base-url https://neculaifantanaru.com
html-intersection fix-flags      --ro-dir "E:\site\ro" --en-dir "E:\site\en" --base-url https://neculaifantanaru.com
html-intersection sync           --ro-dir "E:\site\ro" --en-dir "E:\site\en" --base-url https://neculaifantanaru.com

# Scan with detailed report
html-intersection scan           --ro-dir "E:\site\ro" --en-dir "E:\site\en" --base-url https://neculaifantanaru.com --report

How the logic works (3 steps)

  1. Canonicals: set canonical to exact file name (case-sensitive); RO → https://.../<name>.html, EN → https://.../en/<Name>.html.
  2. FLAGS: in each file, the flag link for the file's own language is set to that file's canonical; RO files use the cunt_code="+40" link and EN files use the cunt_code="+1" link.
  3. Cross-references RO↔EN: in ro/<name>.html the +1 link points to the paired en/<Name>.html; in en/<Name>.html the +40 link points to the paired ro/<name>.html.
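Step 1 can be sketched as follows. This is an illustrative approximation under stated assumptions, not the library's implementation: the helper names expected_canonical and set_canonical are hypothetical, and the regular expression assumes the canonical tag is written with rel before href and double-quoted attributes.

```python
import re

# Assumed tag layout: <link rel="canonical" href="..."> (rel first, double quotes).
CANONICAL_RE = re.compile(
    r'(<link\s+rel="canonical"\s+href=")([^"]*)(")', re.IGNORECASE
)


def expected_canonical(base_url: str, filename: str, lang: str) -> str:
    """Build the canonical URL from the exact (case-sensitive) file name.

    RO pages live at the site root, EN pages under /en/.
    """
    base = base_url.rstrip("/")
    prefix = "/en" if lang == "en" else ""
    return f"{base}{prefix}/{filename}"


def set_canonical(html: str, url: str) -> str:
    """Replace the href of the first canonical link, leaving the rest untouched."""
    return CANONICAL_RE.sub(lambda m: m.group(1) + url + m.group(3), html, count=1)
```

The replacement uses a function rather than a template string so that characters in the URL are never interpreted as regex group references.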

Notes on robustness

  • The matching for +40 and +1 accepts both "+40" and "\+40" (and similarly for +1).
  • Accidental ...html.html is normalized to ...html when comparing and fixing.
  • scan --report surfaces invalid links, mismatched pairs with no common links, and files left unmatched.
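The first two robustness rules could look roughly like this. It is a sketch only: the helper names are hypothetical, and the pattern assumes that the FLAGS markup places the href attribute after cunt_code inside the same tag, which the description above does not specify.

```python
import re
from typing import Optional


def normalize_html_suffix(url: str) -> str:
    """Collapse a repeated '.html' suffix, e.g. 'a.html.html' -> 'a.html'."""
    return re.sub(r"(?:\.html)+$", ".html", url)


def flag_href_pattern(code: str) -> re.Pattern:
    """Pattern for a FLAGS link with the given code ('+40' or '+1').

    Accepts both cunt_code="+40" and cunt_code="\\+40" (an optional literal
    backslash before the plus sign).
    """
    digits = code.lstrip("+")
    return re.compile(
        r'cunt_code="\\?\+' + re.escape(digits) + r'"[^>]*?href="([^"]*)"'
    )


def find_flag_href(html: str, code: str) -> Optional[str]:
    """Return the href of the first FLAGS link with this code, or None."""
    match = flag_href_pattern(code).search(html)
    return match.group(1) if match else None
```

Normalizing the suffix before comparing means that page.html and page.html.html are treated as the same target.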

Windows install and build

# Create and activate venv
py -m venv .venv
.\.venv\Scripts\Activate.ps1

# Install build tooling
py -m pip install --upgrade pip build twine

# Build the wheel and sdist
py -m build

# Upload to TestPyPI (recommended first)
$env:TWINE_USERNAME = "__token__"
$env:TWINE_PASSWORD = "pypi-<YOUR_TESTPYPI_TOKEN>"
py -m twine upload --repository testpypi dist/*

# Upload to PyPI (when ready)
$env:TWINE_USERNAME = "__token__"
$env:TWINE_PASSWORD = "pypi-<YOUR_PYPI_TOKEN>"
py -m twine upload dist/*

Notes

  • Files are written as UTF-8; when reading, the library tries utf-8, latin1, cp1252, and iso-8859-1.
  • You can pass backup_ext=".bak" to keep a backup of modified files.
  • The library aims to follow the precise, case-sensitive flow of the original Intersection scripts.
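The read and write behaviour described in these notes can be approximated as below. This is a sketch under assumptions, not the library's code: the function names are hypothetical, the encodings are tried in the order listed, and the backup is assumed to be a copy named by appending backup_ext to the file name.

```python
import shutil
from pathlib import Path
from typing import Optional

# Encodings from the notes above, tried in the listed order (an assumption).
ENCODINGS = ("utf-8", "latin1", "cp1252", "iso-8859-1")


def read_text_with_fallback(path: Path) -> str:
    """Return the file's text using the first encoding that decodes cleanly."""
    for encoding in ENCODINGS:
        try:
            return path.read_text(encoding=encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode {path}")


def write_text_with_backup(
    path: Path,
    text: str,
    backup_ext: Optional[str] = None,
    dry_run: bool = False,
) -> bool:
    """Write text as UTF-8; return True only if the file was written."""
    if dry_run:
        return False  # report-only mode: leave the file untouched
    if backup_ext and path.exists():
        # e.g. page.html -> page.html.bak
        shutil.copy2(path, path.with_name(path.name + backup_ext))
    path.write_text(text, encoding="utf-8")
    return True
```

Note that latin1 can decode any byte sequence, so with this ordering the later entries would only matter if the list were rearranged.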
