Replace host and domain names in text under various encoding schemes.

Host Replace

A Python package for replacing hostnames, domains, and IP addresses in text under common encoding schemes.

Features

  • Replace hostnames and IP addresses in text under common encodings (URL, HTML entity) while avoiding partial matches.
  • Replacements maintain the same encoding as the original text.
  • Provides a command-line interface and an importable module.
  • Supports UTF-8 string and byte inputs.
  • Supports FQDNs, second-level domains, unqualified hostnames, and IPv4/IPv6 addresses.
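
To illustrate what "the same hostname under different encodings" means, here is a minimal sketch using only the standard library. It is not the package's implementation; the helper names percent_encode_all and html_hex_entities are hypothetical and exist only for this example.

```python
import html
from urllib.parse import unquote

HOST = "boards.example.com"


def percent_encode_all(text: str) -> str:
    """Percent-encode every byte, including characters that need no escaping."""
    return "".join(f"%{byte:02x}" for byte in text.encode("utf-8"))


def html_hex_entities(text: str, chars: str = ".:/") -> str:
    """Replace only the given characters with hexadecimal HTML entities."""
    return "".join(f"&#x{ord(c):x};" if c in chars else c for c in text)


encoded_forms = {
    "plain": HOST,
    "percent": percent_encode_all(HOST),
    "html_entity": html_hex_entities(HOST),
}

# Every form decodes back to the same hostname, so a replacer has to
# recognize all of them as references to the same host.
assert unquote(encoded_forms["percent"]) == HOST
assert html.unescape(encoded_forms["html_entity"]) == HOST

for name, form in encoded_forms.items():
    print(f"{name:12} {form}")
```

A replacement that keeps the original encoding would turn the HTML entity form of one host into the HTML entity form of its replacement, rather than into plain text.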

Installation

Requires Python 3.11 or newer.

Install with pip: pip install host-replace

Install from source:

git clone https://github.com/adamreiser/host_replace
cd host_replace
pip install .

Usage

Command-line interface

To transform a text file (sample.txt) using a JSON mapping of old hosts to new hosts (mappings.json), run: host-replace -m mappings.json sample.txt --verbose

Select the matching engine with --engine regex|automaton|auto (default: regex). With --engine auto, use --expected-runs to hint how many times the replacer will be reused.

Guidance:

  • Use regex for one-shot or small inputs (lowest startup overhead).
  • Use automaton for larger workloads or when reusing the same replacer repeatedly.
  • Use auto if you want a heuristic choice based on host-map size, input size, and expected reuse.
  • The current auto heuristic is benchmark-informed and intentionally conservative:
    • A one-shot run (expected_runs=1) chooses regex.
    • automaton is considered for larger maps and inputs when expected reuse is 2 or more.
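
The guidance above can be pictured as a small decision function. This is only a sketch of the described behavior: the function choose_engine and both thresholds are hypothetical, invented for illustration, and are not taken from the package.

```python
def choose_engine(map_size: int, input_size: int, expected_runs: int = 1) -> str:
    """Hypothetical stand-in for an "auto" engine heuristic."""
    # Both thresholds are assumptions made up for this example.
    large_map = 1_000          # number of host mappings
    large_input = 1_000_000    # input size in bytes

    if expected_runs <= 1:
        # One-shot use: regex has the lowest startup overhead.
        return "regex"
    if map_size >= large_map and input_size >= large_input:
        # Reused replacer with a large map and large input.
        return "automaton"
    # Conservative default for everything else.
    return "regex"


print(choose_engine(map_size=7, input_size=500, expected_runs=1))
print(choose_engine(map_size=5_000, input_size=5_000_000, expected_runs=3))
```

When the workload is known in advance, passing --engine regex or --engine automaton explicitly avoids relying on any heuristic.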

Input (sample.txt):

1. https://web.example.com/path/to/resource?query=param
2. <a href="https&#x3a;&#x2f;&#x2f;boards&#x2e;example&#x2e;com&#x2f;thread&#x2f;123">Discussion Board</a>
3. Redirecting to https%3A%2F%2Fen.us.wiki.example.com%2Fwelcome
4. https://web-1a.example.com/redirect?q=%65%6e%2e%75%73%2e%77%69%6b%69%2e%65%78%61%6d%70%6c%65%2e%63%6f%6d
5. <meta http-equiv="refresh" content="0; url=https%3A%2F%2Fweb.example.com%2Fhome">
6. Our domain is still example.com and archived wiki will remain at archive.en.us.wiki.example.com.

Mapping (mappings.json):

{
    "web.example.com": "www.example.com",
    "web-1a.example.com": "www-1a.example.com",
    "boards.example.com": "forums.en.us.example.com",
    "en.us.wiki.example.com": "wiki.example.com",
    "us.example.com": "us-east-1.example.net",
    "example.net": "example.org",
    "images.example.com": "cdn.example.org"
}

Output:

INFO: Replacing web.example.com with www.example.com at offset 11
INFO: Replacing boards&#x2e;example&#x2e;com with forums&#x2e;en&#x2e;us&#x2e;example&#x2e;com at offset 91
INFO: Replacing en.us.wiki.example.com with wiki.example.com at offset 195
INFO: Replacing web-1a.example.com with www-1a.example.com at offset 239
INFO: Replacing %65%6e%2e%75%73%2e%77%69%6b%69%2e%65%78%61%6d%70%6c%65%2e%63%6f%6d with %77%69%6b%69%2e%65%78%61%6d%70%6c%65%2e%63%6f%6d at offset 269
INFO: Replacing web.example.com with www.example.com at offset 396
1. https://www.example.com/path/to/resource?query=param
2. <a href="https&#x3a;&#x2f;&#x2f;forums&#x2e;en&#x2e;us&#x2e;example&#x2e;com&#x2f;thread&#x2f;123">Discussion Board</a>
3. Redirecting to https%3A%2F%2Fwiki.example.com%2Fwelcome
4. https://www-1a.example.com/redirect?q=%77%69%6b%69%2e%65%78%61%6d%70%6c%65%2e%63%6f%6d
5. <meta http-equiv="refresh" content="0; url=https%3A%2F%2Fwww.example.com%2Fhome">
6. Our domain is still example.com and archived wiki will remain at archive.en.us.wiki.example.com.
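
Line 6 is unchanged because en.us.wiki.example.com appears there only as part of the longer name archive.en.us.wiki.example.com. The sketch below shows one way such partial matches can be rejected in plain text using regular expression lookarounds. It is not the package's implementation; the function replace_whole_host is hypothetical, and it deliberately ignores encoded delimiters such as %2F, which need extra handling.

```python
import re


def replace_whole_host(text: str, old: str, new: str) -> str:
    """Replace `old` only where it is not part of a longer hostname."""
    pattern = re.compile(
        # Reject a match preceded by a hostname character or a dot,
        # for example the "archive." prefix of a longer name.
        r"(?<![A-Za-z0-9.-])"
        + re.escape(old)
        # Reject a match that continues with another label (".au") or
        # hostname character, but allow a sentence-ending period.
        + r"(?![A-Za-z0-9-]|\.[A-Za-z0-9])",
        re.IGNORECASE,
    )
    return pattern.sub(lambda _match: new, text)


line_1 = "1. https://web.example.com/path/to/resource?query=param"
line_6 = (
    "6. Our domain is still example.com and archived wiki will remain at "
    "archive.en.us.wiki.example.com."
)

print(replace_whole_host(line_1, "web.example.com", "www.example.com"))
print(replace_whole_host(line_6, "en.us.wiki.example.com", "wiki.example.com"))
```

The first call rewrites the host, while the second leaves line 6 untouched.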

API

To use the module in your Python application:

import host_replace

host_map = {
    "web.example.com": "www.example.com",
    "boards.example.com": "forums.example.net"
}

replacer = host_replace.HostnameReplacer(host_map, engine="auto", expected_runs=2)

# Input text (str or bytes)
input_text = "Visit us at https://web.example.com or leave a comment at https://boards.example.com."

# Apply replacements
output_text = replacer.apply_replacements(input_text)

# Output: Visit us at https://www.example.com or leave a comment at https://forums.example.net.
print(output_text)

Limitations

  • Full pre-encoding case preservation is not supported. Matching is case-insensitive for both encoded and unencoded hosts, but the case of a replacement is derived from the matched text in its encoded form, which can produce cosmetic casing differences in some encoded forms.

  • Full case preservation of individual characters is not supported because it is inherently ambiguous. For example, when mapping WWW.example.com to example.org, it is unclear which letters, if any, should be capitalized.

  • Variations in encoding representation (e.g., "%2F" vs "%2f"; "&#x2f" vs "&#X2f") can lead to inconsistently cased outputs.

  • Does not process binary data beyond exact byte sequence matching. Encodings like base64 are not supported.

  • Hostnames starting with hex codes can be ambiguous when preceded by %. For instance, %00example.com could be interpreted as example.com or 00example.com.

  • Support for Internationalized Domain Names (IDNs) has not been thoroughly tested.
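
Two of the limitations above, escape casing and a leading hex code after %, can be reproduced with the standard library alone. The snippet below is an illustration of the ambiguity, not of how the package resolves it.

```python
import html
from urllib.parse import unquote

# Differently cased escapes decode to the same character, so the casing
# of the original escape cannot be recovered from the decoded text.
assert unquote("%2F") == unquote("%2f") == "/"
assert html.unescape("&#x2f;") == html.unescape("&#X2f;") == "/"

# "%00example.com" can be read in two ways.
raw = "%00example.com"
decoded = unquote(raw)                # "%00" read as an encoded NUL byte
literal_host = raw.removeprefix("%")  # "%" read as a stray delimiter

print(repr(decoded))       # NUL byte followed by the host example.com
print(repr(literal_host))  # the host 00example.com
```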
