Helper functions to syntactically validate strings according to RFC 3987.

rfc3987-syntax

Helper functions to parse and validate the syntax of terms defined in RFC 3987 — the IETF standard for Internationalized Resource Identifiers (IRIs).

🎯 Purpose

The goal of rfc3987-syntax is to provide a lightweight, permissively licensed Python module for validating that strings conform to the ABNF grammar defined in RFC 3987. These helpers are:

  • ✅ Strictly aligned with the syntax rules of RFC 3987
  • ✅ Released under the permissive MIT license
  • ✅ Designed for both open source and proprietary use
  • ✅ Powered by Lark, a fast, EBNF-based parser

🧠 Note: This project focuses on syntax validation only. RFC 3987 specifies additional semantic rules (e.g., Unicode normalization, BiDi constraints, percent-encoding requirements) that must be enforced separately.

📄 License, Attribution, and Citation

rfc3987-syntax is licensed under the MIT License, which allows reuse in both open source and commercial software.

This project:

  • ❌ Does not depend on the rfc3987 Python package (GPL-licensed)
  • ✅ Uses lark, licensed under MIT
  • ✅ Implements grammar from RFC 3987, using RFC 3986 where RFC 3987 delegates syntax

⚠️ This project is not affiliated with or endorsed by the authors of RFC 3987 or the rfc3987 Python package.

Please cite this software in accordance with the enclosed CITATION.cff file.

⚠️ Limitations

The grammar and parser enforce only the ABNF syntax defined in RFC 3987. The following are not validated and must be handled separately for full compliance:

  • ❌ Unicode Normalization Form C (NFC)
  • ❌ Bidirectional text (BiDi) constraints (RFC 3987 §4.1)
  • ❌ Port number ranges (must be 0–65535)
  • ❌ Valid IPv6 compression (only one ::, max segments)
  • ❌ Context-aware percent-encoding requirements
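
Some of the checks listed above can be approximated with the Python standard library once a string has passed syntax validation. The following is a minimal sketch, not part of rfc3987-syntax: the helper names (is_nfc, has_valid_port, is_valid_ipv6_literal) are hypothetical, and urllib.parse is URI-oriented, so its handling of IRIs should be treated as an assumption to verify for your inputs.

```python
import ipaddress
import unicodedata
from urllib.parse import urlsplit


def is_nfc(value: str) -> bool:
    """Return True if the string is already in Unicode Normalization Form C."""
    return unicodedata.is_normalized("NFC", value)


def has_valid_port(value: str) -> bool:
    """Return True if the port is absent or within 0-65535."""
    try:
        # Accessing .port raises ValueError for out-of-range ports.
        urlsplit(value).port
    except ValueError:
        return False
    return True


def is_valid_ipv6_literal(host: str) -> bool:
    """Return True if the text between '[' and ']' is a well-formed IPv6 address."""
    try:
        # Rejects, for example, addresses containing more than one "::".
        ipaddress.IPv6Address(host)
    except ValueError:
        return False
    return True
```

These helpers are meant to run after the syntax check succeeds. BiDi constraints and context-aware percent-encoding rules are not covered by this sketch.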

ChatGPT 4o was used during the original development process. Errors may exist due to this assistance. Additional review, testing, and bug fixes by human experts are welcome.

📦 Installation

pip install rfc3987-syntax

🛠 Usage

List all supported "terms" (i.e., non-terminals and terminals within ABNF production rules) used to validate the syntax of an IRI according to RFC 3987:

from rfc3987_syntax import RFC3987_SYNTAX_TERMS

print("Supported terms:")
for term in RFC3987_SYNTAX_TERMS:
    print(term)

Syntactically validate a string using the general-purpose validator:

from rfc3987_syntax import is_valid_syntax

if is_valid_syntax(term='iri', value='http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax(term='iri', value='bob'):
    print("✗ Invalid IRI syntax")

if is_valid_syntax(term='iri_reference', value='bob'):
    print("✓ Valid IRI-reference syntax")

Alternatively, use term-specific helpers to validate RFC 3987 syntax.

from rfc3987_syntax import is_valid_syntax_iri
from rfc3987_syntax import is_valid_syntax_iri_reference

if is_valid_syntax_iri('http://github.com'):
    print("✓ Valid IRI syntax")

if not is_valid_syntax_iri('bob'):
    print("✗ Invalid IRI syntax")
    
if is_valid_syntax_iri_reference('bob'):
    print("✓ Valid IRI-reference syntax")

Get the Lark parse tree for a syntax validation (useful for additional semantic validation):

from lark import ParseTree  # type used in the annotation below
from rfc3987_syntax import parse

ptree: ParseTree = parse(term="iri", value="http://github.com")

print(ptree)
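
As an illustration of how a parse tree could feed a semantic check, the sketch below walks a tree and collects the matched text of every subtree with a given rule name. It relies only on the data and children attributes that Lark trees expose. The Node class is a stand-in so the sketch runs without lark installed, and the tree shape shown is hypothetical: the tree actually returned by parse may name, nest, or filter nodes differently.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for lark.Tree (same `data` and `children` attributes)."""
    data: str
    children: list = field(default_factory=list)


def leaf_text(node) -> str:
    """Join the text of every leaf token below a node, in document order."""
    if not hasattr(node, "children"):
        return str(node)  # leaves are str-like tokens
    return "".join(leaf_text(child) for child in node.children)


def find_rule_text(tree, rule_name: str) -> list:
    """Return the matched text of every subtree whose rule is `rule_name`."""
    matches = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if not hasattr(node, "children"):
            continue  # skip leaf tokens
        if node.data == rule_name:
            matches.append(leaf_text(node))
        # Reverse so children are visited left to right.
        stack.extend(reversed(node.children))
    return matches


# Hypothetical tree shape for "http://h:80"; the real tree may differ.
demo = Node("iri", [
    Node("scheme", ["http"]),
    "://",
    Node("iauthority", ["h", ":", Node("port", ["80"])]),
])
ports = find_rule_text(demo, "port")
```

With the real package, passing the result of parse(term="iri", value=...) in place of demo could supply the port text for a range check, assuming the port rule is kept as a node in the tree. Lark's own Tree.find_data and Tree.iter_subtrees methods offer similar traversal.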

📚 Sources

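This grammar was derived from:

  • RFC 3987: Internationalized Resource Identifiers (IRIs)
  • RFC 3986: Uniform Resource Identifier (URI): Generic Syntax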

📝 When RFC 3986 is listed as the source, it is used in accordance with RFC 3987, which explicitly references it for foundational elements.

Rule-to-Source Mapping

Rule/Component       | Source   | Notes
---------------------|----------|--------------------------------
iri                  | RFC 3987 | Top-level IRI rule
iri_reference        | RFC 3987 | Top-level IRI Reference rule
absolute_iri         | RFC 3987 | Top-level Absolute IRI rule
scheme               | RFC 3986 | Referenced by RFC 3987 §2.2
ihier_part           | RFC 3987 | IRI-specific hierarchy
irelative_ref        | RFC 3987 | IRI-specific relative ref
irelative_part       | RFC 3987 | IRI-specific relative part
iauthority           | RFC 3986 | Standard URI authority
ipath_abempty        | RFC 3986 | Path format variant
ipath_absolute       | RFC 3986 | Absolute path
ipath_noscheme       | RFC 3986 | Path disallowing scheme prefix
ipath_rootless       | RFC 3986 | Used in non-scheme contexts
iquery               | RFC 3987 | Query extension to URI
ifragment            | RFC 3987 | Fragment extension to URI
ipchar, isegment     | RFC 3986 | Path characters and segments
isegment_nz_nc       | RFC 3987 | IRI-specific path constraint
iunreserved          | RFC 3987 | Includes ucschar
ucschar, iprivate    | RFC 3987 | Unicode support
sub_delims           | RFC 3986 | Reserved characters
ip_literal           | RFC 3986 | IPv6 or IPvFuture in []
ipv6address          | RFC 3986 | Expanded forms only
ipvfuture            | RFC 3986 | Forward-compatible
ipv4address          | RFC 3986 | Dotted-decimal IPv4
ls32                 | RFC 3986 | Final 32 bits of IPv6
h16, dec_octet       | RFC 3986 | Hex and decimal chunks
port                 | RFC 3986 | Optional numeric
pct_encoded          | RFC 3986 | Percent encoding (e.g. %20)
alpha, digit, hexdig | RFC 3986 | Character classes
