Parsing and validation of URIs (RFC 3986) and IRIs (RFC 3987)
This module provides regular expressions according to RFC 3986 “Uniform Resource Identifier (URI): Generic Syntax” and RFC 3987 “Internationalized Resource Identifiers (IRIs)”, and utilities for composition and relative resolution of references.
API
- match (string, rule='IRI_reference')
Convenience function for checking if string matches a specific rule.
Returns a match object or None:
>>> assert match('%C7X', 'pct_encoded') is None
>>> assert match('%C7', 'pct_encoded')
>>> assert match('%c7', 'pct_encoded')
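To make the pct_encoded rule in the doctests above concrete, here is a minimal standalone sketch using only the standard re module. It follows the RFC 3986 definition (a percent sign followed by two hexadecimal digits) and is not this module's own pattern; the name is_pct_encoded is hypothetical.

```python
import re

# pct-encoded = "%" HEXDIG HEXDIG (RFC 3986, section 2.1).
# Lower-case hex digits are accepted as well, mirroring the doctests above.
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def is_pct_encoded(s):
    """Return True if the whole string is exactly one percent-encoded octet."""
    return PCT_ENCODED.fullmatch(s) is not None
```

Anchoring with fullmatch is what rejects '%C7X': the first three characters are a valid percent-encoded octet, but the trailing 'X' is not part of it.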
- parse (string, rule='IRI_reference')
Parses string according to rule into a dict of subcomponents.
If rule is None, parse an IRI_reference without validation.
If the regex module is available, any rule is supported; with the standard re module, rule must be 'IRI_reference' or one of its special cases ('IRI', 'absolute_IRI', 'irelative_ref', 'irelative_part', 'URI_reference', 'URI', 'absolute_URI', 'relative_ref', 'relative_part').
>>> d = parse('http://tools.ietf.org/html/rfc3986#appendix-A',
...           rule='URI')
>>> assert all([ d['scheme'] == 'http',
...              d['authority'] == 'tools.ietf.org',
...              d['path'] == '/html/rfc3986',
...              d['query'] == None,
...              d['fragment'] == 'appendix-A' ])
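For intuition about what parsing "without validation" can look like, the following sketch splits a reference into the same five components using the regular expression from RFC 3986, appendix B, with named groups added. It is an illustration built on the standard re module, not this module's implementation; split_reference is a hypothetical name.

```python
import re

# Non-validating splitter from RFC 3986, appendix B, rewritten with named groups.
_SPLIT = re.compile(
    r"^(?:(?P<scheme>[^:/?#]+):)?"
    r"(?://(?P<authority>[^/?#]*))?"
    r"(?P<path>[^?#]*)"
    r"(?:\?(?P<query>[^#]*))?"
    r"(?:#(?P<fragment>.*))?$",
    re.DOTALL,
)


def split_reference(ref):
    """Split a reference into a dict of components; absent components are None."""
    return _SPLIT.match(ref).groupdict()
```

Because this expression accepts any string, it happily splits references that a validating rule would reject, such as '#f#g'.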
- compose (**parts)
Returns a URI composed from named parts.
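As a rough illustration of composing a URI from parts, the sketch below implements the component recomposition algorithm from RFC 3986, section 5.3, for the five parts that parse returns in the example above. The function name recompose is hypothetical, and compose itself may accept additional or different part names.

```python
def recompose(scheme=None, authority=None, path="", query=None, fragment=None):
    """Join components into a reference string (RFC 3986, section 5.3)."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path  # the path is always present, though it may be empty
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

Note the distinction between an absent component (None) and an empty one (''): an empty query still contributes its '?' delimiter.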
- resolve (base, uriref, strict=True, return_parts=False)
Resolves a URI reference relative to a base URI.
>>> base = resolve.test_cases_base
>>> for relative, resolved in resolve.test_cases.items():
...     assert resolve(base, relative) == resolved
If return_parts is True, returns a dict of named parts instead of a string.
Examples:
>>> assert resolve('urn:rootless', '../../name') == 'urn:name'
>>> assert resolve('urn:root/less', '../../name') == 'urn:/name'
>>> assert resolve('http://a/b', 'http:g') == 'http:g'
>>> assert resolve('http://a/b', 'http:g', strict=False) == 'http://a/g'
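The examples above follow the reference resolution algorithm of RFC 3986, section 5.2. A compact, non-validating sketch of that algorithm is shown here to make the effect of strict visible; it uses only the standard re module, and the names resolve_sketch and remove_dot_segments are hypothetical rather than part of this module.

```python
import re

# Non-validating splitter from RFC 3986, appendix B.
_SPLIT = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$",
    re.DOTALL,
)


def remove_dot_segments(path):
    """Remove '.' and '..' segments from a path (RFC 3986, section 5.2.4)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def resolve_sketch(base, ref, strict=True):
    """Resolve ref against base (RFC 3986, sections 5.2.2 and 5.3)."""
    b_scheme, b_auth, b_path, b_query, _ = _SPLIT.match(base).groups()
    scheme, auth, path, query, fragment = _SPLIT.match(ref).groups()
    if not strict and scheme == b_scheme:
        scheme = None  # non-strict: ignore a scheme identical to the base's
    if scheme is not None:
        path = remove_dot_segments(path)
    else:
        scheme = b_scheme
        if auth is not None:
            path = remove_dot_segments(path)
        else:
            auth = b_auth
            if path == "":
                path = b_path
                if query is None:
                    query = b_query
            else:
                if not path.startswith("/"):
                    # Merge with the base path (section 5.2.3).
                    if b_auth is not None and b_path == "":
                        path = "/" + path
                    else:
                        path = b_path[: b_path.rfind("/") + 1] + path
                path = remove_dot_segments(path)
    result = "" if scheme is None else scheme + ":"
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

With strict=True, 'http:g' keeps its own scheme and is treated as a complete (if unusual) URI; with strict=False the matching scheme is dropped first, so 'g' is merged with the base path.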
- patterns
A dict of regular expressions with useful group names. Compilable (with regex only) without need for any particular compilation flag.
- [bmp_][u]patterns[_no_names]
Alternative versions of patterns. [u]nicode strings without group names for the re module. BMP only for narrow builds.
- get_compiled_pattern (rule, flags=0)
Returns a compiled pattern object for a rule name or template string.
Usage for validation:
>>> uri = get_compiled_pattern('^%(URI)s$')
>>> assert uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
>>> assert not get_compiled_pattern('^%(relative_ref)s$').match('#f#g')
>>> from unicodedata import lookup
>>> smp = 'urn:' + lookup('OLD ITALIC LETTER A')  # U+00010300
>>> assert not uri.match(smp)
>>> m = get_compiled_pattern('^%(IRI)s$').match(smp)
On narrow builds, non-BMP characters are (incorrectly) excluded:
>>> assert NARROW_BUILD == (not m)
For parsing, some subcomponents are captured in named groups (only if regex is available, otherwise see parse):
>>> match = uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
>>> d = match.groupdict()
>>> if REGEX:
...     assert all([ d['scheme'] == 'http',
...                  d['authority'] == 'tools.ietf.org',
...                  d['path'] == '/html/rfc3986',
...                  d['query'] == None,
...                  d['fragment'] == 'appendix-A' ])
>>> for r in patterns.keys():
...     assert get_compiled_pattern(r)
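The template strings above, such as '^%(URI)s$', look like ordinary %-style format strings keyed by rule name. Assuming that is how they are expanded, the mechanism can be sketched with a tiny, hypothetical rule table; toy_patterns and compile_template are illustrative names, and the two toy rules are far simpler than the real ones.

```python
import re

# Hypothetical, greatly simplified rule table (not the module's patterns dict).
toy_patterns = {
    "scheme": r"[A-Za-z][A-Za-z0-9+.\-]*",
    "pct_encoded": r"%[0-9A-Fa-f]{2}",
}


def compile_template(template, flags=0):
    """Substitute rule names into a %-style template and compile the result."""
    return re.compile(template % toy_patterns, flags)


# Anchoring with ^ and $ turns a rule into a whole-string validator.
scheme_only = compile_template("^%(scheme)s$")
octets = compile_template("^(?:%(pct_encoded)s)+$")
```

One consequence of %-style substitution is that a literal percent sign in the template itself would have to be written as '%%'; percent signs inside the substituted rule bodies need no escaping.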
- format_patterns (**names)
Returns a dict of patterns (regular expressions) keyed by rule names for URIs and rule names for IRIs.
See also the module level dicts of patterns, and get_compiled_pattern.
To wrap a rule in a named capture group, pass it as a keyword argument: rule_name='group_name'. By default, the formatted patterns contain no named groups.
Patterns are str instances (in both Python 2.x and 3.x) containing ASCII characters only.
Caveats:
Dependencies
Some features require regex.
This package is tested on Python 2.6, 2.7, 3.2, 3.3, 3.4 and 3.5. Note that in Python <= 3.2, characters beyond the Basic Multilingual Plane are not supported on narrow builds (see issue12729).
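Whether an interpreter is a narrow build can be checked with the standard sys module. The short sketch below assumes Python 3 string literals; the variable names are illustrative and do not refer to this module's NARROW_BUILD constant.

```python
import sys

# On narrow builds (possible before Python 3.3), sys.maxunicode is 0xFFFF;
# on wide builds, and on every Python >= 3.3, it is 0x10FFFF.
is_narrow_build = sys.maxunicode == 0xFFFF

# OLD ITALIC LETTER A lies outside the Basic Multilingual Plane.
smp_char = "\U00010300"

# A narrow build stores such a character as a surrogate pair of two code units.
expected_len = 2 if is_narrow_build else 1
```

On a narrow build a character class in a compiled pattern sees the two surrogates separately, which is why ranges beyond the BMP cannot be expressed there.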
What’s new
version 1.3.4:
allowed for lower case percent encoding
version 1.3.3:
fixed a bug in resolve which left “../” at the beginning of some paths
version 1.3.2:
convenience function match
patterns restricted to the BMP for narrow builds
adapted doctests for python 3.3
compatibility with python 2.6 (thanks to Thijs Janssen)
version 1.3.1:
version 1.3.0:
python 3.x compatibility
format_patterns
version 1.2.1:
compose, resolve
Support
This is free software. You may show your appreciation with a donation.