Parsing and validation of URIs (RFC 3986) and IRIs (RFC 3987)
This module provides regular expressions according to RFC 3986 “Uniform Resource Identifier (URI): Generic Syntax” and RFC 3987 “Internationalized Resource Identifiers (IRIs)”, and utilities for composition and relative resolution of references.
Tested on Python 2.7 and 3.2. Some features require the third-party regex module.
API
- get_compiled_pattern (rule, flags=0)
Returns a compiled pattern object for a rule name or template string.
Usage for validation:
>>> uri = get_compiled_pattern('^%(URI)s$')
>>> assert uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
>>> from unicodedata import lookup
>>> smp = 'urn:' + lookup('OLD ITALIC LETTER A')  # U+00010300
>>> assert not uri.match(smp)
>>> assert get_compiled_pattern('^%(IRI)s$').match(smp)
>>> assert not get_compiled_pattern('^%(relative_ref)s$').match('#f#g')
For parsing, some subcomponents are captured in named groups (only if regex is available, otherwise see parse):
>>> match = uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
>>> d = match.groupdict()
>>> if REGEX:
...     assert all([ d['scheme'] == 'http',
...                  d['authority'] == 'tools.ietf.org',
...                  d['path'] == '/html/rfc3986',
...                  d['query'] is None,
...                  d['fragment'] == 'appendix-A' ])
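The template-string form shown above ('^%(URI)s$') looks like ordinary Python %-formatting against a dict of named sub-patterns. The following is a minimal sketch of that idea using only the standard library. TOY_PATTERNS and compile_template are hypothetical names, and the sub-patterns are heavily simplified placeholders, not the module's real rules.

```python
import re

# Toy sub-patterns (hypothetical, NOT the module's real ones).
TOY_PATTERNS = {
    'scheme': r'[a-zA-Z][a-zA-Z0-9+.-]*',  # scheme rule, RFC 3986 section 3.1
    'rest': r'[^\s#]*',                    # simplified stand-in for hier-part and query
    'fragment': r'[^\s#]*',                # simplified; the real rule is stricter
}


def compile_template(template, flags=0):
    """Fill %(name)s placeholders with sub-patterns, then compile the result."""
    return re.compile(template % TOY_PATTERNS, flags)


# Anchoring with ^ and $ turns the pattern into a whole-string check.
toy_uri = compile_template('^%(scheme)s:%(rest)s(?:#%(fragment)s)?$')

print(bool(toy_uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')))
print(bool(toy_uri.match('#f#g')))
```

Only the template is interpreted by the % operator; percent signs inside the substituted sub-patterns are left untouched.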
- parse (string, rule='IRI_reference')
Parses string according to rule into a dict of subcomponents.
If regex is available, any rule is supported; with re, rule must be 'IRI_reference' or some special case thereof ('IRI', 'absolute_IRI', 'irelative_ref', 'irelative_part', 'URI_reference', 'URI', 'absolute_URI', 'relative_ref', 'relative_part').
>>> d = parse('http://tools.ietf.org/html/rfc3986#appendix-A',
...           rule='URI')
>>> assert all([ d['scheme'] == 'http',
...              d['authority'] == 'tools.ietf.org',
...              d['path'] == '/html/rfc3986',
...              d['query'] is None,
...              d['fragment'] == 'appendix-A' ])
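To illustrate what splitting a reference into a dict of subcomponents involves, here is a standard-library sketch based on the component-splitting regular expression from RFC 3986, Appendix B, rewritten with named groups. It is not this module's parse: split_reference and SPLIT_RE are hypothetical names, and unlike parse with a rule, the expression accepts any string and validates nothing.

```python
import re

# Adapted from RFC 3986, Appendix B, with named groups instead of numbered ones.
# It only splits a string into components; it does NOT check their syntax.
SPLIT_RE = re.compile(
    r'^(?:(?P<scheme>[^:/?#]+):)?'
    r'(?://(?P<authority>[^/?#]*))?'
    r'(?P<path>[^?#]*)'
    r'(?:\?(?P<query>[^#]*))?'
    r'(?:#(?P<fragment>.*))?$',
    re.DOTALL,  # lets the expression match every input string, even with newlines
)


def split_reference(string):
    """Return the five generic components; components that are absent are None."""
    return SPLIT_RE.match(string).groupdict()


parts = split_reference('http://tools.ietf.org/html/rfc3986#appendix-A')
print(parts['scheme'], parts['authority'], parts['path'],
      parts['query'], parts['fragment'])
```

An absent component (None) is distinct from an empty one (''): 'g?' has an empty query, while 'g' has none.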
- format_patterns (**names)
Returns a dict of patterns (regular expressions) keyed by rule names for URIs and rule names for IRIs.
See also the module level dict patterns, and get_compiled_pattern.
To wrap a rule in a named capture group, pass it as a keyword argument: rule_name='group_name'. By default, the formatted patterns contain no named groups.
Patterns are str instances (in both Python 2.x and 3.x) containing ASCII characters only.
Note that, if compiling with the standard library re module:
\u and \U escapes must be preprocessed (see issue3665):
>>> import re, sys, ast
>>> re.compile(patterns['ucschar']) #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  ...
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
error: bad character range
>>> tpl = 'u"%s"' if sys.version_info[0] < 3 else '"%s"'
>>> utext_pattern = ast.literal_eval(tpl % patterns['ucschar'])
>>> assert re.compile(utext_pattern)
named capture groups cannot occur on multiple branches of an alternation:
>>> re.compile(patterns['path']) #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
  ...
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
error: redefinition of group name 'path' as group 2; was group 1
>>> pat = format_patterns(path='outermost_group_name')['path']
>>> assert re.compile(pat)
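The group-name restriction can be reproduced with the standard library alone. The sketch below uses two toy path patterns (assumptions for illustration, not the module's real path rule) to show the failure and the workaround of naming a single group that wraps the whole alternation.

```python
import re

# The same group name on two branches of an alternation is rejected by re.
try:
    re.compile(r'(?P<path>/[a-z]+)|(?P<path>[a-z]+)')
    duplicate_names_ok = True
except re.error:
    duplicate_names_ok = False

# Workaround: give the name to one outer group around the alternation.
wrapped = re.compile(r'(?P<path>/[a-z]+|[a-z]+)')

print(duplicate_names_ok)
print(wrapped.match('/html').group('path'))
```

This mirrors what format_patterns(path='outermost_group_name') achieves in the doctest above: one name on the outermost group, none on the branches.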
- patterns
A dict of regular expressions with useful group names. They compile (with regex only) without requiring any particular compilation flag.
- compose (**parts)
Returns a URI composed from named parts.
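As a rough picture of what composing from named parts means, the sketch below follows the component recomposition steps of RFC 3986, section 5.3. It is not this module's compose: recompose is a hypothetical name, and the assumption that the part names match the keys returned by parse (scheme, authority, path, query, fragment) is mine.

```python
def recompose(scheme=None, authority=None, path='', query=None, fragment=None):
    """Join the five generic components as described in RFC 3986, section 5.3.

    None means the component is absent; an empty string means it is
    present but empty, so its delimiter is still emitted.
    """
    result = ''
    if scheme is not None:
        result += scheme + ':'
    if authority is not None:
        result += '//' + authority
    result += path  # the path is always present, possibly empty
    if query is not None:
        result += '?' + query
    if fragment is not None:
        result += '#' + fragment
    return result


print(recompose(scheme='http', authority='tools.ietf.org',
                path='/html/rfc3986', fragment='appendix-A'))
```

Because absent and empty components are kept apart, splitting a reference and recomposing it returns the original string.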
- resolve (base, uriref, strict=True, return_parts=False)
Resolves a URI reference relative to a base URI.
>>> base = resolve.test_cases_base
>>> for relative, resolved in resolve.test_cases.items():
...     assert resolve(base, relative) == resolved
If return_parts is True, returns a dict of named parts instead of a string.
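For readers unfamiliar with relative resolution, the sketch below runs a few of the "normal examples" from RFC 3986, section 5.4.1 through the standard library's urllib.parse.urljoin (Python 3). urljoin is used here only as a stand-in to show the expected results. It is not this module's resolve, and it has no equivalent of the strict or return_parts parameters.

```python
from urllib.parse import urljoin

# Base URI and expected results taken from RFC 3986, section 5.4.1.
BASE = 'http://a/b/c/d;p?q'
EXAMPLES = {
    'g': 'http://a/b/c/g',
    './g': 'http://a/b/c/g',
    'g/': 'http://a/b/c/g/',
    '/g': 'http://a/g',
    '//g': 'http://g',
    '?y': 'http://a/b/c/d;p?y',
    'g?y#s': 'http://a/b/c/g?y#s',
    '#s': 'http://a/b/c/d;p?q#s',
    '../g': 'http://a/b/g',
    '../../g': 'http://a/g',
}

# Resolve every reference against the base and collect the results.
resolved = {ref: urljoin(BASE, ref) for ref in EXAMPLES}

for ref in EXAMPLES:
    print('%-8s -> %s' % (ref, resolved[ref]))
```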
What’s new
version 1.3.1:
version 1.3.0:
python 3.x compatibility
format_patterns
version 1.2.1:
compose, resolve