Parsing and validation of URIs (RFC 3986) and IRIs (RFC 3987)

Project description

This module provides regular expressions according to RFC 3986 “Uniform Resource Identifier (URI): Generic Syntax” and RFC 3987 “Internationalized Resource Identifiers (IRIs)”, and utilities for composition and relative resolution of references.

API

match (string, rule='IRI_reference')

Convenience function for checking if string matches a specific rule.

Returns a match object or None:

>>> assert match('%C7X', 'pct_encoded') is None
>>> assert match('%C7', 'pct_encoded')
>>> assert match('%c7', 'pct_encoded')
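
The pct_encoded rule checked above corresponds to the RFC 3986 production pct-encoded = "%" HEXDIG HEXDIG. As a rough stand-alone illustration using only the standard re module, the same check might be sketched as follows. The names PCT_ENCODED and is_pct_encoded are hypothetical and not part of this package, and the package's own pattern may be written differently.

```python
import re

# RFC 3986: pct-encoded = "%" HEXDIG HEXDIG
# Both upper and lower case hex digits are accepted in this sketch.
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def is_pct_encoded(string):
    """Return True if the whole string is exactly one percent-encoded octet."""
    return PCT_ENCODED.fullmatch(string) is not None
```

Using fullmatch rather than match is what rejects trailing characters such as the X in '%C7X'.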

parse (string, rule='IRI_reference')

Parses string according to rule into a dict of subcomponents.

If rule is None, parse an IRI_reference without validation.

If the regex module is available, any rule is supported. With the standard re module, rule must be 'IRI_reference' or one of its special cases ('IRI', 'absolute_IRI', 'irelative_ref', 'irelative_part', 'URI_reference', 'URI', 'absolute_URI', 'relative_ref', 'relative_part').

>>> d = parse('http://tools.ietf.org/html/rfc3986#appendix-A',
...           rule='URI')
>>> assert all([ d['scheme'] == 'http',
...              d['authority'] == 'tools.ietf.org',
...              d['path'] == '/html/rfc3986',
...              d['query'] is None,
...              d['fragment'] == 'appendix-A' ])
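
For comparison, splitting a reference into the same five components without any validation can be done with the regular expression given in RFC 3986, Appendix B. The sketch below uses only the standard re module. The names SPLIT and split_reference are hypothetical, and this is not how the package's parse is implemented.

```python
import re

# The non-validating regular expression from RFC 3986, Appendix B,
# rewritten with named groups for the five top-level components.
SPLIT = re.compile(
    r"^(?:(?P<scheme>[^:/?#]+):)?"
    r"(?://(?P<authority>[^/?#]*))?"
    r"(?P<path>[^?#]*)"
    r"(?:\?(?P<query>[^#]*))?"
    r"(?:#(?P<fragment>.*))?$"
)


def split_reference(reference):
    """Split a URI reference into a dict; absent components are None."""
    return SPLIT.match(reference).groupdict()
```

Because every part of this expression is optional, it accepts almost any string; it only separates components and does not check them against the grammar.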

compose (**parts)

Returns a URI composed from named parts.
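
The general idea of composing a reference from its parts is described in RFC 3986, section 5.3. A minimal sketch of that recomposition step is shown below. The function name recompose and its five parameters are assumptions made for illustration; the parts accepted by this package's compose may differ.

```python
def recompose(scheme=None, authority=None, path="", query=None, fragment=None):
    """Join components as described in RFC 3986, section 5.3.

    A component that is None is treated as undefined and omitted together
    with its delimiter; an empty string keeps the delimiter.
    """
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

The distinction between None and the empty string matters: recompose(path='/p', query='') keeps the trailing question mark, while recompose(path='/p') does not.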

resolve (base, uriref, strict=True, return_parts=False)

Resolves a URI reference relative to a base URI.

Test cases:

>>> base = resolve.test_cases_base
>>> for relative, resolved in resolve.test_cases.items():
...     assert resolve(base, relative) == resolved

If return_parts is True, returns a dict of named parts instead of a string.

Examples:

>>> assert resolve('urn:rootless', '../../name') == 'urn:name'
>>> assert resolve('urn:root/less', '../../name') == 'urn:/name'
>>> assert resolve('http://a/b', 'http:g') == 'http:g'
>>> assert resolve('http://a/b', 'http:g', strict=False) == 'http://a/g'
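
The examples above follow the reference resolution algorithm of RFC 3986, section 5.2. The sketch below is a stand-alone, non-validating version of that algorithm using only the standard re module. The names remove_dot_segments and resolve_reference are hypothetical, and the code is an illustration of the RFC's pseudocode rather than this package's implementation.

```python
import re

# Non-validating splitter from RFC 3986, Appendix B.
_SPLIT = re.compile(
    r"^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?$"
)


def remove_dot_segments(path):
    """Remove "." and ".." segments (RFC 3986, section 5.2.4)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            # Drop the last output segment and keep a leading "/".
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with any leading "/") to the output.
            end = path.find("/", 1)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def resolve_reference(base, ref, strict=True):
    """Resolve ref against base (RFC 3986, sections 5.2.2 and 5.2.3)."""
    b_scheme, b_auth, b_path, b_query, _ = _SPLIT.match(base).groups()
    scheme, auth, path, query, fragment = _SPLIT.match(ref).groups()
    if not strict and scheme == b_scheme:
        scheme = None  # non-strict: ignore a scheme identical to the base's
    if scheme is None:
        scheme = b_scheme
        if auth is None:
            auth = b_auth
            if path == "":
                path = b_path
                if query is None:
                    query = b_query
            elif not path.startswith("/"):
                # Merge with the base path (section 5.2.3).
                if b_auth is not None and b_path == "":
                    path = "/" + path
                else:
                    path = b_path[: b_path.rfind("/") + 1] + path
    path = remove_dot_segments(path)
    # Recompose the target (section 5.3).
    result = scheme + ":" if scheme is not None else ""
    if auth is not None:
        result += "//" + auth
    result += path
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result
```

With strict=False, a reference whose scheme equals the base scheme is treated as relative, which is why 'http:g' resolves to 'http://a/g' in that mode.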

patterns

A dict of regular expressions with useful group names. These patterns compile with the regex module only (not with re) and need no particular compilation flag.

[bmp_][u]patterns[_no_names]

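Alternative versions of patterns, named by combining the optional parts shown above: the u variants are unicode strings, the _no_names variants omit group names (for use with the re module), and the bmp_ variants are restricted to the BMP (for narrow builds).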

get_compiled_pattern (rule, flags=0)

Returns a compiled pattern object for a rule name or template string.

Usage for validation:

>>> uri = get_compiled_pattern('^%(URI)s$')
>>> assert uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
>>> assert not get_compiled_pattern('^%(relative_ref)s$').match('#f#g')
>>> from unicodedata import lookup
>>> smp = 'urn:' + lookup('OLD ITALIC LETTER A')  # U+00010300
>>> assert not uri.match(smp)
>>> m = get_compiled_pattern('^%(IRI)s$').match(smp)

On narrow builds, non-BMP characters are (incorrectly) excluded:

>>> assert NARROW_BUILD == (not m)

For parsing, some subcomponents are captured in named groups (only if regex is available, otherwise see parse):

>>> match = uri.match('http://tools.ietf.org/html/rfc3986#appendix-A')
>>> d = match.groupdict()
>>> if REGEX:
...     assert all([ d['scheme'] == 'http',
...                  d['authority'] == 'tools.ietf.org',
...                  d['path'] == '/html/rfc3986',
...                  d['query'] is None,
...                  d['fragment'] == 'appendix-A' ])

>>> for r in patterns.keys():
...     assert get_compiled_pattern(r)

format_patterns (**names)

Returns a dict of patterns (regular expressions) keyed by rule names for URIs and rule names for IRIs.

See also the module level dicts of patterns, and get_compiled_pattern.

To wrap a rule in a named capture group, pass it as a keyword argument: rule_name='group_name'. By default, the formatted patterns contain no named groups.

Patterns are str instances (be it in python 2.x or 3.x) containing ASCII characters only.
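
How a keyword argument can turn a rule into a named capture group may be easier to see on a toy rule set. The following is a sketch only: TEMPLATES and format_rules are hypothetical names, the three rules are a small excerpt of the RFC 3986 grammar, and the package's actual implementation may differ.

```python
import re

# Rules in dependency order; later templates refer to earlier rules
# through %(rule_name)s placeholders.
TEMPLATES = [
    ("ALPHA", "[A-Za-z]"),
    ("DIGIT", "[0-9]"),
    ("scheme", "%(ALPHA)s(?:%(ALPHA)s|%(DIGIT)s|[+.-])*"),
]


def format_rules(**names):
    """Return a dict of patterns keyed by rule name.

    A rule passed as rule_name='group_name' is wrapped in a named group;
    every other rule is wrapped in a non-capturing group.
    """
    rules = {}
    for rule, template in TEMPLATES:
        pattern = template % rules
        group = names.get(rule)
        if group:
            rules[rule] = "(?P<%s>%s)" % (group, pattern)
        else:
            rules[rule] = "(?:%s)" % pattern
    return rules


# Usage: capture the scheme of a URI under the group name "scheme".
rules = format_rules(scheme="scheme")
scheme_match = re.match("^%(scheme)s:" % rules, "http://tools.ietf.org/")
```

Here scheme_match.group('scheme') is 'http'. Calling format_rules() without arguments yields patterns with no named groups at all, mirroring the default described above.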

Caveats:

  • with re, named capture groups cannot occur on multiple branches of an alternation

  • with re before python 3.3, \u and \U escapes must be preprocessed (see issue3665)

  • on narrow builds, character ranges beyond BMP are not supported
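
The first caveat can be reproduced with the standard library alone. In the sketch below, the helper compiles and both example patterns are made up for illustration; they are not patterns from this package.

```python
import re


def compiles(pattern):
    """Return True if the standard re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The same group name on two branches of an alternation: rejected by re.
DUPLICATE = r"(?P<host>\[[0-9A-Fa-f:]+\])|(?P<host>[a-z0-9.-]+)"

# One possible workaround: a single named group around the alternation.
WRAPPED = r"(?P<host>\[[0-9A-Fa-f:]+\]|[a-z0-9.-]+)"
```

With re, compiles(DUPLICATE) is False because the group name is redefined, while compiles(WRAPPED) is True.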

Dependencies

Some features require regex.

This package’s docstrings are tested on python 2.6, 2.7, and 3.2 to 3.6. Note that in python<=3.2, characters beyond the Basic Multilingual Plane are not supported on narrow builds (see issue12729).
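
Whether an interpreter is a narrow build can be checked through sys.maxunicode. The sketch below is an illustration only; is_narrow_build and SAMPLE are hypothetical names, and the package may detect narrow builds differently.

```python
import sys

# OLD ITALIC LETTER A, a character outside the Basic Multilingual Plane.
SAMPLE = "\U00010300"


def is_narrow_build():
    """Return True if non-BMP characters are stored as surrogate pairs."""
    return sys.maxunicode == 0xFFFF
```

On a narrow build, len(SAMPLE) is 2 because the character is represented by two surrogate code units; on a wide build it is 1. Since Python 3.3, every build behaves as a wide build.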

Release notes

version 1.3.8:

  • fixed deprecated escape sequence

version 1.3.6:

  • fixed a bug in IPv6 pattern:

    >>> assert match('::0:0:0:0:0.0.0.0', 'IPv6address')
    

version 1.3.4:

  • allowed for lower case percent encoding

version 1.3.3:

  • fixed a bug in resolve which left “../” at the beginning of some paths

version 1.3.2:

  • convenience function match

  • patterns restricted to the BMP for narrow builds

  • adapted doctests for python 3.3

  • compatibility with python 2.6 (thanks to Thijs Janssen)

version 1.3.1:

  • some re compatibility: get_compiled_pattern, parse

  • dropped regex from setup.py requirements

version 1.3.0:

  • python 3.x compatibility

  • format_patterns

version 1.2.1:

  • compose, resolve

Support

This is free software. You may show your appreciation with a donation.
