Python bindings for the Rust `regex` crate. This implementation uses finite automata and guarantees linear time matching on all inputs.

Project description

rure is the Python binding for Rust’s regex library, which guarantees linear-time searching using finite automata. In exchange, it gives up some common regex features such as backreferences and arbitrary lookaround. It does, however, include capturing groups, lazy matching, Unicode support, and word boundary assertions. Its matching semantics generally correspond to Perl’s, or “leftmost first”: the match locations reported correspond to the first match that would be found by a backtracking engine.

The syntax and possibly other useful things are documented in the Rust API documentation: http://doc.rust-lang.org/regex/regex/index.html

Examples

This package presents two entry points to the regex engine: Rure, an OO wrapper around the underlying Rust API, and a drop-in replacement for the stdlib re module (compile, search, match, findall, finditer, RegexObject and MatchObject).

The Rure interface exposes the “pay for what you use” API, enabling you to request the minimum computation you need: does the text match (is_match), where does it match (find, find_iter), and where are the submatches (captures, captures_iter).

Using the drop-in replacement should be as simple as import rure as re, then using the API as documented in the Python documentation (https://docs.python.org/3/library/re.html, https://docs.python.org/2/library/re.html). The flags supported by re are automatically translated to those supported by rure. Note that the rure engine is stricter than re, and will reject expressions that contain unnecessary escapes or use features not supported by the engine.

One important note regarding this shim: the Rust engine operates on byte offsets in the given search text, while Python operates on Unicode code points. The Rust engine guarantees returning offsets that correspond to valid UTF8 segments. By default, the MatchObject that is returned by this library will decode the captured text. The offsets returned by start, end, and span, however, are byte offsets and not character offsets. Using them with the string attribute is safe, so you can do:

>>> email = u"tony@tiremove_thisger.net"
>>> m = re.search(u"remove_this", email)
>>> (m.string[:m.start()] + m.string[m.end():]).decode('utf8')
u'tony@tiger.net'
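
If character offsets are needed instead of byte offsets, they can be recovered from the byte offsets. The following is a minimal sketch using only the standard library. The helper name byte_to_char_offset is hypothetical and not part of rure, and the byte offset is obtained here with bytes.find as a stand-in for a match offset. It assumes the byte offset falls on a valid UTF-8 boundary, which the engine is described above as guaranteeing.

```python
# -*- coding: utf-8 -*-

def byte_to_char_offset(text, byte_offset):
    """Convert a UTF-8 byte offset into `text` to a code point offset.

    Hypothetical helper (not part of rure). Assumes `byte_offset` lies on
    a valid UTF-8 boundary; otherwise the decode step raises an error.
    """
    encoded = text.encode("utf-8")
    return len(encoded[:byte_offset].decode("utf-8"))


text = u"na\u00efve caf\u00e9"          # "naïve café"
encoded = text.encode("utf-8")

# Stand-in for a byte offset reported by a byte-oriented engine.
byte_start = encoded.find(b"caf")
char_start = byte_to_char_offset(text, byte_start)

# The two offsets differ because the "ï" occupies two bytes in UTF-8.
print(byte_start, char_start)
```

Slicing the encoded bytes with byte_start and slicing the text with char_start select the same suffix.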

This package also includes an is_match(pattern, string, flags=0) function (and corresponding method on RegexObject), that only returns a boolean.
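
As an illustration of the drop-in usage and the boolean-only check, here is a hedged sketch. It assumes rure may not be installed and falls back to the stdlib re module in that case. The wrapper function below is hypothetical: it calls the module-level is_match(pattern, string, flags=0) described above when it is available, and otherwise approximates it with re.search.

```python
try:
    import rure as re   # drop-in replacement, if the wheel is installed
except ImportError:
    import re           # stdlib fallback so the sketch still runs


def is_match(pattern, string, flags=0):
    """Return True if `pattern` matches anywhere in `string`.

    Hypothetical wrapper: uses the module's own is_match when present
    (as described for rure), else falls back to a search.
    """
    if hasattr(re, "is_match"):
        return bool(re.is_match(pattern, string, flags))
    return re.search(pattern, string, flags) is not None


print(is_match(r"\d+", "abc123"))
print(is_match(r"\d+", "abc"))
```

Because the rure engine rejects patterns that re accepts (for example, those with unnecessary escapes), a fallback like this is only safe for patterns that both engines support.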

Performance

It’s fast. Its core matching engine is a lazy DFA, which is what GNU grep and RE2 use. Like GNU grep, this regex engine can detect multi-byte literals in the regex and uses fast literal string searching to skip quickly through the input to possible match locations.

All memory usage is bounded and all searching takes linear time with respect to the input string.
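
To give an intuition for why automata-based matching is linear in the input, here is a toy sketch. It is not how the Rust engine is implemented (that is a lazy DFA, as noted above). It is a hand-built NFA for the anchored pattern (a|aa)*b, simulated by tracking the set of active states. Each input character is examined exactly once and the state set can never exceed the number of NFA states, so there is no backtracking. The state numbering and table layout are assumptions made for this illustration.

```python
# Hand-built NFA for the anchored pattern (a|aa)*b.
# State 0: start of a repetition, state 1: middle of "aa", state 2: accept.
NFA = {
    (0, "a"): {0, 1},   # "a" alone, or the first half of "aa"
    (1, "a"): {0},      # second half of "aa"
    (0, "b"): {2},      # final "b"
}
START, ACCEPT = 0, 2


def nfa_full_match(text):
    """Simulate the NFA over `text`, one pass, no backtracking."""
    current = {START}
    for ch in text:
        following = set()
        for state in current:
            following |= NFA.get((state, ch), set())
        if not following:        # no live states left: cannot match
            return False
        current = following
    return ACCEPT in current


# An input with many ambiguous ways to split the run of a's is still
# handled in a single left-to-right pass.
print(nfa_full_match("a" * 5000 + "b"))
print(nfa_full_match("a" * 5000))
```

The work per character is bounded by the number of NFA states, which is fixed by the pattern, so total time grows linearly with the input length.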

For more details, see the PERFORMANCE guide: https://github.com/rust-lang-nursery/regex/blob/master/PERFORMANCE.md

Missing

There are a few things missing from this package that are present in the Rust API. There’s no particular (known) reason why they are missing; they just haven’t been implemented yet.

  • Splitting a string by a regex.

  • Replacing regex matches in a string with some other text.
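
Until these are implemented, both operations can be layered on top of match offsets. The sketch below is an assumption-laden illustration, not part of rure: the helper names split_by_spans and replace_by_spans are hypothetical, and the (start, end) spans are produced with the stdlib re.finditer as a stand-in for whatever span iterator is used. With rure the offsets are byte offsets, so the text passed in would need to be the UTF-8 encoded bytes rather than a Unicode string.

```python
import re  # stdlib, used here only as a stand-in source of (start, end) spans


def split_by_spans(text, spans):
    """Split `text` around non-overlapping, ordered (start, end) spans."""
    pieces, last = [], 0
    for start, end in spans:
        pieces.append(text[last:start])
        last = end
    pieces.append(text[last:])
    return pieces


def replace_by_spans(text, spans, replacement):
    """Replace each span in `text` with the literal `replacement`."""
    return replacement.join(split_by_spans(text, spans))


text = "a, b,c"
spans = [m.span() for m in re.finditer(r",\s*", text)]

print(split_by_spans(text, spans))
print(replace_by_spans(text, spans, ";"))
```

This handles only literal replacement text; expanding capture group references in the replacement would need the submatch offsets as well.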

Install

Binary wheels are provided for macOS. The specific versions of the Rust compiler and of the rure and regex crates used to build them are listed in the changelog.

Installing from a source tarball requires manually building the Rust rure crate and pointing at the built directory. If you want to take advantage of a modern CPU, you will likely want to build the regex crate with SSE3 and SIMD. To do so, update regex/regex-capi/Cargo.toml to include the simd-accel feature: regex = { version = "0.2.2", path = "..", features = ["simd-accel"] }.

  • git clone https://github.com/rust-lang-nursery/regex

  • cargo build --release --manifest-path /path/to/regex/regex-capi/Cargo.toml

    • To build with SSE3: RUSTFLAGS="-C target-feature=+ssse3" cargo build --release --features simd-accel

  • RURE_DIR=/path/to/regex/regex-capi python setup.py bdist_wheel

  • pip install rure --no-index -f ./dist

History

0.2.1 (2019-04-07)

  • Update build pipeline to support multiple Python versions

  • Binary wheels compiled against:

    • Rust: 1.33.0

    • regex: 1.0

    • rure (regex-capi): 0.2.1

0.2.0 (2018-03-04)

  • Add support for RegexSet

0.1.2 (2017-10-09)

  • First release on PyPI

  • Binary wheels compiled against:

    • Rust: 1.20.0

    • regex: 0.2.2

    • rure (regex-capi): 0.2.0

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

rure-0.2.2-py2.py3-none-manylinux1_x86_64.whl (3.7 MB): Python 2, Python 3

rure-0.2.2-py2.py3-none-macosx_10_14_x86_64.whl (573.8 kB): Python 2, Python 3, macOS 10.14+ x86-64

File details

Details for the file rure-0.2.2-py2.py3-none-manylinux1_x86_64.whl.

File metadata

  • Download URL: rure-0.2.2-py2.py3-none-manylinux1_x86_64.whl
  • Size: 3.7 MB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3

File hashes

Hashes for rure-0.2.2-py2.py3-none-manylinux1_x86_64.whl
Algorithm Hash digest
SHA256 c99524e4000b105ed0c4a6f51f58139758652f4fb2fba4862eff775f84de1733
MD5 ee5a40d6620e053377e45e4c9440ad0c
BLAKE2b-256 4ffd71c6edde95b174abc4c3bbb295d9df7add605aa32413300db85126bf0339

File details

Details for the file rure-0.2.2-py2.py3-none-macosx_10_14_x86_64.whl.

File metadata

  • Download URL: rure-0.2.2-py2.py3-none-macosx_10_14_x86_64.whl
  • Size: 573.8 kB
  • Tags: Python 2, Python 3, macOS 10.14+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.3

File hashes

Hashes for rure-0.2.2-py2.py3-none-macosx_10_14_x86_64.whl
Algorithm Hash digest
SHA256 f8ccec9bea1651ba2334029b12e1665f0f076fadf6ea66402f0554506c30de6c
MD5 33141a5c5c471e70bc3801f1a895c9bd
BLAKE2b-256 d1759474d8b212dca29a2543f6b92ae80dec17f9da4ca8f6be8cfef5d459b0a6
