
Python wrapper for Google RE2 library using Cython

Summary

pyre2 is a Python extension that wraps Google's RE2 regular expression library. The RE2 engine compiles (strictly) regular expressions, i.e., those without backtracking-only features such as backreferences, to deterministic finite automata, which guarantees matching time linear in the length of the input.

It is intended as a drop-in replacement for the standard re module. Unicode is supported by encoding to UTF-8, and bytes strings are treated as UTF-8 when the UNICODE flag is given. For best performance, work with UTF-8 encoded bytes strings.

Installation

Normal usage for Linux/Mac/Windows:

$ pip install pyre2

Compiling from source

Requirements for building the C++ extension from the repo source:

  • A build environment with gcc or clang (e.g. sudo apt-get install build-essential)

  • Build tools and libraries: RE2, pybind11, Cython, and CMake installed in the build environment.

    • On Ubuntu/Debian: sudo apt-get install build-essential cmake ninja-build python3-dev cython3 pybind11-dev libre2-dev

    • On Gentoo, install dev-util/cmake, dev-python/pybind11, and dev-libs/re2

    • For a venv you can install the pybind11, cmake, and cython packages from PyPI

On macOS, use the Homebrew package manager:

$ brew install -s re2 pybind11

On Windows, use the vcpkg package manager:

$ vcpkg install re2:x64-windows pybind11:x64-windows

You can set CMake environment variables to change the build type, pass a toolchain file (required on Windows), or select the CMake generator. For example:

$ CMAKE_GENERATOR="Unix Makefiles" CMAKE_TOOLCHAIN_FILE=clang_toolchain.cmake tox -e deploy

For development, get the source:

$ git clone https://github.com/andreasvc/pyre2.git
$ cd pyre2
$ make install

Platform-agnostic building with conda

An alternative to the above is provided via the conda recipe (use the miniconda installer if you don’t have conda installed already).

Backwards Compatibility

The stated goal of this module is to be a drop-in replacement for re, i.e.:

try:
    import re2 as re
except ImportError:
    import re

That being said, there are features of the re module that this module may never have; these will be handled through fallback to the original re module:

  • lookahead assertions, e.g., (?!...)

  • backreferences, e.g., \1 in the search pattern

  • possessive quantifiers *+, ++, ?+, {m,n}+

  • atomic groups (?>...)

  • \W and \S inside character classes

On the other hand, unicode character classes are supported (e.g., \p{Greek}). Syntax reference: https://github.com/google/re2/wiki/Syntax
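As a small sketch of the Unicode class support, the helper below collects runs of Greek characters. It assumes pyre2 is installed; because the standard re module rejects \p{...}, the hypothetical greek_runs helper simply returns None when re2 cannot be imported.

```python
try:
    import re2
except ImportError:  # pyre2 not installed
    re2 = None


def greek_runs(text):
    # \p{Greek} is RE2 syntax; the standard re module rejects it,
    # so give up instead of falling back when pyre2 is missing.
    if re2 is None:
        return None
    return re2.findall(r"\p{Greek}+", text, flags=re2.UNICODE)


print(greek_runs("alpha α, beta βήτα"))
```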

However, there are times when you may want to be notified of a fallback. The function set_fallback_notification determines the behavior in these cases:

try:
    import re2 as re
except ImportError:
    import re
else:
    re.set_fallback_notification(re.FALLBACK_WARNING)

set_fallback_notification accepts one of three values: re.FALLBACK_QUIETLY (the default), re.FALLBACK_WARNING (issue a warning), and re.FALLBACK_EXCEPTION (raise an exception).

You can also change the fallback module from re (the default) to another module, such as regex, with the function set_fallback_module:

>>> import re2
>>> re2.set_fallback_notification(re2.FALLBACK_WARNING)
>>> type(re2.compile(r"foo"))
<class 're2.Pattern'>
>>> type(re2.compile(r"foo(?!bar)"))
<stdin>:1: UserWarning: WARNING: Using re module. Reason: invalid perl operator: (?!
<class 're2.FallbackPattern'>
>>> import regex
>>> re2.set_fallback_module(regex)
>>> type(re2.compile(r"foo(?!bar)"))
<stdin>:1: UserWarning: WARNING: Using regex module. Reason: invalid perl operator: (?!
<class 're2.FallbackPattern'>
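To audit which patterns in a code base would fall back, one option is to switch to FALLBACK_WARNING and capture the UserWarning shown in the session above with the standard warnings module. The helper fallback_reasons is hypothetical; since the notification setting appears to be module-wide, it restores FALLBACK_QUIETLY (the documented default) afterwards.

```python
import warnings

try:
    import re2
except ImportError:  # pyre2 not installed: nothing can fall back
    re2 = None


def fallback_reasons(patterns):
    """Map each pattern that pyre2 hands to its fallback module to the warning text."""
    if re2 is None:
        return {}
    reasons = {}
    re2.set_fallback_notification(re2.FALLBACK_WARNING)
    try:
        for pattern in patterns:
            with warnings.catch_warnings(record=True) as caught:
                warnings.simplefilter("always")  # do not suppress repeated warnings
                re2.compile(pattern)
            messages = [str(w.message) for w in caught
                        if issubclass(w.category, UserWarning)]
            if messages:
                reasons[pattern] = messages[0]
    finally:
        # Put the module back into its default, quiet mode.
        re2.set_fallback_notification(re2.FALLBACK_QUIETLY)
    return reasons


print(fallback_reasons([r"foo", r"foo(?!bar)", r"(a)\1"]))
```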

Documentation

Consult the docstrings in the source code, or view them interactively with IPython or pydoc re2.

Unicode Support

Python bytes and unicode strings are fully supported, but note that RE2 works with UTF-8 encoded strings under the hood, which means that unicode strings need to be encoded and decoded back and forth. There are two important factors:

  • whether a unicode pattern and search string are used (they will be encoded to UTF-8 internally)

  • the UNICODE flag: whether operators such as \w recognize Unicode characters.

To avoid the overhead of encoding and decoding to UTF-8, it is possible to pass UTF-8 encoded bytes strings directly but still treat them as unicode:

In [18]: re2.findall(u'\w'.encode('utf8'), u'Mötley Crüe'.encode('utf8'), flags=re2.UNICODE)
Out[18]: ['M', '\xc3\xb6', 't', 'l', 'e', 'y', 'C', 'r', '\xc3\xbc', 'e']
In [19]: re2.findall(u'\w'.encode('utf8'), u'Mötley Crüe'.encode('utf8'))
Out[19]: ['M', 't', 'l', 'e', 'y', 'C', 'r', 'e']

However, note that the indices in Match objects will refer to the bytes string. The indices of the match in the unicode string could be computed by decoding/encoding, but this is done automatically and more efficiently if you pass the unicode string:

>>> re2.search(u'ü'.encode('utf8'), u'Mötley Crüe'.encode('utf8'), flags=re2.UNICODE)
<re2.Match object; span=(10, 12), match='\xc3\xbc'>
>>> re2.search(u'ü', u'Mötley Crüe', flags=re2.UNICODE)
<re2.Match object; span=(9, 10), match=u'\xfc'>
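The decoding approach mentioned above can be sketched as follows: the number of characters before the match equals the length of the decoded byte prefix. The char_span helper is hypothetical, and the snippet uses the drop-in import (without the UNICODE flag, which the standard re module rejects for bytes patterns) so it also runs when pyre2 is absent.

```python
try:
    import re2 as re
except ImportError:  # drop-in idiom: fall back to the standard library
    import re


def char_span(data, byte_span, encoding="utf-8"):
    """Convert a (start, end) span of byte offsets into character offsets."""
    start, end = byte_span
    # Characters before the match = length of the decoded prefix
    char_start = len(data[:start].decode(encoding))
    char_end = char_start + len(data[start:end].decode(encoding))
    return char_start, char_end


text = "Mötley Crüe"
data = text.encode("utf-8")
match = re.search("ü".encode("utf-8"), data)
print(match.span())                   # (10, 12): byte offsets
print(char_span(data, match.span()))  # (9, 10): offsets into text
```

Decoding the prefix costs time proportional to the match position, which is why passing the unicode string directly, as described above, is the more efficient route.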

Finally, if you want to match bytes without regard for Unicode characters, pass bytes strings and leave out the UNICODE flag (RE2 then uses Latin-1 encoding under the hood):

>>> re2.findall(br'.', b'\x80\x81\x82')
['\x80', '\x81', '\x82']

Performance

Performance is of course the point of this module, so it had better perform well. Regular expressions vary widely in complexity, and the salient feature of RE2 is that it behaves well asymptotically. That said, for very simple substitutions I've found that Python's built-in re module is occasionally slightly faster. However, when the re module gets slow, it gets really slow, while this module buzzes along.

In the benchmarks below, I run each test against 8 MB of text from the colossal Wikipedia XML file, repeating it multiple times and timing it with the timeit module. For more details, see the performance script.

Test                 Runs   re (s)   re2 (s)   re2 as % of re   regex (s)   re2 as % of regex
Findall URI|Email       2    6.262     0.131            2.08%       5.119               2.55%
Replace WikiLinks     100    4.374     0.815           18.63%       1.176              69.33%
Remove WikiLinks      100    4.153     0.225            5.43%       0.537              42.01%

  • Findall URI|Email: find the list of matches for '([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)'

  • Replace WikiLinks: replaces links of the form [[Obama|Barack_Obama]] with Obama.

  • Remove WikiLinks: splits the data by the <page> tag.

Feel free to add more speed tests to the bottom of the script and send a pull request my way!
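The performance script itself is not reproduced here. As a rough sketch of the same kind of measurement, the snippet below times the Findall URI|Email pattern with timeit for re and, if it is installed, re2. The TEXT sample is a made-up stand-in for the Wikipedia extract, so its numbers are not comparable to the table.

```python
import re
import timeit

try:
    import re2
except ImportError:  # pyre2 not installed: only the stdlib module is timed
    re2 = None

# Pattern from the "Findall URI|Email" benchmark.
PATTERN = r"([a-zA-Z][a-zA-Z0-9]*)://([^ /]+)(/[^ ]*)?|([^ @]+)@([^ @]+)"
# Made-up stand-in for the Wikipedia extract.
TEXT = "see http://example.com/page or mail someone@example.org " * 1_000


def best_time(module, number=2, repeat=3):
    """Best-of-`repeat` wall time in seconds for `number` findall calls."""
    compiled = module.compile(PATTERN)
    timings = timeit.repeat(lambda: compiled.findall(TEXT), number=number, repeat=repeat)
    return min(timings)


results = {"re": best_time(re)}
if re2 is not None:
    results["re2"] = best_time(re2)

for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} s")
```

Taking the minimum over several repeats, as timeit's documentation recommends, filters out noise from other processes rather than averaging it in.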

Current Status

The tests show the following differences with Python’s re module:

  • The $ operator in Python’s re matches twice if the string ends with \n. This can be simulated using \n?$, except when doing substitutions.

  • The pyre2 module and Python’s re may behave differently with nested groups. See tests/test_emptygroups.txt for the examples.
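The first difference can be observed with the standard re module alone. The snippet below shows $ matching twice on a string that ends with \n, and the \n? form suggested above, which matches whether or not the trailing newline is present.

```python
import re  # the standard library module, to show the behaviour pyre2 differs from

text = "abc\n"

# Python's $ matches just before a trailing newline and again at the very end
print(re.findall(r"$", text))         # ['', '']
print(repr(re.sub(r"$", "X", text)))  # 'abcX\nX'

# So r"abc$" finds a match even though the string does not end in "c"
print(re.search(r"abc$", text) is not None)  # True

# Making the newline explicit matches with or without it
print(repr(re.search(r"abc\n?$", text).group()))   # 'abc\n'
print(repr(re.search(r"abc\n?$", "abc").group()))  # 'abc'
```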

Please report any further issues with pyre2.

Tests

If you would like to help, one thing that would be very useful is writing comprehensive tests for this. It’s actually really easy:

  • Come up with regular expression problems using Python's standard re module.

  • Write them up as an interactive Python session (doctest format); see the existing test files for an example.

  • Replace your import re with import re2 as re.

  • Save it as test_<name>.txt in the tests directory. You can add comments however you like; indent the code with 4 spaces.
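How the test suite consumes these files is not described here; one way to check a file locally before sending it is Python's standard doctest module, assuming the file is a plain doctest session. The file name test_words.txt and its contents below are hypothetical, and the session uses the drop-in import so it also passes without pyre2.

```python
import doctest
import pathlib
import tempfile

# Hypothetical tests/test_words.txt: free-form commentary plus a session
# indented by 4 spaces.
SESSION = r"""
Splitting a sentence into words
-------------------------------

    >>> try:
    ...     import re2 as re
    ... except ImportError:
    ...     import re
    >>> re.findall(r"\w+", "hello pyre2 world")
    ['hello', 'pyre2', 'world']
    >>> re.sub(r"o+", "0", "foo boo")
    'f0 b0'
"""

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "test_words.txt"
    path.write_text(SESSION, encoding="utf-8")
    # module_relative=False: interpret the argument as a filesystem path
    results = doctest.testfile(str(path), module_relative=False)

print(results)  # TestResults(failed=..., attempted=...)
```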

Credits

This code builds on the following projects (in chronological order):

Download files

Source distribution: pyre2-0.3.10.tar.gz (1.9 MB)

Built distributions (wheels) of pyre2 0.3.10 for CPython 3.10, 3.11, 3.12, and 3.13:

  • Windows x86-64 (win_amd64)

  • Linux x86-64, manylinux glibc 2.17+ (manylinux2014_x86_64)

  • Linux ARM64, manylinux glibc 2.17+ (manylinux2014_aarch64)

  • macOS 14.0+ ARM64 (macosx_14_0_arm64)

