
Python bindings for eregex, an advanced regular expression engine inspired by mrab-regex


eregex (Python bindings)

Python bindings for eregex — an advanced regular expression engine for Rust inspired by mrab-regex (the Python regex module).

This package exposes eregex's full API to Python via PyO3 and ships as a wheel built with maturin. All matching logic runs in compiled Rust; the Python layer is a thin adapter.

Features

  • Named groups, duplicate group names, repeated captures
  • Greedy / lazy / possessive quantifiers, atomic groups (?>...)
  • Variable-length lookbehind / lookahead
  • Inline scoped flags (?i), (?i-m:...)
  • Backreferences \1, \g<name>, (?P=name)
  • Partial / end-anchored matching (find_partial)
  • find, match_at_start (Python re.match), fullmatch (re.fullmatch)
  • replace, replace_all with $1 / ${name} / $$ templates
  • split, escape, and more
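
The list above maps match_at_start to re.match and fullmatch to re.fullmatch. For readers less familiar with those anchoring modes, here is a short reminder using only the standard library re module. It assumes find behaves like re.search, which the list does not state explicitly, and anchoring_modes is a hypothetical helper name, not part of eregex.

```python
import re


def anchoring_modes(pattern: str, text: str) -> dict:
    """Report which stdlib anchoring modes accept `text` for `pattern`."""
    rx = re.compile(pattern)
    return {
        "search": rx.search(text) is not None,        # match anywhere
        "match": rx.match(text) is not None,          # match must start at index 0
        "fullmatch": rx.fullmatch(text) is not None,  # match must cover the whole string
    }


print(anchoring_modes(r"\d+", "ab 12"))  # only "search" succeeds
print(anchoring_modes(r"\d+", "12 ab"))  # "search" and "match" succeed
print(anchoring_modes(r"\d+", "12"))     # all three succeed
```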

Installation

The wheel is built from the Rust core:

cd crates/eregex-python
python -m venv .venv
. .venv/bin/activate          # or .venv\Scripts\activate on Windows
pip install maturin
maturin develop --release     # editable install into the current venv
# or: maturin build --release && pip install target/wheels/eregex-*.whl

maturin develop builds the extension module and installs it into the active virtual environment, where it can be imported as eregex.
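
One way to confirm that the build landed in the active environment, without importing the extension, is to ask the import system whether it can locate the module. This is a small sketch using only the standard library; module_available is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be located."""
    return importlib.util.find_spec(name) is not None


# Expected to report True after a successful `maturin develop`.
print("eregex importable:", module_available("eregex"))
```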

Quick start

import eregex

re = eregex.Regex(r"(\w+)\s+(\w+)")
m = re.find("hello world")
m.matched        # 'hello world'
m.group(1)       # 'hello'
m.group(2)       # 'world'
m[1]             # 'hello'  (Match is sequence-like)

# Flags: pass a bitset of the module-level constants, or parse a string.
eregex.Regex("hello", eregex.IGNORECASE).is_match("HELLO")  # True
eregex.Regex("hello", eregex.parse_flags("i")).is_match("HELLO")  # True

# Repeated captures (signature mrab-regex feature).
eregex.Regex(r"(\w)+").find("abc").captures(1)  # ['a', 'b', 'c']

# Replace with named groups.
eregex.Regex(r"(?P<a>\d)(?P<b>\d)").replace_all("12 34", "${b}${a}")  # '21 43'
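
For comparison with the standard library, the $1 / ${name} / $$ template forms can be translated into the group-reference syntax that re.sub expects. The sketch below is only an illustration of how the three forms correspond. to_re_template is a hypothetical helper, not part of eregex, and eregex's own handling of edge cases (for example a lone $ or a multi-digit index) is not documented here and may differ.

```python
import re

# One token: "$$", "$<digits>", or "${name}".
_TOKEN = re.compile(r"\$(?:(\$)|(\d+)|\{(\w+)\})")


def to_re_template(template: str) -> str:
    """Translate a $-style template into the group-reference syntax of re.sub."""

    def convert(m: re.Match) -> str:
        if m.group(1):                    # "$$" -> literal dollar sign
            return "$"
        ref = m.group(2) or m.group(3)    # numeric index or group name
        return rf"\g<{ref}>"

    # Double any backslashes first so they stay literal in the re.sub template.
    return _TOKEN.sub(convert, template.replace("\\", "\\\\"))


swapped = re.sub(r"(?P<a>\d)(?P<b>\d)", to_re_template("${b}${a}"), "12 34")
print(swapped)  # 21 43
```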

Regex

class Regex:
    def __init__(self, pattern: str, flags: int = 0): ...
    @property
    def pattern(self) -> str: ...
    @property
    def flags(self) -> int: ...          # resolved (UNICODE + VERSION1 added)
    @property
    def capture_count(self) -> int: ...  # excluding group 0

    def group_names(self) -> list[str]: ...
    def group_index(self, name: str) -> int | None: ...

    def is_match(self, haystack: str) -> bool: ...
    def find(self, haystack: str) -> Match | None: ...
    def find_at(self, haystack: str, start: int) -> Match | None: ...
    def match_at_start(self, haystack: str) -> Match | None: ...  # re.match
    def fullmatch(self, haystack: str) -> Match | None: ...
    def findall(self, haystack: str) -> list[Match]: ...
    def find_partial(self, haystack: str) -> PartialMatch | None: ...

    def replace(self, haystack: str, repl: str) -> str: ...
    def replace_all(self, haystack: str, repl: str) -> str: ...
    def split(self, haystack: str) -> list[str]: ...
    def dump(self) -> str: ...                                  # AST debug aid

flags is a bitwise OR of the module-level constants: IGNORECASE, MULTILINE, DOTALL, UNICODE, ASCII, VERBOSE, FULLCASE, WORD, LOCALE, VERSION0, VERSION1. parse_flags("ims") parses a string of flag letters into the same kind of bitset, which may feel more familiar to users of re.
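
The idea of a flag bitset built from letters can be sketched with the standard library alone. In the example below the stdlib re constants stand in for eregex's constants, whose numeric values are not given here, and the letter table is an assumption limited to the five letters re itself uses. parse_flags_sketch is a hypothetical name, not eregex's parse_flags.

```python
import re

# Assumed letter table; stdlib `re` constants are stand-ins for eregex's.
_FLAG_LETTERS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
    "a": re.ASCII,
}


def parse_flags_sketch(flag_str: str) -> int:
    """OR together the flag constant for each letter in `flag_str`."""
    flags = 0
    for letter in flag_str:
        if letter not in _FLAG_LETTERS:
            raise ValueError(f"unknown flag letter: {letter!r}")
        flags |= _FLAG_LETTERS[letter]
    return int(flags)


flags = parse_flags_sketch("ims")
print(bool(flags & re.MULTILINE))  # True: the bit for "m" is set
print(bool(flags & re.VERBOSE))    # False: "x" was not requested
```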

Match

Match is sequence-like: len(m) is the number of groups (group 0 first), and m[i] / m["name"] look up by index / name.

class Match:
    @property
    def matched(self) -> str: ...               # whole match (group 0)
    @property
    def group0(self) -> str: ...                # alias of matched
    @property
    def input(self) -> str: ...
    @property
    def start(self) -> int: ...                 # byte offset
    @property
    def end(self) -> int: ...                   # byte offset
    @property
    def span(self) -> tuple[int, int]: ...
    @property
    def capture_count(self) -> int: ...
    @property
    def groups(self) -> list[str | None]: ...
    @property
    def named_groups(self) -> dict[str, str]: ...
    @property
    def all_captures(self) -> list[list[str | None]]: ...
    @property
    def captures_dict(self) -> dict[str, list[str | None]]: ...

    def group(self, *indices_or_names): ...     # re.Match.group semantics
    def captures(self, index: int) -> list[str | None]: ...
    def captures_by_name(self, name: str) -> list[str | None]: ...
    def span_of(self, index: int = 0) -> tuple[int, int] | None: ...
    def start_of(self, index: int = 0) -> int: ...
    def end_of(self, index: int = 0) -> int: ...

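All offsets are byte offsets into the UTF-8 encoding of the input, as in the Rust core. Python's re, by contrast, reports str (code point) indices, so the two coincide only while the preceding text is pure ASCII. None is returned for groups that did not participate.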
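
Python str indexing counts code points, so a UTF-8 byte offset cannot be used directly as a str index once the text contains non-ASCII characters. A minimal sketch of the conversion, using only the standard library, is shown below. The helper names byte_to_char_offset and slice_by_byte_span are hypothetical and not part of eregex.

```python
def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a Python str index.

    Raises UnicodeDecodeError if the offset falls inside a multi-byte character.
    """
    prefix = text.encode("utf-8")[:byte_offset]
    return len(prefix.decode("utf-8"))


def slice_by_byte_span(text: str, span: tuple[int, int]) -> str:
    """Return the substring covered by a (start, end) span of byte offsets."""
    start, end = span
    return text.encode("utf-8")[start:end].decode("utf-8")


text = "na\u00efve caf\u00e9"           # "naïve café"; each accented letter is 2 bytes
print(byte_to_char_offset(text, 7))     # 6: byte 7 is the 'c', which is str index 6
print(slice_by_byte_span(text, (7, 12)))  # café
```

Encoding the whole input on every call is O(n); for many lookups on the same text it would be cheaper to encode once and reuse the bytes.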

PartialMatch

find_partial is end-anchored: the match must consume the input to its end.

class PartialMatch:
    @property
    def status(self) -> str: ...                # "full" | "partial"
    @property
    def is_full(self) -> bool: ...
    @property
    def is_partial(self) -> bool: ...
    @property
    def matched(self) -> str: ...
    @property
    def start(self) -> int: ...
    @property
    def end(self) -> int: ...
    @property
    def capture_count(self) -> int: ...

    def group(self, index: int = 0) -> str | None: ...
    def named_group(self, name: str) -> str | None: ...
    def group_state(self, index: int = 0) -> str: ...   # "matched" | "partial" | "none"

  • None from find_partial → the input cannot be a prefix of any match.
  • status == "partial" → the input is a valid prefix of some full match (more input could complete it).

re = eregex.Regex(r"token=([a-z]+)([0-9]+)")
p = re.find_partial("x token=abc")
p.is_partial           # True
p.group(1)             # 'abc'
p.group_state(1)       # 'matched'
p.group_state(2)       # 'partial'  (entered but not completed)

re.find_partial("x token=abc!")  # None — '!' rules out any continuation
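
A typical use of the three outcomes (None, partial, full) is validating input as it arrives, for example while a user is typing. The sketch below relies only on find_partial and is_full as documented above. classify_input and its return labels are hypothetical, and the regex argument is duck-typed so the function carries no import of its own.

```python
from typing import Any


def classify_input(regex: Any, text: str) -> str:
    """Classify `text` against a regex object exposing find_partial().

    Returns "reject" when no continuation can match, "incomplete" when more
    input could still complete a match, and "complete" on a full match.
    """
    result = regex.find_partial(text)
    if result is None:
        return "reject"
    return "complete" if result.is_full else "incomplete"
```

Going by the example above, classify_input(re, "x token=abc") would be expected to give "incomplete" and classify_input(re, "x token=abc!") to give "reject".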

Module-level helpers

escape(s: str) -> str
escape_special_only(s: str) -> str
escape_literal_spaces(s: str) -> str
is_match(pattern: str, haystack: str) -> bool       # compiles pattern once
compile(pattern: str, flags: int = 0) -> Regex
parse_flags(flag_str: str) -> int

Testing

. .venv/bin/activate
maturin develop --release
python -m unittest test_eregex -v

Layout

This package is one of two sets of bindings for eregex. The same Rust core (eregex) also ships Node.js bindings via napi-rs. See the project root for the core crate and its feature matrix.

License

Apache-2.0, matching the upstream mrab-regex project.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • eregex-0.1.3-cp39-abi3-win_amd64.whl (229.6 kB): CPython 3.9+, Windows x86-64
  • eregex-0.1.3-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (350.3 kB): CPython 3.9+, manylinux (glibc 2.17+), x86-64
  • eregex-0.1.3-cp39-abi3-macosx_11_0_arm64.whl (323.3 kB): CPython 3.9+, macOS 11.0+ ARM64
