Python bindings for eregex, an advanced regular expression engine inspired by mrab-regex

eregex (Python bindings)

Python bindings for eregex — an advanced regular expression engine for Rust inspired by mrab-regex (the Python regex module).

This package exposes eregex's full API to Python via PyO3 and ships as a wheel built with maturin. All matching logic runs in compiled Rust; the Python layer is a thin adapter.

Features

  • Named groups, duplicate group names, repeated captures
  • Greedy / lazy / possessive quantifiers, atomic groups (?>...)
  • Variable-length lookbehind / lookahead
  • Inline scoped flags (?i), (?i-m:...)
  • Backreferences \1, \g<name>, (?P=name)
  • Partial / end-anchored matching (find_partial)
  • find, match_at_start (Python re.match), fullmatch (re.fullmatch)
  • replace, replace_all with $1 / ${name} / $$ templates
  • split, escape, and more
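For readers coming from the standard library, the "repeated captures" item in the list above may need a concrete contrast. The sketch below uses only stdlib re (not eregex) to show that re keeps just the last iteration of a repeated group. The finditer workaround is an illustrative emulation of a per-iteration list, not a description of how eregex works internally.

```python
import re

# Stdlib re keeps only the LAST iteration of a repeated group.
m = re.search(r"(\w)+", "abc")
last_only = m.group(1)

# Emulating a per-iteration list by hand: match the whole repeated
# span first, then re-scan that span with the group's inner pattern.
span = re.search(r"\w+", "abc").group(0)
every_iteration = [c.group(0) for c in re.finditer(r"\w", span)]

print(last_only)        # c
print(every_iteration)  # ['a', 'b', 'c']
```

An engine with repeated captures makes the second step unnecessary, because each iteration of the group is recorded during the match itself.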

Installation

The wheel is built from the Rust core:

cd crates/eregex-python
python -m venv .venv
. .venv/bin/activate          # or .venv\Scripts\activate on Windows
pip install maturin
maturin develop --release     # editable install into the current venv
# or: maturin build --release && pip install target/wheels/eregex-*.whl

maturin develop builds the extension and installs it into the active virtual environment under the module name eregex, so import eregex works from that environment.

Quick start

import eregex

re = eregex.Regex(r"(\w+)\s+(\w+)")
m = re.find("hello world")
m.matched        # 'hello world'
m.group(1)       # 'hello'
m.group(2)       # 'world'
m[1]             # 'hello'  (Match is sequence-like)

# Flags: pass a bitset of the module-level constants, or parse a string.
eregex.Regex("hello", eregex.IGNORECASE).is_match("HELLO")  # True
eregex.Regex("hello", eregex.parse_flags("i")).is_match("HELLO")  # True

# Repeated captures (signature mrab-regex feature).
eregex.Regex(r"(\w)+").find("abc").captures(1)  # ['a', 'b', 'c']

# Replace with named groups.
eregex.Regex(r"(?P<a>\d)(?P<b>\d)").replace_all("12 34", "${b}${a}")  # '21 43'
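The replacement templates above ($1, ${name}, $$) use a different syntax from stdlib re, which writes group references as \g<1> and \g<name>. As a rough aid for comparing the two, here is a sketch of a converter from the $-style to the re-style. The helper name to_re_template is hypothetical and not part of eregex, and it assumes $$ means a literal dollar sign and that a lone $ is left as is.

```python
import re

# Matches "$$", "$<digits>", or "${name}" in a $-style template.
_TOKEN = re.compile(r"\$(?:(\$)|(\d+)|\{(\w+)\})")

def to_re_template(template: str) -> str:
    """Convert a $-style template into stdlib re replacement syntax."""
    parts = []
    pos = 0
    for tok in _TOKEN.finditer(template):
        # Literal text: double any backslash so re.sub treats it literally.
        parts.append(template[pos:tok.start()].replace("\\", "\\\\"))
        if tok.group(1):
            parts.append("$")  # "$$" is an escaped dollar sign
        else:
            parts.append("\\g<%s>" % (tok.group(2) or tok.group(3)))
        pos = tok.end()
    parts.append(template[pos:].replace("\\", "\\\\"))
    return "".join(parts)

swapped = re.sub(r"(?P<a>\d)(?P<b>\d)", to_re_template("${b}${a}"), "12 34")
priced = re.sub(r"(\d+)", to_re_template("cost: $$$1"), "7")
print(swapped)  # 21 43
print(priced)   # cost: $7
```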

Regex

class Regex:
    def __init__(self, pattern: str, flags: int = 0): ...
    @property
    def pattern(self) -> str: ...
    @property
    def flags(self) -> int: ...          # resolved (UNICODE + VERSION1 added)
    @property
    def capture_count(self) -> int: ...  # excluding group 0

    def group_names(self) -> list[str]: ...
    def group_index(self, name: str) -> int | None: ...

    def is_match(self, haystack: str) -> bool: ...
    def find(self, haystack: str) -> Match | None: ...
    def find_at(self, haystack: str, start: int) -> Match | None: ...
    def match_at_start(self, haystack: str) -> Match | None: ...  # re.match
    def fullmatch(self, haystack: str) -> Match | None: ...
    def findall(self, haystack: str) -> list[Match]: ...
    def find_partial(self, haystack: str) -> PartialMatch | None: ...

    def replace(self, haystack: str, repl: str) -> str: ...
    def replace_all(self, haystack: str, repl: str) -> str: ...
    def split(self, haystack: str) -> list[str]: ...
    def dump(self) -> str: ...                                  # AST debug aid

flags is a bitwise OR of the module-level constants: IGNORECASE, MULTILINE, DOTALL, UNICODE, ASCII, VERBOSE, FULLCASE, WORD, LOCALE, VERSION0, VERSION1. parse_flags("ims") converts a string of flag letters into the same kind of bitset, which will feel familiar to anyone used to re's inline flag letters.
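When flag names come from configuration rather than source code, they can be combined by name. The helper below is a sketch, and combine_flags is a hypothetical name that is not part of eregex. It works on any module exposing integer flag constants, so stdlib re stands in here to keep the example runnable without eregex installed.

```python
import operator
import re  # stand-in module; it also exposes integer flag constants
from functools import reduce

def combine_flags(module, names):
    """OR together the flag constants called `names` on `module`."""
    return reduce(operator.or_, (int(getattr(module, n)) for n in names), 0)

flags = combine_flags(re, ["IGNORECASE", "MULTILINE"])
print(flags == int(re.IGNORECASE | re.MULTILINE))  # True
```

Assuming the eregex constants are plain integers, as the flags: int annotation suggests, combine_flags(eregex, ["IGNORECASE", "MULTILINE"]) should yield a value that can be passed to Regex(pattern, flags).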

Match

Match is sequence-like: len(m) is the number of groups (group 0 first), and m[i] / m["name"] look up by index / name.

class Match:
    @property
    def matched(self) -> str: ...               # whole match (group 0)
    @property
    def group0(self) -> str: ...                # alias of matched
    @property
    def input(self) -> str: ...
    @property
    def start(self) -> int: ...                 # byte offset
    @property
    def end(self) -> int: ...                   # byte offset
    @property
    def span(self) -> tuple[int, int]: ...
    @property
    def capture_count(self) -> int: ...
    @property
    def groups(self) -> list[str | None]: ...
    @property
    def named_groups(self) -> dict[str, str]: ...
    @property
    def all_captures(self) -> list[list[str | None]]: ...
    @property
    def captures_dict(self) -> dict[str, list[str | None]]: ...

    def group(self, *indices_or_names): ...     # re.Match.group semantics
    def captures(self, index: int) -> list[str | None]: ...
    def captures_by_name(self, name: str) -> list[str | None]: ...
    def span_of(self, index: int = 0) -> tuple[int, int] | None: ...
    def start_of(self, index: int = 0) -> int: ...
    def end_of(self, index: int = 0) -> int: ...

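All offsets are byte offsets into the UTF-8 encoding of the haystack, matching the Rust core. Python's re reports str (code point) indices instead, so the two coincide only when the text before the offset is pure ASCII. None is returned for groups that did not participate.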
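Because the offsets are UTF-8 byte offsets, they cannot be used directly to slice a Python str once the haystack contains non-ASCII characters. A minimal conversion sketch follows. The helper name byte_to_char_index is hypothetical and not part of eregex, and the byte offsets in the example are computed by hand rather than returned by a real match.

```python
def byte_to_char_index(text: str, byte_offset: int) -> int:
    """Translate a UTF-8 byte offset into a str index (code points).

    Raises UnicodeDecodeError if the offset falls inside a multi-byte
    character.
    """
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))

haystack = "héllo wörld"
# "wörld" starts at byte 7 because 'é' takes two bytes, but at str index 6.
start_byte, end_byte = 7, 13
start = byte_to_char_index(haystack, start_byte)
end = byte_to_char_index(haystack, end_byte)
print(start, end)           # 6 11
print(haystack[start:end])  # wörld
```

Each call re-encodes the haystack, which is fine for occasional lookups. For many offsets on one long haystack, building a byte-to-index table once would be cheaper.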

Partial matching

find_partial is an end-anchored search: it asks whether the haystack, taken up to its end, could be the start of a full match. Use it when validating input as the user types, parsing an incomplete stream, or asking "could more input turn this into a match?"

It returns one of three outcomes:

Result                   Meaning
PartialMatch (partial)   a valid prefix so far; more input could complete it
PartialMatch (full)      the input already fully matches (and consumes it to its end)
None                     a hard mismatch: no possible continuation could match
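For the validate-as-you-type use case, the three outcomes can be folded into a single label. The sketch below is duck-typed: it relies only on find_partial and is_full as documented in this README, and the function name classify_input is hypothetical. Since the incremental example in this README shows None for an empty haystack as well, the empty string is treated as "not started yet" rather than as a mismatch.

```python
def classify_input(regex, text: str) -> str:
    """Return 'complete', 'incomplete', or 'invalid' for `text`.

    `regex` is any object with a find_partial(text) method that returns
    None or an object with an `is_full` attribute.
    """
    result = regex.find_partial(text)
    if result is None:
        # Empty input has not started a match yet; anything else is a
        # hard mismatch that no further typing can repair.
        return "incomplete" if text == "" else "invalid"
    return "complete" if result.is_full else "incomplete"
```

With eregex installed, a call such as classify_input(eregex.Regex(r"abc"), "ab") would be expected to return 'incomplete', assuming find_partial behaves as described here.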

Each capturing group in a partial match is itself in one of three states, reported by group_state(i): "matched" (fully matched), "partial" (entered but not yet completed), or "none" (never participated — group(i) is None).

class PartialMatch:
    @property
    def status(self) -> str: ...                # "full" | "partial"
    @property
    def is_full(self) -> bool: ...
    @property
    def is_partial(self) -> bool: ...
    @property
    def matched(self) -> str: ...
    @property
    def start(self) -> int: ...                 # byte offset where the match starts
    @property
    def end(self) -> int: ...                   # byte offset of the input end (always len(haystack))
    @property
    def capture_count(self) -> int: ...

    def group(self, index: int = 0) -> str | None: ...
    def named_group(self, name: str) -> str | None: ...
    def group_state(self, index: int = 0) -> str: ...   # "matched" | "partial" | "none"

Incremental typing graduates partial → full → None:

re = eregex.Regex(r"abc")
re.find_partial("")      # None      (nothing started yet)
re.find_partial("a")     # partial   .status == "partial"
re.find_partial("ab")    # partial
re.find_partial("abc")   # full      .is_full
re.find_partial("abcd")  # None      ('d' rules out any continuation)

Group states as a match fills in. With token=([a-z]+)([0-9]+)([A-Z]+):

re = eregex.Regex(r"token=([a-z]+)([0-9]+)([A-Z]+)")
p = re.find_partial("x token=abc")

p.is_partial           # True
p.matched              # 'token=abc'
p.start                # 2    (byte offset of the match)
p.end                  # 11   (end of input — always, since end-anchored)
p.capture_count        # 3

p.group(1)             # 'abc'   p.group_state(1) # 'matched'
p.group(2)             # ''      p.group_state(2) # 'partial'  (entered, empty so far)
p.group(3)             # None    p.group_state(3) # 'none'     (never entered)

re.find_partial("token=abc123XYZ")  # group 3 -> 'matched', status 'full'
re.find_partial("x token=abc!")     # None   ('!' rules out any continuation)

Named groups work the same way:

re = eregex.Regex(r"token=(?P<word>[a-z]+)(?P<num>[0-9]+)")
p = re.find_partial("token=ab")
p.named_group("word")  # 'ab'   (matched)
p.named_group("num")   # ''     (partial — empty so far)
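The per-group states shown above can drive a "what is still missing" hint in an input form. The sketch below assumes only the capture_count property and the group_state(i) method documented in this README, and that capture_count excludes group 0, as the three-group example suggests. Both function names are hypothetical.

```python
def group_progress(partial) -> dict:
    """Map each capturing group index (1..capture_count) to its state."""
    return {
        i: partial.group_state(i)
        for i in range(1, partial.capture_count + 1)
    }

def first_unfinished_group(partial):
    """Return the index of the first group that is not 'matched', or None."""
    for index, state in group_progress(partial).items():
        if state != "matched":
            return index
    return None
```

For the token= example above, where the states are 'matched', 'partial' and 'none', first_unfinished_group would point at group 2, the digits the user still has to type.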

Module-level helpers

escape(s: str) -> str
escape_special_only(s: str) -> str
escape_literal_spaces(s: str) -> str
is_match(pattern: str, haystack: str) -> bool       # compiles pattern once
compile(pattern: str, flags: int = 0) -> Regex
parse_flags(flag_str: str) -> int

Testing

. .venv/bin/activate
maturin develop --release
python -m unittest test_eregex -v
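The contents of test_eregex are not shown here, so the following is only a sketch of what a small smoke test in the same unittest style could look like. The class name QuickStartTest is hypothetical, and the expected values are copied from the Quick start section. The test case is skipped when eregex is not importable, so the file can be loaded in any environment.

```python
import importlib.util
import unittest

# Skip rather than fail when the extension has not been built yet.
HAVE_EREGEX = importlib.util.find_spec("eregex") is not None

@unittest.skipUnless(HAVE_EREGEX, "eregex is not installed")
class QuickStartTest(unittest.TestCase):
    def test_groups(self):
        import eregex
        m = eregex.Regex(r"(\w+)\s+(\w+)").find("hello world")
        self.assertEqual(m.group(1), "hello")
        self.assertEqual(m.group(2), "world")

    def test_replace_all_with_named_groups(self):
        import eregex
        regex = eregex.Regex(r"(?P<a>\d)(?P<b>\d)")
        self.assertEqual(regex.replace_all("12 34", "${b}${a}"), "21 43")
```

Saved as a module next to the existing tests, it can be run with python -m unittest in the same way as the command above.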

Layout

This is one half of eregex's binding story. The same Rust core (eregex) also ships Node.js bindings via napi-rs. See the project root for the core crate and its feature matrix.

License

Apache-2.0, matching the upstream mrab-regex project.

Built distributions

Release 0.1.4 provides the following wheels; no source distribution is available for this release.

  • eregex-0.1.4-cp39-abi3-win_amd64.whl (230.4 kB): CPython 3.9+, Windows x86-64
  • eregex-0.1.4-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (351.0 kB): CPython 3.9+, manylinux (glibc 2.17+), x86-64
  • eregex-0.1.4-cp39-abi3-macosx_11_0_arm64.whl (323.8 kB): CPython 3.9+, macOS 11.0+, ARM64
