
Python bindings for eregex, an advanced regular expression engine inspired by mrab-regex

eregex (Python bindings)

Python bindings for eregex — an advanced regular expression engine for Rust inspired by mrab-regex (the Python regex module).

This package exposes eregex's full API to Python via PyO3 and ships as a wheel built with maturin. All matching logic runs in compiled Rust; the Python layer is a thin adapter.

Features

  • Named groups, duplicate group names, repeated captures
  • Greedy / lazy / possessive quantifiers, atomic groups (?>...)
  • Variable-length lookbehind / lookahead
  • Inline scoped flags (?i), (?i-m:...)
  • Backreferences \1, \g<name>, (?P=name)
  • Partial / end-anchored matching (find_partial)
  • find, match_at_start (Python re.match), fullmatch (re.fullmatch)
  • replace, replace_all with $1 / ${name} / $$ templates
  • split, escape, and more
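The $1 / ${name} / $$ templates differ from the \1 / \g<name> syntax that Python's re.sub uses. The sketch below shows the correspondence and lets the same template be reused with the stdlib. dollar_to_re_template is a hypothetical helper and is not part of eregex. In the Rust regex crate, $1a is read as a group named "1a". eregex's rule for ambiguous cases like that is not documented here, so reading it as group 1 followed by "a" is an assumption of this sketch.

```python
import re

# $$ -> literal "$", $<digits> -> numbered group, ${name} -> named or numbered group.
_DOLLAR_TOKEN = re.compile(r"\$(?:(\$)|(\d+)|\{(\w+)\})")


def dollar_to_re_template(template: str) -> str:
    """Translate a $1 / ${name} / $$ template into re.sub's \\g<...> syntax."""

    def convert(token: re.Match) -> str:
        if token.group(1):             # "$$" escapes a literal dollar sign
            return "$"
        ref = token.group(2) or token.group(3)
        return "\\g<" + ref + ">"      # \g<...> avoids ambiguity such as \10

    # Double existing backslashes first so re.sub keeps them literal.
    return _DOLLAR_TOKEN.sub(convert, template.replace("\\", "\\\\"))


template = dollar_to_re_template("${b}${a}")             # r"\g<b>\g<a>"
result = re.sub(r"(?P<a>\d)(?P<b>\d)", template, "12 34")  # '21 43'
```

Because the callback's return value is inserted verbatim by re.sub, the generated \g<...> references are not themselves re-expanded during conversion.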

Installation

The wheel is built from the Rust core:

cd crates/eregex-python
python -m venv .venv
. .venv/bin/activate          # or .venv\Scripts\activate on Windows
pip install maturin
maturin develop --release     # editable install into the current venv
# or: maturin build --release && pip install target/wheels/eregex-*.whl

maturin develop builds the extension and installs it into the active virtual environment, so import eregex works from there (the extension module is named eregex).

Quick start

import eregex

re = eregex.Regex(r"(\w+)\s+(\w+)")
m = re.find("hello world")
m.matched        # 'hello world'
m.group(1)       # 'hello'
m.group(2)       # 'world'
m[1]             # 'hello'  (Match is sequence-like)

# Flags: pass a bitset of the module-level constants, or parse a string.
eregex.Regex("hello", eregex.IGNORECASE).is_match("HELLO")  # True
eregex.Regex("hello", eregex.parse_flags("i")).is_match("HELLO")  # True

# Repeated captures (signature mrab-regex feature).
eregex.Regex(r"(\w)+").find("abc").captures(1)  # ['a', 'b', 'c']

# Replace with named groups.
eregex.Regex(r"(?P<a>\d)(?P<b>\d)").replace_all("12 34", "${b}${a}")  # '21 43'
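For comparison, here is the stdlib equivalent of two of the lines above, runnable without eregex. Python's re keeps only the last repetition of a quantified group, so there is no counterpart to captures(1). The third-party regex module, which eregex takes as its model, does offer captures(). Named-group replacement in re is spelled \g<name> rather than ${name}.

```python
import re

# A quantified group in re remembers only its final repetition.
m = re.search(r"(\w)+", "abc")
last = m.group(1)  # 'c'; the 'a' and 'b' repetitions are not kept

# The ${b}${a} template above, written in re.sub's replacement syntax.
swapped = re.sub(r"(?P<a>\d)(?P<b>\d)", r"\g<b>\g<a>", "12 34")  # '21 43'
```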

Regex

class Regex:
    def __init__(self, pattern: str, flags: int = 0): ...
    @property
    def pattern(self) -> str: ...
    @property
    def flags(self) -> int: ...          # resolved (UNICODE + VERSION1 added)
    @property
    def capture_count(self) -> int: ...  # excluding group 0

    def group_names(self) -> list[str]: ...
    def group_index(self, name: str) -> int | None: ...

    def is_match(self, haystack: str) -> bool: ...
    def find(self, haystack: str) -> Match | None: ...
    def find_at(self, haystack: str, start: int) -> Match | None: ...
    def match_at_start(self, haystack: str) -> Match | None: ...  # re.match
    def fullmatch(self, haystack: str) -> Match | None: ...
    def findall(self, haystack: str) -> list[Match]: ...
    def find_partial(self, haystack: str) -> PartialMatch | None: ...

    def replace(self, haystack: str, repl: str) -> str: ...
    def replace_all(self, haystack: str, repl: str) -> str: ...
    def split(self, haystack: str) -> list[str]: ...
    def dump(self) -> str: ...                                  # AST debug aid

flags is a bitwise OR of the module-level constants: IGNORECASE, MULTILINE, DOTALL, UNICODE, ASCII, VERBOSE, FULLCASE, WORD, LOCALE, VERSION0, VERSION1. Alternatively, parse_flags("ims") converts a string of flag letters into the same kind of bitset, which will feel familiar from re's inline flags.
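The idea behind a flag-letter parser can be sketched with the stdlib's own flag constants. This is a hypothetical illustration of folding letters into a bitset, not eregex's parse_flags. eregex's accepted letters and bit values are assumptions here and may differ. For example, FULLCASE, WORD, VERSION0 and VERSION1 have no re counterpart.

```python
import re

# Hypothetical letter table built from the stdlib's flag values; eregex's
# own letters and bit values may differ.
_LETTERS = {
    "i": re.IGNORECASE,
    "m": re.MULTILINE,
    "s": re.DOTALL,
    "x": re.VERBOSE,
    "a": re.ASCII,
    "u": re.UNICODE,
}


def parse_flag_letters(letters: str) -> int:
    """Fold a flag string such as "ims" into one bitset with bitwise OR."""
    flags = 0
    for letter in letters:
        if letter not in _LETTERS:
            raise ValueError(f"unknown flag letter: {letter!r}")
        flags |= _LETTERS[letter]
    return flags


flags = parse_flag_letters("ims")
# The combined bitset is passed wherever a single flag would be.
match = re.search(r"^hello$", "first line\nHELLO", flags)
```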

Match

Match is sequence-like: len(m) is the number of groups (group 0 first), and m[i] / m["name"] look up by index / name.
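Python's own re.Match supports the same indexing by position and by name, but it has no len(). The equivalent group count there is len(m.groups()) + 1, counting group 0. A stdlib-only comparison:

```python
import re

m = re.search(r"(?P<first>\w+)\s+(\w+)", "hello world")

whole = m[0]        # 'hello world' (group 0)
first = m["first"]  # 'hello' (lookup by name)
second = m[2]       # 'world' (lookup by index)
# re.Match has no __len__, so count the groups explicitly (group 0 included).
group_count = len(m.groups()) + 1  # 3
```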

class Match:
    @property
    def matched(self) -> str: ...               # whole match (group 0)
    @property
    def group0(self) -> str: ...                # alias of matched
    @property
    def input(self) -> str: ...
    @property
    def start(self) -> int: ...                 # byte offset
    @property
    def end(self) -> int: ...
    @property
    def span(self) -> tuple[int, int]: ...
    @property
    def capture_count(self) -> int: ...
    @property
    def groups(self) -> list[str | None]: ...
    @property
    def named_groups(self) -> dict[str, str]: ...
    @property
    def all_captures(self) -> list[list[str | None]]: ...
    @property
    def captures_dict(self) -> dict[str, list[str | None]]: ...

    def group(self, *indices_or_names): ...     # re.Match.group semantics
    def captures(self, index: int) -> list[str | None]: ...
    def captures_by_name(self, name: str) -> list[str | None]: ...
    def span_of(self, index: int = 0) -> tuple[int, int] | None: ...
    def start_of(self, index: int = 0) -> int: ...
    def end_of(self, index: int = 0) -> int: ...

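All offsets are UTF-8 byte offsets, as in the Rust core. This differs from Python's re, whose Match.start() and Match.span() report str (code point) indices; the two agree only when the text before the offset is ASCII. None is returned for groups that did not participate.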
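To slice a str with these offsets, or to compare them with results from re, they must be converted. A minimal pure-Python sketch follows. The helper names are hypothetical and are not part of eregex.

```python
import re


def byte_to_char_offset(text: str, byte_offset: int) -> int:
    """Convert a UTF-8 byte offset into a str (code point) index."""
    # Decoding the byte prefix raises UnicodeDecodeError if the offset
    # falls inside a multi-byte character.
    return len(text.encode("utf-8")[:byte_offset].decode("utf-8"))


def char_to_byte_offset(text: str, char_offset: int) -> int:
    """Convert a str (code point) index into a UTF-8 byte offset."""
    return len(text[:char_offset].encode("utf-8"))


text = "héllo wörld"
# 'é' and 'ö' each take two bytes in UTF-8, so byte and str offsets diverge.
byte_span = (char_to_byte_offset(text, 6), char_to_byte_offset(text, 11))
char_span = tuple(byte_to_char_offset(text, b) for b in byte_span)
re_span = re.search(r"w\w+", text).span()  # re reports str indices
```

Each call re-encodes the string, so when converting many offsets into a long text it is cheaper to encode once and reuse the bytes.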

Partial matching

find_partial is an end-anchored search: it asks whether the haystack, taken up to its end, could be the start of a full match. Use it when validating input as the user types, parsing an incomplete stream, or asking "could more input turn this into a match?"

It returns one of three outcomes:

Result                    Meaning
PartialMatch (partial)    a valid prefix so far; more input could complete it
PartialMatch (full)       the input already matches fully, consuming it to its end
None                      a hard mismatch; no continuation could produce a match
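To make the three outcomes concrete, here is a pure-Python sketch for the simplest case: a literal pattern. It is not how eregex works, since the engine handles arbitrary patterns and reports byte offsets. It only models the end-anchored rule, under the assumption (implied by the "abcd" → None example) that only matches beginning inside the current input count. A full match must end exactly at the end of the input. A partial match is a non-empty tail of the input that is a proper prefix of the literal. literal_find_partial is a hypothetical name.

```python
from typing import Optional, Tuple


def literal_find_partial(literal: str, haystack: str) -> Optional[Tuple[str, int]]:
    """Classify an end-anchored search for a non-empty literal pattern.

    Returns ("full", start), ("partial", start), or None. Offsets here are
    str indices, not the byte offsets eregex reports.
    """
    # Full: the literal occurs and ends exactly at the end of the input.
    if literal and haystack.endswith(literal):
        return ("full", len(haystack) - len(literal))
    # Partial: a non-empty tail of the input is a proper prefix of the literal.
    # Tails at least as long as the literal were ruled out above, so only
    # starts within the last len(literal) - 1 characters matter. Scanning
    # left to right finds the longest such tail first.
    for start in range(max(0, len(haystack) - len(literal) + 1), len(haystack)):
        if literal.startswith(haystack[start:]):
            return ("partial", start)
    # None: no match starting inside the current input can reach its end.
    return None


# The same typing sequence as the "abc" example in this section.
for typed in ["", "a", "ab", "abc", "abcd"]:
    print(repr(typed), literal_find_partial("abc", typed))
```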

Each capturing group in a partial match is itself in one of three states, reported by group_state(i): "matched" (fully matched), "partial" (entered but not yet completed), or "none" (never participated — group(i) is None).

class PartialMatch:
    @property
    def status(self) -> str: ...                # "full" | "partial"
    @property
    def is_full(self) -> bool: ...
    @property
    def is_partial(self) -> bool: ...
    @property
    def matched(self) -> str: ...
    @property
    def start(self) -> int: ...                 # byte offset where the match starts
    @property
    def end(self) -> int: ...                   # byte offset of the input end: always len(haystack.encode("utf-8"))
    @property
    def capture_count(self) -> int: ...

    def group(self, index: int = 0) -> str | None: ...
    def named_group(self, name: str) -> str | None: ...
    def group_state(self, index: int = 0) -> str: ...   # "matched" | "partial" | "none"

Typing incrementally moves the result from partial to full, and then to None once extra input rules out any match:

re = eregex.Regex(r"abc")
re.find_partial("")    # None      (nothing started yet)
re.find_partial("a")   # partial   .status == "partial"
re.find_partial("ab")  # partial
re.find_partial("abc") # full      .is_full
re.find_partial("abcd") # None      ('d' rules out any continuation)

Group states change as a match fills in. With the pattern token=([a-z]+)([0-9]+)([A-Z]+):

re = eregex.Regex(r"token=([a-z]+)([0-9]+)([A-Z]+)")
p = re.find_partial("x token=abc")

p.is_partial           # True
p.matched              # 'token=abc'
p.start                # 2    (byte offset of the match)
p.end                  # 11   (end of input — always, since end-anchored)
p.capture_count        # 3

p.group(1)             # 'abc'   p.group_state(1) # 'matched'
p.group(2)             # ''      p.group_state(2) # 'partial'  (entered, empty so far)
p.group(3)             # None    p.group_state(3) # 'none'     (never entered)

re.find_partial("token=abc123XYZ")  # group 3 -> 'matched', status 'full'
re.find_partial("x token=abc!")     # None   ('!' rules out any continuation)

Named groups work the same way:

re = eregex.Regex(r"token=(?P<word>[a-z]+)(?P<num>[0-9]+)")
p = re.find_partial("token=ab")
p.named_group("word")  # 'ab'   (matched)
p.named_group("num")   # ''     (partial — empty so far)

Module-level helpers

escape(s: str) -> str
escape_special_only(s: str) -> str
escape_literal_spaces(s: str) -> str
is_match(pattern: str, haystack: str) -> bool       # compiles pattern once
compile(pattern: str, flags: int = 0) -> Regex
parse_flags(flag_str: str) -> int

Testing

. .venv/bin/activate
maturin develop --release
python -m unittest test_eregex -v

Layout

This is one half of eregex's binding story. The same Rust core (eregex) also ships Node.js bindings via napi-rs. See the project root for the core crate and its feature matrix.

License

Apache-2.0, matching the upstream mrab-regex project.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

eregex-0.1.5-cp39-abi3-win_amd64.whl (230.3 kB)

CPython 3.9+, Windows x86-64

eregex-0.1.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (350.9 kB)

CPython 3.9+, manylinux: glibc 2.17+ x86-64

eregex-0.1.5-cp39-abi3-macosx_11_0_arm64.whl (323.9 kB)

CPython 3.9+, macOS 11.0+ ARM64

File details

Details for the file eregex-0.1.5-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: eregex-0.1.5-cp39-abi3-win_amd64.whl
  • Size: 230.3 kB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for eregex-0.1.5-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 0b68461914abdceea40f97c5e55b235236710348266693bb4fe49561611f77ed
MD5 01d9f88bbc00a7c48c7e7d465a8ace6a
BLAKE2b-256 eb071e4eaa7988aefb20253db67b4196372f622c134e8d774c00c1aaeb175d88

Provenance

The following attestation bundles were made for eregex-0.1.5-cp39-abi3-win_amd64.whl:

Publisher: release.yml on a5i/eregex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file eregex-0.1.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for eregex-0.1.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 78038f459a846dcb48bb0bb10c41587fc2f11b01dd6513dbe0ebc6220c523a92
MD5 7846cc93883569c32ccc3b8ec0341587
BLAKE2b-256 c0ee0e92e0c5d351fc6b103d46a90eead81d8e4cb73743e8e0bf26e6c52f4644

Provenance

The following attestation bundles were made for eregex-0.1.5-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on a5i/eregex

File details

Details for the file eregex-0.1.5-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

  • Download URL: eregex-0.1.5-cp39-abi3-macosx_11_0_arm64.whl
  • Size: 323.9 kB
  • Tags: CPython 3.9+, macOS 11.0+ ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for eregex-0.1.5-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4b9c8368daac8052bf64f271a5f2cdab556549273d816109dd66870a8909889f
MD5 abd0cc68675b2e81104079933f02480e
BLAKE2b-256 adc3fd11573cefabd0419ede5d9f400b8a2d6eb7467a2d5116d5030942d84a8a

Provenance

The following attestation bundles were made for eregex-0.1.5-cp39-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on a5i/eregex
