Regex-Toolkit

Regex-Toolkit: Effortlessly craft efficient RE and RE2 expressions with user-friendly tools.

Requirements:

Regex-Toolkit requires Python 3.9 or higher, is platform independent, and has no external dependencies.

Issue reporting

If you discover an issue with Regex-Toolkit, please report it at https://github.com/Phosmic/regex-toolkit/issues.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.


Installing

Most stable version from PyPI:

pip install regex-toolkit

Development version from GitHub:

git clone https://github.com/Phosmic/regex-toolkit.git
cd regex-toolkit
pip install .

Usage

Import packages:

import re
# and/or
import re2
# Can import directly if desired
import regex_toolkit as rtk

Library

iter_sort_by_len

Function to iterate strings sorted by length.

Function Signature
iter_sort_by_len(texts, *, reverse=False)
Parameters
texts(Iterable[str]) Strings to sort.
reverse(bool) Sort in descending order (longest to shortest).

Example (ascending shortest to longest):

words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words):
    print(word)

Output:

short
longer
longest

Example reversed (descending longest to shortest):

words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words, reverse=True):
    print(word)

Output:

longest
longer
short

sort_by_len

Function to get a tuple of strings sorted by length.

Function Signature
sort_by_len(texts, *, reverse=False)
Parameters
texts(Iterable[str]) Strings to sort.
reverse(bool) Sort in descending order (longest to shortest).

Example (ascending shortest to longest):

rtk.sort_by_len(["longest", "short", "longer"])

Result:

('short', 'longer', 'longest')

Example reversed (descending longest to shortest):

rtk.sort_by_len(["longest", "short", "longer"], reverse=True)

Result:

('longest', 'longer', 'short')
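
For readers curious how length-based ordering can be reproduced with the standard library alone, here is a minimal sketch. It is not the library's implementation, and the helper name `sort_by_len_sketch` is hypothetical.

```python
from collections.abc import Iterable


def sort_by_len_sketch(texts: Iterable[str], *, reverse: bool = False) -> tuple[str, ...]:
    # Hypothetical stand-in: order strings by length with the built-in sorted().
    return tuple(sorted(texts, key=len, reverse=reverse))


ascending = sort_by_len_sketch(["longest", "short", "longer"])
descending = sort_by_len_sketch(["longest", "short", "longer"], reverse=True)
print(ascending)   # ('short', 'longer', 'longest')
print(descending)  # ('longest', 'longer', 'short')
```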

ord_to_codepoint

Function to get a character codepoint from a character ordinal.

Function Signature
ord_to_codepoint(ordinal)
Parameters
ordinal(int) Character ordinal.

Example:

# ordinal: 127344
ordinal = ord("🅰")
rtk.ord_to_codepoint(ordinal)

Result:

'0001f170'

codepoint_to_ord

Function to get a character ordinal from a character codepoint.

Function Signature
codepoint_to_ord(codepoint)
Parameters
codepoint(str) Character codepoint.

Example:

# char: "🅰"
codepoint = "0001f170"
rtk.codepoint_to_ord(codepoint)

Result:

127344
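
The two conversions above are inverses of each other. The sketch below shows the same round trip using only built-ins, assuming the codepoint is the ordinal written as eight lowercase hexadecimal digits (as in the results shown). The helper names are hypothetical and are not part of Regex-Toolkit.

```python
def ord_to_codepoint_sketch(ordinal: int) -> str:
    # Zero-padded, eight-digit lowercase hex, matching the format shown above.
    return format(ordinal, "08x")


def codepoint_to_ord_sketch(codepoint: str) -> int:
    # Parse the hex digits back into an integer ordinal.
    return int(codepoint, 16)


ordinal = ord("🅰")
codepoint = ord_to_codepoint_sketch(ordinal)
print(ordinal)                             # 127344
print(codepoint)                           # 0001f170
print(codepoint_to_ord_sketch(codepoint))  # 127344
```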

char_to_codepoint

Function to get a character codepoint from a character.

Function Signature
char_to_codepoint(char)
Parameters
char(str) Character.

Example:

rtk.char_to_codepoint("🅰")

Result:

'0001f170'

char_as_exp

Function to create a RE expression that exactly matches a character.

Function Signature
char_as_exp(char)
Parameters
char(str) Character to match.

Example:

rtk.char_as_exp("🅰")

Result:

r'\🅰'

char_as_exp2

Function to create a RE2 expression that exactly matches a character.

Function Signature
char_as_exp2(char)
Parameters
char(str) Character to match.

Example:

rtk.char_as_exp2("🅰")

Result:

r'\x{0001f170}'

string_as_exp

Function to create a RE expression that exactly matches a string.

Function Signature
string_as_exp(text)
Parameters
text(str) String to match.

Example:

rtk.string_as_exp("🅰🅱🅲")

Result:

r'\🅰\🅱\🅲'

string_as_exp2

Function to create a RE2 expression that exactly matches a string.

Function Signature
string_as_exp2(text)
Parameters
text(str) String to match.

Example:

rtk.string_as_exp2("🅰🅱🅲")

Result:

r'\x{0001f170}\x{0001f171}\x{0001f172}'
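
The `\x{...}` form in the result above can be built by hand from each character's ordinal. The following is a rough sketch of that construction, not the library's code: the helper name `string_as_hex_escapes` is hypothetical, and it escapes every character, which the library may not do for plain ASCII characters.

```python
def string_as_hex_escapes(text: str) -> str:
    # Emit one \x{XXXXXXXX} escape per character (eight lowercase hex digits).
    return "".join(f"\\x{{{ord(char):08x}}}" for char in text)


expression = string_as_hex_escapes("🅰🅱🅲")
print(expression)  # \x{0001f170}\x{0001f171}\x{0001f172}
```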

strings_as_exp

Function to create a RE expression that exactly matches any one string.

Function Signature
strings_as_exp(texts)
Parameters
texts(Iterable[str]) Strings to match.

Example:

rtk.strings_as_exp([
    "bad.word",
    "another-bad-word",
])

Result:

r'another\-bad\-word|bad\.word'
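
Note that the longer string comes first in the result. A likely reason is that Python's `re` tries alternatives from left to right, so a shorter string that is a prefix of a longer one would otherwise win the match. The sketch below demonstrates the effect with `re.escape`. The helper `alternation_sketch` is hypothetical and only approximates what the library does.

```python
import re
from collections.abc import Iterable


def alternation_sketch(texts: Iterable[str]) -> str:
    # Hypothetical stand-in: escape each string and join longest-first.
    ordered = sorted(texts, key=len, reverse=True)
    return "|".join(re.escape(text) for text in ordered)


longest_first = alternation_sketch(["bad", "bad.word"])
shortest_first = r"bad|bad\.word"

print(longest_first)                                      # bad\.word|bad
print(re.search(longest_first, "a bad.word").group())     # bad.word
print(re.search(shortest_first, "a bad.word").group())    # bad
```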

strings_as_exp2

Function to create a RE2 expression that exactly matches any one string.

Function Signature
strings_as_exp2(texts)
Parameters
texts(Iterable[str]) Strings to match.

Example:

rtk.strings_as_exp2([
    "bad.word",
    "another-bad-word",
])

Result:

r'another\-bad\-word|bad\.word'

iter_char_range

Function to iterate all characters within a range of codepoints (inclusive).

Function Signature
iter_char_range(first_codepoint, last_codepoint)
Parameters
first_codepoint(int) Starting (first) codepoint.
last_codepoint(int) Ending (last) codepoint.

Example:

for char in rtk.iter_char_range("a", "c"):
    print(char)

Output:

a
b
c

char_range

Function to get a tuple of all characters within a range of codepoints (inclusive).

Function Signature
char_range(first_codepoint, last_codepoint)
Parameters
first_codepoint(int) Starting (first) codepoint.
last_codepoint(int) Ending (last) codepoint.

Example:

rtk.char_range("a", "c")

Result:

('a', 'b', 'c')
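
An inclusive character range like the one above can be expressed with `ord`, `chr`, and `range`. This is a minimal sketch that takes single characters as endpoints, as the example does. The helper name `char_range_sketch` is hypothetical and is not the library's implementation.

```python
def char_range_sketch(first_char: str, last_char: str) -> tuple[str, ...]:
    # range() excludes its stop value, so add 1 to include the last character.
    return tuple(chr(o) for o in range(ord(first_char), ord(last_char) + 1))


letters = char_range_sketch("a", "c")
print(letters)  # ('a', 'b', 'c')
```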

mask_span

Slice and mask a string using a span.

Function Signature
mask_span(text, span, mask=None)
Parameters
text(str) Text to slice.
span(list[int] | tuple[int, int]) Domain of index positions (start, end) to mask.
mask(str | None) Mask to insert after slicing.

Example:

rtk.mask_span(
    "This is an example",
    (8, 8),
    mask="not ",
)

Result:

'This is not an example'
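
In the example above the span `(8, 8)` is empty, so the mask is inserted without removing any text. The slicing involved can be sketched as follows. The helper `mask_span_sketch` is hypothetical, and treating a `None` mask as plain removal is an assumption rather than documented behavior.

```python
from typing import Optional


def mask_span_sketch(text: str, span: tuple[int, int], mask: Optional[str] = None) -> str:
    start, end = span
    # Assumption: with no mask, the span is simply cut out of the text.
    replacement = "" if mask is None else mask
    return text[:start] + replacement + text[end:]


inserted = mask_span_sketch("This is an example", (8, 8), mask="not ")
removed = mask_span_sketch("This is an example", (8, 11))
print(inserted)  # This is not an example
print(removed)   # This is example
```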

mask_spans

Slice and mask a string using multiple spans.

Function Signature
mask_spans(text, spans, masks=None)
Parameters
text(str) Text to slice.
spans(Iterable[list[int] | tuple[int, int]]) Domains of index positions (start, end) to mask from the text.
masks(Iterable[str] | None) Masks to insert when slicing.

Example:

rtk.mask_spans(
    "This is an example",
    spans=[
        (9, 10),
        (11, 18),
    ],
    masks=[
        " good",
        "sample",
    ],
)
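
One way to apply several spans without earlier replacements shifting the later indices is to work from the rightmost span to the leftmost. The sketch below takes that approach and assumes the spans do not overlap. The helper `mask_spans_sketch` is hypothetical, and its output is that of the sketch, not a confirmed result from the library.

```python
from collections.abc import Iterable
from typing import Optional


def mask_spans_sketch(
    text: str,
    spans: Iterable[tuple[int, int]],
    masks: Optional[Iterable[str]] = None,
) -> str:
    spans = list(spans)
    # Assumption: with no masks given, each span is simply removed.
    masks = [""] * len(spans) if masks is None else list(masks)
    pairs = sorted(zip(spans, masks), key=lambda pair: pair[0][0], reverse=True)
    # Replace right to left so earlier indices stay valid.
    for (start, end), mask in pairs:
        text = text[:start] + mask + text[end:]
    return text


masked = mask_spans_sketch(
    "This is an example",
    spans=[(9, 10), (11, 18)],
    masks=[" good", "sample"],
)
print(masked)  # This is a good sample
```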

to_utf8

Encode a Unicode string to UTF-8 form.

Function Signature
to_utf8(text)
Parameters
text(str) Text to encode.

to_nfc

Normalize a Unicode string to NFC (Normalization Form C).

Function Signature
to_nfc(text)
Parameters
text(str) Text to normalize.
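
To illustrate why normalization matters before building expressions, the sketch below uses the standard library's `unicodedata` module directly rather than Regex-Toolkit itself. A decomposed "e" plus combining acute accent becomes a single precomposed character under NFC, which also changes its UTF-8 byte sequence.

```python
import unicodedata

decomposed = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # 2 1
print(composed == "\u00e9")            # True
print(decomposed.encode("utf-8"))      # b'e\xcc\x81'
print(composed.encode("utf-8"))        # b'\xc3\xa9'
```

Both strings render identically, yet a pattern built from one form will not match text in the other unless both are normalized first.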
