Regex-Toolkit

Regex-Toolkit: Effortlessly craft efficient RE and RE2 expressions with user-friendly tools.

Requirements

Regex-Toolkit requires Python 3.9 or higher, is platform independent, and has no external dependencies.

Issue reporting

If you discover an issue with Regex-Toolkit, please report it at https://github.com/Phosmic/regex-toolkit/issues.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.


Installing

Most stable version from PyPI:

pip install regex-toolkit

Development version from GitHub:

git clone https://github.com/Phosmic/regex-toolkit.git
cd regex-toolkit
pip install .

Usage

Import packages:

import re
# and/or
import re2

# Functions can also be imported directly from regex_toolkit if desired
import regex_toolkit as rtk

Library

iter_sort_by_len

Function to iterate strings sorted by length.

Function Signature
iter_sort_by_len(texts, *, reverse=False)
Parameters
texts(Iterable[str]) Strings to sort.
reverse(bool) Sort in descending order (longest to shortest).

Example (ascending shortest to longest):

words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words):
    print(word)

Output:

short
longer
longest

Example reversed (descending longest to shortest):

words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words, reverse=True):
    print(word)

Output:

longest
longer
short

sort_by_len

Function to get a tuple of strings sorted by length.

Function Signature
sort_by_len(texts, *, reverse=False)
Parameters
texts(Iterable[str]) Strings to sort.
reverse(bool) Sort in descending order (longest to shortest).

Example (ascending shortest to longest):

rtk.sort_by_len(["longest", "short", "longer"])

Result:

('short', 'longer', 'longest')

Example reversed (descending longest to shortest):

rtk.sort_by_len(["longest", "short", "longer"], reverse=True)

Result:

('longest', 'longer', 'short')
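
The sorting shown above can be approximated with the standard library alone. This is a minimal sketch, not the library's implementation: `sort_by_len_sketch` is a hypothetical name, and the handling of equal-length strings (here they keep their input order) is an assumption.

```python
from collections.abc import Iterable


def sort_by_len_sketch(texts: Iterable[str], *, reverse: bool = False) -> tuple[str, ...]:
    """Return the strings as a tuple ordered by length (ties keep input order)."""
    return tuple(sorted(texts, key=len, reverse=reverse))


ascending = sort_by_len_sketch(["longest", "short", "longer"])
descending = sort_by_len_sketch(["longest", "short", "longer"], reverse=True)
```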

ord_to_codepoint

Function to get a character codepoint from a character ordinal.

Function Signature
ord_to_codepoint(ordinal)
Parameters
ordinal(int) Character ordinal.

Example:

# ordinal: 127344
ordinal = ord("🅰")
rtk.ord_to_codepoint(ordinal)

Result:

'0001f170'

codepoint_to_ord

Function to get a character ordinal from a character codepoint.

Function Signature
codepoint_to_ord(codepoint)
Parameters
codepoint(str) Character codepoint.

Example:

# char: "🅰"
codepoint = "0001f170"
rtk.codepoint_to_ord(codepoint)

Result:

127344
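
The relationship between the ordinal and the codepoint in these examples can be reproduced with built-ins. The sketch below assumes the codepoint is simply the ordinal written as eight zero-padded lowercase hexadecimal digits, which matches the values shown above; it does not call the library.

```python
ordinal = ord("\U0001F170")  # the character 🅰

# Assumption: the codepoint is the ordinal as 8 lowercase hex digits, zero-padded.
codepoint = format(ordinal, "08x")

# Going back is a base-16 parse of the codepoint string.
round_trip = int(codepoint, 16)
```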

char_to_codepoint

Function to get a character codepoint from a character.

Function Signature
char_to_codepoint(char)
Parameters
char(str) Character.

Example:

rtk.char_to_codepoint("🅰")

Result:

'0001f170'

char_as_exp

Function to create a RE expression that exactly matches a character.

Function Signature
char_as_exp(char)
Parameters
char(str) Character to match.

Example:

rtk.char_as_exp("🅰")

Result:

r'\🅰'

char_as_exp2

Function to create a RE2 expression that exactly matches a character.

Function Signature
char_as_exp2(char)
Parameters
char(str) Character to match.

Example:

rtk.char_as_exp2("🅰")

Result:

r'\x{0001f170}'

string_as_exp

Function to create a RE expression that exactly matches a string.

Function Signature
string_as_exp(text)
Parameters
text(str) String to match.

Example:

rtk.string_as_exp("🅰🅱🅲")

Result:

r'\🅰\🅱\🅲'
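
To see that an expression of this shape matches only the original string, it can be checked with the standard `re` module. This sketch rebuilds the pattern by hand under the assumption that every character is prefixed with a backslash, as in the result above; the library may treat other characters differently.

```python
import re

text = "\U0001F170\U0001F171\U0001F172"  # "🅰🅱🅲"

# Assumption: each character is prefixed with a backslash, as in the result above.
pattern = "".join("\\" + char for char in text)

# The pattern matches the full string but not a shorter prefix of it.
matches_whole = re.fullmatch(pattern, text) is not None
matches_prefix = re.fullmatch(pattern, text[:2]) is not None
```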

string_as_exp2

Function to create a RE2 expression that exactly matches a string.

Function Signature
string_as_exp2(text)
Parameters
text(str) String to match.

Example:

rtk.string_as_exp2("🅰🅱🅲")

Result:

r'\x{0001f170}\x{0001f171}\x{0001f172}'
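
The `\x{...}` form above is a hexadecimal escape of each character's codepoint. A rough sketch of how such an expression could be assembled follows; the `*_sketch` names are hypothetical, and escaping every character this way is an assumption made for illustration (the `strings_as_exp2` example in this document shows plain ASCII letters left unescaped).

```python
def char_as_exp2_sketch(char: str) -> str:
    """Spell one character as an 8-digit hex escape in braces."""
    return "\\x{" + format(ord(char), "08x") + "}"


def string_as_exp2_sketch(text: str) -> str:
    """Concatenate the per-character escapes in order."""
    return "".join(char_as_exp2_sketch(char) for char in text)


expression = string_as_exp2_sketch("\U0001F170\U0001F171\U0001F172")
```

Note that Python's built-in `re` module does not accept the `\x{...}` syntax, so expressions in this form are intended for RE2.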

strings_as_exp

Function to create a RE expression that exactly matches any one string.

Function Signature
strings_as_exp(texts)
Parameters
texts(Iterable[str]) Strings to match.

Example:

rtk.strings_as_exp([
    "bad.word",
    "another-bad-word",
])

Result:

r'another\-bad\-word|bad\.word'
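
The result above lists the longer string first. One likely reason, stated here as an assumption rather than as the library's documented rationale, is that alternation in `re` is ordered: the first alternative that matches wins, so a short string placed before a longer one that starts with it would shadow the longer one. The sketch uses only the standard library and made-up words.

```python
import re

words = ["bad", "badword"]

# Alternatives joined in input order: "bad" is tried first and wins.
naive = re.compile("|".join(re.escape(word) for word in words))

# Alternatives joined longest first: "badword" gets the first chance to match.
longest_first = sorted(words, key=len, reverse=True)
by_len = re.compile("|".join(re.escape(word) for word in longest_first))

naive_match = naive.search("a badword here").group()
by_len_match = by_len.search("a badword here").group()
```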

strings_as_exp2

Function to create a RE2 expression that exactly matches any one string.

Function Signature
strings_as_exp2(texts)
Parameters
texts(Iterable[str]) Strings to match.

Example:

rtk.strings_as_exp2([
    "bad.word",
    "another-bad-word",
])

Result:

r'another\-bad\-word|bad\.word'

iter_char_range

Function to iterate all characters within a range of codepoints (inclusive).

Function Signature
iter_char_range(first_codepoint, last_codepoint)
Parameters
first_codepoint(int) Starting (first) codepoint.
last_codepoint(int) Ending (last) codepoint.

Example:

for char in rtk.iter_char_range("a", "c"):
    print(char)

Output:

a
b
c

char_range

Function to get a tuple of all characters within a range of codepoints (inclusive).

Function Signature
char_range(first_codepoint, last_codepoint)
Parameters
first_codepoint(int) Starting (first) codepoint.
last_codepoint(int) Ending (last) codepoint.

Example:

rtk.char_range("a", "c")

Result:

('a', 'b', 'c')
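
An inclusive character range like the one above can be sketched with `ord` and `chr`. The function name is hypothetical, and it takes single characters as arguments because that is what the example passes; the parameter names and types here are assumptions.

```python
from collections.abc import Iterator


def iter_char_range_sketch(first_char: str, last_char: str) -> Iterator[str]:
    """Yield every character from first_char to last_char, both included."""
    # range() excludes its stop value, so add 1 to include the last character.
    for ordinal in range(ord(first_char), ord(last_char) + 1):
        yield chr(ordinal)


chars = tuple(iter_char_range_sketch("a", "c"))
```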

mask_span

Slice and mask a string using a span.

Function Signature
mask_span(text, span, mask=None)
Parameters
text(str) Text to slice.
span(list[int] | tuple[int, int]) Domain of index positions (start, end) to mask.
mask(str | None) Mask to insert after slicing.

Example:

rtk.mask_span(
    "This is an example",
    (8, 8),
    mask="not ",
)

Result:

'This is not an example'
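
The example above behaves like ordinary string slicing: everything before `start` is kept, the mask is inserted, and everything from `end` onward is kept. A minimal sketch under that assumption (the function name is hypothetical, and treating a missing mask as an empty string is a guess):

```python
from typing import Optional


def mask_span_sketch(text: str, span: tuple[int, int], mask: Optional[str] = None) -> str:
    """Replace text[start:end] with the mask (or with nothing if no mask is given)."""
    start, end = span
    return text[:start] + (mask or "") + text[end:]


# An empty span (start == end) inserts the mask without removing anything.
inserted = mask_span_sketch("This is an example", (8, 8), mask="not ")

# A non-empty span with no mask removes the covered characters.
removed = mask_span_sketch("This is an example", (8, 11))
```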

mask_spans

Slice and mask a string using multiple spans.

Function Signature
mask_spans(text, spans, masks=None)
Parameters
text(str) Text to slice.
spans(Iterable[list[int] | tuple[int, int]]) Domains of index positions (start, end) to mask from the text.
masks(Iterable[str] | None) Masks to insert when slicing.

Example:

rtk.mask_spans(
    "This is an example",
    spans=[
        (9, 10),
        (11, 18),
    ],
    masks=[
        " good",
        "sample",
    ],
)
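
No result is shown for this example. The sketch below works through what the call would produce if each span is applied like a single `mask_span` slice and each mask is paired with the span at the same position; both points are assumptions, and the function name is hypothetical. Spans are applied from right to left so that earlier index positions stay valid after each replacement.

```python
def mask_spans_sketch(text, spans, masks=None):
    """Replace each (start, end) slice of text with its paired mask."""
    spans = list(spans)
    masks = list(masks) if masks is not None else [""] * len(spans)
    # Work from the rightmost span backwards so indexes to the left are unaffected.
    pairs = sorted(zip(spans, masks), key=lambda pair: pair[0][0], reverse=True)
    for (start, end), mask in pairs:
        text = text[:start] + mask + text[end:]
    return text


masked = mask_spans_sketch(
    "This is an example",
    spans=[(9, 10), (11, 18)],
    masks=[" good", "sample"],
)
```

Under these assumptions `masked` is 'This is a good sample'.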

to_utf8

Encode a Unicode string as UTF-8.

Function Signature
to_utf8(text)
Parameters
text(str) Text to encode.

to_nfc

Normalize a Unicode string to NFC (Normalization Form C).

Function Signature
to_nfc(text)
Parameters
text(str) Text to normalize.
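
Normalization matters for matching because the same visible character can be stored as different codepoint sequences. The sketch below uses the standard library's `unicodedata.normalize` to show the effect of NFC; whether `to_nfc` is implemented this way is an assumption.

```python
import unicodedata

decomposed = "e\u0301"  # "e" followed by a combining acute accent
composed = "\u00e9"     # the single precomposed character "é"

# The two strings render the same but are not equal until normalized.
equal_before = decomposed == composed
normalized = unicodedata.normalize("NFC", decomposed)
equal_after = normalized == composed
```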

Download files

Source Distribution

regex_toolkit-0.0.2b2.tar.gz (22.8 kB)

Built Distribution

regex_toolkit-0.0.2b2-py3-none-any.whl (21.0 kB)
