Effortlessly craft efficient RE and RE2 expressions with user-friendly tools.

Regex-Toolkit

Regex-Toolkit: Effortlessly craft efficient RE and RE2 expressions with user-friendly tools.

Requirements

Regex-Toolkit requires Python 3.9 or higher, is platform independent, and has no outside dependencies.

Issue reporting

If you discover an issue with Regex-Toolkit, please report it at https://github.com/Phosmic/regex-toolkit/issues.

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.


Installing

Latest stable version from PyPI:

pip install regex-toolkit

Development version from GitHub:

git clone https://github.com/Phosmic/regex-toolkit.git
cd regex-toolkit
pip install .

Usage

Import packages:

import re
# and/or
import re2
# Can import directly if desired
import regex_toolkit as rtk

Library

iter_sort_by_len

Function to iterate strings sorted by length.

Function Signature
iter_sort_by_len(texts, *, reverse=False)
Parameters
texts(Iterable[str]) Strings to sort.
reverse(bool) Sort in descending order (longest to shortest).

Example (ascending shortest to longest):

words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words):
    print(word)

Output:

short
longer
longest

Example reversed (descending longest to shortest):

words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words, reverse=True):
    print(word)

Output:

longest
longer
short

sort_by_len

Function to get a tuple of strings sorted by length.

Function Signature
sort_by_len(texts, *, reverse=False)
Parameters
texts(Iterable[str]) Strings to sort.
reverse(bool) Sort in descending order (longest to shortest).

Example (ascending shortest to longest):

rtk.sort_by_len(["longest", "short", "longer"])

Result:

('short', 'longer', 'longest')

Example reversed (descending longest to shortest):

rtk.sort_by_len(["longest", "short", "longer"], reverse=True)

Result:

('longest', 'longer', 'short')
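One reason to order strings by length before joining them into an alternation: Python's `re` tries alternatives left to right, so a shorter string listed first can shadow a longer one that shares its prefix. The following is a minimal sketch using only the standard library; the `build_alternation` helper is hypothetical and is not part of Regex-Toolkit.

```python
import re


def build_alternation(texts, *, longest_first=True):
    """Hypothetical helper: escape each string and join with '|'."""
    ordered = sorted(texts, key=len, reverse=longest_first)
    return "|".join(re.escape(text) for text in ordered)


words = ["long", "longer", "longest"]

shortest_first = re.compile(build_alternation(words, longest_first=False))
longest_first = re.compile(build_alternation(words, longest_first=True))

# With the shortest alternative first, the match stops at "long".
short_hit = shortest_first.match("longest").group()
# With the longest alternative first, the whole word is matched.
long_hit = longest_first.match("longest").group()

print(short_hit, long_hit)
```

Sorting in descending order (`reverse=True`) is therefore usually the safer choice when the sorted strings feed an alternation.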

ord_to_codepoint

Function to get a character codepoint from a character ordinal.

Function Signature
ord_to_codepoint(ordinal)
Parameters
ordinal(int) Character ordinal.

Example:

# ordinal: 127344
ordinal = ord("🅰")
rtk.ord_to_codepoint(ordinal)

Result:

'0001f170'

codepoint_to_ord

Function to get a character ordinal from a character codepoint.

Function Signature
codepoint_to_ord(codepoint)
Parameters
codepoint(str) Character codepoint.

Example:

# char: "🅰"
codepoint = "0001f170"
rtk.codepoint_to_ord(codepoint)

Result:

127344
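The relationship between an ordinal and the codepoint strings shown above can be reproduced with the standard library alone. This is a sketch under the assumption that a codepoint is the ordinal written as zero-padded, 8-digit lowercase hexadecimal (which is what the examples show); `ordinal_to_hex` and `hex_to_ordinal` are hypothetical names, not Regex-Toolkit functions.

```python
def ordinal_to_hex(ordinal: int) -> str:
    """Hypothetical stand-in: ordinal as 8-digit lowercase hex."""
    return format(ordinal, "08x")


def hex_to_ordinal(codepoint: str) -> int:
    """Hypothetical stand-in: parse a hex codepoint string back to an int."""
    return int(codepoint, 16)


ordinal = ord("🅰")
codepoint = ordinal_to_hex(ordinal)
round_trip = chr(hex_to_ordinal(codepoint))

print(ordinal, codepoint, round_trip)
```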

char_to_codepoint

Function to get a character codepoint from a character.

Function Signature
char_to_codepoint(char)
Parameters
char(str) Character.

Example:

rtk.char_to_codepoint("🅰")

Result:

'0001f170'

char_as_exp

Function to create a RE expression that exactly matches a character.

Function Signature
char_as_exp(char)
Parameters
char(str) Character to match.

Example:

rtk.char_as_exp("🅰")

Result:

r'\🅰'

char_as_exp2

Function to create a RE2 expression that exactly matches a character.

Function Signature
char_as_exp2(char)
Parameters
char(str) Character to match.

Example:

rtk.char_as_exp2("🅰")

Result:

r'\x{0001f170}'

string_as_exp

Function to create a RE expression that exactly matches a string.

Function Signature
string_as_exp(text)
Parameters
text(str) String to match.

Example:

rtk.string_as_exp("🅰🅱🅲")

Result:

r'\🅰\🅱\🅲'

string_as_exp2

Function to create a RE2 expression that exactly matches a string.

Function Signature
string_as_exp2(text)
Parameters
text(str) String to match.

Example:

rtk.string_as_exp2("🅰🅱🅲")

Result:

r'\x{0001f170}\x{0001f171}\x{0001f172}'
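To show where each form of expression applies, the sketch below feeds the documented results to Python's built-in `re` module. The pattern strings are copied from the examples above; nothing here calls Regex-Toolkit itself. The backslash-escaped form compiles with `re`, while the braced `\x{...}` form is rejected by `re`, which is why it should be kept for RE2.

```python
import re

# Pattern copied from the string_as_exp example result.
pattern = re.compile(r"\🅰\🅱\🅲")
matched = pattern.fullmatch("🅰🅱🅲") is not None
rejected = pattern.fullmatch("🅰🅱") is None

# The built-in `re` module expects exactly two hex digits after \x,
# so the braced form from string_as_exp2 fails to compile here.
try:
    re.compile(r"\x{0001f170}")
    re_accepts_braced_hex = True
except re.error:
    re_accepts_braced_hex = False

print(matched, rejected, re_accepts_braced_hex)
```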

strings_as_exp

Function to create a RE expression that exactly matches any one string.

Function Signature
strings_as_exp(texts)
Parameters
texts(Iterable[str]) Strings to match.

Example:

rtk.strings_as_exp([
    "bad.word",
    "another-bad-word",
])

Result:

r'another\-bad\-word|bad\.word'
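As a usage sketch, the result above can be compiled with the built-in `re` module. The pattern string is copied from the example result and the sample text is made up for illustration. Because the `.` is escaped, it matches only a literal period.

```python
import re

# Pattern copied from the strings_as_exp example result.
pattern = re.compile(r"another\-bad\-word|bad\.word")

# Illustrative input: "badXword" must not match, since "." is escaped.
sample = "a bad.word and another-bad-word, but not badXword"
hits = pattern.findall(sample)

print(hits)
```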

strings_as_exp2

Function to create a RE2 expression that exactly matches any one string.

Function Signature
strings_as_exp2(texts)
Parameters
texts(Iterable[str]) Strings to match.

Example:

rtk.strings_as_exp2([
    "bad.word",
    "another-bad-word",
])

Result:

r'another\-bad\-word|bad\.word'

iter_char_range

Function to iterate all characters within a range of codepoints (inclusive).

Function Signature
iter_char_range(first_codepoint, last_codepoint)
Parameters
first_codepoint(int) Starting (first) codepoint.
last_codepoint(int) Ending (last) codepoint.

Example:

for char in rtk.iter_char_range("a", "c"):
    print(char)

Output:

a
b
c

char_range

Function to get a tuple of all characters within a range of codepoints (inclusive).

Function Signature
char_range(first_codepoint, last_codepoint)
Parameters
first_codepoint(int) Starting (first) codepoint.
last_codepoint(int) Ending (last) codepoint.

Example:

rtk.char_range("a", "c")

Result:

('a', 'b', 'c')
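An inclusive character range like the one above can be sketched with `ord` and `chr`. This is not the library's implementation; `iter_chars_between` is a hypothetical name, and the sketch assumes both bounds are passed as single characters, as in the examples.

```python
from collections.abc import Iterator


def iter_chars_between(first_char: str, last_char: str) -> Iterator[str]:
    """Hypothetical sketch: yield every character from first to last, inclusive."""
    # range() excludes its upper bound, so add 1 to include last_char.
    for ordinal in range(ord(first_char), ord(last_char) + 1):
        yield chr(ordinal)


letters = tuple(iter_chars_between("a", "c"))
print(letters)
```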

mask_span

Slice and mask a string using a span.

Function Signature
mask_span(text, span, mask=None)
Parameters
text(str) Text to slice.
span(list[int] | tuple[int, int]) Domain of index positions (start, end) to mask.
mask(str | None) Mask to insert after slicing.

Example:

rtk.mask_span(
    "This is an example",
    (8, 8),
    mask="not ",
)

Result:

'This is not an example'
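The slicing behind this result can be sketched in plain Python. The sketch assumes the span is a half-open `(start, end)` pair, as with Python slices, and that omitting the mask simply removes the span; `mask_span_sketch` is a hypothetical stand-in, not the library function.

```python
def mask_span_sketch(text, span, mask=None):
    """Hypothetical sketch: replace text[start:end] with the mask."""
    start, end = span
    # An empty span (start == end) inserts the mask without removing anything.
    return text[:start] + (mask or "") + text[end:]


inserted = mask_span_sketch("This is an example", (8, 8), mask="not ")
removed = mask_span_sketch("This is an example", (8, 11))

print(inserted)
print(removed)
```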

mask_spans

Slice and mask a string using multiple spans.

Function Signature
mask_spans(text, spans, masks=None)
Parameters
text(str) Text to slice.
spans(Iterable[list[int] | tuple[int, int]]) Domains of index positions (start, end) to mask from the text.
masks(Iterable[str] | None) Masks to insert when slicing.

Example:

rtk.mask_spans(
    "This is an example",
    spans=[
        (9, 10),
        (11, 18),
    ],
    masks=[
        " good",
        "sample",
    ],
)
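When several spans are masked in one string, each replacement can shift the positions of the spans after it. One common way to avoid this is to apply the spans from right to left. The sketch below illustrates that idea under two assumptions: spans are half-open `(start, end)` pairs that do not overlap, and each mask pairs with the span at the same position. `mask_spans_sketch` is hypothetical and may differ from how Regex-Toolkit behaves.

```python
def mask_spans_sketch(text, spans, masks=None):
    """Hypothetical sketch: mask several non-overlapping spans in one pass."""
    spans = list(spans)
    masks = [""] * len(spans) if masks is None else list(masks)
    # Work from the rightmost span so earlier indexes stay valid.
    pairs = sorted(zip(spans, masks), key=lambda pair: pair[0][0], reverse=True)
    for (start, end), mask in pairs:
        text = text[:start] + mask + text[end:]
    return text


masked = mask_spans_sketch(
    "This is an example",
    spans=[(9, 10), (11, 18)],
    masks=[" good", "sample"],
)
print(masked)
```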

to_utf8

Encode a Unicode string to UTF-8 form.

Function Signature
to_utf8(text)
Parameters
text(str) Text to encode.

to_nfc

Normalize a Unicode string to NFC (Normalization Form C).

Function Signature
to_nfc(text)
Parameters
text(str) Text to normalize.
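For context on what NFC normalization and UTF-8 encoding do, here is a short sketch with the standard library's `unicodedata` module and `str.encode`. It does not call `to_nfc` or `to_utf8`, and the return types of those functions are not assumed here.

```python
import unicodedata

# "e" followed by a combining acute accent: two code points, one visible glyph.
decomposed = "e\u0301"

# NFC composes the pair into the single precomposed character U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# UTF-8 encodes U+00E9 as two bytes.
encoded = composed.encode("utf-8")

print(len(decomposed), len(composed), encoded)
```

Normalizing before building an expression helps ensure that visually identical strings produce the same pattern.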
