Regex-Toolkit
Regex-Toolkit: Effortlessly craft efficient RE and RE2 expressions with user-friendly tools.
Requirements:
Regex-Toolkit requires Python 3.9 or higher, is platform independent, and has no outside dependencies.
Issue reporting
If you discover an issue with Regex-Toolkit, please report it at https://github.com/Phosmic/regex-toolkit/issues.
License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
Installing
Most stable version from PyPI:
pip install regex-toolkit
Development version from GitHub:
git clone https://github.com/Phosmic/regex-toolkit.git
cd regex-toolkit
pip install .
Usage
Import packages:
import re
# and/or (RE2 bindings are a separate install)
import re2
# Import the toolkit under the rtk alias used in the examples below
import regex_toolkit as rtk
Library
iter_sort_by_len
Function to iterate strings sorted by length.
Function Signature |
---|
iter_sort_by_len(texts, *, reverse=False) |
Parameters | |
---|---|
texts(Iterable[str]) | Strings to sort. |
reverse(bool) | Sort in descending order (longest to shortest). |
Example (ascending shortest to longest):
words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words):
    print(word)
Output:
short
longer
longest
Example reversed (descending longest to shortest):
words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words, reverse=True):
    print(word)
Output:
longest
longer
short
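A minimal sketch of the ordering described above, assuming the function wraps Python's built-in sorted with key=len; iter_sort_by_len_sketch is a hypothetical name, not part of the library. Because sorted is stable, strings of equal length keep their input order, even with reverse=True.

```python
from typing import Iterable, Iterator


def iter_sort_by_len_sketch(texts: Iterable[str], *, reverse: bool = False) -> Iterator[str]:
    """Yield strings ordered by length (hypothetical stand-in for rtk.iter_sort_by_len)."""
    # sorted() is stable, so equal-length strings keep their original relative order,
    # including when reverse=True.
    yield from sorted(texts, key=len, reverse=reverse)


words = ["longest", "short", "longer"]
print(list(iter_sort_by_len_sketch(words)))                # ['short', 'longer', 'longest']
print(list(iter_sort_by_len_sketch(words, reverse=True)))  # ['longest', 'longer', 'short']
```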
sort_by_len
Function to get a tuple of strings sorted by length.
Function Signature |
---|
sort_by_len(texts, *, reverse=False) |
Parameters | |
---|---|
texts(Iterable[str]) | Strings to sort. |
reverse(bool) | Sort in descending order (longest to shortest). |
Example (ascending shortest to longest):
rtk.sort_by_len(["longest", "short", "longer"])
Result:
('short', 'longer', 'longest')
Example reversed (descending longest to shortest):
rtk.sort_by_len(["longest", "short", "longer"], reverse=True)
Result:
('longest', 'longer', 'short')
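Sorting longest first is useful when the strings become alternatives in a pattern: Python's re tries alternatives left to right and stops at the first one that matches, so a shorter prefix listed first can shadow a longer string. The snippet below uses only the standard library to show the effect; the word list is illustrative.

```python
import re

words = ["long", "longer", "longest"]
text = "longest"

# Shortest first: "long" matches before the longer alternatives are tried.
shortest_first = "|".join(sorted(words, key=len))
# Longest first: the full word wins.
longest_first = "|".join(sorted(words, key=len, reverse=True))

print(re.match(shortest_first, text).group())  # long
print(re.match(longest_first, text).group())   # longest
```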
ord_to_codepoint
Function to get a character codepoint from a character ordinal.
Function Signature |
---|
ord_to_codepoint(ordinal) |
Parameters | |
---|---|
ordinal(int) | Character ordinal. |
Example:
# ordinal: 127344
ordinal = ord("🅰")
rtk.ord_to_codepoint(ordinal)
Result:
'0001f170'
codepoint_to_ord
Function to get a character ordinal from a character codepoint.
Function Signature |
---|
codepoint_to_ord(codepoint) |
Parameters | |
---|---|
codepoint(str) | Character codepoint. |
Example:
# char: "🅰"
codepoint = "0001f170"
rtk.codepoint_to_ord(codepoint)
Result:
127344
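Judging from the examples, a codepoint here is the ordinal written as eight zero-padded lowercase hexadecimal digits. Assuming that format, both conversions reduce to string formatting and int(..., 16); the _sketch helpers below are hypothetical stand-ins, not the library's code.

```python
def ord_to_codepoint_sketch(ordinal: int) -> str:
    # Eight lowercase hex digits, zero-padded (e.g. 127344 -> "0001f170").
    return f"{ordinal:08x}"


def codepoint_to_ord_sketch(codepoint: str) -> int:
    # Parse the hex digits back into an integer ordinal.
    return int(codepoint, 16)


ordinal = ord("\U0001F170")  # the character used in the examples above
codepoint = ord_to_codepoint_sketch(ordinal)
print(ordinal, codepoint)                  # 127344 0001f170
print(codepoint_to_ord_sketch(codepoint))  # 127344
```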
char_to_codepoint
Function to get a character codepoint from a character.
Function Signature |
---|
char_to_codepoint(char) |
Parameters | |
---|---|
char(str) | Character. |
Example:
rtk.char_to_codepoint("🅰")
Result:
'0001f170'
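A minimal sketch of the same conversion starting from a character, assuming the eight-digit hex format shown in the result; the single-character check is an assumed design choice, and char_to_codepoint_sketch is hypothetical. Eight digits are enough for every Unicode character, since the highest code point, U+10FFFF, needs only six.

```python
import sys


def char_to_codepoint_sketch(char: str) -> str:
    # Hypothetical stand-in: ordinal of a single character as 8 zero-padded hex digits.
    if len(char) != 1:
        raise ValueError(f"expected a single character, got {char!r}")
    return f"{ord(char):08x}"


print(char_to_codepoint_sketch("\U0001F170"))         # 0001f170
print(char_to_codepoint_sketch("a"))                  # 00000061
print(char_to_codepoint_sketch(chr(sys.maxunicode)))  # 0010ffff
```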
char_as_exp
Function to create a RE expression that exactly matches a character.
Function Signature |
---|
char_as_exp(char) |
Parameters | |
---|---|
char(str) | Character to match. |
Example:
rtk.char_as_exp("🅰")
Result:
r'\🅰'
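The backslash in front of 🅰 is harmless in Python's re: a backslash followed by a character that is not an ASCII letter or digit simply matches that character. Below is a minimal sketch of one plausible escaping rule, assuming letters, digits, and "_" pass through unchanged while everything else gets a backslash; the library's exact rule may differ, and char_as_exp_sketch is hypothetical.

```python
import re


def char_as_exp_sketch(char: str) -> str:
    # Letters, digits, and "_" are never special in re; escaping an ASCII letter or
    # digit would even change its meaning (e.g. "\d", "\1"), so leave them as-is.
    if char.isalnum() or char == "_":
        return char
    # Any other character is made literal by a preceding backslash.
    return "\\" + char


for char in ["\U0001F170", ".", "a", "5"]:
    pattern = char_as_exp_sketch(char)
    print(repr(pattern), bool(re.fullmatch(pattern, char)))
```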
char_as_exp2
Function to create a RE2 expression that exactly matches a character.
Function Signature |
---|
char_as_exp2(char) |
Parameters | |
---|---|
char(str) | Character to match. |
Example:
rtk.char_as_exp2("🅰")
Result:
r'\x{0001f170}'
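The \x{...} form is RE2 syntax for a code point given in hex. Python's re only understands fixed-width escapes such as \x61, \u0061, and \U0001f170, so it rejects this pattern, which is why separate *_exp2 helpers exist. A quick check using only the standard library (the RE2 pattern is copied from the result above):

```python
import re

re2_pattern = r"\x{0001f170}"  # RE2-style escape, as produced by char_as_exp2
re_pattern = r"\U0001f170"     # equivalent escape understood by Python's re

try:
    re.compile(re2_pattern)
    rejected = False
except re.error as exc:
    rejected = True
    print("re rejects the RE2 form:", exc)

print(bool(re.fullmatch(re_pattern, "\U0001F170")))  # True
```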
string_as_exp
Function to create a RE expression that exactly matches a string.
Function Signature |
---|
string_as_exp(text) |
Parameters | |
---|---|
text(str) | String to match. |
Example:
rtk.string_as_exp("🅰🅱🅲")
Result:
r'\🅰\🅱\🅲'
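Escaping is what makes the match exact: an unescaped metacharacter such as "." would otherwise match more than intended. For comparison, the standard library's re.escape serves the same purpose for re (it escapes only characters that are special, so its output can differ from string_as_exp while matching the same text). The sample strings are illustrative.

```python
import re

literal = "v1.0"

naive = re.compile(literal)               # "." matches any character
escaped = re.compile(re.escape(literal))  # "." matches only a dot

print(bool(naive.fullmatch("v1x0")))    # True
print(bool(escaped.fullmatch("v1x0")))  # False
print(bool(escaped.fullmatch("v1.0")))  # True
```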
string_as_exp2
Function to create a RE2 expression that exactly matches a string.
Function Signature |
---|
string_as_exp2(text) |
Parameters | |
---|---|
text(str) | String to match. |
Example:
rtk.string_as_exp2("🅰🅱🅲")
Result:
r'\x{0001f170}\x{0001f171}\x{0001f172}'
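Assuming each character is converted independently with the same \x{...} form as char_as_exp2 (eight zero-padded hex digits), the string version is a simple join. string_as_exp2_sketch is a hypothetical stand-in; its output is checked against the documented result.

```python
def string_as_exp2_sketch(text: str) -> str:
    # One RE2 hex escape per character, concatenated in order.
    return "".join(f"\\x{{{ord(char):08x}}}" for char in text)


print(string_as_exp2_sketch("\U0001F170\U0001F171\U0001F172"))
# \x{0001f170}\x{0001f171}\x{0001f172}
```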
strings_as_exp
Function to create a RE expression that exactly matches any one string.
Function Signature |
---|
strings_as_exp(texts) |
Parameters | |
---|---|
texts(Iterable[str]) | Strings to match. |
Example:
rtk.strings_as_exp([
"bad.word",
"another-bad-word",
])
Result:
r'another\-bad\-word|bad\.word'
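A minimal sketch that reproduces the documented result with the standard library, assuming the strings are ordered longest first (so no alternative is shadowed by a shorter prefix) and escaped with re.escape; the library may escape differently, and strings_as_exp_sketch is hypothetical.

```python
import re
from typing import Iterable


def strings_as_exp_sketch(texts: Iterable[str]) -> str:
    # Longest first so a shorter string cannot pre-empt a longer one it prefixes.
    ordered = sorted(texts, key=len, reverse=True)
    return "|".join(re.escape(text) for text in ordered)


pattern = strings_as_exp_sketch(["bad.word", "another-bad-word"])
print(pattern)  # another\-bad\-word|bad\.word
print(re.findall(pattern, "a bad.word and another-bad-word"))
# ['bad.word', 'another-bad-word']
```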
strings_as_exp2
Function to create a RE2 expression that exactly matches any one string.
Function Signature |
---|
strings_as_exp2(texts) |
Parameters | |
---|---|
texts(Iterable[str]) | Strings to match. |
Example:
rtk.strings_as_exp2([
"bad.word",
"another-bad-word",
])
Result:
r'another\-bad\-word|bad\.word'
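When an alternation like this is embedded in a larger pattern, wrap it in a non-capturing group: "|" has the lowest precedence, so surrounding anchors or text would otherwise attach only to the first and last alternatives. RE2 shares this precedence rule; the check below uses Python's re so it runs with the standard library alone.

```python
import re

alternation = r"another\-bad\-word|bad\.word"  # documented result above

# Ungrouped: parsed as "^another\-bad\-word" OR "bad\.word$".
ungrouped = re.compile("^" + alternation + "$")
# Grouped: the anchors apply to the whole alternation.
grouped = re.compile("^(?:" + alternation + ")$")

text = "another-bad-word plus trailing text"
print(bool(ungrouped.search(text)))  # True (anchors leaked)
print(bool(grouped.search(text)))    # False
```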
iter_char_range
Function to iterate all characters within a range of codepoints (inclusive).
Function Signature |
---|
iter_char_range(first_codepoint, last_codepoint) |
Parameters | |
---|---|
first_codepoint(int) | Starting (first) codepoint. |
last_codepoint(int) | Ending (last) codepoint. |
Example:
for char in rtk.iter_char_range("a", "c"):
    print(char)
Output:
a
b
c
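A minimal sketch of an inclusive character range. Since the example passes single characters, this hypothetical iter_char_range_sketch accepts either single characters or integer ordinals for each bound and converts characters with ord; that dual handling is an assumption, not documented library behavior.

```python
from typing import Iterator, Union


def iter_char_range_sketch(first: Union[str, int], last: Union[str, int]) -> Iterator[str]:
    # Accept a single character or an integer ordinal for each bound.
    start = ord(first) if isinstance(first, str) else first
    stop = ord(last) if isinstance(last, str) else last
    # range() excludes its end, so add 1 to make the last bound inclusive.
    for ordinal in range(start, stop + 1):
        yield chr(ordinal)


for char in iter_char_range_sketch("a", "c"):
    print(char)
```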
char_range
Function to get a tuple of all characters within a range of codepoints (inclusive).
Function Signature |
---|
char_range(first_codepoint, last_codepoint) |
Parameters | |
---|---|
first_codepoint(int) | Starting (first) codepoint. |
last_codepoint(int) | Ending (last) codepoint. |
Example:
rtk.char_range("a", "c")
Result:
('a', 'b', 'c')
mask_span
Slice and mask a string using a span.
Function Signature |
---|
mask_span(text, span, mask=None) |
Parameters | |
---|---|
text(str) | Text to slice. |
span(list[int] | tuple[int, int]) | Domain of index positions (start, end) to mask. |
mask(str | None) | Mask to insert after slicing. |
Example:
rtk.mask_span(
"This is an example",
(8, 8),
mask="not ",
)
Result:
'This is not an example'
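The result matches plain slicing: keep the text before start, insert the mask, and keep the text from end onward, so an empty span such as (8, 8) inserts without removing anything. A minimal sketch, assuming mask=None means the spanned text is simply deleted; mask_span_sketch is hypothetical.

```python
from typing import Optional, Sequence


def mask_span_sketch(text: str, span: Sequence[int], mask: Optional[str] = None) -> str:
    start, end = span
    # Assumption: no mask means the spanned text is removed.
    return text[:start] + (mask or "") + text[end:]


print(mask_span_sketch("This is an example", (8, 8), mask="not "))
# This is not an example
print(mask_span_sketch("This is an example", (8, 11)))
# This is example
```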
mask_spans
Slice and mask a string using multiple spans.
Function Signature |
---|
mask_spans(text, spans, masks=None) |
Parameters | |
---|---|
text(str) | Text to slice. |
spans(Iterable[list[int] | tuple[int, int]]) | Domains of index positions (start, end) to mask. |
masks(Iterable[str] | None) | Masks to insert when slicing. |
Example:
rtk.mask_spans(
"This is an example",
spans=[
(9, 10),
(11, 18),
],
masks=[
" good",
"sample",
],
)
Result:
'This is a good sample'
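Replacing one span shifts the index positions of everything after it, so a straightforward approach is to apply the spans from right to left. A minimal sketch, assuming the spans do not overlap and that masks (when given) pair up with spans in order; mask_spans_sketch is hypothetical.

```python
from typing import Iterable, Optional, Sequence


def mask_spans_sketch(
    text: str,
    spans: Iterable[Sequence[int]],
    masks: Optional[Iterable[str]] = None,
) -> str:
    span_list = list(spans)
    # Assumption: without masks, every span is simply removed.
    mask_list = list(masks) if masks is not None else [""] * len(span_list)
    if len(mask_list) != len(span_list):
        raise ValueError("spans and masks must have the same length")
    # Work from the rightmost span leftward so earlier offsets stay valid.
    pairs = sorted(zip(span_list, mask_list), key=lambda pair: pair[0][0], reverse=True)
    for (start, end), mask in pairs:
        text = text[:start] + mask + text[end:]
    return text


print(mask_spans_sketch("This is an example", spans=[(9, 10), (11, 18)], masks=[" good", "sample"]))
# This is a good sample
```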
to_utf8
Encode a Unicode string to UTF-8 form.
Function Signature |
---|
to_utf8(text) |
Parameters | |
---|---|
text(str) | Text to encode. |
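For reference, the UTF-8 form of a string can be inspected with Python's built-in str.encode. This shows the encoding itself and makes no assumption about what to_utf8 returns; the sample text is illustrative.

```python
text = "A\u00e9\U0001F170"  # "A", "é", "🅰"

encoded = text.encode("utf-8")
print(encoded)  # b'A\xc3\xa9\xf0\x9f\x85\xb0'
# UTF-8 is variable-width: 1, 2, and 4 bytes for these three characters.
print([len(char.encode("utf-8")) for char in text])  # [1, 2, 4]
print(encoded.decode("utf-8") == text)  # True
```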
to_nfc
Normalize a Unicode string to NFC (Normalization Form C).
Function Signature |
---|
to_nfc(text) |
Parameters | |
---|---|
text(str) | Text to normalize. |
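Normalization matters for matching: the same visible text can be stored precomposed (é as U+00E9) or decomposed (e followed by U+0301 COMBINING ACUTE ACCENT), and a pattern written for one form will not match the other. The standard library's unicodedata.normalize performs NFC directly; this sketch shows the effect and is not the library's implementation.

```python
import re
import unicodedata

decomposed = "caf" + "e\u0301"     # "café" with a combining accent
pattern = re.compile("caf\u00e9")  # precomposed "é"

print(bool(pattern.fullmatch(decomposed)))  # False
composed = unicodedata.normalize("NFC", decomposed)
print(bool(pattern.fullmatch(composed)))    # True
print(len(decomposed), len(composed))       # 5 4
```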
Download files
Source Distribution
Built Distribution
File details
Details for the file regex_toolkit-0.0.3.tar.gz.
File metadata
- Download URL: regex_toolkit-0.0.3.tar.gz
- Upload date:
- Size: 22.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 65d8ec5028f467beafbfdb77d137c655d384f34591485507d7bf133b3a84c1d4 |
MD5 | d9814da75945c91dfdc7dfccce9c58d5 |
BLAKE2b-256 | aa31704d162e2725cbffa4a50fd6faa0a9f184af5bbc35cbe13bee3be93e4ff1 |
File details
Details for the file regex_toolkit-0.0.3-py3-none-any.whl.
File metadata
- Download URL: regex_toolkit-0.0.3-py3-none-any.whl
- Upload date:
- Size: 21.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 973a0e9b0f10c5804d0a80905d5f3e7924d248f76c0a8311925e2d4b12e2d4a9 |
MD5 | a7461c6ccd46bdbab5b97558bbc13a04 |
BLAKE2b-256 | 7f69a10fed5c604278c5e917107ff47cfb117cb926b9b51817fa542eb14c2428 |