Regex-Toolkit
Regex-Toolkit: Effortlessly craft efficient RE and RE2 expressions with user-friendly tools.
Requirements:
Regex-Toolkit requires Python 3.9 or higher, is platform-independent, and has no external dependencies.
Issue reporting
If you discover an issue with Regex-Toolkit, please report it at https://github.com/Phosmic/regex-toolkit/issues.
License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
Installing
Most stable version from PyPI:
pip install regex-toolkit
Development version from GitHub:
git clone https://github.com/Phosmic/regex-toolkit.git
cd regex-toolkit
pip install .
Usage
Import packages:
import re
# and/or
import re2
# Can import directly if desired
import regex_toolkit as rtk
Library
iter_sort_by_len
Function to iterate strings sorted by length.
Function Signature |
---|
iter_sort_by_len(texts, *, reverse=False) |
Parameters | |
---|---|
texts(Iterable[str]) | Strings to sort. |
reverse(bool) | Sort in descending order (longest to shortest). |
Example (ascending shortest to longest):
words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words):
print(word)
Output:
short
longer
longest
Example reversed (descending longest to shortest):
words = ["longest", "short", "longer"]
for word in rtk.iter_sort_by_len(words, reverse=True):
print(word)
Output:
longest
longer
short
sort_by_len
Function to get a tuple of strings sorted by length.
Function Signature |
---|
sort_by_len(texts, *, reverse=False) |
Parameters | |
---|---|
texts(Iterable[str]) | Strings to sort. |
reverse(bool) | Sort in descending order (longest to shortest). |
Example (ascending shortest to longest):
rtk.sort_by_len(["longest", "short", "longer"])
Result:
('short', 'longer', 'longest')
Example reversed (descending longest to shortest):
rtk.sort_by_len(["longest", "short", "longer"], reverse=True)
Result:
('longest', 'longer', 'short')
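The behavior shown above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: `sort_by_len_sketch` is a hypothetical name, and the tie-breaking rule for strings of equal length (here, input order is preserved) is an assumption.

```python
from collections.abc import Iterable


def sort_by_len_sketch(texts: Iterable[str], *, reverse: bool = False) -> tuple[str, ...]:
    """Hypothetical stand-in for sort_by_len: order strings by length only."""
    # sorted() is stable, so strings of equal length keep their input order.
    return tuple(sorted(texts, key=len, reverse=reverse))


words = ["longest", "short", "longer"]
ascending = sort_by_len_sketch(words)
descending = sort_by_len_sketch(words, reverse=True)
print(ascending)   # ('short', 'longer', 'longest')
print(descending)  # ('longest', 'longer', 'short')
```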
ord_to_codepoint
Function to get a character codepoint from a character ordinal.
Function Signature |
---|
ord_to_codepoint(ordinal) |
Parameters | |
---|---|
ordinal(int) | Character ordinal. |
Example:
# ordinal: 127344
ordinal = ord("🅰")
rtk.ord_to_codepoint(ordinal)
Result:
'0001f170'
codepoint_to_ord
Function to get a character ordinal from a character codepoint.
Function Signature |
---|
codepoint_to_ord(codepoint) |
Parameters | |
---|---|
codepoint(str) | Character codepoint. |
Example:
# char: "🅰"
codepoint = "0001f170"
rtk.codepoint_to_ord(codepoint)
Result:
127344
char_to_codepoint
Function to get a character codepoint from a character.
Function Signature |
---|
char_to_codepoint(char) |
Parameters | |
---|---|
char(str) | Character. |
Example:
rtk.char_to_codepoint("🅰")
Result:
'0001f170'
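The three conversions above all relate a character, its integer ordinal, and an 8-digit lowercase hexadecimal codepoint string. A minimal sketch of that relationship using only built-ins follows; the `*_sketch` names are hypothetical, and the fixed width of 8 digits is inferred from the example results rather than confirmed from the library source.

```python
def ord_to_codepoint_sketch(ordinal: int) -> str:
    """Hypothetical stand-in: ordinal -> zero-padded, 8-digit lowercase hex."""
    return format(ordinal, "08x")


def codepoint_to_ord_sketch(codepoint: str) -> int:
    """Hypothetical stand-in: hex codepoint string -> ordinal."""
    return int(codepoint, 16)


def char_to_codepoint_sketch(char: str) -> str:
    """Hypothetical stand-in: single character -> hex codepoint string."""
    return ord_to_codepoint_sketch(ord(char))


char = "\U0001F170"  # the character used in the examples above
ordinal = ord(char)
codepoint = char_to_codepoint_sketch(char)
print(ordinal)    # 127344
print(codepoint)  # 0001f170
```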
char_as_exp
Function to create a RE expression that exactly matches a character.
Function Signature |
---|
char_as_exp(char) |
Parameters | |
---|---|
char(str) | Character to match. |
Example:
rtk.char_as_exp("🅰")
Result:
r'\🅰'
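To illustrate how such a result might be used, the sketch below builds the same pattern text by hand (it does not call Regex-Toolkit) and passes it to the standard `re` module. Python's `re` treats a backslash followed by a non-ASCII character as that literal character, so the pattern matches exactly the one character.

```python
import re

char = "\U0001F170"     # the character used in the example above
pattern = "\\" + char   # same text as the documented result

# The escaped pattern matches the character itself and nothing else.
match = re.fullmatch(pattern, char)
other = re.fullmatch(pattern, "A")
print(match is not None)  # True
print(other is None)      # True
```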
char_as_exp2
Function to create a RE2 expression that exactly matches a character.
Function Signature |
---|
char_as_exp2(char) |
Parameters | |
---|---|
char(str) | Character to match. |
Example:
rtk.char_as_exp2("🅰")
Result:
r'\x{0001f170}'
string_as_exp
Function to create a RE expression that exactly matches a string.
Function Signature |
---|
string_as_exp(text) |
Parameters | |
---|---|
text(str) | String to match. |
Example:
rtk.string_as_exp("🅰🅱🅲")
Result:
r'\🅰\🅱\🅲'
string_as_exp2
Function to create a RE2 expression that exactly matches a string.
Function Signature |
---|
string_as_exp2(text) |
Parameters | |
---|---|
text(str) | String to match. |
Example:
rtk.string_as_exp2("🅰🅱🅲")
Result:
r'\x{0001f170}\x{0001f171}\x{0001f172}'
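The `\x{...}` results above can be reproduced for non-ASCII characters by formatting each ordinal as 8-digit hexadecimal. This is a sketch under assumptions, not the library's code: the `*_sketch` names are hypothetical, and this version always emits the hex form, whereas the `strings_as_exp2` example suggests the library leaves ASCII letters as-is and backslash-escapes punctuation. Note that the `\x{...}` syntax is RE2 syntax; Python's built-in `re` module does not accept it.

```python
def char_as_exp2_sketch(char: str) -> str:
    """Hypothetical stand-in: always emit the \\x{...} hex escape."""
    return "\\x{" + format(ord(char), "08x") + "}"


def string_as_exp2_sketch(text: str) -> str:
    """Hypothetical stand-in: escape every character and concatenate."""
    return "".join(char_as_exp2_sketch(char) for char in text)


text = "\U0001F170\U0001F171\U0001F172"  # the string used in the example above
expression = string_as_exp2_sketch(text)
print(expression)  # \x{0001f170}\x{0001f171}\x{0001f172}
```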
strings_as_exp
Function to create a RE expression that exactly matches any one string.
Function Signature |
---|
strings_as_exp(texts) |
Parameters | |
---|---|
texts(Iterable[str]) | Strings to match. |
Example:
rtk.strings_as_exp([
"bad.word",
"another-bad-word",
])
Result:
r'another\-bad\-word|bad\.word'
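In the result above, the longer string comes first even though it was passed second. A plausible reason is that regex alternation takes the first alternative that matches, so a shorter string that is a prefix of a longer one must come later. The sketch below shows the idea with the standard library; `strings_as_exp_sketch` is a hypothetical name, and using `re.escape` for escaping is an assumption that happens to reproduce the documented result for this input.

```python
import re
from collections.abc import Iterable


def strings_as_exp_sketch(texts: Iterable[str]) -> str:
    """Hypothetical stand-in: escape each string, join longest-first with '|'."""
    ordered = sorted(texts, key=len, reverse=True)
    return "|".join(re.escape(text) for text in ordered)


pattern = strings_as_exp_sketch(["bad.word", "another-bad-word"])
print(pattern)  # another\-bad\-word|bad\.word

# Why the order matters: alternation stops at the first alternative that matches.
shortest_first = re.match("bad|badder", "badder").group()
longest_first = re.match(strings_as_exp_sketch(["bad", "badder"]), "badder").group()
print(shortest_first)  # bad
print(longest_first)   # badder
```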
strings_as_exp2
Function to create a RE2 expression that exactly matches any one string.
Function Signature |
---|
strings_as_exp2(texts) |
Parameters | |
---|---|
texts(Iterable[str]) | Strings to match. |
Example:
rtk.strings_as_exp2([
"bad.word",
"another-bad-word",
])
Result:
r'another\-bad\-word|bad\.word'
iter_char_range
Function to iterate all characters within a range of codepoints (inclusive).
Function Signature |
---|
iter_char_range(first_codepoint, last_codepoint) |
Parameters | |
---|---|
first_codepoint(int) | Starting (first) codepoint. |
last_codepoint(int) | Ending (last) codepoint. |
Example:
for char in rtk.iter_char_range("a", "c"):
print(char)
Output:
a
b
c
char_range
Function to get a tuple of all characters within a range of codepoints (inclusive).
Function Signature |
---|
char_range(first_codepoint, last_codepoint) |
Parameters | |
---|---|
first_codepoint(int) | Starting (first) codepoint. |
last_codepoint(int) | Ending (last) codepoint. |
Example:
rtk.char_range("a", "c")
Result:
('a', 'b', 'c')
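An inclusive character range like the one above can be sketched with `ord` and `chr`. The function name is hypothetical, and the sketch assumes the endpoints are passed as single characters, as in the examples, rather than as integers as the parameter table states.

```python
from collections.abc import Iterator


def iter_char_range_sketch(first_char: str, last_char: str) -> Iterator[str]:
    """Hypothetical stand-in: yield every character from first to last, inclusive."""
    # range() excludes its stop value, so add 1 to include the last character.
    for ordinal in range(ord(first_char), ord(last_char) + 1):
        yield chr(ordinal)


chars = tuple(iter_char_range_sketch("a", "c"))
print(chars)  # ('a', 'b', 'c')
```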
mask_span
Slice and mask a string using a span.
Function Signature |
---|
mask_span(text, span, mask=None) |
Parameters | |
---|---|
text(str) | Text to slice. |
span(list[int] | tuple[int, int]) | Domain of index positions (start, end) to mask. |
mask(str | None) | Mask to insert after slicing. |
Example:
rtk.mask_span(
"This is an example",
(8, 8),
mask="not ",
)
Result:
'This is not an example'
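The example above is consistent with plain string slicing: keep everything before `start`, insert the mask, then keep everything from `end` onward. A minimal sketch follows; `mask_span_sketch` is a hypothetical name, and treating `mask=None` as "insert nothing" is an assumption.

```python
from typing import Optional


def mask_span_sketch(text: str, span: tuple[int, int], mask: Optional[str] = None) -> str:
    """Hypothetical stand-in: replace text[start:end] with the mask."""
    start, end = span
    if mask is None:
        mask = ""  # assumption: no mask means the span is simply removed
    return text[:start] + mask + text[end:]


result = mask_span_sketch("This is an example", (8, 8), mask="not ")
print(result)  # This is not an example
```

Because the span `(8, 8)` is empty, nothing is removed and the mask is inserted at index 8.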
mask_spans
Slice and mask a string using multiple spans.
Function Signature |
---|
mask_spans(text, spans, masks=None) |
Parameters | |
---|---|
text(str) | Text to slice. |
spans(Iterable[list[int] | tuple[int, int]]) | Domains of index positions (start, end) to mask in the text. |
masks(Iterable[str] | None) | Masks to insert when slicing. |
Example:
rtk.mask_spans(
"This is an example",
spans=[
(9, 10),
(11, 18),
],
masks=[
" good",
"sample",
],
)
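One way to apply several spans is to work from right to left, so that replacing a later span never shifts the indices of an earlier one. The sketch below takes that approach; `mask_spans_sketch` is a hypothetical name, and the processing order, the handling of `masks=None`, and the requirement that spans do not overlap are all assumptions rather than documented library behavior.

```python
from collections.abc import Iterable
from typing import Optional


def mask_spans_sketch(
    text: str,
    spans: Iterable[tuple[int, int]],
    masks: Optional[Iterable[str]] = None,
) -> str:
    """Hypothetical stand-in: replace each span with its mask, right to left."""
    spans = list(spans)
    # Assumption: without masks, each span is simply removed.
    masks = [""] * len(spans) if masks is None else list(masks)
    pairs = sorted(zip(spans, masks), key=lambda pair: pair[0][0], reverse=True)
    for (start, end), mask in pairs:
        text = text[:start] + mask + text[end:]
    return text


result = mask_spans_sketch(
    "This is an example",
    spans=[(9, 10), (11, 18)],
    masks=[" good", "sample"],
)
print(result)  # This is a good sample
```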
to_utf8
Encode a Unicode string to UTF-8 form.
Function Signature |
---|
to_utf8(text) |
Parameters | |
---|---|
text(str) | Text to encode. |
to_nfc
Normalize a Unicode string to NFC (Normalization Form C).
Function Signature |
---|
to_nfc(text) |
Parameters | |
---|---|
text(str) | Text to normalize. |
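For context, the standard library already exposes both operations. The sketch below does not call Regex-Toolkit and makes no claim about what `to_utf8` or `to_nfc` return; it only shows what NFC normalization and UTF-8 encoding do to a decomposed character.

```python
import unicodedata

decomposed = "e\u0301"  # 'e' followed by a combining acute accent (two code points)
composed = unicodedata.normalize("NFC", decomposed)  # single code point U+00E9
encoded = composed.encode("utf-8")

print(len(decomposed), len(composed))  # 2 1
print(encoded)                         # b'\xc3\xa9'
```

Normalizing before building an expression means visually identical strings are represented by the same sequence of code points.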