Crossandra
Crossandra is a fast and simple tokenization library for Python operating on enums and regular expressions, with a decent amount of configuration.
Installation
Crossandra is available on PyPI and can be installed with pip, or any other Python package manager:
$ pip install crossandra
(Some systems may require you to use pip3, python -m pip, or py -m pip instead.)
License
Crossandra is licensed under the MIT License.
Reference
Crossandra
Crossandra(
    token_source: type[Enum] = Empty,
    *,
    ignore_whitespace: bool = False,
    ignored_characters: str = "",
    suppress_unknown: bool = False,
    rules: list[Rule | RuleGroup] | None = None
)
- token_source: an enum containing all possible tokens (defaults to an empty enum)
- ignore_whitespace: whether spaces, tabs, newlines etc. should be ignored
- ignored_characters: characters to skip during tokenization
- suppress_unknown: whether unknown tokens should be skipped without raising an error
- rules: a list of additional rules to use
The enum takes priority over the rule list.
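To illustrate what this priority could mean in practice, here is a minimal standard-library sketch. It is not Crossandra's implementation: tokenize_with_priority, the (pattern, converter) tuples, and the longest-value-first ordering are assumptions made for the example. It tries enum values before any regex rule, so a "-" defined in the enum wins over a signed-integer rule.

```python
import re
from enum import Enum
from typing import Any, Callable


class Op(Enum):
    ADD = "+"
    SUB = "-"


def tokenize_with_priority(
    text: str,
    token_source: type[Enum],
    rules: list[tuple[str, Callable[[str], Any]]],
) -> list[Any]:
    """Hypothetical helper: try enum values first, then regex rules."""
    # Assumption: longer enum values are tried first.
    members = sorted(token_source, key=lambda m: len(m.value), reverse=True)
    compiled = [(re.compile(pattern), conv) for pattern, conv in rules]
    tokens: list[Any] = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace is skipped in this sketch
            continue
        for member in members:
            if text.startswith(member.value, pos):
                tokens.append(member)
                pos += len(member.value)
                break
        else:
            # No enum value matched here, so fall back to the rules.
            for regex, conv in compiled:
                match = regex.match(text, pos)
                if match and match.end() > pos:
                    tokens.append(conv(match.group()))
                    pos = match.end()
                    break
            else:
                raise ValueError(f"unknown token at position {pos}")
    return tokens


signed_int = (r"[+-]?\d+", int)
print(tokenize_with_priority("3 -5", Op, [signed_int]))
# [3, <Op.SUB: '-'>, 5]
```

In this sketch the "-" is emitted as Op.SUB because the enum is consulted first; the signed-integer rule only ever sees the digits.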
When all tokens are of length 1 and there are no additional rules, Crossandra will use a simpler tokenization method (the so-called Fast Mode).
Example: Tokenizing noisy Brainfuck code (tested on MacBook Air M1 (256/16) with pure Python wheels)
# Setup
from random import choices
from string import punctuation
program = "".join(choices(punctuation, k=...))
k | Default | Fast Mode | Speedup |
---|---|---|---|
10 | 0.00004s | 0.00002s | 2x |
100 | 0.00016s | 0.00003s | 5.3x |
1000 | 0.0015s | 0.00013s | 11.5x |
10000 | 0.014s | 0.0009s | 15.6x |
100000 | 0.29s | 0.009s | 32.2x |
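The idea behind a single-character mode can be sketched with a plain dictionary lookup. This is only an illustration under the assumption that every token is one character long; fast_tokenize is a hypothetical helper, not part of Crossandra, and the library's actual Fast Mode may work differently.

```python
from enum import Enum


class Brainfuck(Enum):
    ADD = "+"
    SUB = "-"
    LEFT = "<"
    RIGHT = ">"
    READ = ","
    WRITE = "."
    BEGIN_LOOP = "["
    END_LOOP = "]"


def fast_tokenize(
    text: str,
    token_source: type[Enum],
    suppress_unknown: bool = False,
) -> list[Enum]:
    """Hypothetical single-character tokenizer based on a dict lookup."""
    # Build the character -> member table once, then scan the text.
    lookup = {member.value: member for member in token_source}
    tokens: list[Enum] = []
    for char in text:
        member = lookup.get(char)
        if member is not None:
            tokens.append(member)
        elif not suppress_unknown:
            raise ValueError(f"unknown token: {char!r}")
    return tokens


tokens = fast_tokenize("cat program: ,[.,]", Brainfuck, suppress_unknown=True)
print([token.name for token in tokens])
# ['READ', 'BEGIN_LOOP', 'WRITE', 'READ', 'END_LOOP']
```

Because each character is resolved with one dictionary lookup and no regex matching, a scan like this does a constant amount of work per character.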
Rule
Rule[T](
    pattern: str,
    converter: Callable[[str], T] | bool = True,
    flags: RegexFlag | int = 0
)
Used for defining custom rules. pattern is a regex pattern to match (flags can be supplied).
When converter is a callable, it is called on the matched substring and its result becomes the token.
When converter is True, the matched substring is returned as is.
When converter is False, the matched substring is left out of the token list.
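The three converter cases can be mimicked with the standard re module. The helper below, apply_converter, is hypothetical and exists only to show the semantics described above; it is not how Crossandra applies rules internally, and it only looks at the start of the text.

```python
import re
from typing import Any, Callable, Union


def apply_converter(
    pattern: str,
    text: str,
    converter: Union[Callable[[str], Any], bool] = True,
    flags: int = 0,
) -> list[Any]:
    """Hypothetical helper: return the tokens produced for one match."""
    match = re.match(pattern, text, flags)
    if match is None:
        return []
    matched = match.group()
    if converter is False:
        return []  # matched, but dropped from the token list
    if converter is True:
        return [matched]  # the matched substring itself
    return [converter(matched)]  # callable: its result becomes the token


print(apply_converter(r"\d+", "42 apples", int))    # [42]
print(apply_converter(r"\d+", "42 apples"))         # ['42']
print(apply_converter(r"\d+", "42 apples", False))  # []
print(apply_converter(r"[a-z]+", "HELLO", str.lower, re.IGNORECASE))  # ['hello']
```

A converter of False is handy for things like comments, which must be matched so they are not reported as unknown, yet should not appear in the output.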
RuleGroup
RuleGroup(rules: tuple[Rule[Any], ...])
Used for storing multiple Rules in one object. Can be constructed by ORing two or more Rules.
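How ORing might build a group can be sketched with two small dataclasses. MiniRule and MiniRuleGroup are stand-ins invented for this example, and the flattening behaviour is an assumption; the real Rule and RuleGroup classes may be implemented differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MiniRule:
    """Hypothetical stand-in for a rule; holds only a pattern."""

    pattern: str

    def __or__(self, other: MiniRule | MiniRuleGroup) -> MiniRuleGroup:
        # Wrap this rule in a group and let the group do the merging.
        return MiniRuleGroup((self,)) | other


@dataclass(frozen=True)
class MiniRuleGroup:
    """Hypothetical stand-in for a rule group; holds a tuple of rules."""

    rules: tuple[MiniRule, ...]

    def __or__(self, other: MiniRule | MiniRuleGroup) -> MiniRuleGroup:
        if isinstance(other, MiniRule):
            return MiniRuleGroup(self.rules + (other,))
        if isinstance(other, MiniRuleGroup):
            # Assumption: groups are flattened rather than nested.
            return MiniRuleGroup(self.rules + other.rules)
        return NotImplemented


group = MiniRule(r"\d+") | MiniRule(r"[a-z]+") | MiniRule(r"\s+")
print([rule.pattern for rule in group.rules])
# ['\\d+', '[a-z]+', '\\s+']
```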
common
The common submodule is a collection of commonly used patterns.
Rules:
- CHAR (e.g. 'h')
- LETTER (e.g. m)
- WORD (e.g. ball)
- SINGLE_QUOTED_STRING (e.g. 'nice fish')
- DOUBLE_QUOTED_STRING (e.g. "hello there")
- C_NAME (e.g. crossandra_rocks)
- NEWLINE (\n; \r\n is converted to \n before tokenization)
- DIGIT (e.g. 7)
- HEXDIGIT (e.g. c)
- DECIMAL (e.g. 3.14)
- INT (e.g. 2137)
- SIGNED_INT (e.g. -1)
- FLOAT (e.g. 1e3)
- SIGNED_FLOAT (e.g. +4.3)
RuleGroups:
- STRING (SINGLE_QUOTED_STRING | DOUBLE_QUOTED_STRING)
- NUMBER (INT | FLOAT)
- SIGNED_NUMBER (SIGNED_INT | SIGNED_FLOAT)
Examples
from enum import Enum
from crossandra import Crossandra

class Brainfuck(Enum):
    ADD = "+"
    SUB = "-"
    LEFT = "<"
    RIGHT = ">"
    READ = ","
    WRITE = "."
    BEGIN_LOOP = "["
    END_LOOP = "]"

bf = Crossandra(Brainfuck, suppress_unknown=True)
print(*bf.tokenize("cat program: ,[.,]"), sep="\n")
# Brainfuck.READ
# Brainfuck.BEGIN_LOOP
# Brainfuck.WRITE
# Brainfuck.READ
# Brainfuck.END_LOOP
from crossandra import Crossandra, Rule, common

def hex2rgb(hex_color: str) -> tuple[int, int, int]:
    r, g, b = (int(hex_color[i:i+2], 16) for i in range(1, 6, 2))
    return r, g, b

t = Crossandra(
    ignore_whitespace=True,
    rules=[
        Rule(r"#[0-9a-fA-F]+", hex2rgb),
        common.WORD
    ]
)
text = "My favorite color is #facade"
print(t.tokenize(text))
# ['My', 'favorite', 'color', 'is', (250, 202, 222)]
# Supporting Samarium's numbers and arithmetic operators
from enum import Enum
from crossandra import Crossandra, Rule

def sm_int(string: str) -> int:
    return int(string.replace("/", "1").replace("\\", "0"), 2)

class Op(Enum):
    ADD = "+"
    SUB = "-"
    MUL = "++"
    DIV = "--"
    POW = "+++"
    MOD = "---"

sm = Crossandra(
    Op,
    ignore_whitespace=True,
    rules=[Rule(r"(?:\\|/)+", sm_int)]
)
print(*sm.tokenize(r"//\ ++ /\\/ --- /\/\/ - ///"))
# 6 Op.MUL 9 Op.MOD 21 Op.SUB 7