Crossandra
Crossandra is a fast and simple tokenization library for Python operating on enums and regular expressions, with a decent amount of configuration.
Installation
Crossandra is available on PyPI and can be installed with pip, or any other Python package manager:
$ pip install crossandra
(Some systems may require you to use pip3, python -m pip, or py -m pip instead.)
License
Crossandra is licensed under the MIT License.
Reference
Crossandra
Crossandra(
    token_source: type[Enum] = Empty,
    *,
    ignore_whitespace: bool = False,
    ignored_characters: str = "",
    suppress_unknown: bool = False,
    rules: list[Rule | RuleGroup] | None = None
)
- token_source: an enum containing all possible tokens (defaults to an empty enum)
- ignore_whitespace: whether spaces, tabs, newlines etc. should be ignored
- ignored_characters: characters to skip during tokenization
- suppress_unknown: whether unknown tokens should be skipped instead of raising an error
- rules: a list of additional rules to use
The enum takes priority over the rule list.
When all tokens are of length 1 and there are no additional rules, Crossandra uses a simpler tokenization method (the so-called Fast Mode).
Example: Tokenizing noisy Brainfuck code (tested on MacBook Air M1 (256/16))
# Setup
from random import choices
from string import punctuation
program = "".join(choices(punctuation, k=...))
k | Default | Fast Mode | Speedup |
---|---|---|---|
10 | 0.00004s | 0.00002s | 2x |
100 | 0.00016s | 0.00003s | 5.3x |
1000 | 0.0015s | 0.00013s | 11.5x |
10000 | 0.014s | 0.0009s | 15.6x |
100000 | 0.29s | 0.009s | 32.2x |
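The speedup is plausible because single-character tokens can be resolved with one table lookup per character, with no pattern matching involved. The following is a minimal sketch of that idea using only the standard library. It is not Crossandra's actual implementation; `fast_tokenize` and the `Bf` enum are hypothetical names introduced for illustration.

```python
from enum import Enum


class Bf(Enum):
    ADD = "+"
    SUB = "-"
    READ = ","
    WRITE = "."


def fast_tokenize(source: str, tokens: type[Enum], ignored: str = "") -> list[Enum]:
    """Tokenize by looking up each character in a value -> member table."""
    # Hypothetical helper: assumes every enum value is exactly one character.
    table = {member.value: member for member in tokens}
    result = []
    for char in source:
        if char in ignored:
            continue  # ignored characters never reach the lookup
        if char not in table:
            raise ValueError(f"unknown token: {char!r}")
        result.append(table[char])
    return result


print([token.name for token in fast_tokenize("+ - ,.", Bf, ignored=" ")])
# ['ADD', 'SUB', 'READ', 'WRITE']
```

Each character costs one dictionary lookup, so the work grows linearly with input length and carries very little per-character overhead.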
Rule
Rule[T](
    pattern: str,
    converter: Callable[[str], T] | bool = True,
    flags: RegexFlag | int = 0
)
Used for defining custom rules. pattern is a regex pattern to match (flags can be supplied).
- When converter is a callable, it is called on the matched substring.
- When converter is True, the matched substring is returned as is.
- When converter is False, the matched substring is left out of the token list.
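The three converter behaviors can be illustrated with the standard `re` module. This is a sketch under stated assumptions, not Crossandra's code: `emit` is a hypothetical helper that matches one pattern at the start of a string and returns the tokens that match would contribute.

```python
import re
from typing import Any, Callable, Union


def emit(
    pattern: str,
    text: str,
    converter: Union[Callable[[str], Any], bool] = True,
    flags: int = 0,
) -> list:
    """Return the tokens produced by matching `pattern` at the start of `text`."""
    match = re.match(pattern, text, flags)
    if match is None:
        raise ValueError(f"{pattern!r} does not match {text!r}")
    matched = match.group()
    if converter is True:
        return [matched]  # keep the raw substring
    if converter is False:
        return []  # consume the match but emit nothing
    return [converter(matched)]  # emit the converted value


print(emit(r"\d+", "42 apples"))         # ['42']
print(emit(r"\d+", "42 apples", int))    # [42]
print(emit(r"\d+", "42 apples", False))  # []
```

A False converter is useful for patterns such as comments, which have to be recognized but should not appear in the output.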
RuleGroup
RuleGroup(rules: tuple[Rule[Any], ...])
Used for storing multiple Rules in one object. Can be constructed by ORing two or more Rules.
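Building a group by ORing relies on Python's `__or__` operator hook. The sketch below shows one way such an operator could be written. `ToyRule` and `ToyGroup` are hypothetical stand-ins, and the flattening of nested groups is an assumption rather than documented Crossandra behavior.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class ToyRule:
    pattern: str

    def __or__(self, other: Union["ToyRule", "ToyGroup"]) -> "ToyGroup":
        # rule | rule or rule | group -> one flat group
        if isinstance(other, ToyGroup):
            return ToyGroup((self, *other.rules))
        return ToyGroup((self, other))


@dataclass(frozen=True)
class ToyGroup:
    rules: Tuple[ToyRule, ...]

    def __or__(self, other: Union[ToyRule, "ToyGroup"]) -> "ToyGroup":
        # group | rule or group | group -> extend the existing tuple
        extra = other.rules if isinstance(other, ToyGroup) else (other,)
        return ToyGroup(self.rules + extra)


group = ToyRule(r"\d+") | ToyRule(r"[a-z]+") | ToyRule(r"\s+")
print(len(group.rules))  # 3
```

The chained expression evaluates left to right: the first `|` produces a two-rule group, and the second appends the third rule to it.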
common
The common submodule is a collection of commonly used patterns.
Rules:
- CHAR (e.g. 'h')
- LETTER (e.g. m)
- WORD (e.g. ball)
- SINGLE_QUOTED_STRING (e.g. 'nice fish')
- DOUBLE_QUOTED_STRING (e.g. "hello there")
- C_NAME (e.g. crossandra_rocks)
- NEWLINE (\n; \r\n is converted to \n before tokenization)
- DIGIT (e.g. 7)
- HEXDIGIT (e.g. c)
- DECIMAL (e.g. 3.14)
- INT (e.g. 2137)
- SIGNED_INT (e.g. -1)
- FLOAT (e.g. 1e3)
- SIGNED_FLOAT (e.g. +4.3)
RuleGroups:
- STRING (SINGLE_QUOTED_STRING | DOUBLE_QUOTED_STRING)
- NUMBER (INT | FLOAT)
- SIGNED_NUMBER (SIGNED_INT | SIGNED_FLOAT)
Examples
from enum import Enum
from crossandra import Crossandra

class Brainfuck(Enum):
    ADD = "+"
    SUB = "-"
    LEFT = "<"
    RIGHT = ">"
    READ = ","
    WRITE = "."
    BEGIN_LOOP = "["
    END_LOOP = "]"

bf = Crossandra(Brainfuck, suppress_unknown=True)
print(*bf.tokenize("cat program: ,[.,]"), sep="\n")
# Brainfuck.READ
# Brainfuck.BEGIN_LOOP
# Brainfuck.WRITE
# Brainfuck.READ
# Brainfuck.END_LOOP
from crossandra import Crossandra, Rule, common

def hex2rgb(hex_color: str) -> tuple[int, int, int]:
    r, g, b = (int(hex_color[i:i+2], 16) for i in range(1, 6, 2))
    return r, g, b

t = Crossandra(
    ignore_whitespace=True,
    rules=[
        Rule(r"#[0-9a-fA-F]+", hex2rgb),
        common.WORD
    ]
)
text = "My favorite color is #facade"
print(t.tokenize(text))
# ['My', 'favorite', 'color', 'is', (250, 202, 222)]
# Supporting Samarium's numbers and arithmetic operators
from enum import Enum
from crossandra import Crossandra, Rule

def sm_int(string: str) -> int:
    return int(string.replace("/", "1").replace("\\", "0"), 2)

class Op(Enum):
    ADD = "+"
    SUB = "-"
    MUL = "++"
    DIV = "--"
    POW = "+++"
    MOD = "---"

sm = Crossandra(
    Op,
    ignore_whitespace=True,
    rules=[Rule(r"(?:\\|/)+", sm_int)]
)
print(*sm.tokenize(r"//\ ++ /\\/ --- /\/\/ - ///"))
# 6 Op.MUL 9 Op.MOD 21 Op.SUB 7
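In the output above, "++" and "---" come out as single tokens rather than as runs of "+" or "-", so longer operators take precedence over their prefixes. One common way to get that behavior is to try the longest alternatives first. The sketch below does this with the standard `re` module. It is an assumption about the general technique, not a description of Crossandra's internals, and `build_pattern` is a hypothetical helper.

```python
import re
from enum import Enum


class Op(Enum):
    ADD = "+"
    SUB = "-"
    MUL = "++"
    DIV = "--"
    POW = "+++"
    MOD = "---"


def build_pattern(tokens: type[Enum]) -> re.Pattern[str]:
    """Compile an alternation of all enum values, longest first."""
    # Regex alternation picks the first branch that matches, so "+++"
    # must come before "++" and "+" to win on the input "+++".
    values = sorted((member.value for member in tokens), key=len, reverse=True)
    return re.compile("|".join(re.escape(value) for value in values))


pattern = build_pattern(Op)
# finditer skips characters that match no branch (here, the spaces).
print([Op(match.group()).name for match in pattern.finditer("+++ -- +")])
# ['POW', 'DIV', 'ADD']
```

Without the longest-first ordering, the pattern `\+|\+\+|\+\+\+` would split "+++" into three separate ADD tokens.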