Crossandra
Crossandra is a fast and simple tokenizer operating on enums with a decent amount of configuration.
Installation
Crossandra is available on PyPI and can be installed with pip, or any other Python package manager:
$ pip install crossandra
(Some systems may require you to use pip3, python -m pip, or py -m pip instead.)
License
Crossandra is licensed under the MIT License.
Reference
Crossandra
Crossandra(
    token_source: type[Enum] = Empty,
    *,
    ignore_whitespace: bool = False,
    ignored_characters: str = "",
    suppress_unknown: bool = False,
    rules: list[Rule | RuleGroup] | None = None
)
- token_source: an enum containing all possible tokens (defaults to an empty enum)
- ignore_whitespace: whether spaces, tabs, newlines etc. should be ignored
- ignored_characters: characters to skip during tokenization
- suppress_unknown: whether unknown tokens should be skipped instead of raising an error
- rules: a list of additional rules to use
The enum takes priority over the rule list.
When all tokens are of length 1 and there are no additional rules, Crossandra will use a simpler tokenization method (the so-called Fast Mode).
Example: Tokenizing noisy Brainfuck code (tested on MacBook Air M1 (256/16))
# Setup
from random import choices
from string import punctuation
program = "".join(choices(punctuation, k=...))
k | Default | Fast Mode | Speedup |
---|---|---|---|
10 | 0.00004s | 0.00002s | 2x |
100 | 0.00016s | 0.00003s | 5.3x |
1000 | 0.0015s | 0.00013s | 11.5x |
10000 | 0.014s | 0.0009s | 15.6x |
100000 | 0.29s | 0.009s | 32.2x |
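The Fast Mode condition described above (every token is one character long and no additional rules are given) can be sketched as a small check. This is only an illustration of the documented condition, not Crossandra's internal code: `qualifies_for_fast_mode` is a hypothetical helper, and the plain string in the last call merely stands in for a Rule object.

```python
from enum import Enum


def qualifies_for_fast_mode(token_source, rules=None):
    """Hypothetical helper mirroring the documented Fast Mode condition.

    Not part of Crossandra's API.
    """
    # Any additional rule rules out Fast Mode.
    if rules:
        return False
    # Every token value must be exactly one character long.
    return all(len(member.value) == 1 for member in token_source)


class Brainfuck(Enum):
    ADD = "+"
    SUB = "-"


class Op(Enum):
    ADD = "+"
    MUL = "++"  # two characters, so Fast Mode would not apply


print(qualifies_for_fast_mode(Brainfuck))                  # True
print(qualifies_for_fast_mode(Op))                         # False
print(qualifies_for_fast_mode(Brainfuck, rules=[r"\d+"]))  # False
```

Under this reading, adding even one multi-character token or one rule moves a tokenizer from the Fast Mode column of the table to the Default column.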
Rule
Rule[T](
    pattern: str,
    converter: Callable[[str], T] | bool = True,
    flags: RegexFlag | int = 0
)
Used for defining custom rules. pattern is a regex pattern to match (flags can be supplied). The converter argument controls what happens to the matched substring:
- When converter is a callable, it is called on the matched substring and its return value is used as the token.
- When converter is True, the matched substring is returned directly.
- When converter is False, the matched substring is not included in the token list.
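The three converter modes can be modelled with the standard re module. The sketch below is a simplified stand-in for illustration, not Crossandra's implementation: `apply_converter` and the `SKIP` sentinel are hypothetical names, and matching is done only at the start of the text.

```python
import re

SKIP = object()  # hypothetical sentinel meaning "emit no token"


def apply_converter(pattern, converter, text, flags=0):
    """Hypothetical model of the three converter modes."""
    match = re.match(pattern, text, flags)
    if match is None:
        raise ValueError(f"{pattern!r} does not match {text!r}")
    substring = match.group(0)
    if converter is True:
        return substring          # keep the matched text as-is
    if converter is False:
        return SKIP               # drop the match from the output
    return converter(substring)   # callable: convert the matched text


results = [
    apply_converter(r"\d+", int, "42"),
    apply_converter(r"[a-z]+", True, "abc"),
    apply_converter(r"#.*", False, "# comment"),
]
tokens = [r for r in results if r is not SKIP]
print(tokens)  # [42, 'abc']
```

A False converter is handy for input such as comments, which must be recognized so they do not count as unknown tokens but should not appear in the result.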
RuleGroup
RuleGroup(rules: tuple[Rule[Any], ...])
Used for storing multiple Rules in one object. Can be constructed by ORing two or more Rules.
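To show what "ORing" means here, the following toy classes overload the | operator so that combining rules yields a group. `MiniRule` and `MiniGroup` are hypothetical stand-ins written for this sketch. Whether the real Rule and RuleGroup flatten nested groups the same way is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniRule:
    """Hypothetical stand-in for Rule; holds only a pattern."""
    pattern: str

    def __or__(self, other):
        # rule | group -> prepend this rule to the group's rules
        if isinstance(other, MiniGroup):
            return MiniGroup((self, *other.rules))
        # rule | rule -> a new two-element group
        return MiniGroup((self, other))


@dataclass(frozen=True)
class MiniGroup:
    """Hypothetical stand-in for RuleGroup; stores rules in a tuple."""
    rules: tuple

    def __or__(self, other):
        # group | group -> concatenate; group | rule -> append
        if isinstance(other, MiniGroup):
            return MiniGroup(self.rules + other.rules)
        return MiniGroup((*self.rules, other))


INT = MiniRule(r"\d+")
FLOAT = MiniRule(r"\d+\.\d+")
NAME = MiniRule(r"[A-Za-z_]\w*")

group = INT | FLOAT | NAME
print([rule.pattern for rule in group.rules])
```

Because the group keeps a flat tuple, chaining | any number of times still produces a single group object that could be passed wherever one rule is expected.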
common
The common submodule is a collection of commonly used patterns.
Rules:
- CHAR (e.g. 'h')
- LETTER (e.g. m)
- WORD (e.g. ball)
- SINGLE_QUOTED_STRING (e.g. 'nice fish')
- DOUBLE_QUOTED_STRING (e.g. "hello there")
- C_NAME (e.g. crossandra_rocks)
- NEWLINE (\n; \r\n is converted to \n before tokenization)
- DIGIT (e.g. 7)
- HEXDIGIT (e.g. c)
- DECIMAL (e.g. 3.14)
- INT (e.g. 2137)
- SIGNED_INT (e.g. -1)
- FLOAT (e.g. 1e3)
- SIGNED_FLOAT (e.g. +4.3)
RuleGroups:
- STRING (SINGLE_QUOTED_STRING | DOUBLE_QUOTED_STRING)
- NUMBER (INT | FLOAT)
- SIGNED_NUMBER (SIGNED_INT | SIGNED_FLOAT)
Examples
from enum import Enum
from crossandra import Crossandra

class Brainfuck(Enum):
    ADD = "+"
    SUB = "-"
    LEFT = "<"
    RIGHT = ">"
    READ = ","
    WRITE = "."
    BEGIN_LOOP = "["
    END_LOOP = "]"

bf = Crossandra(Brainfuck, suppress_unknown=True)
print(*bf.tokenize("cat program: ,[.,]"), sep="\n")
# Brainfuck.READ
# Brainfuck.BEGIN_LOOP
# Brainfuck.WRITE
# Brainfuck.READ
# Brainfuck.END_LOOP

from crossandra import Crossandra, Rule, common

def hex2rgb(hex_color: str) -> tuple[int, int, int]:
    # Skip the leading "#" and read three two-digit hex pairs
    r, g, b = (int(hex_color[i:i+2], 16) for i in range(1, 6, 2))
    return r, g, b

t = Crossandra(
    ignore_whitespace=True,
    rules=[
        Rule(r"#[0-9a-fA-F]+", hex2rgb),
        common.WORD
    ]
)
text = "My favorite color is #facade"
print(t.tokenize(text))
# ['My', 'favorite', 'color', 'is', (250, 202, 222)]

# Supporting Samarium's numbers and arithmetic operators
from enum import Enum
from crossandra import Crossandra, Rule

def sm_int(string: str) -> int:
    # "/" stands for the binary digit 1 and "\" for 0
    return int(string.replace("/", "1").replace("\\", "0"), 2)

class Op(Enum):
    ADD = "+"
    SUB = "-"
    MUL = "++"
    DIV = "--"
    POW = "+++"
    MOD = "---"

sm = Crossandra(
    Op,
    ignore_whitespace=True,
    rules=[Rule(r"(?:\\|/)+", sm_int)]
)
print(*sm.tokenize(r"//\ ++ /\\/ --- /\/\/ - ///"))
# 6 Op.MUL 9 Op.MOD 21 Op.SUB 7