Trilobyte
Powerful text pattern parsing made intuitive
Key features:
- Variable assignment
- Smart algorithm
- Powerful expressions
Trilobyte's algorithm is built on text keypoints, so its implementation differs substantially from most other text-search engines, such as regex engines. A high-level overview of the algorithm appears near the top of ./trilobyte/keypoints/classes.py and is copied below:
# @dev
# If inputs have so far matched keypoint but keypoint is not yet completed,
# `matched` = False, `completed` = False
# If previous inputs have matched keypoint, keypoint is complete, and next input does not match,
# `matched` = True, `completed` = True
# If inputs have so far matched keypoint and keypoint is completed and cannot go on,
# `matched` = True, `completed` = True
# If inputs have so far matched keypoint and keypoint is completed but can still go on,
# `matched` = True, `completed` = False
# If previous inputs have matched keypoint, keypoint is yet to complete, and current input does not match,
# `matched` = False, `completed` = True
# @dev
# @algorithm
# If !`matched` and !`completed` continue on current search branch with current keypoint
# If `matched` and !`completed` continue on current search branch with current keypoint,
# while also forking a new search branch with next keypoint
# If !`matched` and `completed` delete current search branch
# If `matched` and `completed` continue on current search branch with next keypoint
# @dev
# @algorithm
# Open new search branch with root keypoint at every new character in sequence
# @dev
# @algorithm
# When all branches have been computed, resolve conflicting (overlapping) branches,
# giving priority to branches discovered first (unless user specifies otherwise).
# Then, if user does not want recursive search, remove branches nested inside bigger branches.
# @update
# Kill branch early if already overlapped
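The branch rules and the final overlap resolution described above can be sketched in Python as follows. This is a minimal illustration, not Trilobyte's actual code: `Branch`, `advance`, and `resolve` are hypothetical names, spans are assumed to be `(start, end)` pairs with an exclusive end listed in discovery order, and "conflicting" is assumed to mean partially overlapping (neither span contains the other).

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive (assumed representation)


@dataclass(frozen=True)
class Branch:
    """Hypothetical search branch: where it started and which keypoint it is on."""
    start: int
    keypoint: int


def advance(branch: Branch, matched: bool, completed: bool) -> List[Branch]:
    """Return the branches that exist after one input, following the four rules."""
    on_next = replace(branch, keypoint=branch.keypoint + 1)
    if matched and completed:
        return [on_next]            # continue with the next keypoint
    if matched:
        return [branch, on_next]    # continue, and fork onto the next keypoint
    if completed:
        return []                   # delete the branch
    return [branch]                 # continue with the current keypoint


def contains(outer: Span, inner: Span) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def conflicts(a: Span, b: Span) -> bool:
    """Assumed meaning of 'conflicting': the spans overlap but neither nests in the other."""
    overlap = a[0] < b[1] and b[0] < a[1]
    return overlap and not contains(a, b) and not contains(b, a)


def resolve(spans: List[Span], recursive: bool) -> List[Span]:
    """Keep earlier-discovered spans on conflict; drop nested spans if not recursive."""
    kept: List[Span] = []
    for span in spans:
        if not any(conflicts(span, earlier) for earlier in kept):
            kept.append(span)
    if not recursive:
        kept = [s for s in kept
                if not any(o != s and contains(o, s) for o in kept)]
    return kept
```

For example, `resolve([(0, 5), (3, 8), (1, 3), (10, 12)], recursive=True)` returns `[(0, 5), (1, 3), (10, 12)]`: the span `(3, 8)` loses to the earlier `(0, 5)`, while the nested `(1, 3)` survives. With `recursive=False` the nested span is dropped as well, giving `[(0, 5), (10, 12)]`.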
Docs
Trilobyte is still under development; the following commands have mostly been implemented programmatically, but cannot be parsed from plain text yet.
\ / Makes the trilo treat the following expression as normal text
~ ( num ) [ pat ] / Takes the negative of pat, optionally supplying the maximum checking length as a number `num`
* [ text ] / Ignores case in text
{ char1 - char2 } / Command that detects any character between char1 and char2 in Unicode (inclusive)
{ pat1 , pat2 , pat3 , ... } / Command that detects any trilo from the list of alternatives (use `\,` to keep the compiler from treating `,` as a delimiter)
@r ( $var / num ) [ pat ] / Command that detects repeated occurrences of pat (r for repeat), optionally specifying a variable name `var` to hold the number of repetitions matched, or a number `num` that fixes the number of repetitions
@d [ delim_pat ] [ pat ] / Command that detects a list of pat delimited by delim_pat (d for delimited)
@a [ main_pat ] [ rep_pat ] / Command that detects main_pat, optionally followed by repeated occurrences of rep_pat (a for after)
@b [ rep_pat ] [ main_pat ] / Command that detects main_pat, optionally preceded by repeated occurrences of rep_pat (b for before)
%s / The space character
%t / The tab character
%n / The newline character
%r / The return character
%w / Any whitespace character, including newline
%m / Only spaces or tabs (m = s + t, also think of it as mono - all on 1 line!)
%f / Only newline or return (f = n + r, also think of it as flush)
%U / Any uppercase character
%l / Any lowercase character
%a / Any alphabetical character
%d / Any numerical digit
%b / Any alphanumeric character (b for basic)
%v / Shorthand for any sequence of alphanumeric characters that doesn't start with a number (v for variable, as these are generally named in this convention)
%o / Shorthand for repetition of %w (o = r + w, also think of it as omni - everything!)
%e / Shorthand for repetition of %m (e = r + m, also think of it as exclusive)
%x / Shorthand for repetition of %f (x = r + f, also think of X as separation - that's what a sequence of %f is anyway)
%p / Shorthand for any comma-delimited sequence of %v (p for parameters, as these are generally formatted in this convention)
$var [ pat ] / Represents the expression that follows pat as a variable (use '' for pat in order to detect any possible expression)
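To make a few of the entries above concrete, here is a rough Python reading of the `{ char1 - char2 }` range and the `%v` and `%p` shorthands. It does not use Trilobyte's API; the function names are hypothetical, "alphanumeric" is assumed to mean what Python's `str.isalnum` accepts, and `%p` is assumed to allow no whitespace around the commas.

```python
def in_range(ch: str, lo: str, hi: str) -> bool:
    """{ lo - hi }: ch lies between lo and hi by Unicode code point, inclusive."""
    return ord(lo) <= ord(ch) <= ord(hi)


def is_variable(text: str) -> bool:
    """%v: non-empty alphanumeric sequence that does not start with a digit."""
    return text != "" and text.isalnum() and not text[0].isdigit()


def is_parameters(text: str) -> bool:
    """%p: comma-delimited sequence of %v (assumes no spaces around commas)."""
    return all(is_variable(part) for part in text.split(","))
```

Under these assumptions, `is_variable("x1")` is true and `is_variable("1x")` is false, while `is_parameters("a,b2,c")` is true and `is_parameters("a,,b")` is false because of the empty item.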