Get identifiers, names, paths, URLs and words from the command output.
The xontrib-output-search extension for the xonsh shell uses this library.
If you like the idea, click ⭐ on the repo and stay tuned by watching releases.
Install
pip install -U tokenize-output
Usage
You can use the tokenize-output command, or import the tokenizers in Python:
from tokenize_output.tokenize_output import *
tokenizer_split("Hello world!")
# {'final': set(), 'new': {'Hello', 'world!'}}
Words tokenizing
echo "Try https://github.com/xxh/xxh" | tokenize-output -p
# Try
# https://github.com/xxh/xxh
JSON, Python dict and JavaScript object tokenizing
echo '{"Try": "xonsh shell"}' | tokenize-output -p
# Try
# shell
# xonsh
# xonsh shell
env tokenizing
echo 'PATH=/one/two:/three/four' | tokenize-output -p
# /one/two
# /one/two:/three/four
# /three/four
# PATH
Development
Tokenizers
A tokenizer is a function that extracts tokens from text.
Priority | Tokenizer | Text example | Tokens |
---|---|---|---|
1 | dict | {"key": "val as str"} | key , val as str |
2 | env | PATH=/bin:/etc | PATH , /bin:/etc , /bin , /etc |
3 | split | Split me \n now! | Split , me , now! |
4 | strip | {Hello}!. | Hello |
You can create your own tokenizer and add it to tokenizers_all in tokenize_output.py.
Tokenizing is a recursive process in which every tokenizer returns final and new tokens.
The final tokens go directly to the result list of tokens. The new tokens are passed to all
tokenizers again to find further tokens. As a result, if the output contains a mix of JSON and env data,
it will be found and tokenized in the appropriate way.
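The recursion described above can be sketched roughly as follows. This is a toy illustration, not the library's code: the names toy_dict, toy_split, toy_strip and tokenize are hypothetical, and the rule that the first tokenizer (in priority order) to produce anything handles the text is an assumption. Only the {'final': ..., 'new': ...} return shape is taken from the usage example above.

```python
import json


def toy_dict(text):
    """Toy dict tokenizer: keys and string values of a JSON object."""
    try:
        data = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return {'final': set(), 'new': set()}
    if not isinstance(data, dict):
        return {'final': set(), 'new': set()}
    tokens = set(data) | {v for v in data.values() if isinstance(v, str)}
    # Keep every string as a token and also send it around for another pass.
    return {'final': set(tokens), 'new': set(tokens)}


def toy_split(text):
    """Toy split tokenizer: whitespace-separated pieces become new tokens."""
    parts = text.split()
    if len(parts) < 2:
        return {'final': set(), 'new': set()}
    return {'final': set(), 'new': set(parts)}


def toy_strip(text):
    """Toy strip tokenizer: drop surrounding brackets and punctuation."""
    stripped = text.strip('{}[]()<>"\'!.,;:')
    return {'final': {stripped} if stripped else set(), 'new': set()}


def tokenize(text, tokenizers):
    """Collect final tokens and feed new tokens back to all tokenizers."""
    for tokenizer in tokenizers:
        found = tokenizer(text)
        if not (found['final'] or found['new']):
            continue  # this tokenizer does not apply, try the next one
        result = set(found['final'])
        for token in found['new']:
            if token != text:  # guard against re-processing the same text
                result |= tokenize(token, tokenizers)
        return result
    return set()


TOY_TOKENIZERS = [toy_dict, toy_split, toy_strip]
print(sorted(tokenize('{"Try": "xonsh shell"}', TOY_TOKENIZERS)))
# ['Try', 'shell', 'xonsh', 'xonsh shell']
```

In this sketch the value "xonsh shell" is kept whole as a final token and is also passed back through the tokenizers, where the split step breaks it into separate words.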
How to add tokenizer
You can start from the env tokenizer:
- Prepare regexp
- Prepare tokenizer function
- Add the function to the list and to the preset.
- Add test.
- Now you can test and debug (see below).
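As a rough illustration of these steps, here is a minimal sketch of an env-like tokenizer. Everything in it is an assumption made for the example: the regexp, the names ENV_RE, my_tokenizer_env and my_tokenizers, and the choice of which tokens are final and which are new. The real env tokenizer and the tokenizers_all list in tokenize_output.py may look different.

```python
import re

# Step 1: prepare a regexp for NAME=value (hypothetical pattern).
ENV_RE = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)=(\S+)')


# Step 2: prepare a tokenizer function that returns final and new tokens.
def my_tokenizer_env(text):
    match = ENV_RE.fullmatch(text.strip())
    if match is None:
        return {'final': set(), 'new': set()}
    name, value = match.groups()
    # Colon-separated pieces are handed back for further tokenizing.
    parts = {part for part in value.split(':') if part}
    return {'final': {name, value}, 'new': parts}


# Step 3: add the function to a list (this list stands in for tokenizers_all).
my_tokenizers = [my_tokenizer_env]


# Step 4: add a test in pytest style.
def test_my_tokenizer_env():
    found = my_tokenizer_env('PATH=/bin:/etc')
    assert found['final'] | found['new'] == {'PATH', '/bin:/etc', '/bin', '/etc'}
    assert my_tokenizer_env('no equals sign') == {'final': set(), 'new': set()}
```

The test expects the same four tokens that the tokenizer table lists for PATH=/bin:/etc.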
Test and debug
Run tests:
cd ~
git clone https://github.com/anki-code/tokenize-output
cd tokenize-output
python -m pytest tests/
To debug the tokenizer:
echo "Hello world" | ./tokenize-output -p
Related projects