Get identifiers, names, paths, URLs and words from the command output.
The xontrib-output-search xontrib for the xonsh shell uses this library.
If you like the idea of tokenize-output, click ⭐ on the repo and stay tuned by watching releases.
Install
pip install -U tokenize-output
Usage
Words tokenizing
$ echo "Try https://github.com/xxh/xxh" | tokenize-output -p
Try
https://github.com/xxh/xxh
JSON, Python dict and JavaScript object tokenizing
$ echo '{"Try": "xonsh shell"}' | tokenize-output -p
Try
shell
xonsh
xonsh shell
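The dict tokenizing above can be approximated with a short sketch. This is illustrative code, not the library's actual implementation: parse the JSON, emit every key and string value, and also split multi-word values into words.

```python
import json

def dict_tokens(text):
    """Illustrative sketch of dict tokenizing (not the library's real code):
    extract keys and string values, and split multi-word values into words."""
    data = json.loads(text)
    tokens = set()
    for key, value in data.items():
        tokens.add(key)
        if isinstance(value, str):
            tokens.add(value)
            tokens.update(value.split())  # "xonsh shell" -> "xonsh", "shell"
    return tokens

print(sorted(dict_tokens('{"Try": "xonsh shell"}')))
# ['Try', 'shell', 'xonsh', 'xonsh shell']
```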
env tokenizing
$ echo 'PATH=/one/two:/three/four' | tokenize-output -p
/one/two
/one/two:/three/four
/three/four
PATH
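The env tokenizing can be sketched the same way. Again, this is an illustration of the behavior shown above, not the library's real code: split on the first `=`, keep the name and the full value, and break the value on `:`.

```python
def env_tokens(text):
    """Illustrative sketch of env-style tokenizing (not the library's real code):
    NAME=val1:val2 yields the name, the full value, and each colon-separated part."""
    name, _, value = text.partition("=")
    tokens = {name, value}
    if ":" in value:
        tokens.update(value.split(":"))
    return tokens

print(sorted(env_tokens("PATH=/one/two:/three/four")))
# ['/one/two', '/one/two:/three/four', '/three/four', 'PATH']
```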
Development
Tokenizers
A tokenizer is a function that extracts tokens from text.
Priority | Tokenizer | Text | Tokens
---|---|---|---
1 | dict | `{"key": "val as str"}` | `['key', 'val as str']`
2 | env | `PATH=/bin:/etc` | `['PATH', '/bin:/etc', '/bin', '/etc']`
3 | split | `Split me \n now!` | `['Split', 'me', 'now!']`
4 | strip | `{Hello}` | `['Hello']`
You can create your own tokenizer and add it to `tokenizers_all` in `tokenize_output.py`.

Tokenizing is a recursive process in which every tokenizer returns *final* and *new* tokens. The final tokens go directly to the result list of tokens. The new tokens are fed to all tokenizers again to find more tokens. As a result, if the output contains a mix of JSON and env data, both will be found and tokenized appropriately.
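The recursive final/new process can be sketched as follows. The function names and the `{"final": ..., "new": ...}` dict shape here are illustrative assumptions, not the library's actual API: tokenizers are tried in priority order, the first match wins, new tokens are fed back through all tokenizers, and unmatched tokens become final.

```python
def split_tok(token):
    """Toy "split" tokenizer: multi-word text yields each word as a new token."""
    words = token.split()
    if len(words) > 1:
        return {"final": set(), "new": set(words)}
    return {"final": set(), "new": set()}

def strip_tok(token):
    """Toy "strip" tokenizer: {Hello} yields Hello as a new token."""
    if token.startswith("{") and token.endswith("}"):
        return {"final": set(), "new": {token[1:-1]}}
    return {"final": set(), "new": set()}

def tokenize(text, tokenizers):
    """Run tokenizers in priority order; the first that matches wins.
    New tokens go through all tokenizers again; unmatched tokens are final."""
    result, queue = set(), [text]
    while queue:
        token = queue.pop()
        for tok in tokenizers:
            out = tok(token)
            if out["final"] or out["new"]:
                result |= out["final"]
                queue.extend(out["new"])
                break
        else:
            result.add(token)  # no tokenizer matched: the token is final
    return result

print(sorted(tokenize("{Hello} world", [split_tok, strip_tok])))
# ['Hello', 'world']
```

Note how `{Hello} world` is first split into `{Hello}` and `world`, and `{Hello}` is then stripped on the second pass: the "new" token went through all tokenizers again.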
Test and debug
Run tests:
cd ~
git clone https://github.com/tokenizer/tokenize-output
cd tokenize-output
python -m pytest tests/
To debug the tokenizer:
echo "Hello world" | ./tokenize_output -p