
Project description

Get identifiers, names, paths, URLs and words from the command output.
The xontrib-output-search extension for the xonsh shell uses this library.

If you like the idea, click ⭐ on the repo and stay tuned by watching releases.

Install

pip install -U tokenize-output

Usage

Words tokenizing

$ echo "Try https://github.com/xxh/xxh" | tokenize-output -p
Try
https://github.com/xxh/xxh

JSON, Python dict and JavaScript object tokenizing

$ echo '{"Try": "xonsh shell"}' | tokenize-output -p
Try
shell
xonsh
xonsh shell

env tokenizing

$ echo 'PATH=/one/two:/three/four' | tokenize-output -p
/one/two
/one/two:/three/four
/three/four
PATH

Development

Tokenizers

A tokenizer is a function that extracts tokens from text.

Priority  Tokenizer  Text                    Tokens
1         dict       {"key": "val as str"}   ['key', 'val as str']
2         env        PATH=/bin:/etc          ['PATH', '/bin:/etc', '/bin', '/etc']
3         split      Split me \n now!        ['Split', 'me', 'now!']
4         strip      {Hello}                 ['Hello']

You can create your own tokenizer and add it to tokenizers_all in tokenize_output.py.

Tokenizing is a recursive process in which every tokenizer returns final and new tokens. Final tokens go directly to the result list. New tokens are fed to all tokenizers again to find further tokens. As a result, if the output contains a mix of JSON and env data, both are found and tokenized in the appropriate way.
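
To make the final/new contract concrete, here is a minimal, self-contained Python sketch. It is not the library's actual code: the toy tokenize_env and tokenize_split functions, their exact return shape, and the driver loop are assumptions made for the example.

# Illustrative sketch only -- not the actual tokenize-output implementation.
# Each toy tokenizer returns a dict with 'final' tokens (added straight to the
# result) and 'new' tokens (fed back to all tokenizers for another pass).

def tokenize_env(text):
    # Handle KEY=VALUE strings; also split colon-separated values like PATH.
    final, new = set(), set()
    if "=" in text and " " not in text:
        key, _, value = text.partition("=")
        parts = set(value.split(":"))
        final |= {key, value} | parts
        new |= parts  # pieces may contain further structure
    return {"final": final, "new": new}

def tokenize_split(text):
    # Split on whitespace; the pieces are kept and also re-tokenized.
    parts = set(text.split())
    if len(parts) > 1:
        return {"final": parts, "new": parts}
    return {"final": set(), "new": set()}

TOKENIZERS = [tokenize_env, tokenize_split]  # ordered by priority

def tokenize(text):
    result, queue, seen = set(), [text], set()
    while queue:
        chunk = queue.pop()
        if chunk in seen:  # avoid re-processing the same token forever
            continue
        seen.add(chunk)
        for tok in TOKENIZERS:
            out = tok(chunk)
            result |= out["final"]       # final tokens go to the result
            queue.extend(out["new"])     # new tokens are tokenized again
    return result

print(sorted(tokenize("PATH=/one/two:/three/four")))
# ['/one/two', '/one/two:/three/four', '/three/four', 'PATH']

The real library's tokenizers are more elaborate (nested dicts, URLs, quoting); the sketch only shows how final and new tokens flow through the recursion.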

Test and debug

Run tests:

cd ~
git clone https://github.com/tokenizer/tokenize-output
cd tokenize-output
python -m pytest tests/

To debug the tokenizer:

echo "Hello world" | ./tokenize_outupt -p

Related projects

  • xontrib-output-search: a xonsh shell extension built on this library.



Download files


Source Distribution

tokenize-output-0.4.6.tar.gz (5.0 kB)

Uploaded Source

Built Distribution

tokenize_output-0.4.6-py3-none-any.whl (5.6 kB)

Uploaded Python 3

File details

Details for the file tokenize-output-0.4.6.tar.gz.

File metadata

  • Download URL: tokenize-output-0.4.6.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for tokenize-output-0.4.6.tar.gz
Algorithm Hash digest
SHA256 8e11a3b602c8e08fbdd0139057e44f2069e814785ad197a7fdfae0eda9f54d89
MD5 cf0869e9fca6d0038b16d5789e15a77f
BLAKE2b-256 5ce03b785c782d401ced2d586f7b869105b5a916db1bf6c4bfb7f3cf623808f5


File details

Details for the file tokenize_output-0.4.6-py3-none-any.whl.

File metadata

  • Download URL: tokenize_output-0.4.6-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for tokenize_output-0.4.6-py3-none-any.whl
Algorithm Hash digest
SHA256 f76689de4bbbc6320d66d8aca9cd36ab368df532bbb02ab3266cab2c82ffe05b
MD5 d1beb02494d13fe546f2c2e088b669cc
BLAKE2b-256 ca546162ae62aacf8f61ce30d34bf4349097278834af901e6bb29d7f350ee98b

