Project description

Get identifiers, names, paths, URLs and words from the command output.
The xontrib-output-search extension for the xonsh shell uses this library.

If you like the idea, click ⭐ on the repo and stay tuned by watching releases.

Install

pip install -U tokenize-output

Usage

You can use the tokenize-output command as well as import the tokenizers in Python:

from tokenize_output.tokenize_output import *
tokenizer_split("Hello world!")
# {'final': set(), 'new': {'Hello', 'world!'}}

Word tokenizing

echo "Try https://github.com/xxh/xxh" | tokenize-output -p
# Try
# https://github.com/xxh/xxh

JSON, Python dict and JavaScript object tokenizing

echo '{"Try": "xonsh shell"}' | tokenize-output -p
# Try
# shell
# xonsh
# xonsh shell

env tokenizing

echo 'PATH=/one/two:/three/four' | tokenize-output -p
# /one/two
# /one/two:/three/four
# /three/four
# PATH

Development

Tokenizers

A tokenizer is a function that extracts tokens from text.

Priority  Tokenizer  Text                    Tokens
1         dict       {"key": "val as str"}   key, val as str
2         env        PATH=/bin:/etc          PATH, /bin:/etc, /bin, /etc
3         split      Split me \n now!        Split, me, now!
4         strip      {Hello}                 Hello

You can create your own tokenizer and add it to tokenizers_all in tokenize_output.py.

Tokenizing is a recursive process: every tokenizer returns final and new tokens. The final tokens go directly to the resulting list of tokens. The new tokens are fed to all tokenizers again to find more tokens. As a result, if the output contains a mix of JSON and env data, both will be found and tokenized in the appropriate way.
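
A minimal sketch of that recursion, assuming only the {'final', 'new'} contract shown in the Usage section above (illustrative, not the library's actual implementation; the real code in tokenize_output.py also decides which new tokens are kept in the result):

def tokenize_recursive(text, tokenizers):
    result, seen = set(), set()
    queue = [text]
    while queue:
        chunk = queue.pop()
        if chunk in seen:                  # guard against re-tokenizing the same chunk
            continue
        seen.add(chunk)
        for tokenizer in tokenizers:
            tokens = tokenizer(chunk)
            result |= tokens['final']      # final tokens go straight to the result
            queue.extend(tokens['new'])    # new tokens go through all tokenizers again
    return result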

How to add tokenizer

You can start from the env tokenizer:

  1. Prepare the regexp.
  2. Prepare the tokenizer function (a sketch follows this list).
  3. Add the function to the list and to the preset.
  4. Add a test.
  5. Now you can test and debug (see below).
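
As a sketch for step 2, a tokenizer that extracts IPv4 addresses could look like the following. The name tokenizer_ipv4, the regexp, and the **kwargs signature are assumptions for illustration; only the {'final': ..., 'new': ...} return contract comes from the library:

import re
# Hypothetical tokenizer; match the real signature of the tokenizers
# in tokenize_output.py before adding it to tokenizers_all.
_IPV4_RE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')
def tokenizer_ipv4(text, **kwargs):
    # IPv4 addresses cannot be split further, so return them as final tokens.
    return {'final': set(_IPV4_RE.findall(text)), 'new': set()}

After that, add the function to tokenizers_all, cover it with a test in tests/, and debug it as shown below.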

Test and debug

Run tests:

cd ~
git clone https://github.com/anki-code/tokenize-output
cd tokenize-output
python -m pytest tests/

To debug the tokenizer:

echo "Hello world" | ./tokenize-output -p

Related projects

  • xontrib-output-search for the xonsh shell, which uses this library.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tokenize-output-0.4.9.tar.gz (5.5 kB, Source)

Built Distribution

tokenize_output-0.4.9-py3-none-any.whl (6.0 kB, Python 3)

File details

Details for the file tokenize-output-0.4.9.tar.gz.

File metadata

  • Download URL: tokenize-output-0.4.9.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for tokenize-output-0.4.9.tar.gz

  • SHA256: df4c07cbe0987c56b6719ab6c7b20c08b5522e46756b0e0ccdb2cd63cafff48f
  • MD5: a0e93bbc7052993c35ff4621880fbeb7
  • BLAKE2b-256: 21ed08f731cf7e1de976f71ebc63e3435e32963e0fece7964489c48b5c5a0821


File details

Details for the file tokenize_output-0.4.9-py3-none-any.whl.

File hashes

Hashes for tokenize_output-0.4.9-py3-none-any.whl

  • SHA256: a517663da4bb249ddef5be6cd05f454c554dc1a688d0dbef5f3e6d2949899069
  • MD5: 61f911128e1590362124749400c2f54f
  • BLAKE2b-256: c3123eac1663c2d531e62a2a383e790cc7c1e35534acd8279569d284f9c19bc5
