Project description

Get identifiers, names, paths, URLs and words from the command output.
The xontrib-output-search extension for the xonsh shell uses this library.

If you like the idea, click ⭐ on the repo and stay tuned by watching releases.

Install

pip install -U tokenize-output

Usage

You can use the tokenize-output command as well as import the tokenizers in Python:

from tokenize_output.tokenize_output import *
tokenizer_split("Hello world!")
# {'final': set(), 'new': {'Hello', 'world!'}}

Word tokenizing

echo "Try https://github.com/xxh/xxh" | tokenize-output -p
# Try
# https://github.com/xxh/xxh

JSON, Python dict and JavaScript object tokenizing

echo '{"Try": "xonsh shell"}' | tokenize-output -p
# Try
# shell
# xonsh
# xonsh shell

env tokenizing

echo 'PATH=/one/two:/three/four' | tokenize-output -p
# /one/two
# /one/two:/three/four
# /three/four
# PATH

Development

Tokenizers

A tokenizer is a function that extracts tokens from text.

Priority  Tokenizer  Text                    Tokens
1         dict       {"key": "val as str"}   key, val as str
2         env        PATH=/bin:/etc          PATH, /bin:/etc, /bin, /etc
3         split      Split me \n now!        Split, me, now!
4         strip      {Hello}                 Hello

You can create your own tokenizer and add it to tokenizers_all in tokenize_output.py.

Tokenizing is a recursive process: every tokenizer returns final and new tokens. The final tokens go directly to the resulting list of tokens. The new tokens are fed to all tokenizers again to find more tokens. As a result, if the output contains a mix of JSON and env data, both will be found and tokenized in the appropriate way.
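
A minimal sketch of that recursion, assuming only the {'final', 'new'} contract shown in the Usage section above (illustrative, not the library's actual implementation; the real code in tokenize_output.py also decides which new tokens are kept in the result):

def tokenize_recursive(text, tokenizers):
    result, seen = set(), set()
    queue = [text]
    while queue:
        chunk = queue.pop()
        if chunk in seen:                  # guard against re-tokenizing the same chunk
            continue
        seen.add(chunk)
        for tokenizer in tokenizers:
            tokens = tokenizer(chunk)
            result |= tokens['final']      # final tokens go straight to the result
            queue.extend(tokens['new'])    # new tokens go through all tokenizers again
    return result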

How to add tokenizer

You can start from the env tokenizer:

  1. Prepare the regexp.
  2. Prepare the tokenizer function (a sketch follows this list).
  3. Add the function to the list and to the preset.
  4. Add a test.
  5. Now you can test and debug (see below).
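
As a sketch for step 2, a tokenizer that extracts IPv4 addresses could look like the following. The name tokenizer_ipv4, the regexp, and the **kwargs signature are assumptions for illustration; only the {'final': ..., 'new': ...} return contract comes from the library:

import re
# Hypothetical tokenizer; match the real signature of the tokenizers
# in tokenize_output.py before adding it to tokenizers_all.
_IPV4_RE = re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')
def tokenizer_ipv4(text, **kwargs):
    # IPv4 addresses cannot be split further, so return them as final tokens.
    return {'final': set(_IPV4_RE.findall(text)), 'new': set()}

After that, add the function to tokenizers_all, cover it with a test in tests/, and debug it as shown below.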

Test and debug

Run tests:

cd ~
git clone https://github.com/anki-code/tokenize-output
cd tokenize-output
python -m pytest tests/

To debug the tokenizer:

echo "Hello world" | ./tokenize-output -p

Related projects

  • xontrib-output-search for the xonsh shell, which uses this library.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tokenize-output-0.4.9.tar.gz (5.5 kB, Source)

Built Distribution

tokenize_output-0.4.9-py3-none-any.whl (6.0 kB, Python 3)

File details

Details for the file tokenize-output-0.4.9.tar.gz.

File metadata

  • Download URL: tokenize-output-0.4.9.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for tokenize-output-0.4.9.tar.gz

  • SHA256: df4c07cbe0987c56b6719ab6c7b20c08b5522e46756b0e0ccdb2cd63cafff48f
  • MD5: a0e93bbc7052993c35ff4621880fbeb7
  • BLAKE2b-256: 21ed08f731cf7e1de976f71ebc63e3435e32963e0fece7964489c48b5c5a0821


File details

Details for the file tokenize_output-0.4.9-py3-none-any.whl.

File hashes

Hashes for tokenize_output-0.4.9-py3-none-any.whl

  • SHA256: a517663da4bb249ddef5be6cd05f454c554dc1a688d0dbef5f3e6d2949899069
  • MD5: 61f911128e1590362124749400c2f54f
  • BLAKE2b-256: c3123eac1663c2d531e62a2a383e790cc7c1e35534acd8279569d284f9c19bc5
