
Very simple tokenizer for teaching purposes


pytokr

Very simple, somewhat stoned tokenizer for teaching purposes.

The current pip-installable version is 0.0.2, but the repository is currently at version 0.1.0.

Its behavior is inspired by the early versions of the easyinput module. It shares some of that module's aims, but not its aim of conceptual consistency with C/C++.

A separate, different evolution of easyinput is yogi.

Install

The usual incantation should work: pip install pytokr (maybe with either "sudo" or "--user" or within a virtual environment).

If that does not work, download or clone the repo, then put the pytokr folder somewhere Python can import it from, for instance the folder containing the script that will use it.

Simplest usage since version 0.1.0

Finds items (simple white-space-separated tokens) in a string-based iterable such as stdin (the default). Line ends count as white space but are otherwise ignored. Usage:

from pytokr import pytokr

then call pytokr to obtain the tokenizer function; give it whatever name you see fit, say, item:

item = pytokr()

Successive calls to item() then return successive tokens from stdin. If no items remain, an EndOfDataError exception is raised. Note that, since white space is ignored, if only white space remains the program is already at end of data.
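The end-of-data rule can be mimicked with a small closure over a single token stream. This is a minimal sketch of the described behavior, not pytokr's code: `make_item` is a hypothetical helper, and the local `EndOfDataError` class only stands in for pytokr's exception of the same name.

```python
class EndOfDataError(Exception):
    """Stand-in for pytokr's exception (assumption: defined locally here)."""


def make_item(source):
    """Hypothetical helper: return an item() function over `source`."""
    # One lazy stream of tokens; str.split() drops white space and newlines.
    stream = (tok for line in source for tok in line.split())

    def item():
        try:
            return next(stream)
        except StopIteration:
            # No tokens left, even if white-space-only lines remained.
            raise EndOfDataError("no items remain") from None

    return item


item = make_item(["7 8", "   \n"])
print(item(), item())   # 7 8
try:
    item()              # only white space is left: end of data
except EndOfDataError:
    print("end of data")
```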

If a different source of items is desired, say source (e.g., a freshly opened file or a list of strings), simply pass it in:

item = pytokr(source)
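Any iterable of strings can serve as a source, because tokenizing only needs to loop over its lines. The sketch below illustrates this with a hypothetical `tokens` generator (not part of pytokr): a list of strings and an in-memory text stream give the same tokens, and an open file, which also yields its lines when iterated, would behave the same way.

```python
import io


def tokens(source):
    """Yield white-space-separated tokens from any iterable of strings.

    Minimal sketch of the behavior described above, not pytokr's own code.
    """
    for line in source:
        # split() with no argument splits on runs of white space, newline
        # included, and never yields empty strings.
        yield from line.split()


text = "12  3\n\n  45 abc\n"

from_list = list(tokens(text.splitlines()))   # a list of strings
from_stream = list(tokens(io.StringIO(text)))  # a file-like object

print(from_list)                 # ['12', '3', '45', 'abc']
print(from_list == from_stream)  # True
```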

In either case, a second output can be requested, namely a function giving an iterator over the items; say you want to name it items:

item, items = pytokr(iter = True)

(Such a call also accepts a source as its first parameter.) You can then write for itm in items(): or build a list with ls = list(items()) and, with some care, avoid depending on the EndOfDataError exception.

The two combine naturally: item() can be called inside a for loop over items(), provided at least one unread item remains. Each such call advances the stream, so the next item the loop sees is the one following those just read inside the loop body. In short, both advance the same iterator.
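The shared-iterator behavior can be sketched as follows. `make_pair` is a hypothetical stand-in for pytokr(iter = True): both returned functions use one generator, so reads made with item() inside the loop are skipped by the loop itself. Unlike pytokr, this simplified item() lets StopIteration escape at end of data instead of raising EndOfDataError.

```python
def make_pair(source):
    """Hypothetical sketch: return (item, items) sharing one token stream."""
    stream = (tok for line in source for tok in line.split())

    def item():
        return next(stream)  # simplified: no EndOfDataError here

    def items():
        return stream  # the same iterator, so both advance it

    return item, items


item, items = make_pair(["1 10", "2 30 40"])
groups = []
for n in items():
    # n tells how many values follow; item() consumes them inside the loop,
    # so the loop's next n is the token after the last value read.
    groups.append([int(item()) for _ in range(int(n))])
print(groups)  # [[10], [30, 40]]
```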

All items are of type str and contain no white space; converting them to int, float, or any other type, when needed, is up to the caller.
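A short illustration of caller-side conversion. Here str.split() merely stands in for the strings that successive item() calls would return; the values are made up for the example.

```python
# Every token arrives as a str; conversion is left to the caller.
tokens = "31 12 1999 2.5".split()
day, month, year = map(int, tokens[:3])
ratio = float(tokens[3])
print(day + 1, ratio * 2)  # 32 5.0
```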

Example

Based on Jutge problem P29448, Correct Dates (with spoilers removed):

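from pytokr import pytokr

def correct_date(d, m, y):
    # Spoiler removed: write your own check, returning True for a valid date.
    return False

item, items = pytokr(iter = True)  # reads stdin; a source such as an open file may be passed first
for d in items():
    m, y = item(), item()  # each date is a day, month, year triple
    if correct_date(int(d), int(m), int(y)):
        print("Correct Date")
    else:
        print("Incorrect Date")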

Deprecated usage of versions 0.0.*

These versions were used differently. For some backward compatibility, version 0.1.0 still supports the old usage, but prints a deprecation message to stderr. The old usage was:

from pytokr import item, items

(or only one of them, as convenient). Then item() provides the next item from stdin, and for w in items() iterates over whatever remains there. Calling item() at end of file raises an EndOfDataError exception.

Old, deprecated usage on other string-based iterables

Again, this still works in 0.1.0 but prints a deprecation message to stderr:

from pytokr import make_tokr

Then, if g is an iterable of strings, such as an open file or a list of strings, the call

items, item = make_tokr(g)

provides adapted versions of item and items that read from g instead of from stdin.

