This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description

tokenizertools

Text file tokenizers that support multiple start states and lexical tracking implemented as standard Python iterators.

Classes provided:

  • Tokenizer – Base class.
  • RegexTokenizer – Python re-based tokenizer.
  • TokenizeAhead – A look-ahead iterator that can wrap any tokenizer.

Overview

The class RegexTokenizer implements a tokenizer using the re module to recognize tokens in the input stream. Tokens and actions are defined by rules. The tokenizer calls user action functions associated with each rule. In most cases, the user action function can simply be a @classmethod constructor of a user-provided token class.

Each rule is specified as a tuple. The first element of the tuple is a regular expression that will be compiled by re and used to match a token. The second element of the tuple is a user-provided callable that will be passed the recognized text, along with the current lexical position.

In this example, the user class Token implements constructors as @classmethod functions, and these serve as the callables in each lexical rule.

Rules are specified in the class variable spec, which is a list of rules.

import tokenizertools as tt
class MyTokenizer(tt.RegexTokenizer):
    spec = [
        (r'[a-zA-Z][a-zA-Z0-9_]*',Token.type_ident), # idents and keywords
        (r'[0-9]+\.[0-9]+',Token.type_float), # floats
        (r'[0-9]+', Token.type_int), # ints
        (r'\s*',None), # ignore white space
    ]

Nothing else needs to be defined. All methods are inherited. Instantiate a lexer and commence parsing. The specification rules are compiled and cached on creation of the first instance.:

tokenizer = MyTokenizer()
with open('foo.bar') as f:
    token_stream = Lookahead(tokenizer.lex(f, f.name))
    compiled_stuff = my_parser.parse(token_stream)

tokenizertools news

Release history.

15-Sept-2016

1.0 Time to put some mileage on it.

Clean ups, PEP8-ification. Improved exception handling for bad begin() state.

16-June-2014

0.1a Initial alpha.

Release History

Release History

1.0

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

0.1a

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
tokenizertools-1.0.tar.gz (10.9 kB) Copy SHA256 Checksum SHA256 Source Sep 16, 2016

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS HPE HPE Development Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting