Skip to main content
This is a pre-production deployment of Warehouse. Changes made here affect the production instance of PyPI (
Help us improve Python packaging - Donate today!

simple tokenization and parsing library

Project Description
`nr.strex` String Processing Library

Strex is a simple library to tokenize and parse small or even
complex languages. It has been developed specifically for the
recursive decent parser technique, although it might work well
with other parsing techniques as well.


- `Scanner` class for very simple tasks
- `Lexer` class for tokenizing text input
- Control tokenization via `Rule`s
- Contextual tokenization


To describe a set of rules to tokenize a text, you simply have to
create a list. You can re-use that list with multiple lexers.

rules = [
nr.strex.Keyword('bopen', '{'),
nr.strex.Keyword('bclose', '}'),
nr.strex.Keyword('semicolon', ';'),
nr.strex.Keyword('group', 'GROUP'),
nr.strex.Keyword('menu', 'MENU'),
nr.strex.Charset('ident', string.ascii_letters + string.digits + '_'),
nr.strex.Charset('ws', string.whitespace, skip=True),

Now you can use the lexer to tokenize the stream. Iterating over the
lexer is like getting the next token until the end of the input (that
will yield a token with type`'eof'`). In a context sensitive parser
architecture, you can pass arguments to `` or use the
`Lexer.accept()` method to prioritize or expect a specific token type.

text = '''
} '''
scanner = nr.strex.Scanner(text)
lexer = nr.strex.Lexer(scanner, rules)
for token in lexer:


Token(type='group', cursor=Cursor(index=1, lineno=2, colno=0), value='GROUP')
Token(type='bopen', cursor=Cursor(index=7, lineno=2, colno=6), value='{')
Token(type='menu', cursor=Cursor(index=11, lineno=3, colno=2), value='MENU')
Token(type='ident', cursor=Cursor(index=16, lineno=3, colno=7), value='FOOBAR')
Token(type='semicolon', cursor=Cursor(index=22, lineno=3, colno=13), value=';')
Token(type='menu', cursor=Cursor(index=26, lineno=4, colno=2), value='MENU')
Token(type='ident', cursor=Cursor(index=31, lineno=4, colno=7), value='SPAM')
Token(type='semicolon', cursor=Cursor(index=35, lineno=4, colno=11), value=';')
Token(type='group', cursor=Cursor(index=39, lineno=5, colno=2), value='GROUP')
Token(type='bopen', cursor=Cursor(index=45, lineno=5, colno=8), value='{')
Token(type='menu', cursor=Cursor(index=47, lineno=5, colno=10), value='MENU')
Token(type='ident', cursor=Cursor(index=52, lineno=5, colno=15), value='HAM')
Token(type='semicolon', cursor=Cursor(index=55, lineno=5, colno=18), value=';')
Token(type='bclose', cursor=Cursor(index=57, lineno=5, colno=20), value='}')
Token(type='bclose', cursor=Cursor(index=59, lineno=6, colno=0), value='}')

## Changelog


- Added `Lexer.accept(*names)` method and `*expectation)`
argument for specifying a specific and prioritized rule (or multiple
rules) that can be accepted or are required in the current context
- Added `Lexer.rules_map` member (read only) and `Lexer.update_map()`
function which are important for the speed of the lexer
- Added `UnexpectedTokenError` class and proper string representation
for the `TokenizationError` class


Copyright (C) 2015 Niklas Rosenstein.
Release History

Release History

This version
History Node


History Node


History Node


History Node


Download Files

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
nr.strex-1.4.1.tar.gz (8.1 kB) Copy SHA256 Checksum SHA256 Source Jan 30, 2016

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting