tokenize-rt

A wrapper around the stdlib `tokenize` which roundtrips.
The stdlib `tokenize` module does not properly roundtrip. This wrapper around the stdlib provides two additional tokens, `ESCAPED_NL` and `UNIMPORTANT_WS`, and a `Token` data type. Use `src_to_tokens` and `tokens_to_src` to roundtrip.
This library is useful if you're writing a refactoring tool based on Python tokenization.
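For example, the core guarantee is that tokenizing and un-tokenizing returns the source unchanged:

```python
from tokenize_rt import src_to_tokens, tokens_to_src

src = 'x = \\\n    5  # comment\n'
assert tokens_to_src(src_to_tokens(src)) == src  # lossless roundtrip
```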
Installation
pip install tokenize-rt
Usage
datastructures
tokenize_rt.Offset(line=None, utf8_byte_offset=None)
A token offset, useful as a key when cross-referencing the `ast` and the tokenized source.
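A minimal sketch of that cross-referencing, assuming the usual CPython behavior that `ast` reports `col_offset` as a utf-8 byte offset:

```python
import ast
from tokenize_rt import Offset, src_to_tokens

src = 'x = 1\n'
name_node = ast.parse(src).body[0].targets[0]
# build a key from the ast node's position...
key = Offset(line=name_node.lineno, utf8_byte_offset=name_node.col_offset)
# ...and find the token at that position
token = next(t for t in src_to_tokens(src) if t.offset == key)
assert token.src == 'x'
```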
tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)
Construct a token (an example follows the list):

- `name`: one of the token names listed in `token.tok_name`, or `ESCAPED_NL`, or `UNIMPORTANT_WS`
- `src`: the token's source as text
- `line`: the line number that this token appears on. This will be `None` for `ESCAPED_NL` and `UNIMPORTANT_WS` tokens.
- `utf8_byte_offset`: the utf8 byte offset that this token appears on in the line. This will be `None` for `ESCAPED_NL` and `UNIMPORTANT_WS` tokens.
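For example:

```python
from tokenize_rt import UNIMPORTANT_WS, Token

# a token with a position
Token(name='NAME', src='foo', line=1, utf8_byte_offset=0)
# a whitespace token: line / utf8_byte_offset default to None
Token(name=UNIMPORTANT_WS, src='    ')
```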
tokenize_rt.Token.offset
Retrieves an `Offset` for this token.
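For example:

>>> Token('NAME', 'foo', 1, 0).offset
Offset(line=1, utf8_byte_offset=0)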
converting to and from `Token` representations
tokenize_rt.src_to_tokens(text) -> List[Token]
tokenize_rt.tokens_to_src(Sequence[Token]) -> text
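A sketch of what the token stream looks like for a tiny program (the exact stream may vary between versions):

```python
from tokenize_rt import src_to_tokens, tokens_to_src

tokens = src_to_tokens('x = 5\n')
# expected names, assuming an ENCODING token leads the stream and the
# spaces around '=' come through as UNIMPORTANT_WS:
assert [t.name for t in tokens] == [
    'ENCODING', 'NAME', 'UNIMPORTANT_WS', 'OP', 'UNIMPORTANT_WS',
    'NUMBER', 'NEWLINE', 'ENDMARKER',
]
assert tokens_to_src(tokens) == 'x = 5\n'
```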
additional tokens added by tokenize-rt
tokenize_rt.ESCAPED_NL
tokenize_rt.UNIMPORTANT_WS
helpers
tokenize_rt.NON_CODING_TOKENS
A frozenset containing tokens which may appear between others while not affecting control flow or code (an example of skipping them follows the list):

- `COMMENT`
- `ESCAPED_NL`
- `NL`
- `UNIMPORTANT_WS`
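For example, a sketch of scanning past non-coding tokens to find the next meaningful token:

```python
from tokenize_rt import NON_CODING_TOKENS, src_to_tokens

tokens = src_to_tokens('x = (  # comment\n    1\n)\n')
# start just after the open paren...
i = next(i for i, token in enumerate(tokens) if token.src == '(') + 1
# ...and skip the comment, whitespace, and newline inside the parens
while tokens[i].name in NON_CODING_TOKENS:
    i += 1
assert tokens[i].src == '1'
```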
tokenize_rt.parse_string_literal(text) -> Tuple[str, str]
parse a string literal into its prefix and string content
>>> parse_string_literal('f"foo"')
('f', '"foo"')
tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]
yields `(index, token)` pairs. Useful for rewriting source.
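For example, iterating from the end means in-place edits don't invalidate the indices still to be visited:

```python
from tokenize_rt import reversed_enumerate, src_to_tokens, tokens_to_src

tokens = src_to_tokens('x = 5\n')
for i, token in reversed_enumerate(tokens):
    if token.name == 'NUMBER':
        # Token is a namedtuple, so _replace builds a modified copy
        tokens[i] = token._replace(src='6')
assert tokens_to_src(tokens) == 'x = 6\n'
```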
Differences from tokenize
- tokenize-rt adds `ESCAPED_NL` for a backslash-escaped newline "token"
- tokenize-rt adds `UNIMPORTANT_WS` for whitespace (discarded in `tokenize`); both extra tokens are shown in the example after this list
- tokenize-rt normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see `Token('STRING', "f'foo'", ...)` even in python 2
- tokenize-rt normalizes `DEDENT` tokens so they appear before the indentation instead of after
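For example:

```python
from tokenize_rt import ESCAPED_NL, UNIMPORTANT_WS, src_to_tokens

names = [token.name for token in src_to_tokens('x = \\\n    5\n')]
assert ESCAPED_NL in names      # the backslash-escaped newline
assert UNIMPORTANT_WS in names  # the spaces around '='
```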
Sample usage
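A minimal sketch of the kind of rewriter these APIs enable; the `remove_u_prefix` helper here is hypothetical, written for illustration only:

```python
from tokenize_rt import (
    parse_string_literal,
    reversed_enumerate,
    src_to_tokens,
    tokens_to_src,
)


def remove_u_prefix(src):
    """Drop u/U string prefixes, e.g. u'foo' -> 'foo' (illustrative)."""
    tokens = src_to_tokens(src)
    for i, token in reversed_enumerate(tokens):
        if token.name == 'STRING':
            prefix, rest = parse_string_literal(token.src)
            if 'u' in prefix.lower():
                new_prefix = prefix.replace('u', '').replace('U', '')
                tokens[i] = token._replace(src=new_prefix + rest)
    return tokens_to_src(tokens)


assert remove_u_prefix("x = u'foo'\n") == "x = 'foo'\n"
```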