A wrapper around the stdlib `tokenize` which roundtrips.
Project description
tokenize-rt
The stdlib tokenize module does not properly roundtrip. This wrapper around the stdlib provides two additional tokens ESCAPED_NL and UNIMPORTANT_WS, and a Token data type. Use src_to_tokens and tokens_to_src to roundtrip.
This library is useful if you're writing a refactoring tool based on the python tokenization.
Installation
pip install tokenize-rt
Usage
datastructures
tokenize_rt.Offset(line=None, utf8_byte_offset=None)
A token offset, useful as a key when cross referencing the ast and the tokenized source.
tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)
Construct a token
- name: one of the token names listed in token.tok_name or ESCAPED_NL or UNIMPORTANT_WS
- src: the token's source as text
- line: the line number that this token appears on
- utf8_byte_offset: the utf8 byte offset that this token appears on in the line
tokenize_rt.Token.offset
Retrieves an Offset for this token.
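For example (an illustrative snippet, not taken from the project's documentation), a token's offset can be compared with an Offset built by hand:

>>> from tokenize_rt import Offset, Token
>>> tok = Token('NAME', 'foo', line=1, utf8_byte_offset=4)
>>> tok.offset == Offset(line=1, utf8_byte_offset=4)
True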
converting to and from Token representations
tokenize_rt.src_to_tokens(text: str) -> List[Token]
tokenize_rt.tokens_to_src(Iterable[Token]) -> str
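Roundtripping is the core guarantee: feeding the output of src_to_tokens back into tokens_to_src reproduces the original source exactly. For example:

>>> from tokenize_rt import src_to_tokens, tokens_to_src
>>> src = 'x = 1  # comment\n'
>>> tokens_to_src(src_to_tokens(src)) == src
True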
additional tokens added by tokenize-rt
tokenize_rt.ESCAPED_NL
tokenize_rt.UNIMPORTANT_WS
helpers
tokenize_rt.NON_CODING_TOKENS
A frozenset containing tokens which may appear between others while not affecting control flow or code:
- COMMENT
- ESCAPED_NL
- NL
- UNIMPORTANT_WS
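For instance, NON_CODING_TOKENS makes it easy to skip formatting-only tokens while scanning a token stream. A minimal sketch (the helper next_coding_token is illustrative, not part of the library):

from tokenize_rt import NON_CODING_TOKENS

def next_coding_token(tokens, i):
    # advance past comments, escaped newlines, NLs and unimportant whitespace
    while tokens[i].name in NON_CODING_TOKENS:
        i += 1
    return i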
tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]
parse a string literal into its prefix and string content
>>> parse_string_literal('f"foo"')
('f', '"foo"')
tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]
yields (index, token) pairs. Useful for rewriting source.
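Iterating in reverse means that an edit at index i never shifts the indices of tokens that have not yet been visited. A minimal sketch of a rename pass (assumes Token supports NamedTuple-style _replace; old and new are placeholder names):

from tokenize_rt import reversed_enumerate, src_to_tokens, tokens_to_src

def rename(src, old, new):
    tokens = src_to_tokens(src)
    for i, token in reversed_enumerate(tokens):
        if token.name == 'NAME' and token.src == old:
            # swap in the new source text for this token only
            tokens[i] = token._replace(src=new)
    return tokens_to_src(tokens)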
tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]
find the indices of the string parts of a (joined) string literal
- i should start at the end of the string literal
- returns () (an empty tuple) for things which are not string literals
>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)
Differences from tokenize
- tokenize-rt adds ESCAPED_NL for a backslash-escaped newline "token"
- tokenize-rt adds UNIMPORTANT_WS for whitespace (discarded in tokenize)
- tokenize-rt normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see Token('STRING', "f'foo'", ...) even in python 2.
- tokenize-rt normalizes python 2 long literals (4l / 4L) and octal literals (0755) in python 3 (for easier rewriting of python 2 code while running python 3).
Sample usage
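A rough end-to-end sketch (not taken from a real project) tying the helpers above together: a tiny rewriter that drops u prefixes from string literals:

from tokenize_rt import (
    parse_string_literal,
    reversed_enumerate,
    src_to_tokens,
    tokens_to_src,
)

def remove_u_prefix(src):
    tokens = src_to_tokens(src)
    for i, token in reversed_enumerate(tokens):
        if token.name == 'STRING':
            prefix, rest = parse_string_literal(token.src)
            if 'u' in prefix.lower():
                # keep any other prefix characters, drop only u / U
                new_prefix = prefix.replace('u', '').replace('U', '')
                tokens[i] = token._replace(src=new_prefix + rest)
    return tokens_to_src(tokens)

print(remove_u_prefix("x = u'hello'\n"))  # x = 'hello'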
File details
Details for the file tokenize_rt-6.1.0.tar.gz.
File metadata
- Download URL: tokenize_rt-6.1.0.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | e8ee836616c0877ab7c7b54776d2fefcc3bde714449a206762425ae114b53c86
MD5 | 48bdf2b8db11ee253ea3943a3e750a73
BLAKE2b-256 | 6b0a5854d8ced8c1e00193d1353d13db82d7f813f99bd5dcb776ce3e2a4c0d19
File details
Details for the file tokenize_rt-6.1.0-py2.py3-none-any.whl.
File metadata
- Download URL: tokenize_rt-6.1.0-py2.py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | d706141cdec4aa5f358945abe36b911b8cbdc844545da99e811250c0cee9b6fc
MD5 | b5aaf30ed9873884c66151995f3cd12c
BLAKE2b-256 | 87ba576aac29b10dfa49a6ce650001d1bb31f81e734660555eaf144bfe5b8995