A wrapper around the stdlib `tokenize` which roundtrips.
Project description
tokenize-rt
The stdlib tokenize module does not properly roundtrip. This wrapper
around the stdlib provides two additional tokens ESCAPED_NL and
UNIMPORTANT_WS, and a Token data type. Use src_to_tokens and
tokens_to_src to roundtrip.
This library is useful if you're writing a refactoring tool based on python tokenization.
Installation
pip install tokenize-rt
Usage
datastructures
tokenize_rt.Offset(line=None, utf8_byte_offset=None)
A token offset, useful as a key when cross-referencing the ast and the
tokenized source.
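For example, a rewriter might build offsets from ast positions (whose column offsets are utf-8 byte offsets) and use them to find the matching tokens. A minimal sketch; the source string and variable names here are illustrative:

import ast

from tokenize_rt import Offset, src_to_tokens

src = 'x = 1\ny = 2\n'

# offsets of the ast Name nodes
name_offsets = {
    Offset(node.lineno, node.col_offset)
    for node in ast.walk(ast.parse(src))
    if isinstance(node, ast.Name)
}

# the tokens at those offsets are the NAME tokens for `x` and `y`
matching = [tok for tok in src_to_tokens(src) if tok.offset in name_offsets]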
tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)
Construct a token
- name: one of the token names listed in token.tok_name or ESCAPED_NL or UNIMPORTANT_WS
- src: token's source as text
- line: the line number that this token appears on.
- utf8_byte_offset: the utf8 byte offset that this token appears on in the line.
tokenize_rt.Token.offset
Retrieves an Offset for this token.
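For example (output shown assuming the plain NamedTuple repr):

>>> Token('NAME', 'foo', line=1, utf8_byte_offset=4).offset
Offset(line=1, utf8_byte_offset=4)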
converting to and from Token representations
tokenize_rt.src_to_tokens(text: str) -> List[Token]
tokenize_rt.tokens_to_src(Iterable[Token]) -> str
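A roundtrip through both functions reproduces the source exactly, for example:

>>> from tokenize_rt import src_to_tokens, tokens_to_src
>>> src = 'x = 1  # comment\n'
>>> tokens_to_src(src_to_tokens(src)) == src
True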
additional tokens added by tokenize-rt
tokenize_rt.ESCAPED_NL
tokenize_rt.UNIMPORTANT_WS
helpers
tokenize_rt.NON_CODING_TOKENS
A frozenset containing tokens which may appear between others while not
affecting control flow or code:
COMMENT, ESCAPED_NL, NL, UNIMPORTANT_WS
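This is handy for skipping over tokens that don't matter semantically, for instance to find the next "real" token after an opening paren. A minimal sketch with an illustrative source string:

from tokenize_rt import NON_CODING_TOKENS, src_to_tokens

tokens = src_to_tokens('x = (  # comment\n    1\n)\n')

# step past whitespace, comments, and blank lines to the next coding token
i = next(i for i, tok in enumerate(tokens) if tok.src == '(')
j = i + 1
while tokens[j].name in NON_CODING_TOKENS:
    j += 1
# tokens[j] is now the NUMBER token for `1`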
tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]
parse a string literal into its prefix and string content
>>> parse_string_literal('f"foo"')
('f', '"foo"')
tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]
yields (index, token) pairs. Useful for rewriting source.
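Iterating in reverse means that deleting or inserting tokens at index i does not shift the indices of the tokens you have not visited yet. A small sketch that strips comments:

from tokenize_rt import reversed_enumerate, src_to_tokens, tokens_to_src

tokens = src_to_tokens('x = 1  # one\ny = 2  # two\n')

for i, token in reversed_enumerate(tokens):
    if token.name == 'COMMENT':
        del tokens[i]
        # also drop the whitespace that preceded the comment
        if tokens[i - 1].name == 'UNIMPORTANT_WS':
            del tokens[i - 1]

print(tokens_to_src(tokens))  # x = 1\ny = 2\n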
tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]
find the indices of the string parts of a (joined) string literal
- i should start at the end of the string literal
- returns () (an empty tuple) for things which are not string literals
>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 3)
(1, 3)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(2, 4)
Differences from tokenize
- tokenize-rt adds ESCAPED_NL for a backslash-escaped newline "token" (see the sketch after this list)
- tokenize-rt adds UNIMPORTANT_WS for whitespace (discarded in tokenize)
- tokenize-rt normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see Token('STRING', "f'foo'", ...) even in python 2.
- tokenize-rt normalizes python 2 long literals (4l / 4L) and octal literals (0755) in python 3 (for easier rewriting of python 2 code while running python 3).
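For example, tokenizing a backslash-continued assignment surfaces both of the extra tokens (a small illustration; only the presence of the tokens is asserted, not the full sequence):

from tokenize_rt import ESCAPED_NL, UNIMPORTANT_WS, src_to_tokens

tokens = src_to_tokens('x = \\\n    5\n')
names = [tok.name for tok in tokens]
# names includes ESCAPED_NL (for the backslash-newline) and UNIMPORTANT_WS
# (for the spaces), neither of which the stdlib tokenize emits
assert ESCAPED_NL in names and UNIMPORTANT_WS in names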
Sample usage
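For instance, a small rewriter that strips u string prefixes might look like the sketch below. This is illustrative only (remove_u_prefixes is a made-up name, not part of the library) and assumes Token supports the NamedTuple-style _replace shown:

from tokenize_rt import (
    parse_string_literal,
    reversed_enumerate,
    src_to_tokens,
    tokens_to_src,
)


def remove_u_prefixes(src):
    """Rewrite u'...' literals to plain '...' literals."""
    tokens = src_to_tokens(src)
    for i, token in reversed_enumerate(tokens):
        if token.name == 'STRING':
            prefix, rest = parse_string_literal(token.src)
            if 'u' in prefix.lower():
                new_prefix = prefix.replace('u', '').replace('U', '')
                tokens[i] = token._replace(src=new_prefix + rest)
    return tokens_to_src(tokens)


print(remove_u_prefixes("x = u'foo'\n"))  # x = 'foo'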
Download files
File details
Details for the file tokenize_rt-6.2.0.tar.gz.
File metadata
- Download URL: tokenize_rt-6.2.0.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8439c042b330c553fdbe1758e4a05c0ed460dbbbb24a606f11f0dee75da4cad6 |
| MD5 | d167a19eafccb2a783a17e25fa81626d |
| BLAKE2b-256 | 69ed8f07e893132d5051d86a553e749d5c89b2a4776eb3a579b72ed61f8559ca |
File details
Details for the file tokenize_rt-6.2.0-py2.py3-none-any.whl.
File metadata
- Download URL: tokenize_rt-6.2.0-py2.py3-none-any.whl
- Upload date:
- Size: 6.0 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a152bf4f249c847a66497a4a95f63376ed68ac6abf092a2f7cfb29d044ecff44 |
| MD5 | ac66eebcbb33851fb90f48549ac2eb04 |
| BLAKE2b-256 | 33f03fe8c6e69135a845f4106f2ff8b6805638d4e85c264e70114e8126689587 |