A wrapper around the stdlib `tokenize` which roundtrips.

tokenize-rt

The stdlib `tokenize` module does not properly roundtrip. This wrapper around the stdlib provides two additional tokens, `ESCAPED_NL` and `UNIMPORTANT_WS`, and a `Token` data type. Use `src_to_tokens` and `tokens_to_src` to roundtrip.

This library is useful if you're writing a refactoring tool based on Python tokenization.

Installation

pip install tokenize-rt

Usage

datastructures

`tokenize_rt.Offset(line=None, utf8_byte_offset=None)`

A token offset, useful as a key when cross referencing the ast and the tokenized source.

`tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)`

Construct a token

name: one of the token names listed in token.tok_name or ESCAPED_NL or UNIMPORTANT_WS
src: token's source as text
line: the line number that this token appears on.
utf8_byte_offset: the UTF-8 byte offset within the line at which this token starts.

`tokenize_rt.Token.offset`

Retrieves an Offset for this token.

converting to and from `Token` representations

`tokenize_rt.src_to_tokens(text: str) -> List[Token]`

`tokenize_rt.tokens_to_src(Iterable[Token]) -> str`

additional tokens added by `tokenize-rt`

`tokenize_rt.ESCAPED_NL`

`tokenize_rt.UNIMPORTANT_WS`

helpers

`tokenize_rt.NON_CODING_TOKENS`

A frozenset containing tokens which may appear between others while not affecting control flow or code:

COMMENT
ESCAPED_NL
NL
UNIMPORTANT_WS

`tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]`

parse a string literal into its prefix and string content

>>> parse_string_literal('f"foo"')
('f', '"foo"')

`tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]`

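yields (index, token) pairs in reverse order. Useful for rewriting source, since inserting or removing tokens at a later index does not shift the indices still to be visited.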

`tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]`

find the indices of the string parts of a (joined) string literal

i should start at the end of the string literal
returns () (an empty tuple) for things which are not string literals

>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)

Differences from `tokenize`

tokenize-rt adds ESCAPED_NL for a backslash-escaped newline "token"
tokenize-rt adds UNIMPORTANT_WS for whitespace (discarded in tokenize)
tokenize-rt normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see Token('STRING', "f'foo'", ...) even in python 2.
tokenize-rt normalizes python 2 long literals (4l / 4L) and octal literals (0755) in python 3 (for easier rewriting of python 2 code while running python 3).

Sample usage

Release history

6.2.0: May 23, 2025 (this version)
6.1.0: Oct 22, 2024
6.0.0: Aug 4, 2024
5.2.0: Jul 30, 2023
5.1.0: Jun 10, 2023
5.0.0: Oct 3, 2022
4.2.1: Oct 21, 2021
4.2.0: Oct 21, 2021 (yanked: bug with multiline strings)
4.1.0: Jan 26, 2021
4.0.0: Feb 28, 2020
3.2.0: Jul 7, 2019
3.1.0: Jul 5, 2019
3.0.1: Jun 16, 2019
3.0.0: Jun 16, 2019
2.2.0: Feb 28, 2019
2.1.0: Oct 7, 2018
2.0.1: Jul 26, 2017
2.0.0: Jul 14, 2017
1.0.0: Jun 2, 2017

Download files

Source Distribution

tokenize_rt-6.2.0.tar.gz (5.5 kB)

Uploaded May 23, 2025 Source

Built Distribution

tokenize_rt-6.2.0-py2.py3-none-any.whl (6.0 kB)

Uploaded May 23, 2025 Python 2, Python 3

File details

Details for the file tokenize_rt-6.2.0.tar.gz.

File metadata

Download URL: tokenize_rt-6.2.0.tar.gz
Upload date: May 23, 2025
Size: 5.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for tokenize_rt-6.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8439c042b330c553fdbe1758e4a05c0ed460dbbbb24a606f11f0dee75da4cad6`
MD5	`d167a19eafccb2a783a17e25fa81626d`
BLAKE2b-256	`69ed8f07e893132d5051d86a553e749d5c89b2a4776eb3a579b72ed61f8559ca`

File details

Details for the file tokenize_rt-6.2.0-py2.py3-none-any.whl.

File metadata

Download URL: tokenize_rt-6.2.0-py2.py3-none-any.whl
Upload date: May 23, 2025
Size: 6.0 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for tokenize_rt-6.2.0-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`a152bf4f249c847a66497a4a95f63376ed68ac6abf092a2f7cfb29d044ecff44`
MD5	`ac66eebcbb33851fb90f48549ac2eb04`
BLAKE2b-256	`33f03fe8c6e69135a845f4106f2ff8b6805638d4e85c264e70114e8126689587`
