
A wrapper around the stdlib `tokenize` which roundtrips.

Project description


tokenize-rt

The stdlib tokenize module does not properly roundtrip: tokenizing source and then serializing the tokens does not reproduce the original text. This wrapper around the stdlib provides two additional tokens, ESCAPED_NL and UNIMPORTANT_WS, and a Token data type. Use src_to_tokens and tokens_to_src to roundtrip.

This library is useful if you're writing a refactoring tool based on Python tokenization.
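
For example, a roundtrip preserves backslash continuations, whitespace, and comments exactly; a minimal sketch:

from tokenize_rt import src_to_tokens, tokens_to_src

src = 'x = \\\n    1  # comment\n'
assert tokens_to_src(src_to_tokens(src)) == src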

Installation

pip install tokenize-rt

Usage

datastructures

tokenize_rt.Offset(line=None, utf8_byte_offset=None)

A token offset, useful as a key when cross referencing the ast and the tokenized source.
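
For instance, ast nodes report positions as (line, utf-8 byte offset) pairs, so an Offset built from a node can be used to find the matching token; a short sketch (the variable names are illustrative):

import ast

from tokenize_rt import Offset, src_to_tokens

src = 'x = 1\n'
node = ast.parse(src).body[0].targets[0]
key = Offset(node.lineno, node.col_offset)  # ast col_offset is a utf-8 byte offset
token = next(t for t in src_to_tokens(src) if t.offset == key)
assert token.src == 'x'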

tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)

Construct a token.

  • name: one of the token names listed in token.tok_name or ESCAPED_NL or UNIMPORTANT_WS
  • src: token's source as text
  • line: the line number that this token appears on.
  • utf8_byte_offset: the UTF-8 byte offset at which this token appears within the line.

tokenize_rt.Token.offset

Retrieves an Offset for this token.
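
A small illustrative sketch:

from tokenize_rt import Offset, Token

token = Token(name='NAME', src='foo', line=1, utf8_byte_offset=4)
assert token.offset == Offset(line=1, utf8_byte_offset=4)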

converting to and from Token representations

tokenize_rt.src_to_tokens(text: str) -> List[Token]

tokenize_rt.tokens_to_src(tokens: Iterable[Token]) -> str
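
Token is a named tuple, so a typical rewrite swaps out individual tokens with _replace and serializes the result; a minimal sketch:

from tokenize_rt import src_to_tokens, tokens_to_src

tokens = src_to_tokens('x = 1\n')
tokens = [
    t._replace(src='y') if t.name == 'NAME' and t.src == 'x' else t
    for t in tokens
]
assert tokens_to_src(tokens) == 'y = 1\n'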

additional tokens added by tokenize-rt

tokenize_rt.ESCAPED_NL

tokenize_rt.UNIMPORTANT_WS
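
Both appear in the stream wherever the stdlib tokenizer would otherwise drop source text; a minimal sketch:

from tokenize_rt import ESCAPED_NL, UNIMPORTANT_WS, src_to_tokens

names = {t.name for t in src_to_tokens('x = \\\n    1\n')}
assert ESCAPED_NL in names       # the backslash-escaped newline
assert UNIMPORTANT_WS in names   # the spaces around '='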

helpers

tokenize_rt.NON_CODING_TOKENS

A frozenset of token names which may appear between other tokens without affecting control flow or code:

  • COMMENT
  • ESCAPED_NL
  • NL
  • UNIMPORTANT_WS
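
This is handy for skipping formatting-only tokens while scanning a stream; a small sketch:

from tokenize_rt import NON_CODING_TOKENS, src_to_tokens

tokens = src_to_tokens('x = 1  # set x\n')
coding = [t for t in tokens if t.name not in NON_CODING_TOKENS]
assert not any(t.name in ('COMMENT', 'UNIMPORTANT_WS') for t in coding)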

tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]

parse a string literal into its prefix and string content

>>> parse_string_literal('f"foo"')
('f', '"foo"')

tokenize_rt.reversed_enumerate(tokens: Sequence[Token]) -> Iterator[Tuple[int, Token]]

yields (index, token) pairs. Useful for rewriting source.
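
Because iteration runs from the end of the list toward the start, tokens can be inserted or removed at index i without disturbing the indices not yet visited; a minimal sketch that inserts a space after each comma:

from tokenize_rt import (
    Token,
    UNIMPORTANT_WS,
    reversed_enumerate,
    src_to_tokens,
    tokens_to_src,
)

tokens = src_to_tokens('f(1,2,3)\n')
for i, token in reversed_enumerate(tokens):
    if token.name == 'OP' and token.src == ',':
        tokens.insert(i + 1, Token(UNIMPORTANT_WS, ' '))
assert tokens_to_src(tokens) == 'f(1, 2, 3)\n'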

tokenize_rt.rfind_string_parts(tokens: Sequence[Token], i: int) -> Tuple[int, ...]

find the indices of the string parts of a (joined) string literal

  • i should start at the end of the string literal
  • returns () (an empty tuple) for things which are not string literals

>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)

Differences from tokenize

  • tokenize-rt adds ESCAPED_NL for a backslash-escaped newline "token"
  • tokenize-rt adds UNIMPORTANT_WS for whitespace (discarded in tokenize)
  • tokenize-rt normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see Token('STRING', "f'foo'", ...) even in Python 2.
  • tokenize-rt normalizes Python 2 long literals (4l / 4L) and octal literals (0755) in Python 3 (for easier rewriting of Python 2 code while running Python 3).

Download files

Download the file for your platform.

Source Distribution

tokenize_rt-6.2.0.tar.gz (5.5 kB)

Built Distribution

tokenize_rt-6.2.0-py2.py3-none-any.whl (6.0 kB)

File details

Details for the file tokenize_rt-6.2.0.tar.gz.

File metadata

  • Download URL: tokenize_rt-6.2.0.tar.gz
  • Upload date:
  • Size: 5.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for tokenize_rt-6.2.0.tar.gz
  • SHA256: 8439c042b330c553fdbe1758e4a05c0ed460dbbbb24a606f11f0dee75da4cad6
  • MD5: d167a19eafccb2a783a17e25fa81626d
  • BLAKE2b-256: 69ed8f07e893132d5051d86a553e749d5c89b2a4776eb3a579b72ed61f8559ca


File details

Details for the file tokenize_rt-6.2.0-py2.py3-none-any.whl.

File metadata

  • Download URL: tokenize_rt-6.2.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 6.0 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for tokenize_rt-6.2.0-py2.py3-none-any.whl
  • SHA256: a152bf4f249c847a66497a4a95f63376ed68ac6abf092a2f7cfb29d044ecff44
  • MD5: ac66eebcbb33851fb90f48549ac2eb04
  • BLAKE2b-256: 33f03fe8c6e69135a845f4106f2ff8b6805638d4e85c264e70114e8126689587

