
Python bindings to the Tree-sitter parsing library. Temporary repository to provide support for tree-sitter language version 15.


Python Tree-sitter

[!IMPORTANT] This clone of the original py-tree-sitter repository exists only to release a version that supports tree-sitter language version 15. That support is already available on the main branch, but the maintainer has not been responsive about publishing a new version. Once a new version is released to PyPI, this repository will be deleted. See the relevant discussion.


This module provides Python bindings to the tree-sitter parsing library.

Installation

The package has no library dependencies and provides pre-compiled wheels for all major platforms.

[!NOTE] If your platform is not currently supported, please submit an issue on GitHub.

pip install tree-sitter

Usage

Setup

Install languages

Tree-sitter language implementations also provide pre-compiled binary wheels. Let's take Python as an example.

pip install tree-sitter-python

Then, you can load it as a Language object:

import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())

Basic parsing

Create a Parser and configure it to use a language:

parser = Parser(PY_LANGUAGE)

Parse some source code:

tree = parser.parse(
    bytes(
        """
def foo():
    if bar:
        baz()
""",
        "utf8"
    )
)

If you have your source code in some data structure other than a bytes object, you can pass a "read" callable to the parse function.

The read callable can use either the byte offset or the point tuple to read from a buffer and return source code as a bytes object. An empty bytes object or None terminates parsing for that line. The bytes must be encoded as UTF-8 or UTF-16.

For example, to use the byte offset with UTF-8 encoding:

src = bytes(
    """
def foo():
    if bar:
        baz()
""",
    "utf8",
)


def read_callable_byte_offset(byte_offset, point):
    return src[byte_offset : byte_offset + 1]


tree = parser.parse(read_callable_byte_offset, encoding="utf8")
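Returning a single byte per call keeps the example short, but it costs one Python call per byte. The sketch below returns larger slices instead. `make_chunked_reader` and `drain` are hypothetical helpers introduced here for illustration; `drain` is only a simplified stand-in for the parser's read loop (the real parser may request offsets in a different order), so the sketch runs without tree-sitter installed.

```python
def make_chunked_reader(data: bytes, chunk_size: int = 4096):
    """Build a read callable that returns up to `chunk_size` bytes per call."""

    def read(byte_offset, point):
        # `point` is ignored; only the byte offset is used here.
        # Slicing past the end yields b"", which signals the end of input.
        return data[byte_offset : byte_offset + chunk_size]

    return read


def drain(read):
    """Illustrative stand-in for a consumer: call `read` until it returns nothing."""
    chunks = []
    offset = 0
    while True:
        chunk = read(offset, None)
        if not chunk:
            break
        chunks.append(chunk)
        offset += len(chunk)
    return b"".join(chunks)


example_src = b"\ndef foo():\n    if bar:\n        baz()\n"
chunked_read = make_chunked_reader(example_src, chunk_size=8)
reassembled = drain(chunked_read)
```

With a real parser, the callable would be passed in the same way as above, for example `parser.parse(chunked_read, encoding="utf8")`.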

And to use the point:

src_lines = ["\n", "def foo():\n", "    if bar:\n", "        baz()\n"]


def read_callable_point(byte_offset, point):
    row, column = point
    if row >= len(src_lines) or column >= len(src_lines[row]):
        return None
    return src_lines[row][column:].encode("utf8")


tree = parser.parse(read_callable_point, encoding="utf8")

Inspect the resulting Tree:

root_node = tree.root_node
assert root_node.type == 'module'
assert root_node.start_point == (1, 0)
assert root_node.end_point == (4, 0)

function_node = root_node.children[0]
assert function_node.type == 'function_definition'
assert function_node.child_by_field_name('name').type == 'identifier'

function_name_node = function_node.children[1]
assert function_name_node.type == 'identifier'
assert function_name_node.start_point == (1, 4)
assert function_name_node.end_point == (1, 7)

function_body_node = function_node.child_by_field_name("body")

if_statement_node = function_body_node.child(0)
assert if_statement_node.type == "if_statement"

function_call_node = if_statement_node.child_by_field_name("consequence").child(0).child(0)
assert function_call_node.type == "call"

function_call_name_node = function_call_node.child_by_field_name("function")
assert function_call_name_node.type == "identifier"

function_call_args_node = function_call_node.child_by_field_name("arguments")
assert function_call_args_node.type == "argument_list"


assert str(root_node) == (
    "(module "
        "(function_definition "
            "name: (identifier) "
            "parameters: (parameters) "
            "body: (block "
                "(if_statement "
                    "condition: (identifier) "
                    "consequence: (block "
                        "(expression_statement (call "
                            "function: (identifier) "
                            "arguments: (argument_list))))))))"
)

Or, to use the byte offset with UTF-16 encoding:

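# Assumes the tree-sitter-javascript package is installed:
#   pip install tree-sitter-javascript
import tree_sitter_javascript as tsjavascript

JAVASCRIPT = Language(tsjavascript.language())

# Use a separate parser and tree so that the Python `parser` and `tree`
# from the earlier examples remain available for the sections below.
js_parser = Parser(JAVASCRIPT)
source_code = bytes("'😎' && '🐍'", "utf16")

def read(byte_position, _):
    # Return one 2-byte UTF-16 code unit per call
    return source_code[byte_position: byte_position + 2]

js_tree = js_parser.parse(read, encoding="utf16")
js_root_node = js_tree.root_node
statement_node = js_root_node.children[0]
binary_node = statement_node.children[0]
snake_node = binary_node.children[2]
snake = source_code[snake_node.start_byte:snake_node.end_byte]

assert binary_node.type == "binary_expression"
assert snake_node.type == "string"
assert snake.decode("utf16") == "'🐍'"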
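The UTF-16 example slices the source by byte offsets, which do not line up with Python string indices. The following sketch uses only the standard library to show where the difference comes from; the variable names are illustrative and it makes no claim about how the parser itself treats the byte order mark.

```python
import codecs

text = "'😎' && '🐍'"
encoded = text.encode("utf16")

# Python's "utf16" codec prepends a 2-byte byte order mark (BOM)
# in the platform's native byte order.
has_bom = encoded.startswith(codecs.BOM_UTF16)
payload = encoded[2:] if has_bom else encoded

# Emoji lie outside the Basic Multilingual Plane, so each one is encoded
# as a surrogate pair (two code units, 4 bytes); ASCII characters take 2 bytes.
snake_size = len("🐍".encode("utf-16-le"))
quote_size = len("'".encode("utf-16-le"))

# The character index of the snake and its byte offset within the
# BOM-less payload therefore differ.
char_index = text.index("🐍")
byte_offset = len(text[:char_index].encode("utf-16-le"))
```

This is why the read callable above hands out 2-byte slices, and why node offsets should be applied to the encoded bytes object rather than to the original string.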

Walking syntax trees

If you need to traverse a large number of nodes efficiently, you can use a TreeCursor:

cursor = tree.walk()

assert cursor.node.type == "module"

assert cursor.goto_first_child()
assert cursor.node.type == "function_definition"

assert cursor.goto_first_child()
assert cursor.node.type == "def"

# Returns `False` because the `def` node has no children
assert not cursor.goto_first_child()

assert cursor.goto_next_sibling()
assert cursor.node.type == "identifier"

assert cursor.goto_next_sibling()
assert cursor.node.type == "parameters"

assert cursor.goto_parent()
assert cursor.node.type == "function_definition"

[!IMPORTANT] Keep in mind that the cursor can only walk into children of the node that it started from.

See examples/walk_tree.py for a complete example of iterating over every node in a tree.
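As a rough illustration of that kind of traversal (not necessarily the approach taken in examples/walk_tree.py), the generator below visits every node in pre-order using only the cursor methods shown above. `iter_nodes` is a hypothetical name, and the function relies solely on `node`, `goto_first_child()`, `goto_next_sibling()`, and `goto_parent()`.

```python
def iter_nodes(cursor):
    """Yield every node under the cursor's starting node, in pre-order."""
    depth = 0  # how far below the starting node the cursor currently is
    while True:
        yield cursor.node

        # Descend first, so children are visited before siblings.
        if cursor.goto_first_child():
            depth += 1
            continue

        # No children: advance to the next sibling, climbing back up
        # until one exists or the starting node is reached again.
        while True:
            if depth == 0:
                return
            if cursor.goto_next_sibling():
                break
            cursor.goto_parent()
            depth -= 1
```

A typical call would look like `for node in iter_nodes(tree.walk()): print(node.type)`.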

Editing

When a source file is edited, you can edit the syntax tree to keep it in sync with the source:

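# Uppercase the two bytes at offsets 5 and 6 ("fo" in "foo").
# Because `src` starts with a newline, they sit on row 1, columns 4 and 5.
new_src = src[:5] + src[5 : 5 + 2].upper() + src[5 + 2 :]

tree.edit(
    start_byte=5,
    old_end_byte=5 + 2,
    new_end_byte=5 + 2,
    start_point=(1, 4),
    old_end_point=(1, 4 + 2),
    new_end_point=(1, 4 + 2),
)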
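Working out the byte offsets and points by hand is error-prone, so it can help to derive them from the source itself. The sketch below does that with plain Python; `byte_to_point` and `edit_kwargs` are hypothetical helpers, not part of the library, and they assume that a point is a `(row, column)` pair with the column counted in bytes from the start of the line.

```python
def byte_to_point(src: bytes, byte_offset: int) -> tuple[int, int]:
    """Convert a byte offset into a (row, column) point; column is in bytes."""
    row = src.count(b"\n", 0, byte_offset)
    line_start = src.rfind(b"\n", 0, byte_offset) + 1  # 0 when no newline precedes
    return (row, byte_offset - line_start)


def edit_kwargs(old_src: bytes, start_byte: int, old_end_byte: int, replacement: bytes):
    """Return the edited source and keyword arguments describing the edit."""
    new_src = old_src[:start_byte] + replacement + old_src[old_end_byte:]
    new_end_byte = start_byte + len(replacement)
    kwargs = {
        "start_byte": start_byte,
        "old_end_byte": old_end_byte,
        "new_end_byte": new_end_byte,
        "start_point": byte_to_point(old_src, start_byte),
        "old_end_point": byte_to_point(old_src, old_end_byte),
        "new_end_point": byte_to_point(new_src, new_end_byte),
    }
    return new_src, kwargs


old_source = b"\ndef foo():\n    if bar:\n        baz()\n"
edited_source, kwargs = edit_kwargs(old_source, 5, 7, b"FO")
```

The resulting dictionary could then be passed on as `tree.edit(**kwargs)`.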

Then, when you're ready to incorporate the changes into a new syntax tree, you can call Parser.parse again, but pass in the old tree:

new_tree = parser.parse(new_src, tree)

This will run much faster than if you were parsing from scratch.

The Tree.changed_ranges method can be called on the old tree to return the list of ranges whose syntactic structure has been changed:

for changed_range in tree.changed_ranges(new_tree):
    print("Changed range:")
    print(f"  Start point {changed_range.start_point}")
    print(f"  Start byte {changed_range.start_byte}")
    print(f"  End point {changed_range.end_point}")
    print(f"  End byte {changed_range.end_byte}")

Pattern-matching

You can search for patterns in a syntax tree using a tree query:

query = PY_LANGUAGE.query(
    """
(function_definition
  name: (identifier) @function.def
  body: (block) @function.block)

(call
  function: (identifier) @function.call
  arguments: (argument_list) @function.args)
"""
)

Captures

captures = query.captures(tree.root_node)
assert len(captures) == 4
assert captures["function.def"][0] == function_name_node
assert captures["function.block"][0] == function_body_node
assert captures["function.call"][0] == function_call_name_node
assert captures["function.args"][0] == function_call_args_node

Matches

matches = query.matches(tree.root_node)
assert len(matches) == 2

# first match
assert matches[0][1]["function.def"] == [function_name_node]
assert matches[0][1]["function.block"] == [function_body_node]

# second match
assert matches[1][1]["function.call"] == [function_call_name_node]
assert matches[1][1]["function.args"] == [function_call_args_node]

The difference between the two methods is that Query.matches() groups captures into matches, which is much more useful when your captures within a query relate to each other.
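To make that difference concrete, the sketch below pairs each called function with its argument list, which is straightforward with match-shaped data because related captures arrive together. Plain strings stand in for Node objects so the snippet runs on its own; `sample_matches` and `paired_calls` are hypothetical names, and the data only mimics the `(pattern_index, captures)` shape asserted above.

```python
# Stand-in data shaped like the result of Query.matches():
# a list of (pattern_index, {capture_name: [nodes]}) tuples.
sample_matches = [
    (0, {"function.def": ["foo"], "function.block": ["<block>"]}),
    (1, {"function.call": ["baz"], "function.args": ["()"]}),
]


def paired_calls(matches):
    """Pair each captured call name with the argument list from the same match."""
    pairs = []
    for _pattern_index, captures in matches:
        if "function.call" in captures:
            pairs.append((captures["function.call"][0], captures["function.args"][0]))
    return pairs


calls = paired_calls(sample_matches)
```

With the flat result of Query.captures(), the same pairing would require lining up two independent lists by position.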

To try out and explore the code referenced in this README, check out examples/usage.py.
