Python bindings to the Tree-sitter parsing library

These details have not been verified by PyPI

Project links

Project description

Python Tree-sitter

This module provides Python bindings to the tree-sitter parsing library.

Installation

The package has no library dependencies and provides pre-compiled wheels for all major platforms.

[!NOTE] If your platform is not currently supported, please submit an issue on GitHub.

pip install tree-sitter

Usage

Setup

Install languages

Tree-sitter language implementations also provide pre-compiled binary wheels. Let's take Python as an example.

pip install tree-sitter-python

Then, you can load it as a Language object:

import tree_sitter_python as tspython
from tree_sitter import Language, Parser

PY_LANGUAGE = Language(tspython.language())

Basic parsing

Create a Parser and configure it to use a language:

parser = Parser(PY_LANGUAGE)

Parse some source code:

tree = parser.parse(
    bytes(
        """
def foo():
    if bar:
        baz()
""",
        "utf8"
    )
)

If you have your source code in some data structure other than a bytes object, you can pass a "read" callable to the parse function.

The read callable can use either the byte offset or point tuple to read from buffer and return source code as bytes object. An empty bytes object or None terminates parsing for that line. The bytes must be encoded as UTF-8 or UTF-16.

For example, to use the byte offset with UTF-8 encoding:

src = bytes(
    """
def foo():
    if bar:
        baz()
""",
    "utf8",
)


def read_callable_byte_offset(byte_offset, point):
    return src[byte_offset : byte_offset + 1]


tree = parser.parse(read_callable_byte_offset, encoding="utf8")

And to use the point:

src_lines = ["\n", "def foo():\n", "    if bar:\n", "        baz()\n"]


def read_callable_point(byte_offset, point):
    row, column = point
    if row >= len(src_lines) or column >= len(src_lines[row]):
        return None
    return src_lines[row][column:].encode("utf8")


tree = parser.parse(read_callable_point, encoding="utf8")

Inspect the resulting Tree:

root_node = tree.root_node
assert root_node.type == 'module'
assert root_node.start_point == (1, 0)
assert root_node.end_point == (4, 0)

function_node = root_node.children[0]
assert function_node.type == 'function_definition'
assert function_node.child_by_field_name('name').type == 'identifier'

function_name_node = function_node.children[1]
assert function_name_node.type == 'identifier'
assert function_name_node.start_point == (1, 4)
assert function_name_node.end_point == (1, 7)

function_body_node = function_node.child_by_field_name("body")

if_statement_node = function_body_node.child(0)
assert if_statement_node.type == "if_statement"

function_call_node = if_statement_node.child_by_field_name("consequence").child(0).child(0)
assert function_call_node.type == "call"

function_call_name_node = function_call_node.child_by_field_name("function")
assert function_call_name_node.type == "identifier"

function_call_args_node = function_call_node.child_by_field_name("arguments")
assert function_call_args_node.type == "argument_list"


assert str(root_node) == (
    "(module "
        "(function_definition "
            "name: (identifier) "
            "parameters: (parameters) "
            "body: (block "
                "(if_statement "
                    "condition: (identifier) "
                    "consequence: (block "
                        "(expression_statement (call "
                            "function: (identifier) "
                            "arguments: (argument_list))))))))"
)

Or, to use the byte offset with UTF-16 encoding:

parser.language = JAVASCRIPT
source_code = bytes("'😎' && '🐍'", "utf16")

def read(byte_position, _):
    return source_code[byte_position: byte_position + 2]

tree = parser.parse(read, encoding="utf16")
root_node = tree.root_node
statement_node = root_node.children[0]
binary_node = statement_node.children[0]
snake_node = binary_node.children[2]
snake = source_code[snake_node.start_byte:snake_node.end_byte]

assert binary_node.type == "binary_expression"
assert snake_node.type == "string"
assert snake.decode("utf16") == "'🐍'"

Walking syntax trees

If you need to traverse a large number of nodes efficiently, you can use a TreeCursor:

cursor = tree.walk()

assert cursor.node.type == "module"

assert cursor.goto_first_child()
assert cursor.node.type == "function_definition"

assert cursor.goto_first_child()
assert cursor.node.type == "def"

# Returns `False` because the `def` node has no children
assert not cursor.goto_first_child()

assert cursor.goto_next_sibling()
assert cursor.node.type == "identifier"

assert cursor.goto_next_sibling()
assert cursor.node.type == "parameters"

assert cursor.goto_parent()
assert cursor.node.type == "function_definition"

[!IMPORTANT] Keep in mind that the cursor can only walk into children of the node that it started from.

See examples/walk_tree.py for a complete example of iterating over every node in a tree.

Editing

When a source file is edited, you can edit the syntax tree to keep it in sync with the source:

new_src = src[:5] + src[5 : 5 + 2].upper() + src[5 + 2 :]

tree.edit(
    start_byte=5,
    old_end_byte=5,
    new_end_byte=5 + 2,
    start_point=(0, 5),
    old_end_point=(0, 5),
    new_end_point=(0, 5 + 2),
)

Then, when you're ready to incorporate the changes into a new syntax tree, you can call Parser.parse again, but pass in the old tree:

new_tree = parser.parse(new_src, tree)

This will run much faster than if you were parsing from scratch.

The Tree.changed_ranges method can be called on the old tree to return the list of ranges whose syntactic structure has been changed:

for changed_range in tree.changed_ranges(new_tree):
    print("Changed range:")
    print(f"  Start point {changed_range.start_point}")
    print(f"  Start byte {changed_range.start_byte}")
    print(f"  End point {changed_range.end_point}")
    print(f"  End byte {changed_range.end_byte}")

Pattern-matching

You can search for patterns in a syntax tree using a tree query:

query = PY_LANGUAGE.query(
    """
(function_definition
  name: (identifier) @function.def
  body: (block) @function.block)

(call
  function: (identifier) @function.call
  arguments: (argument_list) @function.args)
"""
)

Captures

captures = query.captures(tree.root_node)
assert len(captures) == 4
assert captures["function.def"][0] == function_name_node
assert captures["function.block"][0] == function_body_node
assert captures["function.call"][0] == function_call_name_node
assert captures["function.args"][0] == function_call_args_node

Matches

matches = query.matches(tree.root_node)
assert len(matches) == 2

# first match
assert matches[0][1]["function.def"] == [function_name_node]
assert matches[0][1]["function.block"] == [function_body_node]

# second match
assert matches[1][1]["function.call"] == [function_call_name_node]
assert matches[1][1]["function.args"] == [function_call_args_node]

The difference between the two methods is that Query.matches() groups captures into matches, which is much more useful when your captures within a query relate to each other.

To try out and explore the code referenced in this README, check out examples/usage.py.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.24.0

Feb 15, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

dom_tree_sitter-0.24.0-cp311-cp311-macosx_10_16_universal2.whl (135.1 kB view details)

Uploaded Feb 15, 2025 CPython 3.11macOS 10.16+ universal2 (ARM64, x86-64)

File details

Details for the file dom_tree_sitter-0.24.0-cp311-cp311-macosx_10_16_universal2.whl.

File metadata

Download URL: dom_tree_sitter-0.24.0-cp311-cp311-macosx_10_16_universal2.whl
Upload date: Feb 15, 2025
Size: 135.1 kB
Tags: CPython 3.11, macOS 10.16+ universal2 (ARM64, x86-64)
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for dom_tree_sitter-0.24.0-cp311-cp311-macosx_10_16_universal2.whl
Algorithm	Hash digest
SHA256	`cc96e473378b8b0c6ca0160d32dffc9ef49bec0e6faf4ca03afa13105fe29b9f`
MD5	`b64cf4d6e3214f2f87854cdc4d12e65a`
BLAKE2b-256	`c03bdac68c4f705f1666dd46d8c9a35d21383ab754441f6748551fae7cd329e7`

See more details on using hashes here.

dom-tree-sitter 0.24.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Python Tree-sitter

Installation

Usage

Setup

Install languages

Basic parsing

Walking syntax trees

Editing

Pattern-matching

Captures

Matches

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes