ASTAligner is designed to align tokens from source code snippets to Abstract Syntax Tree (AST) nodes using Tree-sitter for AST generation and various HuggingFace tokenizers for language tokenization. The library supports a wide range of programming languages and Fast tokenizers, enabling precise mapping between source code elements and their AST representations.

Project description

AST-Alignment Tool

Aligns the tokens from a code snippet to their corresponding nodes in an AST representation.

Description

A Large Language Model (LLM) is a type of AI model designed to understand and generate human-like text based on vast amounts of data. Trained on diverse source code datasets, LLMs can automate Software Engineering tasks such as code translation, code summarization, test-case generation, and code completion. A critical component of an LLM is the tokenizer, which breaks text down into smaller units, typically words or subwords, that the model can process. The tokenizer is essential because it converts source code into a format the model can consume, and its token boundaries do not necessarily coincide with the syntactic units of the language. In the context of Interpretability for AI, post-hoc techniques such as ASTScore rely on alignment functions (phi) to match the tokens generated by an LLM’s tokenizer with their corresponding nodes in the AST representation of a snippet.
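To make the idea of an alignment function concrete, here is a minimal sketch that maps each token to the smallest AST node whose character span encloses it. It is not ASTAligner's API: the names `Node`, `Token`, and `align_tokens` are hypothetical, and the spans are written by hand for illustration. In practice the node spans would come from a Tree-sitter parse and the token spans from a Fast tokenizer's offset mapping.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    """Hypothetical AST node: a type name plus a [start, end) character span."""
    type: str
    start: int
    end: int


class Token(NamedTuple):
    """Hypothetical tokenizer output: token text plus a [start, end) character span."""
    text: str
    start: int
    end: int


def trim_whitespace(source: str, start: int, end: int) -> Tuple[int, int]:
    """Shrink a span so it excludes leading and trailing whitespace."""
    while start < end and source[start].isspace():
        start += 1
    while end > start and source[end - 1].isspace():
        end -= 1
    return start, end


def align_tokens(
    source: str, tokens: List[Token], nodes: List[Node]
) -> List[Tuple[str, Optional[str]]]:
    """Pair each token with the type of the smallest node that encloses it."""
    aligned = []
    for tok in tokens:
        # Subword tokens often carry a leading space, so trim before matching.
        start, end = trim_whitespace(source, tok.start, tok.end)
        enclosing = [n for n in nodes if n.start <= start and end <= n.end]
        best = min(enclosing, key=lambda n: n.end - n.start, default=None)
        aligned.append((tok.text, best.type if best else None))
    return aligned


SOURCE = "x = 1 + 2"

# Hand-written spans (assumed for illustration, not produced by a parser).
NODES = [
    Node("assignment", 0, 9),
    Node("identifier", 0, 1),
    Node("=", 2, 3),
    Node("binary_operator", 4, 9),
    Node("integer", 4, 5),
    Node("+", 6, 7),
    Node("integer", 8, 9),
]

# Hand-written token spans (assumed), mimicking tokens with leading spaces.
TOKENS = [
    Token("x", 0, 1),
    Token(" =", 1, 3),
    Token(" 1", 3, 5),
    Token(" +", 5, 7),
    Token(" 2", 7, 9),
]

ALIGNMENT = align_tokens(SOURCE, TOKENS, NODES)
for text, node_type in ALIGNMENT:
    print(repr(text), "->", node_type)
```

Choosing the smallest enclosing node is only one possible policy. A token that straddles two leaf nodes falls back to their closest common ancestor under this rule, which may or may not be the behavior a given analysis wants.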

Goals

This project has two goals:

(1) Create a library for aligning the tokens from a code snippet to their corresponding nodes in the AST representation.

(2) Create a tool to visualize the alignment of the tokens with their matching AST.

Additional Information

For more information regarding this project's background and dependencies, please refer to these readings:

(1) Evaluating and Explaining Large Language Models for Code Using Syntactic Structures

(2) Tree-Sitter Programming Language Parser

(3) Hugging Face Tokenizer

Installation

Use the package manager pip to install the backend dependencies for the AST-Alignment Tool. All required packages are listed in requirements.txt, which can be found in the base repository.

pip install -r /path/to/requirements.txt

Supported Features

Library Usage

EXPLAIN HOW TO USE THE PYTHON LIBRARY

Contributing

License

MIT

Download files

Source Distribution

astaligner-0.1.0.tar.gz (8.3 kB)

Uploaded Source

Built Distribution

ASTAligner-0.1.0-py3-none-any.whl (7.5 kB)

Uploaded Python 3

File details

Details for the file astaligner-0.1.0.tar.gz.

File metadata

  • Download URL: astaligner-0.1.0.tar.gz
  • Upload date:
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for astaligner-0.1.0.tar.gz
Algorithm Hash digest
SHA256 1c172a17c908f23b1a3e52e0ad312c99c6e50f546eb8d2c289be401d21393ccb
MD5 3c6ce79497f9e2523f72438e8a065808
BLAKE2b-256 24c6e6060f9a9c0560ce45620a19c25d5528d374af6a7b8d810a61a6de6f0e74

File details

Details for the file ASTAligner-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: ASTAligner-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for ASTAligner-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ca9ba1ed6524a8444aab206ea1f2913fc0d4369e4efb67f7f1bc6ff71bb4a702
MD5 0de24ca03b0c0f4a601a791a97358e97
BLAKE2b-256 e02a17880146d91129dc79211aa7fc5531f4e655254b32d89b516da240b8f3ce
