A tool to process codebases, generate embeddings for code chunks, and query code snippets in natural language using models like CodeBERT.
Project description
xTrAct-NLP: A Code Query and Embedding Toolkit
xTrAct-NLP is a toolkit for processing codebases, generating embeddings from code chunks, and retrieving relevant snippets with natural language queries. It uses pretrained HuggingFace models such as CodeBERT to embed code, and layers query expansion and reranking on top of embedding search. It is aimed at developers who want to add natural language search to code search tools.
Features
- Code Parsing: Uses an abstract syntax tree (AST) to extract functions and classes from source files as code chunks.
- Embedding Generation: Generates embeddings from code chunks using HuggingFace models (e.g., CodeBERT, T5).
- Query Expansion: Automatically expands natural language queries with relevant technical terms using language models.
- Reranking: Supports BM25 and cosine similarity-based ranking for more relevant code retrieval.
- Visualization: Supports both scatter plots (for PCA and t-SNE) and heatmaps to visually analyze and compare code embeddings.
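The Code Parsing feature is described only at a high level. As a minimal sketch of what AST-based chunking can look like, assuming Python source files and using only the standard-library ast module, here is one way to split a file into function and class chunks. This is not xTrAct-NLP's own code; CodeChunk and extract_chunks are hypothetical names.

```python
import ast
from dataclasses import dataclass


@dataclass
class CodeChunk:
    """Hypothetical chunk record: definition name, kind, and exact source text."""
    name: str
    kind: str  # "function" or "class"
    source: str


def extract_chunks(source: str) -> list[CodeChunk]:
    """Split Python source into one chunk per function or class definition."""
    tree = ast.parse(source)
    chunks = []
    # ast.walk visits every node, so methods and nested functions are included too.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue
        # get_source_segment (Python 3.8+) returns the node's source text;
        # decorators are not part of the segment.
        segment = ast.get_source_segment(source, node)
        chunks.append(CodeChunk(node.name, kind, segment or ""))
    return chunks


sample = '''
class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

def add(a, b):
    return a + b
'''

for chunk in extract_chunks(sample):
    print(chunk.kind, chunk.name)
```

Because ast.walk descends into class bodies, a method is emitted as its own chunk and also appears inside its class's chunk; a real chunker might skip nested definitions to avoid overlapping chunks.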
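For the BM25 half of the Reranking feature, the sketch below implements standard Okapi BM25 in pure Python. The tokenizer, the helper names, and the parameter defaults (k1 = 1.5, b = 0.75) are assumptions for illustration, not values taken from xTrAct-NLP.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric runs, so parse_code -> ["parse", "code"]."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (higher is better)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Repeated query terms are counted once.
    query_terms = set(tokenize(query))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (the "+ 1" inside the log, as used by Lucene).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "def parse_code(source): return ast.parse(source)",
    "def add(a, b): return a + b",
    "class Plotter: def scatter(self): pass",
]
print(bm25_scores("parse python code using ast", docs))
```

BM25 rewards exact term overlap, so it complements embedding similarity, which can match a query to code that shares meaning but not vocabulary.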
Installation
pip install xtract-nlp
For development:
git clone https://github.com/ooojustin/xTrAct-NLP.git
cd xTrAct-NLP
pip install -e .
Usage
CLI Usage
- Process Codebase:
  xtract process <path_to_codebase>
- Generate Embeddings:
  xtract generate
- Query the Codebase:
  xtract query "parse python code using ast"
Python Library Usage
from xtract.core import process_code, generate_embeddings, query_code
# Process codebase
num_chunks = process_code("/path/to/codebase")
# Generate embeddings
num_embeddings = generate_embeddings()
# Query codebase
results = query_code("parse python code using ast")
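The README does not say how query_code ranks results internally. As a minimal sketch of the cosine-similarity ranking listed under Features, assuming every chunk already has an embedding vector, the code below ranks chunks against a query vector. The chunk IDs and the toy 3-dimensional vectors are hypothetical stand-ins for real model outputs, and rank_by_cosine is not part of the xtract API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_cosine(
    query_vec: list[float], chunk_vecs: dict[str, list[float]], top_k: int = 5
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, score) pairs, most similar first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings; a real pipeline would get these from a model.
chunk_vecs = {
    "parse_code": [0.9, 0.1, 0.0],
    "add": [0.0, 0.2, 0.9],
    "plot_embeddings": [0.1, 0.9, 0.1],
}
query_vec = [1.0, 0.0, 0.1]

for chunk_id, score in rank_by_cosine(query_vec, chunk_vecs, top_k=2):
    print(f"{chunk_id}: {score:.3f}")
```

Cosine similarity ignores vector length, so chunks are compared by direction in embedding space rather than by magnitude.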
License
This project is licensed under the MIT License. See the LICENSE file for more details.
File details
Details for the file xtract_nlp-0.1.2.tar.gz.
File metadata
- Download URL: xtract_nlp-0.1.2.tar.gz
- Upload date:
- Size: 82.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 93485b5dee549a58ee84c3a67b20782476cb40df5f9aba7f21c85e673ffbd658
MD5 | b693c65f5d499083e1f2affdd99b989c
BLAKE2b-256 | 25cec3dc7f79a4f53d02b4d1b86e5b5440645a62eb8fc3de9237f4a9fe667cd5
File details
Details for the file xtract_nlp-0.1.2-py3-none-any.whl.
File metadata
- Download URL: xtract_nlp-0.1.2-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | d96bb182f503a0c61e7a3ea098aa9b24e2738e4a7cdadcae2df73a5aefa9da7e
MD5 | ba8081ad0c7ef281ee38f701f78593ad
BLAKE2b-256 | 047433dd43b3c7215b3df354d58b60716405a5a2c987fba6224f9582d85263eb