
A tool to process codebases, generate embeddings for code chunks, and query code snippets in natural language using models like CodeBERT.


xTrAct-NLP: A Code Query and Embedding Toolkit

xTrAct-NLP is a toolkit for processing codebases, generating embeddings from code chunks, and retrieving relevant snippets with natural language queries. It uses pretrained models such as CodeBERT to create the embeddings, and it supports query expansion and result ranking. The project is aimed at developers who want to add natural language search to code search engines.

Features

  • Code Parsing: Parses source code into an AST and extracts functions and classes as code chunks.
  • Embedding Generation: Generates embeddings from code chunks using HuggingFace models (e.g., CodeBERT, T5).
  • Query Expansion: Automatically expands natural language queries with relevant technical terms using language models.
  • Reranking: Ranks results with BM25 and cosine similarity for more relevant code retrieval.
  • Visualization: Renders scatter plots (PCA and t-SNE projections) and heatmaps for analyzing and comparing code embeddings.
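
To make the Code Parsing feature concrete, here is a minimal sketch of AST-based chunk extraction using only Python's standard `ast` module. It is an illustration of the general technique, not xTrAct-NLP's own implementation: the function name `extract_chunks` and the chunk dictionary layout are assumptions made for this example.

```python
import ast


def extract_chunks(source: str) -> list[dict]:
    """Return one chunk per function or class definition found in `source`.

    Hypothetical helper for illustration; not part of the xtract API.
    """
    tree = ast.parse(source)
    chunks = []
    # ast.walk visits every node, so nested methods are also collected.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "lineno": node.lineno,
                # Exact source text of the definition.
                "code": ast.get_source_segment(source, node),
            })
    return chunks


sample = '''
class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

def main():
    print(Greeter().greet("world"))
'''

chunks = extract_chunks(sample)
for chunk in chunks:
    print(chunk["kind"], chunk["name"], chunk["lineno"])
```

Because the walk covers the whole tree, a method appears both inside its class chunk and as a chunk of its own. Whether that overlap is desirable depends on how the chunks are embedded and searched.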

Installation

pip install xtract-nlp

For development:

git clone https://github.com/ooojustin/xTrAct-NLP.git
cd xTrAct-NLP
pip install -e .

Usage

CLI Usage

  1. Process Codebase:

    xtract process <path_to_codebase>
    
  2. Generate Embeddings:

    xtract generate
    
  3. Query the Codebase:

    xtract query "parse python code using ast"
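
The query step compares the query against stored chunk embeddings, and the Features list names cosine similarity as one ranking signal. The sketch below shows that idea with hand-written toy vectors and the standard library only. The function names, chunk names, and vectors are all made up for illustration; real embeddings would come from a model such as CodeBERT and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat zero vectors as matching nothing.
    return dot / (norm_a * norm_b)


def rank_by_cosine(query_vec, chunk_vecs, top_k=3):
    """Return the top_k (name, score) pairs, highest similarity first."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings" (illustrative values only).
toy_chunks = {
    "parse_source": [0.9, 0.1, 0.0],
    "plot_heatmap": [0.0, 0.2, 0.9],
    "walk_ast": [0.8, 0.3, 0.1],
}
toy_query = [1.0, 0.2, 0.0]

ranking = rank_by_cosine(toy_query, toy_chunks, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```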
    

Python Library Usage

from xtract.core import process_code, generate_embeddings, query_code

# Process codebase
num_chunks = process_code("/path/to/codebase")

# Generate embeddings
num_embeddings = generate_embeddings()

# Query codebase
results = query_code("parse python code using ast")
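
The Features list also mentions BM25 ranking. As a rough illustration of what a BM25 score measures, the sketch below implements the common Okapi BM25 formula over a few toy code strings, using the standard library only. It does not reproduce xTrAct-NLP's internals: the helper names, the tokenizer, and the parameter defaults (`k1=1.5`, `b=0.75`) are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics, so parse_source -> parse, source."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs if n_docs else 0.0

    # Number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization penalizes long documents.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


toy_docs = [
    "def parse_source(code): return ast.parse(code)",
    "def plot_heatmap(matrix): draw a heatmap of embeddings",
    "def load_config(path): read settings from a file",
]
scores = bm25_scores("parse python code using ast", toy_docs)
best = max(range(len(toy_docs)), key=lambda i: scores[i])
print(toy_docs[best])
```

BM25 rewards exact term overlap, while embedding similarity can match related wording, which is one reason a retrieval tool might combine the two.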

License

This project is licensed under the MIT License. See the LICENSE file for details.


