Skip to main content

TriVector Code Intelligence - Multi-view code relationship model with advanced semantic embeddings

Project description

TriCoder Code Intelligence

image PyPI - Python Version

Build Status Downloads

TriCoder learns high-quality symbol-level embeddings from codebases using three complementary views:

  1. Graph View: Structural relationships via PPMI and SVD
  2. Context View: Semantic context via Node2Vec random walks and Word2Vec
  3. Typed View: Type information via type-token co-occurrence (optional)

Features

  • Subtoken Semantic Graph: Captures fine-grained semantic relationships through subtoken analysis
  • File & Module Hierarchy: Leverages file/directory structure for better clustering
  • Static Call-Graph Expansion: Propagates call relationships to depth 2-3
  • Type Semantic Expansion: Expands composite types into constructors and primitives
  • Context Window Co-occurrence: Captures lexical context within ±5 lines
  • Improved Negative Sampling: Biased sampling for better temperature calibration
  • Hybrid Similarity Scoring: Length-penalized cosine similarity
  • Iterative Embedding Smoothing: Diffusion-based smoothing for better clustering
  • Query-Time Semantic Expansion: Expands queries with subtokens and types

Installation

Using Poetry (Recommended)

poetry install

Using pip

pip install .

Usage

1. Extract Symbols from Codebase

tricoder-extract --input-dir /path/to/codebase --output-nodes nodes.jsonl --output-edges edges.jsonl --output-types types.jsonl

2. Train Model

tricoder-train --nodes nodes.jsonl --edges edges.jsonl --types types.jsonl --out model_output

3. Query Model

# Single query
tricoder-query --model-dir model_output --symbol sym_0001 --top-k 10

# Interactive mode
tricoder-query --model-dir model_output --interactive

Advanced Options

Training Options

  • --graph-dim: Graph view dimensionality (default: auto)
  • --context-dim: Context view dimensionality (default: auto)
  • --typed-dim: Typed view dimensionality (default: auto)
  • --final-dim: Final fused embedding dimensionality (default: auto)
  • --num-walks: Number of random walks per node (default: 10)
  • --walk-length: Length of each random walk (default: 80)
  • --train-ratio: Fraction of edges for training (default: 0.8)
  • --random-state: Random seed for reproducibility (default: 42)

Extraction Options

  • --include-dirs: Include only specific subdirectories
  • --exclude-dirs: Exclude specific directories
  • --no-gitignore: Disable .gitignore filtering

Requirements

  • Python 3.8+
  • numpy >= 1.21.0
  • scipy >= 1.7.0
  • scikit-learn >= 1.0.0
  • gensim >= 4.0.0
  • annoy >= 1.17.0
  • click >= 8.0.0
  • rich >= 13.0.0

License

TriCoder is available under a Non-Commercial License.

  • Free for non-commercial use: Personal projects, education, research, open-source
  • Commercial license required: Paid products, SaaS, commercial consulting, enterprise use

For commercial licensing inquiries, please contact: j.f.otoupal@gmail.com

See LICENSE for full terms and LICENSE_COMMERCIAL.md for commercial license information.


Did I made your life less painfull ?

Support my coffee addiction ;)
Buy me a Coffee

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tricoder-1.1.6.tar.gz (40.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tricoder-1.1.6-py3-none-any.whl (46.3 kB view details)

Uploaded Python 3

File details

Details for the file tricoder-1.1.6.tar.gz.

File metadata

  • Download URL: tricoder-1.1.6.tar.gz
  • Upload date:
  • Size: 40.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tricoder-1.1.6.tar.gz
Algorithm Hash digest
SHA256 6a11f1e71c95adea76320dca71f17a58da7cb044ebb1af4874496ea60bbc6d49
MD5 d15a0770a934b11cce5b3147187b9464
BLAKE2b-256 0e36c01578d67d22e1d44439b78a847f1801b55a23b5c7cc159db5a50131188e

See more details on using hashes here.

File details

Details for the file tricoder-1.1.6-py3-none-any.whl.

File metadata

  • Download URL: tricoder-1.1.6-py3-none-any.whl
  • Upload date:
  • Size: 46.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tricoder-1.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 34b000a9ed465fb13d555b485ff7485c42087a8e0a61fa9b5a1976f6c3e4bedf
MD5 4d4e6f1f189bd26ac26dad4c9bcdcff6
BLAKE2b-256 438fb3211278e9581f6126ead487dcf75c95e452bdd4c6020706cca9d90b9f27

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page