Skip to main content

TriVector Code Intelligence - Multi-view code relationship model with advanced semantic embeddings

Project description

TriCoder Code Intelligence

image PyPI - Python Version

Build Status Downloads

TriCoder learns high-quality symbol-level embeddings from codebases using three complementary views:

  1. Graph View: Structural relationships via PPMI and SVD
  2. Context View: Semantic context via Node2Vec random walks and Word2Vec
  3. Typed View: Type information via type-token co-occurrence (optional)

Features

  • Subtoken Semantic Graph: Captures fine-grained semantic relationships through subtoken analysis
  • File & Module Hierarchy: Leverages file/directory structure for better clustering
  • Static Call-Graph Expansion: Propagates call relationships to depth 2-3
  • Type Semantic Expansion: Expands composite types into constructors and primitives
  • Context Window Co-occurrence: Captures lexical context within ±5 lines
  • Improved Negative Sampling: Biased sampling for better temperature calibration
  • Hybrid Similarity Scoring: Length-penalized cosine similarity
  • Iterative Embedding Smoothing: Diffusion-based smoothing for better clustering
  • Query-Time Semantic Expansion: Expands queries with subtokens and types

Installation

Using Poetry (Recommended)

poetry install

Using pip

pip install .

Usage

1. Extract Symbols from Codebase

tricoder-extract --input-dir /path/to/codebase --output-nodes nodes.jsonl --output-edges edges.jsonl --output-types types.jsonl

2. Train Model

tricoder-train --nodes nodes.jsonl --edges edges.jsonl --types types.jsonl --out model_output

3. Query Model

# Single query
tricoder-query --model-dir model_output --symbol sym_0001 --top-k 10

# Interactive mode
tricoder-query --model-dir model_output --interactive

Advanced Options

Training Options

  • --graph-dim: Graph view dimensionality (default: auto)
  • --context-dim: Context view dimensionality (default: auto)
  • --typed-dim: Typed view dimensionality (default: auto)
  • --final-dim: Final fused embedding dimensionality (default: auto)
  • --num-walks: Number of random walks per node (default: 10)
  • --walk-length: Length of each random walk (default: 80)
  • --train-ratio: Fraction of edges for training (default: 0.8)
  • --random-state: Random seed for reproducibility (default: 42)

Extraction Options

  • --include-dirs: Include only specific subdirectories
  • --exclude-dirs: Exclude specific directories
  • --no-gitignore: Disable .gitignore filtering

Requirements

  • Python 3.8+
  • numpy >= 1.21.0
  • scipy >= 1.7.0
  • scikit-learn >= 1.0.0
  • gensim >= 4.0.0
  • annoy >= 1.17.0
  • click >= 8.0.0
  • rich >= 13.0.0

License

TriCoder is available under a Non-Commercial License.

  • Free for non-commercial use: Personal projects, education, research, open-source
  • Commercial license required: Paid products, SaaS, commercial consulting, enterprise use

For commercial licensing inquiries, please contact: j.f.otoupal@gmail.com

See LICENSE for full terms and LICENSE_COMMERCIAL.md for commercial license information.


Did I made your life less painfull ?

Support my coffee addiction ;)
Buy me a Coffee

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tricoder-1.1.2.tar.gz (35.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tricoder-1.1.2-py3-none-any.whl (41.7 kB view details)

Uploaded Python 3

File details

Details for the file tricoder-1.1.2.tar.gz.

File metadata

  • Download URL: tricoder-1.1.2.tar.gz
  • Upload date:
  • Size: 35.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tricoder-1.1.2.tar.gz
Algorithm Hash digest
SHA256 0b0dbb5cc1a4f985af80ddd2bb0a8f04de336819066869251faf0101af1001ff
MD5 40483467f52783e3a61eeb5c4ec48e5d
BLAKE2b-256 24885771de2a5681514203988ce45fe6ed8d080ca252627007d3676c6175945c

See more details on using hashes here.

File details

Details for the file tricoder-1.1.2-py3-none-any.whl.

File metadata

  • Download URL: tricoder-1.1.2-py3-none-any.whl
  • Upload date:
  • Size: 41.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tricoder-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e3eb0119459ef88f7c3cafda7ff25c7dad281fc1c88185944920a6089b5aed32
MD5 527296d0c7f9ebfe1f83c41008b5e460
BLAKE2b-256 eb685a7d9fca7b527925cc14e33c2bbf75ea4e6e0fc83621a9b8f7e8c29ee5a7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page