
TriVector Code Intelligence - Multi-view code relationship model with advanced semantic embeddings


TriCoder Code Intelligence


TriCoder learns high-quality symbol-level embeddings from codebases using three complementary views:

  1. Graph View: Structural relationships via PPMI and SVD
  2. Context View: Semantic context via Node2Vec random walks and Word2Vec
  3. Typed View: Type information via type-token co-occurrence (optional)
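To make the Graph View more concrete, here is a minimal sketch of PPMI followed by truncated SVD on a toy adjacency matrix. It is an illustration of the general technique only, not TriCoder's actual code: the function names `ppmi_matrix` and `svd_embeddings`, the toy graph, and the square-root scaling of singular values are all assumptions made for this example.

```python
import numpy as np


def ppmi_matrix(adjacency: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information of a co-occurrence matrix."""
    total = adjacency.sum()
    row_sums = adjacency.sum(axis=1, keepdims=True)
    col_sums = adjacency.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        # PMI(i, j) = log(a_ij * total / (row_i * col_j))
        expected = row_sums @ col_sums / total
        pmi = np.log(adjacency / expected)
    # Zero counts give -inf or nan; treat them as "no association".
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)


def svd_embeddings(matrix: np.ndarray, dim: int) -> np.ndarray:
    """Keep the top `dim` singular directions as node embeddings."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    # Scaling by sqrt(s) is one common choice (an assumption here).
    return u[:, :dim] * np.sqrt(s[:dim])


# Hypothetical 4-symbol graph: symbols 0, 1, 2 reference each other,
# and symbol 3 is only connected to symbol 2.
adjacency = np.array(
    [
        [0, 1, 1, 0],
        [1, 0, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

ppmi = ppmi_matrix(adjacency)
embeddings = svd_embeddings(ppmi, dim=2)
print(embeddings.shape)
```

Each row of `embeddings` is a low-dimensional vector for one symbol. In a real codebase the matrix would be large and sparse, so a sparse SVD routine would be the more likely choice.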

Features

  • Subtoken Semantic Graph: Captures fine-grained semantic relationships through subtoken analysis
  • File & Module Hierarchy: Leverages file/directory structure for better clustering
  • Static Call-Graph Expansion: Propagates call relationships to depth 2-3
  • Type Semantic Expansion: Expands composite types into constructors and primitives
  • Context Window Co-occurrence: Captures lexical context within ±5 lines
  • Improved Negative Sampling: Biased sampling for better temperature calibration
  • Hybrid Similarity Scoring: Length-penalized cosine similarity
  • Iterative Embedding Smoothing: Diffusion-based smoothing for better clustering
  • Query-Time Semantic Expansion: Expands queries with subtokens and types
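As an illustration of the Static Call-Graph Expansion feature, the sketch below propagates call edges up to a fixed depth with a breadth-first search. It is a guess at the general idea, not the project's implementation: the function name `expand_call_graph`, the dictionary-based graph, and the decision to record the hop count on each derived edge are assumptions.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def expand_call_graph(
    calls: Dict[str, List[str]], max_depth: int = 2
) -> Set[Tuple[str, str, int]]:
    """Return (caller, callee, depth) for callees reachable within max_depth hops."""
    expanded: Set[Tuple[str, str, int]] = set()
    for source in calls:
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the depth limit
            for callee in calls.get(node, []):
                if callee in seen:
                    continue  # keep only the shortest path to each callee
                seen.add(callee)
                expanded.add((source, callee, depth + 1))
                queue.append((callee, depth + 1))
    return expanded


# Hypothetical call chain: main -> load -> parse -> tokenize
calls = {"main": ["load"], "load": ["parse"], "parse": ["tokenize"]}
edges = expand_call_graph(calls, max_depth=2)
for edge in sorted(edges):
    print(edge)
```

With `max_depth=2`, `main` gains an indirect edge to `parse` but not to `tokenize`. A trainer could down-weight edges by their recorded depth, though whether TriCoder does so is not stated here.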

Installation

Using Poetry (Recommended)

poetry install

Using pip

pip install .

Usage

1. Extract Symbols from Codebase

tricoder-extract --input-dir /path/to/codebase --output-nodes nodes.jsonl --output-edges edges.jsonl --output-types types.jsonl
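The record schema of the generated files is not documented in this README, so the following sketch inspects them without assuming any field names. It only relies on the `.jsonl` extension meaning one JSON value per line; the helper name `summarize_jsonl` is made up for this example.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Count records and the top-level keys they use, without assuming a schema."""
    record_count = 0
    key_counts = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            record_count += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return record_count, key_counts


if __name__ == "__main__":
    # File names match the extraction command above.
    for name in ("nodes.jsonl", "edges.jsonl", "types.jsonl"):
        if Path(name).exists():
            count, keys = summarize_jsonl(name)
            print(f"{name}: {count} records, keys: {sorted(keys)}")
```

Running it after extraction gives a quick sanity check that each file is non-empty before training starts.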

2. Train Model

tricoder-train --nodes nodes.jsonl --edges edges.jsonl --types types.jsonl --out model_output

3. Query Model

# Single query
tricoder-query --model-dir model_output --symbol sym_0001 --top-k 10

# Interactive mode
tricoder-query --model-dir model_output --interactive
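Conceptually, a `--top-k` query is a nearest-neighbour lookup over the learned embeddings. The sketch below shows a brute-force version using plain cosine similarity on in-memory arrays. It does not read the `model_output` directory, whose format is not described here, and it ignores the length penalty and query expansion listed under Features. The function `top_k_similar`, the toy vectors, and the ids other than `sym_0001` are hypothetical.

```python
import numpy as np


def top_k_similar(embeddings, ids, query_id, top_k=10):
    """Rank all other symbols by cosine similarity to the query symbol."""
    index = ids.index(query_id)
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.clip(norms, 1e-12, None)
    scores = normalized @ normalized[index]
    scores[index] = -np.inf  # never return the query itself
    k = min(top_k, len(ids) - 1)
    order = np.argsort(-scores)[:k]
    return [(ids[i], float(scores[i])) for i in order]


# Toy data standing in for a trained model.
ids = ["sym_0001", "sym_0002", "sym_0003"]
embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])

results = top_k_similar(embeddings, ids, "sym_0001", top_k=2)
for symbol, score in results:
    print(symbol, round(score, 3))
```

The requirements list includes annoy, which suggests the real tool may use an approximate index instead of a full scan; that is an inference, not something this README states.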

Advanced Options

Training Options

  • --graph-dim: Graph view dimensionality (default: auto)
  • --context-dim: Context view dimensionality (default: auto)
  • --typed-dim: Typed view dimensionality (default: auto)
  • --final-dim: Final fused embedding dimensionality (default: auto)
  • --num-walks: Number of random walks per node (default: 10)
  • --walk-length: Length of each random walk (default: 80)
  • --train-ratio: Fraction of edges for training (default: 0.8)
  • --random-state: Random seed for reproducibility (default: 42)
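To show how `--num-walks`, `--walk-length`, and `--random-state` interact, here is a simplified random-walk generator. It uses uniform neighbour selection, which corresponds to Node2Vec with both bias parameters set to 1, so it is a simplification rather than a reproduction of the trainer. The function name `generate_walks` and the toy graph are assumptions.

```python
import random
from typing import Dict, List


def generate_walks(
    graph: Dict[str, List[str]],
    num_walks: int = 10,
    walk_length: int = 80,
    random_state: int = 42,
) -> List[List[str]]:
    """Start num_walks uniform random walks from every node."""
    rng = random.Random(random_state)  # fixed seed gives repeatable walks
    nodes = sorted(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph.get(walk[-1], [])
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical graph; "d" is an isolated symbol.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
walks = generate_walks(graph, num_walks=10, walk_length=80, random_state=42)
print(len(walks), "walks generated")
```

The resulting walks play the role of sentences for a Word2Vec-style model, which is how the Context View is described above. The total number of walks grows as the number of nodes times `num_walks`, so both walk options directly affect training time.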

Extraction Options

  • --include-dirs: Include only specific subdirectories
  • --exclude-dirs: Exclude specific directories
  • --no-gitignore: Disable .gitignore filtering
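The exact matching rules behind `--include-dirs` and `--exclude-dirs` are not spelled out here, so the sketch below shows one plausible interpretation: include paths are relative to the input directory, and exclude entries are directory names pruned at any depth. It does not implement `.gitignore` filtering. The function name `iter_files` and these semantics are assumptions, not the tool's documented behaviour.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator, Optional


def iter_files(
    root,
    include_dirs: Optional[Iterable[str]] = None,
    exclude_dirs: Iterable[str] = (),
) -> Iterator[Path]:
    """Yield file paths relative to root, honouring include/exclude lists."""
    root = Path(root)
    include = [Path(d) for d in include_dirs] if include_dirs else None
    excluded = set(exclude_dirs)
    for current, dirnames, filenames in os.walk(root):
        # Prune excluded directory names in place so os.walk skips them.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded)
        relative = Path(current).relative_to(root)
        if include is not None and not any(
            relative == inc or inc in relative.parents for inc in include
        ):
            continue  # outside every included subdirectory
        for name in sorted(filenames):
            yield relative / name


if __name__ == "__main__":
    # Hypothetical layout: scan only src/, skipping any tests/ directories.
    for path in iter_files(".", include_dirs=["src"], exclude_dirs=["tests"]):
        print(path)
```

Pruning `dirnames` in place avoids descending into large excluded trees at all, which matters for directories such as dependency caches.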

Requirements

  • Python 3.8+
  • numpy >= 1.21.0
  • scipy >= 1.7.0
  • scikit-learn >= 1.0.0
  • gensim >= 4.0.0
  • annoy >= 1.17.0
  • click >= 8.0.0
  • rich >= 13.0.0

License

TriCoder is available under a Non-Commercial License.

  • Free for non-commercial use: Personal projects, education, research, open-source
  • Commercial license required: Paid products, SaaS, commercial consulting, enterprise use

For commercial licensing inquiries, please contact: j.f.otoupal@gmail.com

See LICENSE for full terms and LICENSE_COMMERCIAL.md for commercial license information.


Did I make your life less painful?

Support my coffee addiction ;)
Buy me a Coffee

Download files


Source Distribution

tricoder-1.1.9.tar.gz (42.8 kB)

Uploaded Source

Built Distribution


tricoder-1.1.9-py3-none-any.whl (48.9 kB)

Uploaded Python 3

File details

Details for the file tricoder-1.1.9.tar.gz.

File metadata

  • Download URL: tricoder-1.1.9.tar.gz
  • Upload date:
  • Size: 42.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tricoder-1.1.9.tar.gz
Algorithm     Hash digest
SHA256        fad57bc0f780a924b0097fd3492b49babdea7c52d2bb2646acfbf8ff0f66d151
MD5           8eb29063e578c7426a84b0926f838107
BLAKE2b-256   786bd5c52019eaf98ef86f17060596b4b593be683c342e07cff88d59d02edcd6


File details

Details for the file tricoder-1.1.9-py3-none-any.whl.

File metadata

  • Download URL: tricoder-1.1.9-py3-none-any.whl
  • Upload date:
  • Size: 48.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for tricoder-1.1.9-py3-none-any.whl
Algorithm     Hash digest
SHA256        f96d18c96c80b9734d12ae4e269bc9e1d1e0909f0fac07db19f1b3aa8121ba57
MD5           737bc6b186fe14df135fe69cc60a7405
BLAKE2b-256   a1fea82fd0f1cc43f3dc326d9e4dc17b431fa3048459ce5a98723413fe6bd462

