
TriVector Code Intelligence - Multi-view code relationship model with advanced semantic embeddings


TriCoder Code Intelligence


TriCoder learns high-quality symbol-level embeddings from codebases using three complementary views:

  1. Graph View: Structural relationships via PPMI and SVD
  2. Context View: Semantic context via Node2Vec random walks and Word2Vec
  3. Typed View: Type information via type-token co-occurrence (optional)
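As an illustration of the Graph View idea, here is a minimal sketch of PPMI weighting followed by a truncated SVD, using only numpy. The function names `ppmi` and `svd_embed`, the toy adjacency matrix, and the square-root scaling of singular values are assumptions made for this example; they are not taken from TriCoder's source.

```python
import numpy as np


def ppmi(counts: np.ndarray) -> np.ndarray:
    """Positive pointwise mutual information of a co-occurrence matrix."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts * total) / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts produce -inf or nan
    return np.maximum(pmi, 0.0)   # keep only positive associations


def svd_embed(matrix: np.ndarray, dim: int) -> np.ndarray:
    """Keep the top `dim` singular directions as node embeddings."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :dim] * np.sqrt(s[:dim])  # scaling choice is an assumption


# Hypothetical 4-symbol graph; entry [i, j] counts edges between i and j.
adjacency = np.array(
    [
        [0, 2, 1, 0],
        [2, 0, 1, 0],
        [1, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)

graph_embeddings = svd_embed(ppmi(adjacency), dim=2)
```

Each row of `graph_embeddings` is a 2-dimensional vector for one symbol; in practice the dimensionality would be much larger.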

Features

  • Subtoken Semantic Graph: Captures fine-grained semantic relationships through subtoken analysis
  • File & Module Hierarchy: Leverages file/directory structure for better clustering
  • Static Call-Graph Expansion: Propagates call relationships to depth 2-3
  • Type Semantic Expansion: Expands composite types into constructors and primitives
  • Context Window Co-occurrence: Captures lexical context within ±5 lines
  • Improved Negative Sampling: Biased sampling for better temperature calibration
  • Hybrid Similarity Scoring: Length-penalized cosine similarity
  • Iterative Embedding Smoothing: Diffusion-based smoothing for better clustering
  • Query-Time Semantic Expansion: Expands queries with subtokens and types
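The smoothing feature can be pictured as repeatedly blending each symbol's vector with the mean of its neighbors' vectors. The sketch below is one common way to write such a diffusion step; the update rule, the `alpha` and `iterations` parameters, and the function name `smooth_embeddings` are assumptions for illustration, not TriCoder's actual implementation.

```python
import numpy as np


def smooth_embeddings(embeddings, adjacency, alpha=0.5, iterations=2):
    """Blend each vector with the average of its graph neighbors."""
    degree = adjacency.sum(axis=1, keepdims=True)
    isolated = degree == 0                      # nodes with no neighbors
    transition = adjacency / np.where(isolated, 1.0, degree)
    smoothed = embeddings.astype(float)         # astype returns a copy
    for _ in range(iterations):
        neighbor_mean = transition @ smoothed
        # Isolated nodes have no neighbors to average, so they keep their vector.
        neighbor_mean = np.where(isolated, smoothed, neighbor_mean)
        smoothed = (1 - alpha) * smoothed + alpha * neighbor_mean
    return smoothed


# Hypothetical graph: symbols 0 and 1 are linked, symbol 2 is isolated.
toy_adjacency = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
toy_embeddings = np.array([[0.0, 0.0], [2.0, 2.0], [5.0, 5.0]])
toy_smoothed = smooth_embeddings(toy_embeddings, toy_adjacency, alpha=0.5, iterations=1)
```

Connected symbols move toward each other, which is the clustering effect the feature list describes; a larger `alpha` or more iterations smooths more aggressively.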

Installation

Using Poetry (Recommended)

poetry install

Using pip

pip install .

Usage

1. Extract Symbols from Codebase

tricoder-extract --input-dir /path/to/codebase --output-nodes nodes.jsonl --output-edges edges.jsonl --output-types types.jsonl

2. Train Model

tricoder-train --nodes nodes.jsonl --edges edges.jsonl --types types.jsonl --out model_output

3. Query Model

# Single query
tricoder-query --model-dir model_output --symbol sym_0001 --top-k 10

# Interactive mode
tricoder-query --model-dir model_output --interactive
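The three steps above can also be chained from a script. The following sketch builds the same command lines shown in this section and runs them in order with `subprocess`; the helper names `build_commands` and `run_pipeline` are hypothetical, and only flags that appear above are used.

```python
import subprocess


def build_commands(codebase, model_dir, symbol, top_k=10):
    """Return the extract, train, and query command lines as argument lists."""
    nodes, edges, types = "nodes.jsonl", "edges.jsonl", "types.jsonl"
    return [
        ["tricoder-extract", "--input-dir", codebase,
         "--output-nodes", nodes, "--output-edges", edges,
         "--output-types", types],
        ["tricoder-train", "--nodes", nodes, "--edges", edges,
         "--types", types, "--out", model_dir],
        ["tricoder-query", "--model-dir", model_dir,
         "--symbol", symbol, "--top-k", str(top_k)],
    ]


def run_pipeline(codebase, model_dir, symbol, top_k=10):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands(codebase, model_dir, symbol, top_k):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Calling `run_pipeline("/path/to/codebase", "model_output", "sym_0001")` would then execute all three steps, assuming the `tricoder-*` commands are on the `PATH`.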

Advanced Options

Training Options

  • --graph-dim: Graph view dimensionality (default: auto)
  • --context-dim: Context view dimensionality (default: auto)
  • --typed-dim: Typed view dimensionality (default: auto)
  • --final-dim: Final fused embedding dimensionality (default: auto)
  • --num-walks: Number of random walks per node (default: 10)
  • --walk-length: Length of each random walk (default: 80)
  • --train-ratio: Fraction of edges for training (default: 0.8)
  • --random-state: Random seed for reproducibility (default: 42)
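To make the walk and split options concrete, here is a small standard-library sketch of what `--num-walks`, `--walk-length`, `--train-ratio`, and `--random-state` typically control. It uses uniform random walks rather than Node2Vec's biased walks, and the function names and the toy graph are assumptions for illustration only.

```python
import random


def split_edges(edges, train_ratio=0.8, seed=42):
    """Shuffle edges with a fixed seed and split them into train and held-out sets."""
    rng = random.Random(seed)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def random_walks(neighbors, num_walks=10, walk_length=80, seed=42):
    """Start `num_walks` uniform random walks from every node."""
    rng = random.Random(seed)
    nodes = sorted(neighbors)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                options = neighbors.get(walk[-1], [])
                if not options:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(options))
            walks.append(walk)
    return walks


# Hypothetical symbol graph given as an adjacency list.
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
toy_walks = random_walks(toy_graph, num_walks=2, walk_length=5, seed=42)
train_edges, test_edges = split_edges([(i, i + 1) for i in range(10)])
```

Each walk is a sequence of symbol IDs that can be treated like a sentence when training Word2Vec, and reusing the same seed reproduces the same walks and the same split.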

Extraction Options

  • --include-dirs: Include only specific subdirectories
  • --exclude-dirs: Exclude specific directories
  • --no-gitignore: Disable .gitignore filtering
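The include and exclude options can be thought of as a filter over candidate file paths. The sketch below shows one plausible interpretation, matching on directory names anywhere in the path; how TriCoder actually matches directories is not documented here, so the matching rule and the function name `select_files` are assumptions. The `.gitignore` filtering is not modeled.

```python
from pathlib import PurePosixPath


def select_files(paths, include_dirs=(), exclude_dirs=()):
    """Keep paths that sit under an included directory and under no excluded one."""
    selected = []
    for raw in paths:
        parts = PurePosixPath(raw).parts[:-1]  # directory components only
        if any(name in parts for name in exclude_dirs):
            continue
        if include_dirs and not any(name in parts for name in include_dirs):
            continue
        selected.append(raw)
    return selected


# Hypothetical project layout.
candidate_paths = [
    "src/app/main.py",
    "src/vendor/lib.py",
    "tests/test_main.py",
    "setup.py",
]
without_vendor = select_files(candidate_paths, exclude_dirs=["vendor"])
only_src = select_files(candidate_paths, include_dirs=["src"], exclude_dirs=["vendor"])
```

In this interpretation exclusion wins over inclusion, so a vendored directory inside an included tree is still skipped.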

Requirements

  • Python 3.8+
  • numpy >= 1.21.0
  • scipy >= 1.7.0
  • scikit-learn >= 1.0.0
  • gensim >= 4.0.0
  • annoy >= 1.17.0
  • click >= 8.0.0
  • rich >= 13.0.0

License

TriCoder is available under a Non-Commercial License.

  • Free for non-commercial use: Personal projects, education, research, open-source
  • Commercial license required: Paid products, SaaS, commercial consulting, enterprise use

For commercial licensing inquiries, please contact: j.f.otoupal@gmail.com

See LICENSE for full terms and LICENSE_COMMERCIAL.md for commercial license information.


Did I make your life less painful?

Support my coffee addiction ;)
Buy me a Coffee
