TriVector Code Intelligence - Multi-view code relationship model with advanced semantic embeddings

TriCoder Code Intelligence

TriCoder learns high-quality symbol-level embeddings from codebases using three complementary views:

  1. Graph View: Structural relationships via PPMI and SVD
  2. Context View: Semantic context via Node2Vec random walks and Word2Vec
  3. Typed View: Type information via type-token co-occurrence (optional)
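As a rough illustration of the PPMI half of the Graph View, the sketch below computes positive pointwise mutual information from an edge list using only the standard library. It is not TriCoder's code: the function name ppmi, the treatment of edges as undirected, and the tuple-based edge format are all assumptions made for this example. The SVD step that the Graph View applies afterwards is left out.

```python
import math
from collections import Counter


def ppmi(edges):
    """Return {(a, b): score} with positive PMI for an undirected edge list.

    Hypothetical helper for illustration; `edges` is a list of (a, b) pairs.
    """
    # Count each edge in both directions so the co-occurrence table is symmetric.
    pair_counts = Counter()
    for a, b in edges:
        pair_counts[(a, b)] += 1
        pair_counts[(b, a)] += 1

    total = sum(pair_counts.values())

    # Marginal count per symbol (row sums; equal to column sums by symmetry).
    row_counts = Counter()
    for (a, _), count in pair_counts.items():
        row_counts[a] += count

    scores = {}
    for (a, b), count in pair_counts.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) )
        pmi = math.log((count * total) / (row_counts[a] * row_counts[b]))
        if pmi > 0:  # keep only the positive part (the first "P" in PPMI)
            scores[(a, b)] = pmi
    return scores
```

Calling ppmi([("f", "g"), ("f", "h"), ("g", "h"), ("x", "y")]) gives the isolated pair x, y a higher score than any pair in the f, g, h triangle, because x and y co-occur only with each other.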

Features

  • Subtoken Semantic Graph: Captures fine-grained semantic relationships through subtoken analysis
  • File & Module Hierarchy: Leverages file/directory structure for better clustering
  • Static Call-Graph Expansion: Propagates call relationships to depth 2-3
  • Type Semantic Expansion: Expands composite types into constructors and primitives
  • Context Window Co-occurrence: Captures lexical context within ±5 lines
  • Improved Negative Sampling: Biased sampling for better temperature calibration
  • Hybrid Similarity Scoring: Length-penalized cosine similarity
  • Iterative Embedding Smoothing: Diffusion-based smoothing for better clustering
  • Query-Time Semantic Expansion: Expands queries with subtokens and types
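Several of the features above depend on breaking identifiers into subtokens. The sketch below shows one common way to do that for camelCase and snake_case names. The function name split_subtokens and the regular expression are assumptions for illustration; the source does not say how TriCoder tokenizes identifiers.

```python
import re

# Hypothetical pattern: acronym runs, capitalized or lowercase words, digit runs.
_SUBTOKEN_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def split_subtokens(identifier):
    """Split an identifier into lowercase subtokens.

    Underscores and other separators are dropped because the pattern
    only matches letters and digits.
    """
    return [token.lower() for token in _SUBTOKEN_RE.findall(identifier)]
```

With this pattern, split_subtokens("parseHTTPResponse") yields ["parse", "http", "response"], so a query for a symbol could also be matched against symbols that share any of those subtokens.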

Installation

Using Poetry (Recommended)

poetry install

Using pip

pip install .

Usage

1. Extract Symbols from Codebase

tricoder-extract --input-dir /path/to/codebase --output-nodes nodes.jsonl --output-edges edges.jsonl --output-types types.jsonl

2. Train Model

tricoder-train --nodes nodes.jsonl --edges edges.jsonl --types types.jsonl --out model_output

3. Query Model

# Single query
tricoder-query --model-dir model_output --symbol sym_0001 --top-k 10

# Interactive mode
tricoder-query --model-dir model_output --interactive
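The three steps can also be chained from a script. The sketch below only assembles the command lines shown above and runs them with subprocess. The helper names build_pipeline and run_pipeline are hypothetical, and the script assumes the tricoder-* commands are on PATH after installation.

```python
import subprocess
from pathlib import Path


def build_pipeline(input_dir, work_dir, model_dir, symbol, top_k=10):
    """Return the extract, train, and query commands as argument lists.

    Hypothetical helper; it uses only the flags documented in this README.
    """
    work = Path(work_dir)
    nodes = str(work / "nodes.jsonl")
    edges = str(work / "edges.jsonl")
    types = str(work / "types.jsonl")
    return [
        ["tricoder-extract", "--input-dir", str(input_dir),
         "--output-nodes", nodes, "--output-edges", edges,
         "--output-types", types],
        ["tricoder-train", "--nodes", nodes, "--edges", edges,
         "--types", types, "--out", str(model_dir)],
        ["tricoder-query", "--model-dir", str(model_dir),
         "--symbol", symbol, "--top-k", str(top_k)],
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, run_pipeline(build_pipeline("/path/to/codebase", ".", "model_output", "sym_0001")) would reproduce the three steps above in sequence. Passing argument lists rather than a shell string avoids quoting problems with paths that contain spaces.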

Advanced Options

Training Options

  • --graph-dim: Graph view dimensionality (default: auto)
  • --context-dim: Context view dimensionality (default: auto)
  • --typed-dim: Typed view dimensionality (default: auto)
  • --final-dim: Final fused embedding dimensionality (default: auto)
  • --num-walks: Number of random walks per node (default: 10)
  • --walk-length: Length of each random walk (default: 80)
  • --train-ratio: Fraction of edges for training (default: 0.8)
  • --random-state: Random seed for reproducibility (default: 42)
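To make the roles of --num-walks, --walk-length, --train-ratio, and --random-state more concrete, here is a minimal sketch of seeded random walks and a seeded edge split. It is an assumption-laden illustration, not TriCoder's implementation: the walks are uniform (no Node2Vec return or in-out bias), and the function names, the adjacency-dict format, and the shuffle-then-cut split are all hypothetical.

```python
import random


def random_walks(adjacency, num_walks=10, walk_length=80, random_state=42):
    """Generate `num_walks` uniform random walks starting from every node.

    `adjacency` maps a node to a list of its neighbors (assumed format).
    """
    rng = random.Random(random_state)  # fixed seed makes the walks repeatable
    walks = []
    for _ in range(num_walks):
        for start in sorted(adjacency):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency.get(walk[-1], [])
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def split_edges(edges, train_ratio=0.8, random_state=42):
    """Shuffle edges with a fixed seed and split them into train and held-out."""
    rng = random.Random(random_state)
    shuffled = list(edges)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Each walk can then be treated as a sentence of symbol IDs for a Word2Vec-style model, which is the general idea behind the Context View. Reusing the same random_state reproduces both the walks and the split exactly.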

Extraction Options

  • --include-dirs: Include only specific subdirectories
  • --exclude-dirs: Exclude specific directories
  • --no-gitignore: Disable .gitignore filtering

Requirements

  • Python 3.8+
  • numpy >= 1.21.0
  • scipy >= 1.7.0
  • scikit-learn >= 1.0.0
  • gensim >= 4.0.0
  • annoy >= 1.17.0
  • click >= 8.0.0
  • rich >= 13.0.0

License

TriCoder is available under a Non-Commercial License.

  • Free for non-commercial use: Personal projects, education, research, open-source
  • Commercial license required: Paid products, SaaS, commercial consulting, enterprise use

For commercial licensing inquiries, please contact: j.f.otoupal@gmail.com

See LICENSE for full terms and LICENSE_COMMERCIAL.md for commercial license information.


Did I make your life less painful?

Support my coffee addiction ;)
Buy me a Coffee
