Code Similarity Engine

Code Abstraction Opportunities Emerge Naturally

Code that performs similar functions can often be abstracted, but traditional tools are not robust to syntactic differences and therefore miss many abstraction opportunities. We present a simple CLI tool that performs semantic embedding, comparison, and then k-clustering to find regions of code with high semantic similarity. Because it compares meaning rather than syntax, and because reranking and minimum-similarity thresholds filter out weak matches, the system can find abstraction candidates reliably and systematically. On its own, this shows you where you can make abstractions, but not how. Using Qwen3 0.6B locally, we go a step further and provide a step-by-step guide to abstracting these similarities, reducing code duplication and giving your project clearer design and syntax.
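To make the embed, compare, and cluster idea concrete, here is a minimal sketch in plain Python. It is not the tool's implementation: the toy vectors stand in for real embeddings, the function names are hypothetical, and the grouping step is a simple single-link threshold grouping rather than whatever clustering cse actually runs. The defaults mirror the documented --threshold (0.80) and --min-cluster (2) options.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_similar(vectors, threshold=0.80, min_cluster=2):
    """Link every pair whose similarity meets the threshold (union-find),
    then keep only groups with at least min_cluster members."""
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(vectors)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_cluster)

# Toy "embeddings" for five code chunks (assumed values, for illustration only).
toy = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.1],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
    [0.5, 0.5, 0.9],
]
clusters = group_similar(toy)
print(clusters)
```

Chunks 0 and 1 point in nearly the same direction, as do chunks 2 and 3, so they form two groups; chunk 4 is not close enough to anything and is dropped by the minimum cluster size.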

ALWAYS Private, NO TELEMETRY!

We explicitly chose offline models for everything, so this program should never phone home or download anything after the initial download of the LLM, embedding, and reranking models.

Installation

pip install code-similarity-engine

After installation, download the required models (~1.4 GB):

cse --download-models

Or use the dedicated command:

cse-download-models

Usage

# Basic analysis
cse ./src

# With verbose output
cse ./src -v

# Focus on specific files
cse ./src --focus "*.swift" --focus "*.py"

# Higher precision threshold
cse ./src --threshold 0.90

# With LLM analysis (explains similarities)
cse ./src --analyze

# With reranking (improves cluster quality)
cse ./src --rerank

# Output as JSON
cse ./src -o json > report.json

# Output as Markdown
cse ./src -o markdown > report.md

# Check model status
cse --model-status

Options

cse <path> [options]

Core Options:
  -t, --threshold FLOAT      Similarity threshold 0.0-1.0 (default: 0.80)
  -m, --min-cluster INT      Minimum chunks per cluster (default: 2)
  -o, --output FORMAT        text | markdown | json (default: text)
  -v, --verbose              Show progress for all stages

Filtering:
  -f, --focus PATTERN        Only analyze matching paths (repeatable)
  -e, --exclude PATTERN      Glob patterns to exclude (repeatable)
  -l, --lang LANG            Force language detection

LLM Analysis:
  --analyze / --no-analyze   Use LLM to explain clusters
  --llm-model PATH           Path to LLM GGUF model (auto-detected)
  --max-analyze INT          Max clusters to analyze (default: 20)

Reranking:
  --rerank / --no-rerank     Use reranker to improve cluster quality
  --rerank-model PATH        Path to reranker GGUF model (auto-detected)
  --rerank-threshold FLOAT   Minimum score to keep (default: 0.5)

Model Management:
  --download-models          Download all required models and exit
  --model-status             Show status of all models and exit
  --clear-cache              Clear cached embeddings
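If you drive cse from a script, it can help to assemble the argument list programmatically. The sketch below is a hypothetical helper (build_cse_command is not part of the package) that uses only flags listed above and returns an argv list suitable for subprocess.run.

```python
import shlex

VALID_OUTPUTS = {"text", "markdown", "json"}

def build_cse_command(path, threshold=None, focus=(), exclude=(),
                      output=None, analyze=False, rerank=False):
    """Build an argv list for the cse CLI from the documented options."""
    cmd = ["cse", path]
    if threshold is not None:
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        cmd += ["--threshold", f"{threshold:.2f}"]
    for pattern in focus:          # --focus is repeatable
        cmd += ["--focus", pattern]
    for pattern in exclude:        # --exclude is repeatable
        cmd += ["--exclude", pattern]
    if output is not None:
        if output not in VALID_OUTPUTS:
            raise ValueError(f"output must be one of {sorted(VALID_OUTPUTS)}")
        cmd += ["--output", output]
    if analyze:
        cmd.append("--analyze")
    if rerank:
        cmd.append("--rerank")
    return cmd

cmd = build_cse_command("./src", threshold=0.9,
                        focus=["*.swift", "*.py"], output="json", rerank=True)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

Passing the list directly to subprocess.run(cmd) avoids the shell, so glob patterns such as *.py reach cse unexpanded.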

Language Support

Language	Parser
Python	tree-sitter
Swift	tree-sitter
Rust	tree-sitter
JavaScript/TypeScript	tree-sitter
Go	tree-sitter
Other	sliding window
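For languages without a tree-sitter parser, the table lists a sliding window. A minimal sketch of line-based sliding-window chunking follows; the window and stride values and the function name are assumptions for illustration, not the values or code cse uses.

```python
def sliding_window_chunks(text, window=20, stride=10):
    """Split text into overlapping line windows.

    Returns (first_line, last_line, chunk_text) tuples with 1-based,
    inclusive line numbers.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    lines = text.splitlines()
    chunks = []
    start = 0
    while start < len(lines):
        end = min(start + window, len(lines))
        chunks.append((start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):  # the last window reached the end of the file
            break
        start += stride
    return chunks

sample = "\n".join(f"line {i}" for i in range(1, 26))  # a 25-line file
for first, last, _ in sliding_window_chunks(sample, window=10, stride=5):
    print(first, last)
```

Overlapping windows mean a similar region is less likely to be split across a chunk boundary, at the cost of embedding some lines more than once.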

Models

CSE uses three Qwen3 GGUF models (~1.4 GB total):

Model	Size	Purpose
Qwen3-Embedding-0.6B	610 MB	Code embeddings
Qwen3-0.6B	378 MB	LLM analysis
Qwen3-Reranker-0.6B	378 MB	Cluster quality

Models are downloaded to ~/.cache/cse/models/ on first use or via --download-models.
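The supported way to check models is cse --model-status. As a rough manual alternative, the sketch below lists files in the model directory; it assumes the downloaded models are stored with a .gguf extension, which the text above implies but does not state, and the helper name is hypothetical.

```python
from pathlib import Path

DEFAULT_MODELS_DIR = Path.home() / ".cache" / "cse" / "models"

def list_gguf_models(models_dir=DEFAULT_MODELS_DIR):
    """Return sorted (file name, size in bytes) pairs for *.gguf files.

    Returns an empty list if the directory does not exist yet.
    """
    directory = Path(models_dir).expanduser()
    if not directory.is_dir():
        return []
    return sorted(
        (p.name, p.stat().st_size)
        for p in directory.glob("*.gguf")
        if p.is_file()
    )

for name, size in list_gguf_models():
    print(f"{name}\t{size / 1_000_000:.0f} MB")
```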

License

GPL-3.0-or-later

Release history

Version	Date
0.3.1	Dec 11, 2025
0.3.0	Dec 11, 2025
0.2.8	Dec 10, 2025
0.2.7	Dec 10, 2025
0.2.6	Dec 10, 2025
0.2.5	Dec 10, 2025
0.2.4	Dec 10, 2025
0.2.2	Dec 10, 2025
0.2.1	Dec 10, 2025
0.2.0	Dec 10, 2025
0.1.2	Dec 9, 2025
0.1.1	Dec 9, 2025
0.1.0 (this version)	Dec 9, 2025

Download files

Source Distribution

code_similarity_engine-0.1.0.tar.gz (42.9 kB)

Uploaded Dec 9, 2025 Source

Built Distribution

code_similarity_engine-0.1.0-py3-none-any.whl (51.9 kB)

Uploaded Dec 9, 2025 Python 3

File details

Details for the file code_similarity_engine-0.1.0.tar.gz.

File metadata

Download URL: code_similarity_engine-0.1.0.tar.gz
Upload date: Dec 9, 2025
Size: 42.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for code_similarity_engine-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`70bfc1696268c6912ba55606f43577a56436ac93badd344d99215df3f4c49864`
MD5	`1125f63e6547c5d0ba90c3fdd6a95bad`
BLAKE2b-256	`d239cdce05005c2d62321a032eb999d793c61ed5e95930fb121c1cf312a30251`

File details

Details for the file code_similarity_engine-0.1.0-py3-none-any.whl.

File metadata

Download URL: code_similarity_engine-0.1.0-py3-none-any.whl
Upload date: Dec 9, 2025
Size: 51.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for code_similarity_engine-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b4fe244203d2057d25ca12b429d0d63ad079a7c7b98160386935adf12e46c2fe`
MD5	`ce15abf066fb8203b8f1888735bf51a7`
BLAKE2b-256	`723d8bfc0b45d48cb09a9d7432ef613ad16da59581f57a8771b8c377f2759d04`
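To check a downloaded file against the SHA256 digests listed above, something like the following sketch works with the standard library alone. The helper names and the local file path in the usage note are hypothetical; the expected digests are copied from the tables above.

```python
import hashlib

# SHA256 digests as published in the hash tables above.
SDIST_SHA256 = "70bfc1696268c6912ba55606f43577a56436ac93badd344d99215df3f4c49864"
WHEEL_SHA256 = "b4fe244203d2057d25ca12b429d0d63ad079a7c7b98160386935adf12e46c2fe"

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def matches_published_digest(path, expected_hex):
    """True if the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, matches_published_digest("code_similarity_engine-0.1.0-py3-none-any.whl", WHEEL_SHA256) should return True for an intact copy of the wheel.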
