A tool for analyzing Python code

biston

A structural clone detector for Python code. Written in Rust.

It parses Python files with tree-sitter, normalizes the AST, and finds functions that are structurally similar to each other.
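To make "structurally similar" concrete, here is a rough sketch in Python. It is not biston's algorithm (biston uses tree-sitter and its own normalization in Rust); it swaps in Python's standard `ast` module and `difflib.SequenceMatcher` purely for illustration. The two sample functions and the `node_types` helper are hypothetical.

```python
import ast
from difflib import SequenceMatcher

# Two functions with different names, variables, and attributes,
# but the same shape: init, loop, accumulate, return.
SOURCE = '''
def total_price(items):
    total = 0
    for item in items:
        total += item.price * item.quantity
    return total

def total_weight(parcels):
    acc = 0
    for parcel in parcels:
        acc += parcel.mass * parcel.count
    return acc
'''


def node_types(func: ast.FunctionDef) -> list[str]:
    """Flatten a function into the sequence of its AST node type names.

    Identifiers and literal values are dropped, so only structure remains.
    """
    return [type(node).__name__ for node in ast.walk(func)]


funcs = [n for n in ast.parse(SOURCE).body if isinstance(n, ast.FunctionDef)]
first, second = (node_types(f) for f in funcs)

# Illustrative similarity score in the 0.0 - 1.0 range (not biston's metric).
similarity = SequenceMatcher(None, first, second).ratio()
print(f"{funcs[0].name} vs {funcs[1].name}: {similarity:.2f}")
```

Because the two functions differ only in names, the node-type sequences are identical and the illustrative score is 1.0. A threshold such as 0.7 would then decide how far two sequences may drift apart and still be reported as a pair.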

Install

uv add biston

Or build from source:

cargo build --release

Usage

biston <COMMAND>

Commands

biston scan

Scan a directory for code clones.

Usage: biston scan [OPTIONS] [PATH]

Arguments:
  [PATH]  Directory to scan [default: .]

Options:
      --format <FORMAT>        Output format [possible values: text, json, sarif]
      --min-lines <MIN_LINES>  Minimum function length in lines
      --threshold <THRESHOLD>  Similarity threshold (0.0 - 1.0)
      --config <CONFIG>        Config file directory (looks for biston.toml or pyproject.toml)
      --tests-only             Restrict the scan to Python test files (overrides include/exclude)
      --suggest                Generate abstraction suggestions for similar pairs
      --files <FILE>           Only emit pairs involving this file (repeat for multiple)
      --files-from <PATH>      Read focus file list from PATH, or `-` for stdin
  -h, --help                   Print help

biston stats

Show statistics about scan findings.

Usage: biston stats [OPTIONS] [PATH]

Arguments:
  [PATH]  Directory to scan [default: .]

Options:
      --format <FORMAT>        Output format [possible values: text, json, sarif]
      --min-lines <MIN_LINES>  Minimum function length in lines
      --threshold <THRESHOLD>  Similarity threshold (0.0 - 1.0)
      --config <CONFIG>        Config file directory (looks for biston.toml or pyproject.toml)
      --tests-only             Restrict the scan to Python test files (overrides include/exclude)
      --files <FILE>           Only emit pairs involving this file (repeat for multiple)
      --files-from <PATH>      Read focus file list from PATH, or `-` for stdin
  -h, --help                   Print help

Scanning tests only

Test suites often accumulate duplication: near-identical cases that could be collapsed with @pytest.mark.parametrize, or copy-pasted arrange/act/assert blocks. By default biston excludes test files so production-code findings stay focused. Pass --tests-only to flip the scope and scan only test files:

biston scan --tests-only
biston stats --tests-only

The flag replaces include with common Python test patterns (**/test_*.py, **/*_test.py, **/conftest.py, tests/**/*.py) and clears exclude. Other knobs (min_lines, threshold, normalization) are left untouched — tune them separately in biston.toml if you want different defaults for a test run.
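As a rough guide to which paths those four patterns select, the sketch below translates them into regular expressions. It assumes gitignore-style semantics (`**/` matches zero or more directories, `*` stays within one path segment); biston's actual glob matching may differ, and `glob_to_regex` and `is_test_file` are hypothetical helpers, not part of biston.

```python
import re

# Patterns listed for --tests-only.
TEST_PATTERNS = ["**/test_*.py", "**/*_test.py", "**/conftest.py", "tests/**/*.py"]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a small glob subset into a regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append(r"(?:[^/]+/)*")  # zero or more directories
            i += 3
        elif pattern[i] == "*":
            parts.append(r"[^/]*")  # anything within one path segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def is_test_file(path: str) -> bool:
    """True if a POSIX-style relative path matches any test pattern."""
    return any(glob_to_regex(p).match(path) for p in TEST_PATTERNS)


for candidate in ["test_a.py", "src/pkg/test_a.py", "tests/helpers.py", "src/app.py"]:
    print(candidate, is_test_file(candidate))
```

Under these assumptions, helper modules inside tests/ are in scope even when their names do not start with test_.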

Commit-hook use (focus files)

--files / --files-from let you restrict reporting to pairs involving a specific set of files, while still scanning the whole repo so cross-file clones between those files and the rest of the tree are detected.

For a pre-commit hook, pipe git diff --name-only through --files-from -:

git diff --name-only --diff-filter=ACM -- '*.py' \
  | biston scan --files-from - .

An empty list (no Python files changed) emits no pairs, as intended. Prefer --files-from over --files $(git diff --name-only): when nothing changed, the command substitution expands to nothing, leaving an empty flag that reverts to a full-repo scan.
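If the hook is easier to maintain as a script, the same pipeline can be sketched in Python. This assumes `git` and `biston` are on PATH. The function names are hypothetical. Whether biston exits non-zero when it finds pairs is not stated here, so the sketch only passes the exit code through.

```python
import subprocess


def changed_python_files() -> list[str]:
    """Added, copied, or modified *.py files, as in the shell pipeline above."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACM", "--", "*.py"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def focus_list(files: list[str]) -> str:
    """One path per line; an empty list yields an empty string."""
    return "".join(f"{name}\n" for name in files)


def main() -> int:
    # The focus list goes to stdin, so an empty list stays an empty list
    # instead of collapsing into a missing command-line argument.
    result = subprocess.run(
        ["biston", "scan", "--files-from", "-", "."],
        input=focus_list(changed_python_files()),
        text=True,
    )
    return result.returncode  # exit-code meaning is an assumption; see lead-in


# In a hook script, finish with: raise SystemExit(main())
```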

Configuration

Settings can go in biston.toml or under [tool.biston] in pyproject.toml. If both files exist, biston.toml takes priority. CLI flags override config file settings.
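The precedence rules can be pictured as a layered merge. The sketch below models them with plain dictionaries. It is an illustration of the stated order, not biston's implementation, and `pick_config_file` and `effective_settings` are hypothetical names.

```python
from typing import Any, Optional

# Documented defaults for two [scan] settings.
DEFAULTS: dict[str, Any] = {"min_lines": 10, "threshold": 0.7}


def pick_config_file(existing: set[str]) -> Optional[str]:
    """biston.toml takes priority over pyproject.toml when both exist."""
    for name in ("biston.toml", "pyproject.toml"):
        if name in existing:
            return name
    return None


def effective_settings(
    file_settings: dict[str, Any], cli_flags: dict[str, Any]
) -> dict[str, Any]:
    """Defaults, then config file values, then CLI flags that were actually given."""
    merged = {**DEFAULTS, **file_settings}
    # A flag left unset on the command line (None here) must not override.
    merged.update({k: v for k, v in cli_flags.items() if v is not None})
    return merged


print(pick_config_file({"biston.toml", "pyproject.toml"}))
print(effective_settings({"threshold": 0.8}, {"threshold": 0.9, "min_lines": None}))
```

In this model, a config file with threshold = 0.8 combined with --threshold 0.9 on the command line ends up at 0.9, while min_lines stays at its default of 10.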

[scan]

Setting    Default                                          Description
min_lines  10                                               Minimum function length in lines
threshold  0.7                                              Similarity threshold (0.0–1.0)
exclude    ["tests/**", "**/conftest.py", "migrations/**"]  File patterns to exclude
include    ["**/*.py"]                                      File patterns to include

[normalization]

Setting                 Default  Description
anonymize_locals        true     Replace local variable names
anonymize_literals      false    Replace literal values
strip_decorators        true     Remove decorators from AST
strip_type_annotations  true     Remove type hints
sort_commutative        false    Sort commutative operations
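To see what two of these switches change, the sketch below imitates anonymize_locals and anonymize_literals with Python's standard `ast` module. It is an approximation for illustration only: biston normalizes a tree-sitter AST in Rust, and this version treats every name as a local for brevity. `Normalizer`, `normalized`, and the sample functions are hypothetical.

```python
import ast


class Normalizer(ast.NodeTransformer):
    """Rewrite names and, optionally, literals to placeholders."""

    def __init__(self, anonymize_locals: bool = True, anonymize_literals: bool = False):
        self.anonymize_locals = anonymize_locals
        self.anonymize_literals = anonymize_literals
        self.names: dict[str, str] = {}

    def _placeholder(self, name: str) -> str:
        # The first distinct name becomes v0, the next v1, and so on.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_arg(self, node: ast.arg) -> ast.arg:
        if self.anonymize_locals:
            node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if self.anonymize_locals:
            node.id = self._placeholder(node.id)
        return node

    def visit_Constant(self, node: ast.Constant) -> ast.Constant:
        if self.anonymize_literals:
            node.value = "LIT"
        return node


def normalized(source: str, **options: bool) -> str:
    """Dump the first function in source after normalization."""
    func = ast.parse(source).body[0]
    func.name = "f"  # ignore the function's own name
    return ast.dump(Normalizer(**options).visit(func))


A = "def scale(values):\n    factor = 2\n    return [v * factor for v in values]\n"
B = "def grow(items):\n    ratio = 3\n    return [i * ratio for i in items]\n"

print(normalized(A) == normalized(B))
print(normalized(A, anonymize_literals=True) == normalized(B, anonymize_literals=True))
```

With the documented defaults (locals anonymized, literals kept), the two functions still differ by the constants 2 and 3. Turning on literal anonymization makes the normalized forms identical in this sketch.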

[output]

Setting            Default  Description
format             "text"   Output format (text, json, or sarif)
group_overlapping  true     Group overlapping clones
max_results        50       Maximum number of results
show_source        true     Display source code in output
context_lines      3        Number of context lines around clones

[suggest]

Setting        Default  Description
enabled        false    Enable suggestion generation
min_quality    0.6      Minimum template coverage score (0.0–1.0)
max_holes      5        Maximum holes before suppressing
render_python  true     Render templates as Python source

[suppress]

Setting  Default  Description
files    []       File glob patterns to suppress entirely

Example biston.toml

[scan]
min_lines = 15
threshold = 0.8
exclude = ["vendor/"]
include = ["src/**/*.py"]

[normalization]
anonymize_locals = false
anonymize_literals = true

[output]
format = "json"
max_results = 100

[suggest]
enabled = true
min_quality = 0.8

Inline suppression

You can also suppress findings with Python comments:

  • # biston: ignore-file — suppress the entire file (must appear in the first 5 lines)
  • # biston: ignore — suppress a single function (place in the function body or on the preceding line)
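A short sketch of the two per-function placements described above. The functions themselves are hypothetical, and the exact position biston accepts within the body is not specified here.

```python
def export_csv(rows):
    # biston: ignore
    # Placement 1: the comment sits inside the function body.
    lines = [",".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)


# biston: ignore
def export_tsv(rows):
    # Placement 2: the comment sits on the line before the definition.
    lines = ["\t".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)
```

These two functions are near-identical by design. With either comment in place, the function carrying it is suppressed from the findings.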

Documentation

Full docs at https://mojzis.github.io/biston/.

License

MIT
