Find functions that work on the same resource through different code paths — the architectural-duplication detector that token-similarity tools miss.

parallax

Find code that does the same logical job through different paths.

Token-similarity tools (jscpd, PMD CPD, pylint duplicate-code) detect copy-paste. parallax detects something different: two pieces of code that touch the same set of resources, regardless of how the code is written. The two pieces may use different filters, return different shapes, or even be written in different languages.

Model

A unit of code (function, method, file, module, microservice, ...) touches a set of resources (database tables, HTTP endpoints, Redis keys, env vars, file paths, ...). Units sharing the same resource set are clustered as duplication candidates.
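The clustering idea can be sketched in a few lines of plain Python. This is an illustration of the model only, not parallax's actual implementation: the function name `cluster_by_resources`, the sample unit names, and the way the two thresholds are applied are all assumptions made for the example.

```python
from collections import defaultdict


def cluster_by_resources(units, min_resources=1, min_cluster_size=2):
    """Group unit names by the exact set of resources they touch.

    `units` maps a unit name to the resources it touches. Both thresholds
    are hypothetical knobs for this sketch.
    """
    groups = defaultdict(list)
    for name, resources in units.items():
        key = frozenset(resources)  # order-independent, hashable resource set
        if len(key) >= min_resources:
            groups[key].append(name)
    # Keep only resource sets shared by enough units to count as a cluster.
    return {
        key: sorted(names)
        for key, names in groups.items()
        if len(names) >= min_cluster_size
    }


# Hypothetical units: two touch {Order, User}, one touches only {Invoice}.
units = {
    "orders.list_recent": {"Order", "User"},
    "reports.monthly_orders": {"User", "Order"},
    "billing.charge": {"Invoice"},
}
clusters = cluster_by_resources(units)
```

Here `clusters` holds a single entry keyed by `frozenset({"Order", "User"})`, because only that resource set is shared by two units.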

Both unit detection and resource detection are pluggable per extractor.

Built-in extractors

Name | Unit | Resource
--- | --- | ---
sqlalchemy | Python function/method | SQLAlchemy ORM model classes
django | Python function/method | Django ORM model classes
http-urls | any text file | HTTP URL paths
env-vars | any text file | Environment variable names
redis-keys | any text file | Redis key namespaces
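To make the "any text file" extractors concrete, here is a rough sketch of how environment variable names could be pulled out of source text with regular expressions. It is not the code behind the built-in env-vars extractor; the patterns, the `env_vars_in` helper, and the sample snippets are assumptions chosen for illustration.

```python
import re

# Hypothetical patterns; a real extractor would likely cover more syntaxes.
ENV_PATTERNS = [
    re.compile(r"""os\.environ(?:\.get)?[\[(]\s*["']([A-Z_][A-Z0-9_]*)["']"""),
    re.compile(r"""os\.getenv\(\s*["']([A-Z_][A-Z0-9_]*)["']"""),
    re.compile(r"""process\.env\.([A-Z_][A-Z0-9_]*)"""),
]


def env_vars_in(text):
    """Return the set of environment variable names referenced in text."""
    found = set()
    for pattern in ENV_PATTERNS:
        found.update(pattern.findall(text))
    return found


# Two snippets in different languages that read the same variables.
python_src = 'url = os.environ["DATABASE_URL"]\nkey = os.getenv("API_KEY")\n'
js_src = "const url = process.env.DATABASE_URL;\nconst key = process.env.API_KEY;\n"

python_vars = env_vars_in(python_src)
js_vars = env_vars_in(js_src)
```

Both calls yield the same resource set, `{"DATABASE_URL", "API_KEY"}`, which is the kind of cross-language overlap the model is meant to surface.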

Installation

pip install parallax-scan

Usage

parallax scan path/to/repo
parallax scan path/to/repo --extractor sqlalchemy
parallax scan path/to/repo -e sqlalchemy -e http-urls
parallax scan path/to/repo --min-resources 3 --top 20
parallax scan path/to/repo --cross-file-only
parallax scan path/to/repo --format html -o report.html
parallax scan path/to/repo --format sarif -o parallax.sarif

Configuration

Drop .parallax.toml at your repo root:

[scan]
min_resources = 3
min_cluster_size = 2

[ci]
max_cluster_size = 5

[[ignore]]
resources = ["User", "Place"]
reason = "Generic"

In CI:

parallax scan . --ci
parallax scan . --format sarif -o parallax.sarif

With --ci, the scan exits non-zero only when a cluster meets ci.max_cluster_size. Without --ci, any reported cluster causes a non-zero exit.

Comparison

Tool | Catches | Doesn't catch
--- | --- | ---
jscpd, PMD CPD, pylint duplicate-code | Token-similar copy-paste | Code with different surface shape
Sourcegraph | Manual code search | Automatic detection
pydeps, dependency-cruiser | Module-level imports | Same-resource overlap
semgrep | Hand-written patterns | Discovery
parallax | Same-resource overlap regardless of shape | Single-instance bad patterns

Status

Alpha. API and CLI are unstable until 1.0.

Writing an extractor

from pathlib import Path
from typing import Iterable

from parallax import Unit
from parallax.extractors.base import Extractor


class TerraformAwsExtractor(Extractor):
    name = "terraform-aws"

    def extract(self, root: Path) -> Iterable[Unit]:
        for tf in root.rglob("*.tf"):
            ...

Register the extractor in parallax.extractors.BUILTIN_EXTRACTORS.
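For a sense of what such an extractor might do, the standalone sketch below scans `*.tf` files for AWS resource types. It deliberately avoids importing parallax: `SketchUnit` is a hypothetical stand-in for `parallax.Unit` (whose real fields are not shown here), and treating each file as a unit and each `aws_*` resource type as a resource is an assumption, not the project's prescribed design.

```python
import re
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable


@dataclass(frozen=True)
class SketchUnit:
    """Stand-in for parallax.Unit; the real class's fields may differ."""
    name: str
    resources: frozenset


# Matches lines such as: resource "aws_s3_bucket" "logs" {
RESOURCE_RE = re.compile(
    r'^\s*resource\s+"(aws_[a-z0-9_]+)"\s+"[^"]+"', re.MULTILINE
)


def extract_terraform_units(root: Path) -> Iterable[SketchUnit]:
    """Yield one unit per .tf file, with the AWS resource types it declares."""
    for tf in sorted(root.rglob("*.tf")):
        types = frozenset(RESOURCE_RE.findall(tf.read_text(encoding="utf-8")))
        if types:  # skip files that declare no AWS resources
            yield SketchUnit(name=str(tf.relative_to(root)), resources=types)


# Demo against a throwaway directory with one Terraform file.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp)
    (demo_root / "main.tf").write_text(
        'resource "aws_s3_bucket" "logs" {}\n'
        'resource "aws_iam_role" "app" {}\n',
        encoding="utf-8",
    )
    demo_units = list(extract_terraform_units(demo_root))
```

In a real extractor, the body of `extract_terraform_units` would live in the `extract` method and yield whatever `Unit` objects parallax expects.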

License

MIT
