
Code that reorganises code

Project description

spaghettree

Software complexity directly affects the maintainability of modern codebases. Most of the software lifecycle is spent maintaining production systems. High complexity leads to harder maintenance, slower feature delivery, and longer onboarding for new engineers.

What this tool does

This is a prototype tool for simplifying the structural complexity of a codebase. It works by optimising the call-graph and is intended for integration as a CI/CD pipeline stage.

Why bother?

This tool hopes to:

  • Help manage and limit complexity growth during development.

  • Complement traditional linters and formatters by addressing architectural issues.

  • And also:

    • reduce technical debt
    • lower maintenance costs
    • speed up engineer onboarding

Notes

As this is a prototype and not ready for production use, the defaults are set to just report the current structure's directed weighted modularity and the current call tree for the repo as it stands.

Installation

pip install spaghettree

Or

uv add spaghettree

Example usage:

uv run -m spaghettree "path/to/src_code"

This will calculate the directed weighted modularity (DWM) of the codebase and make up to 5 suggestions for improving the structure of the code.

The output looks like this:

spaghettree.domain.entities.EntityCST -> spaghettree.domain.processing +0.035

This says: move the EntityCST class from spaghettree.domain.entities to spaghettree.domain.processing for an increase in DWM of 0.035.
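If the suggestions are consumed by a script (for example in a CI step), each line can be split into its parts. The following is a minimal sketch that assumes every suggestion line follows the shape of the single example above. `Suggestion`, `SUGGESTION_RE` and `parse_suggestion` are hypothetical names and are not part of spaghettree.

```python
import re
from typing import NamedTuple

# Assumed line shape, inferred from the one example above:
#   <package.module.Entity> -> <package.module> <signed gain>
SUGGESTION_RE = re.compile(
    r"^(?P<entity>\S+)\s*->\s*(?P<target>\S+)\s+(?P<gain>[+-]?\d+(?:\.\d+)?)$"
)


class Suggestion(NamedTuple):
    source_module: str
    entity: str
    target_module: str
    gain: float


def parse_suggestion(line: str) -> Suggestion:
    """Split one suggestion line into source module, entity, target and gain."""
    match = SUGGESTION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a suggestion line: {line!r}")
    # Everything before the last dot is taken to be the current module.
    source_module, _, entity = match["entity"].rpartition(".")
    return Suggestion(source_module, entity, match["target"], float(match["gain"]))


suggestion = parse_suggestion(
    "spaghettree.domain.entities.EntityCST -> spaghettree.domain.processing +0.035"
)
print(suggestion)
```

A parsed form like this makes it easy to, say, ignore suggestions whose gain is below a chosen threshold.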

There is also an experimental feature (--optimise-src-code) that automatically refactors the entire codebase using a divide-and-conquer algorithm. Its behaviour is much more aggressive than the suggestions, which aim to keep the developer in the loop before larger restructuring changes are made.

Lastly, it prints a representation of the call tree to the terminal to allow for any further analysis the user may want to do. Each key is an entity, and each entry in its list is a call that entity makes.

{
    "some_package.mod_a.A": [],
    "some_package.mod_a.func_a": [],
    "some_package.mod_a.func_b": [
        "some_package.mod_a.func_a",
        "some_package.mod_a.func_a",
    ],
    "some_package.mod_b.B": [
        "some_package.mod_b.CONSTANT",
    ],
    "some_package.mod_b.C": [
        "some_package.mod_a.A",
        "some_package.mod_b.B",
    ],
    "some_package.mod_b.CONSTANT": [],
}
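As an illustration of the kind of further analysis this allows, the sketch below counts how often each entity is called and lists the calls that cross a module boundary. It copies the example call tree into a Python dict. `module_of`, `fan_in` and `cross_module` are illustrative names, and the sketch assumes that the module of an entity is everything before its last dot.

```python
from collections import Counter

# Same shape as the example above: caller -> list of callees (repeats allowed).
call_tree = {
    "some_package.mod_a.A": [],
    "some_package.mod_a.func_a": [],
    "some_package.mod_a.func_b": [
        "some_package.mod_a.func_a",
        "some_package.mod_a.func_a",
    ],
    "some_package.mod_b.B": ["some_package.mod_b.CONSTANT"],
    "some_package.mod_b.C": ["some_package.mod_a.A", "some_package.mod_b.B"],
    "some_package.mod_b.CONSTANT": [],
}


def module_of(entity: str) -> str:
    """Drop the final dotted component to get the containing module."""
    return entity.rpartition(".")[0]


# Fan-in: how many times each entity appears as a callee.
fan_in = Counter(callee for callees in call_tree.values() for callee in callees)

# Calls whose caller and callee live in different modules.
cross_module = [
    (caller, callee)
    for caller, callees in call_tree.items()
    for callee in callees
    if module_of(caller) != module_of(callee)
]

print(fan_in.most_common())
print(cross_module)
```

Cross-module calls are the ones a restructuring tries to reduce, so a list like this is a quick way to see where two modules are coupled.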

Args

| Argument | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| positional src_root | str | | | Path to the root of the repository to scan |
| --new-root | str | | '' | Optional new root path for output (default: empty, meaning same as src_root if optimisation is enabled). |
| --call-tree-save-path | str | | './call_tree.json' | The location to save the generated call tree. Only used if --optimise-src-code isn't used. Defaults to ./call_tree.json. |
| --optimise-src-code | Flag (no value) | ❌ | | Enable optimisation of the source code. |

How it works

  • All .py files in the given directory are read in as strings.
  • Each of those strings is parsed into a libcst CST object.
    • A CST is used so that comments and formatting are retained; otherwise useful information would be lost.
  • A list of the locations of each entity (name, original module, line number) is collected and stored.
  • The CSTs are transformed into custom objects:
    • ModuleCST
    • ClassCST
    • FuncCST
    • GlobalCST
    • ImportCST
  • A ClassCST can have 0-n FuncCST methods, and each FuncCST has a list of the fully qualified calls that the function makes. With these structures we can create a call-graph, e.g. ClassA.method_a -> some_func.
  • To ensure any refactoring is possible, a call made from a class's method is attributed to that class as a whole (so classes are never split into separate methods).
  • Non-native calls are filtered out of the call-graph, which means that only entities defined in the repo are considered for moving.
  • An adjacency matrix is created from the call-graph: the x and y axes are the entities, and each cell holds the number of calls from x to y.
  • Each of the entities is considered as a single module at first, so that means you could have a single constant in a file by itself.
  • Then each pairwise combination of modules is considered for merging.
    • If merging the pair would increase the repo's directed weighted modularity, then it's added as a possible merge to consider.
  • All the possible merges are sorted by the gain they would bring to the overall system, largest first, then each non-overlapping merge is applied.
    • e.g. merge [(mod_a, mod_b), (mod_c, mod_d)]
    • merge for (mod_b, mod_c) is not considered as the mod_c and mod_d merge would result in a higher directed weighted modularity.
  • This is repeated until there are no more valid merges
  • Once this is done, some extra post-processing is applied. For example, if you were writing a library of validators that didn't call each other but all sat in the same module, they are combined into one module.
  • When writing the entities to their new files, the imports are updated, and the position of each entity is kept as close as possible to where it was before.
# some_original_mod
from typing import TypeVar

T = TypeVar("T")

class SomeClass:
    def method(self, item: T) -> T:
        return item

class SomeOtherClass:
    def method(self, item: T) -> T:
        return item

# Valid here because both classes are already defined above
SomeType = SomeClass | SomeOtherClass
  • This ensures that, for an example like the one above, the result is still valid. An initial idea was to always write globals first, then classes, then funcs, but that would result in some_broken_mod:
# some_broken_mod
from typing import TypeVar

T = TypeVar("T")
SomeType = SomeClass | SomeOtherClass  # BROKEN: NameError, as the classes aren't defined yet

class SomeClass:
    def method(self, item: T) -> T:
        return item

class SomeOtherClass:
    def method(self, item: T) -> T:
        return item

  • Lastly, once the entities have all been written to their new module locations, ruff is called on the files to fix any formatting. Because of how ruff is set up, it respects the user's own ruff.toml, so it includes or excludes the rules they are interested in.
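The adjacency matrix and merge steps in the list above can be sketched in plain Python. This README does not give the exact formula, so the sketch assumes the common directed, weighted form of modularity: the sum, over every pair (i, j) placed in the same module, of A[i][j] - out[i] * in[j] / m, divided by m, where A is the call-count matrix and m is the total number of calls. It is also simplified: it applies only the single best merge per round rather than every non-overlapping merge, and it does not model classes, libcst, or file writing. All function names are hypothetical.

```python
from itertools import combinations

EPS = 1e-12  # gains smaller than this are treated as "no gain" (float noise)

# Example call tree: caller -> list of callees (repeats allowed).
call_tree = {
    "some_package.mod_a.A": [],
    "some_package.mod_a.func_a": [],
    "some_package.mod_a.func_b": ["some_package.mod_a.func_a"] * 2,
    "some_package.mod_b.B": ["some_package.mod_b.CONSTANT"],
    "some_package.mod_b.C": ["some_package.mod_a.A", "some_package.mod_b.B"],
    "some_package.mod_b.CONSTANT": [],
}


def adjacency(tree):
    """Return (names, matrix) where matrix[x][y] counts calls from x to y."""
    names = sorted(tree)
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for caller, callees in tree.items():
        for callee in callees:
            if callee in index:  # skip calls to anything not defined in the repo
                matrix[index[caller]][index[callee]] += 1
    return names, matrix


def modularity(matrix, community):
    """Directed weighted modularity of a partition (assumed formula, see above)."""
    total = sum(map(sum, matrix))
    if total == 0:
        return 0.0
    out_w = [sum(row) for row in matrix]
    in_w = [sum(col) for col in zip(*matrix)]
    score = 0.0
    for i, row in enumerate(matrix):
        for j, weight in enumerate(row):
            if community[i] == community[j]:
                score += weight - out_w[i] * in_w[j] / total
    return score / total


def best_merge(matrix, community):
    """Return (gain, keep, drop) for the most beneficial pairwise merge, or None."""
    base = modularity(matrix, community)
    best = None
    for keep, drop in combinations(sorted(set(community)), 2):
        merged = [keep if label == drop else label for label in community]
        gain = modularity(matrix, merged) - base
        if gain > EPS and (best is None or gain > best[0]):
            best = (gain, keep, drop)
    return best


def greedy_merge(matrix):
    """Start with one module per entity and merge until nothing improves."""
    community = list(range(len(matrix)))
    while (step := best_merge(matrix, community)) is not None:
        _, keep, drop = step
        community = [keep if label == drop else label for label in community]
    return community


names, matrix = adjacency(call_tree)
start = list(range(len(names)))  # every entity in its own module
final = greedy_merge(matrix)
for name, label in zip(names, final):
    print(label, name)
print("modularity:", modularity(matrix, start), "->", modularity(matrix, final))
```

Entities that share a label in the printed output would end up in the same module. On this small example, func_a and func_b are grouped together because func_b calls func_a twice.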

Repo map

├── .github
│   └── workflows
│       ├── ci_tests.yaml
│       └── publish.yaml
├── src
│   └── spaghettree
│       ├── adapters
│       │   ├── __init__.py
│       │   └── io_wrapper.py
│       ├── domain
│       │   ├── __init__.py
│       │   ├── entities.py
│       │   ├── optimisation.py
│       │   ├── parsing.py
│       │   ├── processing.py
│       │   └── visitors.py
│       ├── logger
│       │   └── __init__.py
│       ├── __init__.py
│       └── __main__.py
├── tests
│   ├── adapters
│   │   ├── __init__.py
│   │   └── test_adapter_apis.py
│   ├── domain
│   │   ├── __init__.py
│   │   ├── test_entities.py
│   │   ├── test_optimisation.py
│   │   └── test_processing.py
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_main.py
│   └── test_result.py
├── .pre-commit-config.yaml
├── README.md
├── pyproject.toml
├── ruff.toml
└── uv.lock
