Code that reorganises code

Project description

spaghettree

Software complexity directly affects the maintainability of modern codebases. Most of the software lifecycle is spent maintaining production systems. High complexity leads to harder maintenance, slower feature delivery, and longer onboarding for new engineers.

What this tool does

This is a prototype tool for reducing the structural complexity of a codebase. It works by optimising the call graph and is intended to run as a stage in a CI/CD pipeline.

Why bother?

This tool aims to:

- help manage and limit complexity growth during development
- complement traditional linters and formatters by addressing architectural issues
- reduce technical debt
- lower maintenance costs
- speed up engineer onboarding

Notes

As this is a prototype and not ready for production use, the defaults only report the current structure's directed weighted modularity and the current call tree for the repo as it stands.

Installation

pip install spaghettree

uv add spaghettree

Example usage:

uv run -m spaghettree "path/to/src_code"

This will calculate the directed weighted modularity (DWM) of the codebase and make up to 5 suggestions for improving the structure of the code.

The output looks like this:

spaghettree.domain.entities.EntityCST -> spaghettree.domain.processing +0.035

This says: move the EntityCST class from spaghettree.domain.entities to spaghettree.domain.processing for a DWM increase of 0.035.
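
If the suggestions need to be consumed by another script, a line in the format above can be split into its parts. The following is a minimal sketch: the `Suggestion` type and `parse_suggestion` helper are hypothetical names and not part of spaghettree, and the sketch assumes the entity name is the last dotted component on the left-hand side.

```python
import re
from typing import NamedTuple


class Suggestion(NamedTuple):
    """Hypothetical container for one suggestion line."""

    entity: str
    source_module: str
    target_module: str
    gain: float


# "<module>.<entity> -> <target module> <signed gain>"
_LINE = re.compile(
    r"^(?P<source>\S+)\s*->\s*(?P<target>\S+)\s+(?P<gain>[+-]?\d+(?:\.\d+)?)$"
)


def parse_suggestion(line: str) -> Suggestion:
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a suggestion line: {line!r}")
    # Assumption: the entity is the final dotted component of the source.
    source_module, _, entity = match["source"].rpartition(".")
    return Suggestion(entity, source_module, match["target"], float(match["gain"]))


suggestion = parse_suggestion(
    "spaghettree.domain.entities.EntityCST -> spaghettree.domain.processing +0.035"
)
```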

There are experimental features that will automatically refactor the entire codebase (--optimise-src-code) using a divide-and-conquer algorithm. This behaviour is much more aggressive than the suggestions, which aim to keep the developer in the loop before larger restructuring changes.

Lastly, it prints a representation of the call tree to the terminal for any further analysis the user may want to do. Each entry in a list is a call made by the entity named in the key.

{
    "some_package.mod_a.A": [],
    "some_package.mod_a.func_a": [],
    "some_package.mod_a.func_b": [
        "some_package.mod_a.func_a",
        "some_package.mod_a.func_a",
    ],
    "some_package.mod_b.B": [
        "some_package.mod_b.CONSTANT",
    ],
    "some_package.mod_b.C": [
        "some_package.mod_a.A",
        "some_package.mod_b.B",
    ],
    "some_package.mod_b.CONSTANT": [],
}
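
As one example of further analysis, the call tree can be turned into weighted edges to see which calls cross module boundaries. This is only a sketch: the helper names are hypothetical, and it assumes the module of an entity is everything before its last dot.

```python
from collections import Counter

# The example call tree from above (caller -> list of callees).
call_tree = {
    "some_package.mod_a.A": [],
    "some_package.mod_a.func_a": [],
    "some_package.mod_a.func_b": [
        "some_package.mod_a.func_a",
        "some_package.mod_a.func_a",
    ],
    "some_package.mod_b.B": ["some_package.mod_b.CONSTANT"],
    "some_package.mod_b.C": ["some_package.mod_a.A", "some_package.mod_b.B"],
    "some_package.mod_b.CONSTANT": [],
}


def module_of(name: str) -> str:
    # Assumption: "pkg.mod.Entity" lives in module "pkg.mod".
    return name.rpartition(".")[0]


def edge_weights(tree: dict[str, list[str]]) -> Counter:
    """Count how many times each caller calls each callee."""
    return Counter(
        (caller, callee) for caller, callees in tree.items() for callee in callees
    )


def cross_module_edges(tree: dict[str, list[str]]) -> dict[tuple[str, str], int]:
    """Keep only the edges whose caller and callee sit in different modules."""
    return {
        edge: count
        for edge, count in edge_weights(tree).items()
        if module_of(edge[0]) != module_of(edge[1])
    }


weights = edge_weights(call_tree)
crossing = cross_module_edges(call_tree)
```

Repeated entries in a list become the edge weight, which is why `func_b -> func_a` counts twice here.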

Args

Argument	Type	Required	Default	Description
`src_root` (positional)	`str`	✅		Path to the root of the repository to scan.
`--new-root`	`str`	❌	`''`	Optional new root path for the output. Defaults to empty, which means the same as `src_root` if optimisation is enabled.
`--call-tree-save-path`	`str`	❌	`'./call_tree.json'`	The location to save the generated call tree. Only used if `--optimise-src-code` isn't set.
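
For scripting, the arguments in the table can be assembled into a command line. This sketch only builds the argument list; `build_command` is a hypothetical helper, and it assumes spaghettree is installed for the running interpreter so that `python -m spaghettree` behaves like the `uv run -m spaghettree` example above.

```python
import sys


def build_command(
    src_root: str, call_tree_save_path: str = "./call_tree.json"
) -> list[str]:
    """Return the argv for a report-only run that saves the call tree."""
    return [
        sys.executable,
        "-m",
        "spaghettree",
        src_root,
        "--call-tree-save-path",
        call_tree_save_path,
    ]


command = build_command("path/to/src_code")
```

The resulting list can be passed to `subprocess.run(command, check=True)` from a CI step.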

How it works

All .py files in the given directory are read in as strings
Each of those strings is parsed into a libcst CST object
- This retains comments and other formatting details that would otherwise be lost
A list of the locations of each entity (name, original module, line no) is collected and stored.
The CSTs are transformed into custom objects:
- ModuleCST
- ClassCST
- FuncCST
- GlobalCST
- ImportCST
A ClassCST can have 0-n FuncCST methods on it, and each FuncCST has a list of the fully qualified calls that the function makes. With these structures we can create a call graph, e.g. ClassA.method_a -> some_func
To ensure any refactoring is possible, a call from a class's methods is attributed to that class (so classes aren't split into separate methods).
From the call graph, the non-native calls are filtered out, which means that only entities defined in the repo are considered for moving.
An adjacency matrix is created from the call graph, where the x and y axes are the entities and each coordinate holds the count of calls from x to y.
Each entity is treated as its own module at first, which means you could have a single constant in a file by itself.
Then each pairwise combination is considered for merging
- If merging the entities would result in a gain in the repo's directed weighted modularity, then it's added as a possible merge to consider.
All the possible merges are sorted by the gain they'd bring to the overall system, largest first, then each non-overlapping merge is applied
- e.g. merge [(mod_a, mod_b), (mod_c, mod_d)]
- the merge for (mod_b, mod_c) is not applied, as the mod_c and mod_d merge would result in a higher directed weighted modularity.
This is repeated until there are no more valid merges.
Once this is done, some extra modification is applied. For example, if you were writing a library of validators that didn't call each other but all sat in the same module, they are combined.
When writing the entities to their new files, the imports are updated, and each entity is kept as close as possible to where it was before.

# some_original_mod

from typing import TypeVar

T = TypeVar("T")

class SomeClass:
    def method(self, item: T) -> T:
        return item

class SomeOtherClass:
    def method(self, item: T) -> T:
        return item
    
SomeType = SomeClass | SomeOtherClass

This ensures that an example like the one above is still valid after it is rewritten. An initial idea was to always write globals first, then classes, then funcs, but that would result in some_broken_mod:

# some_broken_mod

from typing import TypeVar

T = TypeVar("T")
SomeType = SomeClass | SomeOtherClass  # BROKEN as the classes aren't defined yet

class SomeClass:
    def method(self, item: T) -> T:
        return item

class SomeOtherClass:
    def method(self, item: T) -> T:
        return item

Lastly, when the entities are all written to their new module locations, ruff is called on the files to fix any formatting. Because of how ruff is set up, it respects the user's own ruff.toml, so it includes or excludes the rules they are interested in.
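
Returning to the merge steps earlier in this section, the loop can be sketched in plain Python. This is not the tool's implementation: it assumes the standard directed modularity Q = (1/m) Σ_ij [A_ij - k_i^out k_j^in / m] δ(c_i, c_j) as the score, all function names are hypothetical, and it leaves out the extra post-processing and the divide-and-conquer variant.

```python
from itertools import combinations

# Small call tree in the same shape as the printed one (caller -> callees).
CALL_TREE = {
    "some_package.mod_a.A": [],
    "some_package.mod_a.func_a": [],
    "some_package.mod_a.func_b": ["some_package.mod_a.func_a"] * 2,
    "some_package.mod_b.B": ["some_package.mod_b.CONSTANT"],
    "some_package.mod_b.C": ["some_package.mod_a.A", "some_package.mod_b.B"],
    "some_package.mod_b.CONSTANT": [],
}


def adjacency(tree):
    """Build matrix[x][y] = number of calls from entity x to entity y."""
    names = sorted(tree)
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for caller, callees in tree.items():
        for callee in callees:
            if callee in index:  # ignore calls to entities outside the repo
                matrix[index[caller]][index[callee]] += 1
    return names, matrix


def modularity(matrix, community):
    """Directed weighted modularity of a labelling (assumed formula)."""
    m = sum(map(sum, matrix))
    if m == 0:
        return 0.0
    n = len(matrix)
    out_deg = [sum(row) for row in matrix]
    in_deg = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                total += matrix[i][j] - out_deg[i] * in_deg[j] / m
    return total / m


def greedy_merge(matrix):
    """Start with one module per entity and merge pairs while the score rises."""
    community = list(range(len(matrix)))
    while True:
        base = modularity(matrix, community)
        gains = []
        for a, b in combinations(sorted(set(community)), 2):
            trial = [a if c == b else c for c in community]
            gain = modularity(matrix, trial) - base
            if gain > 1e-12:  # small tolerance for floating point noise
                gains.append((gain, a, b))
        if not gains:
            return community
        used = set()
        for _, a, b in sorted(gains, reverse=True):  # largest gain first
            if a in used or b in used:
                continue  # overlaps with a better merge already applied
            community = [a if c == b else c for c in community]
            used |= {a, b}


def as_groups(names, community):
    """Turn a list of labels into a set of entity groups."""
    grouped = {}
    for name, label in zip(names, community):
        grouped.setdefault(label, set()).add(name)
    return {frozenset(members) for members in grouped.values()}


names, matrix = adjacency(CALL_TREE)
labels = greedy_merge(matrix)
modules = as_groups(names, labels)
score = modularity(matrix, labels)
```

Gains of non-overlapping merges add up under this score, which is why several of them can be applied in the same pass.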

Repo map

├── .github
│   └── workflows
│       ├── ci_tests.yaml
│       └── publish.yaml
├── src
│   └── spaghettree
│       ├── adapters
│       │   ├── __init__.py
│       │   └── io_wrapper.py
│       ├── domain
│       │   ├── __init__.py
│       │   ├── entities.py
│       │   ├── optimisation.py
│       │   ├── parsing.py
│       │   ├── processing.py
│       │   └── visitors.py
│       ├── logger
│       │   └── __init__.py
│       ├── __init__.py
│       └── __main__.py
├── tests
│   ├── adapters
│   │   ├── __init__.py
│   │   └── test_adapter_apis.py
│   ├── domain
│   │   ├── __init__.py
│   │   ├── test_entities.py
│   │   ├── test_optimisation.py
│   │   └── test_processing.py
│   ├── __init__.py
│   ├── conftest.py
│   ├── test_main.py
│   └── test_result.py
├── .pre-commit-config.yaml
├── README.md
├── pyproject.toml
├── ruff.toml
└── uv.lock

Project details

Release history

This version

0.3.0

Sep 27, 2025

0.2.1

Sep 22, 2025

0.2.0

Sep 21, 2025

0.1.1

Sep 20, 2025

Download files

Source Distribution

spaghettree-0.3.0.tar.gz (14.1 kB)

Uploaded Sep 27, 2025 Source

Built Distribution

spaghettree-0.3.0-py3-none-any.whl (18.6 kB)

Uploaded Sep 27, 2025 Python 3

File details

Details for the file spaghettree-0.3.0.tar.gz.

File metadata

Download URL: spaghettree-0.3.0.tar.gz
Upload date: Sep 27, 2025
Size: 14.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spaghettree-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`3923d76d9522aadcbeafa7d6d36126edb71c1d7c650533b6baeb850ddaecbe27`
MD5	`cda735c0f413bd13850c45ca819a3884`
BLAKE2b-256	`d077a348c57bb2ad15c70431ecacc8f8d66a4d3fc1a222e72bd80143ad970b41`

Provenance

The following attestation bundles were made for spaghettree-0.3.0.tar.gz:

Publisher: publish.yaml on second-ed/spaghettree

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spaghettree-0.3.0.tar.gz
- Subject digest: 3923d76d9522aadcbeafa7d6d36126edb71c1d7c650533b6baeb850ddaecbe27
- Sigstore transparency entry: 565512543
- Sigstore integration time: Sep 27, 2025
Source repository:
- Permalink: second-ed/spaghettree@dd386eab87e6d31fb4ff2828c151a094dd4b48dc
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/second-ed
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@dd386eab87e6d31fb4ff2828c151a094dd4b48dc
- Trigger Event: release

File details

Details for the file spaghettree-0.3.0-py3-none-any.whl.

File metadata

Download URL: spaghettree-0.3.0-py3-none-any.whl
Upload date: Sep 27, 2025
Size: 18.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for spaghettree-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0b68662cacfda7d5c81af7f975650d5db2ffa471ed864e47314b58e43d124601`
MD5	`f6c66fe84406eaa45ee9436148091216`
BLAKE2b-256	`7105979f97f135fb607d683cacb7b845f9199aba96e0e37011d675ba05f3409a`

Provenance

The following attestation bundles were made for spaghettree-0.3.0-py3-none-any.whl:

Publisher: publish.yaml on second-ed/spaghettree

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spaghettree-0.3.0-py3-none-any.whl
- Subject digest: 0b68662cacfda7d5c81af7f975650d5db2ffa471ed864e47314b58e43d124601
- Sigstore transparency entry: 565512550
- Sigstore integration time: Sep 27, 2025
Source repository:
- Permalink: second-ed/spaghettree@dd386eab87e6d31fb4ff2828c151a094dd4b48dc
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/second-ed
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@dd386eab87e6d31fb4ff2828c151a094dd4b48dc
- Trigger Event: release
