Java AST Structural Graph – static structural dependency analysis for Java codebases

JASTG – Java AST Structural Graph

Static structural dependency analysis for Java codebases.

JASTG extracts class-level dependency graphs and object-oriented metrics from Java source code using AST parsing only — no JVM, no classpath, no compilation required. It is designed for reproducible software-engineering research and integrates directly with graph analysis tools such as NetworkX, Gephi, and community-detection algorithms.


Table of Contents

  1. What JASTG captures
  2. What JASTG does NOT capture (limitations)
  3. Installation
  4. Quick start
  5. CLI reference
  6. Python API
  7. Output formats
  8. Determinism and reproducibility
  9. Running tests
  10. How to cite
  11. License

What JASTG captures

JASTG extracts structural dependencies between classes based on typed syntactic signals in the source code. A dependency A → B with weight w means that class A references class B in w distinct typed positions.

Source of dependency            Example
extends clause                  class A extends B
implements clause               class A implements I
Field type                      private B field;
Method return type              public B getB() { … }
Method parameter type           void f(B param)
Constructor parameter type      A(B param)
ClassCreator                    new B(…)
LocalVariableDeclaration        B local = …;
Cast expression                 (B) value
MethodInvocation qualifier      B.staticCall() (uppercase heuristic)

Inner classes are registered as independent nodes with $ notation: com.example.Outer$Inner, com.example.Outer$Inner$Deep.
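The weight definition above can be illustrated with a short sketch. It assumes the typed references have already been extracted into (source, target) pairs; the function name aggregate_edges and the sample class names are hypothetical and are not part of JASTG's API.

```python
from collections import Counter


def aggregate_edges(references):
    """Collapse (source, target) reference pairs into weighted edges.

    Each pair stands for one typed position (field type, parameter type,
    `new` expression, ...) in which `source` mentions `target`.
    """
    counts = Counter(references)
    # Sort so the edge order is stable across runs.
    return sorted((src, dst, w) for (src, dst), w in counts.items())


# Hypothetical references found in class A: a field of type B,
# a `new B(...)` expression, and a method parameter of type C.
refs = [("pkg.A", "pkg.B"), ("pkg.A", "pkg.B"), ("pkg.A", "pkg.C")]
edges = aggregate_edges(refs)
print(edges)
```

Here A → B gets weight 2 because B appears in two typed positions, while A → C gets weight 1.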

Object-oriented metrics computed per class:

Metric   Definition
LCOM4    Lack of Cohesion of Methods (v4): connected components in method-attribute graph
CBO      Coupling Between Objects: number of distinct internal classes depended on
RFC      Response For a Class: NOM + distinct invoked method names
NOM      Number of Methods
NOA      Number of Attributes (field declarators)
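To make the LCOM4 definition concrete, the following is a minimal sketch of one common formulation: methods are nodes, and two methods are linked when they share a field or one calls the other. The function name lcom4 and its input shape are assumptions made for illustration; JASTG's own linking rules may differ in detail.

```python
def lcom4(method_fields, method_calls=None):
    """Count connected components among a class's methods.

    method_fields: dict mapping method name -> set of field names it uses.
    method_calls:  optional dict mapping method name -> set of called methods.
    """
    method_calls = method_calls or {}
    parent = {m: m for m in method_fields}

    def find(m):
        # Walk up to the component root, shortening the path on the way.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    def union(a, b):
        parent[find(a)] = find(b)

    methods = list(method_fields)
    # Link methods that touch at least one common field.
    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            if method_fields[a] & method_fields[b]:
                union(a, b)
    # Link callers to callees defined in the same class.
    for caller, callees in method_calls.items():
        for callee in callees:
            if caller in parent and callee in parent:
                union(caller, callee)

    return len({find(m) for m in methods})


fields = {"getX": {"x"}, "setX": {"x"}, "log": set()}
print(lcom4(fields))                      # getX/setX share x; log is isolated
print(lcom4(fields, {"log": {"getX"}}))   # the call joins log to the others
```

A value of 1 indicates a cohesive class under this formulation; higher values suggest the class could be split.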

Limitations

JASTG performs syntactic analysis only — no type solving, no JVM, no classpath. The following are known limitations:

  • Type inference (var, generics inference, lambda return types) is not resolved.
  • Multilevel inner-class dot notation: pkg.Outer.Inner.Deep is not resolved. Only the last two parts are converted (Outer.Inner → Outer$Inner; pkg.Outer.Inner → pkg.Outer$Inner). Use $ notation in source code if you need these resolved.
  • Static imports are ignored (they refer to members, not classes).
  • Chained method calls (a.b().c()) are not type-traced.
  • RFC does not distinguish the class target of each method invocation (inherent limitation without type solving).
  • CBO counts only references to classes present in the analysed source tree (external library classes are not nodes).
  • Qualifier heuristic (default --qualifier-heuristic=upper): only MethodInvocation qualifiers starting with an uppercase letter are resolved as class references. Classes named with a lowercase first letter would be missed; use --qualifier-heuristic=off to disable.
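Two of the limitations above, the two-part dot-notation rule and the uppercase qualifier heuristic, can be sketched as follows. Both helper names are hypothetical and the code only mirrors the behaviour as described in this list; it is not taken from JASTG's source.

```python
def to_dollar_notation(name):
    """Rewrite only the last dot as `$`, mirroring the two-part rule."""
    head, sep, tail = name.rpartition(".")
    return f"{head}${tail}" if sep else name


def passes_upper_heuristic(qualifier):
    """Treat a qualifier as a class reference only if it starts uppercase."""
    return qualifier[:1].isupper()


print(to_dollar_notation("pkg.Outer.Inner"))       # pkg.Outer$Inner
print(to_dollar_notation("pkg.Outer.Inner.Deep"))  # pkg.Outer.Inner$Deep
print(passes_upper_heuristic("Collections"))       # True
print(passes_upper_heuristic("logger"))            # False
```

The second call shows why three-level names are missed: the result is pkg.Outer.Inner$Deep, which does not match the registered node pkg.Outer$Inner$Deep.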

Installation

From PyPI:

pip install jastg

From source (editable install for development):

git clone https://github.com/MarcosCordeiro/jastg.git
cd jastg
pip install -e ".[dev]"

Requirements: Python ≥ 3.10, javalang ≥ 0.13.0, networkx ≥ 2.6.


Quick start

Single domain

jastg analyze --domain myapp --path /path/to/src

Multiple domains

jastg analyze \
    --domain backend  --path /path/to/backend/src \
    --domain frontend --path /path/to/frontend/src

Undirected, unweighted (for Louvain community detection)

jastg analyze --domain myapp --path /src --undirected --unweighted

From YAML config

jastg analyze --config analysis.yaml

analysis.yaml example:

domains:
  - name: backend
    path: /path/to/backend
  - name: frontend
    path: /path/to/frontend
weighted: true
directed: true
qualifier_heuristic: upper
output_dir: output

Check installation

jastg doctor
jastg --version

Try the bundled example

jastg analyze --domain example --path examples/mini_project

CLI reference

jastg analyze [OPTIONS]

Options:
  --domain NAME           Domain label (repeat for multiple domains)
  --path PATH             Root path to scan for .java files (paired with --domain)
  --config FILE           YAML configuration file (alternative to --domain/--path)
  --weighted              Export edge weights – third column (default: on)
  --unweighted            Omit edge weights – two-column output
  --directed              Directed graph (default: on)
  --undirected            Symmetrize edges (sum reciprocal weights)
  --out DIR               Output directory (default: output)
  --qualifier-heuristic   'upper' (default) or 'off'
  --fail-fast             Abort on first parse error
  -v, --verbose           DEBUG-level logging

Python API

from jastg.pipeline import run

metadata = run(
    dominios=["myapp"],          # domain labels
    caminhos=["/path/to/src"],   # root paths, one per domain
    ponderado=True,      # write edge weights
    direcionado=True,    # directed graph
    output_dir="output",
    qualifier_heuristic="upper",
    fail_fast=False,
)

print(metadata["numero_classes"])  # number of classes analysed
print(metadata["numero_arestas"])  # number of edges exported

Lower-level API:

import javalang
from jastg.ast.collect import coletar_classes_internas
from jastg.extract import extrair_dependencias_e_metricas

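# Collect the internal classes found under each domain root.
classes, index, domains, n_files = coletar_classes_internas(
    ["myapp"], ["/path/to/src"]
)

# Parse a single file, then extract its dependencies and metrics.
with open("MyClass.java", encoding="utf-8") as fh:
    source = fh.read()
tree = javalang.parse.parse(source)
results = extrair_dependencias_e_metricas(tree, "MyClass.java", classes, index, "myapp")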

Output formats

All files are written to <out>/<domain>/, where <out> is the directory given with --out (default: output/<domain>/).

metadata_{domain}.json

Run provenance for reproducibility (e.g. metadata_myapp.json):

{
  "project_url": "https://github.com/owner/repo",
  "jastg_version": "1.0.0",
  "python_version": "3.12.0 ...",
  "platform": "Linux-6.x...",
  "javalang_version": "0.13.0",
  "networkx_version": "3.3",
  "config_hash": "sha256hex...",
  "run_date": "2026-02-22T12:00:00+00:00",
  "commit_hash": "abc123...",
  "num_classes": 9,
  "num_edges": 12,
  "total_java_files": 7,
  "parse_errors": 0,
  "directed": true,
  "weighted": true
}
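A downstream script might use these fields to check that two runs are comparable before diffing their graphs. The sketch below assumes only the key names shown in the example above; the helper provenance_mismatches and the sample values are hypothetical.

```python
import json

# Keys taken from the metadata example above.
PROVENANCE_KEYS = ("jastg_version", "javalang_version", "config_hash", "commit_hash")


def provenance_mismatches(meta_a, meta_b, keys=PROVENANCE_KEYS):
    """Return the provenance keys on which two metadata records differ."""
    return [key for key in keys if meta_a.get(key) != meta_b.get(key)]


# Inline JSON stands in for two metadata_{domain}.json files.
run_a = json.loads('{"jastg_version": "1.0.0", "javalang_version": "0.13.0", '
                   '"config_hash": "aaa", "commit_hash": "abc123"}')
run_b = json.loads('{"jastg_version": "1.0.0", "javalang_version": "0.13.0", '
                   '"config_hash": "aaa", "commit_hash": "def456"}')

print(provenance_mismatches(run_a, run_b))
```

An empty list means the two runs agree on every checked field; here only commit_hash differs, so the graphs were built from different source revisions.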

graph_{domain}.graphml

GraphML file ready for import into Gephi or any GraphML-compatible tool (e.g. graph_myapp.graphml).

  • Nodes – one per class, with attributes:
    • label: domain/package.Class string
    • LCOM4, CBO, RFC, NOM, NOA: OO metrics
  • Edges – one per dependency pair, with optional weight attribute (present when --weighted, absent when --unweighted). Undirected mode (--undirected) symmetrizes pairs as (min_id, max_id) and sums reciprocal weights.
  • Graph-level metadata – all fields from metadata_{domain}.json are embedded directly in the GraphML <graph> element.
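The undirected symmetrization described above can be sketched in a few lines. This is an illustration of the (min_id, max_id) rule with summed reciprocal weights, not JASTG's actual implementation; the function name symmetrize and the integer IDs are assumptions.

```python
from collections import defaultdict


def symmetrize(edges):
    """Merge directed (src_id, dst_id, weight) edges into undirected ones."""
    merged = defaultdict(int)
    for src, dst, weight in edges:
        # Both directions of a pair map to the same (min_id, max_id) key.
        key = (min(src, dst), max(src, dst))
        merged[key] += weight
    return sorted((a, b, w) for (a, b), w in merged.items())


directed = [(1, 2, 3), (2, 1, 1), (3, 1, 2)]
undirected = symmetrize(directed)
print(undirected)
```

The reciprocal edges 1 → 2 (weight 3) and 2 → 1 (weight 1) collapse into a single edge (1, 2) with weight 4.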

Determinism and reproducibility

  • IDs are assigned by alphabetical sort of domain/class keys, so they are identical across runs given the same input.
  • config_hash in metadata_{domain}.json is a SHA-256 digest of the effective configuration (domains, paths, graph mode, qualifier heuristic). Two runs with the same config hash on the same source tree should produce identical graphs.
  • File traversal uses sorted order to eliminate OS-level non-determinism.
  • The commit_hash field (if in a git repository) further pins the exact source version analysed.
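One way to obtain a stable configuration digest is to serialize the configuration canonically before hashing. The sketch below assumes a JSON serialization with sorted keys; the exact canonical form JASTG uses is not specified here, so the digests it produces will not necessarily match.

```python
import hashlib
import json


def config_hash(config):
    """Return a SHA-256 hex digest of a configuration dictionary."""
    # Sorted keys and fixed separators make the serialization
    # independent of dictionary insertion order.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


config_a = {
    "domains": ["myapp"],
    "paths": ["/src"],
    "directed": True,
    "weighted": True,
    "qualifier_heuristic": "upper",
}
# Same settings, different key order.
config_b = dict(reversed(list(config_a.items())))
# One setting changed.
config_c = {**config_a, "directed": False}

print(config_hash(config_a) == config_hash(config_b))
print(config_hash(config_a) == config_hash(config_c))
```

Key order does not affect the digest, while changing any setting does, which is the property that makes the hash useful for comparing runs.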

Running tests

# All tests (verbose)
pytest -v

# With coverage
pytest --cov=jastg --cov-report=term-missing

# Quick smoke test
pytest -q

The test suite requires no external Java installation. All Java source files are created as strings in memory during the test session.


Benchmark Dataset

The official structural graph benchmark generated using JASTG is available at: https://doi.org/10.5281/zenodo.18744313


How to cite

If you use JASTG in your research, please cite:

@software{jastg2026,
  author    = {Brito Jr, Marcos Cordeiro de},
  title     = {{JASTG}: {Java AST Structural Graph}},
  year      = {2026},
  version   = {1.0.0},
  url       = {https://github.com/MarcosCordeiro/jastg},
  license   = {MIT}
}

See also CITATION.cff in this repository.


License

MIT — see LICENSE.txt.
