Java AST Structural Graph – static structural dependency analysis for Java codebases
JASTG – Java AST Structural Graph
Static structural dependency analysis for Java codebases.
JASTG extracts class-level dependency graphs and object-oriented metrics from Java source code using AST parsing only: no JVM, no classpath, and no compilation are required. It is designed for reproducible software-engineering research, and its output can be loaded into graph analysis tools such as NetworkX and Gephi or passed to community-detection algorithms.
Table of Contents
- What JASTG captures
- What JASTG does NOT capture (limitations)
- Installation
- Quick start
- CLI reference
- Python API
- Output formats
- Determinism and reproducibility
- Running tests
- How to cite
- License
What JASTG captures
JASTG extracts structural dependencies between classes based on typed
syntactic signals in the source code. A dependency A → B with weight w
means that class A references class B in w distinct typed positions.
| Source of dependency | Example |
|---|---|
| extends clause | class A extends B |
| implements clause | class A implements I |
| Field type | private B field; |
| Method return type | public B getB() { … } |
| Method parameter type | void f(B param) |
| Constructor parameter type | A(B param) |
| ClassCreator | new B(…) |
| LocalVariableDeclaration | B local = …; |
| Cast expression | (B) value |
| MethodInvocation qualifier | B.staticCall() (uppercase heuristic) |
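To make the weight semantics concrete, here is a minimal sketch that counts typed references per (source, target) pair. The `references` list and its position labels are hypothetical illustration data, and counting every occurrence once is an assumption; this is not JASTG's internal representation.

```python
from collections import Counter

# Hypothetical typed references found while scanning class A:
# (source class, target class, syntactic position).
references = [
    ("A", "B", "extends"),
    ("A", "B", "field_type"),
    ("A", "B", "class_creator"),
    ("A", "C", "method_parameter"),
]


def edge_weights(refs):
    """Count how many typed positions reference each (source, target) pair."""
    weights = Counter()
    for source, target, _position in refs:
        weights[(source, target)] += 1
    return dict(weights)


weights = edge_weights(references)
print(weights)
```

Under these assumptions the edge A → B gets weight 3 and A → C gets weight 1.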
Inner classes are registered as independent nodes with $ notation:
com.example.Outer$Inner, com.example.Outer$Inner$Deep.
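The naming scheme can be sketched as a small helper that joins a nesting chain with `$`. The function name `node_name` and its arguments are hypothetical and only illustrate the convention described above.

```python
def node_name(package, nesting):
    """Build a '$'-separated class name from a package and a nesting chain.

    `nesting` lists class names from the outermost to the innermost class.
    """
    if not nesting:
        raise ValueError("nesting must contain at least one class name")
    nested = "$".join(nesting)
    # Classes in the default package carry no package prefix.
    return f"{package}.{nested}" if package else nested


print(node_name("com.example", ["Outer", "Inner"]))
print(node_name("com.example", ["Outer", "Inner", "Deep"]))
```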
Object-oriented metrics computed per class:
| Metric | Definition |
|---|---|
| LCOM4 | Lack of Cohesion of Methods (v4): connected components in method-attribute graph |
| CBO | Coupling Between Objects: number of distinct internal classes depended on |
| RFC | Response For a Class: NOM + distinct invoked method names |
| NOM | Number of Methods |
| NOA | Number of Attributes (field declarators) |
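As an illustration of the LCOM4 definition in the table, the sketch below counts connected components among methods that share attributes, using a small union-find. It is a sketch under stated assumptions, not JASTG's implementation: the input mapping is hypothetical, methods that touch no attribute are counted as their own component, and method-to-method calls (which some LCOM4 formulations also treat as links) are not considered.

```python
def lcom4(method_attrs):
    """Count connected components among methods linked by shared attributes.

    `method_attrs` maps each method name to the set of attribute names it uses.
    """
    parent = {m: m for m in method_attrs}

    def find(x):
        # Follow parent links to the component root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_user = {}  # attribute -> first method seen using it
    for method, attrs in method_attrs.items():
        for attr in attrs:
            if attr in first_user:
                # Merge this method's component with the earlier user's component.
                parent[find(method)] = find(first_user[attr])
            else:
                first_user[attr] = method

    return len({find(m) for m in method_attrs})


example = {
    "getName": {"name"},
    "setName": {"name"},
    "getAge": {"age"},
    "log": set(),
}
print(lcom4(example))  # 3
```

Here `getName` and `setName` share `name` and form one component, while `getAge` and `log` each stand alone.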
Limitations
JASTG performs syntactic analysis only — no type solving, no JVM, no classpath. The following are known limitations:
- Type inference (var, generics inference, lambda return types) is not resolved.
- Inner class multilevel dot-notation: pkg.Outer.Inner.Deep is not resolved. Only the two last parts are converted (Outer.Inner → Outer$Inner; pkg.Outer.Inner → pkg.Outer$Inner). Use $ notation in source code if you need these resolved.
- Static imports are ignored (they refer to members, not classes).
- Chained method calls (a.b().c()) are not type-traced.
- RFC does not distinguish the class target of each method invocation (inherent limitation without type solving).
- CBO counts only references to classes present in the analysed source tree (external library classes are not nodes).
- Qualifier heuristic (default --qualifier-heuristic=upper): only MethodInvocation qualifiers starting with an uppercase letter are resolved as class references. Classes named with a lowercase first letter would be missed; use --qualifier-heuristic=off to disable.
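Two of the limitations above can be illustrated with a short sketch: the uppercase qualifier check and the conversion of only the last two dot-separated parts. All function names and the `known` set are hypothetical, and the lookup order is an assumption made for the example.

```python
def is_class_qualifier(qualifier):
    """Uppercase heuristic: treat 'B' in B.staticCall() as a class reference."""
    return bool(qualifier) and qualifier[0].isupper()


def to_dollar_notation(dotted):
    """Join only the last two dot-separated parts with '$'."""
    head, sep, last = dotted.rpartition(".")
    return f"{head}${last}" if sep else dotted


def resolve(type_name, known_classes):
    """Return the matching known class name, or None if it cannot be resolved."""
    if type_name in known_classes:
        return type_name
    candidate = to_dollar_notation(type_name)
    return candidate if candidate in known_classes else None


# Hypothetical set of classes registered from the analysed source tree.
known = {"pkg.Outer", "pkg.Outer$Inner", "pkg.Outer$Inner$Deep"}

print(resolve("pkg.Outer.Inner", known))       # pkg.Outer$Inner
print(resolve("pkg.Outer.Inner.Deep", known))  # None
print(is_class_qualifier("B"), is_class_qualifier("b"))  # True False
```

The three-level name becomes `pkg.Outer.Inner$Deep`, which does not match the registered `pkg.Outer$Inner$Deep`, so it stays unresolved.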
Installation
From PyPI (once published):
pip install jastg
From source (editable install for development):
git clone https://github.com/MarcosCordeiro/jastg.git
cd jastg
pip install -e ".[dev]"
Requirements: Python ≥ 3.10, javalang ≥ 0.13.0, networkx ≥ 2.6.
Quick start
Single domain
jastg analyze --domain myapp --path /path/to/src
Multiple domains
jastg analyze \
--domain backend --path /path/to/backend/src \
--domain frontend --path /path/to/frontend/src
Undirected, unweighted (for Louvain community detection)
jastg analyze --domain myapp --path /src --undirected --unweighted
From YAML config
jastg analyze --config analysis.yaml
analysis.yaml example:
domains:
  - name: backend
    path: /path/to/backend
  - name: frontend
    path: /path/to/frontend
weighted: true
directed: true
qualifier_heuristic: upper
output_dir: output
Check installation
jastg doctor
jastg --version
Try the bundled example
jastg analyze --domain example --path examples/mini_project
CLI reference
jastg analyze [OPTIONS]
Options:
--domain NAME Domain label (repeat for multiple domains)
--path PATH Root path to scan for .java files (paired with --domain)
--config FILE YAML configuration file (alternative to --domain/--path)
--weighted Export edge weights – third column (default: on)
--unweighted Omit edge weights – two-column output
--directed Directed graph (default: on)
--undirected Symmetrize edges (sum reciprocal weights)
--out DIR Output directory (default: output)
--qualifier-heuristic 'upper' (default) or 'off'
--fail-fast Abort on first parse error
-v, --verbose DEBUG-level logging
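The --undirected option is described as symmetrizing edges by summing reciprocal weights. A minimal sketch of that idea, assuming edges are given as hypothetical (source_id, target_id, weight) tuples with integer IDs:

```python
from collections import defaultdict


def symmetrize(edges):
    """Merge directed edges (src, dst, weight) into undirected pairs.

    Each pair is keyed as (min_id, max_id) and reciprocal weights are summed.
    """
    merged = defaultdict(int)
    for src, dst, weight in edges:
        key = (min(src, dst), max(src, dst))
        merged[key] += weight
    return sorted((a, b, w) for (a, b), w in merged.items())


directed = [(1, 2, 3), (2, 1, 1), (2, 3, 4)]
print(symmetrize(directed))  # [(1, 2, 4), (2, 3, 4)]
```

The edges 1 → 2 (weight 3) and 2 → 1 (weight 1) collapse into a single undirected edge of weight 4.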
Python API
from jastg.pipeline import run
metadata = run(
    dominios=["myapp"],
    caminhos=["/path/to/src"],
    ponderado=True,       # write edge weights
    direcionado=True,     # directed graph
    output_dir="output",
    qualifier_heuristic="upper",
    fail_fast=False,
)
print(metadata["numero_classes"]) # number of classes analysed
print(metadata["numero_arestas"]) # number of edges exported
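If the configuration is already available as a dictionary shaped like the analysis.yaml example in Quick start, it could be translated into keyword arguments for `run()` roughly as follows. The helper `config_to_run_kwargs` is hypothetical, and the defaults mirror those listed in the CLI reference; how the CLI itself maps the YAML file is not shown here and may differ.

```python
def config_to_run_kwargs(config):
    """Translate an analysis.yaml-style dict into keyword arguments for run()."""
    domains = config.get("domains", [])
    return {
        "dominios": [d["name"] for d in domains],
        "caminhos": [d["path"] for d in domains],
        "ponderado": config.get("weighted", True),
        "direcionado": config.get("directed", True),
        "output_dir": config.get("output_dir", "output"),
        "qualifier_heuristic": config.get("qualifier_heuristic", "upper"),
    }


# Same content as the analysis.yaml example, written as a Python dict.
config = {
    "domains": [
        {"name": "backend", "path": "/path/to/backend"},
        {"name": "frontend", "path": "/path/to/frontend"},
    ],
    "weighted": True,
    "directed": True,
    "qualifier_heuristic": "upper",
    "output_dir": "output",
}

kwargs = config_to_run_kwargs(config)
print(kwargs["dominios"], kwargs["caminhos"])
```

The resulting dictionary could then be passed as `run(**kwargs)`.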
Lower-level API:
import javalang
from jastg.ast.collect import coletar_classes_internas
from jastg.extract import extrair_dependencias_e_metricas
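# Collect the internal classes of every domain before extracting dependencies
classes, index, domains, n_files = coletar_classes_internas(
    ["myapp"], ["/path/to/src"]
)
# Read and parse a single source file, then extract its dependencies and metrics
with open("MyClass.java", encoding="utf-8") as fh:
    source = fh.read()
tree = javalang.parse.parse(source)
results = extrair_dependencias_e_metricas(tree, "MyClass.java", classes, index, "myapp")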
Output formats
All files are written to --out/<domain>/ (default output/<domain>/).
metadata_{domain}.json
Run provenance for reproducibility (e.g. metadata_myapp.json):
{
  "project_url": "https://github.com/owner/repo",
  "jastg_version": "1.0.0",
  "python_version": "3.12.0 ...",
  "platform": "Linux-6.x...",
  "javalang_version": "0.13.0",
  "networkx_version": "3.3",
  "config_hash": "sha256hex...",
  "run_date": "2026-02-22T12:00:00+00:00",
  "commit_hash": "abc123...",
  "num_classes": 9,
  "num_edges": 12,
  "total_java_files": 7,
  "parse_errors": 0,
  "directed": true,
  "weighted": true
}
graph_{domain}.graphml
GraphML file ready for import into Gephi or any GraphML-compatible tool
(e.g. graph_myapp.graphml).
- Nodes – one per class, with attributes:
  - label: domain/package.Class string
  - LCOM4, CBO, RFC, NOM, NOA: OO metrics
- Edges – one per dependency pair, with optional weight attribute (present when --weighted, absent when --unweighted). Undirected mode (--undirected) symmetrizes pairs as (min_id, max_id) and sums reciprocal weights.
- Graph-level metadata – all fields from metadata_{domain}.json are embedded directly in the GraphML <graph> element.
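For readers who want to inspect such a file without extra dependencies, the sketch below reads node and edge attributes with the standard library. The `SAMPLE` document is hypothetical and follows general GraphML conventions (`<key>` declarations referenced by `<data>` elements); the key ids and the exact layout of a file written by JASTG may differ.

```python
import xml.etree.ElementTree as ET

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Hypothetical two-node sample; a real file would be graph_<domain>.graphml.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="label" attr.type="string"/>
  <key id="d1" for="node" attr.name="CBO" attr.type="int"/>
  <key id="d2" for="edge" attr.name="weight" attr.type="int"/>
  <graph edgedefault="directed">
    <node id="1"><data key="d0">myapp/com.example.A</data><data key="d1">1</data></node>
    <node id="2"><data key="d0">myapp/com.example.B</data><data key="d1">0</data></node>
    <edge source="1" target="2"><data key="d2">3</data></edge>
  </graph>
</graphml>
"""


def read_graphml(text):
    """Return (nodes, edges) with data keys mapped to their attr.name."""
    root = ET.fromstring(text)
    names = {k.get("id"): k.get("attr.name") for k in root.findall("g:key", NS)}

    def data_of(element):
        # Values are returned as strings, exactly as stored in the XML.
        return {names[d.get("key")]: d.text for d in element.findall("g:data", NS)}

    graph = root.find("g:graph", NS)
    nodes = {n.get("id"): data_of(n) for n in graph.findall("g:node", NS)}
    edges = [
        (e.get("source"), e.get("target"), data_of(e))
        for e in graph.findall("g:edge", NS)
    ]
    return nodes, edges


nodes, edges = read_graphml(SAMPLE)
print(nodes["1"]["label"])
print(edges)
```

With NetworkX installed, `networkx.read_graphml(path)` loads a GraphML file into a graph object and converts typed attributes for you.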
Determinism and reproducibility
- IDs are assigned by alphabetical sort of domain/class keys, so they are identical across runs given the same input.
- config_hash in metadata_{domain}.json is a SHA-256 digest of the effective configuration (domains, paths, graph mode, qualifier heuristic). Two runs with the same config hash on the same source tree should produce identical graphs.
- File traversal uses sorted order to eliminate OS-level non-determinism.
- The commit_hash field (if in a git repository) further pins the exact source version analysed.
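The first two points can be sketched in a few lines. Both helpers are hypothetical: the 1-based numbering and the canonical JSON encoding used for hashing are assumptions chosen for illustration, not a description of how JASTG computes its IDs or its config_hash.

```python
import hashlib
import json


def assign_ids(keys):
    """Assign 1-based IDs following the alphabetical order of domain/class keys."""
    return {key: i for i, key in enumerate(sorted(set(keys)), start=1)}


def config_hash(config):
    """SHA-256 over a canonical JSON encoding (sorted keys, fixed separators)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Discovery order does not affect the assigned IDs.
ids_a = assign_ids(["myapp/com.example.B", "myapp/com.example.A"])
ids_b = assign_ids(["myapp/com.example.A", "myapp/com.example.B"])
print(ids_a == ids_b)  # True

# Key order in the configuration does not affect the digest.
h1 = config_hash({"directed": True, "weighted": True})
h2 = config_hash({"weighted": True, "directed": True})
print(h1 == h2)  # True
```

Sorting before numbering and hashing a canonical encoding are what make both values independent of traversal or insertion order.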
Running tests
# All tests (verbose)
pytest -v
# With coverage
pytest --cov=jastg --cov-report=term-missing
# Quick smoke test
pytest -q
The test suite requires no external Java installation. All Java source files are created as strings in memory during the test session.
Benchmark Dataset
The official structural graph benchmark generated using JASTG is available at: https://doi.org/10.5281/zenodo.18744313
How to cite
If you use JASTG in your research, please cite:
@software{jastg2026,
  author = {Brito Jr, Marcos Cordeiro de},
  title = {{JASTG}: {Java AST Structural Graph}},
  year = {2026},
  version = {1.0.0},
  url = {https://github.com/MarcosCordeiro/jastg},
  license = {MIT}
}
See also CITATION.cff in this repository.
License
MIT — see LICENSE.txt.