Java-to-Python source translator with line-level structural correspondence for side-by-side review

j2py

j2py is a Java-to-Python source translator. It converts Java classes to Python that preserves line-level structural correspondence (same method order, same control flow, camelCase names mapped to snake_case) so reviewers can audit the output against the original Java side by side. The goal is reviewable equivalence, not a fully idiomatic rewrite. After deterministic conversion, a file can optionally be passed to an LLM to complete whatever the rules left unhandled.
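
As an illustration of the naming mapping, here is a minimal sketch of a camelCase to snake_case conversion. The function name camel_to_snake and its acronym handling are assumptions made for this example; j2py's own naming policy is configurable and may behave differently.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a Java-style camelCase identifier to snake_case (illustrative only)."""
    # Split an acronym run from a following capitalized word: "HTTPResponse" -> "HTTP_Response".
    step = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1_\2", name)
    # Split a lowercase letter or digit from a following capital: "userId" -> "user_Id".
    step = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step)
    return step.lower()


print(camel_to_snake("getUserName"))        # get_user_name
print(camel_to_snake("parseHTTPResponse"))  # parse_http_response
```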

How it works

Java source
  → parse (tree-sitter-java)
  → analyze (symbols, dependency graph)
  → translate (deterministic rule layer, then optional LLM completion)
  → validate (syntax, lint, types)
  → Python output

The rule layer handles common language constructs deterministically (~70% of typical code). Where rules stop, an optional configured LLM provider fills gaps using disk-cached prompts. Every file gets a confidence score based on rule-layer coverage, validation status, and semantic warnings, plus structured diagnostics for anything left unhandled.
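
To make the idea of disk-cached prompts concrete, the sketch below caches completions on disk, keyed by a hash of the prompt. It is not j2py's cache implementation: the function cached_completion, the complete callable, and the one-JSON-file-per-prompt layout are all assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_completion(prompt: str, complete: Callable[[str], str], cache_dir: Path) -> str:
    """Return a completion for `prompt`, calling `complete` only on a cache miss.

    `complete` stands in for a real provider call (hypothetical signature).
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache entry on the prompt content so identical prompts hit the same file.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    entry = cache_dir / f"{key}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))["completion"]
    completion = complete(prompt)
    entry.write_text(
        json.dumps({"prompt": prompt, "completion": completion}),
        encoding="utf-8",
    )
    return completion
```

With a cache like this, re-running a translation over unchanged prompts would not repeat the provider call, which keeps repeated runs cheap and reproducible.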

Status

Beta. The library is usable for experimentation, fixture-driven development, and batch translation of real Java projects, but construct coverage is still incomplete, multi-file inheritance can require manual import fixups, and output on large enterprise codebases will contain review warnings and known correctness gaps.

Deterministic support today includes:

  • tree-sitter Java parsing and symbol extraction
  • class, nested class, local/anonymous class helpers, interface, enum, and record skeletons
  • interface abstract methods, default methods, and static methods
  • fields, constructors, methods, and overloads: chained constructor delegation, builder-style forwarding merged into default parameters, and type-dispatch overload groups via a vendored @overloaded runtime dispatcher (ADR 0009)
  • common expressions: literals, identifiers, field access, arrays, class literals, assignments, updates, ternaries, null checks, collection calls, string concat, and typed get(...) lowering for lists, maps, and common API receivers
  • stream pipelines: map, filter, flatMap, distinct, sorted, collectors such as toList, toSet, joining, groupingBy/mapping, toMap, and block lambdas
  • control flow: if/else, enhanced and classic for, while, do while, safe switch forms, try/catch/finally, throw, break, and continue
  • configured import emission, naming policy, type maps, exception maps, and comment flags
  • dependency-ordered directory translation
  • structured diagnostics, confidence scoring, validation, post-LLM structural verification, and optional Anthropic or Gemini completion
  • side-by-side Java/Python review via j2py compare
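
For a sense of what stream pipeline support involves, the sketch below pairs two Java pipelines (in comments) with hand-written Python equivalents. The function names are invented for this example, and the Python shown is one plausible rendering, not necessarily what j2py emits.

```python
from collections import defaultdict


def upper_distinct_sorted(names: list[str]) -> list[str]:
    # Java: names.stream().filter(n -> !n.isEmpty()).map(String::toUpperCase)
    #            .distinct().sorted().collect(Collectors.toList());
    upper = (n.upper() for n in names if n)  # filter + map
    distinct = dict.fromkeys(upper)          # distinct, keeping encounter order
    return sorted(distinct)                  # sorted + toList


def group_by_length(names: list[str]) -> dict[int, list[str]]:
    # Java: names.stream().collect(Collectors.groupingBy(String::length));
    groups: dict[int, list[str]] = defaultdict(list)
    for n in names:
        groups[len(n)].append(n)
    return dict(groups)


print(upper_distinct_sorted(["bob", "", "alice", "Bob"]))  # ['ALICE', 'BOB']
print(group_by_length(["a", "bb", "c"]))                   # {1: ['a', 'c'], 2: ['bb']}
```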

Known gaps include:

  • overload groups whose erased Python signatures collide (e.g. int vs long) and other ambiguous overload groups that still fall back to manual-dispatch TODOs
  • complex enum static initialization beyond translated enum constant class bodies
  • annotation semantics beyond syntactic metadata shells
  • runtime/framework behavior (dependency injection, persistence mappings, container lifecycle) — j2py translates source structure, not application frameworks
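
The overload collision gap can be illustrated with a small runtime dispatcher. This is a sketch under stated assumptions, not the vendored @overloaded dispatcher from ADR 0009: the OverloadSet class and its register method are hypothetical names. Java overloads on int and String map to distinct Python types, but int and long both map to Python's int, so the second registration has nothing to distinguish it.

```python
from typing import Any, Callable


class OverloadSet:
    """Minimal type-dispatch table keyed on exact positional argument types."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._impls: dict[tuple[type, ...], Callable[..., Any]] = {}

    def register(self, *types: type) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            # Two Java overloads that erase to the same Python types cannot coexist.
            if types in self._impls:
                raise TypeError(f"{self.name}: signature {types} already registered")
            self._impls[types] = func
            return func
        return decorator

    def __call__(self, *args: Any) -> Any:
        key = tuple(type(a) for a in args)
        if key not in self._impls:
            raise TypeError(f"{self.name}: no overload for {key}")
        return self._impls[key](*args)


describe = OverloadSet("describe")


@describe.register(int)  # Java: String describe(int value)
def _describe_int(value: int) -> str:
    return f"int:{value}"


@describe.register(str)  # Java: String describe(String value)
def _describe_str(value: str) -> str:
    return f"str:{value}"


print(describe(3))    # int:3
print(describe("a"))  # str:a
```

Registering a third implementation with @describe.register(int) for a Java describe(long value) overload raises TypeError in this sketch, which is the kind of case that still needs manual dispatch.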

Quick start

Install the beta pre-release from PyPI:

pip install --pre j2py-converter
j2py --help

The PyPI distribution is j2py-converter; the import package and CLI command are j2py. (The bare j2py name on PyPI is owned by an unrelated project.)

Local development:

uv sync --locked
make check

Translate a file without LLM completion:

uv run j2py translate tests/fixtures/java/HelloWorld.java --no-llm --no-validate --dry-run

Translate a directory in dependency order:

uv run j2py translate path/to/java/root --output translated_py --no-llm
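
Dependency order here means a class is translated after the classes it depends on. A minimal sketch of that ordering, using the standard library's graphlib, is shown below; the translation_order function and the example class names are assumptions, and j2py's own dependency graph may carry more detail.

```python
from graphlib import TopologicalSorter


def translation_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order classes so each one comes after everything it depends on.

    `dependencies` maps a class name to the set of class names it depends on.
    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


deps = {
    "Animal": set(),
    "Dog": {"Animal"},
    "Kennel": {"Dog", "Animal"},
}
print(translation_order(deps))  # ['Animal', 'Dog', 'Kennel']
```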

Skip unchanged files on repeated directory runs:

uv run j2py translate path/to/java/root --output translated_py --incremental
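
One common way to detect unchanged files is to compare content hashes against a manifest from the previous run. The sketch below shows that technique only; how --incremental actually decides what to skip is not described here, and changed_sources and the manifest shape are assumptions.

```python
import hashlib
from pathlib import Path


def changed_sources(root: Path, manifest: dict[str, str]) -> list[Path]:
    """Return .java files under `root` whose content hash differs from `manifest`.

    `manifest` maps a root-relative path to the SHA-256 of the last seen content
    and is updated in place so the next call only reports new changes.
    """
    changed: list[Path] = []
    for path in sorted(root.rglob("*.java")):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        key = path.relative_to(root).as_posix()
        if manifest.get(key) != digest:
            changed.append(path)
            manifest[key] = digest
    return changed
```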

Generate review reports:

uv run j2py translate path/to/java/root --output translated_py --dashboard dashboard.html
uv run j2py translate SomeClass.java --report review.html

Watch a source tree and incrementally re-translate changed Java files:

uv run j2py watch path/to/java/root --output translated_py --no-llm

Side-by-side review in VS Code:

uv run j2py compare tests/fixtures/java/HelloWorld.java --no-llm

Print compare paths without opening an editor:

uv run j2py compare tests/fixtures/java/HelloWorld.java --no-open --no-llm

LLM completion with the default Anthropic provider (requires ANTHROPIC_API_KEY):

ANTHROPIC_API_KEY=... uv run j2py translate SomeClass.java

LLM completion with Gemini Flash (requires GEMINI_API_KEY):

GEMINI_API_KEY=... uv run j2py translate SomeClass.java \
  --llm-provider gemini --model gemini-3.5-flash

Configuration can live in j2py.yaml, j2py.toml, [tool.j2py] in pyproject.toml, or j2py_config.py. Projects may set default llm_provider and model values there, while CLI flags override them for one command. See docs/configuration.md for the schema.
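
The precedence described above (CLI flags over file configuration over defaults) can be sketched with a ChainMap. The keys llm_provider and model come from this README; the effective_settings function, the DEFAULTS table, and the exact spelling "anthropic" are assumptions for illustration, and docs/configuration.md remains the authority on the schema.

```python
from collections import ChainMap

# Assumed built-in defaults; the provider string "anthropic" is a guess at the spelling.
DEFAULTS = {"llm_provider": "anthropic"}


def effective_settings(file_config: dict[str, str], cli_flags: dict[str, str]) -> dict[str, str]:
    """Merge settings so CLI flags win over file config, which wins over defaults."""
    return dict(ChainMap(cli_flags, file_config, DEFAULTS))


file_config = {"llm_provider": "gemini", "model": "gemini-3.5-flash"}
print(effective_settings(file_config, {}))
print(effective_settings(file_config, {"llm_provider": "anthropic"}))
```

The first call keeps the project's file settings; the second overrides only the provider for that one invocation and leaves the configured model in place.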

Quality gates

make check         # ruff + mypy strict + pytest (excludes behavior, live_llm, target_translation)
make test-behavior # Java/Python stdout/stderr/exit-code equivalence (requires JDK)
make test-targets  # future strict-xfail roadmap targets
make release-check # release gate: release-test + dist-check (Python 3.11+ in the CI publish workflow)
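
The behavior gate compares what a Java program and its Python translation print and return. A minimal sketch of such a comparison is below; it is not the project's test harness, and RunResult, run, and behaviorally_equivalent are hypothetical names. The commands passed in are whatever launches each program.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    stdout: str
    stderr: str
    exit_code: int


def run(cmd: list[str], timeout: float = 30.0) -> RunResult:
    """Run a command and capture the three observables being compared."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return RunResult(proc.stdout, proc.stderr, proc.returncode)


def behaviorally_equivalent(java_cmd: list[str], python_cmd: list[str]) -> bool:
    """True when stdout, stderr, and exit code all match exactly."""
    return run(java_cmd) == run(python_cmd)


# Stand-in commands so the sketch runs without a JDK: both sides use the Python interpreter.
same = [sys.executable, "-c", "print('hello')"]
print(behaviorally_equivalent(same, same))  # True
```

In a real check the first command would be something like a java invocation of the compiled class and the second a python invocation of the translated module. Exact string comparison is deliberately strict, so stderr text that differs between the two runtimes (stack traces, for example) would need normalizing first.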

Benchmark corpus

Translation quality is measured against a multi-library corpus: pinned checkouts of Spring Framework, Guava, Apache Commons Lang, Jackson, and Caffeine, plus small curated construct fixtures under tests/fixtures/corpus/. These libraries are open-source stress tests for the deterministic rule layer — not product scope or target runtime. Corpus-derived fast fixtures that should not affect committed baselines live under tests/fixtures/java/targets/ instead.

make corpus-list-presets              # show all pinned presets
make corpus-clone-all                 # one-time: clone all checkouts into .corpus/
make corpus-guava-dense-check         # Guava collect/base vs baseline
make corpus-commons-lang-dense-check  # Commons Lang utilities vs baseline
make corpus-jackson-dense-check       # Jackson databind vs baseline
make corpus-caffeine-dense-check      # Caffeine cache code vs baseline
make corpus-spring-dense-check        # Spring dense preset + construct fixtures
make corpus-hotspots                  # rank gaps across all committed baselines

Presets and baselines live in scripts/corpus/corpus_presets.py and tests/fixtures/corpus/. In git worktrees, set J2PY_CORPUS_ROOT to your main checkout so scripts reuse $J2PY_CORPUS_ROOT/.corpus/. Regenerate a baseline with make corpus-<name>-update-baseline only after comparison shows no regressions.

See docs/CORPUS_SCOREBOARD.md, docs/TRANSLATION_TARGETS.md, and the full documentation index.

On-demand live LLM evaluation and harvest (excluded from make check):

make test-llm-e2e              # Anthropic live probes; requires ANTHROPIC_API_KEY
make test-llm-gemini-e2e       # Gemini live probe; requires GEMINI_API_KEY
make harvest-promote-dry        # triage + draft pattern-family issues; no LLM
make harvest-promote            # queue → Gemini batch → triage → draft issues
make harvest-promote-issues     # same + gh issue create
make harvest-queue REFRESH=1    # rebuild Tier-A queue from corpus-reports/
make harvest-pipeline           # local probe harvest → triage → FUTURE_TARGETS drafts
make harvest-gemini             # batch Gemini harvest from .j2py/harvest/queue.txt
make harvest-triage             # summarize local .j2py/harvest/records.jsonl
# promote vars: LIMIT=2 ISSUES=3; harvest-gemini: OFFSET=0 LIMIT=10 SLEEP=6 FILE_LIST=...

Worktrees: set J2PY_CORPUS_ROOT to the main checkout so .env, queue, cache, and .j2py/harvest/ resolve correctly. See docs/LLM_HARVEST.md for queue tiers, content cache, state files, and the harvest-promote agent skill.

Adding translation rules

  1. Add or update a Java/Python fixture pair under tests/fixtures/.
  2. Implement the smallest deterministic rule in j2py/translate/.
  3. Graduate the behavior into normal tests once it passes.
  4. Run make check and relevant corpus checks, such as make corpus-guava-dense-check for generics/collections or make corpus-spring-dense-check when construct-mix behavior may shift.
  5. Update a corpus baseline only when comparison shows no regressions.

Material translation policy changes should get an ADR under docs/decisions/.

Beta release notes

j2py-converter is published as a beta package. Expect incomplete construct coverage, diagnostics for unsupported regions, known multi-file import limitations, and manual review on production-scale codebases. See docs/RELEASING.md for the release checklist and CHANGELOG.md for known limitations in 0.5.0b1.

License

MIT. See LICENSE.
