
Enhanced Java parser with Java 9-22 support, fork of javalang

Project description

Ljavalang

Thanks to the AI era, a project that is no longer maintained can now be kept alive at very low cost.

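Ljavalang is a Java source-code parser written entirely in Python. It turns Java source into an AST for downstream static analysis. Ljavalang is an enhanced version of javalang: it fixes AST-construction defects in upstream, adds support for Java 9-22 syntax, and fixes known bugs from the original project.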


Installation

pip install ljavalang

Your code still uses import javalang; the package is fully compatible with upstream.

Quick start

>>> import javalang
>>> tree = javalang.parse.parse('package com.example; class Test {}')
>>> tree.package.name
'com.example'
>>> tree.types[0].name
'Test'

New syntax examples

Java 14 switch expression:

>>> code = '''
... class T {
...     int m(int x) {
...         return switch(x) {
...             case 1 -> 10;
...             case 2 -> 20;
...             default -> 0;
...         };
...     }
... }'''
>>> tree = javalang.parse.parse(code)
>>> # The expression in the return statement is a SwitchExpression
>>> tree.types[0].body[0].body[0].expression
SwitchExpression

Java 16 record:

>>> tree = javalang.parse.parse('record Point(int x, int y) {}')
>>> tree.types[0]
RecordDeclaration
>>> tree.types[0].name
'Point'

Java 21 record pattern:

>>> code = '''
... class T {
...     record Point(int x, int y) {}
...     void m(Object o) {
...         switch(o) {
...             case Point(int x, int y) -> System.out.println(x + y);
...             default -> {}
...         }
...     }
... }'''
>>> tree = javalang.parse.parse(code)  # parses without error

Chained method calls (core bug fix):

>>> code = 'class T { void m(String cmd) { Runtime.getRuntime().exec(cmd); } }'
>>> tree = javalang.parse.parse(code)
>>> # Upstream wrongly puts exec into the selectors list
>>> # Ljavalang correctly parses it as a nested chain of MethodInvocation qualifiers

Traversal with the Visitor pattern

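import javalang
from javalang.visitor import JavaVisitor

class MethodCollector(JavaVisitor):
    """Collect the names of every MethodDeclaration in a tree."""

    def __init__(self):
        self.methods = []

    def visit_MethodDeclaration(self, node):
        self.methods.append(node.name)
        self.generic_visit(node)  # continue into child nodes (e.g. nested classes)

tree = javalang.parse.parse('class T { void foo() {} void bar() {} }')
collector = MethodCollector()
collector.visit(tree)
print(collector.methods)  # ['foo', 'bar']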

Token position ranges

from javalang.tokenizer import tokenize

code = 'int x = 42;'
for token in tokenize(code):
    r = token.position.range
    print(f'{token.value} -> code[{r.start}:{r.stop}] = {code[r]!r}')
# int -> code[0:3] = 'int'
# x -> code[4:5] = 'x'
# = -> code[6:7] = '='
# 42 -> code[8:10] = '42'
# ; -> code[10:11] = ';'
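Each range is a 0-based, end-exclusive offset into the source string, while line and column are meant for humans. Judging from the end_position example below (range=slice(65, 66) reported as column=66 on line 1), columns appear to be 1-based. A sketch of converting an offset back to (line, column) under that assumption, treating only '\n' as a line break; offset_to_line_col is a hypothetical helper, not a javalang API:

```python
def offset_to_line_col(code: str, offset: int) -> tuple[int, int]:
    """Convert a 0-based character offset (e.g. range.start) into a
    1-based (line, column) pair. Assumes '\n' is the only line separator."""
    if not 0 <= offset <= len(code):
        raise ValueError(f"offset {offset} outside 0..{len(code)}")
    line = code.count("\n", 0, offset) + 1
    line_start = code.rfind("\n", 0, offset) + 1  # 0 when on the first line
    return line, offset - line_start + 1


code = 'int x = 42;'
print(offset_to_line_col(code, 8))  # (1, 9): the start of '42'
```

This can be handy for mapping offsets produced by other tools back onto javalang-style positions, or for cross-checking the line/column values the tokenizer reports.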

end_position on AST nodes

>>> code = 'class T { void m() { try { int x = 1; } catch (Exception e) {} } }'
>>> tree = javalang.parse.parse(code)
>>> tree.types[0].end_position
Position(line=1, column=66, range=slice(65, 66, None))
>>> tree.types[0].body[0].end_position  # MethodDeclaration
Position(line=1, column=64, range=slice(63, 64, None))
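With a start position and end_position, a node's source text can be sliced straight out of the original string, because end_position.range covers the node's last token (for the method above, the closing '}' at slice(63, 64)). The sketch below is illustrative only: StubPosition is a hypothetical stand-in mirroring the Position repr shown above, and it assumes a node's start position also carries a range pointing at its first token; which token a given node's position refers to (for example modifiers vs. name) is worth checking against real output.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring Position(line, column, range) as printed above;
# with real trees you would pass node.position and node.end_position instead.
StubPosition = namedtuple("StubPosition", ["line", "column", "range"])


def node_source(code: str, start, end) -> str:
    """Return the text from the first character of `start`'s token through
    the last character of `end`'s token (range.stop is exclusive)."""
    return code[start.range.start:end.range.stop]


code = 'class T { void m() { try { int x = 1; } catch (Exception e) {} } }'
start = StubPosition(line=1, column=11, range=slice(10, 14))  # the `void` token
end = StubPosition(line=1, column=64, range=slice(63, 64))    # MethodDeclaration end_position above
print(node_source(code, start, end))
# void m() { try { int x = 1; } catch (Exception e) {} }
```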

Supported Java syntax features

Full list

Java 8 (already supported upstream)

  • Lambda expressions
  • Method references
  • Type annotations
  • Interface default/static methods
  • Standard try-with-resources
  • Receiver parameters (Inner.this parameter)

Java 9

  • try-with-resources with effectively final variables
  • module-info.java (module / open module / requires / exports / opens / uses / provides)
  • Private interface methods
  • Diamond operator with anonymous classes

Java 10-11

  • var for local variable type inference
  • var in for-each / try-with-resources
  • var in lambda parameters

Java 14

  • Switch expressions (case X -> arrow syntax)
  • Switch expressions in expression position (return switch(...) / right-hand side of an assignment)
  • Multi-label case (case 1, 2, 3 ->)
  • yield statement
  • Pattern matching for instanceof (obj instanceof String s)

Java 15

  • Text blocks ("""...""" triple-quoted strings)

Java 16

  • record class declarations
  • Local records / enums (inside method bodies)
  • Records as class members

Java 17

  • sealed class / interface
  • permits 子句
  • non-sealed 修饰符

Java 21

  • Pattern matching for switch (case String s ->)
  • Record pattern deconstruction (case Point(int x, int y) ->)
  • Nested record patterns
  • case null matching

Java 22

  • Unnamed variables (_)
  • Unnamed lambda parameters
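To check the list above against the version you actually have installed, one option is a small smoke test that feeds one snippet per feature family to the parser and reports which ones raise. This is a sketch: the snippets are ordinary Java written for illustration (not taken from the project's test suite), FEATURE_SAMPLES and smoke_test are hypothetical names, and the parse function is passed in so the script does not depend on a particular install.

```python
# One short, valid Java snippet per feature family (illustrative, not exhaustive).
FEATURE_SAMPLES = {
    "Java 9 module-info": "module com.example { requires java.base; exports com.example.api; }",
    "Java 10 var": "class T { void m() { var x = 1; } }",
    "Java 14 switch expression + yield": (
        "class T { int m(int x) { return switch (x) { case 1, 2 -> 10; "
        "default -> { yield 0; } }; } }"
    ),
    "Java 14 instanceof pattern": (
        "class T { boolean m(Object o) { return o instanceof String s && s.isEmpty(); } }"
    ),
    "Java 15 text block": 'class T { String s = """\n    hello\n    """; }',
    "Java 16 record": "record Point(int x, int y) {}",
    "Java 17 sealed/permits": (
        "sealed interface Shape permits Circle {} final class Circle implements Shape {}"
    ),
    "Java 21 pattern switch + case null": (
        'class T { String m(Object o) { return switch (o) { case null -> "null"; '
        'case String s -> s; default -> "other"; }; } }'
    ),
    "Java 22 unnamed variable": "class T { void m() { try { } catch (Exception _) { } } }",
}


def smoke_test(parse) -> dict[str, Exception]:
    """Call `parse` on every sample; return the samples that raised, keyed by name."""
    failures = {}
    for name, source in FEATURE_SAMPLES.items():
        try:
            parse(source)
        except Exception as exc:  # e.g. a syntax error raised by the parser
            failures[name] = exc
    return failures
```

With ljavalang installed this would be called as smoke_test(javalang.parse.parse); an empty dict means every sample parsed.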

Project structure

Ljavalang/
├── pyproject.toml    # Packaging config (PEP 621)
├── javalang/
│   ├── parse.py      # Entry points: parse() / parse_expression(), etc.
│   ├── parser.py     # Recursive-descent parser (~2800 lines)
│   ├── tokenizer.py  # Lexer (~700 lines)
│   ├── tree.py       # AST node definitions (~340 lines)
│   ├── visitor.py    # Visitor-pattern traversal
│   └── test/         # Test cases (112)
│       ├── test_java_9_syntax.py
│       ├── test_java_10_11_syntax.py
│       ├── test_java_14_15_syntax.py
│       ├── test_java_16_17_syntax.py
│       ├── test_java_21_syntax.py
│       ├── test_upstream_issues.py     # Regression tests for upstream bugs
│       └── test_upstream_features.py   # Tests for upstream features
└── docs/
    ├── changelog.md              # Version changelog
    ├── architecture.md           # Architecture documentation
    ├── java-version-roadmap.md   # Java version support roadmap
    ├── upstream-issues.md        # Classification of 151 upstream issues
    ├── upstream-prs.md           # Analysis of 43 upstream PRs
    └── issue-fix-progress.md     # Fix-progress tracking

Acknowledgements

Based on c2nes/javalang by Chris Thunes.

License

MIT License



Download files


Source Distribution

ljavalang-2.0.2.tar.gz (43.0 kB)

Built Distribution


ljavalang-2.0.2-py3-none-any.whl (45.4 kB)

File details

Details for the file ljavalang-2.0.2.tar.gz.

File metadata

  • Download URL: ljavalang-2.0.2.tar.gz
  • Size: 43.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ljavalang-2.0.2.tar.gz
Algorithm Hash digest
SHA256 c9eb8bbdbc7f7bfd748d5da73c99c4c137bbc90499efa78cf4adda70def47171
MD5 ffd8d7fb41f672d69d646fd089a9e99a
BLAKE2b-256 7b83831db10074f0ea4082f079679300f5c75a726681fef423ce9674e06740af


File details

Details for the file ljavalang-2.0.2-py3-none-any.whl.

File metadata

  • Download URL: ljavalang-2.0.2-py3-none-any.whl
  • Size: 45.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for ljavalang-2.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2d1dfc88353600ba9b0cc6c2357fc95ae943f3b1af0296e4c855307d220ad18e
MD5 8ccab5a899be86937ed719d2b5a05f65
BLAKE2b-256 4a65952a1f037c9e81b8d18e148bec64f30eedec391a365a1bdf3478128e6dbf
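To check a downloaded file against the SHA256 digests listed above, a short standard-library script is enough. This is a sketch: EXPECTED_SHA256 just copies the published digests, and sha256_of / verify_download are hypothetical helper names. pip can also enforce this itself in hash-checking mode, using a requirements line such as ljavalang==2.0.2 --hash=sha256:<digest>.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "ljavalang-2.0.2.tar.gz": "c9eb8bbdbc7f7bfd748d5da73c99c4c137bbc90499efa78cf4adda70def47171",
    "ljavalang-2.0.2-py3-none-any.whl": "2d1dfc88353600ba9b0cc6c2357fc95ae943f3b1af0296e4c855307d220ad18e",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


# Example usage, with a file downloaded into the current directory:
# print(verify_download("ljavalang-2.0.2.tar.gz"))
```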

