Enhanced Java parser with Java 9-22 support, fork of javalang
Ljavalang
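Thanks to the AI era, projects that are no longer maintained can be kept alive at very low cost.
Ljavalang is a Java source code parser implemented entirely in Python. It converts Java source code into an AST for subsequent static analysis. Ljavalang is an enhanced version of javalang: it fixes AST construction defects in upstream, supports the new syntax of Java 9-22, and fixes all known bugs in the original project.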
Installation
pip install ljavalang
Code continues to use `import javalang`; the package is fully compatible with upstream.
Quick start
>>> import javalang
>>> tree = javalang.parse.parse('package com.example; class Test {}')
>>> tree.package.name
'com.example'
>>> tree.types[0].name
'Test'
New syntax examples
Java 14 switch expression:
>>> code = '''
... class T {
... int m(int x) {
... return switch(x) {
... case 1 -> 10;
... case 2 -> 20;
... default -> 0;
... };
... }
... }'''
>>> tree = javalang.parse.parse(code)
>>> # The expression in the return statement is a SwitchExpression
>>> tree.types[0].body[0].body[0].expression
SwitchExpression
Java 16 record:
>>> tree = javalang.parse.parse('record Point(int x, int y) {}')
>>> tree.types[0]
RecordDeclaration
>>> tree.types[0].name
'Point'
Java 21 record pattern:
>>> code = '''
... class T {
... record Point(int x, int y) {}
... void m(Object o) {
... switch(o) {
... case Point(int x, int y) -> System.out.println(x + y);
... default -> {}
... }
... }
... }'''
>>> javalang.parse.parse(code) # parses without error
Chained calls (core bug fix):
>>> code = 'class T { void m(String cmd) { Runtime.getRuntime().exec(cmd); } }'
>>> tree = javalang.parse.parse(code)
>>> # Upstream incorrectly places exec in the selectors list
>>> # Ljavalang correctly parses it as a nested MethodInvocation qualifier chain
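To make the idea of a nested qualifier chain concrete, the sketch below models it with a hypothetical `Call` dataclass. It is an illustration only: `Call`, its `qualifier` field, and `render` are assumptions made for this example and are not Ljavalang's actual node classes or field layout. Arguments are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Call:
    """Hypothetical stand-in for a method-invocation node (not the real AST class)."""
    member: str                                      # method name, e.g. "exec"
    qualifier: Optional[Union["Call", str]] = None   # what the method is called on


def render(node: Union[Call, str, None]) -> str:
    """Turn a nested qualifier chain back into dotted call syntax."""
    if node is None:
        return ""
    if isinstance(node, str):
        return node
    prefix = render(node.qualifier)
    call = f"{node.member}()"
    return f"{prefix}.{call}" if prefix else call


# Runtime.getRuntime().exec(cmd) as a nested chain: exec is the outermost call,
# and its qualifier is the getRuntime() call made on Runtime.
chain = Call("exec", qualifier=Call("getRuntime", qualifier="Runtime"))
print(render(chain))  # Runtime.getRuntime().exec()
```

In a shape like this, the outermost node is the call that runs last, which is usually the one a static analysis wants to match when looking for calls such as `exec`.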
Traversal with the Visitor pattern
from javalang.visitor import JavaVisitor

class MethodCollector(JavaVisitor):
    def __init__(self):
        self.methods = []

    def visit_MethodDeclaration(self, node):
        self.methods.append(node.name)
        self.generic_visit(node)

collector = MethodCollector()
collector.visit(tree)
print(collector.methods) # ['foo', 'bar', ...]
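The `visit_MethodDeclaration` / `generic_visit` naming suggests name-based dispatch in the style of Python's `ast.NodeVisitor`. The source does not show how `JavaVisitor` is implemented, so the following is only a minimal sketch of that general technique. The `Node` classes and `SketchVisitor` are hypothetical stand-ins, not part of the library.

```python
class Node:
    """Hypothetical stand-in for an AST node with child nodes."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class ClassDeclaration(Node):
    pass


class MethodDeclaration(Node):
    pass


class SketchVisitor:
    """Name-based dispatch: visit(node) calls visit_<ClassName> if it exists."""

    def visit(self, node):
        handler = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return handler(node)

    def generic_visit(self, node):
        # Default behaviour: descend into every child.
        for child in node.children:
            self.visit(child)


class MethodCollector(SketchVisitor):
    def __init__(self):
        self.methods = []

    def visit_MethodDeclaration(self, node):
        self.methods.append(node.name)
        self.generic_visit(node)  # keep descending into nested declarations


tree = ClassDeclaration("T", [MethodDeclaration("foo"), MethodDeclaration("bar")])
collector = MethodCollector()
collector.visit(tree)
print(collector.methods)  # ['foo', 'bar']
```

Under this scheme, calling `generic_visit` inside a handler is what keeps the traversal going below the matched node; leaving it out would stop at the first match on each branch.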
Token position ranges
from javalang.tokenizer import tokenize

code = 'int x = 42;'
for token in tokenize(code):
    r = token.position.range
    print(f'{token.value} -> code[{r.start}:{r.stop}] = {code[r]!r}')
# int -> code[0:3] = 'int'
# x -> code[4:5] = 'x'
# = -> code[6:7] = '='
# 42 -> code[8:10] = '42'
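Because each range is an ordinary Python `slice` into the source string, it can drive simple source rewriting. The sketch below hard-codes two of the ranges from the output above so that it runs without the library; `apply_edits` is a hypothetical helper written for this example, not something Ljavalang provides.

```python
code = 'int x = 42;'

# Ranges copied from the tokenizer output shown above; in real use they would
# come from token.position.range.
edits = [
    (slice(4, 5), 'count'),   # identifier x
    (slice(8, 10), '100'),    # literal 42
]


def apply_edits(source, edits):
    """Replace each slice with new text, working right to left so that
    earlier offsets stay valid while later text changes length."""
    for rng, replacement in sorted(edits, key=lambda e: e[0].start, reverse=True):
        source = source[:rng.start] + replacement + source[rng.stop:]
    return source


print(apply_edits(code, edits))  # int count = 100;
```

Applying the edits from the end of the string backwards avoids having to recompute offsets after each replacement.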
AST node end_position
>>> code = 'class T { void m() { try { int x = 1; } catch (Exception e) {} } }'
>>> tree = javalang.parse.parse(code)
>>> tree.types[0].end_position
Position(line=1, column=66, range=slice(65, 66, None))
>>> tree.types[0].body[0].end_position # MethodDeclaration
Position(line=1, column=64, range=slice(63, 64, None))
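Reading the output above, `column` appears to be 1-based while `range` is a 0-based slice into the source string. That is an inference from this one example, not a documented guarantee. The sketch below checks it against the two reported positions using a hypothetical `line_and_column` helper, with the slices hard-coded so that it runs without the library.

```python
code = 'class T { void m() { try { int x = 1; } catch (Exception e) {} } }'


def line_and_column(source, offset):
    """Convert a 0-based character offset to a 1-based (line, column) pair."""
    line = source.count('\n', 0, offset) + 1
    line_start = source.rfind('\n', 0, offset) + 1  # 0 when on the first line
    return line, offset - line_start + 1


# Offsets taken from the range slices in the output above.
class_end = slice(65, 66)
method_end = slice(63, 64)

print(code[class_end], line_and_column(code, class_end.start))    # } (1, 66)
print(code[method_end], line_and_column(code, method_end.start))  # } (1, 64)
```

Both slices select the closing `}` of their node, so in this example `end_position` points at the last character of the declaration rather than one past it.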
Supported Java syntax features
Full list
Java 8 (already supported upstream)
- Lambda expressions
- Method references
- Type annotations
- Interface default/static methods
- General try-with-resources
- Receiver parameter (`Inner.this` parameter)
Java 9
- try-with-resources with effectively final variables
- `module-info.java` (module / open module / requires / exports / opens / uses / provides)
- Interface private methods
- Diamond operator with anonymous classes
Java 10-11
- `var` local variable type inference
- `var` in for-each / try-with-resources
- `var` in lambda parameters
Java 14
- Switch expression (`case X ->` arrow syntax)
- Switch expression in expression position (`return switch(...)` / right-hand side of an assignment)
- Multi-label case (`case 1, 2, 3 ->`)
- `yield` statement
- Pattern matching `instanceof` (`obj instanceof String s`)
Java 15
- Text block (`"""..."""` triple-quoted string)
Java 16
- `record` class declaration
- Local record / enum (inside a method body)
- record as a class member
Java 17
- `sealed` class / interface
- `permits` clause
- `non-sealed` modifier
Java 21
- Pattern matching switch (`case String s ->`)
- Record pattern destructuring (`case Point(int x, int y) ->`)
- Nested record patterns
- `case null` matching
Java 22
- Unnamed variable `_`
- Unnamed lambda parameters
Project structure
Ljavalang/
├── pyproject.toml                      # Packaging configuration (PEP 621)
├── javalang/
│   ├── parse.py                        # Entry points: parse() / parse_expression(), etc.
│   ├── parser.py                       # Recursive-descent parser (~2800 lines)
│   ├── tokenizer.py                    # Lexer (~700 lines)
│   ├── tree.py                         # AST node definitions (~340 lines)
│   ├── visitor.py                      # Visitor-pattern traversal
│   └── test/                           # Test cases (112)
│       ├── test_java_9_syntax.py
│       ├── test_java_10_11_syntax.py
│       ├── test_java_14_15_syntax.py
│       ├── test_java_16_17_syntax.py
│       ├── test_java_21_syntax.py
│       ├── test_upstream_issues.py     # Regression tests for upstream bugs
│       └── test_upstream_features.py   # Tests for upstream features
└── docs/
    ├── changelog.md                    # Version change log
    ├── architecture.md                 # Architecture documentation
    ├── java-version-roadmap.md         # Java version support roadmap
    ├── upstream-issues.md              # Classification of 151 upstream issues
    ├── upstream-prs.md                 # Analysis of 43 upstream PRs
    └── issue-fix-progress.md           # Fix progress tracking
Acknowledgements
Based on c2nes/javalang (by Chris Thunes).
License
MIT License
File details
Details for the file ljavalang-2.0.2.tar.gz.
File metadata
- Download URL: ljavalang-2.0.2.tar.gz
- Upload date:
- Size: 43.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c9eb8bbdbc7f7bfd748d5da73c99c4c137bbc90499efa78cf4adda70def47171 |
| MD5 | ffd8d7fb41f672d69d646fd089a9e99a |
| BLAKE2b-256 | 7b83831db10074f0ea4082f079679300f5c75a726681fef423ce9674e06740af |
File details
Details for the file ljavalang-2.0.2-py3-none-any.whl.
File metadata
- Download URL: ljavalang-2.0.2-py3-none-any.whl
- Upload date:
- Size: 45.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d1dfc88353600ba9b0cc6c2357fc95ae943f3b1af0296e4c855307d220ad18e |
| MD5 | 8ccab5a899be86937ed719d2b5a05f65 |
| BLAKE2b-256 | 4a65952a1f037c9e81b8d18e148bec64f30eedec391a365a1bdf3478128e6dbf |