Schema-driven AST parsing and semantic diff for 170+ languages
loraxMod-py
Python binding for LoraxMod - schema-driven AST parsing and semantic diff.
Installation
pip install loraxmod
Features
- 170 languages - Via tree-sitter-language-pack
- Schema-driven extraction - Reads node-types.json dynamically from GitHub
- Semantic diff - Detects renames, additions, modifications
- Portable core - schema.py, extractor.py, and differ.py are written so they can be ported to JS/C#
Quick Start
from loraxmod import Parser
# Works for any of 170 languages
parser = Parser("javascript")
tree = parser.parse("function greet(name) { return name; }")
# S-expression output (corpus format with field names)
str(tree.root_node) # '(program (function_declaration name: (identifier) ...))'
# Extract by node type
functions = parser.extract_by_type(tree, ["function_declaration"])
for func in functions:
    print(func.identity) # 'greet'
    print(func.to_dict())
# Semantic diff
old_code = "function foo() { return 1; }"
new_code = "function bar() { return 1; }"
diff = parser.diff(old_code, new_code)
for change in diff.changes:
    print(change.change_type.value, change.old_identity, change.new_identity)
# rename function_definition:foo function_definition:bar
# Full text diff (not truncated)
diff = parser.diff(old_code, new_code, include_full_text=True)
print(diff.changes[0].old_value) # Full function text
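To illustrate how a semantic diff can tell a rename apart from an unrelated add/remove pair, here is a minimal sketch that does not use loraxmod at all. It assumes each declaration has already been reduced to a name and a body string. `Decl` and `classify_changes` are hypothetical names, and the matching rule (identical body, different name) is an assumption made for illustration, not a description of how LoraxMod's differ works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decl:
    """Hypothetical stand-in for an extracted declaration."""
    name: str
    body: str


def classify_changes(old, new):
    """Return sorted (kind, old_name, new_name) tuples for two lists of Decl."""
    old_by_name = {d.name: d for d in old}
    new_by_name = {d.name: d for d in new}
    changes = []

    # Same name on both sides: either unchanged or modified.
    for name in old_by_name.keys() & new_by_name.keys():
        if old_by_name[name].body != new_by_name[name].body:
            changes.append(("modify", name, name))

    removed = [d for d in old if d.name not in new_by_name]
    added = [d for d in new if d.name not in old_by_name]

    # Index the new-side declarations by body so renames can be paired up.
    added_by_body = {}
    for d in added:
        added_by_body.setdefault(d.body, []).append(d)

    # A removed declaration whose body reappears under a new name is a rename.
    for d in removed:
        candidates = added_by_body.get(d.body)
        if candidates:
            match = candidates.pop(0)
            changes.append(("rename", d.name, match.name))
        else:
            changes.append(("remove", d.name, None))

    # Whatever is left on the new side is a genuine addition.
    for leftovers in added_by_body.values():
        for d in leftovers:
            changes.append(("add", None, d.name))

    return sorted(changes, key=lambda c: (c[0], c[1] or "", c[2] or ""))


old = [Decl("foo", "{ return 1; }"), Decl("keep", "{ return 2; }")]
new = [Decl("bar", "{ return 1; }"), Decl("keep", "{ return 3; }"), Decl("extra", "{}")]
result = classify_changes(old, new)
print(result)
```

A real differ would likely compare normalized AST structure rather than raw text, so that whitespace or comment changes do not hide a rename.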
Schema Caching
Schemas are fetched from GitHub and cached locally:
from loraxmod import get_available_languages, list_cached_schemas, clear_schema_cache
# List all 170 available languages
get_available_languages()
# See what's cached
list_cached_schemas()
# Clear cache (re-fetches on next use)
clear_schema_cache()
Cache location: ~/.cache/loraxmod/{version}/
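The fetch-then-cache behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not loraxmod's code: `load_schema` and `fake_fetch` are hypothetical names, the per-language file name `{language}.json` is assumed, and the network call is replaced by an injected callable so the example runs offline against a temporary directory.

```python
import json
import tempfile
from pathlib import Path


def load_schema(language, version, fetch, cache_root):
    """Return the parsed schema for `language`, fetching only on a cache miss.

    `fetch` is a hypothetical callable that takes a language name and returns
    raw JSON text; in practice it would wrap an HTTP request.
    """
    cache_file = Path(cache_root) / version / f"{language}.json"  # assumed file name
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))

    text = fetch(language)
    data = json.loads(text)  # parse first so an invalid payload is never cached
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return data


calls = []


def fake_fetch(language):
    """Stand-in for the network fetch; records each call."""
    calls.append(language)
    return json.dumps([{"type": "identifier", "named": True}])


with tempfile.TemporaryDirectory() as tmp:
    first = load_schema("javascript", "0.1.0", fake_fetch, tmp)
    second = load_schema("javascript", "0.1.0", fake_fetch, tmp)

print(len(calls))  # the second call is served from the cache
```

Keying the directory on the version means a package upgrade starts with an empty cache instead of reusing schemas that may no longer match.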
Schema API
from loraxmod import SchemaReader
schema = SchemaReader.from_file("node-types.json")
# Get fields for node type
schema.get_fields("function_declaration")
# {'name': {...}, 'parameters': {...}, 'body': {...}}
# Resolve semantic intent to field
schema.resolve_intent("function_declaration", "identifier")
# 'name'
# Full extraction plan
schema.get_extraction_plan("function_declaration")
# {'identifier': 'name', 'parameters': 'parameters', ...}
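For readers unfamiliar with node-types.json, the sketch below shows how field lookup and intent resolution could work on top of that file layout. It is a standalone illustration: the inline sample is a trimmed, hand-written excerpt in the tree-sitter node-types.json format, and `get_fields`, `resolve_intent`, and the `INTENT_CANDIDATES` priority list are hypothetical helpers, not SchemaReader's actual implementation.

```python
import json

# Trimmed, hand-written sample in the tree-sitter node-types.json layout.
NODE_TYPES = json.loads("""
[
  {
    "type": "function_declaration",
    "named": true,
    "fields": {
      "name": {"multiple": false, "required": true,
               "types": [{"type": "identifier", "named": true}]},
      "parameters": {"multiple": false, "required": true,
                     "types": [{"type": "formal_parameters", "named": true}]},
      "body": {"multiple": false, "required": true,
               "types": [{"type": "statement_block", "named": true}]}
    }
  },
  {"type": "identifier", "named": true}
]
""")

# Hypothetical priority list: field names that may carry a node's identifier.
INTENT_CANDIDATES = {"identifier": ["name", "identifier", "id"]}


def get_fields(node_types, node_type):
    """Return the `fields` mapping for a node type, or {} if it has none."""
    for entry in node_types:
        if entry["type"] == node_type:
            return entry.get("fields", {})
    return {}


def resolve_intent(node_types, node_type, intent):
    """Pick the first candidate field name that the node type declares."""
    fields = get_fields(node_types, node_type)
    for candidate in INTENT_CANDIDATES.get(intent, []):
        if candidate in fields:
            return candidate
    return None


field_names = sorted(get_fields(NODE_TYPES, "function_declaration"))
resolved = resolve_intent(NODE_TYPES, "function_declaration", "identifier")
print(field_names, resolved)
```

Because the lookup is driven by the schema data rather than by per-language code, the same two functions apply to any grammar that ships a node-types.json.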
Development
# Clone repo
git clone https://github.com/jackyHardDisk/loraxMod
cd loraxMod/loraxMod-py
# Install dev mode
pip install -e ".[dev]"
# Run tests (fetches schemas from GitHub on first run)
pytest
Architecture
loraxmod/
  schema.py        PORTABLE - JSON schema reader
  extractor.py     PORTABLE - Schema-driven extraction
  differ.py        PORTABLE - Semantic diff engine
  parser.py        tree-sitter + language-pack wrapper
  schema_cache.py  GitHub schema fetcher with cache
Value Proposition
Schema-Driven Code Analysis Across 170 Languages
- vs. regex/grep: Understands code structure, not just text patterns
- vs. language-specific tools: One API for 170 languages
- vs. AST libraries: No manual node traversal, the schema does the work
- vs. text diffs: Reports semantic changes (rename, add, modify), not line changes
Use cases:
- Code analysis tools: Find functions/classes/imports across polyglot codebases
- ML/LLM feature extraction: Convert code → structured JSON
- Version control intelligence: "3 functions renamed, 2 added" vs "+50/-40 lines"
- Migration tools: Find deprecated API usages across languages
- Code search: "Find error handling blocks" without regex
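The version-control use case above amounts to counting changes by type. A minimal sketch follows, assuming the change types arrive as plain strings. Only "rename" appears in the Quick Start output; the other type names and the `summarize` helper are hypothetical.

```python
from collections import Counter


def summarize(change_types):
    """Turn a list of change-type strings into a one-line summary."""
    counts = Counter(change_types)
    order = ["rename", "add", "remove", "modify"]  # assumed type names
    labels = {
        "rename": "renamed",
        "add": "added",
        "remove": "removed",
        "modify": "modified",
    }
    parts = [f"{counts[kind]} {labels[kind]}" for kind in order if counts[kind]]
    return ", ".join(parts) if parts else "no semantic changes"


summary = summarize(["rename", "add", "rename", "rename", "add"])
print(summary)
```

With a real diff, the input list could be built from `change.change_type.value` for each entry in `diff.changes`, as in the Quick Start.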
Future Integration
Hybrid approach: Combine with code embeddings (jina-embeddings-v3) for:
- Refactor impact analysis (find similar code patterns)
- Smart merge conflicts (detect rename vs logic change)
- Cross-language consistency (Python service + JS client)
- "Explain this diff" with LLM context
See ../CLAUDE.md for roadmap.
License
MIT License - See LICENSE
Third-party licenses: THIRD-PARTY-LICENSES.md