Schema-driven AST parsing and semantic diff for 170+ languages

loraxMod-py

Python binding for LoraxMod - schema-driven AST parsing and semantic diff.

Installation

pip install loraxmod

Features

  • 170 languages - Via tree-sitter-language-pack
  • Schema-driven extraction - Reads node-types.json dynamically from GitHub
  • Semantic diff - Detects renames, additions, modifications
  • Portable core - schema.py, extractor.py, and differ.py are written to port directly to JS/C#

Quick Start

from loraxmod import Parser

# Works for any of 170 languages
parser = Parser("javascript")
tree = parser.parse("function greet(name) { return name; }")

# S-expression output (corpus format with field names)
str(tree.root_node)  # '(program (function_declaration name: (identifier) ...))'

# Extract by node type
functions = parser.extract_by_type(tree, ["function_declaration"])
for func in functions:
    print(func.identity)  # 'greet'
    print(func.to_dict())

# Semantic diff
old_code = "function foo() { return 1; }"
new_code = "function bar() { return 1; }"
diff = parser.diff(old_code, new_code)
for change in diff.changes:
    print(change.change_type.value, change.old_identity, change.new_identity)
    # rename function_declaration:foo function_declaration:bar

# Full text diff (not truncated)
diff = parser.diff(old_code, new_code, include_full_text=True)
print(diff.changes[0].old_value)  # Full function text
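
To make the rename case above concrete, here is a minimal sketch of one way a semantic differ could classify changes, assuming each function has already been reduced to a name and a body string. It is an illustration only and not LoraxMod's actual algorithm; `classify_changes` and its `{name: body}` input format are hypothetical.

```python
def classify_changes(old, new):
    """Compare two {name: body} mappings and label each difference.

    Hypothetical helper for illustration; not part of loraxmod.
    """
    changes = []
    removed = {n: b for n, b in old.items() if n not in new}
    added = {n: b for n, b in new.items() if n not in old}

    # A removed and an added function with identical bodies count as a rename.
    for old_name, body in list(removed.items()):
        for new_name, new_body in list(added.items()):
            if body == new_body:
                changes.append(("rename", old_name, new_name))
                del removed[old_name]
                del added[new_name]
                break

    # Whatever is left unmatched is a plain removal or addition.
    changes += [("remove", name, None) for name in removed]
    changes += [("add", None, name) for name in added]

    # Same name on both sides but a different body: a modification.
    changes += [
        ("modify", name, name)
        for name in old
        if name in new and old[name] != new[name]
    ]
    return changes


old = {"foo": "{ return 1; }", "keep": "{ return 2; }"}
new = {"bar": "{ return 1; }", "keep": "{ return 3; }"}
print(classify_changes(old, new))
```

A line-based diff of the same input would report one deleted and one inserted line for `foo`/`bar`; matching on the body is what lets the change be reported as a single rename.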

Schema Caching

Schemas are fetched from GitHub and cached locally:

from loraxmod import get_available_languages, list_cached_schemas, clear_schema_cache

# List all 170 available languages
get_available_languages()

# See what's cached
list_cached_schemas()

# Clear cache (re-fetches on next use)
clear_schema_cache()

Cache location: ~/.cache/loraxmod/{version}/
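
As a rough illustration of the layout above, the cache directory can be located and inspected with the standard library alone. This is a sketch, not the library's code: `cache_dir` and `cached_files` are hypothetical helpers, and treating `{version}` as a plain version string such as `0.1.0` is an assumption.

```python
from pathlib import Path


def cache_dir(version: str) -> Path:
    """Return ~/.cache/loraxmod/<version> with ~ expanded (hypothetical helper)."""
    return Path.home() / ".cache" / "loraxmod" / version


def cached_files(version: str) -> list[str]:
    """List file names in the cache directory, or [] if it does not exist yet."""
    directory = cache_dir(version)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


# Assumed version string, used here only as an example.
print(cache_dir("0.1.0"))
print(cached_files("0.1.0"))
```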

Schema API

from loraxmod import SchemaReader

schema = SchemaReader.from_file("node-types.json")

# Get fields for node type
schema.get_fields("function_declaration")
# {'name': {...}, 'parameters': {...}, 'body': {...}}

# Resolve semantic intent to field
schema.resolve_intent("function_declaration", "identifier")
# 'name'

# Full extraction plan
schema.get_extraction_plan("function_declaration")
# {'identifier': 'name', 'parameters': 'parameters', ...}
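
For readers unfamiliar with node-types.json, the sketch below shows the kind of lookup a schema reader performs, using only the standard library and a trimmed, hand-written excerpt in tree-sitter's node-types.json layout. The helper names `get_fields` and `field_for_child_type` are hypothetical, and mapping a child node type to a field is just one plausible strategy; it is not necessarily how `SchemaReader.resolve_intent` works.

```python
import json

# A trimmed, hand-written excerpt in the node-types.json layout (illustrative only).
NODE_TYPES = json.loads("""
[
  {
    "type": "function_declaration",
    "named": true,
    "fields": {
      "name": {"multiple": false, "required": true,
               "types": [{"type": "identifier", "named": true}]},
      "parameters": {"multiple": false, "required": true,
                     "types": [{"type": "formal_parameters", "named": true}]},
      "body": {"multiple": false, "required": true,
               "types": [{"type": "statement_block", "named": true}]}
    }
  }
]
""")


def get_fields(node_types, node_type):
    """Return the field table for one named node type, or {} if absent."""
    for entry in node_types:
        if entry["type"] == node_type and entry.get("named"):
            return entry.get("fields", {})
    return {}


def field_for_child_type(node_types, node_type, child_type):
    """Return the first field whose allowed types include child_type, else None."""
    for field, spec in get_fields(node_types, node_type).items():
        if any(t["type"] == child_type for t in spec["types"]):
            return field
    return None


print(sorted(get_fields(NODE_TYPES, "function_declaration")))
print(field_for_child_type(NODE_TYPES, "function_declaration", "identifier"))
```

Because the field names come from the grammar's own schema, the same lookup works for any language whose node-types.json is available, without per-language traversal code.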

Development

# Clone repo
git clone https://github.com/jackyHardDisk/loraxMod
cd loraxMod/loraxMod-py

# Install dev mode
pip install -e ".[dev]"

# Run tests (fetches schemas from GitHub on first run)
pytest

Architecture

loraxmod/
  schema.py         PORTABLE - JSON schema reader
  extractor.py      PORTABLE - Schema-driven extraction
  differ.py         PORTABLE - Semantic diff engine
  parser.py         tree-sitter + language-pack wrapper
  schema_cache.py   GitHub schema fetcher with cache

Value Proposition

Schema-Driven Code Analysis Across 170 Languages

  • vs. regex/grep: Understands code structure, not just text patterns
  • vs. language-specific tools: One API for 170 languages
  • vs. AST libraries: No manual node traversal, schema does the work
  • vs. text diffs: Reports semantic changes (rename, add, modify) not line changes

Use cases:

  • Code analysis tools: Find functions/classes/imports across polyglot codebases
  • ML/LLM feature extraction: Convert code → structured JSON
  • Version control intelligence: "3 functions renamed, 2 added" vs "+50/-40 lines"
  • Migration tools: Find deprecated API usages across languages
  • Code search: "Find error handling blocks" without regex
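
The version-control use case boils down to counting change types. A minimal sketch, assuming the change types arrive as plain strings such as `"rename"` and `"add"` (the Quick Start prints `change.change_type.value` in that form; `"remove"` and the `summarize` helper itself are assumptions made for illustration):

```python
from collections import Counter


def summarize(change_types):
    """Turn a sequence of change-type strings into a one-line summary."""
    counts = Counter(change_types)
    # Assumed change-type names mapped to past-tense labels.
    labels = {
        "rename": "renamed",
        "add": "added",
        "remove": "removed",
        "modify": "modified",
    }
    parts = [
        f"{counts[kind]} function{'s' if counts[kind] != 1 else ''} {label}"
        for kind, label in labels.items()
        if counts[kind]
    ]
    return ", ".join(parts) if parts else "no semantic changes"


print(summarize(["rename"] * 3 + ["add"] * 2))
# prints: 3 functions renamed, 2 functions added
```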

Future Integration

Hybrid approach: Combine with code embeddings (jina-embeddings-v3) for:

  • Refactor impact analysis (find similar code patterns)
  • Smart merge conflicts (detect rename vs logic change)
  • Cross-language consistency (Python service + JS client)
  • "Explain this diff" with LLM context

See ../CLAUDE.md for roadmap.

License

MIT License - See LICENSE

Third-party licenses: THIRD-PARTY-LICENSES.md
