Code2Logic - Source code to logical representation converter for LLM analysis, featuring Tree-sitter parsing, dependency graph analysis, and multi-language support.

Code2Logic

PyPI version Python 3.9+ License: Apache-2.0

Convert source code to logical representation for LLM analysis.

Code2Logic analyzes codebases and generates compact, LLM-friendly representations with semantic understanding. Perfect for feeding project context to AI assistants, building code documentation, or analyzing code structure.

✨ Features

  • 🌳 Multi-language support - Python, JavaScript, TypeScript, Java, Go, Rust, and more
  • 🎯 Tree-sitter AST parsing - 99% accuracy with graceful fallback
  • 📊 NetworkX dependency graphs - PageRank, hub detection, cycle analysis
  • 🔍 Rapidfuzz similarity - Find duplicate and similar functions
  • 🧠 NLP intent extraction - Human-readable function descriptions
  • 📦 Zero dependencies - Core works without any external libs

🚀 Installation

Basic (no dependencies)

pip install code2logic

Full (all features)

pip install code2logic[full]

Selective features

pip install code2logic[treesitter]  # High-accuracy AST parsing
pip install code2logic[graph]       # Dependency analysis
pip install code2logic[similarity]  # Similar function detection
pip install code2logic[nlp]         # Enhanced intents

📖 Quick Start

code2logic ./ -f yaml --compact --function-logic --with-schema -o project.yaml
code2logic ./ -f toon --ultra-compact --function-logic --with-schema -o project.toon

Command Line

# Standard Markdown output
code2logic /path/to/project

# If the `code2logic` entrypoint is not available (e.g. running from source without install):
python -m code2logic /path/to/project

# Compact YAML (14% smaller, meta.legend transparency)
code2logic /path/to/project -f yaml --compact -o analysis-compact.yaml

# Ultra-compact TOON (71% smaller, single-letter keys)
code2logic /path/to/project -f toon --ultra-compact -o analysis-ultra.toon


# Generate schema alongside output
code2logic /path/to/project -f yaml --compact --with-schema

# With detailed analysis
code2logic /path/to/project -d detailed

Python API

from code2logic import analyze_project, MarkdownGenerator

# Analyze a project
project = analyze_project("/path/to/project")

# Generate output
generator = MarkdownGenerator()
output = generator.generate(project, detail_level='standard')
print(output)

# Access analysis results
print(f"Files: {project.total_files}")
print(f"Lines: {project.total_lines}")
print(f"Languages: {project.languages}")

# Get hub modules (most important)
hubs = [p for p, n in project.dependency_metrics.items() if n.is_hub]
print(f"Key modules: {hubs}")

Organized Imports

# Core analysis
from code2logic import ProjectInfo, ProjectAnalyzer, analyze_project

# Format generators
from code2logic import (
    YAMLGenerator,
    JSONGenerator,
    TOONGenerator,
    LogicMLGenerator,
    GherkinGenerator,
)

# LLM clients
from code2logic import get_client, BaseLLMClient

# Development tools
from code2logic import run_benchmark, CodeReviewer

📋 Output Formats

Markdown (default)

Human-readable documentation with:

  • Project structure tree with hub markers (★)
  • Dependency graphs with PageRank scores
  • Classes with methods and intents
  • Functions with signatures and descriptions

Compact

Condensed plain-text format optimized for LLM context:

# myproject | 102f 31875L | typescript:79/python:23
ENTRY: index.ts main.py
HUBS: evolution-manager llm-orchestrator

[core/evolution]
  evolution-manager.ts (3719L) C:EvolutionManager | F:createEvolutionManager
  task-queue.ts (139L) C:TaskQueue,Task
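
The header line of this format packs the project name, file count, line count, and per-language file counts into one row. As a rough sketch, it could be read back into a dictionary like this. The field layout is inferred only from the sample above, and `parse_compact_header` is a hypothetical helper, not part of the code2logic API.

```python
import re

# Assumed layout, inferred from the sample header above:
#   "# <name> | <files>f <lines>L | <lang>:<count>/<lang>:<count>"
HEADER_RE = re.compile(
    r"^#\s*(?P<name>.+?)\s*\|\s*(?P<files>\d+)f\s+(?P<lines>\d+)L\s*\|\s*(?P<langs>.+)$"
)


def parse_compact_header(line):
    """Parse the first line of a Compact report into a plain dict."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a compact header: {line!r}")
    languages = {}
    for item in match.group("langs").split("/"):
        lang, _, count = item.partition(":")
        languages[lang.strip()] = int(count)
    return {
        "name": match.group("name"),
        "files": int(match.group("files")),
        "lines": int(match.group("lines")),
        "languages": languages,
    }


header = parse_compact_header("# myproject | 102f 31875L | typescript:79/python:23")
print(header)
```

In the sample, the per-language counts (79 + 23) add up to the 102 files reported in the header, which suggests the counts are files per language.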

JSON

Machine-readable format for:

  • RAG (Retrieval-Augmented Generation)
  • Database storage
  • Further analysis

🔧 Configuration

Library Status

Check which features are available:

code2logic --status
Library Status:
  tree_sitter: ✓
  networkx: ✓
  rapidfuzz: ✓
  nltk: ✗
  spacy: ✗

LLM Configuration

Manage LLM providers, models, API keys, and routing priorities:

code2logic llm status
code2logic llm set-provider auto
code2logic llm set-model openrouter nvidia/nemotron-3-nano-30b-a3b:free
code2logic llm key set openrouter <OPENROUTER_API_KEY>
code2logic llm priority set-provider openrouter 10
code2logic llm priority set-mode provider-first
code2logic llm priority set-llm-model nvidia/nemotron-3-nano-30b-a3b:free 5
code2logic llm priority set-llm-family nvidia/ 5
code2logic llm config list

Notes:

  • code2logic llm set-provider auto enables automatic fallback selection: providers are tried in priority order.
  • API keys should be stored in .env (or environment variables), not in litellm_config.yaml.
  • These commands write configuration files:
    • .env in the current working directory
    • litellm_config.yaml in the current working directory
    • ~/.code2logic/llm_config.json in your home directory
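
To illustrate the note about keeping keys in .env or environment variables, here is a minimal sketch of how a script might resolve a key from either place. It is not how code2logic itself loads configuration. The variable name `OPENROUTER_API_KEY`, the helper names, and the rule that a real environment variable wins over the .env file are all assumptions made for the example.

```python
import os
import tempfile
from pathlib import Path


def load_dotenv_file(path):
    """Parse simple KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_api_key(name, env_file, environ=None):
    """Assumed precedence: process environment first, then the .env file."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    if env_file.exists():
        return load_dotenv_file(env_file).get(name)
    return None


# Demo with a throwaway .env file and an empty environment mapping.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# demo file\nOPENROUTER_API_KEY=sk-demo-123\n", encoding="utf-8")
    demo_key = resolve_api_key("OPENROUTER_API_KEY", env_path, environ={})

print(demo_key)
```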

Priority modes

You can choose how automatic fallback ordering is computed:

  • provider-first - providers are ordered by provider priority (defaults + overrides)
  • model-first - providers are ordered by the priority rules that match each provider's configured model (exact/prefix)
  • mixed - providers are ordered by the best (lowest) priority from either provider priority or model rules

Configure the mode:

code2logic llm priority set-mode provider-first
code2logic llm priority set-mode model-first
code2logic llm priority set-mode mixed

Model priority rules are stored in ~/.code2logic/llm_config.json.
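
The three modes can be pictured as three different sort keys. The sketch below is an illustration of that idea, not the library's implementation. The data shapes, the `DEFAULT_PRIORITY` value, the choice that an exact rule beats a prefix rule, and the provider named `local` are all hypothetical.

```python
DEFAULT_PRIORITY = 100  # hypothetical fallback when nothing is configured


def model_priority(model, rules):
    """Assumed lookup: exact rule first, then the longest matching prefix rule."""
    if model in rules["exact"]:
        return rules["exact"][model]
    prefixes = [p for p in rules["prefix"] if model.startswith(p)]
    if prefixes:
        return rules["prefix"][max(prefixes, key=len)]
    return DEFAULT_PRIORITY


def order_providers(mode, provider_priority, provider_model, rules):
    """Return provider names in the order they would be tried (lowest first)."""
    def sort_key(provider):
        by_provider = provider_priority.get(provider, DEFAULT_PRIORITY)
        by_model = model_priority(provider_model.get(provider, ""), rules)
        if mode == "provider-first":
            score = by_provider
        elif mode == "model-first":
            score = by_model
        elif mode == "mixed":
            score = min(by_provider, by_model)
        else:
            raise ValueError(f"unknown mode: {mode}")
        return (score, provider)  # provider name breaks ties deterministically

    return sorted(provider_priority, key=sort_key)


provider_priority = {"openrouter": 10, "local": 20}
provider_model = {
    "openrouter": "nvidia/nemotron-3-nano-30b-a3b:free",
    "local": "local-model",
}
rules = {"exact": {"local-model": 1}, "prefix": {"nvidia/": 5}}

for mode in ("provider-first", "model-first", "mixed"):
    print(mode, order_providers(mode, provider_priority, provider_model, rules))
```

With these made-up numbers, provider-first tries `openrouter` before `local`, while model-first and mixed put `local` first because its exact model rule has the lowest priority value.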

Python API (Library Status)

from code2logic import get_library_status

status = get_library_status()
# {'tree_sitter': True, 'networkx': True, ...}
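
For a sense of how a status dictionary like this can be produced, the following sketch probes the optional libraries with the standard library alone. It is an assumption about the general technique, not the actual body of `get_library_status`. The module names are taken from the status listing above, and `probe_optional_libraries` is a hypothetical name.

```python
from importlib.util import find_spec

# Import names taken from the "Library Status" listing above.
OPTIONAL_MODULES = ("tree_sitter", "networkx", "rapidfuzz", "nltk", "spacy")


def probe_optional_libraries(names=OPTIONAL_MODULES):
    """Map each module name to True if it can be imported in this environment."""
    return {name: find_spec(name) is not None for name in names}


status = probe_optional_libraries()
for name, available in status.items():
    print(f"  {name}: {'available' if available else 'missing'}")
```

`find_spec` only locates the module without importing it, so the check stays cheap even when a heavy library such as spaCy is installed.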

📊 Analysis Features

Dependency Analysis

  • PageRank - Identifies most important modules
  • Hub detection - Central modules marked with ★
  • Cycle detection - Find circular dependencies
  • Clustering - Group related modules
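
The ideas behind the first three items can be sketched in plain Python. Code2Logic uses NetworkX for this; the version below swaps in a hand-written power-iteration PageRank and a depth-first cycle search so that it runs without any dependency. The import edges in `deps` are invented for illustration (only the module names come from this README), and treating the top-ranked module as the hub candidate is an assumption.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over {module: [modules it imports]}."""
    nodes = sorted(set(graph) | {d for deps in graph.values() for d in deps})
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            # A module with no imports spreads its rank over every node.
            targets = graph.get(node) or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new[target] += share
        rank = new
    return rank


def find_cycle(graph):
    """Return one import cycle as a list of modules, or None if acyclic."""
    in_progress, done, stack = set(), set(), []

    def visit(node):
        in_progress.add(node)
        stack.append(node)
        for dep in graph.get(node, []):
            if dep in in_progress:
                return stack[stack.index(dep):] + [dep]
            if dep not in done:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in list(graph):
        if node not in done:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical import edges between modules named in this README.
deps = {
    "cli": ["analyzer"],
    "analyzer": ["parsers", "models"],
    "parsers": ["models"],
    "generators": ["models"],
    "models": [],
}
ranks = pagerank(deps)
hub_candidate = max(ranks, key=ranks.get)
print(hub_candidate, find_cycle(deps))
print(find_cycle({"a": ["b"], "b": ["c"], "c": ["a"]}))
```

Rank flows along import edges, so a module that many others import accumulates the highest score, which is the intuition behind marking it as a hub.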

Intent Generation

Functions get human-readable descriptions:

methods:
  async findById(id:string) -> Promise<User>  # retrieves user by id
  async createUser(data:UserDTO) -> Promise<User>  # creates user
  validateEmail(email:string) -> boolean  # validates email
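
A very small rule-based approximation of this step is shown below: split the identifier into words and map the leading verb to a phrase. This is only a sketch of the general idea, not the logic in intent.py. The verb table and both helper names are hypothetical, and the result lacks the class context that the real output appears to use (for example "retrieves user by id").

```python
import re

# Hypothetical verb table; the real tool may use NLP libraries instead.
VERB_PHRASES = {
    "find": "retrieves",
    "get": "retrieves",
    "create": "creates",
    "validate": "validates",
    "delete": "deletes",
}


def split_identifier(name):
    """Split camelCase, PascalCase, and snake_case into lowercase words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name).replace("_", " ")
    return spaced.lower().split()


def describe(name):
    """Build a short description from the words of a function name."""
    words = split_identifier(name)
    if not words:
        return ""
    verb, rest = words[0], words[1:]
    return " ".join([VERB_PHRASES.get(verb, verb)] + rest)


for fn in ("findById", "createUser", "validateEmail", "validate_token"):
    print(f"{fn}  # {describe(fn)}")
```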

Similarity Detection

Find duplicate and similar functions:

Similar Functions:
  core/auth.ts::validateToken:
    - python/auth.py::validate_token (92%)
    - services/jwt.ts::verifyToken (85%)
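
The comparison can be approximated with the standard library. The sketch below uses `difflib.SequenceMatcher` as a stand-in for Rapidfuzz and compares function names only, after normalizing camelCase to snake_case. The scores it prints will therefore not match the percentages above, which presumably take more than the name into account. The helper names and the default threshold are assumptions.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Convert camelCase to snake_case so naming style does not affect the score."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


def similarity(a, b):
    """Name similarity as a percentage in the range 0-100."""
    return 100.0 * SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def similar_functions(target, candidates, threshold=80.0):
    """Return (candidate, score) pairs at or above the threshold, best first."""
    scored = [(c, similarity(target, c)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


matches = similar_functions(
    "validateToken",
    ["verifyToken", "validate_token", "parseConfig"],
    threshold=0.0,
)
for name, score in matches:
    print(f"  - {name} ({score:.0f}%)")
```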

🏗️ Architecture

code2logic/
├── analyzer.py      # Main orchestrator
├── parsers.py       # Tree-sitter + fallback parser
├── dependency.py    # NetworkX dependency analysis
├── similarity.py    # Rapidfuzz similar detection
├── intent.py        # NLP intent generation
├── generators.py    # Output generators (MD/Compact/JSON)
├── models.py        # Data structures
└── cli.py           # Command-line interface

🔌 Integration Examples

With Claude/ChatGPT

from code2logic import analyze_project, CompactGenerator

project = analyze_project("./my-project")
context = CompactGenerator().generate(project)

# Use in your LLM prompt
prompt = f"""
Analyze this codebase and suggest improvements:

{context}
"""

With RAG Systems

import json
from code2logic import analyze_project, JSONGenerator

project = analyze_project("./my-project")
data = json.loads(JSONGenerator().generate(project))

# Index in vector DB
# embed_and_store is a placeholder: supply your own embedding + storage function
for module in data['modules']:
    for func in module['functions']:
        embed_and_store(
            text=f"{func['name']}: {func['intent']}",
            metadata={'path': module['path'], 'type': 'function'}
        )

🧪 Development

Setup

git clone https://github.com/wronai/code2logic
cd code2logic
poetry install --with dev -E full
poetry run pre-commit install

# Alternatively, you can use Makefile targets (prefer Poetry if available)
make install-full

Tests

make test
make test-cov

# Or directly:
poetry run pytest
poetry run pytest --cov=code2logic --cov-report=html

Type Checking

make typecheck

# Or directly:
poetry run mypy code2logic

Linting

make lint
make format

# Or directly:
poetry run ruff check code2logic
poetry run black code2logic

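📈 Performance

| Codebase Size | Files | Lines | Time | Output Size |
|---------------|-------|-------|------|-------------|
| Small         | 10    | 1K    | <1s  | ~5KB        |
| Medium        | 100   | 30K   | ~2s  | ~50KB       |
| Large         | 500   | 150K  | ~10s | ~200KB      |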

Compact format is ~10-15x smaller than Markdown.

🔬 Code Reproduction Benchmarks

Code2Logic can reproduce code from specifications using LLMs. Benchmark results:

Format Comparison (Token Efficiency)

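| Format   | Score | Token Efficiency | Spec Tokens | Runs OK |
|----------|-------|------------------|-------------|---------|
| YAML     | 71.1% | 42.1             | 366         | 66.7%   |
| Markdown | 65.6% | 48.7             | 385         | 100%    |
| JSON     | 61.9% | 23.7             | 605         | 66.7%   |
| Gherkin  | 51.3% | 19.1             | 411         | 66.7%   |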

Key Findings

  • YAML is best for score - 71.1% reproduction accuracy
  • Markdown is best for token efficiency - 48.7 score/1000 tokens
  • YAML uses about 39.5% fewer spec tokens than JSON (366 vs. 605) and scores 9.2 percentage points higher (71.1% vs. 61.9%)
  • Markdown has 100% runs OK - every generated program in these runs executed
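
The YAML versus JSON comparison can be re-derived from the rounded values in the table. The short calculation below shows the arithmetic. Note that the score difference is a gap in percentage points, not a relative increase.

```python
# Values copied from the format comparison table above.
yaml_score, yaml_tokens = 71.1, 366
json_score, json_tokens = 61.9, 605

# Relative reduction in spec tokens when using YAML instead of JSON.
token_saving_pct = 100.0 * (json_tokens - yaml_tokens) / json_tokens

# Absolute difference between the two scores, in percentage points.
score_gap_points = yaml_score - json_score

print(f"YAML uses {token_saving_pct:.1f}% fewer spec tokens than JSON")
print(f"YAML scores {score_gap_points:.1f} percentage points higher")
```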

Run Benchmarks

# Token-aware benchmark
python examples/11_token_benchmark.py --folder tests/samples/ --no-llm

# Async multi-format benchmark
python examples/09_async_benchmark.py --folder tests/samples/ --no-llm

# Function-level reproduction
python examples/10_function_reproduction.py --file tests/samples/sample_functions.py --no-llm

# Unified benchmark
python examples/15_unified_benchmark.py --folder tests/samples/ --no-llm

# Terminal markdown rendering demo
python examples/16_terminal_demo.py --folder tests/samples/

🤝 Contributing

Contributions welcome! Please read our Contributing Guide.

📄 License

Apache License 2.0 - see LICENSE for details.

🔄 Companion Packages

logic2test - Generate Tests from Logic

Generate test scaffolds from Code2Logic output:

# Show what can be generated
python -m logic2test out/code2logic/project.c2l.yaml --summary

# Generate unit tests
python -m logic2test out/code2logic/project.c2l.yaml -o out/logic2test/tests/

# Generate all test types (unit, integration, property)
python -m logic2test out/code2logic/project.c2l.yaml -o out/logic2test/tests/ --type all

# Python API
from logic2test import TestGenerator

generator = TestGenerator('out/code2logic/project.c2l.yaml')
result = generator.generate_unit_tests('out/logic2test/tests/')
print(f"Generated {result.tests_generated} tests")

logic2code - Generate Code from Logic

Generate source code from Code2Logic output:

# Show what can be generated
python -m logic2code out/code2logic/project.c2l.yaml --summary

# Generate Python code
python -m logic2code out/code2logic/project.c2l.yaml -o out/logic2code/generated_code/

# Generate stubs only
python -m logic2code out/code2logic/project.c2l.yaml -o out/logic2code/generated_code/ --stubs-only

# Python API
from logic2code import CodeGenerator

generator = CodeGenerator('out/code2logic/project.c2l.yaml')
result = generator.generate('out/logic2code/generated_code/')
print(f"Generated {result.files_generated} files")

Full Workflow: Code → Logic → Tests/Code

# 1. Analyze existing codebase
code2logic src/ -f yaml -o out/code2logic/project.c2l.yaml

# 2. Generate tests for the codebase
python -m logic2test out/code2logic/project.c2l.yaml -o out/logic2test/tests/ --type all

# 3. Generate code scaffolds (for refactoring)
python -m logic2code out/code2logic/project.c2l.yaml -o out/logic2code/generated_code/ --stubs-only
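
The same three steps could be driven from a small Python script, for example in CI. The sketch below only assembles the command lines shown above and prints them. Actually running them requires the three packages to be installed, so execution is left behind a `dry_run` switch. `build_pipeline` and `run_pipeline` are hypothetical helper names, and the output layout simply mirrors the paths used in this README.

```python
import shlex
import subprocess
import sys


def build_pipeline(src, out_root):
    """Return the three workflow commands as argv lists."""
    spec = f"{out_root}/code2logic/project.c2l.yaml"
    return [
        # 1. Analyze the existing codebase
        ["code2logic", src, "-f", "yaml", "-o", spec],
        # 2. Generate tests for the codebase
        [sys.executable, "-m", "logic2test", spec,
         "-o", f"{out_root}/logic2test/tests/", "--type", "all"],
        # 3. Generate code scaffolds (stubs only)
        [sys.executable, "-m", "logic2code", spec,
         "-o", f"{out_root}/logic2code/generated_code/", "--stubs-only"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


commands = build_pipeline("src/", "out")
run_pipeline(commands, dry_run=True)
```

Passing `dry_run=False` would execute the steps in order and stop at the first non-zero exit code.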

Author

Created by Tom Sapletta - tom@sapletta.com
