A pragmatic multi-language code parser optimized for LLM applications

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License

Code Chunker

A pragmatic multi-language code parser optimized for LLM applications and RAG systems.

Features

- Multi-language support: Python, JavaScript, TypeScript, Solidity, Go, Rust
- Optimized for LLMs: Provides structured output ideal for language models
- Lightweight: Minimal dependencies, fast parsing
- Configurable: Adjust chunk sizes, confidence thresholds, and more
- Easy to use: Simple API with both file and directory parsing
- Incremental parsing: Efficiently update parse results when code changes
- Enhanced language support:
  - TypeScript/React: Component, Hook, and Context detection
  - Solidity: Smart contract metadata extraction (visibility, modifiers, payable)
  - Go: Concurrency pattern detection (goroutines, channels, mutexes)

Installation

pip install code-chunker

Quick Start

from code_chunker import CodeChunker

# Initialize the chunker
chunker = CodeChunker()

# Parse a code string
code = """
def hello_world():
    print("Hello, World!")
"""

result = chunker.parse(code, language='python')

# Print the chunks
for chunk in result.chunks:
    print(f"{chunk.type.value}: {chunk.name} (lines {chunk.start_line}-{chunk.end_line})")

# Parse a file
result = chunker.parse_file('example.py')

# Parse a directory
results = chunker.parse_directory('src/')

Configuration

from code_chunker import CodeChunker, ChunkerConfig

config = ChunkerConfig(
    max_chunk_size=2000,
    min_chunk_size=100,
    include_comments=True,
    confidence_threshold=0.8
)

chunker = CodeChunker(config=config)
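
To make these options more concrete, here is a minimal sketch of how size and confidence limits could be applied to a list of chunks. It is not Code Chunker's implementation: `Chunk` and `filter_chunks` are hypothetical names, and measuring chunk size in characters is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical stand-in for a parsed chunk (not the library's CodeChunk)."""
    name: str
    code: str
    confidence: float


def filter_chunks(chunks: List[Chunk], min_chunk_size: int = 100,
                  max_chunk_size: int = 2000,
                  confidence_threshold: float = 0.8) -> List[Chunk]:
    """Keep chunks whose length (in characters, by assumption) and
    confidence fall within the configured limits."""
    return [
        c for c in chunks
        if min_chunk_size <= len(c.code) <= max_chunk_size
        and c.confidence >= confidence_threshold
    ]


chunks = [
    Chunk("tiny", "x = 1", 0.95),                             # below min_chunk_size
    Chunk("ok", "def f():\n" + "    pass\n" * 20, 0.90),      # passes both checks
    Chunk("unsure", "def g():\n" + "    pass\n" * 20, 0.50),  # below the threshold
]
kept = filter_chunks(chunks)
print([c.name for c in kept])
```

Only the chunk that satisfies both the size window and the confidence threshold survives the filter.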

Incremental Parsing

Incremental parsing allows you to efficiently update parse results when code changes, without reparsing the entire file.

from code_chunker import CodeChunker, IncrementalParser

# Initialize the incremental parser
incremental_parser = IncrementalParser()

# First parse (full parse)
result1 = incremental_parser.full_parse("path/to/file.py")

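# After the file changes, reparse only the affected regions.
# Each change is a Tuple[int, int, str]; (start_line, end_line, new_text) is assumed here.
changes = [(1, 1, "def hello_world():")]
result2 = incremental_parser.parse_incremental("path/to/file.py", changes)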

# Compare the results
print(f"Full parse chunks: {len(result1.chunks)}")
print(f"Incremental parse chunks: {len(result2.chunks)}")

Enhanced Language Support

TypeScript/React Support

Code Chunker provides specialized support for React components, hooks, and contexts:

from code_chunker import CodeChunker, ChunkerConfig, get_config_for_use_case

# Get React-optimized configuration
config = ChunkerConfig(**get_config_for_use_case('typescript', 'react'))
chunker = CodeChunker(config=config)

# Parse React component
result = chunker.parse(react_code, language='typescript')

# Filter for React components
components = [chunk for chunk in result.chunks if chunk.type.value == 'component']
for component in components:
    print(f"Component: {component.name} (type: {component.metadata.get('component_type')})")
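
For intuition about what component, hook, and context detection involves, the sketch below classifies declarations by common React naming conventions. It is a rough heuristic under stated assumptions, not the library's detector: `DECLARATION` and `classify_declarations` are hypothetical, and only single-line `function` and `const` declarations are considered.

```python
import re
from typing import Dict, Optional

# Matches lines such as "export const Name = ..." or "function name(...)".
DECLARATION = re.compile(
    r"^\s*(?:export\s+)?(?:default\s+)?(?:function|const)\s+(\w+)(.*)$",
    re.MULTILINE,
)


def classify(name: str, rest: str) -> Optional[str]:
    """Guess the React role of a declaration from its name and initializer."""
    if re.match(r"use[A-Z]", name):
        return "hook"        # convention: hooks are named useSomething
    if "createContext" in rest:
        return "context"     # created via createContext(...)
    if name[0].isupper():
        return "component"   # convention: components are PascalCase
    return None


def classify_declarations(source: str) -> Dict[str, str]:
    """Map each recognized declaration name to 'hook', 'context' or 'component'."""
    found = {}
    for match in DECLARATION.finditer(source):
        kind = classify(match.group(1), match.group(2))
        if kind is not None:
            found[match.group(1)] = kind
    return found


react_code = """\
export const ThemeContext = createContext("light");
function useCounter() { return useState(0); }
export default function Button() { return <button />; }
const helper = () => 1;
"""
roles = classify_declarations(react_code)
print(roles)
```

Naming conventions alone will misclassify some code (for example, a PascalCase constant that is not a component), which is one reason a confidence score per chunk is useful.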

Solidity Smart Contract Support

Enhanced metadata extraction for smart contracts:

from code_chunker import CodeChunker, ChunkerConfig, get_config_for_use_case

# Get Solidity-optimized configuration
config = ChunkerConfig(**get_config_for_use_case('solidity', 'contract'))
chunker = CodeChunker(config=config)

# Parse Solidity contract
result = chunker.parse(contract_code, language='solidity')

# Find payable functions
payable_functions = [
    chunk for chunk in result.chunks 
    if chunk.type.value == 'function' and chunk.metadata.get('is_payable', False)
]
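
As an illustration of the kind of metadata involved, the following sketch pulls visibility and payability out of Solidity function headers with a regular expression. It is a simplified approximation, not the library's extractor: `FUNCTION_HEADER`, `extract_function_metadata`, and the `visibility` key are hypothetical, and only `is_payable` is taken from the example above.

```python
import re
from typing import Dict, List

# Captures the function name, its parameter list, and everything between the
# closing parenthesis and the body or semicolon (visibility, modifiers, returns).
FUNCTION_HEADER = re.compile(r"function\s+(\w+)\s*\(([^)]*)\)([^{;]*)")
VISIBILITIES = ("public", "external", "internal", "private")


def extract_function_metadata(solidity_source: str) -> List[Dict[str, object]]:
    """Return name, visibility and payability for each function header found.

    Limitations: comments and strings are not skipped, and a return type
    written as a standalone `payable` token would be misread as a modifier.
    """
    results = []
    for match in FUNCTION_HEADER.finditer(solidity_source):
        name, _params, qualifiers = match.groups()
        words = qualifiers.split()
        visibility = next((w for w in words if w in VISIBILITIES), None)
        results.append({
            "name": name,
            "visibility": visibility,
            "is_payable": "payable" in words,
        })
    return results


contract_code = """
contract Vault {
    function deposit() external payable {}
    function balanceOf(address owner) public view returns (uint256) { return 0; }
}
"""
metadata = extract_function_metadata(contract_code)
print(metadata)
```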

Go Concurrency Pattern Detection

Automatically detect concurrency patterns in Go code:

from code_chunker import CodeChunker, ChunkerConfig, get_config_for_use_case

# Get Go-optimized configuration
config = ChunkerConfig(**get_config_for_use_case('go', 'performance'))
chunker = CodeChunker(config=config)

# Parse Go code
result = chunker.parse(go_code, language='go')

# Find functions with goroutines
concurrent_funcs = [
    chunk for chunk in result.chunks 
    if chunk.type.value in ['function', 'method'] 
    and 'goroutines' in chunk.metadata.get('concurrency_patterns', {})
]
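
To show what concurrency pattern detection can look like, here is a small line-based sketch that flags goroutines, channels, and mutexes in Go source. It is illustrative only and not how Code Chunker is known to work: `PATTERNS` and `detect_concurrency_patterns` are hypothetical, and the regular expressions do not skip comments or string literals.

```python
import re
from typing import Dict, List

# Illustrative patterns; a real parser would need to ignore comments and strings.
PATTERNS = {
    "goroutines": re.compile(r"\bgo\s+(?:func\b|[A-Za-z_][\w.]*\s*\()"),
    "channels": re.compile(r"\bchan\b|<-"),
    "mutexes": re.compile(r"\bsync\.(?:RW)?Mutex\b"),
}


def detect_concurrency_patterns(go_source: str) -> Dict[str, List[int]]:
    """Map each pattern name to the 1-based line numbers where it matches."""
    found: Dict[str, List[int]] = {}
    for line_no, line in enumerate(go_source.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                found.setdefault(name, []).append(line_no)
    return found


go_code = """\
func worker(jobs chan int, mu *sync.Mutex) {
    go process(jobs)
    mu.Lock()
    defer mu.Unlock()
}
"""
patterns = detect_concurrency_patterns(go_code)
print(patterns)
```

The result is keyed by pattern name, which mirrors the `'goroutines' in ...` membership test used in the filter above.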

Supported Languages

Python (.py)
JavaScript (.js, .jsx)
TypeScript (.ts, .tsx)
Solidity (.sol)
Go (.go)
Rust (.rs)
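
The extension list above can be expressed as a lookup table. The sketch below is one possible way to map a path to a language name. `EXTENSION_TO_LANGUAGE` and `detect_language` are hypothetical helpers, and the names `'javascript'` and `'rust'` are assumed by analogy with the `'python'`, `'typescript'`, `'solidity'`, and `'go'` values used elsewhere in this README.

```python
from pathlib import Path
from typing import Optional

# Extension-to-language table built from the supported-languages list.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".sol": "solidity",
    ".go": "go",
    ".rs": "rust",
}


def detect_language(file_path: str) -> Optional[str]:
    """Return the language for a path, or None if the extension is unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(file_path).suffix.lower())


print(detect_language("src/App.tsx"))
print(detect_language("README.md"))
```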

Examples

The examples/ directory contains several examples demonstrating different features:

Basic Usage

Simple parsing examples:

python examples/basic_usage.py

Advanced Usage

Custom configuration and analysis:

python examples/advanced_usage.py

Incremental Parsing

Efficient parsing of code changes:

python examples/incremental_parsing.py

RAG Integration

Integration with RAG systems:

python examples/rag_integration.py

Edge Cases

Testing various edge cases across languages:

python examples/edge_cases.py

Performance Analysis

Analyze parsing performance:

python examples/performance_analysis.py

Code Quality Analysis

Analyze code quality metrics:

python examples/quality_analysis.py <file_path>

Visualization

Generate code structure visualization:

python examples/visualization.py <file_path>

API Reference

CodeChunker

The main class for parsing code.

chunker = CodeChunker(config=None)

Methods

parse(code: str, language: str) -> ParseResult: Parse a code string
parse_file(file_path: Union[str, Path]) -> ParseResult: Parse a file
parse_directory(directory: Union[str, Path], recursive: bool = True, extensions: Optional[List[str]] = None) -> List[ParseResult]: Parse a directory
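
The `recursive` and `extensions` parameters of `parse_directory` suggest a file-collection step along the following lines. This is a sketch of the idea using only `pathlib`, not the library's code: `iter_source_files` is a hypothetical helper, and treating `extensions=None` as "accept every file" is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Iterator, List, Optional, Union


def iter_source_files(directory: Union[str, Path], recursive: bool = True,
                      extensions: Optional[List[str]] = None) -> Iterator[Path]:
    """Yield files under `directory`, optionally filtered by suffix.

    With extensions=None every file is yielded (an assumption; a real
    implementation might default to its supported extensions instead).
    """
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.glob("*")
    wanted = None if extensions is None else {e.lower() for e in extensions}
    for path in sorted(candidates):
        if path.is_file() and (wanted is None or path.suffix.lower() in wanted):
            yield path


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    for rel in ("a.py", "pkg/b.py", "pkg/notes.txt"):
        (root / rel).write_text("")
    recursive_names = [p.name for p in iter_source_files(root, extensions=[".py"])]
    top_level_names = [
        p.name for p in iter_source_files(root, recursive=False, extensions=[".py"])
    ]

print(recursive_names, top_level_names)
```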

IncrementalParser

For efficient incremental parsing.

parser = IncrementalParser(chunker=None)

Methods

full_parse(file_path: str) -> ParseResult: Perform a full parse and cache the result
parse_incremental(file_path: str, changes: List[Tuple[int, int, str]]) -> ParseResult: Parse incrementally based on changes
invalidate_cache(file_path: Optional[str] = None) -> None: Invalidate cache for a file or all files
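
The README does not document what each `Tuple[int, int, str]` in `changes` means. One plausible reading is `(start_line, end_line, new_text)` with 1-based, inclusive line numbers. The sketch below applies edits under that assumption. `Change` and `apply_changes` are hypothetical and not part of the library.

```python
from typing import List, Tuple

# Assumed layout: (start_line, end_line, new_text), 1-based and inclusive.
Change = Tuple[int, int, str]


def apply_changes(source: str, changes: List[Change]) -> str:
    """Replace each line range with its new text (ranges must not overlap)."""
    lines = source.splitlines()
    # Apply from the bottom up so earlier line numbers stay valid.
    for start, end, new_text in sorted(changes, reverse=True):
        lines[start - 1:end] = new_text.splitlines()
    return "\n".join(lines)


original = "a = 1\nb = 2\nc = 3"
updated = apply_changes(original, [(2, 2, "b = 20\nb2 = 21"), (3, 3, "c = 30")])
print(updated)
```

Under this reading, the parser would only need to revisit chunks whose line span intersects one of the changed ranges.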

How Incremental Parsing Works

Initial Parse: The first parse of a file is a full parse, which is cached
Change Detection: When changes are made, only affected code regions are identified
Selective Reparsing: Only affected chunks are reparsed, preserving the rest
Result Merging: Updated chunks are merged with unchanged chunks
Smart Caching: Results are cached for future incremental updates
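
The five steps above can be illustrated with a toy parser for Python source. This sketch swaps in a different technique from the documented API: it detects changes by hashing each top-level block rather than by taking an explicit change list, and `split_top_level` and `ToyIncrementalParser` are hypothetical names, not the library's classes.

```python
import ast
import hashlib
from typing import Dict, List


def split_top_level(source: str) -> List[str]:
    """Cheaply split source into blocks that start at a top-level def/class.

    Decorators and comments above a definition stay with the previous block.
    """
    blocks: List[List[str]] = []
    for line in source.splitlines():
        if not blocks or line.startswith(("def ", "class ")):
            blocks.append([])
        blocks[-1].append(line)
    return ["\n".join(block) for block in blocks]


class ToyIncrementalParser:
    """Illustration only: caches parsed blocks by content hash."""

    def __init__(self) -> None:
        self._cache: Dict[str, str] = {}  # block hash -> parsed chunk name
        self.parse_calls = 0              # how many blocks were really parsed

    def _parse_block(self, block: str) -> str:
        self.parse_calls += 1
        body = ast.parse(block).body
        return getattr(body[0], "name", "<module code>") if body else "<empty>"

    def parse(self, source: str) -> List[str]:
        names = []
        for block in split_top_level(source):
            key = hashlib.sha256(block.encode("utf-8")).hexdigest()
            if key not in self._cache:                        # change detection
                self._cache[key] = self._parse_block(block)   # selective reparse
            names.append(self._cache[key])                    # merge with cached
        return names


v1 = "def a():\n    return 1\n\ndef b():\n    return 2\n"
v2 = "def a():\n    return 1\n\ndef b():\n    return 3\n"

parser = ToyIncrementalParser()
first = parser.parse(v1)                 # initial parse: both blocks are parsed
calls_after_first = parser.parse_calls
second = parser.parse(v2)                # only the edited block is reparsed
calls_after_second = parser.parse_calls
print(first, calls_after_first, second, calls_after_second)
```

The second call reuses the cached result for `a` and parses only `b`, which is the saving incremental parsing aims for on large files.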

ParseResult

The result of parsing code.

Attributes

language: str: The programming language
file_path: Optional[str]: Path to the source file
chunks: List[CodeChunk]: List of code chunks
imports: List[Import]: List of imports
exports: List[str]: List of exports
raw_code: str: The original code

CodeChunk

Represents a piece of code.

Attributes

type: ChunkType: The type of chunk (function, class, etc.)
name: Optional[str]: The name of the chunk
code: str: The actual code
start_line: int: Starting line number
end_line: int: Ending line number
language: str: Programming language
confidence: float: Confidence score (0-1)
metadata: Dict[str, Any]: Additional metadata
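
For readers who want to work with this shape without installing the package, the attribute list above can be mirrored in a small dataclass. The sketch also shows one way to render a chunk for an LLM prompt. These are stand-ins rather than the library's definitions: the `ChunkType` members listed are only the values that appear in this README's examples, and `to_prompt_block` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


class ChunkType(Enum):
    # Only values seen in the examples above; the real enum may have more.
    FUNCTION = "function"
    METHOD = "method"
    COMPONENT = "component"


@dataclass
class CodeChunk:
    """Stand-in mirroring the documented attributes."""
    type: ChunkType
    name: Optional[str]
    code: str
    start_line: int
    end_line: int
    language: str
    confidence: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_prompt_block(chunk: CodeChunk) -> str:
    """Render a chunk as a labelled snippet suitable for an LLM prompt."""
    header = (f"{chunk.language} {chunk.type.value} {chunk.name or '<anonymous>'} "
              f"(lines {chunk.start_line}-{chunk.end_line})")
    return f"# {header}\n{chunk.code}"


chunk = CodeChunk(
    ChunkType.FUNCTION,
    "hello_world",
    'def hello_world():\n    print("Hello, World!")',
    start_line=2,
    end_line=3,
    language="python",
    confidence=1.0,
)
block = to_prompt_block(chunk)
print(block)
```

Keeping the name, type, and line span in a header line gives a retrieval system or model enough context to cite where a snippet came from.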

Dependencies

For basic usage: No external dependencies
For performance analysis: psutil
For visualization: Modern web browser to view generated HTML

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

Clone the repository
Install development dependencies:
```
pip install -e ".[dev]"
```
Run tests:
```
pytest
```
Format code:
```
black code_chunker/
```

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you find this project helpful, consider supporting its development:

⭐ Star this repository
🐛 Report bugs and suggest features
🤝 Submit pull requests
💰 EVM (ETH, ARB, BNB, OP, etc.): 0x8f74959530dba14394b27faac92955aa96927e8b

Acknowledgments

Thanks to all contributors and the open-source community for their support.

Release history

- 1.3.2 (May 10, 2025), this version
- 1.3.1 (May 10, 2025)
- 1.1.2 (May 10, 2025)
- 1.1.1 (May 10, 2025)
- 1.1.0 (May 10, 2025)
- 0.2.1 (May 10, 2025)
- 0.1.2 (May 10, 2025)
- 0.1.1 (May 10, 2025)

Download files

Source Distribution

code_chunker-1.3.1.tar.gz (33.9 kB)

Uploaded May 10, 2025 Source

Built Distribution

code_chunker-1.3.1-py3-none-any.whl (40.8 kB)

Uploaded May 10, 2025 Python 3

File details

Details for the file code_chunker-1.3.1.tar.gz.

File metadata

Download URL: code_chunker-1.3.1.tar.gz
Upload date: May 10, 2025
Size: 33.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-1.3.1.tar.gz
Algorithm	Hash digest
SHA256	`a7156fe98be8e0d0e6a9c4e9d5bceaf3a39c1541a073617d519408b49f8b96f1`
MD5	`d0f6f8b3d7be04d90279414244165d1d`
BLAKE2b-256	`8ad72ddafd3bac07a3cddf98b0eb4d9c36217fe27fc61086ee8208cd22d1f0c1`

File details

Details for the file code_chunker-1.3.1-py3-none-any.whl.

File metadata

Download URL: code_chunker-1.3.1-py3-none-any.whl
Upload date: May 10, 2025
Size: 40.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-1.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b6e6c1162784242e533b08fb44dd345187c99e683274b283f4a5a2a235f80072`
MD5	`af9bde51e507e74ed26ba2bf90cb3844`
BLAKE2b-256	`15cb2474b6c1cc75742b4538fcce94b322f571f02fcec0c2d9de351c05f61711`

code-chunker 1.3.1
