Code Chunker
A pragmatic multi-language code parser optimized for LLM applications and RAG systems.
Features
- Multi-language support: Python, JavaScript, TypeScript, Solidity, Go, Rust
- Optimized for LLMs: Provides structured output ideal for language models
- Lightweight: Minimal dependencies, fast parsing
- Configurable: Adjust chunk sizes, confidence thresholds, and more
- Easy to use: Simple API with both file and directory parsing
Installation
pip install code-chunker
Quick Start
from code_chunker import CodeChunker
# Initialize the chunker
chunker = CodeChunker()
# Parse a code string
code = """
def hello_world():
    print("Hello, World!")
"""
result = chunker.parse(code, language='python')
# Print the chunks
for chunk in result.chunks:
    print(f"{chunk.type}: {chunk.name}")
# Parse a file
result = chunker.parse_file('example.py')
# Parse a directory
results = chunker.parse_directory('src/')
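To show what function- and class-level chunks look like, here is a stdlib-only sketch for Python source that uses the built-in `ast` module. It is not code_chunker's implementation, which this README does not describe. `SimpleChunk` and `chunk_python` are hypothetical names, and the sketch only collects top-level definitions.

```python
import ast
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleChunk:
    # Hypothetical stand-in that mirrors a few CodeChunk attribute names
    type: str
    name: Optional[str]
    start_line: int
    end_line: int
    code: str


def chunk_python(source: str) -> List[SimpleChunk]:
    """Split Python source into top-level function/class chunks via the stdlib ast module."""
    tree = ast.parse(source)
    chunks: List[SimpleChunk] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        elif isinstance(node, ast.ClassDef):
            kind = "class"
        else:
            continue  # imports, assignments, etc. are skipped in this sketch
        # get_source_segment returns the exact source text of the node (Python 3.8+)
        segment = ast.get_source_segment(source, node) or ""
        chunks.append(SimpleChunk(kind, node.name, node.lineno, node.end_lineno, segment))
    return chunks


code = '''
def hello_world():
    print("Hello, World!")

class Greeter:
    def greet(self):
        return "hi"
'''

for chunk in chunk_python(code):
    print(f"{chunk.type}: {chunk.name} (lines {chunk.start_line}-{chunk.end_line})")
# function: hello_world (lines 2-3)
# class: Greeter (lines 5-7)
```

Nested methods such as `Greeter.greet` stay inside their class chunk here. A real chunker may choose to emit them separately.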
Configuration
from code_chunker import CodeChunker, ChunkerConfig
config = ChunkerConfig(
    max_chunk_size=2000,
    min_chunk_size=100,
    include_comments=True,
    confidence_threshold=0.8
)
chunker = CodeChunker(config=config)
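The README does not spell out how these settings are applied. The sketch below is one plausible reading, and every rule in it is an assumption: sizes count characters, chunks under `confidence_threshold` or `min_chunk_size` are dropped, and chunks over `max_chunk_size` are split on line boundaries. `SketchConfig`, `Piece`, `split_on_lines` and `apply_config` are hypothetical names, and `include_comments` is not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchConfig:
    # Hypothetical mirror of three ChunkerConfig fields; sizes ASSUMED to count characters
    max_chunk_size: int = 2000
    min_chunk_size: int = 100
    confidence_threshold: float = 0.8


@dataclass
class Piece:
    # Hypothetical minimal chunk: the code text and its confidence score
    code: str
    confidence: float


def split_on_lines(text: str, limit: int) -> List[str]:
    """Greedily pack whole lines into pieces of at most `limit` characters.

    A single line longer than `limit` is kept intact as its own piece."""
    pieces: List[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > limit:
            pieces.append(current)
            current = ""
        current += line
    if current:
        pieces.append(current)
    return pieces


def apply_config(chunks: List[Piece], cfg: SketchConfig) -> List[Piece]:
    """Drop low-confidence and undersized chunks, then split oversized ones."""
    result: List[Piece] = []
    for chunk in chunks:
        if chunk.confidence < cfg.confidence_threshold:
            continue  # assumption: chunks below the threshold are discarded
        if len(chunk.code) < cfg.min_chunk_size:
            continue  # assumption: whole chunks under the minimum are dropped
        for part in split_on_lines(chunk.code, cfg.max_chunk_size):
            result.append(Piece(part, chunk.confidence))
    return result
```

Splitting on whole lines keeps each piece syntactically readable, at the cost of pieces sometimes falling a little short of the limit.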
Supported Languages
- Python (.py)
- JavaScript (.js, .jsx)
- TypeScript (.ts, .tsx)
- Solidity (.sol)
- Go (.go)
- Rust (.rs)
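The extension list above amounts to a lookup table from file suffix to language. Here is a minimal sketch of that mapping. Only `'python'` appears in this README as a language string, so the other identifiers are assumptions, and `EXTENSION_TO_LANGUAGE` and `detect_language` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Optional, Union

# Suffix -> language table built from the "Supported Languages" list.
# Identifiers other than "python" are ASSUMED; the library's own strings may differ.
EXTENSION_TO_LANGUAGE: Dict[str, str] = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".sol": "solidity",
    ".go": "go",
    ".rs": "rust",
}


def detect_language(path: Union[str, Path]) -> Optional[str]:
    """Return the language for a file path based on its suffix, or None if unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


print(detect_language("src/App.tsx"))
print(detect_language("README.md"))
```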
Examples
The examples/ directory contains several examples demonstrating different features:
Basic Usage
Simple parsing examples:
python examples/basic_usage.py
Advanced Usage
Custom configuration and analysis:
python examples/advanced_usage.py
RAG Integration
Integration with RAG systems:
python examples/rag_integration.py
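The RAG example script is not reproduced here. As a rough sketch of the idea, the code below turns chunks into generic `{id, text, metadata}` records that can be adapted to a vector store's field names. `ChunkLike` is a hypothetical stand-in that reuses the CodeChunk attribute names from the API reference, and `to_rag_records` is likewise a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ChunkLike:
    # Hypothetical stand-in with the CodeChunk attribute names from the API reference
    type: str
    name: Optional[str]
    code: str
    start_line: int
    end_line: int
    language: str
    confidence: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_rag_records(file_path: str, chunks: List[ChunkLike]) -> List[Dict[str, Any]]:
    """Turn chunks into {id, text, metadata} records for embedding and retrieval."""
    records = []
    for chunk in chunks:
        records.append({
            # A deterministic id lets re-indexing overwrite entries instead of duplicating them
            "id": f"{file_path}:{chunk.start_line}-{chunk.end_line}",
            "text": chunk.code,
            "metadata": {
                "file_path": file_path,
                "type": chunk.type,
                "name": chunk.name,
                "language": chunk.language,
                "start_line": chunk.start_line,
                "end_line": chunk.end_line,
                "confidence": chunk.confidence,
                **chunk.metadata,  # any extra metadata is carried through unchanged
            },
        })
    return records
```

Keeping the line range in the metadata lets a retrieved chunk be traced back to its exact location in the source file.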
Edge Cases
Testing various edge cases across languages:
python examples/edge_cases.py
Performance Analysis
Analyze parsing performance:
python examples/performance_analysis.py
Code Quality Analysis
Analyze code quality metrics:
python examples/quality_analysis.py <file_path>
Visualization
Generate code structure visualization:
python examples/visualization.py <file_path>
API Reference
CodeChunker
The main class for parsing code.
chunker = CodeChunker(config=None)
Methods
- parse(code: str, language: str) -> ParseResult: Parse a code string
- parse_file(file_path: Union[str, Path]) -> ParseResult: Parse a file
- parse_directory(directory: Union[str, Path], recursive: bool = True, extensions: Optional[List[str]] = None) -> List[ParseResult]: Parse a directory
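The `recursive` and `extensions` parameters of `parse_directory` suggest a file-selection step. The sketch below shows one way that step could work. It assumes the default extension set is the "Supported Languages" list and that `recursive=False` visits only the top level. `select_files` and `DEFAULT_EXTENSIONS` are hypothetical and not part of the library's API.

```python
from pathlib import Path
from typing import List, Optional, Union

# ASSUMED default: the extensions listed under "Supported Languages"
DEFAULT_EXTENSIONS = [".py", ".js", ".jsx", ".ts", ".tsx", ".sol", ".go", ".rs"]


def select_files(directory: Union[str, Path], recursive: bool = True,
                 extensions: Optional[List[str]] = None) -> List[Path]:
    """Collect the files a parse_directory-style call might visit (hypothetical helper)."""
    root = Path(directory)
    wanted = {ext.lower() for ext in (extensions or DEFAULT_EXTENSIONS)}
    # rglob walks subdirectories; glob only looks at the top level
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)
```

Sorting the result makes the processing order deterministic, which keeps repeated runs over the same tree comparable.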
ParseResult
The result of parsing code.
Attributes
- language: str: The programming language
- file_path: Optional[str]: Path to the source file
- chunks: List[CodeChunk]: List of code chunks
- imports: List[Import]: List of imports
- exports: List[str]: List of exports
- raw_code: str: The original code
CodeChunk
Represents a piece of code.
Attributes
- type: ChunkType: The type of chunk (function, class, etc.)
- name: Optional[str]: The name of the chunk
- code: str: The actual code
- start_line: int: Starting line number
- end_line: int: Ending line number
- language: str: Programming language
- confidence: float: Confidence score (0-1)
- metadata: Dict[str, Any]: Additional metadata
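`start_line` and `end_line` tie a chunk back to `ParseResult.raw_code`. The reference does not say whether they are 1-based or inclusive. The sketch below assumes both, and `lines_for` is a hypothetical helper for slicing the original text.

```python
def lines_for(raw_code: str, start_line: int, end_line: int) -> str:
    """Return lines start_line..end_line of raw_code.

    ASSUMPTION: line numbers are 1-based and end_line is inclusive."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= start_line <= end_line")
    lines = raw_code.splitlines(keepends=True)
    return "".join(lines[start_line - 1:end_line])


raw = "import os\n\ndef main():\n    print(os.getcwd())\n"
print(lines_for(raw, 3, 4), end="")
```

Comparing this slice with `chunk.code` on a few real results is a quick way to confirm which convention the library actually uses.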
Dependencies
- For basic usage: No external dependencies
- For performance analysis: psutil
- For visualization: A modern web browser to view the generated HTML
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Development Setup
- Clone the repository
- Install development dependencies:
pip install -e ".[dev]"
- Run tests:
pytest
- Format code:
black code_chunker/
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
If you find this project helpful, consider supporting its development:
- ⭐ Star this repository
- 🐛 Report bugs and suggest features
- 🤝 Submit pull requests
- 💰 EVM (ETH, ARB, BNB, OP, etc.):
0x8f74959530dba14394b27faac92955aa96927e8b
Acknowledgments
Thanks to all contributors and the open-source community for their support.
File details
Details for the file code_chunker-1.1.0.tar.gz.
File metadata
- Download URL: code_chunker-1.1.0.tar.gz
- Upload date:
- Size: 18.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | debe0a0a39cccc4642d57f78280eceeb719f79843c5c099fb4f614be35f24c5d |
| MD5 | 294541fe6180b6884d00af0c2cab5e7a |
| BLAKE2b-256 | 280b91109707f98ae51045b7dc2a298ee60e3b092c13ae229cc5c337710fa4ad |
File details
Details for the file code_chunker-1.1.0-py3-none-any.whl.
File metadata
- Download URL: code_chunker-1.1.0-py3-none-any.whl
- Upload date:
- Size: 25.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b03ac1c564227a566e474dbae4c8a5da6fc28ce9c4d16471eb2e3f9c19e11c8 |
| MD5 | e13711fd7f885acde2b526be0139bec7 |
| BLAKE2b-256 | da6e73eb501d2483f357fe06b6010c467d5d9e4c4ffb6f8f3aa48235e74648ac |