Code Chunker

A pragmatic multi-language code parser optimized for LLM applications and RAG systems.

Features

  • Multi-language support: Python, JavaScript, TypeScript, Solidity, Go, Rust
  • Optimized for LLMs: Provides structured output ideal for language models
  • Lightweight: No external dependencies for basic usage, fast parsing
  • Configurable: Adjust chunk sizes, confidence thresholds, and more
  • Easy to use: Simple API with both file and directory parsing

Installation

pip install code-chunker

Quick Start

from code_chunker import CodeChunker

# Initialize the chunker
chunker = CodeChunker()

# Parse a code string
code = """
def hello_world():
    print("Hello, World!")
"""

result = chunker.parse(code, language='python')

# Print the chunks
for chunk in result.chunks:
    print(f"{chunk.type}: {chunk.name}")

# Parse a file
result = chunker.parse_file('example.py')

# Parse a directory (returns a list of ParseResult objects)
results = chunker.parse_directory('src/')

Configuration

from code_chunker import CodeChunker, ChunkerConfig

config = ChunkerConfig(
    max_chunk_size=2000,
    min_chunk_size=100,
    include_comments=True,
    confidence_threshold=0.8
)

chunker = CodeChunker(config=config)

Supported Languages

  • Python (.py)
  • JavaScript (.js, .jsx)
  • TypeScript (.ts, .tsx)
  • Solidity (.sol)
  • Go (.go)
  • Rust (.rs)
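
Because `parse` takes an explicit `language` argument, a caller who reads files by hand may want to derive that argument from the file extension. Below is a minimal sketch of such a lookup, built from the list above. The helper `language_for_path` is hypothetical, and the lowercase language identifiers other than `'python'` are assumptions for illustration, not documented values:

```python
from pathlib import Path
from typing import Optional

# Extension-to-language table taken from the list above.
# The identifier strings (other than 'python') are assumed, not documented.
EXTENSION_LANGUAGES = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".sol": "solidity",
    ".go": "go",
    ".rs": "rust",
}


def language_for_path(path: str) -> Optional[str]:
    """Return the language for a file path, or None if the extension is unsupported."""
    return EXTENSION_LANGUAGES.get(Path(path).suffix.lower())


print(language_for_path("contracts/Token.sol"))
print(language_for_path("README.md"))
```

Returning `None` for unknown extensions lets the caller skip unsupported files instead of raising.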

Examples

The examples/ directory contains scripts that demonstrate different features:

Basic Usage

Simple parsing examples:

python examples/basic_usage.py

Advanced Usage

Custom configuration and analysis:

python examples/advanced_usage.py

RAG Integration

Integration with RAG systems:

python examples/rag_integration.py

Edge Cases

Testing various edge cases across languages:

python examples/edge_cases.py

Performance Analysis

Analyze parsing performance:

python examples/performance_analysis.py

Code Quality Analysis

Analyze code quality metrics:

python examples/quality_analysis.py <file_path>

Visualization

Generate code structure visualization:

python examples/visualization.py <file_path>

API Reference

CodeChunker

The main class for parsing code.

chunker = CodeChunker(config=None)

Methods

  • parse(code: str, language: str) -> ParseResult: Parse a code string
  • parse_file(file_path: Union[str, Path]) -> ParseResult: Parse a file
  • parse_directory(directory: Union[str, Path], recursive: bool = True, extensions: Optional[List[str]] = None) -> List[ParseResult]: Parse a directory
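
The `recursive` and `extensions` parameters of `parse_directory` suggest a file-selection step before parsing. The sketch below shows one way such a selection could work using only `pathlib`. `collect_source_files` is a hypothetical helper written for illustration, and the library's actual traversal logic may differ:

```python
from pathlib import Path
from typing import List, Optional, Union


def collect_source_files(
    directory: Union[str, Path],
    recursive: bool = True,
    extensions: Optional[List[str]] = None,
) -> List[Path]:
    """Hypothetical helper: list files under `directory` matching `extensions`."""
    root = Path(directory)
    # rglob walks subdirectories; glob stays at the top level.
    candidates = root.rglob("*") if recursive else root.glob("*")
    # None means "accept every extension".
    wanted = {ext.lower() for ext in extensions} if extensions else None
    return sorted(
        p for p in candidates
        if p.is_file() and (wanted is None or p.suffix.lower() in wanted)
    )
```

Each returned path could then be passed to `chunker.parse_file`, which is roughly what a directory parse would need to do per file.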

ParseResult

The result of parsing code.

Attributes

  • language: str: The programming language
  • file_path: Optional[str]: Path to the source file
  • chunks: List[CodeChunk]: List of code chunks
  • imports: List[Import]: List of imports
  • exports: List[str]: List of exports
  • raw_code: str: The original code

CodeChunk

Represents a piece of code.

Attributes

  • type: ChunkType: The type of chunk (function, class, etc.)
  • name: Optional[str]: The name of the chunk
  • code: str: The actual code
  • start_line: int: Starting line number
  • end_line: int: Ending line number
  • language: str: Programming language
  • confidence: float: Confidence score (0-1)
  • metadata: Dict[str, Any]: Additional metadata
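
The `confidence`, `start_line`, and `name` attributes lend themselves to filtering and labelling chunks before handing them to an LLM or a RAG index. The sketch below uses a stand-in dataclass that mirrors the attributes listed above, so it runs without the library installed. `to_documents`, the `0.8` cutoff, and the output dictionary layout are assumptions chosen for illustration, and `type` is a plain string here, whereas the library documents it as `ChunkType`:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class CodeChunk:
    """Stand-in mirroring the documented attributes; not the library's class."""
    type: str  # the library documents this as ChunkType; a str is used here
    name: Optional[str]
    code: str
    start_line: int
    end_line: int
    language: str
    confidence: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_documents(chunks: List[CodeChunk], min_confidence: float = 0.8) -> List[Dict[str, Any]]:
    """Keep chunks at or above `min_confidence` and shape them for indexing."""
    documents = []
    for chunk in sorted(chunks, key=lambda c: c.start_line):
        if chunk.confidence < min_confidence:
            continue  # drop low-confidence chunks
        documents.append({
            "id": f"{chunk.language}:{chunk.name or 'anonymous'}:{chunk.start_line}",
            "text": chunk.code,
            "lines": (chunk.start_line, chunk.end_line),
        })
    return documents


chunks = [
    CodeChunk("function", "hello_world", "def hello_world(): ...", 2, 3, "python", 0.95),
    CodeChunk("function", None, "lambda x: x", 10, 10, "python", 0.4),
]
for doc in to_documents(chunks):
    print(doc["id"], doc["lines"])
```

With real results, the same function could be applied to `result.chunks`, since it only reads attributes that the list above documents.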

Dependencies

  • For basic usage: No external dependencies
  • For performance analysis: psutil
  • For visualization: Modern web browser to view generated HTML

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository
  2. Install development dependencies:
    pip install -e ".[dev]"
  3. Run tests:
    pytest
  4. Format code:
    black code_chunker/

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you find this project helpful, consider supporting its development:

  • ⭐ Star this repository
  • 🐛 Report bugs and suggest features
  • 🤝 Submit pull requests
  • 💰 EVM (ETH, ARB, BNB, OP, etc.): 0x8f74959530dba14394b27faac92955aa96927e8b

Acknowledgments

Thanks to all contributors and the open-source community for their support.
