Skip to main content

A pragmatic multi-language code parser optimized for LLM applications

Project description

Code Chunker

A pragmatic multi-language code parser optimized for LLM applications and RAG systems.

Features

  • Multi-language support: Python, JavaScript, TypeScript, Solidity, Go, Rust
  • Optimized for LLMs: Provides structured output ideal for language models
  • Lightweight: Minimal dependencies, fast parsing
  • Configurable: Adjust chunk sizes, confidence thresholds, and more
  • Easy to use: Simple API with both file and directory parsing

Installation

pip install code-chunker

Quick Start

from code_chunker import CodeChunker

# Initialize the chunker
chunker = CodeChunker()

# Parse a code string
code = """
def hello_world():
    print("Hello, World!")
"""

result = chunker.parse(code, language='python')

# Print the chunks
for chunk in result.chunks:
    print(f"{chunk.type}: {chunk.name}")

# Parse a file
result = chunker.parse_file('example.py')

# Parse a directory
results = chunker.parse_directory('src/')

Configuration

from code_chunker import CodeChunker, ChunkerConfig

config = ChunkerConfig(
    max_chunk_size=2000,
    min_chunk_size=100,
    include_comments=True,
    confidence_threshold=0.8
)

chunker = CodeChunker(config=config)

Supported Languages

  • Python (.py)
  • JavaScript (.js, .jsx)
  • TypeScript (.ts, .tsx)
  • Solidity (.sol)
  • Go (.go)
  • Rust (.rs)

Examples

The examples/ directory contains several examples demonstrating different features:

Basic Usage

Simple parsing examples:

python examples/basic_usage.py

Advanced Usage

Custom configuration and analysis:

python examples/advanced_usage.py

RAG Integration

Integration with RAG systems:

python examples/rag_integration.py

Edge Cases

Testing various edge cases across languages:

python examples/edge_cases.py

Performance Analysis

Analyze parsing performance:

python examples/performance_analysis.py

Code Quality Analysis

Analyze code quality metrics:

python examples/quality_analysis.py <file_path>

Visualization

Generate code structure visualization:

python examples/visualization.py <file_path>

API Reference

CodeChunker

The main class for parsing code.

chunker = CodeChunker(config=None)

Methods

  • parse(code: str, language: str) -> ParseResult: Parse a code string
  • parse_file(file_path: Union[str, Path]) -> ParseResult: Parse a file
  • parse_directory(directory: Union[str, Path], recursive: bool = True, extensions: Optional[List[str]] = None) -> List[ParseResult]: Parse a directory

ParseResult

The result of parsing code.

Attributes

  • language: str: The programming language
  • file_path: Optional[str]: Path to the source file
  • chunks: List[CodeChunk]: List of code chunks
  • imports: List[Import]: List of imports
  • exports: List[str]: List of exports
  • raw_code: str: The original code

CodeChunk

Represents a piece of code.

Attributes

  • type: ChunkType: The type of chunk (function, class, etc.)
  • name: Optional[str]: The name of the chunk
  • code: str: The actual code
  • start_line: int: Starting line number
  • end_line: int: Ending line number
  • language: str: Programming language
  • confidence: float: Confidence score (0-1)
  • metadata: Dict[str, Any]: Additional metadata

Dependencies

  • For basic usage: No external dependencies
  • For performance analysis: psutil
  • For visualization: Modern web browser to view generated HTML

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository
  2. Install development dependencies:
    pip install -e ".[dev]"
    
  3. Run tests:
    pytest
    
  4. Format code:
    black code_chunker/
    

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you find this project helpful, consider supporting its development:

  • ⭐ Star this repository
  • 🐛 Report bugs and suggest features
  • 🤝 Submit pull requests
  • 💰 EVM(ETH, ARB, BNB, OP..etc): 0x8f74959530dba14394b27faac92955aa96927e8b

Acknowledgments

Thanks to all contributors and the open-source community for their support.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

code_chunker-1.1.1.tar.gz (18.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

code_chunker-1.1.1-py3-none-any.whl (25.6 kB view details)

Uploaded Python 3

File details

Details for the file code_chunker-1.1.1.tar.gz.

File metadata

  • Download URL: code_chunker-1.1.1.tar.gz
  • Upload date:
  • Size: 18.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-1.1.1.tar.gz
Algorithm Hash digest
SHA256 c22b233fc233210a3f4b91ad7e61d918a250710b9d9d91fc3e69592aad4bc5e9
MD5 d1472fca7b36b23b0892b978349377f1
BLAKE2b-256 7cf54c428c0a1c90dee5ac5fdc780441b8c1d92f1734cb5b8c6a0e0ed99341da

See more details on using hashes here.

File details

Details for the file code_chunker-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: code_chunker-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 25.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b0a58ced6b8e322d6f23ab582367570926064589a712137b3f27aa28c4b4f86f
MD5 25dfdf81d515bb4e6b4595f2a9ec9ed4
BLAKE2b-256 e6d3cd310bd2e1f10f22897171bf4ee065dfa9092c5ab01736cdd8f57f44aca5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page