Code Chunker

A pragmatic multi-language code parser optimized for LLM applications and RAG systems.

Features

  • Multi-language support: Python, JavaScript, TypeScript, Solidity, Go, Rust
  • Optimized for LLMs: Returns structured chunks (type, name, line range, confidence score) that can be passed directly to a language model
  • Lightweight: Minimal dependencies, fast parsing
  • Configurable: Adjust chunk sizes, confidence thresholds, and more
  • Easy to use: Simple API with both file and directory parsing

Installation

pip install code-chunker

Quick Start

from code_chunker import CodeChunker

# Initialize the chunker
chunker = CodeChunker()

# Parse a code string
code = """
def hello_world():
    print("Hello, World!")
"""

result = chunker.parse(code, language='python')

# Print the chunks
for chunk in result.chunks:
    print(f"{chunk.type}: {chunk.name}")

# Parse a file
result = chunker.parse_file('example.py')

# Parse a directory
results = chunker.parse_directory('src/')

Configuration

from code_chunker import CodeChunker, ChunkerConfig

config = ChunkerConfig(
    max_chunk_size=2000,
    min_chunk_size=100,
    include_comments=True,
    confidence_threshold=0.8
)

chunker = CodeChunker(config=config)
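
How the chunker applies these settings internally is not documented here. As a rough mental model, the size bounds and the confidence threshold can be read as a filter over parsed chunks. The sketch below is self-contained and does not import code-chunker: the `Chunk` dataclass and the `keep_chunk` helper are hypothetical stand-ins, and measuring size in characters is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a parsed chunk (not the library's class)."""
    name: str
    code: str
    confidence: float


def keep_chunk(chunk: Chunk, min_size: int = 100, max_size: int = 2000,
               threshold: float = 0.8) -> bool:
    """Return True if the chunk is within the size bounds and confident enough.

    Assumption: sizes are measured in characters of source text.
    """
    size = len(chunk.code)
    return min_size <= size <= max_size and chunk.confidence >= threshold


body = "def f():\n" + "    pass\n" * 20  # 189 characters of code

chunks = [
    Chunk("tiny", "x = 1", 0.95),   # below min_size
    Chunk("good", body, 0.9),       # passes both checks
    Chunk("unsure", body, 0.5),     # below the confidence threshold
]

kept = [c.name for c in chunks if keep_chunk(c)]
print(kept)
```

The same kind of filter can be applied to `result.chunks` after parsing if a stricter threshold is needed for one consumer than for another.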

Supported Languages

  • Python (.py)
  • JavaScript (.js, .jsx)
  • TypeScript (.ts, .tsx)
  • Solidity (.sol)
  • Go (.go)
  • Rust (.rs)
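
When calling `parse` on a string, the language has to be given explicitly. The list above can be turned into a small lookup table for that purpose. This is only a sketch: `language_for` is a hypothetical helper, and apart from `'python'` (used in the Quick Start) the language identifiers are assumed, not confirmed by this README.

```python
from pathlib import Path
from typing import Optional

# Extension-to-language table copied from the list above.
# Only 'python' is confirmed by the Quick Start; the other
# identifiers are assumptions about what the library expects.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".js": "javascript",
    ".jsx": "javascript",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".sol": "solidity",
    ".go": "go",
    ".rs": "rust",
}


def language_for(path: str) -> Optional[str]:
    """Return the language name for a file path, or None if unsupported."""
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())


for name in ["contract.sol", "src/app.TSX", "README.md"]:
    print(name, "->", language_for(name))
```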

Advanced Usage

For more advanced usage examples, check out the examples/ directory:

  • basic_usage.py: Simple parsing examples
  • advanced_usage.py: Custom configuration and analysis
  • rag_integration.py: Integration with RAG systems

API Reference

CodeChunker

The main class for parsing code.

chunker = CodeChunker(config=None)

Methods

  • parse(code: str, language: str) -> ParseResult: Parse a code string
  • parse_file(file_path: Union[str, Path]) -> ParseResult: Parse a file
  • parse_directory(directory: Union[str, Path], recursive: bool = True, extensions: Optional[List[str]] = None) -> List[ParseResult]: Parse a directory
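
The `recursive` and `extensions` parameters of `parse_directory` suggest a file-selection step before parsing. The sketch below shows one way such a step could work using only `pathlib`. `collect_source_files` is a hypothetical helper written for illustration; it is not the library's implementation, and the library's actual matching rules may differ.

```python
import tempfile
from pathlib import Path
from typing import List, Optional, Union


def collect_source_files(directory: Union[str, Path],
                         recursive: bool = True,
                         extensions: Optional[List[str]] = None) -> List[Path]:
    """List files under `directory`, optionally filtered by extension."""
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.glob("*")
    wanted = None if extensions is None else {e.lower() for e in extensions}
    return sorted(
        p for p in candidates
        if p.is_file() and (wanted is None or p.suffix.lower() in wanted)
    )


# Build a throwaway directory tree to demonstrate both modes.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "pkg").mkdir()
    (root / "main.py").write_text("print('hi')\n")
    (root / "pkg" / "util.py").write_text("x = 1\n")
    (root / "notes.txt").write_text("not code\n")

    top_level = [p.name for p in
                 collect_source_files(root, recursive=False, extensions=[".py"])]
    everything = [p.name for p in
                  collect_source_files(root, extensions=[".py"])]

print(top_level)
print(everything)
```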

ParseResult

The result of parsing code.

Attributes

  • language: str: The programming language
  • file_path: Optional[str]: Path to the source file
  • chunks: List[CodeChunk]: List of code chunks
  • imports: List[Import]: List of imports
  • exports: List[str]: List of exports
  • raw_code: str: The original code
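
One use of these attributes is building a short file outline to put in front of an LLM prompt. The sketch below runs without code-chunker installed: `ResultStub` and `OutlineChunk` are hypothetical stand-ins that copy a subset of the attribute names listed here, and `summarize` is an illustrative helper. The `Import` type is not described in this README, so imports are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class OutlineChunk:
    """Stand-in with only the chunk fields the outline needs."""
    type: str
    name: Optional[str]


@dataclass
class ResultStub:
    """Stand-in mirroring a subset of the ParseResult attributes."""
    language: str
    file_path: Optional[str]
    chunks: List[OutlineChunk] = field(default_factory=list)
    exports: List[str] = field(default_factory=list)


def summarize(result: ResultStub) -> str:
    """Render a compact, human-readable outline of one parse result."""
    lines = [f"File: {result.file_path or '<string>'} ({result.language})"]
    for chunk in result.chunks:
        lines.append(f"- {chunk.type}: {chunk.name or '<anonymous>'}")
    if result.exports:
        lines.append("Exports: " + ", ".join(result.exports))
    return "\n".join(lines)


result = ResultStub(
    language="python",
    file_path="example.py",
    chunks=[OutlineChunk("function", "hello_world"), OutlineChunk("class", None)],
    exports=["hello_world"],
)
summary = summarize(result)
print(summary)
```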

CodeChunk

Represents a piece of code.

Attributes

  • type: ChunkType: The type of chunk (function, class, etc.)
  • name: Optional[str]: The name of the chunk
  • code: str: The actual code
  • start_line: int: Starting line number
  • end_line: int: Ending line number
  • language: str: Programming language
  • confidence: float: Confidence score (0-1)
  • metadata: Dict[str, Any]: Additional metadata
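
For RAG pipelines, each chunk usually becomes one record with a stable ID, the text to embed, and filterable metadata. The sketch below shows one possible mapping. `ChunkStub` is a hypothetical stand-in that copies the attribute names listed above (with `type` simplified to a plain string rather than `ChunkType`), and the record layout produced by `to_record` is an assumption, not a format defined by code-chunker.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ChunkStub:
    """Stand-in mirroring the CodeChunk attributes (type simplified to str)."""
    type: str
    name: Optional[str]
    code: str
    start_line: int
    end_line: int
    language: str
    confidence: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def to_record(chunk: ChunkStub, file_path: str) -> Dict[str, Any]:
    """Convert one chunk into a dict suitable for a vector store."""
    return {
        # File path plus line range gives a stable, readable ID.
        "id": f"{file_path}:{chunk.start_line}-{chunk.end_line}",
        "text": chunk.code,
        "metadata": {
            "type": chunk.type,
            "name": chunk.name,
            "language": chunk.language,
            "confidence": chunk.confidence,
            **chunk.metadata,
        },
    }


chunk = ChunkStub(
    type="function",
    name="hello_world",
    code='def hello_world():\n    print("Hello, World!")\n',
    start_line=2,
    end_line=3,
    language="python",
    confidence=1.0,
    metadata={"has_docstring": False},  # hypothetical metadata key
)
record = to_record(chunk, "example.py")
print(record["id"])
```

Keeping the line range in the ID makes it easy to trace a retrieved chunk back to its position in the source file.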

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository
  2. Install development dependencies:
    pip install -e ".[dev]"
    
  3. Run tests:
    pytest
    
  4. Format code:
    black code_chunker/
    

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you find this project helpful, consider supporting its development:

  • ⭐ Star this repository
  • 🐛 Report bugs and suggest features
  • 🤝 Submit pull requests
  • 💰 Support on PayPal
  • ₿ BTC: bc1qxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Ξ ETH: 0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Acknowledgments

Thanks to all contributors and the open-source community for their support.

Download files

Source Distribution

code_chunker-0.1.1.tar.gz (16.7 kB)

Uploaded Source

Built Distribution

code_chunker-0.1.1-py3-none-any.whl (23.2 kB)

Uploaded Python 3

File details

Details for the file code_chunker-0.1.1.tar.gz.

File metadata

  • Download URL: code_chunker-0.1.1.tar.gz
  • Upload date:
  • Size: 16.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-0.1.1.tar.gz

  • SHA256: f6b00e9bcf736788ba4c442fd46d0c4ea8a65b1b15760a3996eef7a7107739cc
  • MD5: 15d6a8476a40fb784692163c878e8d3e
  • BLAKE2b-256: 63d78e632ccb98624037ea236d5adc7fb783741cd243b72963e7a04f2f7397fe

File details

Details for the file code_chunker-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: code_chunker-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 23.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-0.1.1-py3-none-any.whl

  • SHA256: edbc0f86c249bbddff4ff68a2180a9d65a1fb2625b40565cbb7c2c1cf7db9377
  • MD5: 6a4b5e8d5a92840967354f622ae628e8
  • BLAKE2b-256: ebe33acb5cdba21baab0a3c1fa38d4ab42055a882be0fe3c632c4997cbf96f3d
