Skip to main content

A pragmatic multi-language code parser optimized for LLM applications

Project description

Code Chunker

A pragmatic multi-language code parser optimized for LLM applications and RAG systems.

Features

  • Multi-language support: Python, JavaScript, TypeScript, Solidity, Go, Rust
  • Optimized for LLMs: Provides structured output ideal for language models
  • Lightweight: Minimal dependencies, fast parsing
  • Configurable: Adjust chunk sizes, confidence thresholds, and more
  • Easy to use: Simple API with both file and directory parsing

Installation

pip install code-chunker

Quick Start

from code_chunker import CodeChunker

# Initialize the chunker
chunker = CodeChunker()

# Parse a code string
code = """
def hello_world():
    print("Hello, World!")
"""

result = chunker.parse(code, language='python')

# Print the chunks
for chunk in result.chunks:
    print(f"{chunk.type}: {chunk.name}")

# Parse a file
result = chunker.parse_file('example.py')

# Parse a directory
results = chunker.parse_directory('src/')

Configuration

from code_chunker import CodeChunker, ChunkerConfig

config = ChunkerConfig(
    max_chunk_size=2000,
    min_chunk_size=100,
    include_comments=True,
    confidence_threshold=0.8
)

chunker = CodeChunker(config=config)

Supported Languages

  • Python (.py)
  • JavaScript (.js, .jsx)
  • TypeScript (.ts, .tsx)
  • Solidity (.sol)
  • Go (.go)
  • Rust (.rs)

Advanced Usage

For more advanced usage examples, check out the examples/ directory:

  • basic_usage.py: Simple parsing examples
  • advanced_usage.py: Custom configuration and analysis
  • rag_integration.py: Integration with RAG systems

API Reference

CodeChunker

The main class for parsing code.

chunker = CodeChunker(config=None)

Methods

  • parse(code: str, language: str) -> ParseResult: Parse a code string
  • parse_file(file_path: Union[str, Path]) -> ParseResult: Parse a file
  • parse_directory(directory: Union[str, Path], recursive: bool = True, extensions: Optional[List[str]] = None) -> List[ParseResult]: Parse a directory

ParseResult

The result of parsing code.

Attributes

  • language: str: The programming language
  • file_path: Optional[str]: Path to the source file
  • chunks: List[CodeChunk]: List of code chunks
  • imports: List[Import]: List of imports
  • exports: List[str]: List of exports
  • raw_code: str: The original code

CodeChunk

Represents a piece of code.

Attributes

  • type: ChunkType: The type of chunk (function, class, etc.)
  • name: Optional[str]: The name of the chunk
  • code: str: The actual code
  • start_line: int: Starting line number
  • end_line: int: Ending line number
  • language: str: Programming language
  • confidence: float: Confidence score (0-1)
  • metadata: Dict[str, Any]: Additional metadata

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository
  2. Install development dependencies:
    pip install -e ".[dev]"
    
  3. Run tests:
    pytest
    
  4. Format code:
    black code_chunker/
    

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you find this project helpful, consider supporting its development:

  • ⭐ Star this repository
  • 🐛 Report bugs and suggest features
  • 🤝 Submit pull requests
  • 💰 EVM(ETH, ARB, BNB, OP..etc): 0x8f74959530dba14394b27faac92955aa96927e8b
  • ₿ BTC: bc1qxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Ξ ETH: 0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Acknowledgments

Thanks to all contributors and the open-source community for their support.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

code_chunker-0.1.2.tar.gz (16.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

code_chunker-0.1.2-py3-none-any.whl (23.3 kB view details)

Uploaded Python 3

File details

Details for the file code_chunker-0.1.2.tar.gz.

File metadata

  • Download URL: code_chunker-0.1.2.tar.gz
  • Upload date:
  • Size: 16.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-0.1.2.tar.gz
Algorithm Hash digest
SHA256 0cce5d211f677a31cb220cf058e6cd4f76d187679f9d848236ba9133516b0bc1
MD5 8e344e726be7607c1f24a51fcfbf2630
BLAKE2b-256 d020aa71f1c205194495c76f577affcdea187d281f1c99731dd2bfbaa8d4d534

See more details on using hashes here.

File details

Details for the file code_chunker-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: code_chunker-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 23.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 cc9fac07a2e1b24441eab9bf45fd1f76eb4405bcda6878811f10032e936940f4
MD5 9ec81da2b93e4b0013616011fef49f4d
BLAKE2b-256 3235007e983f35d4c6b463d8db662f058006c2c0510dfdd9e95ba7c8ab24fd09

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page