Skip to main content

A pragmatic multi-language code parser optimized for LLM applications

Project description

Code Chunker

A pragmatic multi-language code parser optimized for LLM applications and RAG systems.

Features

  • Multi-language support: Python, JavaScript, TypeScript, Solidity, Go, Rust
  • Optimized for LLMs: Provides structured output ideal for language models
  • Lightweight: Minimal dependencies, fast parsing
  • Configurable: Adjust chunk sizes, confidence thresholds, and more
  • Easy to use: Simple API with both file and directory parsing

Installation

pip install code-chunker

Quick Start

from code_chunker import CodeChunker

# Initialize the chunker
chunker = CodeChunker()

# Parse a code string
code = """
def hello_world():
    print("Hello, World!")
"""

result = chunker.parse(code, language='python')

# Print the chunks
for chunk in result.chunks:
    print(f"{chunk.type}: {chunk.name}")

# Parse a file
result = chunker.parse_file('example.py')

# Parse a directory
results = chunker.parse_directory('src/')

Configuration

from code_chunker import CodeChunker, ChunkerConfig

config = ChunkerConfig(
    max_chunk_size=2000,
    min_chunk_size=100,
    include_comments=True,
    confidence_threshold=0.8
)

chunker = CodeChunker(config=config)

Supported Languages

  • Python (.py)
  • JavaScript (.js, .jsx)
  • TypeScript (.ts, .tsx)
  • Solidity (.sol)
  • Go (.go)
  • Rust (.rs)

Advanced Usage

For more advanced usage examples, check out the examples/ directory:

  • basic_usage.py: Simple parsing examples
  • advanced_usage.py: Custom configuration and analysis
  • rag_integration.py: Integration with RAG systems

API Reference

CodeChunker

The main class for parsing code.

chunker = CodeChunker(config=None)

Methods

  • parse(code: str, language: str) -> ParseResult: Parse a code string
  • parse_file(file_path: Union[str, Path]) -> ParseResult: Parse a file
  • parse_directory(directory: Union[str, Path], recursive: bool = True, extensions: Optional[List[str]] = None) -> List[ParseResult]: Parse a directory

ParseResult

The result of parsing code.

Attributes

  • language: str: The programming language
  • file_path: Optional[str]: Path to the source file
  • chunks: List[CodeChunk]: List of code chunks
  • imports: List[Import]: List of imports
  • exports: List[str]: List of exports
  • raw_code: str: The original code

CodeChunk

Represents a piece of code.

Attributes

  • type: ChunkType: The type of chunk (function, class, etc.)
  • name: Optional[str]: The name of the chunk
  • code: str: The actual code
  • start_line: int: Starting line number
  • end_line: int: Ending line number
  • language: str: Programming language
  • confidence: float: Confidence score (0-1)
  • metadata: Dict[str, Any]: Additional metadata

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Setup

  1. Clone the repository
  2. Install development dependencies:
    pip install -e ".[dev]"
    
  3. Run tests:
    pytest
    
  4. Format code:
    black code_chunker/
    

License

This project is licensed under the MIT License - see the LICENSE file for details.

Support

If you find this project helpful, consider supporting its development:

  • ⭐ Star this repository
  • 🐛 Report bugs and suggest features
  • 🤝 Submit pull requests
  • 💰 EVM(ETH, ARB, BNB, OP..etc): 0x8f74959530dba14394b27faac92955aa96927e8b
  • ₿ BTC: bc1qxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  • Ξ ETH: 0xXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Acknowledgments

Thanks to all contributors and the open-source community for their support.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

code_chunker-0.2.1.tar.gz (18.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

code_chunker-0.2.1-py3-none-any.whl (26.2 kB view details)

Uploaded Python 3

File details

Details for the file code_chunker-0.2.1.tar.gz.

File metadata

  • Download URL: code_chunker-0.2.1.tar.gz
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-0.2.1.tar.gz
Algorithm Hash digest
SHA256 5fbede26afa12dc6bc2efef8b15c0f2705c403a3ffc312080493364d1712e553
MD5 60683ca48ce49124b2e1c7ff3884ab9c
BLAKE2b-256 a7633f26bb8db916b618974e1546706d0cc245b9352f71850026ed091349b1c3

See more details on using hashes here.

File details

Details for the file code_chunker-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: code_chunker-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 26.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for code_chunker-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 15ef00377d81ef1c4b7a4e4cb2e9b9d2cc3840621a26b10e56416a226dbbbbff
MD5 2f40f0363352604d22503dd8924fb0e9
BLAKE2b-256 3e4d702d67fa77d82db94216f2b4d2f5efcb3b5c710f673a21545c304497d22c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page