Merger is a tool that scans a directory, filters files using customizable patterns, and merges readable content into a single output file.
Merger CLI
Merger is a command-line utility for developers that scans a directory, filters files using customizable ignore patterns, and merges all readable content into a single structured JSON output file. It supports custom file parsers, making it easily extendable for formats such as .pdf or any domain-specific format.
Summary
- Core Features
- Dependencies
- Installation with PyPI
- Build and Install Locally
- Usage
- Custom Parsers
- CLI Options
- License
Core Features
- Recursive merge of all readable files under a root directory.
- Glob-based ignore patterns using .gitignore-style syntax.
- Automatic binary validation and parsing.
- Modular parser system for custom formats.
- CLI support for installation, removal, and listing of custom parsers.
- Structured JSON merged output, including a file tree.
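To make the ignore-pattern feature more concrete, here is a minimal sketch of glob-based path filtering using only the standard library. It is not Merger's actual implementation: `is_ignored` and `filter_paths` are hypothetical helper names, and `fnmatch` only approximates .gitignore-style matching (it has no support for negation or anchored patterns).

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List


def is_ignored(relative_path: str, patterns: Iterable[str]) -> bool:
    """Return True if the whole path, or any single component, matches a pattern."""
    path = PurePosixPath(relative_path)
    for pattern in patterns:
        # Match against the full relative path, e.g. "logs/run.log" vs "*.log".
        if fnmatch(path.as_posix(), pattern):
            return True
        # Match against each component, e.g. "__pycache__" anywhere in the path.
        if any(fnmatch(part, pattern) for part in path.parts):
            return True
    return False


def filter_paths(paths: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only the paths that no pattern ignores."""
    pattern_list = list(patterns)
    return [p for p in paths if not is_ignored(p, pattern_list)]


if __name__ == "__main__":
    files = [
        "src/main.py",
        "src/__pycache__/main.cpython-312.pyc",
        "logs/run.log",
        "notes.tmp",
    ]
    print(filter_paths(files, ["*.log", "__pycache__", "*.tmp"]))
```

Checking every path component as well as the full path is what lets a bare directory name such as `__pycache__` exclude everything beneath it.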
Dependencies
| Component | Version | Notes |
|---|---|---|
| Python | ≥ 3.8 | Required |
All dependencies are listed in requirements.txt.
Installation with PyPI
pip install merger-cli
Build and Install Locally
1. Clone the repository
git clone https://github.com/diogotoporcov/merger-cli.git
cd merger-cli
2. Create and activate a virtual environment
Linux / macOS
python -m venv .venv
source .venv/bin/activate
Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1
3. Install dependencies
pip install -r requirements.txt
4. Install as CLI tool
pip install .
Usage
Basic merge
merger ./src
Custom ignore patterns
merger ./project ./output.json --ignore "*.log" "__pycache__" "*.tmp"
Custom ignore file
merger . ./output.json --merger-ignore "C:\Users\USER\Desktop\merger.ignore"
Verbose output
merger ./src ./merger.json --log-level DEBUG
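When Merger is driven from a script rather than typed by hand, the invocations above can be assembled programmatically. The sketch below assumes only the flags shown in this README; `build_merger_command` is a hypothetical helper, not part of the package.

```python
import shlex
import subprocess  # only needed if the run() call below is uncommented
from typing import List, Optional, Sequence


def build_merger_command(
    input_dir: str,
    output_path: Optional[str] = None,
    ignore: Sequence[str] = (),
    log_level: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the merger CLI from the options in this README."""
    cmd = ["merger", input_dir]
    if output_path is not None:
        cmd.append(output_path)
    if ignore:
        # --ignore accepts several patterns after a single flag.
        cmd.append("--ignore")
        cmd.extend(ignore)
    if log_level is not None:
        cmd.extend(["--log-level", log_level])
    return cmd


if __name__ == "__main__":
    cmd = build_merger_command(
        "./project", "./output.json", ignore=["*.log", "__pycache__", "*.tmp"]
    )
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(cmd))
    # Uncomment to actually run it (requires merger-cli to be installed):
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with patterns such as `*.log`.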
Custom Parsers
Merger uses parser strategies to support parsing of non-text file formats.
Parser Abstract Class
All parsers must inherit from Parser:
from merger.parsing.parser import Parser
Required structure:
- EXTENSIONS: Set[str]
- CHUNK_BYTES_FOR_VALIDATION: Optional[int]
- validate(cls, file_chunk_bytes, *, file_path=None, logger=None) -> bool
- parse(cls, file_bytes, *, file_path=None, logger=None) -> str
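As an illustration of that structure, here is a minimal sketch of a parser for gzip-compressed text that relies only on the standard library. `GzipTextParser` is a hypothetical example, not something shipped with Merger. It assumes that CHUNK_BYTES_FOR_VALIDATION is the number of leading bytes handed to validate, which this README does not state explicitly. The fallback `Parser` class is a stand-in so the sketch runs without merger-cli installed.

```python
import gzip
import logging
from pathlib import Path
from typing import Optional, Set, Type, Union

try:
    from merger.parsing.parser import Parser
except ImportError:
    # Stand-in base class for running this sketch without merger-cli installed.
    class Parser:  # type: ignore[no-redef]
        pass


class GzipTextParser(Parser):
    EXTENSIONS: Set[str] = {".gz"}
    # Assumption: only this many leading bytes are passed to validate().
    CHUNK_BYTES_FOR_VALIDATION: Optional[int] = 2

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> bool:
        # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
        return bytes(file_chunk_bytes[:2]) == b"\x1f\x8b"

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> str:
        # Decompress, then decode as UTF-8, replacing undecodable bytes.
        return gzip.decompress(bytes(file_bytes)).decode("utf-8", errors="replace")


# Module-level reference to the parser class.
parser_cls: Type[Parser] = GzipTextParser
```

Checking magic bytes keeps validation cheap, because it needs only the first two bytes instead of the whole file.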
Installing a Custom Parser
merger --install-module path/to/parser.py
To uninstall a module:
merger --uninstall-module <module_id>
To remove all modules (quote the asterisk so the shell does not expand it):
merger --uninstall-module "*"
To list installed modules:
merger --list-modules
Custom Parser Example (PDF)
import logging
from pathlib import Path
from typing import Union, Optional, Set, Type

import fitz  # PyMuPDF

from merger.parsing.parser import Parser


class PdfParser(Parser):
    EXTENSIONS: Set[str] = {".pdf"}
    CHUNK_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None
    ) -> bool:
        """
        Validate that the given file represents a readable PDF document.

        Args:
            file_chunk_bytes: Binary contents of the file being validated, sufficient to perform validation.
            file_path: Path of the file being validated.
            logger: Optional logger instance for logging.

        Returns:
            bool: True if the file is a readable PDF, False otherwise.
        """
        try:
            # The document is opened from file_path; loading the first page
            # confirms that it is readable.
            with fitz.open(file_path) as doc:
                _ = doc[0]
            return True
        except Exception:
            return False

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> str:
        """
        Extract and concatenate text from all pages of a PDF file.

        Args:
            file_bytes: Binary contents of the file being parsed.
            file_path: Path of the file being parsed.
            logger: Optional logger instance for logging.

        Returns:
            str: Text of all pages, joined with single spaces.
        """
        texts = []
        # Open the document from the in-memory bytes.
        with fitz.open(stream=file_bytes) as doc:
            for page in doc:
                text = page.get_text()
                if text:
                    # Remove double line breaks from the page text.
                    text = text.replace("\n\n", "")
                    texts.append(text)
        full_text = " ".join(texts)
        return full_text


parser_cls: Type[Parser] = PdfParser
The module must expose a parser_cls object referencing the parser class.
This implementation is available at examples/custom_parsers/pdf_parser.py.
Output Format
The merged result is a single JSON file containing:
- Directory tree
- File and directory names
- Relative paths to the root
- Extracted text content
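The exact key names of the output are not documented here, so the following sketch is deliberately schema-agnostic: it walks any decoded JSON value and yields every string leaf together with its key path. `iter_strings` is a hypothetical helper, and the sample document is made up for illustration. It is not Merger's real output shape.

```python
import json
from typing import Any, Iterator, Tuple

KeyPath = Tuple[str, ...]


def iter_strings(node: Any, path: KeyPath = ()) -> Iterator[Tuple[KeyPath, str]]:
    """Yield (key_path, value) for every string leaf in a decoded JSON value."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from iter_strings(value, path + (str(key),))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from iter_strings(value, path + (str(index),))
    elif isinstance(node, str):
        yield path, node
    # Numbers, booleans and null are skipped.


if __name__ == "__main__":
    # Hypothetical document, for illustration only.
    sample = json.loads(
        '{"tree": {"name": "src", "children": [{"name": "a.py", "content": "print(1)"}]}}'
    )
    for key_path, value in iter_strings(sample):
        print("/".join(key_path), "->", len(value), "chars")
```

To inspect a real output file, replace the sample with `json.load(open("merger.json", encoding="utf-8"))` and look at the printed key paths to learn the structure.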
CLI Options
| Option | Description |
|---|---|
| input_dir | Root directory to scan for files |
| output_path | Path to save merged JSON output (default: ./merger.json) |
| -i, --install-module | Install a custom parser module |
| -u, --uninstall-module | Uninstall a parser module by ID (* removes all) |
| -l, --list-modules | List installed parser modules |
| --ignore | Glob-style ignore patterns |
| -mi, --merger-ignore | Ignore file (default: ./merger.ignore) |
| --version | Show installed version |
| -ll, --log-level | Set logging verbosity |
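To show how these options fit together, here is a hypothetical `argparse` sketch that mirrors the table. It is not Merger's actual argument parser: the argument types, the `nargs` choices, and the `build_parser` name are assumptions, and `--version` is left out because the sketch has no version to report.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="merger")
    # Positionals are optional here so module commands can run without them.
    parser.add_argument("input_dir", nargs="?", help="Root directory to scan for files")
    parser.add_argument(
        "output_path",
        nargs="?",
        default="./merger.json",
        help="Path to save merged JSON output",
    )
    parser.add_argument("-i", "--install-module", help="Install a custom parser module")
    parser.add_argument("-u", "--uninstall-module", help="Uninstall a parser module by ID")
    parser.add_argument(
        "-l", "--list-modules", action="store_true", help="List installed parser modules"
    )
    parser.add_argument("--ignore", nargs="*", default=[], help="Glob-style ignore patterns")
    parser.add_argument(
        "-mi", "--merger-ignore", default="./merger.ignore", help="Ignore file"
    )
    parser.add_argument("-ll", "--log-level", help="Set logging verbosity")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["./project", "./output.json", "--ignore", "*.log", "*.tmp", "--log-level", "DEBUG"]
    )
    print(args)
```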
License
This project is licensed under the MIT License. See LICENSE for details.