Merger is a tool that scans a directory, filters files using customizable patterns, and merges readable content into a single output file.
Merger CLI
Merger is a command-line utility for developers that scans a directory, filters files using customizable ignore patterns, and merges all readable content into a single output file. It supports multiple output formats (JSON, directory tree, and plain text with file delimiters) and can be extended with custom file parsers for non-text formats such as .pdf.
Summary
- Core Features
- Dependencies
- Installation with PyPI
- Build and Install Locally
- Usage
- Output Formats
- Custom Parsers
- CLI Options
- License
Core Features
- Recursive merge of all readable files under a root directory.
- Glob-based ignore patterns using .gitignore-style syntax.
- Automatic binary validation and parsing.
- Modular parser system for custom formats.
- CLI support for installation, removal, and listing of custom parsers.
- Multiple export formats.
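The recursive scan and binary check listed above can be pictured with a short sketch. It is not Merger's actual implementation: the helper names (`looks_binary`, `iter_readable_files`), the sample size, and the NUL-byte heuristic are all assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Iterator

CHUNK_SIZE = 1024  # assumed sample size; Merger's real limit is not documented here


def looks_binary(chunk: bytes) -> bool:
    """Heuristic (an assumption): content containing a NUL byte is binary."""
    return b"\x00" in chunk


def iter_readable_files(root: Path) -> Iterator[Path]:
    """Yield files under root, recursively, whose first bytes look like text."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            if not looks_binary(handle.read(CHUNK_SIZE)):
                yield path
```

A real tool would likely combine a check like this with per-format parsers, so that binary formats with a registered parser are still merged.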
Dependencies
| Component | Version | Notes |
|---|---|---|
| Python | ≥ 3.8 | Required |
All dependencies are listed in requirements.txt.
Installation with PyPI
pip install merger-cli
Build and Install Locally
1. Clone the repository
git clone https://github.com/diogotoporcov/merger-cli.git
cd merger-cli
2. Create and activate a virtual environment
Linux / macOS
python -m venv .venv
source .venv/bin/activate
Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1
3. Install dependencies
pip install -r requirements.txt
4. Install as CLI tool
pip install .
Usage
Basic merge
merger .
This writes a file named merger.txt in the current directory.
Save output to a specific directory
merger ./project ./out
This writes ./out/merger.txt (or ./out/merger.json, depending on the exporter).
Pick an output format
Use -e or --exporter to select the output format:
merger ./src --exporter JSON
merger ./src --exporter DIRECTORY_TREE
merger ./src --exporter PLAIN_TEXT
merger ./src --exporter TREE_PLAIN_TEXT
Custom ignore patterns
merger ./project --ignore "*.log" "__pycache__" "*.tmp"
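To get a feel for what such patterns match, the sketch below applies glob patterns to every component of a relative path using Python's `fnmatch` module. This is a simplified approximation, not Merger's matcher: `is_ignored` is a hypothetical helper, and full .gitignore semantics (negation, anchoring, `**`) are deliberately left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Return True if any path component matches any glob pattern."""
    parts = PurePosixPath(rel_path).parts
    patterns = list(patterns)
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)


patterns = ["*.log", "__pycache__", "*.tmp"]
print(is_ignored("src/__pycache__/mod.pyc", patterns))  # directory name matches
print(is_ignored("src/main.py", patterns))              # nothing matches
```

Matching each component separately is what lets a bare name such as `__pycache__` exclude everything beneath that directory.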
Custom ignore file
merger . --merger-ignore "C:\Users\USER\Desktop\ignore.txt"
Verbose output
merger ./src --log-level DEBUG
Output Formats
Merger writes one output file to the output directory, named merger.<ext> based on the selected exporter.
| Exporter Name | File Extension | Description |
|---|---|---|
| PLAIN_TEXT | .txt | Plain-text merged file contents with <<FILE_START>> / <<FILE_END>> file delimiters. |
| DIRECTORY_TREE | .txt | Directory tree only. |
| TREE_PLAIN_TEXT | .txt | Directory tree + plain-text merged file contents (default). |
| JSON | .json | Structured JSON representing the directory tree and file contents. |
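Scripts that post-process the merged file need to know where it will land. The sketch below encodes the exporter-to-extension mapping from the table and the merger.<ext> naming rule; `expected_output_file` is a hypothetical helper written for this example, not part of Merger.

```python
from pathlib import Path

# Exporter names and extensions as listed in the table above.
EXPORTER_EXTENSIONS = {
    "PLAIN_TEXT": ".txt",
    "DIRECTORY_TREE": ".txt",
    "TREE_PLAIN_TEXT": ".txt",
    "JSON": ".json",
}


def expected_output_file(output_dir: str = ".", exporter: str = "TREE_PLAIN_TEXT") -> Path:
    """Return the path where merger.<ext> is expected for the given exporter."""
    return Path(output_dir) / f"merger{EXPORTER_EXTENSIONS[exporter]}"


print(expected_output_file("./out", "JSON"))
```

An unknown exporter name raises `KeyError`, which surfaces typos early.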
Custom Parsers
Merger uses parser strategies to support parsing of non-text file formats.
Parser Abstract Class
All parsers must inherit from Parser:
from merger.parsing.parser import Parser
Required structure:
- EXTENSIONS: Set[str]
- MAX_BYTES_FOR_VALIDATION: Optional[int]
- validate(cls, file_chunk_bytes, *, file_path=None, logger=None) -> bool
- parse(cls, file_bytes, *, file_path=None, logger=None) -> str
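As a minimal illustration of this structure, the sketch below defines a stand-in `Parser` base class and a small parser for UTF-16 text files. The stand-in only mirrors the four members listed above so that the example runs without Merger installed; in a real module you would import `Parser` from `merger.parsing.parser`. `Utf16TextParser`, the `.u16` extension, and the reading of `MAX_BYTES_FOR_VALIDATION` as "how many leading bytes `validate` receives" are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Optional, Set


class Parser(ABC):
    """Stand-in for merger.parsing.parser.Parser (assumed shape)."""

    EXTENSIONS: Set[str] = set()
    MAX_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    @abstractmethod
    def validate(cls, file_chunk_bytes, *, file_path=None, logger=None) -> bool: ...

    @classmethod
    @abstractmethod
    def parse(cls, file_bytes, *, file_path=None, logger=None) -> str: ...


class Utf16TextParser(Parser):
    """Hypothetical parser for UTF-16 text that starts with a byte-order mark."""

    EXTENSIONS: Set[str] = {".u16"}
    MAX_BYTES_FOR_VALIDATION: Optional[int] = 2  # the BOM is the first two bytes

    @classmethod
    def validate(cls, file_chunk_bytes, *, file_path=None, logger=None) -> bool:
        # Accept either little-endian or big-endian byte-order marks.
        return bytes(file_chunk_bytes[:2]) in (b"\xff\xfe", b"\xfe\xff")

    @classmethod
    def parse(cls, file_bytes, *, file_path=None, logger=None) -> str:
        # The "utf-16" codec consumes the BOM and picks the byte order from it.
        return bytes(file_bytes).decode("utf-16")
```

Both methods are class methods, so the parser can be used without creating an instance.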
Installing a Custom Parser
merger --install-module path/to/parser.py
To uninstall a module:
merger --uninstall-module <module_id>
To remove all modules (quote the asterisk so that the shell does not expand it):
merger --uninstall-module "*"
To list installed modules:
merger --list-modules
Custom Parser Implementation Example (PDF)
import logging
from pathlib import Path
from typing import Optional, Set, Type, Union

import fitz  # PyMuPDF

from merger.parsing.parser import Parser


class PdfParser(Parser):
    EXTENSIONS: Set[str] = {".pdf"}
    MAX_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None
    ) -> bool:
        """
        Validate that the given file represents a readable PDF document.

        Args:
            file_chunk_bytes: Binary contents of the file being validated, sufficient to perform validation.
            file_path: Path of the file being validated.
            logger: Optional logger instance for logging.

        Returns:
            bool: True if the file is a readable PDF, False otherwise.
        """
        try:
            # Opening the document and reading its first page fails for unreadable PDFs.
            with fitz.open(file_path) as doc:
                _ = doc[0]
                return True
        except Exception:
            return False

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> str:
        """
        Extract and concatenate text from all pages of a PDF file.

        Args:
            file_bytes: Binary contents of the file being parsed.
            file_path: Path of the file being parsed.
            logger: Optional logger instance for logging.

        Returns:
            str: Text of all pages, joined with single spaces.
        """
        texts = []
        with fitz.open(stream=file_bytes) as doc:
            for page in doc:
                text = page.get_text()
                if text:
                    # Drop blank lines inside each page's text.
                    text = text.replace("\n\n", "")
                    texts.append(text)
        full_text = " ".join(texts)
        return full_text


# Module-level hook that Merger looks up when the module is installed.
parser_cls: Type[Parser] = PdfParser
The module must expose a parser_cls object referencing the parser class.
This implementation is available at examples/custom_parsers/pdf_parser.py.
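To see why the module-level parser_cls name matters, here is one way a loader could discover it, sketched with the standard `importlib` machinery. This is an assumption about the general technique, not a description of what `merger --install-module` does internally; `load_parser_class` is a hypothetical helper.

```python
import importlib.util
from pathlib import Path


def load_parser_class(module_path: str):
    """Import a Python file by path and return its module-level parser_cls."""
    path = Path(module_path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the module's top-level code
    if not hasattr(module, "parser_cls"):
        raise AttributeError(f"{path} does not define parser_cls")
    return module.parser_cls
```

Because the lookup relies on a fixed attribute name, a module that defines a parser class but forgets the parser_cls assignment cannot be discovered this way.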
CLI Options
| Option | Description |
|---|---|
| input_dir | Root directory to scan for files. |
| output_path | Output directory where the tool writes merger.<ext> (default: current directory). |
| -e, --exporter | Output exporter strategy (e.g., TREE_PLAIN_TEXT, PLAIN_TEXT, DIRECTORY_TREE, JSON). |
| -i, --install-module | Install a custom parser module. |
| -u, --uninstall-module | Uninstall a parser module by ID (* removes all). |
| -l, --list-modules | List installed parser modules. |
| --ignore | Glob-style ignore patterns. |
| --merger-ignore | File containing glob-style patterns to ignore (default: ./merger.ignore). |
| --version | Show installed version. |
| --log-level | Set logging verbosity. |
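When Merger is driven from a script, the options above can be assembled into an argument list and passed to `subprocess`. The sketch below uses only flags from the table; the helper names `build_merger_command` and `run_merger` are hypothetical, and `run_merger` assumes the merger executable is on PATH.

```python
import subprocess
from typing import List, Optional, Sequence


def build_merger_command(
    input_dir: str,
    output_path: Optional[str] = None,
    exporter: Optional[str] = None,
    ignore: Sequence[str] = (),
    log_level: Optional[str] = None,
) -> List[str]:
    """Build an argv list for the merger CLI from the documented options."""
    cmd = ["merger", input_dir]
    if output_path is not None:
        cmd.append(output_path)
    if exporter is not None:
        cmd += ["--exporter", exporter]
    if ignore:
        cmd += ["--ignore", *ignore]
    if log_level is not None:
        cmd += ["--log-level", log_level]
    return cmd


def run_merger(**options) -> None:
    """Run merger with the given options; raises if the command fails."""
    subprocess.run(build_merger_command(**options), check=True)
```

Passing a list rather than a shell string means patterns such as `*.log` reach Merger unchanged instead of being expanded by the shell.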
License
This project is licensed under the MIT License — see LICENSE for details.