Merger is a tool that scans a directory, filters files using customizable patterns, and merges readable content into a single output file.
Merger CLI
Merger is a command-line utility for developers that scans a directory, filters files using customizable ignore patterns, and merges all readable content into a single output file, suitable both for human reading and for use by AI models.
It supports multiple output formats (e.g., JSON, directory tree, plain text with file delimiters) and can be extended with custom file parsers for formats such as .pdf.
Summary
- Core Features
- Dependencies
- Installation with PyPI
- Build and Install Locally
- Usage
- Ignore Pattern Syntax
- Output Formats
- Custom Parsers
- CLI Options
- License
Core Features
- Recursive merge of all readable files under a root directory.
- Custom glob-like ignore patterns for filtering (supports *, **, anchoring, and file/dir qualifiers).
- Automatic file encoding detection.
- Modular parser system for custom formats.
- CLI support for installation, removal, and listing of custom parsers.
- Multiple export formats.
Dependencies
| Component | Version | Notes |
|---|---|---|
| Python | ≥ 3.8 | Required |
All dependencies are listed in requirements.txt.
Installation with PyPI
pip install merger-cli
Build and Install Locally
1. Clone the repository
git clone https://github.com/diogotoporcov/merger-cli.git
cd merger-cli
2. Create and activate a virtual environment
Linux / macOS
python -m venv .venv
source .venv/bin/activate
Windows (PowerShell)
python -m venv .venv
.venv\Scripts\Activate.ps1
3. Install dependencies
pip install -r requirements.txt
4. Install as CLI tool
pip install .
Usage
Basic merge
merger .
This writes a file named merger.txt in the current directory.
Save output to a specific directory
merger ./project ./out
This writes ./out/merger.txt (or ./out/merger.json, depending on the exporter).
Pick an output format
Use -e or --exporter to select the output format:
merger ./src --exporter JSON
merger ./src --exporter DIRECTORY_TREE
merger ./src --exporter PLAIN_TEXT
merger ./src --exporter TREE_PLAIN_TEXT
Custom ignore patterns
Provide one or more ignore patterns with --ignore (see Ignore Pattern Syntax):
merger ./project --ignore "*.log" "__pycache__/**" "*.tmp"
Custom ignore file
Provide a file containing ignore patterns (one per line) with --merger-ignore (see Ignore Pattern Syntax):
merger . --merger-ignore "C:\Users\USER\Desktop\ignore.txt"
Verbose output
merger ./src --log-level DEBUG
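When Merger is driven from a script, the invocations above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in this section; the helper name build_merger_command and its parameters are hypothetical and not part of Merger.

```python
from typing import List, Optional, Sequence


def build_merger_command(
    input_dir: str,
    output_dir: Optional[str] = None,
    exporter: Optional[str] = None,
    ignore: Sequence[str] = (),
    log_level: Optional[str] = None,
) -> List[str]:
    """Build an argument list for the merger CLI (hypothetical helper)."""
    command = ["merger", input_dir]
    if output_dir is not None:
        command.append(output_dir)
    if exporter is not None:
        command += ["--exporter", exporter]
    if ignore:
        # Each pattern is its own list item, so no shell quoting is needed.
        command += ["--ignore", *ignore]
    if log_level is not None:
        command += ["--log-level", log_level]
    return command


command = build_merger_command("./project", "./out", exporter="JSON", ignore=["*.log", "*.tmp"])
print(command)
```

The resulting list can be passed to subprocess.run(command, check=True) once merger is installed and on the PATH.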
Ignore Pattern Syntax
Ignore patterns are evaluated relative to the input directory (the directory you ask merger to scan). If a path is not located under that root, it will not match.
Segment matching
The pattern is split into segments and matched against the scanned path’s relative segments.
Supported segments:
- Literal segments (e.g. src, tests, README.md)
- * matches exactly one path segment
- ** matches zero or more path segments
- Embedded * inside a segment matches prefix*suffix (e.g. foo*.py, *cache*)
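The segment rules above can be illustrated with a small matcher. This is a sketch written for this README, not Merger's actual implementation; the function names segment_matches and segments_match are hypothetical, and the code assumes both the pattern and the path have already been split on /.

```python
import re
from typing import Sequence


def segment_matches(pattern: str, name: str) -> bool:
    """Match one path segment; each '*' matches any run of characters."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, name) is not None


def segments_match(pattern: Sequence[str], path: Sequence[str]) -> bool:
    """Match a list of pattern segments against a list of path segments."""
    if not pattern:
        return not path
    head, rest = pattern[0], pattern[1:]
    if head == "**":
        # '**' may consume zero or more whole segments.
        return any(segments_match(rest, path[i:]) for i in range(len(path) + 1))
    if not path:
        return False
    return segment_matches(head, path[0]) and segments_match(rest, path[1:])


print(segments_match(["src", "**", "main.py"], ["src", "app", "core", "main.py"]))
print(segments_match(["src", "*"], ["src", "app", "main.py"]))
```

The first call succeeds because ** absorbs the two intermediate directories; the second fails because a lone * stands for exactly one segment.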
Anchoring
- Leading / anchors the pattern to the scan root
  - Example: /src/*.py matches src/main.py but not project/src/main.py
- Leading ./ anchors the pattern to the start of the relative path (equivalent anchoring behavior)
  - Example: ./src/*.py matches src/main.py but not project/src/main.py
- Without anchoring, the pattern may match starting at any segment boundary within the relative path
  - Example: src/*.py matches both src/main.py and project/src/main.py
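The difference between anchored and unanchored patterns can be sketched as follows. This is an illustration only: the helper names are hypothetical, ** is left out for brevity, and Python's fnmatchcase stands in for Merger's own segment matching (it also accepts ? and [...], which Merger may not support).

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def split_anchor(pattern: str) -> Tuple[bool, List[str]]:
    """Return (anchored, segments) for a pattern without type qualifiers."""
    for prefix in ("./", "/"):
        if pattern.startswith(prefix):
            return True, pattern[len(prefix):].split("/")
    return False, pattern.split("/")


def matches(pattern: str, relative_path: str) -> bool:
    """Check a pattern against a path given relative to the scan root."""
    anchored, pattern_segments = split_anchor(pattern)
    path_segments = relative_path.split("/")
    if anchored:
        # Anchored patterns may only start at the scan root.
        candidates = [path_segments]
    else:
        # Unanchored patterns may start at any segment boundary.
        candidates = [path_segments[i:] for i in range(len(path_segments))]
    return any(
        len(candidate) == len(pattern_segments)
        and all(fnmatchcase(name, pat) for name, pat in zip(candidate, pattern_segments))
        for candidate in candidates
    )


print(matches("/src/*.py", "project/src/main.py"))
print(matches("src/*.py", "project/src/main.py"))
```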
Type qualifiers
- Trailing / requires the matched path to be a directory
  - Example: build/ matches the build directory entry
- Trailing : requires the matched path to be a file
  - Example: README.md: matches the README.md file
- Trailing ! is a special escape suffix that disables type qualification and preserves any trailing / or : as literal characters in the final path segment
  - Examples:
    - data:! matches any file or directory literally named data:
    - data:: matches any file literally named data:
    - data:/ matches any directory literally named data:
    - data!! matches any file or directory literally named data!
    - data!/ matches any directory literally named data!
    - data!: matches any file literally named data!
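One way to read the qualifier rules is as a single check on the last character of the pattern. The sketch below follows that reading and reproduces the examples listed above; the function name split_qualifier and the labels "dir", "file", and "any" are hypothetical, and Merger's real parser may differ in edge cases.

```python
from typing import Tuple


def split_qualifier(pattern: str) -> Tuple[str, str]:
    """Split a pattern into (body, kind), where kind is 'dir', 'file' or 'any'."""
    if pattern.endswith("!"):
        # Escape suffix: drop it and keep everything before it literally.
        return pattern[:-1], "any"
    if pattern.endswith("/"):
        return pattern[:-1], "dir"
    if pattern.endswith(":"):
        return pattern[:-1], "file"
    return pattern, "any"


for example in ("build/", "README.md:", "data:!", "data::", "data!/"):
    print(example, "->", split_qualifier(example))
```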
Examples
Ignore all files or directories whose names end in .log:
*.log
Ignore all dist directories:
dist/
Ignore a file named config.json at the scan root:
/config.json:
Ignore all .py files directly under any src directory (but not deeper):
src/*.py:
Ignore all files or directories whose names contain cache and that sit exactly one directory level below any directory named src:
src/*/*cache*
Ignore all __pycache__ directories inside the src directory at the scan root:
./src/**/__pycache__/
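Ignore all files literally named data: (trailing colon included):
data::
Ignore all directories literally named data: (trailing colon included):
data:/
Ignore all files or directories literally named data: (trailing colon included):
data:!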
Output Formats
Merger writes one output file to the output directory, named merger.<extension> based on the selected exporter.
| Exporter Name | File Extension | Description |
|---|---|---|
| PLAIN_TEXT | .txt | Plain-text merged file contents with <<FILE_START>> / <<FILE_END>> file delimiters. |
| DIRECTORY_TREE | .txt | Directory tree only. |
| TREE_PLAIN_TEXT | .txt | Directory tree + plain-text merged file contents (default). |
| JSON | .json | Structured JSON representing the directory tree and file contents. |
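For scripts that need to locate the generated file, the naming rule can be sketched as a lookup. The mapping mirrors the table above; the names EXPORTER_EXTENSIONS and output_file are hypothetical and are not part of Merger's API.

```python
from pathlib import Path

# Exporter name -> file extension, copied from the table above.
EXPORTER_EXTENSIONS = {
    "PLAIN_TEXT": ".txt",
    "DIRECTORY_TREE": ".txt",
    "TREE_PLAIN_TEXT": ".txt",
    "JSON": ".json",
}


def output_file(output_dir: str = ".", exporter: str = "TREE_PLAIN_TEXT") -> Path:
    """Return the path where the merged output is expected to be written."""
    return Path(output_dir) / f"merger{EXPORTER_EXTENSIONS[exporter]}"


print(output_file("./out", "JSON"))
```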
Custom Parsers
Merger uses parser strategies to support parsing of non-text file formats.
Parser Abstract Class
All parsers must inherit from Parser:
from merger.parsing.parser import Parser
Required structure:
- EXTENSIONS: Set[str]
- MAX_BYTES_FOR_VALIDATION: Optional[int]
- validate(cls, file_chunk_bytes, *, file_path=None, logger=None) -> bool
- parse(cls, file_bytes, *, file_path=None, logger=None) -> str
Installing a Custom Parser
merger --install-module path/to/parser.py
To uninstall a module:
merger --uninstall-module <module_id>
To remove all modules (quote the * so that the shell does not expand it into file names):
merger --uninstall-module "*"
To list installed modules:
merger --list-modules
Custom Parser Implementation Example (PDF)
import logging
from pathlib import Path
from typing import Union, Optional, Any, Set, Type

import fitz  # PyMuPDF

from merger.parsing.parser import Parser


class PdfParser(Parser):
    EXTENSIONS: Set[str] = {".pdf"}
    MAX_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None
    ) -> bool:
        """
        Validate that the given file represents a readable PDF document.

        Args:
            file_chunk_bytes: Binary contents of the file being validated, sufficient to perform validation.
            file_path: Path of the file being validated.
            logger: Optional logger instance for logging.

        Returns:
            bool: True if the file is a readable PDF, False otherwise.
        """
        try:
            # Open the file from disk and touch the first page to confirm it loads.
            with fitz.open(file_path) as doc:
                _ = doc[0]
            return True
        except Exception:
            return False

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> str:
        """
        Extract and concatenate text from all pages of a PDF file.

        Args:
            file_bytes: Binary contents of the file being parsed.
            file_path: Path of the file being parsed.
            logger: Optional logger instance for logging.

        Returns:
            str: Text of all pages, joined by single spaces.
        """
        texts = []
        with fitz.open(stream=file_bytes) as doc:
            for page in doc:
                text = page.get_text()
                if text:
                    # Drop blank-line breaks inside a page.
                    text = text.replace("\n\n", "")
                    texts.append(text)
        full_text = " ".join(texts)
        return full_text


parser_cls: Type[Parser] = PdfParser
The module must expose a parser_cls object referencing the parser class.
This implementation is available at examples/custom_parsers/pdf_parser.py.
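The example validates by opening the whole document. A cheaper pre-check is possible if only a prefix of the file is available, for instance under the assumption (not spelled out here) that MAX_BYTES_FOR_VALIDATION caps how many bytes are handed to validate. The sketch below looks for the %PDF- signature that PDF files carry at or near their start; the function name looks_like_pdf is hypothetical, and a signature match does not prove the document is readable.

```python
from typing import Union

PDF_SIGNATURE = b"%PDF-"


def looks_like_pdf(file_chunk_bytes: Union[bytes, bytearray]) -> bool:
    """Return True if the PDF signature appears within the first 1024 bytes."""
    # Some PDF readers tolerate a little leading junk, so search a short prefix
    # instead of requiring the signature at offset 0.
    return PDF_SIGNATURE in bytes(file_chunk_bytes[:1024])


print(looks_like_pdf(b"%PDF-1.7\n%..."))
print(looks_like_pdf(b"plain text file"))
```

A check like this could run first inside a custom validate method, with the full fitz.open call kept as the authoritative test.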
CLI Options
| Option | Description |
|---|---|
| input_dir | Root directory to scan for files. |
| output_path | Output directory where the tool writes merger.<ext> (default: current directory). |
| -e, --exporter | Output exporter strategy (e.g., TREE_PLAIN_TEXT, PLAIN_TEXT, DIRECTORY_TREE, JSON). |
| -i, --install-module | Install a custom parser module. |
| -u, --uninstall-module | Uninstall a parser module by ID (* removes all). |
| -l, --list-modules | List installed parser modules. |
| --ignore | One or more ignore patterns (see Ignore Pattern Syntax). |
| --merger-ignore | File containing ignore patterns (default: ./merger.ignore). |
| --version | Show installed version. |
| --log-level | Set logging verbosity. |
License
This project is licensed under the MIT License — see LICENSE for details.