Merger is a tool that scans a directory, filters files using customizable patterns, and merges readable content into a single output file.
Merger CLI
Merger is a command-line utility for developers that scans a directory, filters files using customizable ignore patterns, and merges all readable content into a single output file, suitable both for human reading and for use by AI models.
It supports multiple output formats (e.g., JSON, directory tree, plain text with file delimiters), and can be extended with custom file parsers (e.g., .pdf) and custom exporters (e.g., .xml, .md).
TLDR
- Install Python 3.8 or newer.
- Create and activate a virtual environment (if you want the CLI to be available globally, see Global Installation):
  - Windows: python -m venv .venv && .venv\Scripts\activate
  - Linux/macOS: python3 -m venv .venv && source .venv/bin/activate
- Install libmagic if not installed:
  - Windows: Automatically downloaded
  - Linux: sudo apt-get update && sudo apt-get install libmagic1
  - macOS: brew install libmagic
- Install the package: pip install merger-cli
- Verify the installation: merger --version
- Navigate to your project folder: cd path/to/your/project
- Create a merger ignore file, either manually or with merger -c [TEMPLATE] (see Custom Ignore Templates).
- Execute merger-cli: merger . creates a single combined file called merger.txt
For more options, refer to the Usage section below.
Summary
- Features
- Dependencies
- Installation
- Usage
- Ignore Pattern Syntax
- Output Formats
- Custom Parsers
- Custom Exporters
- CLI Options
- License
Features
- Recursive merge of all readable files under a root directory.
- Custom glob-like ignore patterns for filtering.
- Automatic file encoding detection.
- Modular parser & exporter system for custom formats and outputs with easy CLI management.
- Multiple export formats (built-in and custom).
- Modern CLI interface.
Dependencies
- Python (3.8+)
- libmagic
  - Windows: Automatically downloaded
  - Linux: sudo apt-get update && sudo apt-get install libmagic1
  - macOS: brew install libmagic
All Python package requirements are listed in requirements.txt.
Installation
Virtual Environment (Recommended)
- Create a virtual environment:
  python3 -m venv .venv
- Activate the virtual environment:
  - Windows: .venv\Scripts\activate
  - Linux/macOS: source .venv/bin/activate
- Install the package:
  pip install merger-cli
- Verify the installation:
  merger --version
Global Installation
If you want the CLI to be available globally, it is recommended to use pipx:
- Install pipx (if you don't have it):
  - Windows: python -m pip install --user pipx
  - Linux: sudo apt update && sudo apt install pipx
  - macOS: brew install pipx
- Install merger-cli:
  pipx install merger-cli
- Ensure path:
  pipx ensurepath
- Restart your terminal.
- Verify the installation:
  merger --version
Note: If you want to use custom modules that require external libraries (e.g., pymupdf), you need to inject them into the merger-cli environment:
pipx inject merger-cli pymupdf
Usage
Basic merge
merger .
Note: A merger.ignore file is required in the current directory for the tool to run. You can create one quickly using merger --create-ignore.
This writes a file named merger.txt in the current directory.
Save output to a specific directory
merger ./project ./out
This writes ./out/merger.txt (or ./out/merger.json, depending on the exporter).
Pick an output format
Use -e or --exporter to select the output format:
merger ./src --exporter JSON
merger ./src --exporter DIRECTORY_TREE
merger ./src --exporter PLAIN_TEXT
merger ./src --exporter TREE_PLAIN_TEXT
Custom ignore patterns
Provide one or more ignore patterns with --ignore (see Ignore Pattern Syntax):
merger ./project --ignore "*.log" "__pycache__/**" "*.tmp"
Custom ignore file
Provide a file containing ignore patterns (one per line) with --merger-ignore (see Ignore Pattern Syntax):
merger . --merger-ignore "C:\Users\USER\Desktop\ignore.txt"
Custom ignore templates
Quickly create a merger.ignore file using built-in templates:
merger -c PYTHON
Supported templates: DEFAULT, PYTHON, JAVASCRIPT, TYPESCRIPT, JAVA, GO, RUST, CPP, CSHARP, RUBY, PHP, KOTLIN.
Custom modules (Parsers & Exporters)
List all installed custom modules (parsers and exporters):
merger --list
Verbose output
merger ./src --log-level DEBUG
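For scripted use, the options shown in this section can be assembled into an argument list before calling the CLI. The sketch below is only an illustration: build_merger_command and its parameter names are hypothetical helpers, not part of merger-cli, and it uses only the positional arguments and flags documented here.

```python
from typing import List, Optional, Sequence


def build_merger_command(
    input_dir: str,
    output_dir: Optional[str] = None,
    exporter: Optional[str] = None,
    ignore: Sequence[str] = (),
    ignore_file: Optional[str] = None,
    log_level: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for the merger CLI (hypothetical helper)."""
    # Positional arguments come first: the scan root, then the optional output directory.
    command = ["merger", input_dir]
    if output_dir is not None:
        command.append(output_dir)
    if exporter is not None:
        command += ["--exporter", exporter]
    if ignore:
        # --ignore accepts one or more patterns after a single flag.
        command += ["--ignore", *ignore]
    if ignore_file is not None:
        command += ["--merger-ignore", ignore_file]
    if log_level is not None:
        command += ["--log-level", log_level]
    return command
```

The resulting list can be passed to subprocess.run(command, check=True). Passing a list rather than a shell string avoids quoting problems with patterns such as "*.log".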
Ignore Pattern Syntax
Ignore patterns are evaluated relative to the input directory (the scan root). merger-cli uses standard Git-style matching (via pathspec), with some additional custom qualifiers.
Recursive vs. Anchored
- Recursive: Patterns with no slashes (or starting with **/) match anywhere in the directory tree.
  - Example: *.log matches root/app.log and root/logs/app.log.
- Anchored: Patterns with at least one internal slash or a leading slash are anchored to the scan root.
  - Example: src/*.py matches root/src/main.py but not root/project/src/main.py.
  - Example: /config.json matches root/config.json but not root/subdir/config.json.
- Leading ./: Normalized to / and treated as an anchored pattern.
Pattern Components
- * matches any number of characters within a single path segment.
- ** matches zero or more directories.
  - Example: **/node_modules/ matches node_modules at any depth.
- ? matches exactly one character.
- [seq] matches any character in seq.
Type qualifiers
- Trailing / requires the matched path to be a directory.
  - Example: build/ matches the build directory entry.
- Trailing : requires the matched path to be a file.
  - Example: README.md: matches the README.md file.
- Trailing ! is a special escape suffix that disables type qualification and preserves any trailing / or : as literal characters in the final path segment.
  - Examples:
    - data:! matches any file or directory literally named data:
    - data:: matches any file literally named data:
    - data:/ matches any directory literally named data:
    - data!! matches any file or directory literally named data!
    - data!/ matches any directory literally named data!
    - data!: matches any file literally named data!
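The qualifier rules above can be read as "inspect only the last character of the pattern". The following sketch is one possible reading of those rules, not merger-cli's actual implementation; split_type_qualifier and the kind labels "any", "file", and "directory" are hypothetical names chosen for illustration.

```python
from typing import Tuple


def split_type_qualifier(pattern: str) -> Tuple[str, str]:
    """Split a pattern into (remaining pattern, kind).

    kind is "directory", "file", or "any". Only the final character is
    inspected, which is what lets "!" escape a literal trailing "/" or ":".
    """
    if pattern.endswith("!"):
        # Escape suffix: drop it and keep everything before it literally.
        return pattern[:-1], "any"
    if pattern.endswith("/"):
        return pattern[:-1], "directory"
    if pattern.endswith(":"):
        return pattern[:-1], "file"
    return pattern, "any"


for example in ["build/", "README.md:", "data:!", "data::", "data:/", "data!!"]:
    print(example, "->", split_type_qualifier(example))
```

Under this reading, each example from the list above produces the literal name and type it describes, for instance data:: yields the name data: restricted to files.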
Examples
Ignore all files or directories that end with .log:
*.log (Recursive)
Ignore the dist directory at the scan root:
dist/ (Anchored because it has a slash)
Ignore all node_modules directories anywhere:
**/node_modules/ (Recursive)
Ignore a file named config.json at the scan root:
/config.json:
Ignore all .py files directly under the root src directory:
src/*.py:
Ignore all __pycache__ directories inside the root src directory:
src/**/__pycache__/
Ignore all files named data: (with a literal trailing colon):
data::
Ignore all directories named data: (with a literal trailing colon):
data:/
Ignore all files or directories named data: (with a literal trailing colon):
data:!
Output Formats
Merger writes one output file to the output directory, named merger.<extension> based on the selected exporter.
| Exporter Name | File Extension | Description |
|---|---|---|
| TREE_PLAIN_TEXT | .txt | Directory tree + plain-text merged file contents (default). |
| PLAIN_TEXT | .txt | Plain-text merged file contents with <<FILE_START>> / <<FILE_END>> file delimiters. |
| TREE | .txt | Directory tree only. |
| JSON | .json | JSON mapping file paths to parsed file contents (path: content). |
| JSON_TREE | .json | Structured JSON representing the directory tree and file contents with hierarchy and metadata. |
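As an illustration of consuming the JSON exporter's output, the sketch below loads merger.json and ranks files by content length. It assumes the file is a flat JSON object mapping path strings to content strings, as the table describes, and that it is UTF-8 encoded; load_merged_json and largest_files are hypothetical helper names, not part of merger-cli.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_merged_json(output_dir: str) -> Dict[str, str]:
    """Load <output_dir>/merger.json, assumed to be a flat path -> content mapping."""
    merged_path = Path(output_dir) / "merger.json"
    with merged_path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, dict):
        raise ValueError(f"{merged_path} does not contain a JSON object")
    return data


def largest_files(merged: Dict[str, str], limit: int = 5) -> List[str]:
    """Return up to `limit` paths, ordered by descending content length."""
    return sorted(merged, key=lambda path: len(merged[path]), reverse=True)[:limit]
```

For example, largest_files(load_merged_json("./out")) would list the paths contributing the most text after running merger ./project ./out --exporter JSON.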
Custom Parsers
Merger uses parser strategies to support parsing of non-text file formats (e.g., PDF, images with OCR, etc.).
Parser Abstract Class
All parsers must inherit from Parser:
from merger.parsing.parser import Parser
Required structure:
- EXTENSIONS: Set[str] (e.g., {".pdf"})
- MAX_BYTES_FOR_VALIDATION: Optional[int]
- validate(cls, file_chunk_bytes, *, file_path=None, logger=None) -> bool
- parse(cls, file_bytes, *, file_path=None, logger=None) -> str
Managing Custom Parsers
To install a module:
merger --install path/to/parser.py
To uninstall a module (* removes all modules including parsers and exporters):
merger --uninstall <module_id>
To list installed modules:
merger --list
Custom Parser Example (PDF)
import logging
from pathlib import Path
from typing import Union, Optional, Set, Type

import pymupdf

from merger.parsing.parser import Parser


class PdfParser(Parser):
    EXTENSIONS: Set[str] = {".pdf"}
    MAX_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None
    ) -> bool:
        """
        Validate that the given file represents a readable PDF document.

        Args:
            file_chunk_bytes: Binary contents of the file being validated, sufficient to perform validation.
            file_path: Path of the file being validated.
            logger: Optional logger instance for logging.

        Returns:
            bool: True if the file is a readable PDF, False otherwise.
        """
        try:
            # Opening the document and reading its first page proves it is readable.
            with pymupdf.open(stream=file_chunk_bytes) as doc:
                _ = doc[0]
            return True
        except Exception:
            return False

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> str:
        """
        Extract and concatenate text from all pages of a PDF file.

        Args:
            file_bytes: Binary contents of the file being parsed.
            file_path: Path of the file being parsed.
            logger: Optional logger instance for logging.

        Returns:
            str: Text of all non-empty pages, joined with single spaces.
        """
        texts = []
        with pymupdf.open(stream=file_bytes) as doc:
            for page in doc:
                text = page.get_text()
                if text:
                    # Drop blank lines (doubled newlines) within the page text.
                    text = text.replace("\n\n", "")
                    texts.append(text)
        full_text = " ".join(texts)
        return full_text


# Merger looks up this module-level name to find the parser class.
parser_cls: Type[Parser] = PdfParser
Available at examples/custom_parsers/pdf_parser.py.
The module must expose a parser_cls object referencing the parser class.
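For comparison, here is a dependency-free sketch of a parser that follows the same structure, reading gzip-compressed text files. It assumes that MAX_BYTES_FOR_VALIDATION limits how many leading bytes are passed to validate (inferred from the attribute name, not confirmed here) and that the compressed content is UTF-8 text. The fallback Parser class is only a stand-in so the sketch can run without merger-cli installed.

```python
import gzip
import logging
from pathlib import Path
from typing import Optional, Set, Type, Union

try:
    from merger.parsing.parser import Parser
except ImportError:
    class Parser:  # Stand-in base class, used only when merger-cli is not installed.
        pass


class GzipTextParser(Parser):
    EXTENSIONS: Set[str] = {".gz"}
    # Assumption: only this many leading bytes are needed by validate().
    MAX_BYTES_FOR_VALIDATION: Optional[int] = 2

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> bool:
        # Every gzip stream starts with the two magic bytes 0x1f 0x8b.
        return bytes(file_chunk_bytes[:2]) == b"\x1f\x8b"

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        *,
        file_path: Optional[Path] = None,
        logger: Optional[logging.Logger] = None,
    ) -> str:
        # Decompress and decode; undecodable bytes become replacement characters.
        return gzip.decompress(bytes(file_bytes)).decode("utf-8", errors="replace")


parser_cls: Type[Parser] = GzipTextParser
```

Saved as a module, a parser like this would be installed with merger --install path/to/parser.py, as described under Managing Custom Parsers.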
Custom Exporters
You can also extend Merger with custom export strategies to output data in any format (e.g., XML, Markdown, CSV).
Exporter Abstract Class
All exporters must inherit from TreeExporter:
from merger.exporters.tree_exporter import TreeExporter
Required structure:
- NAME: str (the name used in --exporter)
- FILE_EXTENSION: str (the output file extension)
- export(cls, tree: FileTree) -> bytes
Managing Custom Exporters
To install an exporter:
merger --install path/to/exporter.py
To uninstall an exporter (* removes all modules including parsers and exporters):
merger --uninstall <exporter_id>
To list installed exporters:
merger --list
Custom Exporter Example (XML)
import xml.etree.ElementTree as ET

from merger.file_tree.entries import FileEntry, DirectoryEntry, FileTreeEntry
from merger.exporters.tree_exporter import TreeExporter
from merger.file_tree.tree import FileTree


class XmlExporter(TreeExporter):
    """
    A custom exporter that generates an XML representation of the file tree.
    """
    NAME = "XML"
    FILE_EXTENSION = ".xml"

    @classmethod
    def export(cls, tree: FileTree) -> bytes:
        root = ET.Element("filetree")
        cls._to_xml(tree.root, root)
        cls._indent(root)
        return ET.tostring(root, encoding="utf-8", xml_declaration=True)

    @classmethod
    def _to_xml(cls, entry: FileTreeEntry, parent: ET.Element):
        if isinstance(entry, FileEntry):
            file_el = ET.SubElement(parent, "file", {
                "name": entry.name,
                "path": entry.path.as_posix()
            })
            content_el = ET.SubElement(file_el, "content")
            content_el.text = entry.content
        elif isinstance(entry, DirectoryEntry):
            dir_el = ET.SubElement(parent, "directory", {
                "name": entry.name,
                "path": entry.path.as_posix()
            })
            # Emit children in case-insensitive name order for stable output.
            for child in sorted(entry.children.values(), key=lambda e: e.name.lower()):
                cls._to_xml(child, dir_el)

    @classmethod
    def _indent(cls, elem: ET.Element, level: int = 0):
        """
        Recursive function to indent XML elements while preserving text content.
        """
        i = "\n" + level * " "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + " "
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
            for child in elem:
                cls._indent(child, level + 1)
            if len(elem) > 0:
                last_child = elem[-1]
                if not last_child.tail or not last_child.tail.strip():
                    last_child.tail = i
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i


# Merger looks up this module-level name to find the exporter class.
exporter_cls = XmlExporter
Available at examples/custom_exporters/xml_exporter.py.
The module must expose an exporter_cls object referencing the exporter class.
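A Markdown exporter could follow the same pattern. The sketch below relies only on the attributes used in the XML example (tree.root, entry.name, entry.path, entry.content, entry.children) and assumes that only directory entries have a children mapping. The NAME value "MARKDOWN", the render_markdown helper, and the fallback TreeExporter stand-in are hypothetical and exist only so the sketch can run without merger-cli installed.

```python
from typing import Any, List

try:
    from merger.exporters.tree_exporter import TreeExporter
except ImportError:
    class TreeExporter:  # Stand-in base class, used only when merger-cli is not installed.
        pass

# Built at runtime so this example contains no literal code fence of its own.
FENCE = "`" * 3


def render_markdown(entry: Any, lines: List[str], depth: int = 1) -> None:
    """Append a heading per entry and a fenced block per file (hypothetical helper)."""
    heading = "#" * min(depth, 6)  # Markdown supports at most six heading levels.
    children = getattr(entry, "children", None)
    if children is not None:
        # Directory: heading with a trailing slash, then children in name order.
        lines.append(f"{heading} {entry.path.as_posix()}/")
        lines.append("")
        for child in sorted(children.values(), key=lambda e: e.name.lower()):
            render_markdown(child, lines, depth + 1)
    else:
        # File: heading followed by its content inside a fenced block.
        lines.append(f"{heading} {entry.path.as_posix()}")
        lines.append("")
        lines.append(FENCE)
        lines.append(entry.content or "")
        lines.append(FENCE)
        lines.append("")


class MarkdownExporter(TreeExporter):
    NAME = "MARKDOWN"
    FILE_EXTENSION = ".md"

    @classmethod
    def export(cls, tree: Any) -> bytes:
        lines: List[str] = []
        render_markdown(tree.root, lines)
        return "\n".join(lines).encode("utf-8")


exporter_cls = MarkdownExporter
```

File content that itself contains a triple-backtick fence would break the generated block; a fuller version would need a longer fence or escaping for that case.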
CLI Options
| Option | Description |
|---|---|
| input_dir | Root directory to scan for files. |
| output_path | Output directory where the tool writes merger.<ext> (default: current directory). |
| -e, --exporter | Output exporter strategy (e.g., TREE_PLAIN_TEXT, PLAIN_TEXT, JSON, XML). |
| -i, --install | Install a custom module (parser or exporter). |
| -u, --uninstall | Uninstall a module by ID (* removes all modules including parsers and exporters). |
| -l, --list | List all installed custom modules. |
| --ignore | One or more ignore patterns (see Ignore Pattern Syntax). |
| --merger-ignore | File containing ignore patterns (default: ./merger.ignore). |
| -c, --create-ignore | Create a merger.ignore file using a built-in template (e.g., DEFAULT, PYTHON). |
| --version | Show installed version. |
| --log-level | Set logging verbosity. |
License
This project is licensed under the GPLv3 License — see LICENSE for details.