Interfaces and data models for the Merger CLI plugin system.
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
Merger Plugin API
Interfaces and data models for extending the merger-cli tool with custom parsers and exporters.
This package provides:
- Abstract base classes for custom Parsers and Exporters.
- Data models for the File Tree structure.
- Type definitions for seamless integration with merger-cli.
Compatibility
The merger-plugin-api package targets a broad range of Python versions so that plugin developers can work in a variety of environments.
- Supported Python Versions: 3.8, 3.9, 3.10, and 3.11.
Installation
pip install merger-plugin-api
Creating Plugins
Plugins are standalone Python modules that define a Parser or TreeExporter class.
Custom Parsers
To support non-text file formats (e.g., PDFs or images), implement a custom parser. More complete examples are available in the examples/parsers/ directory.
Here is an example of a PDF parser using pymupdf:
from pathlib import Path
from typing import Optional, Set, Type, Union

import pymupdf

from merger_plugin_api import Parser

# Optional: List of Python packages required for this plugin
REQUIREMENTS = ["pymupdf"]

# File extensions this parser supports
EXTENSIONS: Set[str] = {".pdf"}


class PdfParser(Parser):
    MAX_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        file_path: Path
    ) -> bool:
        """
        Validate that the given file bytes represent a readable PDF document.
        """
        try:
            with pymupdf.open(stream=file_chunk_bytes) as doc:
                _ = doc[0]
            return True
        except Exception:
            return False

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        file_path: Path,
    ) -> str:
        """
        Extracts and concatenates text from all pages of a PDF file.
        """
        texts = []
        with pymupdf.open(stream=file_bytes) as doc:
            for page in doc:
                text = page.get_text()
                if text:
                    text = text.replace("\n\n", "")
                    texts.append(text)
        full_text = " ".join(texts)
        return full_text


# Export the parser class
parser_cls: Type[Parser] = PdfParser
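For a parser that needs no third-party dependency, the same module layout can be sketched with the standard library alone. The example below is a minimal, hypothetical gzip parser. The `Parser` class defined here is only a stand-in so the snippet runs without the package installed; in a real plugin it would be imported from merger_plugin_api. The reading of `MAX_BYTES_FOR_VALIDATION` as a cap on the bytes passed to `validate` is an assumption inferred from its name, not something this README states.

```python
import gzip
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Optional, Set, Type, Union


# Stand-in for merger_plugin_api.Parser so this sketch is self-contained.
# In a real plugin, use: from merger_plugin_api import Parser
class Parser(ABC):
    MAX_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    @abstractmethod
    def validate(cls, file_chunk_bytes: Union[bytes, bytearray], file_path: Path) -> bool:
        ...

    @classmethod
    @abstractmethod
    def parse(cls, file_bytes: Union[bytes, bytearray], file_path: Path) -> str:
        ...


# File extensions this (hypothetical) parser supports
EXTENSIONS: Set[str] = {".gz"}

# Every gzip stream starts with these two magic bytes
GZIP_MAGIC = b"\x1f\x8b"


class GzipTextParser(Parser):
    # Assumption: limits how many bytes are handed to validate()
    MAX_BYTES_FOR_VALIDATION: Optional[int] = len(GZIP_MAGIC)

    @classmethod
    def validate(cls, file_chunk_bytes: Union[bytes, bytearray], file_path: Path) -> bool:
        """Accept the file only if it begins with the gzip magic number."""
        return bytes(file_chunk_bytes[:2]) == GZIP_MAGIC

    @classmethod
    def parse(cls, file_bytes: Union[bytes, bytearray], file_path: Path) -> str:
        """Decompress the payload and decode it as UTF-8 text."""
        raw = gzip.decompress(bytes(file_bytes))
        return raw.decode("utf-8", errors="replace")


# Export the parser class
parser_cls: Type[Parser] = GzipTextParser
```

Because both methods are classmethods, they can be exercised directly on the class, for example `GzipTextParser.parse(gzip.compress(b"hello"), Path("hello.txt.gz"))`.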
Custom Exporters
To output the merged data in a custom format (e.g., XML or Markdown), implement a TreeExporter. More complete examples are available in the examples/exporters/ directory.
Here is an example of an XML exporter:
import xml.etree.ElementTree as ET
from typing import Type

from merger_plugin_api import FileEntry, DirectoryEntry, FileTreeEntry, TreeExporter, FileTree

# The name of the exporter (used in --exporter argument)
NAME = "XML"

# The extension of the output file
FILE_EXTENSION = ".xml"


class XmlExporter(TreeExporter):
    """
    A custom exporter that generates an XML representation of the file tree.
    """

    @classmethod
    def export(cls, tree: FileTree) -> bytes:
        """
        Export the file tree into an XML representation.
        """
        root = ET.Element("filetree")
        cls._to_xml(tree.root, root)
        cls._indent(root)
        return ET.tostring(root, encoding="utf-8", xml_declaration=True)

    @classmethod
    def _to_xml(cls, entry: FileTreeEntry, parent: ET.Element):
        if isinstance(entry, FileEntry):
            file_el = ET.SubElement(parent, "file", {
                "name": entry.name,
                "path": entry.path.as_posix()
            })
            content_el = ET.SubElement(file_el, "content")
            content_el.text = entry.content
        elif isinstance(entry, DirectoryEntry):
            dir_el = ET.SubElement(parent, "directory", {
                "name": entry.name,
                "path": entry.path.as_posix()
            })
            for child in sorted(entry.children.values(), key=lambda e: e.name.lower()):
                cls._to_xml(child, dir_el)

    @classmethod
    def _indent(cls, elem: ET.Element, level: int = 0):
        """
        Recursive function to indent XML elements while preserving text content.
        """
        i = "\n" + level * " "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + " "
            if not elem.tail or not elem.tail.strip():
                elem.tail = i
            for child in elem:
                cls._indent(child, level + 1)
            if len(elem) > 0:
                last_child = elem[-1]
                if not last_child.tail or not last_child.tail.strip():
                    last_child.tail = i
        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i


# Export the exporter class
exporter_cls: Type[TreeExporter] = XmlExporter
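The same pattern carries over to other output formats. As a rough sketch, here is a hypothetical Markdown exporter that writes one heading per file followed by its content. The dataclasses and the `TreeExporter` base class below are simplified stand-ins modeled on the fields this README documents, so that the snippet runs on its own; a real plugin would import them from merger_plugin_api, and the actual classes may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Type, Union


# Stand-ins for the merger_plugin_api data models (assumed shapes).
@dataclass
class FileEntry:
    name: str
    path: Path
    content: str
    extension: str


@dataclass
class DirectoryEntry:
    name: str
    path: Path
    children: Dict[str, "FileTreeEntry"] = field(default_factory=dict)


FileTreeEntry = Union[FileEntry, DirectoryEntry]


@dataclass
class FileTree:
    root: DirectoryEntry


# Stand-in for merger_plugin_api.TreeExporter
class TreeExporter(ABC):
    @classmethod
    @abstractmethod
    def export(cls, tree: FileTree) -> bytes:
        ...


# The name of the exporter and the output extension (illustrative values)
NAME = "MARKDOWN"
FILE_EXTENSION = ".md"


class MarkdownExporter(TreeExporter):
    @classmethod
    def export(cls, tree: FileTree) -> bytes:
        """Render every file as a level-2 heading followed by its content."""
        lines: List[str] = []
        cls._walk(tree.root, lines)
        return "\n".join(lines).encode("utf-8")

    @classmethod
    def _walk(cls, entry: FileTreeEntry, lines: List[str]) -> None:
        if isinstance(entry, FileEntry):
            lines.extend([f"## {entry.path.as_posix()}", "", entry.content, ""])
        elif isinstance(entry, DirectoryEntry):
            # Same case-insensitive ordering as the XML example
            for child in sorted(entry.children.values(), key=lambda e: e.name.lower()):
                cls._walk(child, lines)


# Export the exporter class
exporter_cls: Type[TreeExporter] = MarkdownExporter
```

Returning bytes rather than str mirrors the `export` signature of the XML example, which leaves the encoding decision to the exporter.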
Data Models
The FileTree object represents the hierarchical structure of the scanned directory.
FileTree
- root: A DirectoryEntry representing the scan root.
DirectoryEntry
- name: Name of the directory.
- path: pathlib.Path relative to the scan root.
- children: A dictionary mapping names to FileTreeEntry (FileEntry or DirectoryEntry).
FileEntry
- name: Name of the file.
- path: pathlib.Path relative to the scan root.
- content: The parsed text content of the file.
- extension: File extension (including the dot).
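To make the shape of these models concrete, the sketch below builds a small tree by hand and walks it depth-first. The dataclasses are stand-ins that mirror only the fields listed above; the real classes in merger_plugin_api may carry extra fields or validation. The helper `iter_files` is a hypothetical name introduced for this illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator, Union


# Stand-ins mirroring the documented fields (assumed shapes).
@dataclass
class FileEntry:
    name: str
    path: Path
    content: str
    extension: str


@dataclass
class DirectoryEntry:
    name: str
    path: Path
    children: Dict[str, "FileTreeEntry"] = field(default_factory=dict)


FileTreeEntry = Union[FileEntry, DirectoryEntry]


@dataclass
class FileTree:
    root: DirectoryEntry


def iter_files(entry: FileTreeEntry) -> Iterator[FileEntry]:
    """Yield every FileEntry below `entry`, visiting children in name order."""
    if isinstance(entry, FileEntry):
        yield entry
    else:
        for name in sorted(entry.children):
            yield from iter_files(entry.children[name])


# Build a two-level tree: README.md at the root and src/main.py below it
src = DirectoryEntry(
    name="src",
    path=Path("src"),
    children={"main.py": FileEntry("main.py", Path("src/main.py"), "print('hi')", ".py")},
)
root = DirectoryEntry(
    name="project",
    path=Path("."),
    children={"src": src, "README.md": FileEntry("README.md", Path("README.md"), "# Demo", ".md")},
)
tree = FileTree(root=root)

paths = [f.path.as_posix() for f in iter_files(tree.root)]
print(paths)  # ['README.md', 'src/main.py']
```

An exporter can use the same recursion to visit entries, as the `_to_xml` method in the XML example does.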
Using Your Plugins
Once you have implemented your plugin, install it via the CLI:
merger --install-plugin path/to/your_plugin.py
Merger automatically detects whether the plugin is a parser or an exporter and installs any packages listed in REQUIREMENTS using its internal uv manager.
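Before installing, it can help to confirm that a plugin file exposes the module-level names used in the examples above. The following is a hypothetical pre-install check, not Merger's actual detection logic: it assumes that a parser module assigns `parser_cls` and an exporter module assigns `exporter_cls`, as the examples in this document do. It inspects the source with `ast`, so the plugin's dependencies do not need to be installed.

```python
import ast
from typing import Set


def top_level_names(source: str) -> Set[str]:
    """Collect names assigned at module level, with or without annotations."""
    names: Set[str] = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name):
                names.add(target.id)
    return names


def plugin_kind(source: str) -> str:
    """Classify plugin source as 'parser', 'exporter', or 'unknown'."""
    names = top_level_names(source)
    if "parser_cls" in names:
        return "parser"
    if "exporter_cls" in names:
        return "exporter"
    return "unknown"
```

For a file on disk, the check could be run as `plugin_kind(Path("path/to/your_plugin.py").read_text(encoding="utf-8"))` after importing `Path` from `pathlib`.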
For more information, visit the main repository.
File details
Details for the file merger_plugin_api-1.0.0.tar.gz.
File metadata
- Download URL: merger_plugin_api-1.0.0.tar.gz
- Upload date:
- Size: 44.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c540ea8539a91ef69ab10a8b3d932a7dffa3e430afa89e445cd758eb02c2a68b |
| MD5 | a7ea7786f32986694c18839e4bb99dd7 |
| BLAKE2b-256 | f7755146ea0fa36940bbbda44cc44715bc3ceeb658a95cdfe71d3f9926f8e158 |
Provenance
The following attestation bundles were made for merger_plugin_api-1.0.0.tar.gz:
Publisher: publish-plugin-api.yml on diogotoporcov/merger-cli
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: merger_plugin_api-1.0.0.tar.gz
- Subject digest: c540ea8539a91ef69ab10a8b3d932a7dffa3e430afa89e445cd758eb02c2a68b
- Sigstore transparency entry: 1214843240
- Sigstore integration time:
- Permalink: diogotoporcov/merger-cli@e7a8a4e3c0b47b3dfa998d5ef42e9320b3de448b
- Branch / Tag: refs/tags/api-v1.0.0
- Owner: https://github.com/diogotoporcov
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-plugin-api.yml@e7a8a4e3c0b47b3dfa998d5ef42e9320b3de448b
- Trigger Event: push
File details
Details for the file merger_plugin_api-1.0.0-py3-none-any.whl.
File metadata
- Download URL: merger_plugin_api-1.0.0-py3-none-any.whl
- Upload date:
- Size: 30.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb7a09d42d68b63cc503ac9a5d09f50f315eaa3aab26b70aa175ca9bc8b0d4fb |
| MD5 | 818fd97f7dbe137ab10749d5937c0131 |
| BLAKE2b-256 | bd4914521b79623e86dd5e7525b5587738c65d1e00bd09d1f2b14f0e50490f3e |
Provenance
The following attestation bundles were made for merger_plugin_api-1.0.0-py3-none-any.whl:
Publisher: publish-plugin-api.yml on diogotoporcov/merger-cli
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: merger_plugin_api-1.0.0-py3-none-any.whl
- Subject digest: bb7a09d42d68b63cc503ac9a5d09f50f315eaa3aab26b70aa175ca9bc8b0d4fb
- Sigstore transparency entry: 1214843348
- Sigstore integration time:
- Permalink: diogotoporcov/merger-cli@e7a8a4e3c0b47b3dfa998d5ef42e9320b3de448b
- Branch / Tag: refs/tags/api-v1.0.0
- Owner: https://github.com/diogotoporcov
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-plugin-api.yml@e7a8a4e3c0b47b3dfa998d5ef42e9320b3de448b
- Trigger Event: push