
Interfaces and data models for the Merger CLI plugin system.

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


Merger Plugin API


Interfaces and data models for extending the merger-cli tool with custom parsers and exporters.

This package provides:

  • Abstract base classes for custom Parsers and Exporters.
  • Data models for the File Tree structure.
  • Type definitions for seamless integration with merger-cli.

Compatibility

The merger-plugin-api package supports a wide range of Python versions so that plugin developers can work in a variety of environments.

  • Supported Python Versions: 3.8, 3.9, 3.10, and 3.11.
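Because Python 3.8 is the oldest supported version, plugin code that should run across the whole range needs to avoid newer annotation syntax such as `set[str]` (3.9+) or `bytes | bytearray` (3.10+). Below is a minimal sketch of 3.8-safe annotations. The helper `normalize_extensions` is hypothetical and not part of the API, and lower-casing extensions is an assumption about what merger-cli expects.

```python
from typing import Iterable, Set


def normalize_extensions(extensions: Iterable[str]) -> Set[str]:
    """Lower-case each extension and make sure it starts with a dot."""
    normalized = set()
    for ext in extensions:
        ext = ext.strip().lower()
        if not ext:
            continue  # ignore empty entries
        normalized.add(ext if ext.startswith(".") else "." + ext)
    return normalized


# typing.Set works on 3.8; the builtin generic set[str] would not.
EXTENSIONS: Set[str] = normalize_extensions(["PDF", ".pdf"])
```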

Installation

pip install merger-plugin-api

Creating Plugins

Plugins are standalone Python modules that define a Parser or TreeExporter class.

Custom Parsers

To support non-text file formats (e.g., PDF, images), implement a custom parser. More complete examples are available in the examples/parsers/ directory.

Here is an example of a PDF parser using pymupdf:

from pathlib import Path
from typing import Union, Optional, Set, Type

import pymupdf
from merger_plugin_api import Parser

# Optional: List of Python packages required for this plugin
REQUIREMENTS = ["pymupdf"]

# File extensions this parser supports
EXTENSIONS: Set[str] = {".pdf"}


class PdfParser(Parser):
    MAX_BYTES_FOR_VALIDATION: Optional[int] = None

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        file_path: Path
    ) -> bool:
        """
        Validate that the given file bytes represent a readable PDF document.
        """
        try:
            with pymupdf.open(stream=file_chunk_bytes) as doc:
                _ = doc[0]
            return True

        except Exception:
            return False

    @classmethod
    def parse(
        cls,
        file_bytes: Union[bytes, bytearray],
        file_path: Path,
    ) -> str:
        """
        Extracts and concatenates text from all pages of a PDF file.
        """
        texts = []
        with pymupdf.open(stream=file_bytes) as doc:
            for page in doc:
                text = page.get_text()
                if text:
                    # Collapse blank lines; replacing them with "" would glue adjacent words together
                    text = text.replace("\n\n", "\n")
                    texts.append(text)

        full_text = " ".join(texts)
        return full_text


# Export the parser class
parser_cls: Type[Parser] = PdfParser
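The example above validates a file by opening it with pymupdf. A cheaper alternative is to look for the `%PDF-` header marker that PDF files begin with. The sketch below shows only that check. `PdfHeaderCheck` is a hypothetical stand-in that does not subclass Parser, so it runs without the package installed. The 1024-byte limit is illustrative, and it assumes that MAX_BYTES_FOR_VALIDATION caps the chunk passed to validate(), which this page does not confirm.

```python
from pathlib import Path
from typing import Optional, Union

PDF_MAGIC = b"%PDF-"


class PdfHeaderCheck:
    """Stand-in for a Parser subclass; only the validation logic is shown."""

    # Assumption: merger-cli passes at most this many bytes to validate().
    MAX_BYTES_FOR_VALIDATION: Optional[int] = 1024

    @classmethod
    def validate(
        cls,
        file_chunk_bytes: Union[bytes, bytearray],
        file_path: Path,
    ) -> bool:
        # Some writers emit a few bytes before the header, so search the
        # start of the chunk instead of requiring the marker at offset 0.
        return PDF_MAGIC in bytes(file_chunk_bytes[:1024])
```

A header check cannot detect a truncated or corrupted document, so it trades accuracy for speed compared with opening the file.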

Custom Exporters

To output the merged data in a custom format (e.g., XML, Markdown), implement a TreeExporter. More complete examples are available in the examples/exporters/ directory.

Here is an example of an XML exporter:

import xml.etree.ElementTree as ET
from typing import Type
from merger_plugin_api import FileEntry, DirectoryEntry, FileTreeEntry, TreeExporter, FileTree

# The name of the exporter (used in --exporter argument)
NAME = "XML"
# The extension of the output file
FILE_EXTENSION = ".xml"

class XmlExporter(TreeExporter):
    """
    A custom exporter that generates an XML representation of the file tree.
    """

    @classmethod
    def export(cls, tree: FileTree) -> bytes:
        """
        Export the file tree into an XML representation.
        """
        root = ET.Element("filetree")
        cls._to_xml(tree.root, root)

        cls._indent(root)

        return ET.tostring(root, encoding="utf-8", xml_declaration=True)

    @classmethod
    def _to_xml(cls, entry: FileTreeEntry, parent: ET.Element):
        if isinstance(entry, FileEntry):
            file_el = ET.SubElement(parent, "file", {
                "name": entry.name,
                "path": entry.path.as_posix()
            })
            content_el = ET.SubElement(file_el, "content")
            content_el.text = entry.content

        elif isinstance(entry, DirectoryEntry):
            dir_el = ET.SubElement(parent, "directory", {
                "name": entry.name,
                "path": entry.path.as_posix()
            })
            for child in sorted(entry.children.values(), key=lambda e: e.name.lower()):
                cls._to_xml(child, dir_el)

    @classmethod
    def _indent(cls, elem: ET.Element, level: int = 0):
        """
        Recursive function to indent XML elements while preserving text content.
        """
        i = "\n" + level * "  "
        if len(elem):
            if not elem.text or not elem.text.strip():
                elem.text = i + "  "

            if not elem.tail or not elem.tail.strip():
                elem.tail = i

            for child in elem:
                cls._indent(child, level + 1)

            if len(elem) > 0:
                last_child = elem[-1]
                if not last_child.tail or not last_child.tail.strip():
                    last_child.tail = i

        else:
            if level and (not elem.tail or not elem.tail.strip()):
                elem.tail = i

# Export the exporter class
exporter_cls: Type[TreeExporter] = XmlExporter
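The same traversal pattern carries over to other formats. As a rough sketch, here is the core of a Markdown exporter. It is written against duck-typed stand-in objects so that it runs without the package installed. The names `iter_files` and `export_markdown` are hypothetical. In a real plugin the logic would live in a TreeExporter.export classmethod and would use isinstance checks against FileEntry and DirectoryEntry, as in the XML example.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Any, List


def iter_files(entry: Any) -> List[Any]:
    """Collect file-like entries depth-first, sorted by name at each level."""
    children = getattr(entry, "children", None)
    if children is None:
        return [entry]  # no children attribute: treat the entry as a file
    files: List[Any] = []
    for child in sorted(children.values(), key=lambda e: e.name.lower()):
        files.extend(iter_files(child))
    return files


def export_markdown(root: Any) -> bytes:
    """Render every file as a heading followed by an indented code block."""
    parts: List[str] = []
    for f in iter_files(root):
        parts.append("## " + f.path.as_posix())
        parts.append("")
        # Four-space indentation marks a code block in Markdown.
        parts.extend("    " + line for line in f.content.splitlines())
        parts.append("")
    return "\n".join(parts).encode("utf-8")


# Hypothetical stand-in tree; real entries come from merger_plugin_api.
readme = SimpleNamespace(name="README.md", path=Path("README.md"), content="hello")
main = SimpleNamespace(name="main.py", path=Path("src/main.py"), content="print(1)\nprint(2)")
src = SimpleNamespace(name="src", path=Path("src"), children={"main.py": main})
root = SimpleNamespace(name=".", path=Path("."), children={"README.md": readme, "src": src})

markdown = export_markdown(root).decode("utf-8")
```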

Data Models

The FileTree object represents the hierarchical structure of the scanned directory.

FileTree

  • root: A DirectoryEntry representing the scan root.

DirectoryEntry

  • name: Name of the directory.
  • path: pathlib.Path relative to the scan root.
  • children: A dictionary mapping names to FileTreeEntry (FileEntry or DirectoryEntry).

FileEntry

  • name: Name of the file.
  • path: pathlib.Path relative to the scan root.
  • content: The parsed text content of the file.
  • extension: File extension (including the dot).
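To make the shape of these models concrete, the sketch below rebuilds them as plain dataclasses and walks a small tree. These are stand-ins that mirror only the fields listed above. The real classes come from merger_plugin_api, and their constructors and any extra attributes may differ. The helper `walk_files` is hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator, Union


@dataclass
class FileEntry:
    name: str
    path: Path
    content: str
    extension: str


@dataclass
class DirectoryEntry:
    name: str
    path: Path
    children: Dict[str, "FileTreeEntry"] = field(default_factory=dict)


FileTreeEntry = Union[FileEntry, DirectoryEntry]


@dataclass
class FileTree:
    root: DirectoryEntry


def walk_files(entry: FileTreeEntry) -> Iterator[FileEntry]:
    """Yield every FileEntry below ``entry`` in depth-first order."""
    if isinstance(entry, FileEntry):
        yield entry
    else:
        for child in entry.children.values():
            yield from walk_files(child)


tree = FileTree(
    root=DirectoryEntry(
        name="project",
        path=Path("."),
        children={
            "a.txt": FileEntry("a.txt", Path("a.txt"), "alpha", ".txt"),
            "docs": DirectoryEntry(
                "docs",
                Path("docs"),
                {"b.md": FileEntry("b.md", Path("docs/b.md"), "beta", ".md")},
            ),
        },
    )
)

# Example use: total size of all parsed content in the tree.
total_chars = sum(len(f.content) for f in walk_files(tree.root))
```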

Using Your Plugins

Once you have implemented your plugin, install it via the CLI:

merger --install-plugin path/to/your_plugin.py

Merger automatically detects whether the plugin is a parser or an exporter and installs any packages listed in REQUIREMENTS using its internal uv manager.
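Before installing, it can help to check that a plugin file exposes the expected module-level name. This page does not describe how Merger detects the plugin type. The sketch below assumes the markers are the `parser_cls` and `exporter_cls` assignments used in the examples above. The function `plugin_kind` is hypothetical. It inspects the source with `ast` rather than importing it, so it works even when the plugin's REQUIREMENTS are not installed.

```python
import ast
from typing import Optional


def plugin_kind(source: str) -> Optional[str]:
    """Return "parser", "exporter", or None based on top-level assignments."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        names.update(t.id for t in targets if isinstance(t, ast.Name))
    # Assumption: these names are what marks a module as a plugin.
    if "parser_cls" in names:
        return "parser"
    if "exporter_cls" in names:
        return "exporter"
    return None
```

For a file on disk, pass its text, for example `plugin_kind(Path("path/to/your_plugin.py").read_text())`.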

For more information, visit the main repository.
