A flexible plugin system for audio transcription intended to make it easy to add support for multiple backends.

cjm-transcription-plugin-system

Install

pip install cjm_transcription_plugin_system

Project Structure

nbs/
├── core.ipynb             # DTOs for audio transcription with FileBackedDTO support for zero-copy transfer
├── plugin_interface.ipynb # Domain-specific plugin interface for audio transcription
└── storage.ipynb          # Standardized SQLite storage for transcription results with content hashing

Total: 3 notebooks

Module Dependencies

graph LR
    core[core<br/>Core Data Structures]
    plugin_interface[plugin_interface<br/>Transcription Plugin Interface]
    storage[storage<br/>Transcription Storage]

    plugin_interface --> core

1 cross-module dependency detected

CLI Reference

No CLI commands found in this project.

Module Overview

Detailed documentation for each module in the project:

Core Data Structures (`core.ipynb`)

DTOs for audio transcription with FileBackedDTO support for zero-copy transfer

Import

from cjm_transcription_plugin_system.core import (
    AudioData,
    TranscriptionResult
)

Classes

@dataclass
class AudioData:
    """
    Container for raw audio data.
    Implements FileBackedDTO for zero-copy transfer between Host and Worker processes.
    """
    
    samples: np.ndarray  # Audio sample data as numpy array
    sample_rate: int  # Sample rate in Hz (e.g., 16000, 44100)
    
    def to_temp_file(self) -> str: # Absolute path to temporary WAV file
        "Save audio to a temp file for zero-copy transfer to Worker process."
        # Creates the temp file with delete=False so the Worker can read it,
        # and ensures the samples are in float32 format.
    
    def to_dict(self) -> Dict[str, Any]: # Serialized representation
        "Convert to dictionary for smaller payloads."
        # Samples are serialized as a list via self.samples.tolist().
    
    @classmethod
    def from_file(
            cls,
            filepath: str # Path to audio file
        ) -> "AudioData": # AudioData instance
        "Load audio from a file."

@dataclass
class TranscriptionResult:
    "Standardized output for all transcription plugins."
    
    text: str  # The transcribed text
    confidence: Optional[float]  # Overall confidence (0.0 to 1.0)
    segments: Optional[List[Dict[str, Any]]]  # Timestamped segments
    metadata: Dict[str, Any] = field(...)  # Additional metadata
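
The file-backed transfer described for `AudioData.to_temp_file` can be illustrated with a minimal, standard-library-only sketch. `AudioDataSketch` is a hypothetical stand-in, not the library class: it holds samples in a plain list instead of an `np.ndarray`, and it writes 16-bit PCM because the stdlib `wave` module does not write float32, so the on-disk format likely differs from what the real method produces.

```python
import os
import struct
import tempfile
import wave
from dataclasses import dataclass
from typing import List


@dataclass
class AudioDataSketch:
    """Hypothetical stand-in for AudioData; not the library class."""

    samples: List[float]  # Mono samples in [-1.0, 1.0] (the real class uses np.ndarray)
    sample_rate: int      # Sample rate in Hz

    def to_temp_file(self) -> str:
        """Write samples to a temp WAV file and return its absolute path."""
        # delete=False keeps the file on disk so another process can open it.
        tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
        tmp.close()
        # Assumption: 16-bit PCM, since the stdlib wave module cannot write float32.
        pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in self.samples]
        with wave.open(tmp.name, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(self.sample_rate)
            wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
        return os.path.abspath(tmp.name)


audio = AudioDataSketch(samples=[0.0, 0.5, -0.5, 1.0], sample_rate=16000)
path = audio.to_temp_file()

# The receiving side only needs the path, not the sample array.
with wave.open(path, "rb") as wf:
    frame_count = wf.getnframes()
    rate = wf.getframerate()

os.remove(path)  # With delete=False, cleanup is the caller's responsibility.
```

Passing a path rather than the samples keeps the inter-process payload small, at the cost of leaving a temp file that someone has to remove.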

Transcription Plugin Interface (`plugin_interface.ipynb`)

Domain-specific plugin interface for audio transcription

Import

from cjm_transcription_plugin_system.plugin_interface import (
    TranscriptionPlugin
)

Classes

class TranscriptionPlugin(PluginInterface):
    """
    Abstract base class for all transcription plugins.
    
    Extends PluginInterface with transcription-specific requirements:
    - `supported_formats`: List of audio file extensions this plugin can handle
    - `execute`: Accepts audio path (str) or AudioData, returns TranscriptionResult
    
    NOTE: When running via RemotePluginProxy, AudioData objects are automatically
    serialized to temp files via FileBackedDTO, so the Worker receives a file path.
    """
    
    def supported_formats(self) -> List[str]: # e.g., ['wav', 'mp3', 'flac']
        "List of supported audio file extensions (without the dot)."
    
    @abstractmethod
    def execute(
            self,
            audio: Union[AudioData, str, Path], # Audio data or file path
            **kwargs
        ) -> TranscriptionResult: # Transcription result with text, confidence, segments
        """
        Transcribe audio to text.
        
        When called via Proxy, AudioData is auto-converted to a file path string
        before reaching this method in the Worker process.
        """
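
As a rough illustration of what a backend built on this interface might look like, the sketch below defines stand-in classes so it runs without the package installed. `ResultSketch`, `PluginSketch`, and `EchoPlugin` are hypothetical names. The defaults on the result fields are assumptions, and so is treating `supported_formats` as a plain abstract method, since the listing above does not show its decorators.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Union


@dataclass
class ResultSketch:
    """Stand-in mirroring the documented TranscriptionResult fields."""

    text: str
    confidence: Optional[float] = None               # Default is an assumption
    segments: Optional[List[Dict[str, Any]]] = None  # Default is an assumption
    metadata: Dict[str, Any] = field(default_factory=dict)


class PluginSketch(ABC):
    """Stand-in for TranscriptionPlugin (the real base extends PluginInterface)."""

    @abstractmethod
    def supported_formats(self) -> List[str]: ...

    @abstractmethod
    def execute(self, audio: Union[str, Path], **kwargs) -> ResultSketch: ...


class EchoPlugin(PluginSketch):
    """Hypothetical backend that 'transcribes' a file by reporting its name."""

    def supported_formats(self) -> List[str]:
        return ["wav", "flac"]

    def execute(self, audio: Union[str, Path], **kwargs) -> ResultSketch:
        path = Path(audio)
        ext = path.suffix.lstrip(".").lower()
        if ext not in self.supported_formats():
            raise ValueError(f"Unsupported format: {ext!r}")
        # A real backend would run a speech model on the file here.
        return ResultSketch(
            text=f"<transcript of {path.name}>",
            metadata={"plugin": "echo"},
        )


plugin = EchoPlugin()
result = plugin.execute("meeting.wav")
```

The sketch accepts only paths, which matches what the Worker side sees when the proxy has already converted an `AudioData` object to a temp file.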

Transcription Storage (`storage.ipynb`)

Standardized SQLite storage for transcription results with content hashing

Import

from cjm_transcription_plugin_system.storage import (
    TranscriptionRow,
    TranscriptionStorage
)

Classes

@dataclass
class TranscriptionRow:
    "A single row from the transcriptions table."
    
    job_id: str  # Unique job identifier
    audio_path: str  # Path to the source audio file
    audio_hash: str  # Hash of source audio in "algo:hexdigest" format
    text: str  # Transcribed text output
    text_hash: str  # Hash of transcribed text in "algo:hexdigest" format
    segments: Optional[List[Dict[str, Any]]]  # Timestamped segments
    metadata: Optional[Dict[str, Any]]  # Plugin metadata
    created_at: Optional[float]  # Unix timestamp

class TranscriptionStorage:
    "Standardized SQLite storage for transcription results."
    
    def __init__(
            self,
            db_path: str  # Absolute path to the SQLite database file
        ):
        "Initialize storage and create table if needed."
    
    def save(
            self,
            job_id: str,        # Unique job identifier
            audio_path: str,    # Path to the source audio file
            audio_hash: str,    # Hash of source audio in "algo:hexdigest" format
            text: str,          # Transcribed text output
            text_hash: str,     # Hash of transcribed text in "algo:hexdigest" format
            segments: Optional[List[Dict[str, Any]]] = None,  # Timestamped segments
            metadata: Optional[Dict[str, Any]] = None         # Plugin metadata
        ) -> None:
        "Save a transcription result to the database."
    
    def get_by_job_id(
            self,
            job_id: str  # Job identifier to look up
        ) -> Optional[TranscriptionRow]:  # Row or None if not found
        "Retrieve a transcription result by job ID."
    
    def list_jobs(
            self,
            limit: int = 100  # Maximum number of rows to return
        ) -> List[TranscriptionRow]:  # List of transcription rows
        "List transcription jobs ordered by creation time (newest first)."
    
    def verify_audio(
            self,
            job_id: str  # Job identifier to verify
        ) -> Optional[bool]:  # True if audio matches, False if tampered, None if job not found
        "Verify the source audio file still matches its stored hash."
    
    def verify_text(
            self,
            job_id: str  # Job identifier to verify
        ) -> Optional[bool]:  # True if text matches, False if tampered, None if job not found
        "Verify the transcription text still matches its stored hash."

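The content-hashing idea behind `save` and `verify_text` can be sketched with `sqlite3` and `hashlib` alone. Everything here is illustrative: `hash_text` and `StorageSketch` are hypothetical names, the table layout is an assumption rather than the library's schema, and the audio columns are left out for brevity.

```python
import hashlib
import json
import sqlite3
import time
from typing import Optional


def hash_text(text: str, algo: str = "sha256") -> str:
    """Return a hash in "algo:hexdigest" format (hypothetical helper)."""
    return f"{algo}:{hashlib.new(algo, text.encode('utf-8')).hexdigest()}"


class StorageSketch:
    """Minimal stand-in; the table schema is an assumption."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS transcriptions ("
            "job_id TEXT PRIMARY KEY, text TEXT, text_hash TEXT, "
            "metadata TEXT, created_at REAL)"
        )

    def save(self, job_id: str, text: str, text_hash: str,
             metadata: Optional[dict] = None) -> None:
        self.conn.execute(
            "INSERT INTO transcriptions VALUES (?, ?, ?, ?, ?)",
            (job_id, text, text_hash, json.dumps(metadata or {}), time.time()),
        )
        self.conn.commit()

    def verify_text(self, job_id: str) -> Optional[bool]:
        row = self.conn.execute(
            "SELECT text, text_hash FROM transcriptions WHERE job_id = ?",
            (job_id,),
        ).fetchone()
        if row is None:
            return None  # Job not found
        text, stored = row
        # The algorithm name travels with the digest, so re-hash with the same one.
        algo = stored.split(":", 1)[0]
        return hash_text(text, algo) == stored


store = StorageSketch(":memory:")
store.save("job-1", "hello world", hash_text("hello world"))
ok_before = store.verify_text("job-1")

# Simulate tampering: change the stored text without updating its hash.
store.conn.execute(
    "UPDATE transcriptions SET text = 'hello w0rld' WHERE job_id = 'job-1'"
)
ok_after = store.verify_text("job-1")
missing = store.verify_text("no-such-job")
```

Storing the algorithm name alongside the digest means a later verification does not have to guess which hash function produced the value.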
Release history

0.0.14 (Mar 20, 2026)
0.0.13 (Feb 7, 2026, this version)
0.0.12 (Dec 24, 2025)
0.0.11 (Dec 11, 2025)
0.0.10 (Dec 11, 2025)
0.0.9 (Oct 25, 2025)
0.0.8 (Oct 22, 2025)
0.0.7 (Oct 22, 2025)
0.0.6 (Oct 21, 2025)
0.0.5 (Sep 28, 2025)
0.0.4 (Sep 27, 2025)
0.0.3 (Sep 1, 2025)
0.0.2 (Aug 31, 2025)
0.0.1 (Aug 31, 2025)

Download files

Source Distribution

cjm_transcription_plugin_system-0.0.13.tar.gz (13.5 kB view details)

Uploaded Feb 7, 2026 Source

Built Distribution

cjm_transcription_plugin_system-0.0.13-py3-none-any.whl (13.5 kB view details)

Uploaded Feb 7, 2026 Python 3

File details

Details for the file cjm_transcription_plugin_system-0.0.13.tar.gz.

File metadata

Download URL: cjm_transcription_plugin_system-0.0.13.tar.gz
Upload date: Feb 7, 2026
Size: 13.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for cjm_transcription_plugin_system-0.0.13.tar.gz
Algorithm	Hash digest
SHA256	`75b27e774ec1667d48e1eca5c497ceb2a56b04c761eb1905bd0a7d10507c6c88`
MD5	`e1263c12f93beeb5d028d5e0153c201f`
BLAKE2b-256	`1b78cce94a4bf9e9c103680a85fa07dae0da6695c17a75c91396d3c81dceb025`

File details

Details for the file cjm_transcription_plugin_system-0.0.13-py3-none-any.whl.

File metadata

Download URL: cjm_transcription_plugin_system-0.0.13-py3-none-any.whl
Upload date: Feb 7, 2026
Size: 13.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for cjm_transcription_plugin_system-0.0.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9129c982d839f6b1ee644a775fc4dc7678e898b58d851e9927ae00c1a7b9ef9b`
MD5	`922e4743e940250ad32e350e9368ef51`
BLAKE2b-256	`0b2ea3516121b2afe334fe7714de4af0206655de7c9f8909960edb3acbf6f539`
