A flexible plugin system for audio transcription intended to make it easy to add support for multiple backends.
Project description
cjm-transcription-plugin-system
Install
pip install cjm_transcription_plugin_system
Project Structure
nbs/
├── core.ipynb # DTOs for audio transcription with FileBackedDTO support for zero-copy transfer
├── forced_alignment_core.ipynb # Data structures for word-level forced alignment results
├── forced_alignment_interface.ipynb # Domain-specific plugin interface for word-level audio-text alignment
├── forced_alignment_storage.ipynb # Standardized SQLite storage for forced alignment results with content hashing
├── plugin_interface.ipynb # Domain-specific plugin interface for audio transcription
└── storage.ipynb # Standardized SQLite storage for transcription results with content hashing
Total: 6 notebooks
Module Dependencies
graph LR
core[core<br/>Core Data Structures]
forced_alignment_core[forced_alignment_core<br/>Forced Alignment Core]
forced_alignment_interface[forced_alignment_interface<br/>Forced Alignment Plugin Interface]
forced_alignment_storage[forced_alignment_storage<br/>Forced Alignment Storage]
plugin_interface[plugin_interface<br/>Transcription Plugin Interface]
storage[storage<br/>Transcription Storage]
forced_alignment_interface --> forced_alignment_core
forced_alignment_interface --> core
plugin_interface --> core
3 cross-module dependencies detected
CLI Reference
No CLI commands found in this project.
Module Overview
Detailed documentation for each module in the project:
Core Data Structures (core.ipynb)
DTOs for audio transcription with FileBackedDTO support for zero-copy transfer
Import
from cjm_transcription_plugin_system.core import (
AudioData,
TranscriptionResult
)
Classes
@dataclass
class AudioData:
"""
Container for raw audio data.
Implements FileBackedDTO for zero-copy transfer between Host and Worker processes.
"""
samples: np.ndarray # Audio sample data as numpy array
sample_rate: int # Sample rate in Hz (e.g., 16000, 44100)
def to_temp_file(self) -> str: # Absolute path to temporary WAV file
"""Save audio to a temp file for zero-copy transfer to Worker process."""
# Create temp file (delete=False so Worker can read it)
tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
# Ensure float32 format
audio = self.samples
if audio.dtype != np.float32
"Save audio to a temp file for zero-copy transfer to Worker process."
def to_dict(self) -> Dict[str, Any]: # Serialized representation
"""Convert to dictionary for smaller payloads."""
return {
"samples": self.samples.tolist(),
"Convert to dictionary for smaller payloads."
def from_file(
cls,
filepath: str # Path to audio file
) -> "AudioData": # AudioData instance
"Load audio from a file."
@dataclass
class TranscriptionResult:
"Standardized output for all transcription plugins."
text: str # The transcribed text
confidence: Optional[float] # Overall confidence (0.0 to 1.0)
segments: Optional[List[Dict[str, Any]]] # Timestamped segments
metadata: Dict[str, Any] = field(...) # Additional metadata
Forced Alignment Core (forced_alignment_core.ipynb)
Data structures for word-level forced alignment results
Import
from cjm_transcription_plugin_system.forced_alignment_core import (
ForcedAlignItem,
ForcedAlignResult
)
Classes
@dataclass
class ForcedAlignItem:
"A single word-level alignment result."
text: str # The aligned word (punctuation typically stripped by model)
start_time: float # Start time in seconds
end_time: float # End time in seconds
@dataclass
class ForcedAlignResult:
"Standardized output for all forced alignment plugins."
items: List[ForcedAlignItem] # Word-level alignments
metadata: Dict[str, Any] = field(...) # Plugin-specific metadata
Forced Alignment Plugin Interface (forced_alignment_interface.ipynb)
Domain-specific plugin interface for word-level audio-text alignment
Import
from cjm_transcription_plugin_system.forced_alignment_interface import (
ForcedAlignmentPlugin
)
Classes
class ForcedAlignmentPlugin(PluginInterface):
"""
Abstract base class for all forced alignment plugins.
Extends PluginInterface with forced-alignment-specific requirements:
- `supported_formats`: List of audio file extensions this plugin can handle
- `execute`: Accepts audio path and transcript text, returns ForcedAlignResult
NOTE: When running via RemotePluginProxy, AudioData objects are automatically
serialized to temp files via FileBackedDTO, so the Worker receives a file path.
"""
def supported_formats(self) -> List[str]: # e.g., ['wav', 'mp3', 'flac']
"""List of supported audio file extensions (without the dot)."""
...
@abstractmethod
def execute(
self,
audio: Union[AudioData, str, Path], # Audio data or file path
text: str, # Transcript text to align against
**kwargs
) -> ForcedAlignResult: # Word-level alignment result
"List of supported audio file extensions (without the dot)."
def execute(
self,
audio: Union[AudioData, str, Path], # Audio data or file path
text: str, # Transcript text to align against
**kwargs
) -> ForcedAlignResult: # Word-level alignment result
"Perform forced alignment of text against audio.
When called via Proxy, AudioData is auto-converted to a file path string
before reaching this method in the Worker process."
Forced Alignment Storage (forced_alignment_storage.ipynb)
Standardized SQLite storage for forced alignment results with content hashing
Import
from cjm_transcription_plugin_system.forced_alignment_storage import (
ForcedAlignmentRow,
ForcedAlignmentStorage
)
Classes
@dataclass
class ForcedAlignmentRow:
"A single row from the forced_alignments table."
job_id: str # Unique job identifier
audio_path: str # Path to the source audio file
audio_hash: str # Hash of source audio in "algo:hexdigest" format
text: str # Input transcript text that was aligned
text_hash: str # Hash of input text in "algo:hexdigest" format
items: Optional[List[Dict[str, Any]]] # Serialized ForcedAlignItems
metadata: Optional[Dict[str, Any]] # Plugin metadata
created_at: Optional[float] # Unix timestamp
class ForcedAlignmentStorage:
def __init__(
self,
db_path: str # Absolute path to the SQLite database file
)
"Standardized SQLite storage for forced alignment results."
def __init__(
self,
db_path: str # Absolute path to the SQLite database file
)
"Initialize storage and create table if needed."
def save(
self,
job_id: str, # Unique job identifier
audio_path: str, # Path to the source audio file
audio_hash: str, # Hash of source audio in "algo:hexdigest" format
text: str, # Input transcript text
text_hash: str, # Hash of input text in "algo:hexdigest" format
items: Optional[List[Dict[str, Any]]] = None, # Serialized ForcedAlignItems
metadata: Optional[Dict[str, Any]] = None # Plugin metadata
) -> None
"Save a forced alignment result to the database."
def get_by_job_id(
self,
job_id: str # Job identifier to look up
) -> Optional[ForcedAlignmentRow]: # Row or None if not found
"Retrieve a forced alignment result by job ID."
def list_jobs(
self,
limit: int = 100 # Maximum number of rows to return
) -> List[ForcedAlignmentRow]: # List of forced alignment rows
"List forced alignment jobs ordered by creation time (newest first)."
def verify_audio(
self,
job_id: str # Job identifier to verify
) -> Optional[bool]: # True if audio matches, False if tampered, None if job not found
"Verify the source audio file still matches its stored hash."
def verify_text(
self,
job_id: str # Job identifier to verify
) -> Optional[bool]: # True if text matches, False if tampered, None if job not found
"Verify the input text still matches its stored hash."
Transcription Plugin Interface (plugin_interface.ipynb)
Domain-specific plugin interface for audio transcription
Import
from cjm_transcription_plugin_system.plugin_interface import (
TranscriptionPlugin
)
Classes
class TranscriptionPlugin(PluginInterface):
"""
Abstract base class for all transcription plugins.
Extends PluginInterface with transcription-specific requirements:
- `supported_formats`: List of audio file extensions this plugin can handle
- `execute`: Accepts audio path (str) or AudioData, returns TranscriptionResult
NOTE: When running via RemotePluginProxy, AudioData objects are automatically
serialized to temp files via FileBackedDTO, so the Worker receives a file path.
"""
def supported_formats(self) -> List[str]: # e.g., ['wav', 'mp3', 'flac']
"""List of supported audio file extensions (without the dot)."""
...
@abstractmethod
def execute(
self,
audio: Union[AudioData, str, Path], # Audio data or file path
**kwargs
) -> TranscriptionResult: # Transcription result with text, confidence, segments
"List of supported audio file extensions (without the dot)."
def execute(
self,
audio: Union[AudioData, str, Path], # Audio data or file path
**kwargs
) -> TranscriptionResult: # Transcription result with text, confidence, segments
"Transcribe audio to text.
When called via Proxy, AudioData is auto-converted to a file path string
before reaching this method in the Worker process."
Transcription Storage (storage.ipynb)
Standardized SQLite storage for transcription results with content hashing
Import
from cjm_transcription_plugin_system.storage import (
TranscriptionRow,
TranscriptionStorage
)
Classes
@dataclass
class TranscriptionRow:
"A single row from the transcriptions table."
job_id: str # Unique job identifier
audio_path: str # Path to the source audio file
audio_hash: str # Hash of source audio in "algo:hexdigest" format
text: str # Transcribed text output
text_hash: str # Hash of transcribed text in "algo:hexdigest" format
segments: Optional[List[Dict[str, Any]]] # Timestamped segments
metadata: Optional[Dict[str, Any]] # Plugin metadata
created_at: Optional[float] # Unix timestamp
class TranscriptionStorage:
def __init__(
self,
db_path: str # Absolute path to the SQLite database file
)
"Standardized SQLite storage for transcription results."
def __init__(
self,
db_path: str # Absolute path to the SQLite database file
)
"Initialize storage and create table if needed."
def save(
self,
job_id: str, # Unique job identifier
audio_path: str, # Path to the source audio file
audio_hash: str, # Hash of source audio in "algo:hexdigest" format
text: str, # Transcribed text output
text_hash: str, # Hash of transcribed text in "algo:hexdigest" format
segments: Optional[List[Dict[str, Any]]] = None, # Timestamped segments
metadata: Optional[Dict[str, Any]] = None # Plugin metadata
) -> None
"Save a transcription result to the database."
def get_by_job_id(
self,
job_id: str # Job identifier to look up
) -> Optional[TranscriptionRow]: # Row or None if not found
"Retrieve a transcription result by job ID."
def list_jobs(
self,
limit: int = 100 # Maximum number of rows to return
) -> List[TranscriptionRow]: # List of transcription rows
"List transcription jobs ordered by creation time (newest first)."
def verify_audio(
self,
job_id: str # Job identifier to verify
) -> Optional[bool]: # True if audio matches, False if tampered, None if job not found
"Verify the source audio file still matches its stored hash."
def verify_text(
self,
job_id: str # Job identifier to verify
) -> Optional[bool]: # True if text matches, False if tampered, None if job not found
"Verify the transcription text still matches its stored hash."
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file cjm_transcription_plugin_system-0.0.14.tar.gz.
File metadata
- Download URL: cjm_transcription_plugin_system-0.0.14.tar.gz
- Upload date:
- Size: 16.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
46443ac8a7b953c01b442f1b8b7a2e73e33c8e28d71b806057ba646a41b213a7
|
|
| MD5 |
682875f4ae0d793a83dcb7be046c58f5
|
|
| BLAKE2b-256 |
f222e349bd2b44f457b4e11173597ee949a0093010e1c8ce0e96df6115dcd471
|
File details
Details for the file cjm_transcription_plugin_system-0.0.14-py3-none-any.whl.
File metadata
- Download URL: cjm_transcription_plugin_system-0.0.14-py3-none-any.whl
- Upload date:
- Size: 18.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
98e5546dd973c426a061b558aebf97e2ea5a7936776e0c8627c088b18c327280
|
|
| MD5 |
d5da39d51de8c25f2d2959a777a81d70
|
|
| BLAKE2b-256 |
cd00b5c698a63a6b536040a8deaa614f4603ada200266eacf06a75c7aa0bf09a
|