A self-contained single-file transcription workflow for FastHTML applications.
cjm-fasthtml-workflow-transcription-single-file
Install
pip install cjm_fasthtml_workflow_transcription_single_file
Project Structure
nbs/
├── components/ (3)
│ ├── processor.ipynb # UI component for displaying transcription in-progress state
│ ├── results.ipynb # UI components for displaying transcription results and errors
│ └── steps.ipynb # UI components for workflow step rendering (plugin selection, file selection, confirmation)
├── core/ (4)
│ ├── adapters.ipynb # Adapter implementations for integrating with plugin registries
│ ├── config.ipynb # Configuration dataclass for single-file transcription workflow
│ ├── html_ids.ipynb # Centralized HTML ID constants for single-file transcription workflow components
│ └── protocols.ipynb # Protocol definitions for external dependencies and plugin integration
├── media/ (9)
│ ├── components.ipynb # UI components for media browser views (grid, list, preview modal)
│ ├── config.ipynb # Configuration for media file discovery and browser settings
│ ├── file_selection_pagination.ipynb # Factory function for creating paginated file selection table with radio buttons
│ ├── library.ipynb # Unified facade for media file discovery, mounting, and access
│ ├── models.ipynb # Data models for media files
│ ├── mounter.ipynb # Mounts media directories as static files for serving through the web server
│ ├── pagination.ipynb # Factory function for creating paginated media browser views
│ ├── scanner.ipynb # Scans directories for media files with caching support
│ └── utils.ipynb # Formatting utilities for media files
├── settings/ (2)
│ ├── components.ipynb # UI components for workflow settings modal and forms
│ └── schemas.ipynb # JSON schemas and utilities for workflow settings
├── storage/ (2)
│ ├── config.ipynb # Configuration for transcription result storage
│ └── file_storage.ipynb # File-based storage for transcription results
└── workflow/ (3)
├── job_handler.ipynb # Functions for starting transcription jobs and handling SSE streaming
├── routes.ipynb # Route initialization and handlers for the single-file transcription workflow
└── workflow.ipynb # Main workflow class orchestrating all subsystems for single-file transcription
Total: 23 notebooks across 6 directories
Module Dependencies
graph LR
components_processor[components.processor<br/>Processor Component]
components_results[components.results<br/>Results Components]
components_steps[components.steps<br/>Step Components]
core_adapters[core.adapters<br/>Adapters]
core_config[core.config<br/>Configuration]
core_html_ids[core.html_ids<br/>HTML IDs]
core_protocols[core.protocols<br/>Protocols]
media_components[media.components<br/>Media Components]
media_config[media.config<br/>Media Configuration]
media_file_selection_pagination[media.file_selection_pagination<br/>File Selection Pagination]
media_library[media.library<br/>Media Library]
media_models[media.models<br/>Media Models]
media_mounter[media.mounter<br/>Media Mounter]
media_pagination[media.pagination<br/>Media Pagination]
media_scanner[media.scanner<br/>Media Scanner]
media_utils[media.utils<br/>Media Utilities]
settings_components[settings.components<br/>Settings Components]
settings_schemas[settings.schemas<br/>Settings Schemas]
storage_config[storage.config<br/>Storage Configuration]
storage_file_storage[storage.file_storage<br/>Result Storage]
workflow_job_handler[workflow.job_handler<br/>Job Handler]
workflow_routes[workflow.routes<br/>Workflow Routes]
workflow_workflow[workflow.workflow<br/>Single File Transcription Workflow]
components_processor --> core_html_ids
components_processor --> core_config
components_results --> core_html_ids
components_results --> core_config
components_steps --> core_html_ids
components_steps --> core_protocols
components_steps --> core_config
core_adapters --> core_protocols
core_config --> core_html_ids
core_config --> media_config
core_config --> storage_config
media_components --> media_models
media_components --> media_mounter
media_file_selection_pagination --> media_models
media_file_selection_pagination --> media_library
media_file_selection_pagination --> media_scanner
media_library --> media_models
media_library --> media_config
media_library --> media_scanner
media_library --> media_mounter
media_pagination --> media_components
media_pagination --> media_library
media_pagination --> media_scanner
media_pagination --> media_mounter
media_scanner --> media_models
media_scanner --> media_utils
media_scanner --> media_config
settings_components --> core_html_ids
settings_schemas --> storage_config
settings_schemas --> core_config
settings_schemas --> media_config
storage_file_storage --> storage_config
workflow_job_handler --> components_results
workflow_job_handler --> core_html_ids
workflow_job_handler --> storage_file_storage
workflow_job_handler --> core_config
workflow_job_handler --> core_protocols
workflow_job_handler --> components_processor
workflow_routes --> workflow_job_handler
workflow_routes --> components_results
workflow_routes --> core_html_ids
workflow_routes --> workflow_workflow
workflow_routes --> components_steps
workflow_routes --> components_processor
workflow_workflow --> storage_file_storage
workflow_workflow --> core_html_ids
workflow_workflow --> media_library
workflow_workflow --> core_config
workflow_workflow --> workflow_job_handler
workflow_workflow --> components_steps
workflow_workflow --> core_adapters
51 cross-module dependencies detected
CLI Reference
No CLI commands found in this project.
Module Overview
Detailed documentation for each module in the project:
Adapters (adapters.ipynb)
Adapter implementations for integrating with plugin registries
Import
from cjm_fasthtml_workflow_transcription_single_file.core.adapters import (
PluginRegistryAdapter,
DefaultConfigPluginRegistryAdapter
)
Classes
class PluginRegistryAdapter:
"Adapts app's UnifiedPluginRegistry to workflow's PluginRegistryProtocol."
def __init__(self, ...)
"Initialize the adapter."
def get_configured_plugins(self) -> List[PluginInfo]: # List of PluginInfo for configured plugins
"""Get all configured transcription plugins (those with saved config files)."""
plugins = self._registry.get_plugins_by_category(self._category)
return [
PluginInfo(
id=p.get_unique_id(),
name=p.name,
title=p.title,
is_configured=p.is_configured,
supports_streaming=self._check_streaming_support(p)
)
for p in plugins if p.is_configured
]
def get_all_plugins(self) -> List[PluginInfo]: # List of PluginInfo for all discovered plugins
"""Get all discovered transcription plugins (configured or not)."""
plugins = self._registry.get_plugins_by_category(self._category)
return [
PluginInfo(
id=p.get_unique_id(),
name=p.name,
title=p.title,
is_configured=p.is_configured,
supports_streaming=self._check_streaming_support(p)
)
for p in plugins
]
def get_plugin(self,
plugin_id: str # Unique plugin identifier
) -> Optional[PluginInfo]: # PluginInfo if found, None otherwise
"Get a specific plugin by ID."
def get_plugin_config(self,
plugin_id: str # Unique plugin identifier
) -> Dict[str, Any]: # Configuration dictionary, empty dict if not configured
"Get the configuration for a plugin."
class DefaultConfigPluginRegistryAdapter:
"Plugin registry adapter that provides default config values for unconfigured plugins."
def __init__(self,
registry: UnifiedPluginRegistry, # The UnifiedPluginRegistry instance to wrap
category: str = "transcription" # Plugin category to filter by
)
"Initialize adapter with registry instance."
def get_plugins_by_category(self,
category: str # Plugin category to filter by
) -> list: # List of plugins in the category
"Get all plugins in a specific category."
def get_plugin(self,
plugin_id: str # Unique plugin identifier
): # Plugin metadata or None
"Get a specific plugin by ID."
def load_plugin_config(self,
plugin_id: str # Unique plugin identifier
) -> Dict[str, Any]: # Configuration dictionary with defaults applied
"Load configuration for a plugin, using defaults if no saved config exists."
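The two adapters differ mainly in what happens when a plugin has no saved configuration. The sketch below illustrates one way such a fallback could work. It assumes, hypothetically, that each plugin exposes a JSON-schema-style dict whose properties carry `default` values; the helper names `defaults_from_schema` and `load_config_with_defaults`, the schema layout, and the choice to layer saved values over defaults are all illustrative and not taken from the package.

```python
from typing import Any, Dict, Optional


def defaults_from_schema(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Collect `default` values from a JSON-schema-style `properties` mapping."""
    props = schema.get("properties", {})
    return {name: spec["default"] for name, spec in props.items() if "default" in spec}


def load_config_with_defaults(
    schema: Dict[str, Any],                 # Hypothetical plugin config schema
    saved: Optional[Dict[str, Any]] = None  # Saved config, or None if never configured
) -> Dict[str, Any]:
    """Return saved values layered over the schema defaults (an assumed policy)."""
    config = defaults_from_schema(schema)
    config.update(saved or {})
    return config


# Hypothetical schema: one property with a default, one without
schema = {
    "properties": {
        "model": {"type": "string", "default": "base"},
        "language": {"type": "string"},
    }
}
unconfigured = load_config_with_defaults(schema)
configured = load_config_with_defaults(schema, {"model": "large", "language": "en"})
```

With this policy an unconfigured plugin still yields a usable dictionary, which is what lets the workflow offer plugins that have never been saved.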
Media Components (components.ipynb)
UI components for media browser views (grid, list, preview modal)
Import
from cjm_fasthtml_workflow_transcription_single_file.media.components import (
get_media_icon,
grid_view_content,
list_view_content,
media_preview_modal,
empty_media_content,
media_browser_controls
)
Functions
def get_media_icon(
media_type: str # "video" or "audio"
) -> FT: # SVG element with appropriate icon
"Get an SVG icon for the media type."
def grid_view_content(
media_files: List[MediaFile], # List of media files to display
mounter: MediaMounter, # MediaMounter instance for URL generation
start_idx: int = 0, # Starting index for item numbering
media_type: Optional[str] = None, # Current filter type for maintaining state
preview_route_func = None, # Function to generate preview route URL
modal_id: str = "sf-media-preview" # ID for the preview modal
) -> FT: # Grid container with media cards
"Render media files as cards in a responsive grid layout."
def list_view_content(
media_files: List[MediaFile], # List of media files to display
mounter: MediaMounter, # MediaMounter instance for URL generation
start_idx: int = 0, # Starting index for item numbering
media_type: Optional[str] = None, # Current filter type for maintaining state
preview_route_func = None, # Function to generate preview route URL
modal_id: str = "sf-media-preview" # ID for the preview modal
) -> FT: # Table with media file rows
"Render media files as rows in a table."
def media_preview_modal(
media_file: MediaFile, # MediaFile to preview
media_url: Optional[str], # URL to the media file for playback
modal_id: str = "sf-media-preview" # ID for the modal element
) -> FT: # Modal dialog with media preview
"Create a modal dialog for previewing media files with video/audio player."
def empty_media_content(
message: str = "No media files found.", # Message to display
action_url: Optional[str] = None, # Optional URL for action button
action_text: str = "Configure Settings" # Text for action button
) -> FT: # Empty state container
"Render empty state display when no media files are found."
def media_browser_controls(
view_mode: str, # Current view mode ("grid" or "list")
media_type_filter: Optional[str], # Current media type filter
change_view_url_func, # Function to generate URL for view change
change_filter_url_func, # Function to generate URL for filter change
content_target_id: str # ID of content container to target
) -> FT: # Controls bar element
"Render control bar for media browser with view mode toggle and media type filter."
Settings Components (components.ipynb)
UI components for workflow settings modal and forms
Import
from cjm_fasthtml_workflow_transcription_single_file.settings.components import (
settings_trigger_button,
simple_settings_form,
settings_modal
)
Functions
def settings_trigger_button(
modal_id:str, # ID of the modal to trigger
label:str="Settings", # Button label text
button_cls:Optional[str]=None # Optional additional button classes
) -> FT: # Button element that triggers the modal
"Create a button that opens the settings modal."
def simple_settings_form(
directories:list, # List of media directories
auto_save:bool, # Current auto-save setting
results_directory:str, # Current results directory
save_url:str, # URL to POST settings to
target_id:str, # Target element ID for HTMX response
modal_id:str # Modal ID for close button
) -> FT: # Simple form element
"Create a simple settings form without full schema generation."
def settings_modal(
modal_id:str, # ID for the modal element
schema:Dict[str, Any], # JSON schema for settings
current_values:Dict[str, Any], # Current settings values
save_url:str, # URL to POST settings to
target_id:str # Target element ID for HTMX response
) -> FT: # Modal dialog with settings form
"Create the settings modal with form."
Configuration (config.ipynb)
Configuration dataclass for single-file transcription workflow
Import
from cjm_fasthtml_workflow_transcription_single_file.core.config import (
DEFAULT_WORKFLOW_CONFIG_DIR,
SingleFileWorkflowConfig
)
Classes
@dataclass
class SingleFileWorkflowConfig:
"Configuration for single-file transcription workflow."
workflow_id: str = 'single_file_transcription' # Unique identifier for this workflow
worker_type: str = 'transcription:single_file' # Worker process type identifier
route_prefix: str = '/single_file' # Base URL prefix for workflow routes
stepflow_prefix: str = '/flow' # Sub-prefix for StepFlow routes
media_prefix: str = '/media' # Sub-prefix for media browser routes
container_id: str = SingleFileHtmlIds.WORKFLOW_CONTAINER # HTML ID for main workflow container
show_progress: bool = True # Show step progress indicator
max_files_displayed: int = 50 # Maximum files to show in simple file selector
export_formats: List[str] = field(...) # Available export formats
no_plugins_redirect: Optional[str] # URL to redirect when no plugins configured
no_files_redirect: Optional[str] # URL to redirect when no media files found
sse_poll_interval: float = 2.0 # Seconds between SSE status checks
gpu_memory_threshold_percent: float = 45.0 # Max GPU memory % before blocking new jobs
config_dir: Path = field(...) # Directory for workflow settings
plugin_config_dir: Path = field(...) # Directory for plugin configs
plugin_category: str = 'transcription' # Plugin category for this workflow
media: MediaConfig = field(...) # Media scanning and display settings
storage: StorageConfig = field(...) # Result storage settings
def get_full_stepflow_prefix(self) -> str: # Combined route_prefix + stepflow_prefix
"""Get the full prefix for the StepFlow router."""
return f"{self.route_prefix}{self.stepflow_prefix}"
def get_full_media_prefix(self) -> str: # Combined route_prefix + media_prefix
"Get the full prefix for the media router."
Variables
DEFAULT_WORKFLOW_CONFIG_DIR
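The prefix helpers concatenate `route_prefix` with a sub-prefix. The following sketch reproduces that behaviour on a trimmed stand-in class so it runs without the package installed. `PrefixConfig` is a hypothetical name, and the body of `get_full_media_prefix` is assumed to mirror the StepFlow version shown in the listing.

```python
from dataclasses import dataclass


@dataclass
class PrefixConfig:
    """Hypothetical stand-in holding only the three prefix fields."""
    route_prefix: str = "/single_file"  # Base URL prefix for workflow routes
    stepflow_prefix: str = "/flow"      # Sub-prefix for StepFlow routes
    media_prefix: str = "/media"        # Sub-prefix for media browser routes

    def get_full_stepflow_prefix(self) -> str:
        return f"{self.route_prefix}{self.stepflow_prefix}"

    def get_full_media_prefix(self) -> str:
        # Assumed to follow the same pattern as the StepFlow prefix
        return f"{self.route_prefix}{self.media_prefix}"


default_cfg = PrefixConfig()
custom_cfg = PrefixConfig(route_prefix="/transcribe")
```

Changing only `route_prefix` moves every sub-router at once, which is useful when several workflows are mounted in the same application.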
Media Configuration (config.ipynb)
Configuration for media file discovery and browser settings
Import
from cjm_fasthtml_workflow_transcription_single_file.media.config import (
MEDIA_CONFIG_SCHEMA,
MediaConfig
)
Classes
@dataclass
class MediaConfig:
"Configuration for media file discovery and browser display."
directories: List[str] = field(...) # Directories to scan for media files
scan_audio: bool = True # Include audio files in scan results
scan_video: bool = True # Include video files in scan results
audio_extensions: List[str] = field(...) # File extensions recognized as audio
video_extensions: List[str] = field(...) # File extensions recognized as video
min_file_size_kb: int = 0 # Minimum file size in KB (0 = no minimum)
max_file_size_mb: int = 0 # Maximum file size in MB (0 = unlimited)
recursive_scan: bool = True # Scan subdirectories
include_hidden: bool = False # Include files starting with a dot
follow_symlinks: bool = False # Follow symbolic links when scanning
exclude_patterns: List[str] = field(...) # Glob patterns to exclude
cache_results: bool = True # Cache scan results for faster loading
cache_duration_minutes: int = 60 # How long to cache scan results
max_results: int = 1000 # Maximum number of files to return
items_per_page: int = 30 # Number of files to show per page
default_view: str = 'list' # Default view mode ("grid" or "list")
sort_by: str = 'name' # Default sort field (name, size, modified)
sort_descending: bool = False # Sort in descending order
Variables
MEDIA_CONFIG_SCHEMA = {5 items}
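To make the filter fields concrete, here is a rough sketch of how a scanner might apply them to a single file. It is not the package's scanner: `ScanFilter` and `classify` are hypothetical names, the extension lists are placeholder values (the real defaults are elided in the listing), and treating `0` as "no limit" for both size fields follows the field comments.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScanFilter:
    """Hypothetical subset of the MediaConfig fields."""
    scan_audio: bool = True
    scan_video: bool = True
    audio_extensions: List[str] = field(default_factory=lambda: ["mp3", "wav"])  # placeholder values
    video_extensions: List[str] = field(default_factory=lambda: ["mp4", "mkv"])  # placeholder values
    min_file_size_kb: int = 0   # 0 = no minimum
    max_file_size_mb: int = 0   # 0 = unlimited
    include_hidden: bool = False


def classify(name: str, size_bytes: int, cfg: ScanFilter) -> Optional[str]:
    """Return 'audio', 'video', or None if the file is filtered out."""
    if name.startswith(".") and not cfg.include_hidden:
        return None
    if cfg.min_file_size_kb and size_bytes < cfg.min_file_size_kb * 1024:
        return None
    if cfg.max_file_size_mb and size_bytes > cfg.max_file_size_mb * 1024 * 1024:
        return None
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    if cfg.scan_audio and ext in cfg.audio_extensions:
        return "audio"
    if cfg.scan_video and ext in cfg.video_extensions:
        return "video"
    return None
```

For example, `classify("talk.MP3", 5000, ScanFilter())` returns `"audio"`, while the same call with `ScanFilter(scan_audio=False)` returns `None`.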
Storage Configuration (config.ipynb)
Configuration for transcription result storage
Import
from cjm_fasthtml_workflow_transcription_single_file.storage.config import (
STORAGE_CONFIG_SCHEMA,
StorageConfig
)
Classes
@dataclass
class StorageConfig:
"Result storage configuration."
auto_save: bool = True # Automatically save transcription results when complete
results_directory: str = 'transcription_results' # Directory to save transcription results
Variables
STORAGE_CONFIG_SCHEMA = {5 items}
File Selection Pagination (file_selection_pagination.ipynb)
Factory function for creating paginated file selection table with radio buttons
Import
from cjm_fasthtml_workflow_transcription_single_file.media.file_selection_pagination import (
create_file_selection_pagination
)
Functions
def _escape_js(
s: str # String to escape
) -> str: # Escaped string safe for JavaScript
"Escape a string for use in JavaScript."
def _render_file_row(
file: MediaFile, # MediaFile to render
idx: int, # Global index of this file (across all pages)
selected_file: Optional[str], # Currently selected file path (if any)
preview_url_func: Optional[Callable[[int], str]], # Function to generate preview URL
preview_target_id: Optional[str] # Target ID for preview modal
) -> FT: # Table row element
"Render a single file row in the selection table."
def create_file_selection_pagination(
pagination_id: str, # Unique identifier for this pagination instance
scanner: MediaScanner, # MediaScanner instance for loading files
items_per_page: int = 30, # Number of items per page
content_id: Optional[str] = None, # HTML ID for content area
preview_url_func: Optional[Callable[[int], str]] = None, # Function that takes file index and returns preview URL
preview_target_id: Optional[str] = None # HTML ID to target for preview modal
) -> Pagination: # Configured Pagination instance for file selection
"Create a Pagination instance for file selection with radio buttons."
Result Storage (file_storage.ipynb)
File-based storage for transcription results
Import
from cjm_fasthtml_workflow_transcription_single_file.storage.file_storage import (
ResultStorage
)
Functions
@patch
def should_auto_save(
self: ResultStorage
) -> bool: # True if results should be automatically saved
"Check if auto-save is enabled."
@patch
def save(
self: ResultStorage,
job_id: str, # Unique job identifier
file_path: str, # Path to the transcribed media file
file_name: str, # Name of the media file
plugin_id: str, # Plugin unique identifier
plugin_name: str, # Plugin display name
text: str, # The transcription text
metadata: Optional[Dict[str, Any]] = None, # Optional metadata from the transcription plugin
additional_info: Optional[Dict[str, Any]] = None # Optional additional information to store
) -> Path: # Path to the saved JSON file
"Save a transcription result to JSON file."
@patch
def load(
self: ResultStorage,
result_file: Path # Path to the JSON result file
) -> Optional[Dict[str, Any]]: # Dictionary containing the result data, or None if error
"Load a transcription result from JSON file."
@patch
def list_results(
self: ResultStorage,
sort_by: str = "timestamp", # Field to sort by ("timestamp", "file_name", "word_count")
reverse: bool = True # Sort in reverse order (newest first by default)
) -> List[Dict[str, Any]]: # List of result dictionaries
"List all saved transcription results."
@patch
def get_by_job_id(
self: ResultStorage,
job_id: str # The job identifier to search for
) -> Optional[Dict[str, Any]]: # Result dictionary if found, None otherwise
"Find and load a transcription result by job ID."
@patch
def delete(
self: ResultStorage,
result_file: str # Path to the result file (can be full path or filename)
) -> bool: # True if deletion successful, False otherwise
"Delete a transcription result file."
@patch
def update_text(
self: ResultStorage,
result_file: str, # Path to the result file
new_text: str # New transcription text
) -> bool: # True if update successful, False otherwise
"Update the transcription text in a saved result."
@patch
def _generate_filename(
self: ResultStorage,
job_id: str, # Unique job identifier
file_name: str # Original media file name
) -> str: # Generated filename for the JSON result file
"Generate a filename for storing transcription results."
Classes
class ResultStorage:
"File-based storage for transcription results."
def __init__(self,
config: StorageConfig # Storage configuration
)
"Initialize the storage."
def results_directory(self) -> Path: # Path to the results directory
"Get the results directory, creating it if needed."
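The storage API above boils down to writing one JSON file per job and looking records up again by job ID. The sketch below shows that round trip using only the standard library. It is an illustration under assumptions: the function names, the record keys (chosen to match the `sort_by` options listed for `list_results`), and the filename scheme are hypothetical and may differ from what `ResultStorage` actually writes.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, Optional


def save_result(results_dir: Path, job_id: str, file_name: str, text: str) -> Path:
    """Write one transcription result as JSON and return its path."""
    results_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: media file stem plus the first 8 chars of the job ID
    out = results_dir / f"{Path(file_name).stem}_{job_id[:8]}.json"
    record = {
        "job_id": job_id,
        "file_name": file_name,
        "text": text,
        "word_count": len(text.split()),
        "timestamp": datetime.now().isoformat(),
    }
    out.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out


def find_by_job_id(results_dir: Path, job_id: str) -> Optional[Dict[str, Any]]:
    """Scan the directory for the record with a matching job ID."""
    for path in sorted(results_dir.glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        if record.get("job_id") == job_id:
            return record
    return None


with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_result(Path(tmp) / "transcription_results",
                             "job-1234-abcd", "talk.mp3", "hello world again")
    found = find_by_job_id(saved_path.parent, "job-1234-abcd")
    missing = find_by_job_id(saved_path.parent, "no-such-job")
```

A linear scan is adequate for a single-user results folder; a larger deployment would probably want an index keyed by job ID.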
HTML IDs (html_ids.ipynb)
Centralized HTML ID constants for single-file transcription workflow components
Import
from cjm_fasthtml_workflow_transcription_single_file.core.html_ids import (
SingleFileHtmlIds
)
Classes
class SingleFileHtmlIds(InteractionHtmlIds):
"HTML ID constants for single-file transcription workflow."
def plugin_radio(plugin_id: str # Unique plugin identifier to generate ID for
) -> str: # HTML ID for the plugin radio button
"Generate HTML ID for a plugin radio button."
def file_radio(index: int # File index in the selection list
) -> str: # HTML ID for the file radio button
"Generate HTML ID for a file radio button."
Job Handler (job_handler.ipynb)
Functions for starting transcription jobs and handling SSE streaming
Import
from cjm_fasthtml_workflow_transcription_single_file.workflow.job_handler import (
get_job_session_info,
start_transcription_job,
create_job_stream_handler
)
Functions
def get_job_session_info(
job_id: str, # Unique job identifier
job, # Job object from the manager
sess, # FastHTML session object
plugin_registry: PluginRegistryProtocol, # Plugin registry for getting plugin info
) -> tuple[Dict[str, Any], Dict[str, Any]]: # Tuple of (file_info, plugin_info) dictionaries
"Get file and plugin info from session with fallbacks."
def _save_job_result_once(
sess, # FastHTML session object
job_id: str, # Job identifier
job, # Job object
data: Dict[str, Any], # Transcription data containing text and metadata
plugin_registry: PluginRegistryProtocol, # Plugin registry for getting plugin info
result_storage: ResultStorage, # Storage for saving transcription results
) -> None
"""
Save transcription result to disk, ensuring it's only saved once per job.
Called from the SSE stream handler as a fallback. The primary save mechanism
is the workflow's `_on_job_completed` callback called by TranscriptionJobManager.
"""
def _create_sse_swap_message(
content, # HTML content to wrap
container_id: str, # Target container ID for the swap
): # Div with OOB swap attributes
"Wrap content in a Div with HTMX OOB swap for SSE messages."
async def start_transcription_job(
state: Dict[str, Any], # Workflow state containing plugin_id, file_path, file_name, etc.
request, # FastHTML request object
config: SingleFileWorkflowConfig, # Workflow configuration
router, # Workflow router for generating route URLs
transcription_manager, # Manager for starting transcription jobs
plugin_registry: PluginRegistryProtocol, # Plugin registry for getting plugin info
): # transcription_in_progress component showing job status
"""
Handle workflow completion by starting the transcription job.
Called by StepFlow's `on_complete` handler when the user confirms
and clicks "Start Transcription".
"""
def create_job_stream_handler(
job_id: str, # Unique job identifier
request, # FastHTML request object
sess, # FastHTML session object
config: SingleFileWorkflowConfig, # Workflow configuration
router, # Workflow router for generating route URLs
stepflow_router: APIRouter, # StepFlow router for generating stepflow URLs
transcription_manager, # Manager for getting job status
plugin_registry: PluginRegistryProtocol, # Plugin registry for getting plugin info
result_storage: ResultStorage, # Storage for saving transcription results
): # Async generator for SSE streaming
"Create an SSE stream generator for monitoring job completion."
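The stream handler monitors a job by polling its status and pushing updates over server-sent events. A minimal sketch of that loop follows, assuming a status-polling design driven by an interval like `sse_poll_interval`. `FakeJobManager`, `job_stream`, and the status strings are hypothetical stand-ins; the real handler emits HTMX out-of-band swap fragments rather than plain text.

```python
import asyncio
from typing import AsyncIterator, Dict, List


class FakeJobManager:
    """Hypothetical stand-in that reports 'running' twice, then 'completed'."""

    def __init__(self) -> None:
        self._statuses: Dict[str, List[str]] = {}

    def get_status(self, job_id: str) -> str:
        remaining = self._statuses.setdefault(job_id, ["running", "running", "completed"])
        # Consume queued statuses, but keep returning the final one
        return remaining.pop(0) if len(remaining) > 1 else remaining[0]


async def job_stream(job_id: str, manager: FakeJobManager,
                     poll_interval: float = 2.0) -> AsyncIterator[str]:
    """Yield one SSE-formatted message per status check until the job finishes."""
    while True:
        status = manager.get_status(job_id)
        # An SSE message is a set of 'field: value' lines ended by a blank line
        yield f"event: {status}\ndata: job {job_id} is {status}\n\n"
        if status in ("completed", "failed"):
            break
        await asyncio.sleep(poll_interval)


async def collect(job_id: str) -> List[str]:
    """Drain the stream into a list (short interval so the demo finishes quickly)."""
    return [msg async for msg in job_stream(job_id, FakeJobManager(), poll_interval=0.01)]


messages = asyncio.run(collect("job-1"))
```

The generator stops as soon as a terminal status is seen, so the browser's connection closes once the result has been delivered.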
Media Library (library.ipynb)
Unified facade for media file discovery, mounting, and access
Import
from cjm_fasthtml_workflow_transcription_single_file.media.library import (
MediaLibrary
)
Functions
@patch
def mount(
self: MediaLibrary,
app # FastHTML/Starlette application instance
) -> None
"Mount media directories to app for static file serving."
@patch
def scan(
self: MediaLibrary,
force_refresh: bool = False # Force a fresh scan, ignoring cache
) -> List[MediaFile]: # List of MediaFile objects
"Scan for media files."
@patch
def get_transcribable_files(
self: MediaLibrary
) -> List[MediaFile]: # List of MediaFile objects with media_type 'audio' or 'video'
"Get files suitable for transcription (audio and video only)."
@patch
def get_url(
self: MediaLibrary,
file_path: str # Full path to the media file
) -> Optional[str]: # URL path to access the file, or None if not in a mounted directory
"Get URL for a media file."
@patch
def clear_cache(
self: MediaLibrary
) -> None
"Clear the scan cache."
@patch
def get_summary(
self: MediaLibrary
) -> dict: # Dictionary with file counts, sizes, and breakdowns
"Get summary statistics for scanned files."
@patch
def create_pagination(
self: MediaLibrary,
pagination_id: str, # Unique identifier for this pagination instance
content_id: str, # HTML ID for the content area
preview_route_func = None, # Optional function to generate preview route URL
modal_id: str = "sf-media-preview" # ID for the preview modal
): # Configured Pagination instance
"Create a pagination instance for browsing media files."
@patch
def get_pagination_router(
self: MediaLibrary,
prefix: str # URL prefix for pagination routes
) -> Optional[APIRouter]: # APIRouter for pagination, or None if pagination not created
"Get the pagination router for registration with the app."
@patch
def create_file_selection_pagination(
self: MediaLibrary,
pagination_id: str, # Unique identifier for this pagination instance
content_id: str, # HTML ID for the content area
preview_url_func = None, # Function that takes file index and returns preview URL
preview_target_id: str = None # HTML ID to target for preview modal
): # Configured Pagination instance for file selection
"Create a pagination instance for file selection table with radio buttons."
@patch
def get_file_selection_router(
self: MediaLibrary,
prefix: str # URL prefix for pagination routes
) -> Optional[APIRouter]: # APIRouter for file selection pagination, or None if not created
"Get the file selection pagination router."
Classes
class MediaLibrary:
"Unified interface for media scanning, mounting, and browsing."
def __init__(
self,
config: MediaConfig # Media configuration with directories and settings
)
"Initialize the media library."
@property
def pagination(self):
"""Access to the pagination instance."""
return self._pagination
@property
def pagination_router(self) -> Optional[APIRouter]:
"""Access to the pagination router."""
return self._pagination_router
@property
def file_selection_pagination(self):
"""Access to the file selection pagination instance."""
return getattr(self, '_file_selection_pagination', None)
@property
def file_selection_router(self) -> Optional[APIRouter]:
"Access to the file selection pagination router."
Media Models (models.ipynb)
Data models for media files
Import
from cjm_fasthtml_workflow_transcription_single_file.media.models import (
MediaFile
)
Classes
@dataclass
class MediaFile:
"Represents a discovered media file for display and processing."
path: str # Full path to the file
name: str # Display name of the file
extension: str # File extension (without dot)
size: int # Size in bytes
size_str: str # Human-readable size (e.g., "15.2 MB")
modified: float # Modification timestamp
modified_str: str # Human-readable modification date
media_type: str # 'video' or 'audio'
directory: str # Parent directory path
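As an illustration of how the derived fields relate to the raw ones, the sketch below fills a local copy of the dataclass from a file on disk. The helpers `format_size` and `media_file_from_path` and the date format are assumptions made for this example; the package's own formatting utilities live in `media.utils` and may behave differently.

```python
import tempfile
from dataclasses import dataclass
from datetime import datetime
from pathlib import Path


@dataclass
class MediaFile:
    """Local copy of the fields listed above, so the example is self-contained."""
    path: str
    name: str
    extension: str
    size: int
    size_str: str
    modified: float
    modified_str: str
    media_type: str
    directory: str


def format_size(num_bytes: int) -> str:
    """Hypothetical helper: render a byte count with one decimal place."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def media_file_from_path(path: Path, media_type: str) -> MediaFile:
    """Hypothetical helper: build a MediaFile from filesystem metadata."""
    stat = path.stat()
    return MediaFile(
        path=str(path),
        name=path.name,
        extension=path.suffix.lstrip("."),
        size=stat.st_size,
        size_str=format_size(stat.st_size),
        modified=stat.st_mtime,
        modified_str=datetime.fromtimestamp(stat.st_mtime).strftime("%Y-%m-%d %H:%M"),
        media_type=media_type,
        directory=str(path.parent),
    )


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "talk.mp3"
    sample.write_bytes(b"\0" * 1536)  # 1.5 KB of placeholder data
    example = media_file_from_path(sample, "audio")
```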
Media Mounter (mounter.ipynb)
Mounts media directories as static files for serving through the web server
Import
from cjm_fasthtml_workflow_transcription_single_file.media.mounter import (
MediaMounter
)
Functions
@patch
def mount(
self: MediaMounter,
app, # FastHTML/Starlette application instance
directories: List[str] # List of directory paths to mount
) -> None
"Mount directories to app for static file serving."
@patch
def get_url(
self: MediaMounter,
file_path: str # Full path to the media file
) -> Optional[str]: # URL to access the file, or None if not in a mounted directory
"Get URL for a file based on mounted directories."
@patch
def is_mounted(
self: MediaMounter,
directory: str # Directory path to check
) -> bool: # True if the directory is mounted
"Check if a directory is currently mounted."
@patch
def get_mounted_directories(
self: MediaMounter
) -> List[str]: # List of mounted directory paths
"Get list of currently mounted directories."
@patch
def unmount_all(
self: MediaMounter
) -> None
"Remove all mounts from this instance."
@patch
def _mount_directory(
self: MediaMounter,
app, # FastHTML/Starlette application instance
directory: str # Directory path to mount
) -> None
"Mount a single directory."
@patch
def _generate_prefix(
self: MediaMounter,
directory: str # Directory path
) -> str: # Route prefix string (e.g., "sf_media_abc12345")
"Generate a unique route prefix for a directory using MD5 hash."
@patch
def _remove_existing_mounts(
self: MediaMounter,
app # FastHTML/Starlette application instance
) -> None
"Remove existing mounts matching this mounter's prefix pattern."
Classes
class MediaMounter:
"Mounts directories for static file serving with instance-level state."
def __init__(self):
"""Initialize the mounter with empty state."""
self._mounted: Dict[str, str] = {} # directory -> route_prefix
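The mounter maps each directory to a hash-derived route prefix and resolves file URLs through the `directory -> route_prefix` dictionary. The sketch below shows one plausible implementation of both steps. The eight-character truncation is inferred from the `sf_media_abc12345` example, and `generate_prefix` and `url_for` are hypothetical names; URL quoting of special characters is omitted for brevity.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Dict, Optional


def generate_prefix(directory: str, base: str = "sf_media") -> str:
    """Derive a stable route prefix from the directory path via an MD5 digest."""
    digest = hashlib.md5(directory.encode("utf-8")).hexdigest()
    return f"{base}_{digest[:8]}"  # Assumed: first 8 hex characters


def url_for(file_path: str, mounted: Dict[str, str]) -> Optional[str]:
    """Return the URL for a file, or None if it is not under a mounted directory."""
    for directory, prefix in mounted.items():
        try:
            relative = PurePosixPath(file_path).relative_to(directory)
        except ValueError:
            continue  # Not inside this directory; try the next mount
        return f"/{prefix}/{relative}"
    return None


mounted = {"/data/media": generate_prefix("/data/media")}
url = url_for("/data/media/talks/intro.mp3", mounted)
outside = url_for("/tmp/other.mp3", mounted)
```

Hashing the path keeps prefixes short and URL-safe while remaining deterministic across restarts, so links to media files stay valid as long as the directory path does not change.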
Media Pagination (pagination.ipynb)
Factory function for creating paginated media browser views
Import
from cjm_fasthtml_workflow_transcription_single_file.media.pagination import (
create_media_pagination
)
Functions
def create_media_pagination(
pagination_id: str, # Unique identifier for this pagination instance
scanner: MediaScanner, # MediaScanner instance for loading files
mounter: MediaMounter, # MediaMounter instance for URL generation
items_per_page: int = 30, # Number of items per page
content_id: Optional[str] = None, # HTML ID for content area
preview_route_func = None, # Function to generate preview route URL
modal_id: str = "sf-media-preview" # ID for the preview modal
) -> Pagination: # Configured Pagination instance
"Create a Pagination instance for media browsing."
Processor Component (processor.ipynb)
UI component for displaying transcription in-progress state
Import
from cjm_fasthtml_workflow_transcription_single_file.components.processor import (
transcription_in_progress
)
Functions
def transcription_in_progress(
job_id: str, # Unique identifier for the transcription job
plugin_info: Dict[str, Any], # Dictionary with plugin details (id, title, supports_streaming)
file_info: Dict[str, Any], # Dictionary with file details (name, path, type, size_str)
config: SingleFileWorkflowConfig, # Workflow configuration
router: APIRouter, # Workflow router for generating route URLs
) -> FT: # FastHTML component showing progress and SSE connection
"Render transcription in-progress view with SSE updates."
Protocols (protocols.ipynb)
Protocol definitions for external dependencies and plugin integration
Import
from cjm_fasthtml_workflow_transcription_single_file.core.protocols import (
PluginInfo,
PluginRegistryProtocol,
ResourceManagerProtocol
)
Classes
@dataclass
class PluginInfo:
"Information about a transcription plugin."
id: str # Unique plugin identifier (e.g., "transcription:voxtral_hf")
name: str # Plugin name (e.g., "voxtral_hf")
title: str # Display title (e.g., "Voxtral HF")
is_configured: bool # Whether the plugin has a valid configuration
supports_streaming: bool = False # Whether the plugin supports streaming output
@runtime_checkable
class PluginRegistryProtocol(Protocol):
"Protocol for plugin registry access."
def get_configured_plugins(self) -> List[PluginInfo]: # List of PluginInfo for configured plugins
"Get all configured transcription plugins."
def get_plugin(self,
plugin_id: str # Unique plugin identifier
) -> Optional[PluginInfo]: # PluginInfo if found, None otherwise
"Get a specific plugin by ID."
def get_plugin_config(self,
plugin_id: str # Unique plugin identifier
) -> Dict[str, Any]: # Configuration dictionary, empty dict if not configured
"Get the configuration for a plugin."
@runtime_checkable
class ResourceManagerProtocol(Protocol):
"Protocol for resource availability checks."
def check_gpu_available(self) -> bool: # True if GPU is available and has sufficient memory
"Check if GPU is available for processing."
def get_gpu_memory_usage(self) -> float: # GPU memory usage as a percentage (0-100)
"Get current GPU memory usage percentage."
Results Components (results.ipynb)
UI components for displaying transcription results and errors
Import
from cjm_fasthtml_workflow_transcription_single_file.components.results import (
transcription_results,
transcription_error
)
Functions
def transcription_results(
job_id: str, # Unique identifier for the transcription job
transcription_text: str, # The transcribed text
metadata: Dict[str, Any], # Transcription metadata from the plugin
file_info: Dict[str, Any], # Dictionary with file details (name, path, type, size_str)
plugin_info: Dict[str, Any], # Dictionary with plugin details (id, title, supports_streaming)
config: SingleFileWorkflowConfig, # Workflow configuration
router: APIRouter, # Workflow router for generating route URLs
stepflow_router: APIRouter, # StepFlow router for generating stepflow URLs
) -> FT: # FastHTML component showing results with export options
"Render transcription results with export options."
def transcription_error(
error_message: str, # Description of the error that occurred
file_info: Optional[Dict[str, Any]], # Optional dictionary with file details
config: SingleFileWorkflowConfig, # Workflow configuration
stepflow_router: APIRouter, # StepFlow router for generating stepflow URLs
) -> FT: # FastHTML component showing error with retry option
"Render transcription error message."
Workflow Routes (routes.ipynb)
Route initialization and handlers for the single-file transcription workflow
Import
from cjm_fasthtml_workflow_transcription_single_file.workflow.routes import (
init_router
)
Functions
def init_router(
workflow: SingleFileTranscriptionWorkflow, # The workflow instance providing access to config and dependencies
) -> APIRouter: # Configured APIRouter with all workflow routes
"Initialize and return the workflow's API router with all routes."
def _export_transcription(
text: str, # Transcription text
format: str, # Export format (txt, srt, vtt)
filename: str, # Original filename for metadata
) -> str: # Formatted transcription string
"Format transcription for export."
Media Scanner (scanner.ipynb)
Scans directories for media files with caching support
Import
from cjm_fasthtml_workflow_transcription_single_file.media.scanner import (
MediaScanner
)
Functions
@patch
def _is_cache_valid(
self: MediaScanner
) -> bool: # True if cache exists and hasn't expired
"Check if cache is still valid."
@patch
def clear_cache(
self: MediaScanner
) -> None
"Clear the scan cache."
@patch
def _update_cache(
self: MediaScanner,
files: List[MediaFile] # List of scanned MediaFile objects
) -> None
"Update cache with new scan results."
@patch
def _scan_directories(
self: MediaScanner
) -> List[MediaFile]: # List of MediaFile objects matching the configuration
"Perform actual directory scan."
@patch
def _sort_files(
self: MediaScanner,
files: List[MediaFile] # Files to sort
) -> List[MediaFile]: # Sorted files
"Sort files according to configuration."
@patch
def scan(
self: MediaScanner,
force_refresh: bool = False # Force a fresh scan, ignoring cache
) -> List[MediaFile]: # List of MediaFile objects
"Scan for media files, using cache if valid."
@patch
def get_summary(
self: MediaScanner
) -> Dict[str, Any]: # Dictionary with total count, size, and breakdowns by type/extension
"Get summary statistics for scanned files."
Classes
class MediaScanner:
"Scans directories for media files with instance-level caching."
def __init__(
self,
config: MediaConfig # Media configuration with directories and filters
)
"Initialize the scanner."
Settings Schemas (schemas.ipynb)
JSON schemas and utilities for workflow settings
Import
from cjm_fasthtml_workflow_transcription_single_file.settings.schemas import (
WORKFLOW_SETTINGS_SCHEMA,
get_settings_from_config
)
Functions
def get_settings_from_config(
media_config, # MediaConfig instance with media scanning settings
storage_config, # StorageConfig instance with result storage settings
workflow_config=None # Optional SingleFileWorkflowConfig for additional settings
) -> dict: # Dictionary of current settings values
"Extract settings values from config objects."
Variables
WORKFLOW_SETTINGS_SCHEMA = {5 items}
Step Components (steps.ipynb)
UI components for workflow step rendering (plugin selection, file selection, confirmation)
Import
from cjm_fasthtml_workflow_transcription_single_file.components.steps import (
render_plugin_config_form,
render_plugin_details_route,
render_plugin_selection,
render_file_selection,
render_confirmation
)
Functions
def _get_file_attr(
file_path: str, # Path to the file to look up
media_files: list, # List of MediaFile objects to search
attr: str, # Attribute name to retrieve from the file
) -> str: # Attribute value or empty string if not found
"Get an attribute from a file by path."
def _render_plugin_details_content(
plugin_id: str, # ID of the plugin to display details for
plugins: List[PluginInfo], # List of available plugins
plugin_registry: PluginRegistryProtocol, # Plugin registry adapter for getting plugin config
) -> Optional[FT]: # Plugin info card or None if plugin not found
"Render details for selected plugin (info card only, no config collapse)."
def _render_plugin_details_with_config(
plugin_id: str, # ID of the plugin to display details for
plugins: List[PluginInfo], # List of available plugins
plugin_registry: PluginRegistryProtocol, # Plugin registry adapter for getting plugin config
raw_plugin_registry, # UnifiedPluginRegistry for config_schema access
save_url: str, # URL for saving plugin configuration
reset_url: str, # URL for resetting plugin configuration
) -> Optional[FT]: # Plugin details with config collapse, or None if not found
"Render plugin details with configuration collapse for initial render."
def render_plugin_config_form(
plugin_id: str, # ID of the plugin to render config for
plugin_registry, # UnifiedPluginRegistry with config_schema access
save_url: str, # URL for saving the configuration
reset_url: str, # URL for resetting to defaults
alert_message: Optional[Any] = None, # Optional alert to display above the form
) -> FT: # Div containing the settings form with alert container
"Render the plugin configuration form for the collapse content."
def _render_plugin_config_collapse(
plugin_id: str, # ID of the plugin to render config for
plugin_registry, # UnifiedPluginRegistry with config_schema access
save_url: str, # URL for saving the configuration
reset_url: str, # URL for resetting to defaults
) -> FT: # Collapse component with plugin configuration form
"Render a collapse component containing the plugin configuration form."
def render_plugin_details_route(
plugin_id: str, # ID of the plugin to display details for
plugin_registry: PluginRegistryProtocol, # Plugin registry adapter for getting plugins and config
raw_plugin_registry, # UnifiedPluginRegistry for config_schema access
save_url: str, # URL for saving plugin configuration
reset_url: str, # URL for resetting plugin configuration to defaults
) -> FT: # Plugin details with info card and config collapse
"Render plugin details for HTMX route when plugin dropdown changes."
def render_plugin_selection(
ctx: InteractionContext, # Interaction context with state and data
config: SingleFileWorkflowConfig, # Workflow configuration
plugin_registry: PluginRegistryProtocol, # Plugin registry adapter for getting plugin config
settings_modal_url: str, # URL for the settings modal route
plugin_details_url: str, # URL for the plugin details route
raw_plugin_registry=None, # UnifiedPluginRegistry for config_schema access (optional)
save_plugin_config_url: str = "", # URL for saving plugin configuration
reset_plugin_config_url: str = "", # URL for resetting plugin configuration
) -> FT: # Plugin selection step UI component
"Render plugin selection step showing all discovered plugins."
def render_file_selection(
ctx: InteractionContext, # Interaction context with state and data
config: SingleFileWorkflowConfig, # Workflow configuration
file_selection_router: APIRouter, # Router for file selection pagination (or None)
) -> FT: # File selection step UI component with paginated table
"Render file selection step with paginated table view and preview capability."
def render_confirmation(
ctx: InteractionContext, # Interaction context with state and data
plugin_registry: PluginRegistryProtocol, # Plugin registry adapter for getting plugin info
) -> FT: # Confirmation step UI component showing selected plugin and file
"Render confirmation step showing selected plugin and file."
Media Utilities (utils.ipynb)
Formatting utilities for media files
Import
from cjm_fasthtml_workflow_transcription_single_file.media.utils import (
format_file_size,
format_timestamp,
matches_patterns
)
Functions
def format_file_size(
size_bytes: int # Size in bytes
) -> str: # Human-readable size string (e.g., "15.2 MB")
"Format file size in human-readable format."
def format_timestamp(
timestamp: float # Unix timestamp
) -> str: # Human-readable date string
"Format timestamp to human-readable date with relative time for recent files."
def matches_patterns(
path: str, # File path to check
patterns: List[str] # List of glob patterns to match against
) -> bool: # True if path matches any pattern
"Check if path matches any of the exclude patterns."
Single File Transcription Workflow (workflow.ipynb)
Main workflow class orchestrating all subsystems for single-file transcription
Import
from cjm_fasthtml_workflow_transcription_single_file.workflow.workflow import (
SingleFileTranscriptionWorkflow
)
Functions
@patch
def setup(
self: SingleFileTranscriptionWorkflow,
app, # FastHTML application instance
) -> None
"Initialize workflow with FastHTML app. Must be called after app creation."
@patch
def _ensure_plugin_configs_exist(
self: SingleFileTranscriptionWorkflow,
) -> None
"""
Ensure all discovered plugins have config files.
For plugins without saved config files, creates a config file with
default values from the plugin's schema. Required because workers
only load plugins that have config files.
"""
@patch
def get_routers(
self: SingleFileTranscriptionWorkflow,
) -> List[APIRouter]: # List containing main router, stepflow router, media router, and file selection router
"Return all routers for registration with the app."
@patch
def render_entry_point(
self: SingleFileTranscriptionWorkflow,
request, # FastHTML request object
sess, # FastHTML session object
) -> FT: # AsyncLoadingContainer component
"""
Render the workflow entry point for embedding in tabs, etc.
Returns an AsyncLoadingContainer that loads the current_status endpoint,
which determines what to show (running job, workflow in progress,
completed job, or fresh start).
"""
@patch
def _on_job_completed(
"Workflow-specific completion handling. Auto-saves results if enabled."
@patch
def _create_preview_route_func(
self: SingleFileTranscriptionWorkflow,
): # Function that generates preview route URLs
"Create a function that generates preview route URLs (with optional media_type)."
@patch
def _create_preview_url_func(
self: SingleFileTranscriptionWorkflow,
): # Function that generates preview URLs for file selection
"Create a function that generates preview URLs for file selection (index only)."
@patch
def _create_step_flow(
self: SingleFileTranscriptionWorkflow,
) -> StepFlow: # Configured StepFlow instance
"Create and configure the StepFlow instance."
@patch
def _create_router(
self: SingleFileTranscriptionWorkflow,
) -> APIRouter: # Configured APIRouter with all workflow routes
"Create the workflow's API router with all routes."
Classes
class SingleFileTranscriptionWorkflow:
"""
Self-contained single-file transcription workflow.
Creates and manages internal UnifiedPluginRegistry, ResourceManager,
TranscriptionJobManager, SSEBroadcastManager, MediaLibrary, ResultStorage,
StepFlow (plugin → file → confirm wizard), and APIRouter.
"""
def __init__(
self,
config: Optional[SingleFileWorkflowConfig] = None, # Workflow configuration including media and storage settings
)
"Initialize the workflow."
@property
def transcription_manager(self) -> TranscriptionJobManager
"Access to internal transcription manager."
@property
def plugin_registry(self) -> PluginRegistryAdapter
"Access to plugin registry adapter."
@property
def media_library(self) -> MediaLibrary
"Access to internal media library."
@property
def result_storage(self) -> ResultStorage
"Access to internal result storage."
@property
def router(self) -> APIRouter
"Main workflow router."
@property
def stepflow_router(self) -> APIRouter
"StepFlow-generated router."
File details
Details for the file cjm_fasthtml_workflow_transcription_single_file-0.0.3.tar.gz.
File metadata
- Download URL: cjm_fasthtml_workflow_transcription_single_file-0.0.3.tar.gz
- Upload date:
- Size: 78.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 96af9b728a3625656f45e1b4ff1b3a1ac5f2f19ad79810abb1270e192c9b8c01 |
| MD5 | 793f41fd337321af0d14539d75a6d276 |
| BLAKE2b-256 | f70c0833a2753cec1ed6cbff91640d8f9c9c0cb3fa93a2f092f5a0a6895e19c5 |
File details
Details for the file cjm_fasthtml_workflow_transcription_single_file-0.0.3-py3-none-any.whl.
File metadata
- Download URL: cjm_fasthtml_workflow_transcription_single_file-0.0.3-py3-none-any.whl
- Upload date:
- Size: 72.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 38a736d15284399bae988b66e6c80a49a61cfe298314050a8526afa26e4ccdfd |
| MD5 | edaf79e8d909027ee1df23b005f276cb |
| BLAKE2b-256 | 98d3292f28d1eaa5ed04be6e538ff9f81adf757da4efc21907c8e74f8340a32d |