An OpenAI Whisper speech-to-text capability for the cjm-substrate runtime that provides local transcription with configurable model selection and parameter control.

cjm-capability-whisper

Install

pip install cjm_capability_whisper

Project Structure

nbs/
└── capability.ipynb # Capability implementation for OpenAI Whisper transcription

Total: 1 notebook across 1 directory

Module Dependencies

graph LR
    capability[capability<br/>Whisper Capability]

No cross-module dependencies detected.

CLI Reference

No CLI commands found in this project.

Module Overview

Detailed documentation for each module in the project:

Whisper Capability (`capability.ipynb`)

Capability implementation for OpenAI Whisper transcription

Import

from cjm_capability_whisper.capability import (
    WHISPER_AVAILABLE,
    WhisperCapabilityConfig,
    WhisperLocalCapability
)
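A minimal usage sketch, assuming the package and its Whisper dependency are installed. The call order (`initialize` with a dict, then `transcribe` with a path, then `cleanup`) follows the signatures documented below. The helper name `transcribe_file`, the `"model": "base"` value, and the assumption that a partial dict is accepted are illustrative only; in practice the cjm-substrate runtime is expected to drive these calls rather than user code.

```python
from pathlib import Path
from typing import Any, Union


def transcribe_file(audio_path: Union[str, Path], model: str = "base") -> Any:
    """Hypothetical helper: run one transcription with a chosen model name."""
    # Imported lazily so this sketch can be defined without the package installed.
    from cjm_capability_whisper.capability import (
        WHISPER_AVAILABLE,
        WhisperLocalCapability,
    )

    if not WHISPER_AVAILABLE:
        raise RuntimeError("Whisper is not available in this environment")

    capability = WhisperLocalCapability()
    # Assumption: a dict with a subset of config fields is accepted.
    capability.initialize({"model": model})
    try:
        # The audio must already be model-ready (converted upstream).
        return capability.transcribe(audio_path)
    finally:
        capability.cleanup()  # release the model when done
```

The return value is the typed `TranscriptionResult` described under Classes; its attributes are not listed in this README, so the sketch returns it untouched.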

Functions

@patch
def _apply_config(
    self:WhisperLocalCapability,
    config: Optional[Any] = None # Configuration dataclass, dict, or None
) -> None
    """
    CR-4: apply config values + derive config-dependent state (device,
    model_dir). No heavy-resource work. Called by initialize (first-time) and by
    the substrate's reconfigure delta path. Model release on a model/device/
    model_dir/compile_model change is handled declaratively via RELOAD_TRIGGER
    -> _release_model (fired by the substrate BEFORE this re-applies config).
    """

@patch
def _release_model(self:WhisperLocalCapability) -> None
    """
    CR-4: release the loaded model + free CUDA cache. RELOAD_TRIGGER target for
    model/device/model_dir/compile_model; on_disable / cleanup delegate here.
    Idempotent via cjm-substrate-torch-utils' release_model (no-op when already None).
    """

@patch
def _load_model(self:WhisperLocalCapability) -> None
    "Load the Whisper model (lazy loading; no-op when a model is already loaded)."

@patch
def _prepare_audio(
    self:WhisperLocalCapability,
    audio: Union[str, Path] # Path to a decodable audio file
) -> str: # The audio file path
    """
    Validate the audio input and return it as a path string.
    
    The caller (orchestration / proxy) guarantees a model-ready audio file;
    in-memory preparation is no longer a capability responsibility.
    """

@patch
def is_available(self:WhisperLocalCapability) -> bool: # True if Whisper and its dependencies are available
    "Check if Whisper is available."

@patch
def prefetch(self:WhisperLocalCapability) -> None
    """
    CR-4 (SG-19): eagerly load the model so the first execute() doesn't pay
    the download/load cost. Idempotent via _load_model's None-guard.
    """

@patch
def on_disable(self:WhisperLocalCapability) -> None
    """
    CR-2: release the GPU model when the operator disables the capability (the
    worker stays alive); lazy reload on the next execute after re-enable.
    """

@patch
def cleanup(self:WhisperLocalCapability) -> None
    "Release resources on unload."

Classes

@dataclass
class WhisperCapabilityConfig:
    "Configuration for Whisper transcription capability."
    
    model: str = field(...)
    device: str = field(...)
    language: Optional[str] = field(...)
    task: str = field(...)
    temperature: float = field(...)
    temperature_increment_on_fallback: Optional[float] = field(...)
    beam_size: int = field(...)
    best_of: int = field(...)
    patience: float = field(...)
    length_penalty: Optional[float] = field(...)
    suppress_tokens: str = field(...)
    initial_prompt: Optional[str] = field(...)
    condition_on_previous_text: bool = field(...)
    fp16: bool = field(...)
    compression_ratio_threshold: float = field(...)
    logprob_threshold: float = field(...)
    no_speech_threshold: float = field(...)
    word_timestamps: bool = field(...)
    prepend_punctuations: str = field(...)
    append_punctuations: str = field(...)
    threads: int = field(...)
    model_dir: Optional[str] = field(...)
    compile_model: bool = field(...)
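The field definitions above are elided as `field(...)`, so the actual defaults and metadata are not shown here. As a rough idea of how a dataclass with field metadata can be turned into a JSON Schema for UI generation (which `get_config_schema` below does through `dataclass_to_jsonschema`), here is a standalone sketch. `DemoConfig`, its defaults and metadata keys, and `sketch_jsonschema` are all hypothetical and do not reflect the package's real values or the real helper's output.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict

# Map plain Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


@dataclass
class DemoConfig:
    # Hypothetical fields and defaults, for illustration only.
    model: str = field(default="base", metadata={"title": "Model", "enum": ["tiny", "base"]})
    beam_size: int = field(default=5, metadata={"title": "Beam size", "minimum": 1})
    word_timestamps: bool = field(default=False, metadata={"title": "Word timestamps"})


def sketch_jsonschema(config_cls: type) -> Dict[str, Any]:
    """Build a small JSON Schema dict from a dataclass with simple field types."""
    properties: Dict[str, Any] = {}
    for f in fields(config_cls):
        prop: Dict[str, Any] = {"type": _JSON_TYPES[f.type], "default": f.default}
        prop.update(f.metadata)  # copy UI hints such as title, enum, minimum
        properties[f.name] = prop
    return {"type": "object", "title": config_cls.__name__, "properties": properties}
```

This sketch handles only `str`, `int`, `float`, and `bool` annotations; `Optional[...]` fields like several in `WhisperCapabilityConfig` would need extra handling.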

class WhisperLocalCapability:
    """
    OpenAI Whisper transcription capability (stage 8: pure-compute tool capability).
    
    Native-surface model (PILLAR 1c): this tool is PURE COMPUTE. `transcribe`
    loads the model, runs inference, and builds the typed `TranscriptionResult`.
    The cache-check + persistence bookends + the per-call `force` control live in
    the generic transcription adapter (cjm-transcription-adapter-interface); the
    result DTO lives in cjm-capability-primitives; identity is derived from the
    installed distribution. No `get_plugin_metadata`, no `self.storage`.
    """
    
    def __init__(self):
        "Initialize the Whisper capability with default configuration."
        self.logger = logging.getLogger(f"{__name__}.{type(self).__name__}")
        self.config: WhisperCapabilityConfig = None
    
    def name(self) -> str: # Capability name identifier
        """
        Capability identity, derived from the installed distribution (PILLAR 1c).
        
        Runtime-derived: in the worker / in-env introspection `__package__`
        resolves; the manifest records the same value independently (the
        dual-mode generator reads it from the distribution).
        """
        from importlib.metadata import metadata, packages_distributions
        dist = (packages_distributions().get(__package__) or [__package__.replace("_", "-")])[0]
        return metadata(dist)["Name"]
    
    @property
    def version(self) -> str: # Capability version string
        "Get the capability version string."
        from cjm_capability_whisper import __version__
        return __version__
    
    def get_current_config(self) -> Dict[str, Any]: # Current configuration as dictionary
        "Return current configuration state."
    
    def get_config_schema(self) -> Dict[str, Any]: # JSON Schema for configuration
        "Return JSON Schema for UI generation."
        return dataclass_to_jsonschema(WhisperCapabilityConfig)
    
    @staticmethod
    def get_config_dataclass() -> WhisperCapabilityConfig: # Configuration dataclass
        "Return dataclass describing the capability's configuration options."
        return WhisperCapabilityConfig
    
    def initialize(
        self,
        config: Optional[Any] = None # Configuration dataclass, dict, or None
    ) -> None:
        """
        First-time setup. CR-4: the manual model/device diff-and-reload is replaced
        by declarative RELOAD_TRIGGER metadata; the substrate's reconfigure path fires
        _release_model then re-applies config via _apply_config.
        """
    
    def transcribe(
        self,
        audio: Union[str, Path], # Path to MODEL-READY audio (converted upstream)
        **kwargs # Provenance (source_start_time/source_end_time) stamped into metadata
    ) -> TranscriptionResult: # Typed transcription output
        """
        Transcribe model-ready audio using Whisper: PURE COMPUTE.
        
        Stage 8 / PILLAR 1c: the cache-check + persistence bookends moved to the
        generic transcription adapter; this method loads the model, runs
        inference, and builds the typed result. Model params come from
        `self.config` (the CR-15 per-call override path is gone; the tool runs
        its effective config, no metadata lie); `source_start_time` /
        `source_end_time` ride the provenance kwarg channel into metadata.
        """
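The provenance channel described for `transcribe` (only `source_start_time` / `source_end_time` are taken from `**kwargs` and stamped into result metadata) can be sketched as follows. `SketchResult` is a hypothetical stand-in for `TranscriptionResult`, whose real fields live in cjm-capability-primitives and are not listed in this README; `build_result` is likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# The two provenance keys named in the transcribe docstring.
PROVENANCE_KEYS = ("source_start_time", "source_end_time")


@dataclass
class SketchResult:
    """Hypothetical stand-in for the typed transcription result."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


def build_result(text: str, **kwargs: Any) -> SketchResult:
    """Copy only the known provenance kwargs into the result metadata."""
    provenance = {key: kwargs[key] for key in PROVENANCE_KEYS if key in kwargs}
    return SketchResult(text=text, metadata=provenance)
```

Filtering on a fixed key list keeps unrelated keyword arguments out of the metadata, so callers cannot use this channel to override model parameters.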

Variables

WHISPER_AVAILABLE
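`WHISPER_AVAILABLE` is listed without a description; the name and `is_available` above suggest a module-level flag recording whether the Whisper dependency can be imported. One common way to compute such a flag is sketched below. The helper `module_available` and the assumption that the dependency's import name is `whisper` are illustrative, not taken from the package source.

```python
import importlib.util


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical equivalent of a module-level availability flag.
WHISPER_AVAILABLE = module_available("whisper")
```

Checking with `find_spec` avoids importing a heavy dependency just to learn whether it exists.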

Release history

0.0.42 (Jun 21, 2026)

0.0.41 (Jun 20, 2026), this version

Download files

Source Distribution

cjm_capability_whisper-0.0.41.tar.gz (15.8 kB)

Uploaded Jun 20, 2026 Source

Built Distribution

cjm_capability_whisper-0.0.41-py3-none-any.whl (16.5 kB)

Uploaded Jun 20, 2026 Python 3

File details

Details for the file cjm_capability_whisper-0.0.41.tar.gz.

File metadata

Download URL: cjm_capability_whisper-0.0.41.tar.gz
Upload date: Jun 20, 2026
Size: 15.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for cjm_capability_whisper-0.0.41.tar.gz
Algorithm	Hash digest
SHA256	`75d26e90a8744cc3d389d0744224433d8e84dd232e1c4d449564103287efb1ae`
MD5	`b21d2f86f27f8f941edc79c596c412cd`
BLAKE2b-256	`61f14db48f560c4c96f3533d071cb52941a615faa35a6ecc1b63b8329eae62e2`

File details

Details for the file cjm_capability_whisper-0.0.41-py3-none-any.whl.

File metadata

Download URL: cjm_capability_whisper-0.0.41-py3-none-any.whl
Upload date: Jun 20, 2026
Size: 16.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for cjm_capability_whisper-0.0.41-py3-none-any.whl
Algorithm	Hash digest
SHA256	`13a5d3d0ccb8f6cbe1a411676df2ba34bfa8f74f7eb7f9c0ae2500a3b4cad9a9`
MD5	`613cc54a9240e058122426b2c26f3d63`
BLAKE2b-256	`5106139af83e3d51d246ef86e358fa799bec834e55278449302f5bfc434c8ed8`
