GENSHI Works STT SDK — high-accuracy domain-specific speech-to-text

GENSHI Works STT SDK

High-accuracy domain-specific speech-to-text SDK. Supports batch transcription and realtime streaming with built-in VAD and on-device STT inference.

Installation

Python

pip install genshiai-stt

Node.js

npm install @genshiai/stt

The correct native addon for your platform is installed automatically via optional dependencies.

Browser

npm install @genshiai/stt-web

Quick Start

Python

import asyncio

from genshi_stt import GenshiSTTClient

async def main() -> None:
    async with GenshiSTTClient(api_key="gw-...", secure=True) as client:
        with open("recording.wav", "rb") as f:
            result = await client.transcribe(
                f.read(),
                model="genshi-stt-v1-pro",
                domain="medical",
            )
        print(result.text)
        for seg in result.segments:
            print(f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text}")

        async with client.stream(
            model="genshi-stt-v1-pro-plus",
            effort="normal",
            dictionary_ids=["dict_hospital"],
        ) as session:
            # audio_chunk: raw PCM16 bytes supplied by your own capture code
            partials = await session.push(audio_chunk)
            print(partials[0].text if partials else "")

            refined = await session.drain_events()
            for event in refined:
                if event.type == "refined":
                    print(event.index, event.text)

            final = await session.finalize()
            print(final.text)

asyncio.run(main())
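
The streaming example pushes one PCM16 chunk at a time. A minimal sketch of cutting a longer PCM16 buffer into fixed-duration chunks is shown below. The helper name `iter_pcm16_chunks`, the 16 kHz mono sample rate, and the 100 ms chunk length are assumptions for illustration; this README does not state the expected sample rate or chunk size.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # assumption: not specified in this README
BYTES_PER_SAMPLE = 2     # PCM16 is 2 bytes per sample (mono assumed)


def iter_pcm16_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of `pcm`, each about `chunk_ms` long.

    The last slice may be shorter than the others.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    # Whole samples per chunk, then converted to bytes.
    chunk_bytes = (SAMPLE_RATE_HZ * chunk_ms // 1000) * BYTES_PER_SAMPLE
    if chunk_bytes == 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for offset in range(0, len(pcm), chunk_bytes):
        yield pcm[offset:offset + chunk_bytes]


# One second of silence at the assumed format.
one_second = b"\x00" * (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(iter_pcm16_chunks(one_second))
```

Inside a streaming session, each yielded chunk would be passed to `await session.push(chunk)`.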

Node.js / TypeScript

import { GenshiSTTClient } from '@genshiai/stt';

const client = new GenshiSTTClient({ apiKey: 'gw-...', secure: true });

// Batch transcription
const result = await client.transcribe(audioBuffer, {
  model: 'genshi-stt-v1-pro',
  domain: 'medical',
});
console.log(result.text);
for (const seg of result.segments) {
  console.log(`[${seg.start.toFixed(2)}-${seg.end.toFixed(2)}] ${seg.text}`);
}

// Realtime streaming
const session = client.stream({
  model: 'genshi-stt-v1-pro-plus',
  effort: 'normal',
  dictionaryIds: ['dict_hospital'],
});
const partials = await session.push(pcm16Chunk);
console.log(partials[0]?.text);

const refined = await session.drainEvents();
for (const event of refined) {
  if (event.type === 'refined') {
    console.log(event.index, event.text);
  }
}

const final = await session.finalize();
console.log(final.text);

Browser

import { GenshiSTTClient, createMicStream } from '@genshiai/stt-web';

const client = new GenshiSTTClient({ apiKey: 'gw-...', secure: true });
await client.init();

const session = client.stream({
  model: 'genshi-stt-v1-pro-plus',
  effort: 'normal',
  dictionaryIds: ['dict_hospital'],
});

const mic = await createMicStream({
  onChunk: async (chunk) => {
    const partials = await session.push(chunk);
    console.log(partials[0]?.text);

    const refined = await session.drainEvents();
    for (const event of refined) {
      if (event.type === 'refined') {
        console.log(event.index, event.text);
      }
    }
  },
});

// When done:
mic.stop();
const result = await session.finalize();
console.log(result.text);

Prefer await session.finalize() when you need the final corrected text. await session.close() performs a best-effort finalize as part of cleanup. Use session.abort() only when you intentionally want to force-abort a session; it skips the finalize pass, so no finalize is billed.

Choosing A Mode

| Mode | During recording | Correction cadence | Recommended for |
| --- | --- | --- | --- |
| batch | Nothing is emitted until the request finishes | One final full-text pass | File upload, post-processing |
| realtime + effort="normal" | Partial text appears immediately | Background correction is sparse | Dictation, meeting notes, standard live input |
| realtime + effort="high" | Partial text appears immediately | Background correction is more frequent | Live captions, simultaneous charting, terminology-sensitive input |

Realtime Mental Model

  • push() returns immediate partial events from local STT
  • drain_events() / drainEvents() returns queued refined / error events from background correction
  • effort: "normal" batches corrections sparsely, effort: "high" refines more often
  • finalize() still performs the final full-text correction pass

Public SDK configuration is intentionally limited to model, an optional domain, either dictionaryIds or dictionaries="bound", and effort. The secure flag (secure=True in Python, secure: true in JavaScript) is honored only when the API key's policy is flexible: normal keys ignore it, and secure-only keys force secure mode on every request. Set the policy when you create the key in Console. Low-level VAD and local model tuning are not part of the public API.
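
The key-policy rule for the secure flag can be sketched as a small decision function. This is an illustration only: the function name `effective_secure` and the policy strings `"flexible"`, `"normal"`, and `"secure-only"` are labels chosen here to mirror the wording above, not confirmed SDK or Console identifiers.

```python
def effective_secure(key_policy: str, requested: bool) -> bool:
    """Return whether a request would run in secure mode.

    key_policy: hypothetical label for the policy set on the API key.
    requested:  the value passed as secure=True / secure: true.
    """
    if key_policy == "flexible":
        return requested   # the flag is honored as given
    if key_policy == "normal":
        return False       # the flag is ignored
    if key_policy == "secure-only":
        return True        # secure mode is forced on every request
    raise ValueError(f"unknown key policy: {key_policy!r}")


# Every combination of policy and requested flag.
outcomes = {
    (policy, flag): effective_secure(policy, flag)
    for policy in ("flexible", "normal", "secure-only")
    for flag in (False, True)
}
```

Since secure requests are billed at a different rate, a check like this may help predict which rate applies to a given key.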

Realtime Event Example

push() returns a partial event:

{
  "type": "partial",
  "text": "ほんじつのけつあつは130の80です。",
  "index": 0,
  "processing_time_ms": 0
}

drain_events() / drainEvents() returns a refined event for the same segment:

{
  "type": "refined",
  "text": "本日の血圧は130の80です。",
  "index": 0,
  "processing_time_ms": 88
}
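
One way to combine the two event streams is to key the text by index and let a refined event replace the partial text for the same segment. The sketch below works on plain dicts shaped like the JSON above; the `LiveTranscript` class is hypothetical, and the rule that a later partial never overwrites refined text is a design assumption, not documented SDK behavior.

```python
from dataclasses import dataclass, field


@dataclass
class LiveTranscript:
    """Keeps the best known text per segment index."""

    segments: dict[int, str] = field(default_factory=dict)
    refined: set[int] = field(default_factory=set)

    def apply(self, event: dict) -> None:
        index = event["index"]
        if event["type"] == "refined":
            # Refined text always replaces what is stored for this index.
            self.segments[index] = event["text"]
            self.refined.add(index)
        elif event["type"] == "partial" and index not in self.refined:
            # Partial text is kept only until a refinement arrives.
            self.segments[index] = event["text"]
        # Other event types (for example "error") are ignored here.

    def text(self) -> str:
        return "".join(self.segments[i] for i in sorted(self.segments))


transcript = LiveTranscript()
transcript.apply({"type": "partial", "text": "ほんじつのけつあつは130の80です。", "index": 0})
transcript.apply({"type": "refined", "text": "本日の血圧は130の80です。", "index": 0})
```

In a live session the events from push() and drain_events() / drainEvents() would be fed to `apply()` in arrival order, and `text()` would give the current display text.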

Response

{
  "text": "本日の血圧は130の80です。次の患者さんをお願いします。",
  "processing_time_ms": 142,
  "segments": [
    {
      "id": 0,
      "start": 0.32,
      "end": 2.15,
      "text": "本日の血圧は130の80です。"
    },
    {
      "id": 1,
      "start": 3.2,
      "end": 4.8,
      "text": "次の患者さんをお願いします。"
    }
  ]
}
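
As a rough illustration of consuming this response shape, the sketch below parses the JSON and prints each segment with its time range, matching the format used in the Quick Start. It reads the payload as a plain dict; the helper names `format_segments` and `speech_seconds` are hypothetical and not part of the SDK.

```python
import json

RESPONSE_JSON = """
{
  "text": "本日の血圧は130の80です。次の患者さんをお願いします。",
  "processing_time_ms": 142,
  "segments": [
    {"id": 0, "start": 0.32, "end": 2.15, "text": "本日の血圧は130の80です。"},
    {"id": 1, "start": 3.2, "end": 4.8, "text": "次の患者さんをお願いします。"}
  ]
}
"""


def format_segments(response: dict) -> list[str]:
    """Render each segment as '[start-end] text' with two decimals."""
    return [
        f"[{seg['start']:.2f}-{seg['end']:.2f}] {seg['text']}"
        for seg in response["segments"]
    ]


def speech_seconds(response: dict) -> float:
    """Total duration covered by segments, excluding the gaps between them."""
    return sum(seg["end"] - seg["start"] for seg in response["segments"])


response = json.loads(RESPONSE_JSON)
lines = format_segments(response)
for line in lines:
    print(line)
```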

Pricing

Billing is point-based: 1 pt = ¥10, charged per hour of audio.

| Model | Standard billing | Secure billing |
| --- | --- | --- |
| genshi-stt-v1-lite | 2 pt/h (¥20) | 5 pt/h (¥50) |
| genshi-stt-v1-standard | 6 pt/h (¥60) | 9 pt/h (¥90) |
| genshi-stt-v1-pro | 10 pt/h (¥100) | 13 pt/h (¥130) |
| genshi-stt-v1-pro-plus | 13 pt/h (¥130) | 15 pt/h (¥150) |
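
To make the arithmetic concrete, here is a small estimator built from the rates in the table. It assumes cost scales linearly with audio duration and applies no rounding or minimum charge; the README does not describe billing granularity, so treat the result as an estimate. The function name `estimate_cost` is hypothetical.

```python
# Rates copied from the pricing table: model -> (standard pt/h, secure pt/h).
POINTS_PER_HOUR = {
    "genshi-stt-v1-lite": (2, 5),
    "genshi-stt-v1-standard": (6, 9),
    "genshi-stt-v1-pro": (10, 13),
    "genshi-stt-v1-pro-plus": (13, 15),
}
YEN_PER_POINT = 10  # 1 pt = ¥10


def estimate_cost(model: str, audio_seconds: float, secure: bool = False) -> tuple[float, float]:
    """Return (points, yen) for the given audio duration.

    Assumes linear proration by the second, which is not confirmed
    by the pricing section.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    standard_rate, secure_rate = POINTS_PER_HOUR[model]
    rate = secure_rate if secure else standard_rate
    points = rate * audio_seconds / 3600
    return points, points * YEN_PER_POINT


# 30 minutes of audio on the pro model with standard billing.
points, yen = estimate_cost("genshi-stt-v1-pro", 30 * 60)
```

Under these assumptions, 30 minutes on genshi-stt-v1-pro comes to half of 10 pt/h, that is 5 pt or ¥50.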

Validation matrix:

  • lite: domain not allowed, dictionaries not allowed
  • standard: domain not allowed, dictionaries not allowed
  • pro: non-general domain required, dictionaries not allowed
  • pro-plus: either dictionaryIds or dictionaries="bound" required; specifying both is not allowed
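
The matrix can be expressed as a client-side pre-check, sketched below with the model names from the pricing table. This is one reading of the rules, not the server's actual validation: it assumes any domain value is rejected for the lite and standard models, that "general" is the name of the general domain, and it leaves domain unchecked for pro-plus because the matrix does not mention it. The function name `validate_request` is hypothetical.

```python
from typing import Optional, Sequence


def validate_request(
    model: str,
    domain: Optional[str] = None,
    dictionary_ids: Optional[Sequence[str]] = None,
    dictionaries: Optional[str] = None,
) -> list[str]:
    """Return a list of problems; an empty list means the combination passes."""
    errors: list[str] = []
    tier = model.removeprefix("genshi-stt-v1-")
    has_ids = bool(dictionary_ids)
    has_bound = dictionaries == "bound"

    if tier in ("lite", "standard"):
        if domain is not None:
            errors.append(f"{tier}: domain is not allowed")
        if has_ids or has_bound:
            errors.append(f"{tier}: dictionaries are not allowed")
    elif tier == "pro":
        if domain is None or domain == "general":
            errors.append("pro: a non-general domain is required")
        if has_ids or has_bound:
            errors.append("pro: dictionaries are not allowed")
    elif tier == "pro-plus":
        # Exactly one of the two dictionary options must be given.
        if has_ids == has_bound:
            errors.append('pro-plus: give either dictionary ids or dictionaries="bound", not both or neither')
    else:
        errors.append(f"unknown model: {model}")
    return errors


ok = validate_request("genshi-stt-v1-pro", domain="medical")
bad = validate_request("genshi-stt-v1-lite", domain="medical")
```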

See https://docs.genshi.ai/stt/pricing for details.

Supported Platforms

| Platform | Python | Node.js |
| --- | --- | --- |
| macOS ARM64 (Apple Silicon) | genshiai-stt-native | @genshiai/stt-native-darwin-arm64 |
| Linux x64 | genshiai-stt-native | @genshiai/stt-native-linux-x64 |
| Windows x64 | genshiai-stt-native | @genshiai/stt-native-windows-x64 |
| Browser | - | @genshiai/stt-web |

Requirements

  • Python >= 3.10 / Node.js >= 20
  • Valid GENSHI Works API key
  • ffmpeg, used to decode file/bytes input in Python and encoded audio in Node.js

Browser SDK note:

  • await client.init() is required before calling transcribe() or stream()
  • The npm package includes JSON metadata; secured ONNX assets are fetched via POST /v1/activate

Documentation

Full documentation: https://docs.genshi.ai/stt

License

Proprietary. Copyright (c) 2026 GENSHI Works Inc. All rights reserved. See LICENSE.