GENSHI Works STT SDK — high-accuracy domain-specific speech-to-text

GENSHI Works STT SDK

High-accuracy domain-specific speech-to-text SDK. Supports batch transcription and realtime streaming with built-in VAD and on-device STT inference.

Installation

Python

pip install genshiai-stt

Node.js

npm install @genshiai/stt

The correct native addon for your platform is installed automatically via optional dependencies.

Browser

npm install @genshiai/stt-web

Quick Start

Python

import asyncio

from genshi_stt import GenshiSTTClient

async def main() -> None:
    async with GenshiSTTClient(api_key="gw-...") as client:
        with open("recording.wav", "rb") as f:
            result = await client.transcribe(f.read())
        print(result.text)
        for seg in result.segments:
            print(f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text}")

        async with client.stream(
            domain="medical",
            secure=True,
            effort="normal",
            dictionary_ids=["dict_hospital"],
            dictionaries=[
                {
                    "id": "dict_hospital",
                    "name": "院内用語",
                    "industry": "medical",
                    "terms": [{"term": "GENSHI AI", "reading": "げんしえーあい"}],
                }
            ],
        ) as session:
            audio_chunk = bytes(3200)  # placeholder: silent PCM16 bytes; pass real captured audio
            partials = await session.push(audio_chunk)
            print(partials[0].text if partials else "")

            refined = await session.drain_events()
            for event in refined:
                if event.type == "refined":
                    print(event.index, event.text)

            final = await session.finalize()
            print(final.text)

asyncio.run(main())

Node.js / TypeScript

import { GenshiSTTClient } from '@genshiai/stt';

const client = new GenshiSTTClient({ apiKey: 'gw-...' });

// Batch transcription
const result = await client.transcribe(audioBuffer);
console.log(result.text);
for (const seg of result.segments) {
  console.log(`[${seg.start.toFixed(2)}-${seg.end.toFixed(2)}] ${seg.text}`);
}

// Realtime streaming
const session = client.stream({
  domain: 'medical',
  secure: true,
  effort: 'normal',
  dictionaryIds: ['dict_hospital'],
  dictionaries: [
    {
      id: 'dict_hospital',
      name: '院内用語',
      industry: 'medical',
      terms: [{ term: 'GENSHI AI', reading: 'げんしえーあい' }],
    },
  ],
});
const partials = await session.push(pcm16Chunk);
console.log(partials[0]?.text);

const refined = await session.drainEvents();
for (const event of refined) {
  if (event.type === 'refined') {
    console.log(event.index, event.text);
  }
}

const final = await session.finalize();
console.log(final.text);

Browser

import { GenshiSTTClient, createMicStream } from '@genshiai/stt-web';

const client = new GenshiSTTClient({ apiKey: 'gw-...' });
await client.init();

const session = client.stream({
  domain: 'medical',
  secure: true,
  effort: 'normal',
  dictionaryIds: ['dict_hospital'],
  dictionaries: [
    {
      id: 'dict_hospital',
      name: '院内用語',
      industry: 'medical',
      terms: [{ term: 'GENSHI AI', reading: 'げんしえーあい' }],
    },
  ],
});

const mic = await createMicStream({
  onChunk: async (chunk) => {
    const partials = await session.push(chunk);
    console.log(partials[0]?.text);

    const refined = await session.drainEvents();
    for (const event of refined) {
      if (event.type === 'refined') {
        console.log(event.index, event.text);
      }
    }
  },
});

// When done:
mic.stop();
const result = await session.finalize();
console.log(result.text);

Prefer await session.finalize() when you need the final corrected text. await session.close() now performs a best-effort finalize as part of cleanup. Use session.abort() only to force-abort a session intentionally; it skips the billed finalize step.

Choosing A Mode

Mode	During recording	Correction cadence	Recommended for
`batch`	Nothing is emitted until the request finishes	One final full-text pass	File upload, post-processing
`realtime` + `effort="normal"`	`partial` text appears immediately	Background correction is sparse	Dictation, meeting notes, standard live input
`realtime` + `effort="high"`	`partial` text appears immediately	Background correction is more frequent	Live captions, simultaneous charting, terminology-sensitive input

Realtime Mental Model

push() returns immediate partial events from local STT.
drain_events() / drainEvents() returns queued refined / error events from background correction.
effort: "normal" batches corrections sparsely; effort: "high" refines more often.
finalize() still performs the final full-text correction pass, regardless of effort.

secure=True / secure: true requests the Secure tier. Secure requests require an industry domain such as medical, or inline custom dictionaries. Public SDK configuration is intentionally centered on domain, secure, dictionaryIds / dictionaries, and effort. Low-level VAD and local model tuning are not part of the public API.
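The Secure rule above can be checked on the client side before a session is opened. The following is a minimal sketch under stated assumptions: check_secure_request is a hypothetical helper (not part of the SDK), and it assumes that a missing domain or the value "general" does not count as an industry domain.

```python
from typing import Optional, Sequence


def check_secure_request(
    secure: bool,
    domain: Optional[str] = None,
    dictionaries: Sequence[dict] = (),
) -> Optional[str]:
    """Return an error message if the options break the Secure rule, else None."""
    if not secure:
        # The rule only applies to Secure-tier requests.
        return None
    # Assumption: no domain, or "general", is not an industry domain.
    has_industry_domain = domain is not None and domain != "general"
    has_inline_dictionaries = len(dictionaries) > 0
    if has_industry_domain or has_inline_dictionaries:
        return None
    return "secure=True needs an industry domain (e.g. 'medical') or inline dictionaries"


print(check_secure_request(secure=True, domain="medical"))
print(check_secure_request(secure=True))
```

The server remains the authority on what is accepted; a local check like this only surfaces an obvious misconfiguration earlier.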

Realtime Event Example

push() returns a partial event:

{
  "type": "partial",
  "text": "ほんじつのけつあつは130の80です。",
  "index": 0,
  "processing_time_ms": 0
}

drain_events() / drainEvents() returns a refined event for the same segment:

{
  "type": "refined",
  "text": "本日の血圧は130の80です。",
  "index": 0,
  "processing_time_ms": 88
}
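One way to consume these events is to keep a mapping from segment index to text and let each refined event replace the partial stored under the same index. The sketch below uses plain dicts shaped like the JSON above; apply_events and render are hypothetical helper names, and it assumes that the partial for a segment arrives before its refined event.

```python
def apply_events(transcript: dict[int, str], events: list[dict]) -> dict[int, str]:
    """Merge partial/refined events into a {segment index: text} mapping."""
    for event in events:
        if event["type"] in ("partial", "refined"):
            # A refined event replaces the partial text stored under the same index.
            transcript[event["index"]] = event["text"]
        # Other event types (such as "error") carry no replacement text here.
    return transcript


def render(transcript: dict[int, str]) -> str:
    """Join segment texts in index order."""
    return "".join(transcript[i] for i in sorted(transcript))


transcript: dict[int, str] = {}

partial = {"type": "partial", "text": "ほんじつのけつあつは130の80です。", "index": 0, "processing_time_ms": 0}
apply_events(transcript, [partial])
print(render(transcript))  # raw local STT text

refined = {"type": "refined", "text": "本日の血圧は130の80です。", "index": 0, "processing_time_ms": 88}
apply_events(transcript, [refined])
print(render(transcript))  # corrected text for the same segment
```

The SDK examples above read event.type, event.index, and event.text as attributes, so the dictionary lookups would become attribute access when working with SDK event objects.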

Response

{
  "text": "本日の血圧は130の80です。次の患者さんをお願いします。",
  "processing_time_ms": 142,
  "segments": [
    {
      "id": 0,
      "start": 0.32,
      "end": 2.15,
      "text": "本日の血圧は130の80です。"
    },
    {
      "id": 1,
      "start": 3.2,
      "end": 4.8,
      "text": "次の患者さんをお願いします。"
    }
  ]
}
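Because each segment carries its own start and end, the response converts easily into caption formats. As an illustration, here is a minimal sketch that turns the response above into SubRip (SRT) text. It assumes start and end are in seconds; srt_timestamp and to_srt are hypothetical helpers, not SDK functions.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(response: dict) -> str:
    """Build SRT text from a response dict with a "segments" list."""
    blocks = []
    for number, seg in enumerate(response["segments"], start=1):
        blocks.append(
            f"{number}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text']}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)


response = {
    "text": "本日の血圧は130の80です。次の患者さんをお願いします。",
    "processing_time_ms": 142,
    "segments": [
        {"id": 0, "start": 0.32, "end": 2.15, "text": "本日の血圧は130の80です。"},
        {"id": 1, "start": 3.2, "end": 4.8, "text": "次の患者さんをお願いします。"},
    ],
}

print(to_srt(response))
```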

Pricing

Billing is point-based: 1 pt = ¥10. Usage is billed per audio hour at finalize().

Standard

Domain	Batch	Realtime (normal)	Realtime (high)
General	2 pt/h (¥20)	5 pt/h (¥50)	7 pt/h (¥70)
Industry-specific	4 pt/h (¥40)	6 pt/h (¥60)	8 pt/h (¥80)
+ Custom Dictionary	6 pt/h (¥60)	7 pt/h (¥70)	10 pt/h (¥100)

Secure

The Secure tier processes audio within a private VPC. It requires secure: true together with either an industry domain such as medical or custom dictionaries.

Domain	Batch	Realtime (normal)	Realtime (high)
General	7 pt/h (¥70)	8 pt/h (¥80)	10 pt/h (¥100)
Industry-specific	8 pt/h (¥80)	10 pt/h (¥100)	12 pt/h (¥120)
+ Custom Dictionary	9 pt/h (¥90)	10 pt/h (¥100)	13 pt/h (¥130)

Effort levels:

effort: "normal" — background correction every 30-60s. Recommended for dictation, meeting notes, standard realtime input.
effort: "high" — more frequent correction (every 15-30s). Recommended for live captions, simultaneous charting, terminology-sensitive input.

Final correction (finalize()) is always included regardless of effort level.

See https://docs.genshi.ai/stt/pricing for details.
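To get a rough figure for a given recording, the two tables can be encoded as a lookup. This is a minimal sketch for estimation only: the rates are copied from the tables above, but the key names, the estimate function, and the linear proration of partial hours (no rounding, no minimum charge) are assumptions; the pricing page is authoritative.

```python
YEN_PER_POINT = 10

# Column order in the tables above.
MODES = ("batch", "realtime-normal", "realtime-high")

# Points per audio hour: tier -> domain row -> (batch, realtime normal, realtime high).
RATES = {
    "standard": {
        "general": (2, 5, 7),
        "industry": (4, 6, 8),
        "custom-dictionary": (6, 7, 10),
    },
    "secure": {
        "general": (7, 8, 10),
        "industry": (8, 10, 12),
        "custom-dictionary": (9, 10, 13),
    },
}


def estimate(tier: str, domain: str, mode: str, audio_seconds: float) -> tuple[float, float]:
    """Return (points, yen) for the given audio duration."""
    rate = RATES[tier][domain][MODES.index(mode)]
    # Assumption: partial hours are prorated linearly.
    points = rate * audio_seconds / 3600
    return points, points * YEN_PER_POINT


points, yen = estimate("standard", "industry", "realtime-high", 90 * 60)
print(f"{points:g} pt = ¥{yen:g}")
```

For example, 90 minutes of Standard, industry-specific realtime audio at effort "high" is 8 pt/h × 1.5 h = 12 pt, or ¥120 under these assumptions.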

Supported Platforms

Platform	Python	Node.js
macOS ARM64 (Apple Silicon)	genshiai-stt-native	@genshiai/stt-native-darwin-arm64
Linux x64	genshiai-stt-native	@genshiai/stt-native-linux-x64
Windows x64	genshiai-stt-native	@genshiai/stt-native-windows-x64
Browser	—	@genshiai/stt-web

Requirements

Python >= 3.10 / Node.js >= 20
Valid GENSHI Works API key
ffmpeg, used to decode file/bytes input in Python and encoded audio in Node.js
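The Python-side requirements can be checked before the first call. A minimal sketch, assuming ffmpeg is looked up on PATH (the SDK may locate it differently); missing_requirements is a hypothetical helper, and the API key is not checked here.

```python
import shutil
import sys


def missing_requirements() -> list[str]:
    """Return human-readable problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python >= 3.10 is required, found {found}")
    # Assumption: ffmpeg is expected to be discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


for problem in missing_requirements():
    print("missing:", problem)
```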

Browser SDK note:

await client.init() is required before transcribe() or stream().
The npm package includes JSON metadata, and secured ONNX assets are fetched via POST /v1/activate.

Documentation

Full documentation: https://docs.genshi.ai/stt

License

Release history

5.0.1 (May 14, 2026)
5.0.0 (Apr 19, 2026)
4.0.0 (Apr 8, 2026), yanked; this version
3.0.0 (Apr 3, 2026), yanked
2.0.0 (Apr 2, 2026), yanked
1.0.2 (Mar 17, 2026), yanked
1.0.1 (Mar 12, 2026), yanked
1.0.0 (Mar 12, 2026), yanked

Download files

Source Distribution

genshiai_stt-4.0.0.tar.gz (11.7 kB), uploaded Apr 8, 2026, Source

Built Distribution

genshiai_stt-4.0.0-py3-none-any.whl (13.0 kB), uploaded Apr 8, 2026, Python 3

File details

Details for the file genshiai_stt-4.0.0.tar.gz.

File metadata

Download URL: genshiai_stt-4.0.0.tar.gz
Upload date: Apr 8, 2026
Size: 11.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for genshiai_stt-4.0.0.tar.gz
Algorithm	Hash digest
SHA256	`b6c9cd5033f912113a86750d837d7e3ea7472e15810090a2c28a2b8a1decf2a6`
MD5	`6ca3b8efe8e1a2d57fb6995e82264bca`
BLAKE2b-256	`7957a982d6275ea05fa26a06f21d19a813ffa5d9f37f623a47688389c13e282f`
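A downloaded file can be compared against the SHA256 digest listed above using the standard library. A minimal sketch: sha256_of_stream and matches are hypothetical helper names, and the demo hashes an in-memory stream so that it runs without the actual file.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def matches(stream, expected_hex: str) -> bool:
    """Compare a stream's SHA256 with a published hex digest."""
    return sha256_of_stream(stream) == expected_hex.strip().lower()


# Digest published above for genshiai_stt-4.0.0.tar.gz.
EXPECTED = "b6c9cd5033f912113a86750d837d7e3ea7472e15810090a2c28a2b8a1decf2a6"

# Demo on in-memory bytes; for the real file use:
#   with open("genshiai_stt-4.0.0.tar.gz", "rb") as f:
#       print(matches(f, EXPECTED))
print(sha256_of_stream(io.BytesIO(b"example bytes")))
```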

File details

Details for the file genshiai_stt-4.0.0-py3-none-any.whl.

File metadata

Download URL: genshiai_stt-4.0.0-py3-none-any.whl
Upload date: Apr 8, 2026
Size: 13.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for genshiai_stt-4.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7c4c0c04e8ef16583dcfc2e3501a975c903272ee267b7f02930e6d47b9ec487e`
MD5	`87c0d1f3e74e48a7e3f66987b82545b7`
BLAKE2b-256	`8e4b3062b0254ab6cfa50a3103befcb09deb990dea0e7280975931bb3af1eeb2`
