GENSHI Works STT SDK — high-accuracy domain-specific speech-to-text
High-accuracy domain-specific speech-to-text SDK. Supports batch transcription and realtime streaming with built-in VAD and on-device STT inference.
Installation
Python
pip install genshiai-stt
Node.js
npm install @genshiai/stt
The correct native addon for your platform is installed automatically via optional dependencies.
Browser
npm install @genshiai/stt-web
Quick Start
Python
import asyncio

from genshi_stt import GenshiSTTClient


async def main() -> None:
    async with GenshiSTTClient(api_key="gw-...") as client:
        # Batch transcription
        with open("recording.wav", "rb") as f:
            result = await client.transcribe(f.read())
        print(result.text)
        for seg in result.segments:
            print(f"[{seg.start:.2f}-{seg.end:.2f}] {seg.text}")

        # Realtime streaming
        async with client.stream(
            domain="medical",
            secure=True,
            effort="normal",
            dictionary_ids=["dict_hospital"],
            dictionaries=[
                {
                    "id": "dict_hospital",
                    "name": "院内用語",
                    "industry": "medical",
                    "terms": [{"term": "GENSHI AI", "reading": "げんしえーあい"}],
                }
            ],
        ) as session:
            # audio_chunk: PCM16 bytes from your audio source
            partials = await session.push(audio_chunk)
            print(partials[0].text if partials else "")
            refined = await session.drain_events()
            for event in refined:
                if event.type == "refined":
                    print(event.index, event.text)
            final = await session.finalize()
            print(final.text)


asyncio.run(main())
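The streaming example pushes an audio_chunk of PCM16 bytes without showing where it comes from. As a rough sketch, a 16-bit WAV file can be sliced into such chunks with the standard library alone. The helper name iter_pcm16_chunks and the 100 ms default are illustrative assumptions, not part of the SDK, and the sample rate and chunk size the SDK expects are not stated here.

```python
import wave
from typing import Iterator


def iter_pcm16_chunks(path: str, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM16 bytes from a WAV file in chunks of about chunk_ms.

    Hypothetical helper for illustration; the chunk length is an assumption.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        # Number of frames that cover chunk_ms at this file's sample rate.
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break
            yield chunk
```

Inside the session block, each chunk could then be passed to session.push() in a for loop over iter_pcm16_chunks("recording.wav").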
Node.js / TypeScript
import { GenshiSTTClient } from '@genshiai/stt';

const client = new GenshiSTTClient({ apiKey: 'gw-...' });

// Batch transcription
const result = await client.transcribe(audioBuffer);
console.log(result.text);
for (const seg of result.segments) {
  console.log(`[${seg.start.toFixed(2)}-${seg.end.toFixed(2)}] ${seg.text}`);
}

// Realtime streaming
const session = client.stream({
  domain: 'medical',
  secure: true,
  effort: 'normal',
  dictionaryIds: ['dict_hospital'],
  dictionaries: [
    {
      id: 'dict_hospital',
      name: '院内用語',
      industry: 'medical',
      terms: [{ term: 'GENSHI AI', reading: 'げんしえーあい' }],
    },
  ],
});

const partials = await session.push(pcm16Chunk);
console.log(partials[0]?.text);

const refined = await session.drainEvents();
for (const event of refined) {
  if (event.type === 'refined') {
    console.log(event.index, event.text);
  }
}

const final = await session.finalize();
console.log(final.text);
Browser
import { GenshiSTTClient, createMicStream } from '@genshiai/stt-web';

const client = new GenshiSTTClient({ apiKey: 'gw-...' });
await client.init();

const session = client.stream({
  domain: 'medical',
  secure: true,
  effort: 'normal',
  dictionaryIds: ['dict_hospital'],
  dictionaries: [
    {
      id: 'dict_hospital',
      name: '院内用語',
      industry: 'medical',
      terms: [{ term: 'GENSHI AI', reading: 'げんしえーあい' }],
    },
  ],
});

const mic = await createMicStream({
  onChunk: async (chunk) => {
    const partials = await session.push(chunk);
    console.log(partials[0]?.text);
    const refined = await session.drainEvents();
    for (const event of refined) {
      if (event.type === 'refined') {
        console.log(event.index, event.text);
      }
    }
  },
});

// When done:
mic.stop();
const result = await session.finalize();
console.log(result.text);
Prefer await session.finalize() when you need the final corrected text.
await session.close() now performs a best-effort finalize as part of cleanup.
Use session.abort() only to force-abort a session intentionally; it skips the billed finalize pass.
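One way to apply these rules is a small wrapper that finalizes on success and aborts if pushing audio fails. This is a sketch under assumptions: run_session is a hypothetical helper, aborting on error is a design choice rather than SDK guidance, and because it is not stated whether abort() is a coroutine in the Python SDK, the code handles both cases.

```python
import inspect
from typing import Any, Iterable


async def run_session(session: Any, chunks: Iterable[bytes]) -> Any:
    """Push every chunk, then finalize; abort instead if pushing fails.

    Hypothetical helper: only push(), finalize() and abort() are used,
    as named in the notes above.
    """
    try:
        for chunk in chunks:
            await session.push(chunk)
    except Exception:
        # Pushing failed mid-stream: skip the finalize pass and discard the session.
        outcome = session.abort()
        if inspect.isawaitable(outcome):  # abort() may or may not be async
            await outcome
        raise
    return await session.finalize()
```

If the partial transcript is still valuable after a failure, calling finalize() or close() in the except branch instead would keep it.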
Choosing A Mode
| Mode | During recording | Correction cadence | Recommended for |
|---|---|---|---|
| batch | Nothing is emitted until the request finishes | One final full-text pass | File upload, post-processing |
| realtime + effort="normal" | Partial text appears immediately | Background correction is sparse | Dictation, meeting notes, standard live input |
| realtime + effort="high" | Partial text appears immediately | Background correction is more frequent | Live captions, simultaneous charting, terminology-sensitive input |
Realtime Mental Model
- push() returns immediate partial events from local STT
- drain_events() / drainEvents() returns queued refined / error events from background correction
- effort: "normal" batches corrections sparsely, effort: "high" refines more often
- finalize() still performs the final full-text correction pass
secure=True / secure: true requests the Secure tier. Secure requests require an industry domain such as medical, or inline custom dictionaries.
Public SDK configuration is intentionally centered on domain, secure, dictionaryIds / dictionaries, and effort.
Low-level VAD and local model tuning are not part of the public API.
Realtime Event Example
push() returns a partial event:
{
  "type": "partial",
  "text": "ほんじつのけつあつは130の80です。",
  "index": 0,
  "processing_time_ms": 0
}
drain_events() / drainEvents() returns a refined event for the same segment:
{
  "type": "refined",
  "text": "本日の血圧は130の80です。",
  "index": 0,
  "processing_time_ms": 88
}
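Because a refined event carries the same index as the partial it corrects, a live transcript can be kept as a mapping from index to the latest text. The class below is a minimal sketch of that idea, not SDK code: LiveTranscript is a hypothetical name, and events are modelled as plain dicts shaped like the JSON above (the SDK examples access the same fields as attributes).

```python
class LiveTranscript:
    """Keep the best-known text per segment index (illustrative sketch)."""

    def __init__(self) -> None:
        self._text: dict[int, str] = {}
        self._refined: set[int] = set()

    def apply(self, event: dict) -> None:
        kind, index = event.get("type"), event.get("index")
        if kind == "refined":
            # A refined event replaces whatever text the segment had.
            self._text[index] = event["text"]
            self._refined.add(index)
        elif kind == "partial" and index not in self._refined:
            # Never let a late partial overwrite an already refined segment.
            self._text[index] = event["text"]
        # Other event types (such as "error") carry no transcript text here.

    def render(self) -> str:
        # Plain concatenation suits Japanese; other languages may need a separator.
        return "".join(self._text[i] for i in sorted(self._text))
```

Feeding it the two events above yields the kana text after the partial and the corrected kanji text once the refined event arrives.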
Response
{
  "text": "本日の血圧は130の80です。次の患者さんをお願いします。",
  "processing_time_ms": 142,
  "segments": [
    {
      "id": 0,
      "start": 0.32,
      "end": 2.15,
      "text": "本日の血圧は130の80です。"
    },
    {
      "id": 1,
      "start": 3.2,
      "end": 4.8,
      "text": "次の患者さんをお願いします。"
    }
  ]
}
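The start and end fields in each segment appear to be offsets in seconds, which makes it straightforward to turn a response into timestamped lines. The sketch below works on a plain dict shaped like the JSON above; format_timestamp and segment_lines are hypothetical helper names, and the MM:SS.mmm layout is an arbitrary choice.

```python
def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(response: dict) -> list[str]:
    """Return one 'start --> end  text' line per segment in the response."""
    return [
        f"{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}  {s['text']}"
        for s in response["segments"]
    ]
```

For the first segment above, this produces a line starting with 00:00.320 --> 00:02.150.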
Pricing
Point-based billing. 1pt = ¥10. Billed per audio hour at finalize().
Standard
| Domain | Batch | Realtime (normal) | Realtime (high) |
|---|---|---|---|
| General | 2 pt/h (¥20) | 5 pt/h (¥50) | 7 pt/h (¥70) |
| Industry-specific | 4 pt/h (¥40) | 6 pt/h (¥60) | 8 pt/h (¥80) |
| + Custom Dictionary | 6 pt/h (¥60) | 7 pt/h (¥70) | 10 pt/h (¥100) |
Secure
Secure tier processes audio within a private VPC. Requires secure: true and a domain such as medical, or custom dictionaries.
| Domain | Batch | Realtime (normal) | Realtime (high) |
|---|---|---|---|
| General | 7 pt/h (¥70) | 8 pt/h (¥80) | 10 pt/h (¥100) |
| Industry-specific | 8 pt/h (¥80) | 10 pt/h (¥100) | 12 pt/h (¥120) |
| + Custom Dictionary | 9 pt/h (¥90) | 10 pt/h (¥100) | 13 pt/h (¥130) |
Effort levels:
- effort: "normal" runs background correction every 30-60s. Recommended for dictation, meeting notes, and standard realtime input.
- effort: "high" runs correction more frequently (every 15-30s). Recommended for live captions, simultaneous charting, and terminology-sensitive input.
Final correction (finalize()) is always included regardless of effort level.
See https://docs.genshi.ai/stt/pricing for details.
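To make the tables concrete, the sketch below encodes the listed rates and multiplies them by audio hours. It is an estimate only: the names RATES, MODES and estimate_cost are illustrative, and any rounding, minimum billing unit, or other rules on the pricing page are not modelled.

```python
YEN_PER_POINT = 10

# Points per audio hour, copied from the Standard and Secure tables.
# Each tuple is (batch, realtime normal, realtime high).
RATES = {
    "standard": {
        "general": (2, 5, 7),
        "industry": (4, 6, 8),
        "custom_dictionary": (6, 7, 10),
    },
    "secure": {
        "general": (7, 8, 10),
        "industry": (8, 10, 12),
        "custom_dictionary": (9, 10, 13),
    },
}
MODES = {"batch": 0, "realtime_normal": 1, "realtime_high": 2}


def estimate_cost(audio_hours, mode, domain_kind="general", secure=False):
    """Return (points, yen) for the given audio duration, without rounding."""
    tier = "secure" if secure else "standard"
    points = RATES[tier][domain_kind][MODES[mode]] * audio_hours
    return points, points * YEN_PER_POINT
```

For example, 2.5 hours of Secure, industry-specific realtime audio at effort "high" comes to 12 pt/h × 2.5 h = 30 pt, or ¥300.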
Supported Platforms
| Platform | Python | Node.js |
|---|---|---|
| macOS ARM64 (Apple Silicon) | genshiai-stt-native | @genshiai/stt-native-darwin-arm64 |
| Linux x64 | genshiai-stt-native | @genshiai/stt-native-linux-x64 |
| Windows x64 | genshiai-stt-native | @genshiai/stt-native-windows-x64 |
| Browser | — | @genshiai/stt-web |
Requirements
- Python >= 3.10 / Node.js >= 20
- Valid GENSHI Works API key
- ffmpeg for Python file/bytes decoding and Node.js encoded-audio decoding
Browser SDK note:
- await client.init() is required before transcribe() or realtime()
- The npm package includes JSON metadata, and secured ONNX assets are fetched via POST /v1/activate
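On the Python side, the version and ffmpeg requirements can be checked before the SDK is used. The helper below is a hypothetical pre-flight check, not part of the SDK, and it assumes ffmpeg is expected on PATH, which is not stated here.

```python
import shutil
import sys
from typing import List, Tuple


def missing_requirements(min_python: Tuple[int, int] = (3, 10)) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python >= {wanted} is required")
    # Assumption: ffmpeg is discovered through PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems
```

Calling missing_requirements() at startup and printing any returned messages gives a clearer failure than a decode error later on.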
Documentation
Full documentation: https://docs.genshi.ai/stt
License
Proprietary. Copyright (c) 2026 GENSHI Works Inc. All rights reserved. See LICENSE.
File details
Details for the file genshiai_stt-4.0.0.tar.gz.
File metadata
- Download URL: genshiai_stt-4.0.0.tar.gz
- Upload date:
- Size: 11.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6c9cd5033f912113a86750d837d7e3ea7472e15810090a2c28a2b8a1decf2a6 |
| MD5 | 6ca3b8efe8e1a2d57fb6995e82264bca |
| BLAKE2b-256 | 7957a982d6275ea05fa26a06f21d19a813ffa5d9f37f623a47688389c13e282f |
File details
Details for the file genshiai_stt-4.0.0-py3-none-any.whl.
File metadata
- Download URL: genshiai_stt-4.0.0-py3-none-any.whl
- Upload date:
- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7c4c0c04e8ef16583dcfc2e3501a975c903272ee267b7f02930e6d47b9ec487e |
| MD5 | 87c0d1f3e74e48a7e3f66987b82545b7 |
| BLAKE2b-256 | 8e4b3062b0254ab6cfa50a3103befcb09deb990dea0e7280975931bb3af1eeb2 |