FireRed ASR for fasr (bundled fireredasr2 inference)

fasr-asr-firered

FireRedASR2 speech recognition for fasr. The plugin exposes two decoding modes: AED and LLM. AED decoding can return token timestamps; LLM decoding targets full-text accuracy and does not return timestamps.

Install

pip install fasr-asr-firered

Registered Models

| Registry name | Class | Best for |
| --- | --- | --- |
| firered | FireRedAEDForASR | Default alias for AED mode |
| firered_aed | FireRedAEDForASR | Timestamped AED recognition |
| firered_llm | FireRedLLMForASR | LLM decoding, no timestamps |

Default checkpoints:

| Model | Checkpoint |
| --- | --- |
| firered_aed | FireRedTeam/FireRedASR2-AED |
| firered_llm | FireRedTeam/FireRedASR2-LLM |
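
The two tables above can be read together as a lookup from registry name to class and default checkpoint. The following is a minimal sketch using plain dictionaries, not the fasr registry itself. `resolve` is a hypothetical helper, and the idea that the `firered` alias shares the AED checkpoint is an assumption inferred from its class; the source does not state it.

```python
# Plain-dict mirror of the tables above; this does not touch the fasr registry.
REGISTERED = {
    "firered": "FireRedAEDForASR",      # default alias for AED mode
    "firered_aed": "FireRedAEDForASR",  # timestamped AED recognition
    "firered_llm": "FireRedLLMForASR",  # LLM decoding, no timestamps
}

DEFAULT_CHECKPOINTS = {
    "firered_aed": "FireRedTeam/FireRedASR2-AED",
    "firered_llm": "FireRedTeam/FireRedASR2-LLM",
}


def resolve(name: str) -> tuple:
    """Return (class name, default checkpoint) for a registry name (hypothetical helper)."""
    if name not in REGISTERED:
        raise KeyError(f"unknown registry name: {name!r}")
    # Assumption: the "firered" alias uses the same checkpoint as "firered_aed".
    canonical = "firered_aed" if name == "firered" else name
    return REGISTERED[name], DEFAULT_CHECKPOINTS[canonical]
```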

Pipeline Usage

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe(
        "recognizer",
        model="firered_aed",
        device="cuda",
        beam_size=3,
        return_timestamp=True,
    )
    .add_pipe("sentencizer", model="ct_transformer")
)

Quick choices:

| Goal | Use | Result |
| --- | --- | --- |
| Token timestamps | model="firered_aed", return_timestamp=True | Populates span.tokens |
| Full-text decoding | model="firered_llm" | Populates span.raw_text, no timestamps |
| Lower VRAM for AED | use_half=True | FP16 inference on GPU |
| CPU inference | device="cpu" | Runs without CUDA, slower |
| Wider search | beam_size=5 | Potentially better accuracy, slower |
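
The choices above can be folded into a small helper that builds the keyword arguments for the recognizer pipe. This is only a sketch: `recognizer_kwargs` is a hypothetical name, picking `firered_llm` whenever timestamps are not needed is a simplification (AED can also run text-only), and disabling `use_half` on CPU is an assumption rather than documented behavior.

```python
def recognizer_kwargs(need_timestamps: bool, has_gpu: bool,
                      use_half: bool = True, beam_size: int = 3) -> dict:
    """Build keyword arguments for .add_pipe("recognizer", ...) (hypothetical helper)."""
    if beam_size < 1:
        raise ValueError("beam_size must be >= 1")
    kwargs = {
        # Simplification: AED when timestamps are needed, LLM otherwise.
        "model": "firered_aed" if need_timestamps else "firered_llm",
        "device": "cuda" if has_gpu else "cpu",
        "beam_size": beam_size,
    }
    if need_timestamps:
        kwargs["return_timestamp"] = True
        # use_half is an AED parameter; assumption: FP16 is only useful on GPU.
        kwargs["use_half"] = use_half and has_gpu
    return kwargs
```

Following the call shape in Pipeline Usage, the result would be passed as `.add_pipe("recognizer", **recognizer_kwargs(True, True))`.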

Confection Config

[asr_model]
@asr_models = "firered_aed"
device = "cuda"
beam_size = 3
return_timestamp = true
use_half = true

Inside a pipeline:

[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["recognizer"]

[pipeline.pipes]

[pipeline.pipes.recognizer]
@pipes = "thread_pipe"
batch_size = 2

[pipeline.pipes.recognizer.component]
@components = "recognizer"

[pipeline.pipes.recognizer.component.model]
@asr_models = "firered_aed"
device = "cuda"
beam_size = 3
return_timestamp = true
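
To see how the dotted section names and JSON-style values in such a config map onto Python objects, the sketch below reads a fragment of it with the standard library only. It is an approximation for illustration: `read_section` is a hypothetical helper, and unlike confection it does not resolve the `@`-registered functions or build nested dictionaries.

```python
import configparser
import json

CONFIG_TEXT = """
[pipeline.pipes.recognizer]
@pipes = "thread_pipe"
batch_size = 2

[pipeline.pipes.recognizer.component.model]
@asr_models = "firered_aed"
device = "cuda"
beam_size = 3
return_timestamp = true
"""


def read_section(text: str, section: str) -> dict:
    """Parse one section and decode each value as JSON (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    return {key: json.loads(value) for key, value in parser[section].items()}


model_cfg = read_section(CONFIG_TEXT, "pipeline.pipes.recognizer.component.model")
print(model_cfg)
```

Values come back typed: `beam_size` is an `int`, `return_timestamp` a `bool`, and the quoted entries are strings.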

Direct Model Usage

from fasr.config import registry

model = registry.asr_models.get("firered_aed")(
    device="cuda",
    beam_size=3,
    return_timestamp=True,
)

# audio_spans is assumed to be prepared upstream (not shown here).
spans = model.transcribe(audio_spans)
for span in spans:
    print(span.text)

Use local weights:

model.load_checkpoint("/path/to/FireRedASR2-AED")

Shared Parameters

| Parameter | Type / range | Default | Higher / true | Lower / false | Change when |
| --- | --- | --- | --- | --- | --- |
| device | str or None | None | "cuda" uses GPU | "cpu" uses CPU | Deployment target changes |
| beam_size | int >= 1 | 3 | Wider search, slower, more memory | Faster, possibly lower accuracy | Accuracy/speed tradeoff |
| decode_max_len | int >= 0 | 0 | Allows longer outputs | Shorter cap; 0 lets backend decide | Output is truncated or too long |

AED Parameters

| Parameter | Type / range | Default | Higher / true | Lower / false | Change when |
| --- | --- | --- | --- | --- | --- |
| use_half | bool | True | Lower VRAM, faster on GPU | FP32, more stable | GPU memory or numeric stability matters |
| nbest | int >= 1 | 1 | More hypotheses | Single best result | You need alternative hypotheses |
| softmax_smoothing | float | 1.25 | Smoother distribution | Sharper distribution | Beam search needs tuning |
| aed_length_penalty | float | 0.6 | Favors different output lengths | Less length adjustment | Output is too short or too long |
| eos_penalty | float | 1.0 | Discourages ending too early | Easier EOS | Decoding ends too early or too late |
| return_timestamp | bool | True | Returns token timestamps | Text only | You need word/character timing |
| elm_weight | float | 0.0 | More external LM influence | 0.0 disables external LM | You provide elm_dir |
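
One way to keep the shared and AED settings in a single place is a small dataclass that carries the documented defaults and rejects out-of-range values early. This is a sketch, not part of the plugin: `AEDSettings` is a hypothetical container, and only the ranges listed in the tables are checked.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AEDSettings:
    """Hypothetical container; defaults copied from the shared and AED tables."""
    device: Optional[str] = None
    beam_size: int = 3
    decode_max_len: int = 0
    use_half: bool = True
    nbest: int = 1
    softmax_smoothing: float = 1.25
    aed_length_penalty: float = 0.6
    eos_penalty: float = 1.0
    return_timestamp: bool = True
    elm_weight: float = 0.0

    def __post_init__(self) -> None:
        # Only the ranges stated in the tables are enforced.
        if self.beam_size < 1:
            raise ValueError("beam_size must be >= 1")
        if self.decode_max_len < 0:
            raise ValueError("decode_max_len must be >= 0")
        if self.nbest < 1:
            raise ValueError("nbest must be >= 1")


settings = AEDSettings(device="cuda", beam_size=5)
print(asdict(settings))
```

Assuming the model constructor accepts these names as keyword arguments, as the Direct Model Usage call suggests for `device`, `beam_size`, and `return_timestamp`, the dictionary from `asdict(settings)` could be unpacked into it.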

LLM Parameters

| Parameter | Type / range | Default | Higher value | Lower value | Change when |
| --- | --- | --- | --- | --- | --- |
| decode_min_len | int >= 0 | 0 | Forces longer minimum output | Allows shorter output | Output ends too early |
| repetition_penalty | float | 1.2 | Stronger repetition suppression | Allows more repetition | Repeated phrases appear |
| llm_length_penalty | float | 0.0 | Adjusts length preference | Less length adjustment | Output length is biased |
| temperature | float >= 0 | 1.0 | More diverse, less deterministic | More deterministic | You need stability or diversity |
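
The effect described for `temperature` matches the usual temperature scaling of a softmax, which the sketch below illustrates with made-up logits. It shows the generic technique only; how the FireRedASR2 backend actually applies `temperature` is not confirmed by this document, and this sketch rejects 0 even though the table allows it.

```python
import math


def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Generic temperature-scaled softmax (illustration, not the backend's code)."""
    if temperature <= 0:
        raise ValueError("this sketch needs temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)  # lower temperature
flat = softmax_with_temperature(logits, 2.0)   # higher temperature
print(sharp, flat)
```

At the lower temperature the top candidate takes a larger share of the probability mass, which is why sampling becomes more deterministic.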

Generic checkpoint fields such as checkpoint, cache_dir, endpoint, revision, and force_download are inherited from the base model.

Output

  • AED writes span.raw_text.
  • AED also fills span.tokens when return_timestamp=True.
  • LLM writes span.raw_text and leaves span.tokens empty.
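
Downstream code can branch on whether tokens are present, so the same handler works for both modes. The sketch below uses stand-in dataclasses rather than real fasr objects: only `raw_text` and `tokens` come from this document, while the `Token` fields (`text`, `start_ms`, `end_ms`) and the `describe` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """Stand-in token; the field names are assumptions, not the fasr API."""
    text: str
    start_ms: float
    end_ms: float


@dataclass
class Span:
    """Stand-in span with the two attributes named in this section."""
    raw_text: str
    tokens: list = field(default_factory=list)


def describe(span: Span) -> str:
    """Return timed tokens when available, otherwise the plain text."""
    if span.tokens:
        return " ".join(
            f"{t.text}[{t.start_ms:.0f}-{t.end_ms:.0f}ms]" for t in span.tokens
        )
    return span.raw_text


aed_span = Span("ni hao", [Token("ni", 0, 120), Token("hao", 120, 260)])
llm_span = Span("ni hao")  # LLM mode leaves tokens empty
print(describe(aed_span))
print(describe(llm_span))
```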

Dependencies

  • fasr
  • torch >= 2.0.0
  • torchaudio
  • transformers >= 4.36
  • librosa >= 0.10.0
  • kaldiio >= 2.18.0
  • kaldi-native-fbank >= 1.19.0
  • Python 3.10-3.12
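
Before installing, it can help to check the interpreter version and see which of the listed packages are already present. The sketch below uses only the standard library; `python_supported` and `installed_version` are hypothetical helper names, and it reports versions without comparing them against the minimums above.

```python
import sys
from importlib import metadata
from typing import Optional


def python_supported(version: tuple) -> bool:
    """True when (major, minor) falls in the documented 3.10-3.12 range."""
    return (3, 10) <= tuple(version[:2]) <= (3, 12)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_supported(sys.version_info[:2]))
    for name in ("fasr", "torch", "torchaudio", "transformers",
                 "librosa", "kaldiio", "kaldi-native-fbank"):
        print(name, installed_version(name) or "not installed")
```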
