
ct-transformer punctuation model for fasr


fasr-punc-ct-transformer

Chinese documentation

CT-Transformer punctuation restoration for fasr. Use it as the sentencizer stage after ASR to split raw recognized text into punctuated AudioSpan sentences.

Install

pip install fasr-punc-ct-transformer

Registered Model

Registry name | Class | Best for
ct_transformer | CTTransformerForPunc | Chinese and mixed Chinese-English punctuation restoration

The default checkpoint is iic/punc_ct-transformer_zh-cn-common-vocab272727-pytorch.

Pipeline Usage

from fasr import AudioPipeline

pipeline = (
    AudioPipeline()
    .add_pipe("detector", model="fsmn")
    .add_pipe("recognizer", model="paraformer")
    .add_pipe(
        "sentencizer",
        model="ct_transformer",
        disable_log=True,
        disable_pbar=True,
    )
)

Confection Config

[punc_model]
@punc_models = "ct_transformer"
disable_update = true
disable_log = true
disable_pbar = true

Inside a pipeline:

[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["sentencizer"]

[pipeline.pipes]

[pipeline.pipes.sentencizer]
@pipes = "thread_pipe"

[pipeline.pipes.sentencizer.component]
@components = "sentencizer"

[pipeline.pipes.sentencizer.component.model]
@punc_models = "ct_transformer"
disable_update = true
disable_log = true
disable_pbar = true
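
To see how the dotted section names nest, the config above can be inspected with the standard library alone. This is only an illustrative sketch: confection is what actually loads and resolves the config, and parse_nested is a hypothetical helper that does not call any registered function. It assumes each value is a JSON literal, which holds for the values shown here.

```python
import configparser
import json

CONFIG_TEXT = """
[pipeline]
@pipelines = "AudioPipeline.v1"
pipe_order = ["sentencizer"]

[pipeline.pipes]

[pipeline.pipes.sentencizer]
@pipes = "thread_pipe"

[pipeline.pipes.sentencizer.component]
@components = "sentencizer"

[pipeline.pipes.sentencizer.component.model]
@punc_models = "ct_transformer"
disable_update = true
disable_log = true
disable_pbar = true
"""


def parse_nested(text):
    """Turn dotted section names into nested dicts (hypothetical helper)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    root = {}
    for section in parser.sections():
        node = root
        # "a.b.c" becomes root["a"]["b"]["c"]
        for part in section.split("."):
            node = node.setdefault(part, {})
        for key, raw in parser[section].items():
            node[key] = json.loads(raw)  # values here are JSON literals
    return root


config = parse_nested(CONFIG_TEXT)
model_cfg = config["pipeline"]["pipes"]["sentencizer"]["component"]["model"]
print(model_cfg)
```

The "@punc_models" key names the registry entry; the remaining keys in the same section are the arguments that the registered model receives.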

Direct Model Usage

from fasr.config import registry

model = registry.punc_models.get("ct_transformer")()
sentences = model.restore("今天天气真好我想出去玩你觉得呢")
for sentence in sentences:
    print(sentence.text)

Use local weights:

model.load_checkpoint("/path/to/ct-transformer")

Parameters

Parameter | Type / range | Default | Effect when true | Effect when false | Change when
disable_update | bool | True | Skips FunASR checkpoint update checks | Lets FunASR check for updates | You need reproducible startup or want update checks
disable_log | bool | True | Suppresses backend logs | Shows backend logs | Debugging model loading or inference
disable_pbar | bool | True | Hides progress bars | Shows progress bars | Interactive scripts where progress output is useful

Generic checkpoint fields such as checkpoint, cache_dir, endpoint, revision, and force_download are inherited from the base model.
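
As a rough sketch of how the three flags might be overridden for a debugging session, the helper below merges overrides into the documented defaults and rejects typos. punc_kwargs is a hypothetical name, it covers only the three flags in the table (not the inherited checkpoint fields), and passing the result as keyword arguments to the registered factory is an assumption based on the config keys mirroring the parameter names.

```python
# Defaults copied from the Parameters table.
DEFAULTS = {"disable_update": True, "disable_log": True, "disable_pbar": True}


def punc_kwargs(**overrides):
    """Merge overrides into the defaults; reject unknown names and non-bool values."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    for name, value in overrides.items():
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a bool, got {type(value).__name__}")
    return {**DEFAULTS, **overrides}


# A debugging profile: show backend logs and progress bars, keep update checks off.
debug_kwargs = punc_kwargs(disable_log=False, disable_pbar=False)
print(debug_kwargs)

# Assumption: the factory accepts these as keyword arguments (requires fasr):
# from fasr.config import registry
# model = registry.punc_models.get("ct_transformer")(**debug_kwargs)
```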

Notes

  • restore(text) returns an AudioSpanList, not a plain string.
  • Input text should already be recognized text. This plugin does not run ASR.
  • For pipeline usage, put this model on the sentencizer component.
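
Because restore(text) returns a list of spans rather than a string, callers that want one punctuated string need to join the span texts themselves. A minimal sketch, relying only on the .text attribute shown in Direct Model Usage: spans_to_text is a hypothetical helper, and the SimpleNamespace objects are stand-ins for AudioSpan, not real model output.

```python
from types import SimpleNamespace


def spans_to_text(spans, sep=""):
    """Join the .text of each span; sep="" suits Chinese, " " suits English."""
    return sep.join(span.text for span in spans)


# Stand-in spans for illustration only; with fasr installed this would be
# sentences = model.restore("今天天气真好我想出去玩你觉得呢")
fake_sentences = [
    SimpleNamespace(text="今天天气真好，"),
    SimpleNamespace(text="我想出去玩，"),
    SimpleNamespace(text="你觉得呢？"),
]

joined = spans_to_text(fake_sentences)
print(joined)
```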

Dependencies

  • fasr
  • funasr
  • Python 3.10-3.12
