Joint Disfluency Detection and Constituency Parsing

A joint disfluency detection and constituency parsing model for transcribed speech, based on "Neural Constituency Parsing of Speech Transcripts" (NAACL 2019), with additional changes (e.g., self-training and ensembling) described in "Improving Disfluency Detection by Self-Training a Self-Attentive Model" (ACL 2020).

This repository adapts the original repository to focus on inference with the pretrained swbd_fisher_bert_Edev.0.9078.pt model.

Installation

$ git clone https://github.com/liwangd/disfluency-constituency-parser.git
$ cd disfluency-constituency-parser
$ pip install .

Usage

Download the pretrained model and the BERT files it depends on:

$ wget https://github.com/pariajm/joint-disfluency-detector-and-parser/releases/download/naacl2019/swbd_fisher_bert_Edev.0.9078.pt
$ wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
$ wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz

Then load the model in Python, pointing each argument at the corresponding downloaded file, and pass a list of sentences to parse:

from dc_parser import DC_Model

model = DC_Model(
    model_path="/path/to/swbd_fisher_bert_Edev.0.9078.pt",  # pretrained parser checkpoint
    bert_model_path="/path/to/bert-base-uncased.tar.gz",     # BERT weights archive
    bert_vocab_path="/path/to/bert-base-uncased-vocab.txt",  # BERT vocabulary file
)
model.parse(["Today is a very good day!"])
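
This page does not say what `model.parse` returns. If, as in the Switchboard treebank that the cited papers build on, it produces Penn Treebank-style bracketed trees in which disfluent words (reparanda) sit under EDITED nodes, a minimal sketch like the following could separate disfluent words from fluent ones. The helpers `tokenize_tree`, `edited_words` and `fluent_text` are hypothetical and not part of `dc_parser`; check the actual output format before relying on them.

```python
import re
from typing import List, Tuple


def tokenize_tree(tree: str) -> List[str]:
    """Split a bracketed tree into "(", ")" and atom (label or word) tokens."""
    return re.findall(r"\(|\)|[^\s()]+", tree)


def edited_words(tree: str, label: str = "EDITED") -> List[Tuple[str, bool]]:
    """Return (word, is_disfluent) for every leaf of a PTB-style tree.

    Assumption: a leaf is disfluent if any ancestor node carries `label`
    (the Switchboard EDITED convention). Input must be well-bracketed.
    """
    tokens = tokenize_tree(tree)
    stack: List[str] = []  # labels of the currently open nodes
    result: List[Tuple[str, bool]] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "(":
            nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
            if nxt not in ("(", ")"):
                stack.append(nxt)  # labelled node, e.g. "(NP"
                i += 2
            else:
                stack.append("")   # unlabelled root, e.g. "( (S ...))"
                i += 1
            continue
        if tok == ")":
            stack.pop()
        else:
            # An atom that does not follow "(" is a leaf word
            result.append((tok, label in stack))
        i += 1
    return result


def fluent_text(tree: str) -> str:
    """Join the leaves that are not under an EDITED node."""
    return " ".join(w for w, disfluent in edited_words(tree) if not disfluent)
```

For example, `fluent_text("(S (EDITED (NP (PRP i))) (NP (PRP i)) (VP (VBP like) (NP (NN tea))))")` drops the repeated first "i" and returns `"i like tea"`. The hand-rolled tokenizer keeps the sketch dependency-free; a library tree reader such as NLTK's `Tree.fromstring` would be the sturdier choice.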

Citation

If you use this model, please cite the following papers:

@inproceedings{jamshid-lou-2019-neural,
    title = "Neural Constituency Parsing of Speech Transcripts",
    author = "Jamshid Lou, Paria and Wang, Yufei and Johnson, Mark",
    booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/N19-1282",
    doi = "10.18653/v1/N19-1282",
    pages = "2756--2765"
}

@inproceedings{jamshid-lou-2020-improving,
    title = "Improving Disfluency Detection by Self-Training a Self-Attentive Model",
    author = "Jamshid Lou, Paria and Johnson, Mark",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.346",
    pages = "3754--3763"
}

Download files

Source Distribution

disfluency-constituency-parser-0.0.3.tar.gz (21.3 kB)

Built Distribution

disfluency_constituency_parser-0.0.3-py3-none-any.whl

File details

Details for the file disfluency-constituency-parser-0.0.3.tar.gz.

Hashes for disfluency-constituency-parser-0.0.3.tar.gz
Algorithm Hash digest
SHA256 5c358c7f4807b55315fa6e0c904aebdbe18d63e28c40034395fe7314ced6be9c
MD5 3a99f6d6b9fcdc95ccedbac17dd68d1d
BLAKE2b-256 2b4caa59f90cdc13d4fc67d9a3222e10d2d354076da4fc80924f7c560b61f50e

File details

Details for the file disfluency_constituency_parser-0.0.3-py3-none-any.whl.

Hashes for disfluency_constituency_parser-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 e402616f1e889206be96f6742fac9065b853d6c28dec1429c4bd4a1f9a69e8ca
MD5 7078f244520bd16a7d7c176748b1c7f6
BLAKE2b-256 d85a5953036d01019abc72c7b742ce1511d64a208fb1d0ee01684c730b7f7c3c
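
To check a downloaded file against the SHA256 digests listed above, a short sketch using Python's standard `hashlib` module might look like the following. It assumes the source distribution was saved under its original name in the current directory; `sha256_of` and `verify` are illustrative helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()  # lowercase hex string


def verify(path: Path, expected_sha256: str) -> bool:
    # Normalize the expected digest, since hexdigest() is always lowercase
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 digest copied from the source distribution's hash table above
SDIST_SHA256 = "5c358c7f4807b55315fa6e0c904aebdbe18d63e28c40034395fe7314ced6be9c"

if __name__ == "__main__":
    sdist = Path("disfluency-constituency-parser-0.0.3.tar.gz")
    if sdist.exists():
        print("OK" if verify(sdist, SDIST_SHA256) else "MISMATCH")
    else:
        print(f"{sdist} not found")
```

pip can enforce the same check at install time in hash-checking mode, by adding `--hash=sha256:<digest>` to the package's line in a requirements file.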
