Joint Disfluency Detection and Constituency Parsing
A joint disfluency detection and constituency parsing model for transcribed speech, based on "Neural Constituency Parsing of Speech Transcripts" (NAACL 2019), with additional changes (e.g., self-training and ensembling) described in "Improving Disfluency Detection by Self-Training a Self-Attentive Model" (ACL 2020).
This repository adapts the original repository to focus on inference with the pretrained swbd_fisher_bert_Edev.0.9078.pt model.
Installation
$ pip install disfluency-constituency-parser
Usage
$ wget https://github.com/pariajm/joint-disfluency-detector-and-parser/releases/download/naacl2019/swbd_fisher_bert_Edev.0.9078.pt
$ wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
$ wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz
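The same three downloads can also be scripted from Python. The following is a minimal sketch using only the standard library. The names `MODEL_URLS`, `local_name`, and `download_all` are hypothetical helpers introduced here for illustration and are not part of `disfluency-constituency-parser`; the URLs are the ones listed above.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URLs copied from the wget commands above.
MODEL_URLS = [
    "https://github.com/pariajm/joint-disfluency-detector-and-parser/releases/download/naacl2019/swbd_fisher_bert_Edev.0.9078.pt",
    "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt",
    "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz",
]


def local_name(url):
    """Return the file name component of a URL (hypothetical helper)."""
    return os.path.basename(urlparse(url).path)


def download_all(dest_dir, urls=MODEL_URLS, fetch=urlretrieve):
    """Download each URL into dest_dir, skipping files that already exist.

    `fetch` is injectable so the logic can be exercised without network access.
    Returns the local paths in the same order as `urls`.
    """
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in urls:
        path = os.path.join(dest_dir, local_name(url))
        if not os.path.exists(path):
            fetch(url, path)
        paths.append(path)
    return paths
```

Calling `download_all("models")` would place the three files in a `models` directory; the returned paths can then be passed to `DC_Model` in place of the `/path/to/...` placeholders.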
from dc_parser import DC_Model

# Load the pretrained parser together with the BERT weights and vocabulary.
model = DC_Model(
    model_path="/path/to/swbd_fisher_bert_Edev.0.9078.pt",
    bert_model_path="/path/to/bert-base-uncased.tar.gz",
    bert_vocab_path="/path/to/bert-base-uncased-vocab.txt",
)
# parse() takes a list of sentences, as in this example.
model.parse(["Today is a very good day!"])
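This page does not document what `model.parse` returns. Assuming it yields Penn Treebank-style bracketed trees in which disfluent spans sit under `EDITED` nodes (the Switchboard treebank convention; treat this as an assumption to verify against real output), a fluent transcript could be recovered roughly as follows. `fluent_words` is a hypothetical helper, not part of the package, and the example tree is hand-written rather than actual model output.

```python
import re

# Split a bracketed tree into "(", ")", and atoms (node labels or words).
_TOKEN = re.compile(r"\(|\)|[^\s()]+")


def fluent_words(tree, skip_labels=("EDITED",)):
    """Return the words of a bracketed tree, omitting any subtree
    whose label is in skip_labels."""
    words = []
    stack = []  # one flag per open bracket: True if inside a skipped subtree
    expect_label = False
    for tok in _TOKEN.findall(tree):
        if tok == "(":
            # A child inherits the "skipped" state of its parent.
            stack.append(stack[-1] if stack else False)
            expect_label = True
        elif tok == ")":
            if stack:
                stack.pop()
            expect_label = False
        elif expect_label:
            # First atom after "(" is the node label.
            if tok in skip_labels:
                stack[-1] = True
            expect_label = False
        elif not (stack and stack[-1]):
            words.append(tok)
    return words


# Hand-written example tree (an assumption about the format, not model output).
example = "(S (EDITED (NP (PRP I))) (NP (PRP we)) (VP (VBD went) (NP (NN home))))"
print(" ".join(fluent_words(example)))  # we went home
```

Passing a different `skip_labels` tuple would drop other node types as well, if the trees turn out to mark them with their own labels.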
Citation
If you use this model, please cite the following papers:
@inproceedings{jamshid-lou-2019-neural,
  title = "Neural Constituency Parsing of Speech Transcripts",
  author = "Jamshid Lou, Paria and Wang, Yufei and Johnson, Mark",
  booktitle = "Proceedings of the 2019 Conference of the North {A}merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)",
  month = "June",
  year = "2019",
  address = "Minneapolis, Minnesota",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/N19-1282",
  doi = "10.18653/v1/N19-1282",
  pages = "2756--2765"
}
@inproceedings{jamshid-lou-2020-improving,
  title = "Improving Disfluency Detection by Self-Training a Self-Attentive Model",
  author = "Jamshid Lou, Paria and Johnson, Mark",
  booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
  month = "jul",
  year = "2020",
  address = "Online",
  publisher = "Association for Computational Linguistics",
  url = "https://www.aclweb.org/anthology/2020.acl-main.346",
  pages = "3754--3763"
}