asc-analyzer
ASC-analyzer
The ASC Analyzer extracts Argument Structure Constructions (ASCs) from raw English texts and computes indices related to ASC usage.
Installation
To ensure stability and compatibility, we recommend installing dependencies in the following order:
- Install spaCy: pip install spacy
- Install spaCy-transformers: pip install spacy-transformers
- Download the transformer-based spaCy model:
  python -m spacy download en_core_web_trf
- Install the ASC analyzer package:
  pip install asc-analyzer
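After working through the steps above, it can help to confirm that each piece is visible to the current Python environment. The following is a minimal sketch, not part of the ASC Analyzer itself. The import names (spacy_transformers, en_core_web_trf, asc_analyzer) are assumptions inferred from the pip package names, and missing_dependencies is a hypothetical helper.

```python
from importlib.util import find_spec

# Import name -> install command, in the recommended installation order.
# The import names are assumptions inferred from the pip package names.
REQUIRED_MODULES = {
    "spacy": "pip install spacy",
    "spacy_transformers": "pip install spacy-transformers",
    "en_core_web_trf": "python -m spacy download en_core_web_trf",
    "asc_analyzer": "pip install asc-analyzer",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return {import_name: install_command} for modules that cannot be found."""
    return {name: cmd for name, cmd in modules.items() if find_spec(name) is None}


if __name__ == "__main__":
    missing = missing_dependencies()
    for name, cmd in missing.items():
        print(f"missing {name}: run `{cmd}`")
    if not missing:
        print("all dependencies found")
```

Because the dictionary keeps insertion order, any missing items are reported in the same order in which they should be installed.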
Quickstart
Prepare a directory with .txt files (e.g., data/text/). Each file should contain plain English text.
Then run:
asc-analyzer \
--input-dir data/text \
--source cow \
--print-asc \
--save-asc-output
This command will:
- Assign ASC tags to each sentence
- Print the ASC-tagged results directly to the terminal (--print-asc)
- Save token-level ASC tagging results as *_ASCinfo.txt files (--save-asc-output)
- Compute ASC usage statistics (e.g., diversity, proportion, frequency, and verb–ASC association strength) and save them in a CSV summary file

The --source option determines which reference corpus is used for computing frequency and association measures:
- cow: uses the COW corpus (web-based, written English)
- subt: uses the SUBTLEX corpus (subtitle-based, spoken English)

Choose the source based on the register that best matches your input data.
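Before running the command, you may want to check what the input directory actually contains. The sketch below uses a hypothetical helper, list_input_files, that lists the non-empty .txt files in a directory. It also skips files ending in _ASCinfo.txt, since --save-asc-output writes those into the input directory. Whether the tool itself ignores them on a later run is not stated here, so treat that filter as a precaution rather than documented behavior.

```python
from pathlib import Path


def list_input_files(input_dir):
    """Return the non-empty .txt files in input_dir, sorted by name.

    Files ending in _ASCinfo.txt are skipped because they are outputs of a
    previous run with --save-asc-output, not raw input texts.
    """
    directory = Path(input_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"{directory} is not a directory")
    return sorted(
        path
        for path in directory.glob("*.txt")
        if path.is_file()
        and path.stat().st_size > 0
        and not path.name.endswith("_ASCinfo.txt")
    )


if __name__ == "__main__":
    for path in list_input_files("data/text"):
        print(path)
```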
Options
| Option | Description |
|---|---|
| --input-dir | Directory containing .txt files to process (default: asc_analyzer/data/test) |
| --output-csv | Path to save the resulting CSV (default: Written_COW.csv or Spoken_SubT.csv) |
| --source | Reference dataset: cow (written, default) or subt (spoken) |
| --indices | Comma-separated list of index names to include in the CSV (default: all standard indices) |
| --save-asc-output | Save ASC-tagged outputs as *_ASCinfo.txt in the input directory |
| --print-asc | Print ASC-tagged results to the terminal |
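If you drive the tool from a script, the options above can be assembled into an argument list and passed to subprocess. This is a minimal sketch that uses only the flags listed in the table. build_command and run are hypothetical helpers, not part of the package, and no index names are assumed: pass whichever names your installed version accepts.

```python
import subprocess

# Default CSV file name for each --source value, as listed in the options table.
DEFAULT_CSV = {"cow": "Written_COW.csv", "subt": "Spoken_SubT.csv"}


def build_command(input_dir, source="cow", output_csv=None, indices=None,
                  save_asc_output=False, print_asc=False):
    """Build the asc-analyzer argument list from the documented options."""
    if source not in DEFAULT_CSV:
        raise ValueError(f"source must be one of {sorted(DEFAULT_CSV)}, got {source!r}")
    cmd = ["asc-analyzer", "--input-dir", str(input_dir), "--source", source]
    if output_csv is not None:
        cmd += ["--output-csv", str(output_csv)]
    if indices:
        # --indices takes a single comma-separated value.
        cmd += ["--indices", ",".join(indices)]
    if save_asc_output:
        cmd.append("--save-asc-output")
    if print_asc:
        cmd.append("--print-asc")
    return cmd


def run(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Analyze the same texts against both reference corpora.
    for source in DEFAULT_CSV:
        run(build_command("data/text", source=source))
```

Passing the arguments as a list avoids shell quoting problems when the input path contains spaces.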
Output for --print-asc
When using the --print-asc option, the output for each sentence shows aligned token information and its ASC label (None if no ASC applies):
# sent_id = 1
1 The the
2 idea idea
3 is be ATTR
4 trust trust
To save this output as *_ASCinfo.txt files in the input directory, add --save-asc-output.
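The tagged output is easy to post-process. The sketch below parses text in the layout shown in the example into per-sentence token records and counts the ASC labels. It assumes whitespace-separated columns in the order index, form, lemma, and optional ASC label, which is inferred from the example rather than from a format specification. parse_asc_output and count_labels are hypothetical helpers.

```python
from collections import Counter


def parse_asc_output(text):
    """Parse ASC-tagged output into a list of sentences.

    Assumed columns (inferred from the example): index, form, lemma, and an
    optional ASC label. A missing label or the literal string "None" is
    stored as None.
    """
    sentences = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# sent_id"):
            sent_id = line.partition("=")[2].strip()
            current = {"sent_id": sent_id, "tokens": []}
            sentences.append(current)
            continue
        fields = line.split()
        # Skip lines that do not look like token rows.
        if current is None or len(fields) < 3 or not fields[0].isdigit():
            continue
        label = fields[3] if len(fields) > 3 else None
        if label == "None":
            label = None
        current["tokens"].append(
            {"id": int(fields[0]), "form": fields[1], "lemma": fields[2], "asc": label}
        )
    return sentences


def count_labels(sentences):
    """Count how often each ASC label occurs across all sentences."""
    return Counter(
        token["asc"]
        for sentence in sentences
        for token in sentence["tokens"]
        if token["asc"] is not None
    )
```

For the example sentence above, the only label counted is ATTR, attached to the token "is".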
Citation
- If you use the ASC tagger (--print-asc, --save-asc-output) in your research, please cite:
  - Sung, H., & Kyle, K. (2024). Leveraging Pre-trained Language Models for Linguistic Analysis: A Case of Argument Structure Constructions. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).
- The ASC Analyzer is currently in beta testing and will be updated.
License
This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
See the full license here.