Another syntactic complexity analyzer of written English language samples

NeoSCA

NeoSCA is a rewrite of Xiaofei Lu's L2 Syntactic Complexity Analyzer (L2SCA), with added support for Windows and an improved command-line interface. Like L2SCA, NeoSCA takes written English language samples in plain text format as input and computes:

the frequency of 9 structures in the text:
  1. words (W)
  2. sentences (S)
  3. verb phrases (VP)
  4. clauses (C)
  5. T-units (T)
  6. dependent clauses (DC)
  7. complex T-units (CT)
  8. coordinate phrases (CP)
  9. complex nominals (CN), and
14 syntactic complexity indices of the text:
  1. mean length of sentence (MLS)
  2. mean length of T-unit (MLT)
  3. mean length of clause (MLC)
  4. clauses per sentence (C/S)
  5. verb phrases per T-unit (VP/T)
  6. clauses per T-unit (C/T)
  7. dependent clauses per clause (DC/C)
  8. dependent clauses per T-unit (DC/T)
  9. T-units per sentence (T/S)
  10. complex T-unit ratio (CT/T)
  11. coordinate phrases per T-unit (CP/T)
  12. coordinate phrases per clause (CP/C)
  13. complex nominals per T-unit (CN/T)
  14. complex nominals per clause (CN/C)
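
Each of the 14 indices is a ratio of two of the 9 frequencies, as the index names suggest. The following is a minimal sketch of that arithmetic, not NeoSCA's own code: the function name `complexity_indices`, the sample counts, and the choice to return 0.0 for a zero denominator are all assumptions made for illustration.

```python
from typing import Dict

# (numerator, denominator) pairs read off the index names above.
INDEX_DEFINITIONS = {
    "MLS": ("W", "S"),
    "MLT": ("W", "T"),
    "MLC": ("W", "C"),
    "C/S": ("C", "S"),
    "VP/T": ("VP", "T"),
    "C/T": ("C", "T"),
    "DC/C": ("DC", "C"),
    "DC/T": ("DC", "T"),
    "T/S": ("T", "S"),
    "CT/T": ("CT", "T"),
    "CP/T": ("CP", "T"),
    "CP/C": ("CP", "C"),
    "CN/T": ("CN", "T"),
    "CN/C": ("CN", "C"),
}


def complexity_indices(freq: Dict[str, int]) -> Dict[str, float]:
    """Compute the 14 ratios from the 9 structure frequencies."""
    indices = {}
    for name, (num, den) in INDEX_DEFINITIONS.items():
        # Assumption: a zero denominator yields 0.0 instead of raising.
        indices[name] = freq[num] / freq[den] if freq[den] else 0.0
    return indices


# Hypothetical counts for one text sample.
counts = {"W": 100, "S": 5, "VP": 15, "C": 10, "T": 8,
          "DC": 2, "CT": 2, "CP": 4, "CN": 12}
indices = complexity_indices(counts)
for name, value in indices.items():
    print(f"{name}: {value:.4f}")
```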

Highlights

  • Works on Windows/macOS/Linux
  • Can keep intermediate results, i.e., the parsed trees from Stanford Parser and the matched subtrees from Stanford Tregex
  • An improved command-line interface

Install

Install NeoSCA

To install NeoSCA, you need to have Python 3.7 or later installed on your system. You can check if you already have Python installed by running the following command in your terminal:

python --version

If Python is not installed, you can download and install it from the Python website. Once you have Python installed, you can install NeoSCA using pip:

pip install neosca

If you are in China and having trouble with slow download speeds or network issues, you can use the Tsinghua University PyPI mirror to install NeoSCA:

pip install neosca -i https://pypi.tuna.tsinghua.edu.cn/simple

Install Dependencies

NeoSCA depends on Java, Stanford Parser, and Stanford Tregex. After you have NeoSCA installed, you can use nsca --check-depends to install them.
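
Before running the dependency check, it can be handy to see whether a `java` executable is already visible on your PATH. The snippet below is only an illustrative sketch using the standard library; it is not how `nsca --check-depends` works internally, and the helper name `java_on_path` is hypothetical.

```python
import shutil


def java_on_path() -> bool:
    """Return True if an executable named 'java' is found on PATH."""
    return shutil.which("java") is not None


print("Java found" if java_on_path() else "Java not found on PATH")
```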

Basic Usage

To use NeoSCA, run the nsca command in your terminal, followed by the options and arguments you want to use.

Single Input

To analyze a single text file, use the command nsca followed by the file path.

nsca ./samples/sample1.txt
# frequency output: ./result.csv

A result.csv file will be generated in the current directory. You can specify a different output filename using -o.

nsca ./samples/sample1.txt -o sample1.csv
# frequency output: ./sample1.csv
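
Once the CSV file exists, it can be loaded with Python's standard `csv` module. The sketch below assumes the file has a header row; the column names `Filename`, `W`, and `S` and the sample values are assumptions for illustration, so check the header of your own output file.

```python
import csv
import io
from typing import Dict, List, TextIO


def read_results(handle: TextIO) -> List[Dict[str, str]]:
    """Read a CSV with a header row into a list of dicts (all values are strings)."""
    return list(csv.DictReader(handle))


# Hypothetical two-line file standing in for result.csv.
sample = io.StringIO("Filename,W,S\nsample1.txt,120,6\n")
rows = read_results(sample)
for row in rows:
    # Convert the string fields before doing arithmetic on them.
    print(row["Filename"], int(row["W"]) / int(row["S"]))
```

For a real file, open it with `open("result.csv", newline="")` and pass the handle to `read_results`.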

If a filename contains spaces, enclose the file path in double quotes. For example, to analyze a file named sample 1.txt:

nsca "./samples/sample 1.txt"

The quotes ensure that the entire filename, including the spaces, is interpreted as a single argument. Without them, the shell would pass "./samples/sample" and "1.txt" as two separate arguments and the analysis would fail.
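
When nsca is launched from a Python script, passing the arguments as a list sidesteps shell quoting altogether, because each list element reaches the program as exactly one argument. This is a sketch under that assumption; the helper `build_nsca_command` is hypothetical and uses only the `-o` option described above.

```python
import shlex
from typing import List, Optional


def build_nsca_command(paths: List[str], output: Optional[str] = None) -> List[str]:
    """Build an argument list for nsca; each path stays one argument, spaces included."""
    cmd = ["nsca", *paths]
    if output is not None:
        cmd += ["-o", output]
    return cmd


cmd = build_nsca_command(["./samples/sample 1.txt"], output="sample1.csv")
# Show the shell-quoted equivalent of the argument list.
print(" ".join(shlex.quote(part) for part in cmd))

# To actually run it (requires NeoSCA to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```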

Multiple Input

To analyze multiple text files at once, simply list them after the nsca command.

nsca ./samples/sample1.txt ./samples/sample2.txt

You can also use wildcards to select multiple files at once.

nsca ./samples/sample*.txt
nsca ./samples/sample[1-9].txt
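
Bracket wildcards are easy to misread: a bracket expression matches exactly one character, so it describes a set of characters and not a numeric range. The sketch below uses Python's `fnmatch`, which follows the same single-character rule as Unix shells; the filenames are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List

names = ["sample1.txt", "sample7.txt", "sample10.txt", "sample42.txt"]


def matching(pattern: str, candidates: List[str]) -> List[str]:
    """Return the candidates that match a shell-style wildcard pattern."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


# One digit from 1 to 9.
single_digit = matching("sample[1-9].txt", names)
# Two digits: 10 to 99.
two_digit = matching("sample[1-9][0-9].txt", names)
# [1-100] is the character set {1, 0}, not the numbers 1 to 100.
odd_range = matching("sample[1-100].txt", names)

print(single_digit)
print(two_digit)
print(odd_range)
```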

Advanced Usage

Output Frequencies in JSON Format

You can generate a JSON file with --output-format json:

nsca ./samples/sample1.txt --output-format json
# frequency output: ./result.json

Or give the output file a .json extension with -o:

nsca ./samples/sample1.txt -o sample1.json
# frequency output: ./sample1.json

Pass Text Through the Command Line

If you want to analyze text that is passed directly through the command line, you can use --text followed by the text.

nsca --text 'The quick brown fox jumps over the lazy dog.'
# frequency output: ./result.csv

Reserve Intermediate Results

To reserve the parsed trees, use -p or --reserve-parsed. To reserve matched subtrees, use -m or --reserve-matched.

nsca samples/sample1.txt -p
# frequency output: ./result.csv
# parsed trees: ./samples/sample1.parsed

nsca samples/sample1.txt -m
# frequency output: ./result.csv
# matched subtrees: ./result_matches/

nsca samples/sample1.txt -p -m
# frequency output: ./result.csv
# parsed trees: ./samples/sample1.parsed
# matched subtrees: ./result_matches/

Just Parse Texts and Exit

If you only want to save the parsed trees and exit, you can use --no-query. This can be useful if you want to use the parsed trees for other purposes.

nsca samples/sample1.txt --no-query
# parsed trees: samples/sample1.parsed
nsca --text 'This is a test.' --no-query
# parsed trees: ./cmdline_text.parsed
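
As one example of reusing the parsed trees, the sketch below splits a string of bracketed trees into one string per tree by tracking parenthesis depth. It assumes the .parsed file holds Penn Treebank style bracketed trees, as Stanford Parser typically prints them; the exact file layout is an assumption, and the sample trees are made up for illustration.

```python
from typing import List


def split_trees(text: str) -> List[str]:
    """Split bracketed text into top-level trees using parenthesis depth."""
    trees = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == "(":
            if depth == 0:
                start = i  # a new top-level tree begins here
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0 and start is not None:
                trees.append(text[start:i + 1])
                start = None
    return trees


# Hypothetical content standing in for a .parsed file.
parsed = (
    "(ROOT (S (NP (DT This)) (VP (VBZ is) (NP (DT a) (NN test))) (. .)))\n"
    "(ROOT (S (NP (PRP It)) (VP (VBZ works)) (. .)))\n"
)
trees = split_trees(parsed)
print(len(trees), "trees found")
```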

List Output Fields

If you are not sure what the output fields represent, you can use --list to print a list of all the available output fields.

nsca --list
W: words
S: sentences
VP: verb phrases
C: clauses
T: T-units
DC: dependent clauses
CT: complex T-units
CP: coordinate phrases
CN: complex nominals
MLS: mean length of sentence
MLT: mean length of T-unit
MLC: mean length of clause
C/S: clauses per sentence
VP/T: verb phrases per T-unit
C/T: clauses per T-unit
DC/C: dependent clauses per clause
DC/T: dependent clauses per T-unit
T/S: T-units per sentence
CT/T: complex T-unit ratio
CP/T: coordinate phrases per T-unit
CP/C: coordinate phrases per clause
CN/T: complex nominals per T-unit
CN/C: complex nominals per clause

Print the Help Message

If you call the nsca command without any arguments or options, it prints a help message.

Citing

If you use NeoSCA in your research, please cite as follows.

BibTeX:
@misc{tan2022neosca,
title        = {NeoSCA: A Rewrite of L2 Syntactic Complexity Analyzer, version 0.0.35},
author       = {Long Tan},
howpublished = {\url{https://github.com/tanloong/neosca}},
year         = {2022}
}

APA (7th edition):
Tan, L. (2022). NeoSCA: A Rewrite of L2 Syntactic Complexity Analyzer (version 0.0.35) [Software]. GitHub. https://github.com/tanloong/neosca

MLA (9th edition):
Tan, Long. NeoSCA: A Rewrite of L2 Syntactic Complexity Analyzer. version 0.0.35, GitHub, 2022, https://github.com/tanloong/neosca.

Please also cite Xiaofei Lu's article describing L2SCA.

BibTeX:
@article{lu2010automatic,
title     = {Automatic analysis of syntactic complexity in second language writing},
author    = {Xiaofei Lu},
journal   = {International journal of corpus linguistics},
volume    = {15},
number    = {4},
pages     = {474--496},
year      = {2010},
publisher = {John Benjamins Publishing Company},
doi       = {10.1075/ijcl.15.4.02lu},
}

APA (7th edition):
Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496. https://doi.org/10.1075/ijcl.15.4.02lu

MLA (9th edition):
Lu, Xiaofei. "Automatic Analysis of Syntactic Complexity in Second Language Writing." International Journal of Corpus Linguistics, vol. 15, no. 4, John Benjamins Publishing Company, 2010, pp. 474–96.

License

NeoSCA is licensed under the GNU General Public License version 2 or later.
