Another syntactic complexity analyzer of written English language samples

Project description

NeoSCA

NeoSCA is a rewrite of Xiaofei Lu's L2 Syntactic Complexity Analyzer (L2SCA) that runs on Windows, macOS, and Linux. Like L2SCA, NeoSCA takes written English language samples in plain text format as input and computes:

the frequency of 9 structures in the text:
  1. words (W)
  2. sentences (S)
  3. verb phrases (VP)
  4. clauses (C)
  5. T-units (T)
  6. dependent clauses (DC)
  7. complex T-units (CT)
  8. coordinate phrases (CP)
  9. complex nominals (CN), and
14 syntactic complexity indices of the text:
  1. mean length of sentence (MLS)
  2. mean length of T-unit (MLT)
  3. mean length of clause (MLC)
  4. clauses per sentence (C/S)
  5. verb phrases per T-unit (VP/T)
  6. clauses per T-unit (C/T)
  7. dependent clauses per clause (DC/C)
  8. dependent clauses per T-unit (DC/T)
  9. T-units per sentence (T/S)
  10. complex T-unit ratio (CT/T)
  11. coordinate phrases per T-unit (CP/T)
  12. coordinate phrases per clause (CP/C)
  13. complex nominals per T-unit (CN/T)
  14. complex nominals per clause (CN/C)
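
Each of the 14 indices is a ratio of two of the 9 frequencies, as the index names indicate. The following is a minimal sketch of that arithmetic, not NeoSCA's actual code: the function name, the sample counts, and the choice to return None for a zero denominator are assumptions made for illustration, and NeoSCA's own rounding and zero handling may differ.

```python
from typing import Dict, Optional

# (numerator, denominator) for each index, read off the index names above.
# Mean lengths are measured in words, so their numerator is W.
INDEX_DEFINITIONS = {
    "MLS": ("W", "S"),
    "MLT": ("W", "T"),
    "MLC": ("W", "C"),
    "C/S": ("C", "S"),
    "VP/T": ("VP", "T"),
    "C/T": ("C", "T"),
    "DC/C": ("DC", "C"),
    "DC/T": ("DC", "T"),
    "T/S": ("T", "S"),
    "CT/T": ("CT", "T"),
    "CP/T": ("CP", "T"),
    "CP/C": ("CP", "C"),
    "CN/T": ("CN", "T"),
    "CN/C": ("CN", "C"),
}


def compute_indices(freq: Dict[str, int]) -> Dict[str, Optional[float]]:
    """Return the 14 ratios; None where the denominator is zero (an assumption)."""
    result: Dict[str, Optional[float]] = {}
    for name, (numerator, denominator) in INDEX_DEFINITIONS.items():
        if freq[denominator]:
            result[name] = freq[numerator] / freq[denominator]
        else:
            result[name] = None
    return result


# Hypothetical counts, made up for this example.
counts = {"W": 120, "S": 6, "VP": 18, "C": 12, "T": 8,
          "DC": 4, "CT": 3, "CP": 2, "CN": 10}
indices = compute_indices(counts)
print(indices["MLS"], indices["C/T"])
```

With these made-up counts, MLS is 120 / 6 and C/T is 12 / 8.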

NeoSCA vs. L2SCA

| L2SCA | NeoSCA |
|---|---|
| runs on macOS and Linux | runs on Windows, macOS, and Linux |
| handles single and multiple input with two separate commands | handles both cases with one command, nsca |
| runs only under its own home directory | runs under any directory |
| outputs only frequencies of the "9+14" syntactic structures | adds options to keep intermediate results, such as the results of parsing the text with Stanford Parser and matching patterns with Stanford Tregex |

Installation

  1. Install neosca

NeoSCA requires Python 3.7 or later. To check whether Python is installed, run the following command in your terminal:

python --version

If Python is not installed, you can download and install it from the Python website. Once Python is installed, you can install NeoSCA using pip:

pip install neosca

For users in China:

pip install neosca -i https://pypi.tuna.tsinghua.edu.cn/simple
  2. Install Java 8 or later

  3. Download and unzip the latest versions of Stanford Parser and Stanford Tregex

  4. Set environment variables
  • Windows:

In the Environment Variables window (press Windows+s, type env, and press Enter):

STANFORD_PARSER_HOME=\path\to\stanford-parser-full-2020-11-17
STANFORD_TREGEX_HOME=\path\to\stanford-tregex-2020-11-17
  • Linux/macOS:
export STANFORD_PARSER_HOME=/path/to/stanford-parser-full-2020-11-17
export STANFORD_TREGEX_HOME=/path/to/stanford-tregex-2020-11-17
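
Before running nsca for the first time, it can help to confirm that the steps above took effect. The following is a minimal sketch of such a check, not part of NeoSCA: the function name and the messages are made up for illustration, and it only verifies that the two environment variables point to existing directories and that a java executable is on the PATH.

```python
import os
import shutil
from typing import List, Mapping

# The two variables named in the installation steps above.
REQUIRED_VARS = ("STANFORD_PARSER_HOME", "STANFORD_TREGEX_HOME")


def find_setup_problems(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for var in REQUIRED_VARS:
        path = env.get(var)
        if not path:
            problems.append(f"{var} is not set")
        elif not os.path.isdir(path):
            problems.append(f"{var} points to a missing directory: {path}")
    # shutil.which falls back to the process PATH when path is None.
    if shutil.which("java", path=env.get("PATH")) is None:
        problems.append("java was not found on PATH")
    return problems


for problem in find_setup_problems():
    print(problem)
```

This does not check the Java version or the contents of the two directories, so a clean result is a necessary condition rather than a guarantee.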

Usage

To use NeoSCA, run the nsca command in your terminal, followed by the options and arguments you want to use.

  1. Single input:
nsca ./samples/sample1.txt 
# frequency output: ./result.csv
nsca ./samples/sample1.txt -o sample1.csv 
# frequency output: ./sample1.csv
  2. Multiple input:
nsca ./samples/sample1.txt ./samples/sample2.txt
nsca ./samples/sample*.txt 
# wildcard characters are supported
nsca ./samples/sample[1-9].txt
# a bracket range matches a single character: sample1.txt through sample9.txt
  3. Use --text to pass text through the command line.
nsca --text 'The quick brown fox jumps over the lazy dog.'
# frequency output: ./result.csv
  4. Use -p/--reserve-parsed to keep the parsed trees produced by Stanford Parser. Use -m/--reserve-matched to keep the subtrees matched by Stanford Tregex.
nsca samples/sample1.txt -p -m
# frequency output: ./result.csv
# parsed trees: ./samples/sample1.parsed
# matched subtrees: ./result_matches/
  5. Use --list to print output fields.
nsca --list
W: words
S: sentences
VP: verb phrases
C: clauses
T: T-units
DC: dependent clauses
CT: complex T-units
CP: coordinate phrases
CN: complex nominals
MLS: mean length of sentence
MLT: mean length of T-unit
MLC: mean length of clause
C/S: clauses per sentence
VP/T: verb phrases per T-unit
C/T: clauses per T-unit
DC/C: dependent clauses per clause
DC/T: dependent clauses per T-unit
T/S: T-units per sentence
CT/T: complex T-unit ratio
CP/T: coordinate phrases per T-unit
CP/C: coordinate phrases per clause
CN/T: complex nominals per T-unit
CN/C: complex nominals per clause
  6. Use --no-query to just save parsed trees and exit.
nsca samples/sample1.txt --no-query
# parsed trees: samples/sample1.parsed
nsca --text 'This is a test.' --no-query
# parsed trees: ./cmdline_text.parsed
  7. Calling nsca without any arguments returns the help message.
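
The frequency output can then be read with any CSV tool. The following is a minimal sketch using Python's standard csv module. It assumes that result.csv starts with a header row whose column names include the field codes printed by nsca --list (for example W or MLS) and that each later row describes one input; this layout is an assumption, so check it against your own output file. The helper names are made up for illustration.

```python
import csv
import os
from typing import Dict, List


def load_results(csv_path: str) -> List[Dict[str, str]]:
    """Read the CSV into one dict per row, keyed by the header names."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def column_as_floats(rows: List[Dict[str, str]], field: str) -> List[float]:
    """Pull one numeric column, e.g. "MLS", out of the loaded rows."""
    return [float(row[field]) for row in rows]


# Only runs if a result file is present in the current directory.
if os.path.exists("result.csv"):
    rows = load_results("result.csv")
    print(column_as_floats(rows, "MLS"))
```

Keeping the values as strings until a column is requested avoids guessing the type of any non-numeric columns the file may contain.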

Citing

If you use NeoSCA in your research, please cite it using the following BibTeX entry:

@misc{tan2022neosca,
author = {Tan, Long},
title = {NeoSCA (version 0.0.32)},
howpublished = {\url{https://github.com/tanloong/neosca}},
year = {2022}
}

Please also cite Lu's article describing L2SCA:

@article{lu2010automatic,
title={Automatic analysis of syntactic complexity in second language writing},
author={Lu, Xiaofei},
journal={International Journal of Corpus Linguistics},
volume={15},
number={4},
pages={474--496},
year={2010},
publisher={John Benjamins}
}

License

NeoSCA is licensed under the GNU General Public License version 2 or later.
