Another syntactic complexity analyzer of written English language samples

NeoSCA

NeoSCA is a rewrite of Xiaofei Lu's L2 Syntactic Complexity Analyzer (L2SCA) that supports Windows, macOS, and Linux. Like L2SCA, NeoSCA takes written English language samples in plain text format as input and computes:

the frequency of 9 structures in the text:
  1. words (W)
  2. sentences (S)
  3. verb phrases (VP)
  4. clauses (C)
  5. T-units (T)
  6. dependent clauses (DC)
  7. complex T-units (CT)
  8. coordinate phrases (CP)
  9. complex nominals (CN), and
14 syntactic complexity indices of the text:
  1. mean length of sentence (MLS)
  2. mean length of T-unit (MLT)
  3. mean length of clause (MLC)
  4. clauses per sentence (C/S)
  5. verb phrases per T-unit (VP/T)
  6. clauses per T-unit (C/T)
  7. dependent clauses per clause (DC/C)
  8. dependent clauses per T-unit (DC/T)
  9. T-units per sentence (T/S)
  10. complex T-unit ratio (CT/T)
  11. coordinate phrases per T-unit (CP/T)
  12. coordinate phrases per clause (CP/C)
  13. complex nominals per T-unit (CN/T)
  14. complex nominals per clause (CN/C)
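
Each of the 14 indices is a ratio of two of the 9 frequencies, as the names indicate (for example, mean length of sentence is words per sentence, and complex nominals per clause is CN divided by C). The sketch below illustrates that relationship only; it is not NeoSCA's own code. The names `INDEX_DEFINITIONS` and `compute_indices` are hypothetical, the sample counts are made up, and returning 0.0 for a zero denominator is an assumption.

```python
from typing import Dict

# (index label, numerator structure, denominator structure),
# read off the index names in the list above.
INDEX_DEFINITIONS = [
    ("MLS", "W", "S"),
    ("MLT", "W", "T"),
    ("MLC", "W", "C"),
    ("C/S", "C", "S"),
    ("VP/T", "VP", "T"),
    ("C/T", "C", "T"),
    ("DC/C", "DC", "C"),
    ("DC/T", "DC", "T"),
    ("T/S", "T", "S"),
    ("CT/T", "CT", "T"),
    ("CP/T", "CP", "T"),
    ("CP/C", "CP", "C"),
    ("CN/T", "CN", "T"),
    ("CN/C", "CN", "C"),
]


def compute_indices(freq: Dict[str, int]) -> Dict[str, float]:
    """Derive the 14 ratios from the 9 structure frequencies."""
    indices = {}
    for label, numerator, denominator in INDEX_DEFINITIONS:
        # Assumption: report 0.0 when the denominator structure never occurs.
        if freq[denominator]:
            indices[label] = freq[numerator] / freq[denominator]
        else:
            indices[label] = 0.0
    return indices


# Made-up counts for illustration only.
sample_freq = {"W": 100, "S": 5, "VP": 12, "C": 10, "T": 8,
               "DC": 3, "CT": 2, "CP": 4, "CN": 11}
sample_indices = compute_indices(sample_freq)
```

With these counts, `sample_indices["MLS"]` is 100 / 5 and `sample_indices["CN/C"]` is 11 / 10.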

Contents

  • NeoSCA vs. L2SCA
  • Installation
  • Usage
  • Citing
  • License

NeoSCA vs. L2SCA

| L2SCA | NeoSCA |
|-|-|
| runs on macOS and Linux | runs on Windows, macOS, and Linux |
| handles single and multiple inputs with two separate commands | handles both cases with one command, `nsca`, making your life easier |
| runs only under its own home directory | runs under any directory |
| outputs only the frequencies of the "9+14" syntactic structures | adds options to keep intermediate results, i.e., Stanford Parser's parsing results and Tregex's querying results |

Installation

  1. Install neosca
pip install neosca

For users in China:

pip install neosca -i https://pypi.tuna.tsinghua.edu.cn/simple
  2. Install Java 8 or later

  3. Download and unzip the latest versions of Stanford Parser and Stanford Tregex

  4. Set `STANFORD_PARSER_HOME` and `STANFORD_TREGEX_HOME`
  • Windows:

In the Environment Variables window (press Windows+s, type env, and press Enter):

STANFORD_PARSER_HOME=\path\to\stanford-parser-full-2020-11-17
STANFORD_TREGEX_HOME=\path\to\stanford-tregex-2020-11-17
  • Linux/macOS:
export STANFORD_PARSER_HOME=/path/to/stanford-parser-full-2020-11-17
export STANFORD_TREGEX_HOME=/path/to/stanford-tregex-2020-11-17
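
Before the first run, it can help to confirm that both variables are visible to new processes and point at real directories. The following is a minimal sketch of such a check, not something NeoSCA ships: `find_env_problems` and `java_on_path` are hypothetical helper names, and the check only tests that each path is a directory, not that it contains a working Stanford Parser or Tregex.

```python
import os
import shutil
from pathlib import Path
from typing import List, Mapping, Optional

# The two variables named in the installation steps above.
REQUIRED_VARS = ("STANFORD_PARSER_HOME", "STANFORD_TREGEX_HOME")


def find_env_problems(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means both look fine."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} does not point to a directory: {value}")
    return problems


def java_on_path() -> bool:
    """True if a `java` executable can be found on PATH (version not checked)."""
    return shutil.which("java") is not None
```

Calling `find_env_problems()` with no argument inspects the current environment. Open a new terminal first if the variables were set only a moment ago.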

Usage

NeoSCA is run with the `nsca` command.

  1. Single input:
nsca sample1.txt # output will be saved in result.csv
nsca sample1.txt -o sample1.csv # custom output file
  2. Multiple inputs:
nsca sample1.txt sample2.txt
nsca sample*.txt # wildcard characters are supported
nsca sample[1-10].txt
  3. Use -p/--reserve-parsed to keep the parse files produced by Stanford Parser. Use -m/--reserve-match to keep the match results produced by Stanford Tregex.
nsca sample1.txt -p -m
  4. Calling nsca without any arguments prints the help message.
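
To run `nsca` from a script, the documented options can be assembled into an argument list. This is a sketch under assumptions: `build_nsca_command` and `run_nsca` are hypothetical wrappers, not part of NeoSCA. They use only the flags shown above (`-o`, `-p`, `-m`) and assume these can be combined in one call.

```python
import subprocess
from typing import List, Optional, Sequence


def build_nsca_command(
    inputs: Sequence[str],
    output: Optional[str] = None,
    reserve_parsed: bool = False,
    reserve_match: bool = False,
) -> List[str]:
    """Build the argument list for one nsca call from the documented options."""
    if not inputs:
        raise ValueError("at least one input file is required")
    cmd = ["nsca", *inputs]
    if output is not None:
        cmd += ["-o", output]   # custom output file
    if reserve_parsed:
        cmd.append("-p")        # keep Stanford Parser's parse files
    if reserve_match:
        cmd.append("-m")        # keep Stanford Tregex's match results
    return cmd


def run_nsca(inputs: Sequence[str], **options) -> subprocess.CompletedProcess:
    """Run nsca; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_nsca_command(inputs, **options), check=True)
```

For example, `build_nsca_command(["sample1.txt"], output="sample1.csv")` gives the same arguments as `nsca sample1.txt -o sample1.csv`. Because the list is passed to `subprocess.run` without a shell, the shell does not expand wildcards, so it is safest to pass explicit file names.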

Citing

Please use the following citation if you use NeoSCA in your work:

@misc{tan2022neosca,
author = {Tan, Long},
title = {NeoSCA},
howpublished = {\url{https://github.com/tanloong/neosca}},
year = {2022}
}

Please also cite Lu's article describing L2SCA:

@article{lu2010automatic,
title={Automatic analysis of syntactic complexity in second language writing},
author={Lu, Xiaofei},
journal={International journal of corpus linguistics},
volume={15},
number={4},
pages={474--496},
year={2010},
publisher={John Benjamins}
}

License

Like L2SCA, NeoSCA is licensed under the GNU General Public License, version 2 or later.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

neosca-0.0.24.tar.gz (18.3 kB)

Uploaded Source

Built Distribution

neosca-0.0.24-py3-none-any.whl (17.6 kB)

Uploaded Python 3

File details

Details for the file neosca-0.0.24.tar.gz.

File metadata

  • Download URL: neosca-0.0.24.tar.gz
  • Upload date:
  • Size: 18.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for neosca-0.0.24.tar.gz

| Algorithm | Hash digest |
|-|-|
| SHA256 | 3b09493e10bdc62aaa92129e01d08f8124ced8e89eeefe3b92e87f4f91a47c52 |
| MD5 | 15e1494e8ef1e140331ad1d7fffe8f5e |
| BLAKE2b-256 | b95e70d9e4c58d52895cfd60b4e683f73cfec73e3d8dd5ddb03053a129d68b5a |

File details

Details for the file neosca-0.0.24-py3-none-any.whl.

File metadata

  • Download URL: neosca-0.0.24-py3-none-any.whl
  • Upload date:
  • Size: 17.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.8

File hashes

Hashes for neosca-0.0.24-py3-none-any.whl

| Algorithm | Hash digest |
|-|-|
| SHA256 | e2988714e645bc434f69b20d0c9d81e750ce731143aabde98ff1aa2876fde92c |
| MD5 | 05e0b712a449ec86d9965396871cdbf2 |
| BLAKE2b-256 | 876583ed54ce093a622569dfa0642edd52bd522808729d8e76e29797ca7f805c |
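
A manually downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. This is a minimal sketch; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For the source distribution, the call would be `matches_published_hash("neosca-0.0.24.tar.gz", "3b09493e10bdc62aaa92129e01d08f8124ced8e89eeefe3b92e87f4f91a47c52")`.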
