Another syntactic complexity analyzer of written English language samples

NeoSCA

NeoSCA is a rewrite of Xiaofei Lu's L2 Syntactic Complexity Analyzer (L2SCA) that runs on Windows, macOS, and Linux. Like L2SCA, NeoSCA takes written English language samples in plain text format as input and computes:

the frequency of 9 structures in the text:
  1. words (W)
  2. sentences (S)
  3. verb phrases (VP)
  4. clauses (C)
  5. T-units (T)
  6. dependent clauses (DC)
  7. complex T-units (CT)
  8. coordinate phrases (CP)
  9. complex nominals (CN), and
14 syntactic complexity indices of the text:
  1. mean length of sentence (MLS)
  2. mean length of T-unit (MLT)
  3. mean length of clause (MLC)
  4. clauses per sentence (C/S)
  5. verb phrases per T-unit (VP/T)
  6. clauses per T-unit (C/T)
  7. dependent clauses per clause (DC/C)
  8. dependent clauses per T-unit (DC/T)
  9. T-units per sentence (T/S)
  10. complex T-unit ratio (CT/T)
  11. coordinate phrases per T-unit (CP/T)
  12. coordinate phrases per clause (CP/C)
  13. complex nominals per T-unit (CN/T)
  14. complex nominals per clause (CN/C)
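
Each of the 14 indices is a ratio of two of the 9 frequency counts, as the abbreviations suggest (for example, MLS is W divided by S). The sketch below illustrates that relationship with made-up counts. It is not NeoSCA's own code: the names INDEX_DEFINITIONS and compute_indices are hypothetical, and returning 0.0 for a zero denominator is an assumption of this sketch.

```python
from typing import Dict

# Each index maps to (numerator, denominator) among the nine frequency counts.
INDEX_DEFINITIONS = {
    "MLS": ("W", "S"),
    "MLT": ("W", "T"),
    "MLC": ("W", "C"),
    "C/S": ("C", "S"),
    "VP/T": ("VP", "T"),
    "C/T": ("C", "T"),
    "DC/C": ("DC", "C"),
    "DC/T": ("DC", "T"),
    "T/S": ("T", "S"),
    "CT/T": ("CT", "T"),
    "CP/T": ("CP", "T"),
    "CP/C": ("CP", "C"),
    "CN/T": ("CN", "T"),
    "CN/C": ("CN", "C"),
}


def compute_indices(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive the 14 ratio indices from the 9 structure frequencies."""
    indices = {}
    for name, (numerator, denominator) in INDEX_DEFINITIONS.items():
        divisor = counts[denominator]
        # Assumption of this sketch: report 0.0 when the divisor is zero.
        indices[name] = counts[numerator] / divisor if divisor else 0.0
    return indices


# Made-up counts, for illustration only.
example_counts = {
    "W": 100, "S": 5, "VP": 12, "C": 10, "T": 8,
    "DC": 2, "CT": 2, "CP": 4, "CN": 16,
}
example_indices = compute_indices(example_counts)
```

With these counts, MLS comes out as 100 / 5 and CN/C as 16 / 10.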

NeoSCA vs. L2SCA

| L2SCA | NeoSCA |
|-|-|
| runs on macOS and Linux | runs on Windows, macOS, and Linux |
| handles single and multiple inputs with two separate commands | handles both cases with one command, nsca, making your life easier |
| runs only under its own home directory | runs under any directory |
| outputs only frequencies of the "9+14" syntactic structures | adds options to reserve intermediate results, i.e., Stanford Parser's parsing results and Tregex's querying results |

Installation

  1. Install neosca
pip install neosca

For users in China:

pip install neosca -i https://pypi.tuna.tsinghua.edu.cn/simple
  2. Install Java 8 or later

  3. Download and unzip the latest versions of Stanford Parser and Stanford Tregex

  4. Set the environment variables STANFORD_PARSER_HOME and STANFORD_TREGEX_HOME
  • Windows:

In the Environment Variables window (press Windows+s, type env, and press Enter):

STANFORD_PARSER_HOME=\path\to\stanford-parser-full-2020-11-17
STANFORD_TREGEX_HOME=\path\to\stanford-tregex-2020-11-17
  • Linux/macOS:
export STANFORD_PARSER_HOME=/path/to/stanford-parser-full-2020-11-17
export STANFORD_TREGEX_HOME=/path/to/stanford-tregex-2020-11-17
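
Before the first run, it can help to confirm that both variables are set and point to existing directories. The following is a minimal sketch of such a check. The function name find_env_problems is hypothetical and not part of NeoSCA, and the check only tests that each directory exists, not that it contains a working Stanford Parser or Tregex.

```python
import os
from pathlib import Path
from typing import List, Mapping

# The two variables named in the installation steps above.
REQUIRED_VARIABLES = ("STANFORD_PARSER_HOME", "STANFORD_TREGEX_HOME")


def find_env_problems(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return one message per variable that is unset or not a directory."""
    problems = []
    for name in REQUIRED_VARIABLES:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing directory: {value}")
    return problems


if __name__ == "__main__":
    # Prints nothing when both variables look fine.
    for problem in find_env_problems():
        print(problem)
```

Passing a plain dict instead of os.environ makes it possible to try the check without touching the real environment.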

Usage

NeoSCA runs via the nsca command.

  1. Single input:
nsca ./samples/sample1.txt 
# frequency output: ./result.csv
nsca ./samples/sample1.txt -o sample1.csv 
# frequency output: ./sample1.csv
  2. Multiple inputs:
nsca ./samples/sample1.txt ./samples/sample2.txt
nsca ./samples/sample*.txt 
# wildcard characters are supported
nsca ./samples/sample[1-9].txt
# a bracket range matches a single character, so this selects sample1.txt through sample9.txt
  3. Use --text to pass text through the command line.
nsca --text 'This is a test.'
# frequency output: ./result.csv
  4. Use -p/--reserve-parsed to keep the parsed trees produced by Stanford Parser. Use -m/--reserve-matched to keep the subtrees matched by Stanford Tregex.
nsca samples/sample1.txt -p -m
# frequency output: ./result.csv
# parsed trees: ./samples/sample1.parsed
# matched subtrees: ./result_matches/
  5. Use --list to print output fields.
nsca --list
W: words
S: sentences
VP: verb phrases
C: clauses
T: T-units
DC: dependent clauses
CT: complex T-units
CP: coordinate phrases
CN: complex nominals
MLS: mean length of sentence
MLT: mean length of T-unit
MLC: mean length of clause
C/S: clauses per sentence
VP/T: verb phrases per T-unit
C/T: clauses per T-unit
DC/C: dependent clauses per clause
DC/T: dependent clauses per T-unit
T/S: T-units per sentence
CT/T: complex T-unit ratio
CP/T: coordinate phrases per T-unit
CP/C: coordinate phrases per clause
CN/T: complex nominals per T-unit
CN/C: complex nominals per clause
  6. Use --no-query to save only the parsed trees and exit.
nsca samples/sample1.txt --no-query
# parsed trees: samples/sample1.parsed
nsca --text 'This is a test.' --no-query
# parsed trees: ./cmdline_text.parsed
  7. Calling nsca without any arguments prints the help message.
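
For scripted runs, the command lines above can be assembled in Python and passed to subprocess. The sketch below only builds the argument list from the options shown in this section (-o, -p, -m, --text, --no-query). The helper name build_nsca_command is hypothetical, NeoSCA does not provide it, and whether every combination of these options is accepted by nsca is not verified here.

```python
import shlex
from typing import List, Optional, Sequence


def build_nsca_command(
    inputs: Sequence[str] = (),
    text: Optional[str] = None,
    output: Optional[str] = None,
    reserve_parsed: bool = False,
    reserve_matched: bool = False,
    no_query: bool = False,
) -> List[str]:
    """Assemble an nsca argument list from the options listed above."""
    command = ["nsca", *inputs]
    if text is not None:
        command += ["--text", text]
    if output is not None:
        command += ["-o", output]
    if reserve_parsed:
        command.append("-p")
    if reserve_matched:
        command.append("-m")
    if no_query:
        command.append("--no-query")
    return command


# Shell-quoted form of the command, handy for logging.
preview = shlex.join(build_nsca_command(text="This is a test.", no_query=True))

# To actually run it (requires nsca on PATH):
#   import subprocess
#   subprocess.run(build_nsca_command(["samples/sample1.txt"], output="sample1.csv"), check=True)
```

Passing a list rather than a single string to subprocess.run avoids shell quoting problems with sample text that contains spaces or quotes.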

Citing

Please use the following citation if you use NeoSCA in your work:

@misc{tan2022neosca,
author = {Tan, Long},
title = {NeoSCA (version 0.0.28)},
howpublished = {\url{https://github.com/tanloong/neosca}},
year = {2022}
}

Please also cite Lu's article describing L2SCA:

@article{lu2010automatic,
title={Automatic analysis of syntactic complexity in second language writing},
author={Lu, Xiaofei},
journal={International Journal of Corpus Linguistics},
volume={15},
number={4},
pages={474--496},
year={2010},
publisher={John Benjamins}
}

License

Like L2SCA, NeoSCA is licensed under the GNU General Public License, version 2 or later.
