a simple utility to take in a sentence and output information about the AWL words in it
Awlify
A very basic tool that takes in a sentence of text and outputs the same text, annotated with information about which of its words are in the Academic Word List (AWL).
installing
pip install awlify
If you haven't used spaCy on your system before, you'll also need to install the model this package uses with the command below:
python -m spacy download en_core_web_sm
tests
python -m unittest
usage inside a file
from awlify import awlify
result = awlify('please inform me of the academic words in this sentence')
print(result)
{"data": {"sentence": "please inform me of the academic words in this sentence", "awl_words": [{"index": 5, "word": "academic", "meta": {"head": "academy", "sublist": 5}}]}}
usage from the command line
python -m awlify 'this is a sentence to check'
{"data": {"sentence": "this is a sentence to check", "awl_words": []}}
expected input / output
format for output:
{
    "data": {
        "sentence": "THIS IS THE ORIGINAL SENTENCE",
        "awl_words": [
            {
                "index": INDEX_OF_AWL_WORD_FOUND,
                "word": "AWL_WORD_FOUND",
                "meta": {
                    "head": "THE_HEADWORD_FROM_THE_AWL",
                    "sublist": THE_AWL_SUBLIST_OF_THE_WORD
                }
            }
        ]
    }
}
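If you consume this output in other code, a small shape check can catch surprises early. The sketch below is not part of awlify; `looks_like_awlify_output` is a hypothetical name, and it assumes the input is already a dict and that `index` and `sublist` are integers, as they are in the examples in this README.

```python
def looks_like_awlify_output(payload):
    """Hypothetical check: True if payload has the documented output shape."""
    if not isinstance(payload, dict):
        return False
    data = payload.get("data")
    if not isinstance(data, dict):
        return False
    if not isinstance(data.get("sentence"), str):
        return False
    words = data.get("awl_words")
    if not isinstance(words, list):
        return False
    for entry in words:
        if not isinstance(entry, dict):
            return False
        meta = entry.get("meta")
        if not isinstance(meta, dict):
            return False
        # Assumption: index and sublist are integers, as in the examples.
        if not isinstance(entry.get("index"), int):
            return False
        if not isinstance(entry.get("word"), str):
            return False
        if not isinstance(meta.get("head"), str):
            return False
        if not isinstance(meta.get("sublist"), int):
            return False
    return True


example = {"data": {"sentence": "this is a sentence", "awl_words": []}}
print(looks_like_awlify_output(example))  # True
```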
example input for a simple sentence (no AWL words):
simple_sentence = awlify('this is a sentence')
example output for a simple sentence (no AWL words):
{
    "data": {
        "sentence": "this is a sentence",
        "awl_words": []
    }
}
example input for a complex sentence (a few AWL words):
complex_sentence = awlify('the economic recovery is ongoing and potentially problematic')
example output for a complex sentence (a few AWL words):
{
    "data": {
        "sentence": "the economic recovery is ongoing and potentially problematic",
        "awl_words": [
            {
                "index": 1,
                "word": "economic",
                "meta": {
                    "head": "economy",
                    "sublist": 1
                }
            },
            {
                "index": 2,
                "word": "recovery",
                "meta": {
                    "head": "recover",
                    "sublist": 6
                }
            },
            {
                "index": 6,
                "word": "potentially",
                "meta": {
                    "head": "potential",
                    "sublist": 2
                }
            }
        ]
    }
}
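In the examples, `index` lines up with the zero-based position of the word in the sentence. One way to use that is to mark the AWL words inline, as sketched below. `mark_awl_words` is a hypothetical helper, and splitting on whitespace is an assumption that only matches the package's tokenization for simple sentences like this one, so the helper skips any entry whose word is not found at the expected position.

```python
# The example output above, as a Python dict.
output = {
    "data": {
        "sentence": "the economic recovery is ongoing and potentially problematic",
        "awl_words": [
            {"index": 1, "word": "economic", "meta": {"head": "economy", "sublist": 1}},
            {"index": 2, "word": "recovery", "meta": {"head": "recover", "sublist": 6}},
            {"index": 6, "word": "potentially", "meta": {"head": "potential", "sublist": 2}},
        ],
    }
}


def mark_awl_words(payload):
    """Hypothetical helper: wrap each AWL word as [word:sublist]."""
    tokens = payload["data"]["sentence"].split()
    for entry in payload["data"]["awl_words"]:
        i = entry["index"]
        # Only mark the token if whitespace splitting agrees with the reported index.
        if 0 <= i < len(tokens) and tokens[i] == entry["word"]:
            tokens[i] = "[{}:{}]".format(tokens[i], entry["meta"]["sublist"])
    return " ".join(tokens)


print(mark_awl_words(output))
# the [economic:1] [recovery:6] is ongoing and [potentially:2] problematic
```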
NOTES
Sentence tokenization currently uses spaCy, which is heavier than strictly necessary because the package doesn't take advantage of any of spaCy's more advanced features.
In theory, a simple regex would probably perform about 98% as well, so I might add that as an option in the future if no real use case turns up that needs the full weight of spaCy.
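For illustration, a regex-based variant might look roughly like the sketch below. It is not the package's implementation: `awlify_regex`, `TOKEN_RE`, and `AWL_SAMPLE` are hypothetical names, it returns a dict, and the lookup table contains only the four entries that appear in the examples in this README rather than the full word list. Because the regex skips punctuation and keeps contractions whole, its indexes can differ from spaCy's on sentences that contain either.

```python
import re

# Tiny stand-in lookup: word -> (headword, sublist), taken from the examples above.
AWL_SAMPLE = {
    "academic": ("academy", 5),
    "economic": ("economy", 1),
    "recovery": ("recover", 6),
    "potentially": ("potential", 2),
}

# A word is a run of letters, optionally followed by an apostrophe and more letters.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def awlify_regex(sentence, lookup=AWL_SAMPLE):
    """Sketch: find AWL words using regex tokenization instead of spaCy."""
    awl_words = []
    for index, match in enumerate(TOKEN_RE.finditer(sentence)):
        word = match.group(0)
        entry = lookup.get(word.lower())
        if entry is not None:
            head, sublist = entry
            awl_words.append(
                {"index": index, "word": word, "meta": {"head": head, "sublist": sublist}}
            )
    return {"data": {"sentence": sentence, "awl_words": awl_words}}


found = awlify_regex("the economic recovery is ongoing and potentially problematic")
print([(w["index"], w["word"]) for w in found["data"]["awl_words"]])
# [(1, 'economic'), (2, 'recovery'), (6, 'potentially')]
```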
REFERENCES
Coxhead, Averil (2000). A New Academic Word List. TESOL Quarterly, 34(2), 213-238.