Implementation of syntactic n-grams (sn-gram) extraction
What is it?
Companion library for the machine learning book Feature Engineering & Selection for Explainable Models: A Second Course for Data Scientists.
The SNgramExtractor module extracts syntactic relations (SR tags) as elements of sn-grams.
Sn-grams are obtained by following the paths marked by the arrows in the dependency tree. [1]
The advantage of syntactic n-grams (sn-grams), i.e., n-grams constructed by following paths in syntactic trees, is that they are less arbitrary than traditional n-grams, so there are fewer of them. They can also be interpreted as linguistic phenomena, whereas traditional n-grams have no plausible linguistic interpretation and are merely statistical artifacts. [1]
Sn-grams are useful across many natural language processing applications, such as classification tasks in machine learning [2], information extraction [3], query understanding [4], machine translation [5], and question answering systems [6].
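To make the idea of "following the arrows" concrete, here is a minimal sketch that does not use SNgramExtractor or spaCy. It assumes a hand-written dependency parse of the example sentence (the `HEADS` map is an illustration, not parser output) and treats each head-to-dependent arrow as a bigram and each pair of consecutive arrows as a trigram. The function names are hypothetical, and only continuous paths are covered.

```python
from collections import defaultdict

# Hand-written dependency parse (an assumption for illustration):
# each word maps to its syntactic head; the root maps to None.
# Words are used as keys only because they are unique in this sentence;
# a real implementation would key on token positions.
HEADS = {
    "Economic": "news",
    "news": "have",
    "have": None,
    "little": "effect",
    "effect": "have",
    "on": "effect",
    "financial": "markets",
    "markets": "on",
}

def build_children(heads):
    """Invert the head map into head -> list of dependents."""
    children = defaultdict(list)
    for word, head in heads.items():
        if head is not None:
            children[head].append(word)
    return children

def sn_bigrams(heads):
    """Each dependency arrow (head -> dependent) yields one bigram."""
    return [(head, word) for word, head in heads.items() if head is not None]

def sn_trigrams(heads):
    """Two consecutive arrows (head -> dependent -> its dependent) yield one trigram."""
    children = build_children(heads)
    return [
        (head, mid, leaf)
        for head, mid in sn_bigrams(heads)
        for leaf in children[mid]
    ]

bigrams = sn_bigrams(HEADS)
trigrams = sn_trigrams(HEADS)
print(bigrams)
print(trigrams)
```

Note that a pair such as ("have", "effect") appears here even though the two words are not adjacent in the sentence, which is what distinguishes sn-grams from traditional n-grams.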
Input parameters
- text: Input text, given as a single sentence.
- meta_tag: Whether the resulting bigrams and trigrams are concatenated with the part-of-speech tag ('pos'), the dependency tag ('dep'), or left as the original sn-gram ('original').
- trigram_flag: Whether to include trigrams derived from sn-grams ('yes') or not ('no'). Default is 'yes'.
- nlp_model: The spaCy language model to use. Default is the spaCy English language model en_core_web_sm. Passing a different model makes it possible to process languages other than English.
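The three meta_tag options can be pictured with a small standalone sketch. This is not the library's code, and its output format is not documented here: the `label` helper, the `/` and `_` separators, and the tag dictionaries are all assumptions chosen only to show what "concatenated with a tag" could mean.

```python
# Hypothetical tags for three words; a real parser would supply these.
POS = {"have": "VERB", "news": "NOUN", "Economic": "ADJ"}
DEP = {"have": "ROOT", "news": "nsubj", "Economic": "amod"}

def label(ngram, meta_tag="original"):
    """Join the words of an sn-gram, optionally suffixing each word with a tag."""
    if meta_tag == "original":
        parts = list(ngram)
    elif meta_tag == "pos":
        parts = [f"{word}/{POS[word]}" for word in ngram]
    elif meta_tag == "dep":
        parts = [f"{word}/{DEP[word]}" for word in ngram]
    else:
        raise ValueError(f"unknown meta_tag: {meta_tag!r}")
    return "_".join(parts)

print(label(("have", "news")))                     # have_news
print(label(("have", "news"), "pos"))              # have/VERB_news/NOUN
print(label(("have", "news", "Economic"), "dep"))  # have/ROOT_news/nsubj_Economic/amod
```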
Output
A dictionary with key-value pairs for the bigrams and trigrams derived from sn-grams.
- SNBigram: dictionary key for bigrams derived from sn-grams
- SNTrigram: dictionary key for trigrams derived from sn-grams
How to use it?
import spacy  # needed below to load the French model
from SNgramExtractor import SNgramExtractor
text='Economic news have little effect on financial markets.'
SNgram_obj=SNgramExtractor(text,meta_tag='original',trigram_flag='yes',nlp_model=None)
output=SNgram_obj.get_SNgram()
print(text)
print('SNGram bigram:',output['SNBigram'])
print('SNGram trigram:',output['SNTrigram'])
print('-----------------------------------')
text='every cloud has a silver lining'
SNgram_obj=SNgramExtractor(text,meta_tag='original',trigram_flag='yes',nlp_model=None)
output=SNgram_obj.get_SNgram()
print(text)
print('SNGram bigram:',output['SNBigram'])
print('SNGram trigram:',output['SNTrigram'])
print('-----------------------------------')
# The French model must be installed first: python -m spacy download fr_core_news_sm
nlp_french = spacy.load('fr_core_news_sm')
text='Je voudrais réserver un hôtel à Rennes.'
SNgram_obj=SNgramExtractor(text,meta_tag='original',trigram_flag='yes',nlp_model=nlp_french)
output=SNgram_obj.get_SNgram()
print(text)
print('SNGram bigram:',output['SNBigram'])
print('SNGram trigram:',output['SNTrigram'])
Where to get it?
pip install SNgramExtractor
How to cite?
Md Azimul Haque (2022). Feature Engineering & Selection for Explainable Models: A Second Course for Data Scientists. Lulu Press, Inc.
Dependencies
References
1. Syntactic Dependency-Based N-grams as Classification Features by Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh and Liliana Chanona-Hernández
2. Syntactic N-grams as Machine Learning Features for Natural Language Processing by Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh and Liliana Chanona-Hernández
3. Dependency-Based Open Information Extraction by Pablo Gamallo, Marcos Garcia and Santiago Fernandez-Lanza
4. Query Understanding Enhanced By Hierarchical Parsing Structures by Jingjing Liu, Panupong Pasupat, Yining Wang, Scott Cyphers and Jim Glass
5. Dependency Structure Trees in Syntax Based Machine Translation by Vamshi Ambati
6. Question Answering Passage Retrieval Using Dependency Relations by Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan and Tat-Seng Chua