
Implementation of syntactic n-grams (sn-gram) extraction


What is it?

Companion library of the machine learning book Feature Engineering & Selection for Explainable Models: A Second Course for Data Scientists.

The SNgramExtractor module extracts syntactic relations (SR tags) as elements of sn-grams.

Sn-grams are obtained by following the paths marked by the arrows in the dependency tree. [1]

The advantage of syntactic n-grams (sn-grams), i.e., n-grams constructed by following paths in syntactic trees, is that they are less arbitrary than traditional n-grams. As a result, there are fewer of them than there are traditional n-grams. They can also be interpreted as linguistic phenomena, whereas traditional n-grams have no plausible linguistic interpretation and are merely statistical artifacts. [1]
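As a minimal sketch of this idea, the Python below contrasts n-grams read from the surface word order with n-grams read along head-to-dependent paths. The head indices are written by hand for illustration only (an assumption, not the output of a parser or of SNgramExtractor), and the function names are hypothetical.

```python
# Tokens of the example sentence and, for each token, the index of its
# syntactic head (-1 marks the root). The heads are hand-written for
# illustration; a real dependency parser may produce a different tree.
tokens = ["Economic", "news", "have", "little", "effect", "on", "financial", "markets"]
heads = [1, 2, -1, 4, 2, 4, 7, 5]


def linear_ngrams(tokens, n):
    """Traditional n-grams: n consecutive tokens in surface order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def syntactic_bigrams(tokens, heads):
    """One (head, dependent) pair per dependency arc."""
    return [(tokens[h], tokens[i]) for i, h in enumerate(heads) if h >= 0]


def syntactic_trigrams(tokens, heads):
    """(grandparent, parent, child) chains that follow two consecutive arcs."""
    trigrams = []
    for i, h in enumerate(heads):
        if h >= 0 and heads[h] >= 0:
            trigrams.append((tokens[heads[h]], tokens[h], tokens[i]))
    return trigrams


print("linear bigrams:    ", linear_ngrams(tokens, 2))
print("syntactic bigrams: ", syntactic_bigrams(tokens, heads))
print("linear trigrams:   ", linear_ngrams(tokens, 3))
print("syntactic trigrams:", syntactic_trigrams(tokens, heads))
```

In this toy tree the pair (have, effect) appears as a syntactic bigram even though the two words are not adjacent in the sentence, while an adjacent but syntactically unrelated pair such as (have, little) does not.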

SN-grams are useful across many natural language processing application areas, such as classification tasks in machine learning [2], information extraction [3], query understanding [4], machine translation [5], and question answering systems [6].

Input parameters

  • text: the input text, given as a single sentence.
  • meta_tag: whether the resulting bigrams and trigrams are concatenated with the part-of-speech tag ('pos') or the dependency tag ('dep'), or left as the original SN-gram ('original').
  • trigram_flag: whether to include trigrams derived from SN-grams as well ('yes') or not ('no'). Default is 'yes'.
  • nlp_model: the spaCy language model to use. Default is the spaCy English language model en_core_web_sm. This makes it possible to process languages other than English.
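Because meta_tag and trigram_flag are passed as plain strings, it can help to check them before constructing the extractor. The helper below is a hypothetical sketch, not part of SNgramExtractor; it only encodes the values listed above, and how the library itself reacts to other values is not documented here.

```python
# Allowed option values as listed in the parameter description above.
ALLOWED_META_TAGS = {"pos", "dep", "original"}
ALLOWED_TRIGRAM_FLAGS = {"yes", "no"}


def check_options(meta_tag, trigram_flag="yes"):
    """Return the options as a dict, or raise ValueError on an unknown value.

    Hypothetical helper for illustration; it does not call the library.
    """
    if meta_tag not in ALLOWED_META_TAGS:
        raise ValueError(
            f"meta_tag must be one of {sorted(ALLOWED_META_TAGS)}, got {meta_tag!r}"
        )
    if trigram_flag not in ALLOWED_TRIGRAM_FLAGS:
        raise ValueError(
            f"trigram_flag must be one of {sorted(ALLOWED_TRIGRAM_FLAGS)}, got {trigram_flag!r}"
        )
    return {"meta_tag": meta_tag, "trigram_flag": trigram_flag}


options = check_options("dep", trigram_flag="no")
print(options)
```

The returned dictionary can then be unpacked into the constructor call with `**options`.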

Output

A dictionary object with key-value pairs for the bigrams and trigrams derived from SN-grams.

  • SNBigram: dictionary key for the bigrams derived from SN-grams
  • SNTrigram: dictionary key for the trigrams derived from SN-grams
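One possible use of this dictionary is to turn the sn-grams of several sentences into feature counts. The sketch below assumes each value is either a whitespace-separated string or a list of sn-gram strings; that is an assumption about the value type, so check the actual output before relying on it. The sample dictionaries are made-up placeholders, not real library output, and the function names are hypothetical.

```python
from collections import Counter


def sngram_items(value):
    """Normalize one output value to a list of sn-gram strings.

    Assumption: the value is a whitespace-separated string or an iterable
    of strings. Adjust this if the library returns another type.
    """
    if isinstance(value, str):
        return value.split()
    return list(value)


def count_features(outputs):
    """Count sn-gram features over a list of output dictionaries."""
    counts = Counter()
    for output in outputs:
        for key in ("SNBigram", "SNTrigram"):
            # A missing key (for example, trigrams switched off) counts nothing.
            for item in sngram_items(output.get(key, "")):
                counts[f"{key}:{item}"] += 1
    return counts


# Placeholder dictionaries that only mimic the two documented keys.
sample_outputs = [
    {"SNBigram": "have_news news_Economic", "SNTrigram": "have_news_Economic"},
    {"SNBigram": ["have_news"], "SNTrigram": []},
]
print(count_features(sample_outputs))
```

Prefixing each feature with its key keeps bigram and trigram features apart when the counts are later fed to a classifier.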

How to use it?

import spacy  # needed below to load the French model
from SNgramExtractor import SNgramExtractor

# English sentence, using the default spaCy model (nlp_model=None)
text = 'Economic news have little effect on financial markets.'
SNgram_obj = SNgramExtractor(text, meta_tag='original', trigram_flag='yes', nlp_model=None)
output = SNgram_obj.get_SNgram()
print(text)
print('SNGram bigram:', output['SNBigram'])
print('SNGram trigram:', output['SNTrigram'])

print('-----------------------------------')
text = 'every cloud has a silver lining'
SNgram_obj = SNgramExtractor(text, meta_tag='original', trigram_flag='yes', nlp_model=None)
output = SNgram_obj.get_SNgram()
print(text)
print('SNGram bigram:', output['SNBigram'])
print('SNGram trigram:', output['SNTrigram'])

print('-----------------------------------')
# French sentence: the model must be installed first,
# e.g. with: python -m spacy download fr_core_news_sm
nlp_french = spacy.load('fr_core_news_sm')
text = 'Je voudrais réserver un hôtel à Rennes.'
SNgram_obj = SNgramExtractor(text, meta_tag='original', trigram_flag='yes', nlp_model=nlp_french)
output = SNgram_obj.get_SNgram()
print(text)
print('SNGram bigram:', output['SNBigram'])
print('SNGram trigram:', output['SNTrigram'])
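Since the extractor takes a single sentence, a longer text has to be processed one sentence at a time. The helper below is a hypothetical sketch of such a loop; it relies only on the constructor call and the get_SNgram() method shown above, and it receives the extractor class as an argument so that the loop itself does not depend on the library being installed.

```python
def extract_per_sentence(sentences, extractor_cls, **options):
    """Run an sn-gram extractor over each sentence in turn.

    sentences:     iterable of strings, one sentence each
    extractor_cls: a class used like SNgramExtractor in the example above,
                   i.e. extractor_cls(sentence, **options).get_SNgram()
    options:       keyword arguments forwarded to the constructor
    Returns a list of (sentence, output_dict) pairs.
    """
    results = []
    for sentence in sentences:
        sentence = sentence.strip()
        if not sentence:
            continue  # skip empty lines instead of passing them to the parser
        extractor = extractor_cls(sentence, **options)
        results.append((sentence, extractor.get_SNgram()))
    return results
```

With the library installed, a call could look like extract_per_sentence(sentences, SNgramExtractor, meta_tag='original', trigram_flag='yes', nlp_model=None). Sentence splitting is left to the caller on purpose, because a naive split on periods breaks on abbreviations.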

Where to get it?

pip install SNgramExtractor

How to cite?

Md Azimul Haque (2022). Feature Engineering & Selection for Explainable Models: A Second Course for Data Scientists. Lulu Press, Inc.

Dependencies

References

  1. Syntactic Dependency-Based N-grams as Classification Features by Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh and Liliana Chanona-Hernández
  2. Syntactic N-grams as Machine Learning Features for Natural Language Processing by Grigori Sidorov, Francisco Velasquez, Efstathios Stamatatos, Alexander Gelbukh and Liliana Chanona-Hernández
  3. Dependency-Based Open Information Extraction by Pablo Gamallo, Marcos Garcia and Santiago Fernandez-Lanza
  4. Query Understanding Enhanced By Hierarchical Parsing Structures by Jingjing Liu, Panupong Pasupat, Yining Wang, Scott Cyphers, and Jim Glass
  5. Dependency Structure Trees in Syntax Based Machine Translation by Vamshi Ambati
  6. Question Answering Passage Retrieval Using Dependency Relations by Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan and Tat-Seng Chua
