Natural language processing package based on modern deep learning methods

DeepNLP

This is a new natural language processing library based on modern deep learning methods. The library focuses on basic NLP tasks such as POS (part-of-speech) tagging, NER (named entity recognition), and DP (dependency parsing). The main language is English, but we are working to support Vietnamese and other languages in the near future.

Installation 🔥

  • This repository is tested on Python 3.7+ and TensorFlow 2.8+.
  • DeepNLP can be installed using pip as follows:
pip install deepnlp-cerelab 
  • DeepNLP can also be installed from source with the following commands:
git clone https://github.com/hieupth/deepnlp.git
cd deepnlp/
pip install -e .

Tutorials 🥮

1. Sentence Segmentation

Usage

>>> import deepnlp 
>>> text = """\
Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't.
"""
>>> deepnlp.sentence_tokenize(text)
['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.',
 'Did he mind?',
 "Adam Jones Jr. thinks he didn't.",
 "In any case, this isn't true...",
 "Well, with a probability of .9 it isn't.",
 '']
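
The sample output above ends with an empty string, presumably produced by the trailing newline in `text`. A minimal sketch for dropping such entries follows. It uses a hard-coded copy of that output so it runs without the library; the variable names are illustrative and not part of DeepNLP.

```python
# Copy of the output shown above; in practice this list would come from
# deepnlp.sentence_tokenize(text).
raw_sentences = [
    "Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.",
    "Did he mind?",
    "Adam Jones Jr. thinks he didn't.",
    "In any case, this isn't true...",
    "Well, with a probability of .9 it isn't.",
    "",
]

# Keep only entries that contain something other than whitespace.
sentences = [s.strip() for s in raw_sentences if s.strip()]

print(len(sentences))
```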

2. Word Tokenize

Usage

>>> import deepnlp 
>>> text = "I have an apple."
>>> deepnlp.word_tokenize(text)
['I', 'have', 'an', 'apple', '.']

3. Install and load pretrained model and vocabs

  • Install pretrained model and vocabs
>>> import deepnlp
>>> deepnlp.download('deepnlp_eng')
  • Alternatively, install the pretrained model and the vocabs separately
>>> import deepnlp 
>>> deepnlp.download_model('deepnlp_eng')
>>> deepnlp.download_vocabs('deepnlp_eng')
  • Load models and vocabs
>>> import deepnlp 
>>> model = deepnlp.load_model('deepnlp_eng')
>>> vocabs= deepnlp.load_vocabs('deepnlp_eng', task= 'multi') # pos, ner, dp

4. POS Tagging

  • With PosTagger class
>>> import deepnlp
>>> model= deepnlp.PosTagger('deepnlp_eng')
>>> model 
model_name: deepnlp_eng, vocab_name: deepnlp_eng, tokenizer_name: distilroberta-base
>>> output= model.inference('I have an apple.', device= 'cpu') # default device = 'cpu'
>>> output
<deepnlp.utils.data_struct.TokenClassificationData at 0x7fbc3ddbab90>
>>> output.value()
{'Sequence': 'I have an apple.',
 'Inference': {'I': {'score': 0.9175689, 'label': 'PRP'},
  'have': {'score': 0.9232193, 'label': 'VBP'},
  'an': {'score': 0.9158458, 'label': 'DT'},
  'apple': {'score': 0.86957675, 'label': 'NN'},
  '.': {'score': 0.8892631, 'label': '.'}}}
>>> deepnlp.print_out([output])
I have an apple.
1	I	PRP
2	have	VBP
3	an	DT
4	apple	NN
5	.       .
  • With pipeline class
>>> import deepnlp 
>>> model= deepnlp.load_model('deepnlp_eng')
>>> pipeline= deepnlp.pipeline(model, task= 'pos_tagger')
>>> output= pipeline("I have an apple.", device= 'cpu') # default device = 'cpu'
>>> deepnlp.print_out([output])
I have an apple.
1	I	PRP
2	have	VBP
3	an	DT
4	apple	NN
5	.       .
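
The dictionary returned by `output.value()` in the PosTagger example can be post-processed with plain Python. The sketch below works on a hard-coded copy of that dictionary so it runs without downloading a model. The helper `tagged_tokens` and its `min_score` parameter are hypothetical and not part of DeepNLP. Because the structure is keyed by token text, the sketch assumes each token occurs only once in the sentence.

```python
# Copy of the dictionary shown by output.value() in the PosTagger example.
pos_result = {
    "Sequence": "I have an apple.",
    "Inference": {
        "I": {"score": 0.9175689, "label": "PRP"},
        "have": {"score": 0.9232193, "label": "VBP"},
        "an": {"score": 0.9158458, "label": "DT"},
        "apple": {"score": 0.86957675, "label": "NN"},
        ".": {"score": 0.8892631, "label": "."},
    },
}


def tagged_tokens(result, min_score=0.0):
    """Return (token, label) pairs whose score is at least min_score.

    Hypothetical helper for illustration; not a DeepNLP function.
    """
    return [
        (token, info["label"])
        for token, info in result["Inference"].items()
        if info["score"] >= min_score
    ]


all_pairs = tagged_tokens(pos_result)
confident_pairs = tagged_tokens(pos_result, min_score=0.9)
print(all_pairs)
print(confident_pairs)
```

With the scores shown above, a threshold of 0.9 keeps only 'I', 'have' and 'an'.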

5. Named Entity Recognition

With NerTagger class

>>> import deepnlp
>>> model = deepnlp.NerTagger('deepnlp_eng')
>>> output= model.inference('Please confirm your song choice: Same Old War, playing on the kitchen speaker', device= 'cpu') # default device = 'cpu'
>>> output
<deepnlp.utils.data_struct.TokenClassificationData at 0x7f69d9504750>
>>> output.value()
{'Sequence': 'Please confirm your song choice: Same Old War, playing on the kitchen speaker',
 'Inference': {'Same': {'score': 0.922773, 'label': 'B-MISC'},
  'Old': {'score': 0.9353856, 'label': 'I-MISC'},
  'War': {'score': 0.92017937, 'label': 'I-MISC'}}}
>>> deepnlp.print_out([output], del_prefix_ner= False) # if you set del_prefix_ner= True, B-MISC or I-MISC will become MISC 
Please confirm your song choice: Same Old War, playing on the kitchen speaker
1	Please	    O
2	confirm	    O
3	your	    O
4	song        O
5	choice 	    O
6	Same	    B-MISC
7	Old	    I-MISC
8	War	    I-MISC
9	,	    O
10	playing	    O
11	on	    O
12	the	    O
13	kitchen	    O
14	speaker	    O

With pipeline class

>>> import deepnlp
>>> model= deepnlp.load_model('deepnlp_eng')
>>> pipeline= deepnlp.pipeline(model, task= 'ner_tagger')
>>> output= pipeline("Please confirm your song choice: Same Old War, playing on the kitchen speaker", device= 'cpu') # default device = 'cpu'
>>> deepnlp.print_out([output], del_prefix_ner= True)
Please confirm your song choice: Same Old War, playing on the kitchen speaker
1	Please	    O
2	confirm	    O
3	your	    O
4	song        O
5	choice 	    O
6	Same	    MISC
7	Old	    MISC
8	War	    MISC
9	,	    O
10	playing	    O
11	on	    O
12	the	    O
13	kitchen	    O
14	speaker	    O
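
The `B-`/`I-` prefixes in the NerTagger output mark where an entity begins and continues, so consecutive tagged tokens can be merged into a single span. The following is a minimal sketch of that merge, run on a hard-coded copy of the `output.value()` dictionary from the NerTagger example. `merge_entities` is a hypothetical helper, not a DeepNLP function, and it assumes the tokens appear in the dictionary in sentence order.

```python
# Copy of the dictionary shown by output.value() in the NerTagger example.
ner_result = {
    "Sequence": "Please confirm your song choice: Same Old War, playing on the kitchen speaker",
    "Inference": {
        "Same": {"score": 0.922773, "label": "B-MISC"},
        "Old": {"score": 0.9353856, "label": "I-MISC"},
        "War": {"score": 0.92017937, "label": "I-MISC"},
    },
}


def merge_entities(result):
    """Group B-/I- tagged tokens into (text, entity_type) spans.

    Hypothetical helper for illustration; not a DeepNLP function.
    """
    entities = []     # finished spans
    tokens = []       # tokens of the span being built
    span_type = None  # entity type of the span being built
    for token, info in result["Inference"].items():
        prefix, _, label_type = info["label"].partition("-")
        if prefix == "I" and tokens and label_type == span_type:
            tokens.append(token)  # continue the open span
            continue
        # Any other label closes the open span, if there is one.
        if tokens:
            entities.append((" ".join(tokens), span_type))
            tokens, span_type = [], None
        if prefix in ("B", "I"):
            tokens, span_type = [token], label_type  # open a new span
    if tokens:
        entities.append((" ".join(tokens), span_type))
    return entities


entities = merge_entities(ner_result)
print(entities)
```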

6. Dependency Parsing

With DPParser class

>>> import deepnlp
>>> model = deepnlp.DPParser('deepnlp_eng')
>>> output= model.inference("I have an apple.", device= 'cpu') # default device = 'cpu'
>>> output 
<deepnlp.utils.data_struct.ParserData at 0x7f69da3125d0>
>>> output.value()
{'Sequence': 'I have an apple.',
 'Inference': {'xpos': ['PRP', 'VBP', 'DT', 'NN', '.'],
  'head': [2, 0, 4, 2, 2],
  'rela': ['nsubj', 'root', 'det', 'obj', 'punct']}}
>>> deepnlp.print_out([output])
I have an apple.
1	I	    PRP	  2	  nsubj
2	have	    VBP	  0	  root
3	an	    DT	  4	  det
4	apple	    NN	  2	  obj
5	.	    .	  2	  punct

With pipeline class

>>> import deepnlp
>>> model= deepnlp.load_model('deepnlp_eng')
>>> pipeline= deepnlp.pipeline(model, task= 'dp_parser')
>>> output= pipeline("I have an apple.", device= 'cpu') # default device = 'cpu'
>>> deepnlp.print_out([output])
I have an apple.
1	I	    PRP	  2	  nsubj
2	have	    VBP	  0	  root
3	an	    DT	  4	  det
4	apple	    NN	  2	  obj
5	.	    .	  2	  punct
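
The `head` and `rela` lists in the DPParser output describe one dependency edge per token. The sketch below turns them into readable triples. It assumes that `head` values are 1-based token positions with 0 marking the root, which matches the printed table above and the usual CoNLL-style convention. The token list is copied from that table, and `dependency_edges` is a hypothetical helper rather than part of DeepNLP.

```python
# Copy of the dictionary shown by output.value() in the DPParser example.
dp_result = {
    "Sequence": "I have an apple.",
    "Inference": {
        "xpos": ["PRP", "VBP", "DT", "NN", "."],
        "head": [2, 0, 4, 2, 2],
        "rela": ["nsubj", "root", "det", "obj", "punct"],
    },
}

# Tokens copied from the print_out table; the value() dictionary shown
# above lists tags, heads and relations but not the tokens themselves.
tokens = ["I", "have", "an", "apple", "."]


def dependency_edges(tokens, inference):
    """Return (dependent, relation, head_word) triples.

    Assumes 1-based head indices, with 0 standing for the root.
    Hypothetical helper for illustration; not a DeepNLP function.
    """
    edges = []
    for word, head, rela in zip(tokens, inference["head"], inference["rela"]):
        head_word = "ROOT" if head == 0 else tokens[head - 1]
        edges.append((word, rela, head_word))
    return edges


edges = dependency_edges(tokens, dp_result["Inference"])
for dependent, relation, head_word in edges:
    print(f"{dependent} --{relation}--> {head_word}")
```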

7. Multi Task

With MultiTask

>>> import deepnlp
>>> model = deepnlp.MultiTask('deepnlp_eng')
>>> output= model.inference("Please confirm your song choice: Same Old War, playing on the kitchen speaker", device= 'cpu') # default device = 'cpu'
>>> output 
<deepnlp.utils.data_struct.MultiData at 0x7f69da8f7650>
>>> deepnlp.print_out([output])
Please confirm your song choice: Same Old War, playing on the kitchen speaker
1	Please	  UH	O	2	discourse
2	confirm	  VB	O	0	root
3	your	  PRP$	O 	5	nmod:poss
4	song	  NN	O	5	compound
5	choice	  NN	O 	2	obj
6	Same	  JJ	MISC	8	amod
7	Old	  NNP	MISC	8	compound
8	War	  NNP	MISC	2	obj
9	,	  ,	O	2	punct
10	playing	  VBG	O	2	advcl
11	on	  IN	O	14	case
12	the	  DT	O	14	det
13	kitchen	  NN	O	14	compound
14	speaker   NN	O	10	obl

With pipeline

>>> import deepnlp 
>>> model= deepnlp.load_model('deepnlp_eng')
>>> pipeline= deepnlp.pipeline(model, task= 'multi')
>>> output= pipeline("Please confirm your song choice: Same Old War, playing on the kitchen speaker", device= 'cpu') # default device = 'cpu'
>>> deepnlp.print_out([output])
Please confirm your song choice: Same Old War, playing on the kitchen speaker
1	Please	  UH	O	2	discourse
2	confirm	  VB	O	0	root
3	your	  PRP$	O 	5	nmod:poss
4	song	  NN	O	5	compound
5	choice	  NN	O 	2	obj
6	Same	  JJ	MISC	8	amod
7	Old	  NNP	MISC	8	compound
8	War	  NNP	MISC	2	obj
9	,	  ,	O	2	punct
10	playing	  VBG	O	2	advcl
11	on	  IN	O	14	case
12	the	  DT	O	14	det
13	kitchen	  NN	O	14	compound
14	speaker   NN	O	10	obl
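
The multi-task table printed above appears to have six whitespace-separated columns: position, token, POS tag, NER tag, head position and dependency relation. That column reading is an assumption based on the single-task examples earlier. Under that assumption, a captured copy of the printed text can be parsed into records as sketched below. `Row` and `parse_rows` are hypothetical names, and the sketch assumes tokens contain no whitespace.

```python
from typing import NamedTuple


class Row(NamedTuple):
    index: int
    token: str
    xpos: str
    ner: str
    head: int
    rela: str


# The sentence line plus a few rows copied from the table above.
# Spacing is irregular there: some columns use tabs, others spaces.
table = (
    "Please confirm your song choice: Same Old War, playing on the kitchen speaker\n"
    "1\tPlease\t  UH\tO\t2\tdiscourse\n"
    "6\tSame\t  JJ\tMISC\t8\tamod\n"
    "7\tOld\t  NNP\tMISC\t8\tcompound\n"
    "8\tWar\t  NNP\tMISC\t2\tobj\n"
    "14\tspeaker   NN\tO\t10\tobl\n"
)


def parse_rows(text):
    """Parse printed table rows into Row records, skipping other lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()  # splits on any run of tabs or spaces
        # Keep only lines with six fields and numeric index and head columns.
        if len(fields) != 6 or not fields[0].isdigit() or not fields[4].isdigit():
            continue
        index, token, xpos, ner, head, rela = fields
        rows.append(Row(int(index), token, xpos, ner, int(head), rela))
    return rows


rows = parse_rows(table)
entity_tokens = [row.token for row in rows if row.ner != "O"]
print(entity_tokens)
```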

8. Clear Cache

  • Remove the cached pretrained model and vocabs for deepnlp_eng
>>> deepnlp.clear_cache('deepnlp_eng')
  • Or remove the model and the vocabs separately
>>> deepnlp.clear_model('deepnlp_eng')
>>> deepnlp.clear_vocabs('deepnlp_eng') 

9. List of pretrained models

License

Apache 2.0 License.
Copyright © 2022 Hieu Pham. All rights reserved.
