Natural language processing package based on modern deep learning methods
DeepNLP
This is a new natural language processing library based on modern deep learning methods. The library focuses on basic NLP tasks such as POS (part-of-speech) tagging, NER (named entity recognition) and DP (dependency parsing). The main language is English, but we are working hard to support Vietnamese and other languages in the near future.
Installation 🔥
- This repository is tested on Python 3.7+ and TensorFlow 2.8+
- DeepNLP can be installed using pip as follows:
pip install deepnlp-cerelab
- DeepNLP can also be installed from source with the following commands:
git clone https://github.com/hieupth/deepnlp.git
cd deepnlp/
pip install -e .
Tutorials 🥮
- 1. Sentence Segmentation
- 2. Word Tokenize
- 3. Install and load pretrained model and vocabs
- 4. POS Tagging
- 5. Named Entity Recognition
- 6. Dependency Parsing
- 7. Multi Task
- 8. Clear Cache
- 9. List of pretrained models
1. Sentence Segmentation
>>> import deepnlp
>>> text = """\
Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it. Did he mind? Adam Jones Jr. thinks he didn't. In any case, this isn't true... Well, with a probability of .9 it isn't.
"""
>>> deepnlp.sentence_tokenize(text)
['Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.',
'Did he mind?',
"Adam Jones Jr. thinks he didn't.",
"In any case, this isn't true...",
"Well, with a probability of .9 it isn't.",
'']
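The result above ends with an empty string, presumably produced by the trailing newline in the input. A minimal sketch of dropping such entries before further processing follows; the list is copied from the output above, and `drop_empty` is a hypothetical helper, not part of deepnlp.

```python
# Output of deepnlp.sentence_tokenize(text), copied from the example above.
sentences = [
    'Mr. Smith bought cheapsite.com for 1.5 million dollars, i.e. he paid a lot for it.',
    'Did he mind?',
    "Adam Jones Jr. thinks he didn't.",
    "In any case, this isn't true...",
    "Well, with a probability of .9 it isn't.",
    '',
]


def drop_empty(items):
    """Return only the items that contain non-whitespace text (hypothetical helper)."""
    return [s for s in items if s.strip()]


cleaned = drop_empty(sentences)
print(len(cleaned))
```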
2. Word Tokenize
>>> import deepnlp
>>> text = "I have an apple."
>>> deepnlp.word_tokenize(text)
['I', 'have', 'an', 'apple', '.']
3. Install and load pretrained model and vocabs
>>> import deepnlp
>>> deepnlp.download('deepnlp_eng')
- Or install the pretrained model and vocabs independently of each other
>>> import deepnlp
>>> deepnlp.download_model('deepnlp_eng')
>>> deepnlp.download_vocabs('deepnlp_eng')
- Load models and vocabs
>>> import deepnlp
>>> model = deepnlp.load_model('deepnlp_eng')
>>> vocabs= deepnlp.load_vocabs('deepnlp_eng', task= 'multi') # pos, ner, dp
4. POS Tagging
- With the PosTagger class
>>> import deepnlp
>>> model= deepnlp.PosTagger('deepnlp_eng')
>>> model
model_name: deepnlp_eng, vocab_name: deepnlp_eng, tokenizer_name: distilroberta-base
>>> output= model.inference('I have an apple.', device= 'cpu') # default device = 'cpu'
>>> output
<deepnlp.utils.data_struct.TokenClassificationData at 0x7fbc3ddbab90>
>>> output.value()
{'Sequence': 'I have an apple.',
'Inference': {'I': {'score': 0.9175689, 'label': 'PRP'},
'have': {'score': 0.9232193, 'label': 'VBP'},
'an': {'score': 0.9158458, 'label': 'DT'},
'apple': {'score': 0.86957675, 'label': 'NN'},
'.': {'score': 0.8892631, 'label': '.'}}}
>>> deepnlp.print_out([output])
I have an apple.
1 I PRP
2 have VBP
3 an DT
4 apple NN
5 . .
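The dictionary returned by `output.value()` can be flattened into plain tuples for filtering or export. The sketch below works on a copy of the dictionary shown above; `to_rows` and its `min_score` threshold are hypothetical and not part of deepnlp.

```python
# Result of output.value(), copied from the example above.
value = {
    'Sequence': 'I have an apple.',
    'Inference': {
        'I': {'score': 0.9175689, 'label': 'PRP'},
        'have': {'score': 0.9232193, 'label': 'VBP'},
        'an': {'score': 0.9158458, 'label': 'DT'},
        'apple': {'score': 0.86957675, 'label': 'NN'},
        '.': {'score': 0.8892631, 'label': '.'},
    },
}


def to_rows(value, min_score=0.0):
    """Flatten the 'Inference' mapping into (token, label, score) tuples.

    min_score is a hypothetical threshold for dropping low-confidence tags.
    """
    return [
        (token, info['label'], info['score'])
        for token, info in value['Inference'].items()
        if info['score'] >= min_score
    ]


rows = to_rows(value)
confident = to_rows(value, min_score=0.9)
print([token for token, _, _ in confident])
```

Because the mapping is keyed by token text, a sentence that repeats a word may need different handling; that case is not covered here.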
- With the pipeline class
>>> import deepnlp
>>> model= deepnlp.load_model('deepnlp_eng')
>>> pipeline= deepnlp.pipeline(model, task= 'pos_tagger')
>>> output= pipeline("I have an apple.", device= 'cpu') # default device = 'cpu'
>>> deepnlp.print_out([output])
I have an apple.
1 I PRP
2 have VBP
3 an DT
4 apple NN
5 . .
5. Named Entity Recognition
>>> import deepnlp
>>> model = deepnlp.NerTagger('deepnlp_eng')
>>> output= model.inference('Please confirm your song choice: Same Old War, playing on the kitchen speaker', device= 'cpu') # default device = 'cpu'
>>> output
<deepnlp.utils.data_struct.TokenClassificationData at 0x7f69d9504750>
>>> output.value()
{'Sequence': 'Please confirm your song choice: Same Old War, playing on the kitchen speaker',
'Inference': {'Same': {'score': 0.922773, 'label': 'B-MISC'},
'Old': {'score': 0.9353856, 'label': 'I-MISC'},
'War': {'score': 0.92017937, 'label': 'I-MISC'}}}
>>> deepnlp.print_out([output], del_prefix_ner= False) # if you set del_prefix_ner= True, B-MISC or I-MISC will become MISC
Please confirm your song choice: Same Old War, playing on the kitchen speaker
1 Please O
2 confirm O
3 your O
4 song O
5 choice O
6 Same B-MISC
7 Old I-MISC
8 War I-MISC
9 , O
10 playing O
11 on O
12 the O
13 kitchen O
14 speaker O
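The B- and I- prefixes in the listing above mark where an entity begins and continues, so adjacent tokens can be merged into whole entity spans. The following is a minimal sketch over (token, label) pairs copied from that listing; `group_entities` is a hypothetical helper, not a deepnlp function.

```python
# (token, label) pairs copied from the print_out listing above.
tagged = [
    ('Please', 'O'), ('confirm', 'O'), ('your', 'O'), ('song', 'O'),
    ('choice', 'O'), ('Same', 'B-MISC'), ('Old', 'I-MISC'), ('War', 'I-MISC'),
    (',', 'O'), ('playing', 'O'), ('on', 'O'), ('the', 'O'),
    ('kitchen', 'O'), ('speaker', 'O'),
]


def group_entities(tagged):
    """Merge B-/I- tagged tokens into (text, type) spans (hypothetical helper)."""
    spans = []
    current_tokens, current_type = [], None
    for token, label in tagged:
        prefix, _, entity_type = label.partition('-')
        # An I- tag continues the open span only if the entity type matches.
        if prefix == 'I' and current_type == entity_type:
            current_tokens.append(token)
            continue
        # Any other tag closes the open span, if there is one.
        if current_tokens:
            spans.append((' '.join(current_tokens), current_type))
            current_tokens, current_type = [], None
        # A B- tag (or a stray I- tag) opens a new span; 'O' opens nothing.
        if prefix in ('B', 'I'):
            current_tokens, current_type = [token], entity_type
    if current_tokens:
        spans.append((' '.join(current_tokens), current_type))
    return spans


print(group_entities(tagged))
```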
- With the pipeline class
>>> import deepnlp
>>> model= deepnlp.load_model('deepnlp_eng')
>>> pipeline= deepnlp.pipeline(model, task= 'ner_tagger')
>>> output= pipeline("Please confirm your song choice: Same Old War, playing on the kitchen speaker", device= 'cpu') # default device = 'cpu'
>>> deepnlp.print_out([output], del_prefix_ner= True)
Please confirm your song choice: Same Old War, playing on the kitchen speaker
1 Please O
2 confirm O
3 your O
4 song O
5 choice O
6 Same MISC
7 Old MISC
8 War MISC
9 , O
10 playing O
11 on O
12 the O
13 kitchen O
14 speaker O
6. Dependency Parsing
>>> import deepnlp
>>> model = deepnlp.DPParser('deepnlp_eng')
>>> output= model.inference("I have an apple.", device= 'cpu') # default device = 'cpu'
>>> output
<deepnlp.utils.data_struct.ParserData at 0x7f69da3125d0>
>>> output.value()
{'Sequence': 'I have an apple.',
'Inference': {'xpos': ['PRP', 'VBP', 'DT', 'NN', '.'],
'head': [2, 0, 4, 2, 2],
'rela': ['nsubj', 'root', 'det', 'obj', 'punct']}}
>>> deepnlp.print_out([output])
I have an apple.
1 I PRP 2 nsubj
2 have VBP 0 root
3 an DT 4 det
4 apple NN 2 obj
5 . . 2 punct
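The `head` and `rela` lists above can be read as labelled edges between words. The sketch below assumes, as the listing suggests, that head indices are 1-based and that 0 marks the root; `to_edges` is a hypothetical helper, and the token list is taken from the word tokenization example earlier.

```python
# Result of output.value(), copied from the example above.
value = {
    'Sequence': 'I have an apple.',
    'Inference': {
        'xpos': ['PRP', 'VBP', 'DT', 'NN', '.'],
        'head': [2, 0, 4, 2, 2],
        'rela': ['nsubj', 'root', 'det', 'obj', 'punct'],
    },
}
# Tokens as shown in the word tokenization example.
tokens = ['I', 'have', 'an', 'apple', '.']


def to_edges(tokens, heads, relations):
    """Return (head_word, relation, dependent_word) triples.

    Assumes 1-based head indices, with 0 standing for the root.
    """
    edges = []
    for dependent, head, relation in zip(tokens, heads, relations):
        head_word = 'ROOT' if head == 0 else tokens[head - 1]
        edges.append((head_word, relation, dependent))
    return edges


inference = value['Inference']
edges = to_edges(tokens, inference['head'], inference['rela'])
for edge in edges:
    print(edge)
```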
- With the pipeline class
>>> import deepnlp
>>> model= deepnlp.load_model('deepnlp_eng')
>>> pipeline= deepnlp.pipeline(model, task= 'dp_parser')
>>> output= pipeline("I have an apple.", device= 'cpu') # default device = 'cpu'
>>> deepnlp.print_out([output])
I have an apple.
1 I PRP 2 nsubj
2 have VBP 0 root
3 an DT 4 det
4 apple NN 2 obj
5 . . 2 punct
7. Multi Task
>>> import deepnlp
>>> model = deepnlp.MultiTask('deepnlp_eng')
>>> output= model.inference("Please confirm your song choice: Same Old War, playing on the kitchen speaker", device= 'cpu') # default device = 'cpu'
>>> output
<deepnlp.utils.data_struct.MultiData at 0x7f69da8f7650>
>>> deepnlp.print_out([output])
Please confirm your song choice: Same Old War, playing on the kitchen speaker
1 Please UH O 2 discourse
2 confirm VB O 0 root
3 your PRP$ O 5 nmod:poss
4 song NN O 5 compound
5 choice NN O 2 obj
6 Same JJ MISC 8 amod
7 Old NNP MISC 8 compound
8 War NNP MISC 2 obj
9 , , O 2 punct
10 playing VBG O 2 advcl
11 on IN O 14 case
12 the DT O 14 det
13 kitchen NN O 14 compound
14 speaker NN O 10 obl
- With the pipeline class
>>> import deepnlp
>>> model= deepnlp.load_model('deepnlp_eng')
>>> pipeline= deepnlp.pipeline(model, task= 'multi')
>>> output= pipeline("Please confirm your song choice: Same Old War, playing on the kitchen speaker", device= 'cpu') # default device = 'cpu'
>>> deepnlp.print_out([output])
Please confirm your song choice: Same Old War, playing on the kitchen speaker
1 Please UH O 2 discourse
2 confirm VB O 0 root
3 your PRP$ O 5 nmod:poss
4 song NN O 5 compound
5 choice NN O 2 obj
6 Same JJ MISC 8 amod
7 Old NNP MISC 8 compound
8 War NNP MISC 2 obj
9 , , O 2 punct
10 playing VBG O 2 advcl
11 on IN O 14 case
12 the DT O 14 det
13 kitchen NN O 14 compound
14 speaker NN O 10 obl
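If only the printed listing is available, its rows can be parsed back into records. The sketch below assumes the six columns are separated by whitespace and appear in the order index, token, POS tag, NER tag, head, relation; the field names and `parse_rows` are hypothetical, not part of deepnlp.

```python
# A few rows copied from the print_out listing above.
listing = """\
1 Please UH O 2 discourse
2 confirm VB O 0 root
6 Same JJ MISC 8 amod
8 War NNP MISC 2 obj
"""

# Hypothetical column names for the six printed columns.
FIELDS = ('index', 'token', 'xpos', 'ner', 'head', 'relation')


def parse_rows(listing):
    """Turn each six-column line into a dict keyed by FIELDS."""
    rows = []
    for line in listing.splitlines():
        parts = line.split()
        # Skip lines that are not six columns with numeric index and head,
        # such as blank lines or the sentence printed before the rows.
        if len(parts) != len(FIELDS):
            continue
        if not (parts[0].isdigit() and parts[4].isdigit()):
            continue
        row = dict(zip(FIELDS, parts))
        row['index'] = int(row['index'])
        row['head'] = int(row['head'])
        rows.append(row)
    return rows


rows = parse_rows(listing)
entities = [row['token'] for row in rows if row['ner'] != 'O']
print(entities)
```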
8. Clear Cache
- Remove the pretrained model and vocabs for deepnlp_eng
>>> deepnlp.clear_cache('deepnlp_eng')
- Or remove the model and the vocabs separately
>>> deepnlp.clear_model('deepnlp_eng')
>>> deepnlp.clear_vocabs('deepnlp_eng')
9. List of pretrained models
deepnlp_eng
- support for English: download pretrained model - download vocabs
deepnlp_vie
- support for Vietnamese: will be updated in the future
License
Apache 2.0 License.
Copyright © 2022 Hieu Pham. All rights reserved.