English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parse and named entities.
Customized spaCy pipeline
This package is available on PyPI:
    $ pip install en-qai-sm   # installs the package and its dependencies, including spaCy
    >>> import spacy
    >>> nlp = spacy.load('en_qai_sm')
    >>> doc = nlp("I ain't got no hands!")
    >>> for token in doc:
    ...     print(token, token.pos_)
    ...
    I PRON
    ain't VERB
    got VERB
    no DET
    hands NOUN
    ! PUNCT
About spaCy pipelines
The default spaCy pipeline consists of four steps (components):
- tokenizer: splits text into tokens
- tagger: assigns part-of-speech tags
- parser: assigns dependency labels
- ner: detects and labels named entities
Custom components (e.g. any function that takes a Doc and returns it) can be inserted anywhere in the pipeline after the tokenizer. For simplicity, the tokenizer is not listed in pipeline descriptions.
Reference: spaCy docs.
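To make the ordering concrete, here is a minimal pure-Python sketch of a spaCy-style pipeline. It is a toy model, not spaCy's API: `ToyPipeline`, `add_component`, `component_names`, and the placeholder tagger are hypothetical names introduced for illustration, and a "doc" is simply a list of token dicts. It shows the tokenizer always running first, the components running in order after it, and a custom component (any function that takes a doc and returns it) being inserted before an existing one.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for spaCy's Doc: a list of token dicts.
Doc = List[Dict[str, object]]
Component = Callable[[Doc], Doc]


class ToyPipeline:
    """Toy model of a spaCy-style pipeline (not spaCy's real API):
    a tokenizer followed by named components applied in order."""

    def __init__(self, tokenizer: Callable[[str], Doc]) -> None:
        self.tokenizer = tokenizer
        self.components: List[Tuple[str, Component]] = []

    @property
    def component_names(self) -> List[str]:
        # The tokenizer is not listed, mirroring how pipelines are described.
        return [name for name, _ in self.components]

    def add_component(self, name: str, component: Component,
                      before: Optional[str] = None) -> None:
        # Insert before an existing component, or append at the end.
        if before is None:
            self.components.append((name, component))
        else:
            index = self.component_names.index(before)
            self.components.insert(index, (name, component))

    def __call__(self, text: str) -> Doc:
        doc = self.tokenizer(text)  # the tokenizer always runs first
        for _, component in self.components:
            doc = component(doc)  # each component takes a doc and returns it
        return doc


def whitespace_tokenizer(text: str) -> Doc:
    return [{"text": t} for t in text.split()]


def placeholder_tagger(doc: Doc) -> Doc:
    # Placeholder rule, NOT a trained POS model.
    for tok in doc:
        tok["pos"] = "X" if str(tok["text"]).isalnum() else "PUNCT"
    return doc


def mark_long_words(doc: Doc) -> Doc:
    # A custom component: an arbitrary function over the doc.
    for tok in doc:
        tok["is_long"] = len(str(tok["text"])) > 4
    return doc


nlp = ToyPipeline(whitespace_tokenizer)
nlp.add_component("tagger", placeholder_tagger)
nlp.add_component("custom", mark_long_words, before="tagger")
doc = nlp("Hello world !")
print(nlp.component_names)  # ['custom', 'tagger']
```

In spaCy itself the corresponding method is `nlp.add_pipe`, whose exact signature differs between spaCy versions, so check the documentation for the version this package is built on.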
The en_qai_sm pipeline consists of:
    pipeline = ["merge_matcher", "tagger", "parser", "ner"]
merge_matcher matches spans of the following types and merges each one into a single token:
- words connected by hyphens
- contractions
- special (informal) short forms
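The merging step can be sketched in plain Python over a list of token strings. This is an illustrative assumption, not the package's actual implementation: the function name `merge_tokens` and the suffix and short-form lists are hypothetical, and the input assumes a tokenizer that splits `ain't` into `ai` + `n't` and `well-known` into `well` + `-` + `known`. Inside spaCy, a component like this would more typically use the rule-based `Matcher` together with `Doc.retokenize()`.

```python
from typing import List, Set, Tuple

# Hypothetical rule data; the real merge_matcher patterns are not listed here.
CONTRACTION_SUFFIXES: Set[str] = {"n't", "'s", "'m", "'re", "'ve", "'ll", "'d"}
SHORT_FORMS: Set[Tuple[str, str]] = {("gon", "na"), ("wan", "na"), ("got", "ta")}


def merge_tokens(tokens: List[str]) -> List[str]:
    """Greedy left-to-right merge of hyphenated words, contractions
    and informal short forms into single tokens."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        current = tokens[i]
        i += 1
        while i < len(tokens):
            nxt = tokens[i]
            # word "-" word -> "word-word" (also chains, e.g. state-of-the-art)
            if (nxt == "-" and i + 1 < len(tokens)
                    and current[-1:].isalnum() and tokens[i + 1].isalnum()):
                current += "-" + tokens[i + 1]
                i += 2
            # contraction: "ai" + "n't" -> "ain't"
            elif nxt.lower() in CONTRACTION_SUFFIXES:
                current += nxt
                i += 1
            # informal short form: "gon" + "na" -> "gonna"
            elif (current.lower(), nxt.lower()) in SHORT_FORMS:
                current += nxt
                i += 1
            else:
                break
        merged.append(current)
    return merged


print(merge_tokens(["I", "ai", "n't", "got", "no", "hands", "!"]))
# ['I', "ain't", 'got', 'no', 'hands', '!']
```

Because merge_matcher is listed before the tagger, the tagger, parser and NER presumably see the merged tokens, which is consistent with `ain't` receiving a single VERB tag in the usage example above.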
As this is only a small extension of spaCy's en_core_web_sm, it is released under the same license: MIT.
|Filename|Size|File type|Python version|
|---|---|---|---|
|en_qai_sm-1.2.1.tar.gz|51.2 MB|Source|None|