English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parses and named entities.
Project description
library.qai.spacy
Customized spaCy pipeline
Installing
This package is available on PyPI:
$ pip install en-qai-sm   # installs the package and its dependencies, including spacy
Usage
>>> import spacy
>>> nlp = spacy.load('en_qai_sm')
>>> doc = nlp("I ain't got no hands!")
>>> for token in doc:
...     print(token, token.pos_)
...
I PRON
ain't VERB
got VERB
no DET
hands NOUN
! PUNCT
About spaCy pipelines
The default spaCy pipeline consists of 4 steps (components); the snippet after the list illustrates the annotations each one adds:
- tokenizer: splits text into tokens
- tagger: assigns part-of-speech tags
- parser: assigns dependency labels
- ner: detects and labels named entities
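Each component writes its annotations to the processed Doc. As a quick illustration (using standard spaCy attributes; the example sentence is just for demonstration and is not taken from this package's docs):

import spacy

nlp = spacy.load('en_qai_sm')
doc = nlp("Apple is opening a new office in London.")

for token in doc:
    # tagger fills token.pos_; parser fills token.dep_ and token.head
    print(token.text, token.pos_, token.dep_, token.head.text)

# ner fills doc.ents with named entity spans
print([(ent.text, ent.label_) for ent in doc.ents])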
Custom components (e.g. any function operating on a doc) can be inserted into the pipeline at any position after the tokenizer. For simplicity, the tokenizer is not listed in the pipeline descriptions below.
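For example, a function can be registered as a pipeline component like this (a minimal sketch against the spaCy v2-style add_pipe API; the debug_component function is hypothetical and not part of this package):

import spacy

nlp = spacy.load('en_qai_sm')

# Any callable that takes a Doc and returns it can serve as a component.
def debug_component(doc):
    print("Doc has", len(doc), "tokens")
    return doc

# Insert the custom component after the existing 'ner' step.
nlp.add_pipe(debug_component, name='debug_component', after='ner')
doc = nlp("I ain't got no hands!")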
Reference: spaCy docs.
Pipeline components
v1.0.0
The pipeline consists of:
pipeline = [ "merge_matcher", "tagger", "parser", "ner" ]
where merge_matcher matches and merges into a single token spans of the following kinds (a sketch of how such a component could look follows the list):
- words connected by hyphens, e.g. rock-hard
- contractions, e.g. don't
- special (informal) short forms, e.g. gonna
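The package's own implementation is not shown here, but a merging component of this kind can be sketched with spaCy's Matcher and the retokenizer. The following covers only the hyphen case and uses the spaCy v2-style Matcher API; the names and the pattern are assumptions, not the package's actual code:

from spacy.matcher import Matcher
from spacy.util import filter_spans

def make_merge_matcher(nlp):
    matcher = Matcher(nlp.vocab)
    # Match e.g. "rock-hard": a word, a hyphen, another word.
    matcher.add('HYPHENATED', None, [{'IS_ALPHA': True}, {'ORTH': '-'}, {'IS_ALPHA': True}])

    def merge_matcher(doc):
        spans = [doc[start:end] for _, start, end in matcher(doc)]
        # Merge each non-overlapping matched span into a single token.
        with doc.retokenize() as retokenizer:
            for span in filter_spans(spans):
                retokenizer.merge(span)
        return doc

    return merge_matcher

Such a function could then be placed at the front of the pipeline with nlp.add_pipe(make_merge_matcher(nlp), name='merge_matcher', first=True), matching the component order listed above.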
License
As this is just a small extension of spaCy's en_core_web_sm, we include the same license: MIT.
Download files
Filename | Size | File type | Python version
---|---|---|---
en_qai_sm-1.2.1.tar.gz | 51.2 MB | Source | None