English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parse and named entities.

Project description


Customized spaCy pipeline


The package is available on PyPI:

$ pip install en-qai-sm

This installs the package and its dependencies, including spaCy.


>>> import spacy
>>> nlp = spacy.load('en_qai_sm')
>>> doc = nlp("I ain't got no hands!")
>>> for token in doc: print(token, token.pos_)
ain't VERB
got VERB
no DET
hands NOUN

About spaCy pipelines

The default spaCy pipeline consists of four steps (components):

  • tokenizer - splits text into tokens
  • tagger - assigns part-of-speech tags
  • parser - assigns dependency labels
  • ner - detects and labels named entities

Custom components (e.g. any function that operates on a doc) can be inserted at any place in the pipeline after the tokenizer. For simplicity, the tokenizer is not listed in pipeline descriptions.
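
The idea of inserting a component after the tokenizer can be illustrated with a small plain-Python sketch. It is a conceptual model only, not spaCy's implementation: the dict-based doc, the stand-in components, and the helper names (make_stub, count_tokens, add_after, run) are all hypothetical. In spaCy itself, components are registered with nlp.add_pipe, whose exact signature depends on the spaCy version.

```python
# Conceptual sketch only; this is NOT spaCy's implementation.
# The dict-based "doc" and every function name here are hypothetical.

def make_stub(name):
    """Build a stand-in component that only records that it ran."""
    def component(doc):
        doc["steps"].append(name)
        return doc
    return component

def count_tokens(doc):
    """Hypothetical custom component: store the token count on the doc."""
    doc["n_tokens"] = len(doc["tokens"])
    doc["steps"].append("count_tokens")
    return doc

def add_after(pipeline, name, component, after):
    """Return a new pipeline with (name, component) placed right after `after`."""
    index = [n for n, _ in pipeline].index(after)
    return pipeline[: index + 1] + [(name, component)] + pipeline[index + 1 :]

def run(text, pipeline):
    # The tokenizer always runs first; whitespace splitting stands in for it.
    doc = {"tokens": text.split(), "steps": []}
    for _, component in pipeline:
        doc = component(doc)
    return doc

pipeline = [(n, make_stub(n)) for n in ("tagger", "parser", "ner")]
pipeline = add_after(pipeline, "count_tokens", count_tokens, after="tagger")
doc = run("I ain't got no hands!", pipeline)
print(doc["steps"])     # order in which the components ran
print(doc["n_tokens"])  # whitespace tokens, so punctuation stays attached
```

Each component receives the doc and returns it, which is why a custom function can be slotted in at any position once tokenization has produced the doc.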

Reference: spaCy docs.

Pipeline components


The pipeline consists of:

pipeline = [

where merge_matcher matches spans of the following types and merges each into a single token:

  • words connected by hyphens, e.g. rock-hard
  • contractions, e.g. don't
  • special (informal) short forms, e.g. gonna
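
The "match, then merge" behaviour described above can be sketched in plain Python. This is an illustration under stated assumptions, not the package's code: the input is assumed to be a list of pre-split tokens of the kind a default English tokenizer might produce, and the rule tables (CONTRACTION_SUFFIXES, INFORMAL_PAIRS) and function names are hypothetical. The real component presumably relies on spaCy's own matching and retokenizing facilities, which this sketch does not use.

```python
# Plain-Python illustration of merging matched spans into single tokens.
# Rule tables and names are hypothetical, chosen only for this example.

CONTRACTION_SUFFIXES = {"n't", "'ll", "'re", "'ve", "'m", "'d"}
INFORMAL_PAIRS = {("gon", "na"), ("got", "ta"), ("wan", "na")}

def match_end(tokens, i):
    """Return the exclusive end index of the span starting at i.

    Returns i + 1 when no merge rule matches, i.e. the token stays alone.
    """
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    nxt2 = tokens[i + 2] if i + 2 < len(tokens) else None
    # Rule 1: word - word, e.g. rock-hard
    if nxt == "-" and nxt2 is not None and tokens[i].isalpha() and nxt2.isalpha():
        return i + 3
    # Rule 2: contractions, e.g. do + n't
    if nxt in CONTRACTION_SUFFIXES:
        return i + 2
    # Rule 3: informal short forms, e.g. gon + na
    if nxt is not None and (tokens[i].lower(), nxt.lower()) in INFORMAL_PAIRS:
        return i + 2
    return i + 1

def merge_tokens(tokens):
    """Walk the token list left to right and join every matched span."""
    merged, i = [], 0
    while i < len(tokens):
        end = match_end(tokens, i)
        merged.append("".join(tokens[i:end]))
        i = end
    return merged

split_tokens = ["I", "ai", "n't", "gon", "na", "eat", "rock", "-", "hard", "bread"]
merged = merge_tokens(split_tokens)
print(merged)
```

Merging before tagging means that downstream components see don't or gonna as one unit rather than as fragments.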


As this is just a small extension of spaCy's en_core_web_sm, we include the same license - MIT.

Files for en-qai-sm, version 1.2.1
Filename, size                      File type   Python version
en_qai_sm-1.2.1.tar.gz (51.2 MB)    Source      None
