English multi-task CNN trained on OntoNotes. Assigns context-specific token vectors, POS tags, dependency parse and named entities.

Project description

library.qai.spacy

Customized spaCy pipeline

Installing

The package is available on PyPI:

$ pip install en-qai-sm
> installs the package and its dependencies, including spaCy

Usage

>>> import spacy
>>> nlp = spacy.load('en_qai_sm')
>>> doc = nlp("I ain't got no hands!")
>>> for token in doc: print(token, token.pos_)
...
I PRON
ain't VERB
got VERB
no DET
hands NOUN
! PUNCT

About spaCy pipelines

The default spaCy pipeline consists of four steps (components):

  • tokenizer - splits text into tokens
  • tagger - assigns part-of-speech tags
  • parser - assigns dependency labels
  • ner - detects and labels named entities

Custom components (e.g. any function that operates on a doc) can be inserted into the pipeline at any position after the tokenizer. For simplicity, the tokenizer is not listed in pipeline descriptions.

Reference: spaCy docs.
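
The idea of inserting a custom component after the tokenizer can be illustrated with a small pure-Python sketch. It does not use spaCy and is not how spaCy implements pipelines: ToyPipeline, add_component, make_stub and count_tokens are hypothetical names invented for this illustration, the doc is a plain dict, and the tagger, parser and ner entries are stubs that only record that they ran.

```python
def tokenizer(text):
    """Toy tokenizer (whitespace split); it always runs first and creates the doc."""
    return {"tokens": text.split(), "applied": []}


def make_stub(name):
    """Build a placeholder component that only records that it ran."""
    def component(doc):
        doc["applied"].append(name)
        return doc
    return component


class ToyPipeline:
    """Hypothetical stand-in for a pipeline: an ordered list of (name, function)."""

    def __init__(self, names):
        self.components = [(name, make_stub(name)) for name in names]

    @property
    def pipe_names(self):
        # The tokenizer is deliberately not listed, as in the text above.
        return [name for name, _ in self.components]

    def add_component(self, name, func, before=None):
        """Append the component, or insert it before an existing one."""
        if before is None:
            self.components.append((name, func))
        else:
            index = self.pipe_names.index(before)
            self.components.insert(index, (name, func))

    def __call__(self, text):
        doc = tokenizer(text)
        for _, func in self.components:
            doc = func(doc)  # each component receives the doc and returns it
        return doc


def count_tokens(doc):
    """Custom component: any function on the doc qualifies."""
    doc["n_tokens"] = len(doc["tokens"])
    doc["applied"].append("count_tokens")
    return doc


pipeline = ToyPipeline(["tagger", "parser", "ner"])
pipeline.add_component("count_tokens", count_tokens, before="tagger")
doc = pipeline("I ain't got no hands!")
print(pipeline.pipe_names)
print(doc["applied"])
```

In this sketch the custom function runs right after the tokenizer and before the stubbed tagger, so later components would see whatever it changed on the doc.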

Pipeline components

v1.0.0

The pipeline consists of:

pipeline = [
    "merge_matcher",
    "tagger",
    "parser",
    "ner"
    ]

where merge_matcher matches spans of the following types and merges each into a single token:

  • words connected by hyphens, e.g. rock-hard
  • contractions, e.g. don't
  • special (informal) short forms, e.g. gonna
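
The merging behaviour listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the package's actual merge_matcher: it assumes the upstream tokenizer has already split rock-hard into rock, -, hard, don't into do, n't, and gonna into gon, na, and that each token carries a flag saying whether whitespace follows it. The names CONTRACTION_SUFFIXES, INFORMAL_PAIRS, should_merge and merge_tokens, and the contents of the two rule sets, are hypothetical.

```python
# Hypothetical rule sets; the real component may cover different forms.
CONTRACTION_SUFFIXES = {"n't", "'ll", "'re", "'ve", "'m", "'d"}
INFORMAL_PAIRS = {("gon", "na"), ("wan", "na"), ("got", "ta")}


def should_merge(prev, cur):
    """Decide whether token text `cur` should be glued onto `prev`."""
    if cur == "-" or prev.endswith("-"):
        return True  # hyphenated compound such as rock-hard
    if cur.lower() in CONTRACTION_SUFFIXES:
        return True  # contraction such as don't
    return (prev.lower(), cur.lower()) in INFORMAL_PAIRS  # e.g. gonna


def merge_tokens(tokens):
    """Merge adjacent tokens given as (text, has_trailing_space) pairs.

    Two tokens are merged only when no whitespace separates them and
    one of the rules in should_merge applies.
    """
    merged = []
    for text, space_after in tokens:
        if merged and not merged[-1][1] and should_merge(merged[-1][0], text):
            prev_text, _ = merged.pop()
            merged.append((prev_text + text, space_after))
        else:
            merged.append((text, space_after))
    return [text for text, _ in merged]


# "I don't gonna be rock-hard", pre-split the way the assumptions describe.
tokens = [
    ("I", True), ("do", False), ("n't", True),
    ("gon", False), ("na", True), ("be", True),
    ("rock", False), ("-", False), ("hard", False),
]
result = merge_tokens(tokens)
print(result)
```

Running the merge before the tagger means later components see don't, gonna and rock-hard as single tokens, which matches the position of merge_matcher at the front of the pipeline list.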

License

As this is just a small extension of spaCy's en_core_web_sm, it is released under the same license: MIT.


Files for en-qai-sm, version 1.2.1

Filename: en_qai_sm-1.2.1.tar.gz (51.2 MB)
File type: Source
