Skip to main content

spacy wrapper for Trankit, a Transformer-based multilingual neural dependency parser with tokenization and NER

Project description

spaCy + Trankit

This package wraps the Trankit library, so you can use trankit models in a spaCy pipeline.

GitHub Code style: black

Using this wrapper, you'll be able to use the following annotations, computed by your pretrained trankit pipeline/model:

  • Statistical tokenization (reflected in the Doc and its tokens)
  • Lemmatization (token.lemma and token.lemma_)
  • Part-of-speech tagging (token.tag, token.tag_, token.pos, token.pos_)
  • Morphological analysis (token.morph)
  • Dependency parsing (token.dep, token.dep_, token.head)
  • Named entity recognition (doc.ents, token.ent_type, token.ent_type_, token.ent_iob, token.ent_iob_)
  • Sentence segmentation (doc.sents)

️️️⌛️ Installation

As of v0.1.0 spacy-trankit is only compatible with spaCy v3.x. To install the most recent version:

pip install git+https://github.com/imvladikon/spacy-trankit

📖 Usage & Examples

Load pre-trained trankit model into a spaCy pipeline:

import spacy_trankit

# Initialize the pipeline
nlp = spacy_trankit.load("en")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)

Load it from the path:

import spacy_trankit

# Initialize the pipeline
nlp = spacy_trankit.load_from_path(name="en", path="./cache") 

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

spacy_trankit-0.1.0-py3-none-any.whl (6.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page