spaCy wrapper for Trankit, a transformer-based multilingual neural dependency parser with tokenization and NER

Maintainers

vladigur

spaCy + Trankit

This package wraps the Trankit library so that you can use Trankit models in a spaCy pipeline.

With this wrapper, the following annotations, computed by your pretrained Trankit pipeline or model, become available:

Statistical tokenization (reflected in the Doc and its tokens)
Lemmatization (token.lemma and token.lemma_)
Part-of-speech tagging (token.tag, token.tag_, token.pos, token.pos_)
Morphological analysis (token.morph)
Dependency parsing (token.dep, token.dep_, token.head)
Named entity recognition (doc.ents, token.ent_type, token.ent_type_, token.ent_iob, token.ent_iob_)
Sentence segmentation (doc.sents)
Multiword token preservation for languages such as Arabic and Hebrew via token._.trankit_expanded

⌛️ Installation

As of v0.2.1, spacy-trankit is compatible only with spaCy v3.x. On Python 3.12, it applies a runtime compatibility patch for the current Trankit dataclass issue in adapter_transformers before creating the pipeline. To install the most recent version from GitHub:

pip install git+https://github.com/imvladikon/spacy-trankit

or from PyPI:

pip install spacy-trankit
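
Since the package targets spaCy v3.x only, it can help to confirm what is installed before building a pipeline. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `is_spacy_v3` are illustrative and are not part of spacy-trankit.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_spacy_v3(version: Optional[str]) -> bool:
    """Return True when the version string has major version 3, e.g. "3.7.4"."""
    return version is not None and version.split(".")[0] == "3"


if __name__ == "__main__":
    spacy_version = installed_version("spacy")
    print("spaCy:", spacy_version, "| v3.x:", is_spacy_v3(spacy_version))
    print("spacy-trankit:", installed_version("spacy-trankit"))
```

A result of `None` for either package means it is not installed in the current environment.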

📖 Usage & Examples

Load a pretrained Trankit model into a spaCy pipeline:

import spacy_trankit

# Initialize the pipeline
nlp = spacy_trankit.load("en")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)

By default, mwt_strategy="auto" expands multiword tokens when the expanded tokens can be aligned back to the original text without changing doc.text. Expansions that cannot be represented as substrings of the original text are handled non-destructively. For example, Arabic and Hebrew clitic expansions can differ from the surface token, so the spaCy token keeps the original surface form and stores Trankit's expansion under token._.trankit_expanded.

doc = nlp("ذهبت للبيت اليوم")
for token in doc:
    print(token.text, token._.trankit_expanded)
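
To make the idea of "aligning back to the original text" concrete, here is a rough sketch of one way such a check could work. It is not spacy-trankit's actual implementation: the function name `align_expansion` is hypothetical, and the inputs are illustrative strings rather than real Trankit output.

```python
from typing import List, Optional, Tuple


def align_expansion(surface: str, expanded: List[str]) -> Optional[List[Tuple[int, int]]]:
    """Return (start, end) character offsets of each expanded token in `surface`.

    Returns None when the expanded tokens do not cover the surface form
    in order, i.e. when they are not substrings of the original text.
    """
    spans: List[Tuple[int, int]] = []
    cursor = 0
    for piece in expanded:
        # Skip whitespace between pieces in the surface text.
        while cursor < len(surface) and surface[cursor].isspace():
            cursor += 1
        if not piece or not surface.startswith(piece, cursor):
            return None
        spans.append((cursor, cursor + len(piece)))
        cursor += len(piece)
    # Anything left over (other than whitespace) means the alignment failed.
    if surface[cursor:].strip():
        return None
    return spans


if __name__ == "__main__":
    # Alignable: both pieces are substrings of the surface token.
    print(align_expansion("cannot", ["can", "not"]))
    # Not alignable: French "du" stands for "de" + "le", which differ from the surface.
    print(align_expansion("du", ["de", "le"]))
```

In the second case a non-destructive strategy would keep "du" as the token text and record the expansion separately, which is the role token._.trankit_expanded plays here.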

If you always want surface tokens, pass mwt_strategy="preserve". If you need the previous expanded-token behavior and accept that spaCy may have to replace the original text with space-separated expanded tokens for unalignable cases, pass mwt_strategy="expand":

# Always keep the original surface tokens
nlp = spacy_trankit.load("ar", mwt_strategy="preserve")

# Or restore the previous expanded-token behavior
nlp = spacy_trankit.load("ar", mwt_strategy="expand")

Load a model from a local path:

import spacy_trankit

# Initialize the pipeline
nlp = spacy_trankit.load_from_path(name="en", path="./cache")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)

📦 Model downloads

The Trankit release on PyPI fetches its pretrained models from nlp.uoregon.edu, which is currently unavailable. spacy-trankit bypasses that broken download path and pulls the same artifacts from Trankit's HuggingFace mirror (https://huggingface.co/uonlp/trankit) into the local cache before instantiating the Trankit pipeline. The behaviour is automatic; no extra setup is needed.

If you mirror the artifacts elsewhere (e.g. for offline or air-gapped use), point spacy-trankit at the mirror via the SPACY_TRANKIT_MODEL_URL environment variable. The template understands the placeholders {version}, {embedding} and {lang}:

export SPACY_TRANKIT_MODEL_URL="https://my-mirror.example.com/trankit/{version}/{embedding}/{lang}.zip"
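
As an illustration of how such a template could be expanded, the sketch below fills the three placeholders with Python's `str.format`. This is an assumption about the mechanics, not the package's actual code: the function `build_model_url` is hypothetical, and the values passed for version, embedding and language are placeholders chosen for the example.

```python
import os
from typing import Optional

# Same example template as in the export line above.
EXAMPLE_TEMPLATE = "https://my-mirror.example.com/trankit/{version}/{embedding}/{lang}.zip"


def build_model_url(version: str, embedding: str, lang: str,
                    template: Optional[str] = None) -> str:
    """Fill {version}, {embedding} and {lang} in a URL template.

    If no template is given, read it from SPACY_TRANKIT_MODEL_URL and
    fall back to the example template when the variable is unset.
    """
    if template is None:
        template = os.environ.get("SPACY_TRANKIT_MODEL_URL", EXAMPLE_TEMPLATE)
    return template.format(version=version, embedding=embedding, lang=lang)


if __name__ == "__main__":
    # Placeholder values; real ones depend on the Trankit release and model.
    print(build_model_url("v1.0.0", "xlm-roberta-base", "english",
                          template=EXAMPLE_TEMPLATE))
```

Printing the resolved URL this way is a quick check that a mirror layout matches the template before running the pipeline offline.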

Release history

0.2.3 (this version): May 5, 2026
0.1.0: Jan 7, 2024

Download files

Source Distribution

spacy_trankit-0.2.3.tar.gz (15.2 kB view details)

Uploaded May 5, 2026 Source

Built Distribution

spacy_trankit-0.2.3-py3-none-any.whl (11.2 kB view details)

Uploaded May 5, 2026 Python 3

File details

Details for the file spacy_trankit-0.2.3.tar.gz.

File metadata

Download URL: spacy_trankit-0.2.3.tar.gz
Upload date: May 5, 2026
Size: 15.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spacy_trankit-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`fd45e0c70a51b1c242671bc431ba5645941043fb3d697e65a7a25ec6bcf24fc4`
MD5	`fa5189040a8dd25776350022f46b69bc`
BLAKE2b-256	`5aac84701a3c4a28ced7997829f48dd079f792f2fb14e85c4325f90bd2e4b604`
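
A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below is one way to do it; the helper names `sha256_of_file` and `matches_sha256` are illustrative, and the file path in the usage note assumes the archive sits in the current directory.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

For example, `matches_sha256("spacy_trankit-0.2.3.tar.gz", "fd45e0c70a51b1c242671bc431ba5645941043fb3d697e65a7a25ec6bcf24fc4")` should return True for an intact copy of the source distribution.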

Provenance

The following attestation bundles were made for spacy_trankit-0.2.3.tar.gz:

Publisher: ci.yml on imvladikon/spacy-trankit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spacy_trankit-0.2.3.tar.gz
- Subject digest: fd45e0c70a51b1c242671bc431ba5645941043fb3d697e65a7a25ec6bcf24fc4
- Sigstore transparency entry: 1440039286
- Sigstore integration time: May 5, 2026
Source repository:
- Permalink: imvladikon/spacy-trankit@8a9e3830c17d0e6bff339b6800d7cd345505b4b3
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/imvladikon
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@8a9e3830c17d0e6bff339b6800d7cd345505b4b3
- Trigger Event: release

File details

Details for the file spacy_trankit-0.2.3-py3-none-any.whl.

File metadata

Download URL: spacy_trankit-0.2.3-py3-none-any.whl
Upload date: May 5, 2026
Size: 11.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for spacy_trankit-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5f9c375a9d6dd661d38ebc038b99cacafb5665fe36ea5ea7ec9803673a1f33d8`
MD5	`3ff15b6fbadac8f10bef48eb795b853e`
BLAKE2b-256	`80450022467dd41b5a12e929618b35a27c5f2ca736eedd72d999851728014640`

Provenance

The following attestation bundles were made for spacy_trankit-0.2.3-py3-none-any.whl:

Publisher: ci.yml on imvladikon/spacy-trankit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: spacy_trankit-0.2.3-py3-none-any.whl
- Subject digest: 5f9c375a9d6dd661d38ebc038b99cacafb5665fe36ea5ea7ec9803673a1f33d8
- Sigstore transparency entry: 1440039289
- Sigstore integration time: May 5, 2026
Source repository:
- Permalink: imvladikon/spacy-trankit@8a9e3830c17d0e6bff339b6800d7cd345505b4b3
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/imvladikon
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@8a9e3830c17d0e6bff339b6800d7cd345505b4b3
- Trigger Event: release
