Use fast UDPipe models directly in spaCy
spaCy + UDPipe
This package wraps the fast and efficient UDPipe language-agnostic NLP pipeline (via its Python bindings), so you can use UDPipe pre-trained models as a spaCy pipeline for 50+ languages out-of-the-box. Inspired by spacy-stanfordnlp, this package offers slightly less accurate models that are in turn much faster (see benchmarks for UDPipe and StanfordNLP).
Installation
Use the package manager pip to install spacy-udpipe.
pip install spacy-udpipe
After installation, use spacy_udpipe.download(lang) to download the pre-trained model for the desired language.
Usage
The loaded UDPipeLanguage class returns a spaCy Language object, i.e., the nlp object you can use to process text and create a Doc object.
import spacy_udpipe
spacy_udpipe.download("en") # download English model
text = "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world."
nlp = spacy_udpipe.load("en")
doc = nlp(text)
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)
As all attributes are computed once and set in the custom Tokenizer, the nlp.pipeline is empty.
Authors and acknowledgment
Created by Antonio Šajatović during an internship at Text Analysis and Knowledge Engineering Lab (TakeLab).
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
To start the tests, just run pytest in the root source directory.
License
MIT © TakeLab
Project status
Maintained by Text Analysis and Knowledge Engineering Lab (TakeLab).
Notes
- All available pre-trained models are licensed under CC BY-NC-SA 4.0.
- All annotations match spaCy's, except for token.tag_, which is mapped from the CoNLL XPOS tag (a language-specific part-of-speech tag) defined for each language separately by the corresponding Universal Dependencies treebank.
- The full list of supported languages and models is available in languages.json.
- This package exposes a spacy_languages entry point in its setup.py, so full support for serialization is enabled:
    nlp = spacy_udpipe.load("en")
    nlp.to_disk("./udpipe-spacy-model")
  To properly load a saved model, you must pass the udpipe_model argument when loading it:
    import spacy
    udpipe_model = spacy_udpipe.UDPipeModel("en")
    nlp = spacy.load("./udpipe-spacy-model", udpipe_model=udpipe_model)
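To illustrate the token.tag_ note above, here is a minimal sketch of how one CoNLL-U row relates to the token attributes used in the Usage example. It is not this package's implementation. The XPOS-to-tag_ mapping comes from the note; the other column-to-attribute pairings, the helper name parse_conllu_line, and the sample row are assumptions written by hand for illustration.

```python
from typing import NamedTuple

# The ten tab-separated columns of a CoNLL-U token row, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)


class TokenAnnotation(NamedTuple):
    """Subset of token attributes, named after their spaCy counterparts."""
    text: str
    lemma_: str
    pos_: str   # universal POS tag (UPOS column)
    tag_: str   # language-specific POS tag (XPOS column)
    dep_: str


def parse_conllu_line(line: str) -> TokenAnnotation:
    """Hypothetical helper: split one CoNLL-U token row into attributes."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(CONLLU_FIELDS):
        raise ValueError(f"expected {len(CONLLU_FIELDS)} columns, got {len(columns)}")
    row = dict(zip(CONLLU_FIELDS, columns))
    return TokenAnnotation(
        text=row["form"],
        lemma_=row["lemma"],
        pos_=row["upos"],
        tag_=row["xpos"],
        dep_=row["deprel"],
    )


# Hand-written sample row, not real UDPipe output.
sample = "1\tWikipedia\tWikipedia\tPROPN\tNNP\tNumber=Sing\t6\tnsubj\t_\t_"
token = parse_conllu_line(sample)
print(token.pos_, token.tag_)  # prints "PROPN NNP"
```

Because XPOS tag sets are defined per treebank, tag_ values are only comparable between models trained on the same treebank, whereas pos_ values follow the shared universal tag set.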
Download files
File details
Details for the file spacy-udpipe-0.0.4.tar.gz.
File metadata
- Download URL: spacy-udpipe-0.0.4.tar.gz
- Upload date:
- Size: 11.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ed9684991a9c5c0a5933472dfc156f06c1c0dcfa1519d79346a9526ef94d75e |
| MD5 | 6db98b9bb103d50cda1f9ff7bb68f9d2 |
| BLAKE2b-256 | a224dec8e57afb68fab7f4f0c7acef9b498fbebaf1e2554457c30cf1cb7ffe47 |
File details
Details for the file spacy_udpipe-0.0.4-py3-none-any.whl.
File metadata
- Download URL: spacy_udpipe-0.0.4-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.4.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5937b068d974958e2f5fe25829ff82ce9761848823fe3e63cfbb8d8bde3012f |
| MD5 | ae1bca33a809cafafe9cab72deda4f5a |
| BLAKE2b-256 | 32ec92c72d31760876771f4feeb28b368b466175c4a5a2e5c71ce4438938a025 |
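The SHA256 digests listed above can be used to check a manually downloaded file. The following is a minimal sketch using only the Python standard library. The helper names and the local file path are assumptions; the digest constant is the source distribution's SHA256 value from the table above.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for spacy-udpipe-0.0.4.tar.gz.
PUBLISHED_SHA256 = "5ed9684991a9c5c0a5933472dfc156f06c1c0dcfa1519d79346a9526ef94d75e"


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Assumed download location; adjust to wherever the archive was saved.
archive = Path("spacy-udpipe-0.0.4.tar.gz")
if archive.exists():
    print("digest matches:", matches_published_digest(archive, PUBLISHED_SHA256))
```

When installing with pip, the same kind of check can be enforced through pip's hash-checking mode with a requirements file instead of a manual comparison.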