Use fast UDPipe models directly in spaCy

spaCy + UDPipe

This package wraps the fast and efficient UDPipe language-agnostic NLP pipeline (via its Python bindings), so you can use UDPipe pre-trained models as a spaCy pipeline for 50+ languages out-of-the-box. Inspired by spacy-stanfordnlp, this package offers slightly less accurate models that are in turn much faster (see benchmarks for UDPipe and StanfordNLP).

Installation

Use the package manager pip to install spacy-udpipe.

pip install spacy-udpipe

After installation, use spacy_udpipe.download(lang) to download the pre-trained model for the desired language.

Usage

Calling spacy_udpipe.load(lang) returns a UDPipeLanguage instance, which is a spaCy Language object, i.e., the nlp object you can use to process text and create a Doc object.

import spacy_udpipe

spacy_udpipe.download("en") # download English model

text = "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world."
nlp = spacy_udpipe.load("en")

doc = nlp(text)
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)

Because all attributes are computed once and set in the custom Tokenizer, nlp.pipeline is empty.

Authors and acknowledgment

Created by Antonio Šajatović during an internship at Text Analysis and Knowledge Engineering Lab (TakeLab).

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

To start the tests, just run pytest in the root source directory.

License

MIT © TakeLab

Project status

Maintained by Text Analysis and Knowledge Engineering Lab (TakeLab).

Notes

  • All available pre-trained models are licensed under CC BY-NC-SA 4.0.

  • All annotations match spaCy's, except for token.tag_, which is mapped from the CoNLL XPOS tag (the language-specific part-of-speech tag), defined for each language separately by the corresponding Universal Dependencies treebank.

  • The full list of supported languages and models is available in languages.json.

  • This package exposes a spacy_languages entry point in its setup.py, so full support for serialization is enabled:

    nlp = spacy_udpipe.load("en")
    nlp.to_disk("./udpipe-spacy-model")
    

    To properly load a saved model, you must pass the udpipe_model argument when loading it:

    import spacy
    import spacy_udpipe

    udpipe_model = spacy_udpipe.UDPipeModel("en")
    nlp = spacy.load("./udpipe-spacy-model", udpipe_model=udpipe_model)
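
The note above on token.tag_ may be easier to follow with the CoNLL-U column layout in view. The following is a minimal standard-library sketch, not part of spacy-udpipe: it splits one CoNLL-U token line into its ten named columns and shows which column each spaCy attribute would correspond to. Only the tag_ to XPOS correspondence is stated in this README; the other entries in SPACY_TO_CONLLU, the helper names, and the hand-written sample line are assumptions made for illustration.

```python
# The ten tab-separated columns of a CoNLL-U token line, in order.
CONLLU_FIELDS = (
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
)

# Assumed correspondence between spaCy token attributes and CoNLL-U columns.
# Only "tag_" -> "xpos" is stated in the notes above; the rest is illustrative.
SPACY_TO_CONLLU = {
    "text": "form",
    "lemma_": "lemma",
    "pos_": "upos",   # universal part-of-speech tag
    "tag_": "xpos",   # language-specific, treebank-defined tag
    "dep_": "deprel",
}


def parse_conllu_line(line):
    """Split one CoNLL-U token line into a dict keyed by column name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(CONLLU_FIELDS):
        raise ValueError(
            f"expected {len(CONLLU_FIELDS)} columns, got {len(values)}"
        )
    return dict(zip(CONLLU_FIELDS, values))


def spacy_style_view(row):
    """Rename the CoNLL-U columns to the assumed spaCy attribute names."""
    return {attr: row[col] for attr, col in SPACY_TO_CONLLU.items()}


# Hand-written sample line for illustration (not actual UDPipe output).
SAMPLE = (
    "2\tis\tbe\tAUX\tVBZ\t"
    "Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t"
    "5\tcop\t_\t_"
)

if __name__ == "__main__":
    view = spacy_style_view(parse_conllu_line(SAMPLE))
    # pos_ holds the universal tag, tag_ the treebank-specific one.
    print(view["pos_"], view["tag_"])
```

Because the XPOS tagset differs from treebank to treebank, code that inspects token.tag_ should not assume the same tag inventory across languages, whereas token.pos_ uses the shared universal tagset.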
    
