Use fast UDPipe models directly in spaCy

spaCy + UDPipe

This package wraps the fast and efficient UDPipe language-agnostic NLP pipeline (via its Python bindings), so you can use pre-trained UDPipe models as a spaCy pipeline for 50+ languages out of the box. Inspired by spacy-stanza, this package trades a little accuracy for much higher speed (see benchmarks for UDPipe and Stanza).

Installation

Use the package manager pip to install spacy-udpipe.

pip install spacy-udpipe

After installation, use spacy_udpipe.download() to download the pre-trained model for the desired language.

A full list of pre-trained UDPipe models for supported languages can be found in languages.json.
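When several languages are needed, the download step can be wrapped in a small loop. The following is a minimal sketch: download_models is a hypothetical helper, not part of spacy-udpipe, and it only assumes that spacy_udpipe.download() accepts a language code as shown in this README. The downloader is injectable so the helper can be tried without network access.

```python
from typing import Callable, Iterable, List, Optional


def download_models(
    langs: Iterable[str],
    download: Optional[Callable[[str], None]] = None,
) -> List[str]:
    """Download one pre-trained model per language code, skipping duplicates.

    Hypothetical helper: `download` defaults to spacy_udpipe.download.
    """
    if download is None:
        # Imported lazily so the helper can be defined without the package.
        import spacy_udpipe

        download = spacy_udpipe.download

    done: List[str] = []
    for lang in langs:
        if lang in done:
            continue  # already requested in this call
        download(lang)
        done.append(lang)
    return done
```

Calling download_models(["en", "hr"]) would fetch the English and Croatian models, provided both codes appear in languages.json.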

Usage

Loading a model with spacy_udpipe.load() returns a spaCy Language object, i.e., the object you use to process text and create a Doc object.

import spacy_udpipe

spacy_udpipe.download("en") # download English model

text = "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world."
nlp = spacy_udpipe.load("en")

doc = nlp(text)
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)

Because all attributes are computed once and set in the custom Tokenizer, Language.pipeline is empty.
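For further processing it is often handier to collect the token attributes than to print them. The sketch below gathers the same four attributes used in the usage example into tuples. token_rows is a hypothetical helper, and the stand-in tokens carry made-up values purely so the snippet runs without a downloaded model.

```python
from types import SimpleNamespace
from typing import Iterable, List, Tuple


def token_rows(doc: Iterable) -> List[Tuple[str, str, str, str]]:
    """Collect (text, lemma, UPOS tag, dependency label) for every token."""
    return [(t.text, t.lemma_, t.pos_, t.dep_) for t in doc]


# Stand-in tokens with illustrative values, NOT real model output.
# A Doc returned by nlp(text) can be passed to token_rows() instead.
fake_doc = [
    SimpleNamespace(text="Wikipedia", lemma_="Wikipedia", pos_="PROPN", dep_="nsubj"),
    SimpleNamespace(text="is", lemma_="be", pos_="AUX", dep_="cop"),
]

rows = token_rows(fake_doc)
for row in rows:
    print("\t".join(row))
```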

The type of text can be one of the following:

unprocessed: str,
presegmented: List[str],
pretokenized: List[List[str]].
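The three input shapes can be illustrated with plain Python values. This sketch reads the type hints above as follows, which is an assumption on my part: a presegmented input is a list of sentences, and a pretokenized input is a list of sentences, each given as a list of tokens. input_kind is a hypothetical helper for illustration and is not part of spacy-udpipe.

```python
from typing import List, Union

TextInput = Union[str, List[str], List[List[str]]]


def input_kind(text: TextInput) -> str:
    """Name the input shape: unprocessed, presegmented, or pretokenized."""
    if isinstance(text, str):
        return "unprocessed"
    if isinstance(text, list) and all(isinstance(s, str) for s in text):
        # Note: an empty list also lands here.
        return "presegmented"
    if isinstance(text, list) and all(
        isinstance(s, list) and all(isinstance(t, str) for t in s) for s in text
    ):
        return "pretokenized"
    raise TypeError("expected str, List[str], or List[List[str]]")


raw = "Wikipedia is a free online encyclopedia. It is edited by volunteers."
sentences = [
    "Wikipedia is a free online encyclopedia.",
    "It is edited by volunteers.",
]
tokens = [
    ["Wikipedia", "is", "a", "free", "online", "encyclopedia", "."],
    ["It", "is", "edited", "by", "volunteers", "."],
]

for value in (raw, sentences, tokens):
    print(input_kind(value))
```

Each of raw, sentences, and tokens could then be passed to nlp(...) in place of text in the usage example.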

Loading a custom model

The following code snippet demonstrates how to load a custom UDPipe model (for the Croatian language):

import spacy_udpipe

nlp = spacy_udpipe.load_from_path(lang="hr",
                                  path="./custom_croatian.udpipe",
                                  meta={"description": "Custom 'hr' model"})
text = "Wikipedija je enciklopedija slobodnog sadržaja."

doc = nlp(text)
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)

This can be done for any of the languages supported by spaCy. For an exhaustive list, see spaCy languages.
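A missing model file is an easy mistake when loading from a path, so it can help to check the path before handing it to the library. The following is a minimal sketch: load_custom_model is a hypothetical wrapper, and it uses spacy_udpipe.load_from_path() only with the keyword arguments shown in the snippet above.

```python
from pathlib import Path


def load_custom_model(lang: str, path: str, description: str = ""):
    """Load a custom UDPipe model after checking that the file exists.

    Hypothetical wrapper around spacy_udpipe.load_from_path().
    """
    model_path = Path(path)
    if not model_path.is_file():
        raise FileNotFoundError(f"UDPipe model not found: {model_path}")

    # Imported lazily so the path check works even without the package.
    import spacy_udpipe

    return spacy_udpipe.load_from_path(
        lang=lang,
        path=str(model_path),
        meta={"description": description},
    )
```

For the Croatian example this would be called as load_custom_model("hr", "./custom_croatian.udpipe", "Custom 'hr' model").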

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please update the tests as appropriate. Tests run automatically for each pull request on the master branch. To run the tests locally, first install the package with pip install -e ., then run pytest in the root source directory.

License

Source code: MIT © Text Analysis and Knowledge Engineering Lab (TakeLab)
Available pre-trained models: CC BY-NC-SA 4.0

Project status

Maintained by Text Analysis and Knowledge Engineering Lab (TakeLab).

Notes

Known possible issues:
- Tag map
  
  Token.tag_ is a CoNLL XPOS tag (language-specific part-of-speech tag), defined for each language separately by the corresponding Universal Dependencies treebank. Mappings between XPOS and Universal Dependencies POS tags should be defined in a TAG_MAP dictionary (located in language-specific tag_map.py files), along with optional morphological features. See spaCy tag map for more details.
- Syntax iterators
  
  In order to extract Doc.noun_chunks, a proper syntax iterator implementation for the language of interest is required. For more details, please see spaCy syntax iterators.
- Other language-specific issues
  
  A quick way to check language-specific defaults in spaCy is to visit spaCy language support. Also, please see spaCy language data for details regarding other language-specific data.
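To make the tag map note concrete, here is a small sketch of what such a mapping might look like. It is a hypothetical excerpt for Penn Treebank-style English XPOS tags and uses plain strings for keys and values. The exact layout spaCy expects in a tag_map.py file depends on the spaCy version, so treat this as an illustration of the idea rather than a drop-in file.

```python
from typing import Dict

# Hypothetical excerpt: XPOS tag -> Universal Dependencies POS tag,
# plus optional morphological features.
TAG_MAP: Dict[str, Dict[str, str]] = {
    "NN": {"pos": "NOUN", "Number": "Sing"},
    "NNS": {"pos": "NOUN", "Number": "Plur"},
    "VBD": {"pos": "VERB", "Tense": "Past", "VerbForm": "Fin"},
    "JJ": {"pos": "ADJ", "Degree": "Pos"},
}


def upos_for(xpos: str, default: str = "X") -> str:
    """Look up the UD POS tag for an XPOS tag, falling back to `default`."""
    return TAG_MAP.get(xpos, {}).get("pos", default)


print(upos_for("NNS"))
print(upos_for("UNKNOWN_TAG"))
```

Tags missing from the mapping fall back to "X", the Universal Dependencies tag for "other", which makes gaps in a language-specific map easy to spot.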

Project details

Release history

1.0.0 (this version): Jun 13, 2021
0.3.2: Oct 18, 2020
0.3.1: May 23, 2020
0.3.0: May 9, 2020
0.2.1: Apr 20, 2020
0.2.0: Mar 27, 2020
0.1.0: Dec 16, 2019
0.0.4: Oct 15, 2019
0.0.3: Aug 21, 2019
0.0.2: Aug 8, 2019
0.0.1: Aug 8, 2019

Download files

Source Distribution

spacy_udpipe-1.0.0.tar.gz (9.8 kB), uploaded Jun 13, 2021, Source

Built Distribution

spacy_udpipe-1.0.0-py3-none-any.whl (11.9 kB), uploaded Jun 13, 2021, Python 3

File details

Details for the file spacy_udpipe-1.0.0.tar.gz.

File metadata

Download URL: spacy_udpipe-1.0.0.tar.gz
Upload date: Jun 13, 2021
Size: 9.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.6.13

File hashes

Hashes for spacy_udpipe-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`56dd86d4c079f381c0931967dd76a7c42889c7daa4d019b33a49f8d0e6be549b`
MD5	`ff2a8a9174157beea14f332128980ef3`
BLAKE2b-256	`dc27cb47e9f96d3871c4f77a53beca23e152f93621a726ddd11fef4818a066e6`


File details

Details for the file spacy_udpipe-1.0.0-py3-none-any.whl.

File metadata

Download URL: spacy_udpipe-1.0.0-py3-none-any.whl
Upload date: Jun 13, 2021
Size: 11.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.6.13

File hashes

Hashes for spacy_udpipe-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f1a80034b9c9a71c1e769d450bc3fe203ba5a3d4c17409adf763567fab63db9b`
MD5	`0732b2985a50c9cf0b0e3385ec2bb02c`
BLAKE2b-256	`a40d248fa6101b8ad44891b2bf5f1893a12b03e85c7da718d41fd8967e5c1a5e`

