
HuSpaCy: industrial strength Hungarian natural language processing


HuSpaCy: Industrial-strength Hungarian NLP

HuSpaCy is a spaCy model and library providing industrial-strength Hungarian language processing facilities. A live demo is available here.

This repository contains the material needed to build the HuSpaCy models.

Installation

To get started with the latest Hungarian model, first install huspacy from PyPI:

pip install huspacy

Then download the model:

import huspacy

huspacy.download()

Alternatively, one can install the latest models directly from Hugging Face Hub:

pip install https://huggingface.co/huspacy/hu_core_news_lg/resolve/main/hu_core_news_lg-any-py3-none-any.whl

To speed up inference, you may want to run the models on a GPU. This requires installing spaCy with CUDA support, as described here.
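As a minimal sketch of how GPU use might be wired in, the helper below asks spaCy to use a GPU if one is available and then loads the model. The helper name load_hungarian_pipeline is hypothetical and not part of huspacy; spacy.prefer_gpu() and spacy.load() are standard spaCy calls. It assumes spaCy with CUDA support and the hu_core_news_lg model are already installed.

```python
MODEL_NAME = "hu_core_news_lg"


def load_hungarian_pipeline(model_name=MODEL_NAME):
    """Load the pipeline, using a GPU when spaCy can activate one.

    Hypothetical helper: returns the pipeline and a flag telling
    whether a GPU was activated.
    """
    # Imported lazily so that defining this helper does not require spaCy.
    import spacy

    # prefer_gpu() returns True if a GPU was activated and False otherwise.
    # It is called before loading so the pipeline is allocated on the GPU.
    on_gpu = spacy.prefer_gpu()
    nlp = spacy.load(model_name)
    return nlp, on_gpu
```

Calling nlp, on_gpu = load_hungarian_pipeline() falls back to the CPU when no GPU is found, so the same code should run in both environments.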

Usage

# Load the model through huspacy
import huspacy
nlp = huspacy.load()

# Load the model using spacy.load().
import spacy
nlp = spacy.load("hu_core_news_lg")

# Or load the model directly as a module.
import hu_core_news_lg
nlp = hu_core_news_lg.load()

# Either way you get the same model and can start processing your texts.
doc = nlp('Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.')

For a detailed guide on usage, check spaCy's documentation.

Available Models

Currently, we support a single large model, which offers a good balance between accuracy and speed. You can explore the tool's capabilities in this interactive demo.

hu_core_news_lg provides tokenization, sentence splitting, part-of-speech tagging (UD labels with detailed morphosyntactic features), lemmatization, dependency parsing, and named entity recognition, and ships with pretrained word vectors.
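The annotations listed above are exposed through spaCy's usual Doc, Token and Span attributes. The following is a rough sketch of how they might be read out; the helper names (token_rows, entity_rows, sentence_texts, demo) are illustrative and not part of huspacy, and demo() assumes that huspacy and the model are installed.

```python
def token_rows(doc):
    """Collect (text, lemma, UD POS tag, morphological features, dependency label) per token."""
    return [(t.text, t.lemma_, t.pos_, str(t.morph), t.dep_) for t in doc]


def entity_rows(doc):
    """Collect (text, label) for each named entity found in the document."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def sentence_texts(doc):
    """Return the text of each sentence detected in the document."""
    return [sent.text for sent in doc.sents]


def demo():
    # Requires huspacy and the downloaded hu_core_news_lg model.
    import huspacy

    nlp = huspacy.load()
    doc = nlp("Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.")
    for row in token_rows(doc):
        print(row)
    print(entity_rows(doc))
    print(sentence_texts(doc))
```

Running demo() prints one row per token, followed by the recognized entities and the sentence list. Word vectors are available per token through token.vector.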

Changes to the models are recorded in the changelog.

Development

Installing requirements

  • poetry install installs all the dependencies
  • For better performance, you may need to reinstall spaCy with GPU support, e.g. poetry add spacy[cuda92] adds support for CUDA 9.2

Repository structure

├── .github            -- GitHub configuration files
├── data               -- Data files
│   ├── external       -- External models required to train models (e.g. word vectors)
│   ├── processed      -- Processed data ready to feed to spaCy
│   └── raw            -- Raw data, mostly corpora as they are obtained from the web
├── hu_core_news_lg    -- spaCy 3.x project files for building a model for news texts
│   ├── configs        -- spaCy pipeline configuration files
│   ├── project.lock               -- Auto-generated project script
│   ├── project.yml                -- spaCy 3 project file describing the steps needed to build the model
│   └── README.md                  -- Instructions on building a model from scratch
├── huspacy            -- subproject for the PyPI distributable package
├── tools              -- Source package for tools
│   └── cli            -- Command line scripts (Python)
├── models             -- Trained models and their metadata
├── resources          -- Resource files
├── scripts            -- Bash scripts
├── tests              -- Test files 
├── CHANGELOG.md       -- Keeps the changelog
├── LICENSE            -- License file
├── poetry.lock        -- Locked Poetry dependency versions
├── poetry.toml        -- Poetry configuration
├── pyproject.toml     -- Python project configuration, including dependencies managed with Poetry
└── README.md          -- This file

Citing

If you use the models or this library in your research, please cite this paper.
Additionally, please indicate the version of the model you used so that your research can be reproduced.
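One way to find the model version to report is to read it from the metadata of the loaded pipeline. The sketch below relies on spaCy's nlp.meta dictionary, which contains the "lang", "name" and "version" keys; the helper name describe_model is hypothetical.

```python
def describe_model(meta):
    """Build a 'package-name version' string from a spaCy pipeline's meta dict."""
    # spaCy package names are composed as <lang>_<name>, e.g. hu_core_news_lg.
    return f'{meta["lang"]}_{meta["name"]} {meta["version"]}'


# Example usage (requires huspacy and the downloaded model):
#   import huspacy
#   nlp = huspacy.load()
#   print(describe_model(nlp.meta))
```

The resulting string can be copied into a paper or an experiment log alongside the citation.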

License

This library is released under the Apache 2.0 License. See the LICENSE file for more details.

The trained models have their own permissive license (CC BY-SA 4.0) as described on the models page.

Contact

For feature requests and bug reports, please use the GitHub Issue Tracker. Otherwise, please use the Discussion Forums.

Acknowledgments

The project was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program.
