Skip to main content

HuSpaCy: industrial strength Hungarian natural language processing

Project description

python version spacy PyPI - Wheel PyPI version license

Hits pip downloads Demo stars

HuSpaCy: Industrial-strength Hungarian NLP

HuSpaCy is a spaCy model and library providing industrial-strength Hungarian language processing facilities. A live demo is available here. This repository contain material to build the models for HuSpaCy.

Installation

To get started using the latest Hungarian model, you can fetch the model by installing huspacy from PyPI:

pip install huspacy

This should be followed by the model download:

import huspacy

huspacy.download()

Alternatively, one can install the latest models directly from Hugging Face Hub:

pip install https://huggingface.co/huspacy/hu_core_news_lg/resolve/main/hu_core_news_lg-any-py3-none-any.whl

To speed up inference, you might want to run the models on GPU for which you need to add CUDA support for spacy as described in here.

Usage

# Load the model through huspacy
import huspacy
huspacy.load()

# Load the mode using spacy.load().
import spacy
nlp = spacy.load("hu_core_news_lg")

# Or load the model directly as a module.
import hu_core_news_lg
nlp = hu_core_news_lg.load()

# Either way you get the same model and can start processing your texts.
doc = nlp('Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.')

For a detailed guide on usage, check spaCy's documentation.

Available Models

Currently, we only support a single large model which has a good balance between accuracy and speed. You can play around with the tool capabilities in this interactive demo.

hu_core_news_lg provides tokenization, sentence splitting, part-of-speech tagging (UD labels w/ detailed morphosyntactic features), lemmatization, dependency parsing and named entity recognition and ships with pretrained word vectors.

Models' changes are recorded in the changelog.

Development

Installing requirements

  • poetry install will install all the dependencies
  • For better performance you might need to reinstall spacy with GPU support, e.g. poetry add spacy[cuda92] will add support for CUDA 9.2

Repository structure

├── .github            -- Github configuration files
├── data               -- Data files
│   ├── external       -- External models required to train models (e.g. word vectors)
│   ├── processed      -- Processed data ready to feed spacy
│   └── raw            -- Raw data, mostly corpora as they are obtained from the web
├── hu_core_news_lg    -- Spacy 3.x project files for building a model for news texts
│   ├── configs        -- Spacy pipeline configuration files
│   ├── project.lock               -- Auto-generated project script
│   ├── project.yml                -- Spacy3 Project file describing steps needed to build the model
│   └── README.md                  -- Instructions on building a model from scratch
├── huspacy            -- subproject for the PyPI distributable package
├── tools              -- Source package for tools
│   └── cli            -- Command line scripts (Python)
├── models             -- Trained models and their metadata
├── resources          -- Resource files
├── scripts            -- Bash scripts
├── tests              -- Test files 
├── CHANGELOG.md       -- Keeps the changelog
├── LICENSE            -- License file
├── poetry.lock        -- Locked poetry dependencies files
├── poetry.toml        -- Poetry configurations
├── pyproject.toml     -- Python project configutation, including dependencies managed with Poetry 
└── README.md          -- This file

Citing

If you use the models or this library in your research please cite this paper.
Additionally, please indicate the version of the model you used so that your research can be reproduced.

License

This library is released under the Apache 2.0 License. See the LICENSE file for more details.

The trained models have their own license as described on the models hub.

Contact

For feature request issues and bugs please use the GitHub Issue Tracker. Otherwise, please use the Discussion Forums.

Acknowledgments

The project was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

huspacy-0.4.0a0-py3-none-any.whl (4.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page