
HuSpaCy: industrial strength Hungarian natural language processing

HuSpaCy: Industrial-strength Hungarian NLP

HuSpaCy is a spaCy model and library providing industrial-strength Hungarian language processing facilities. A live demo is available here. This repository contains the material needed to build the HuSpaCy models.

Installation

To get started with the latest Hungarian model, first install the huspacy package from PyPI:

pip install huspacy

Then download the model:

import huspacy

huspacy.download()
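
The download only needs to run once per environment, so a script may want to skip it when the model is already present. The following is a minimal sketch, assuming the model installs as an importable package named hu_core_news_lg (as the Usage section suggests); ensure_model_installed is a hypothetical helper, not part of huspacy.

```python
import importlib.util


def ensure_model_installed(model_name="hu_core_news_lg"):
    """Trigger huspacy.download() if `model_name` is not importable.

    Returns True when a download was triggered, False otherwise.
    Assumption: the model is installed as a top-level Python package.
    """
    if importlib.util.find_spec(model_name) is not None:
        return False  # the model package is already installed

    import huspacy  # imported lazily: only needed when a download is required

    huspacy.download()
    return True
```

A freshly installed package may not be importable in the same Python process until import caches are refreshed, so restarting the interpreter after the first download is the safest option.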

Alternatively, one can install the latest models directly from Hugging Face Hub:

pip install https://huggingface.co/huspacy/hu_core_news_lg/resolve/main/hu_core_news_lg-any-py3-none-any.whl

To speed up inference, you can run the models on a GPU. This requires installing spaCy with CUDA support, as described here.
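
Once spaCy has CUDA support, the GPU still has to be requested before the model is loaded. A minimal sketch, assuming the hu_core_news_lg model is installed; load_model_prefer_gpu is a hypothetical helper name, while spacy.prefer_gpu() and spacy.load() are regular spaCy calls.

```python
def load_model_prefer_gpu(model_name="hu_core_news_lg"):
    """Load a spaCy model on the GPU when one is usable, else on the CPU.

    Returns the loaded pipeline and a flag telling whether the GPU was activated.
    """
    import spacy  # imported lazily so defining the helper does not require spaCy

    # prefer_gpu() must be called before the model is loaded; it returns
    # False (and spaCy stays on the CPU) when no GPU is available.
    on_gpu = spacy.prefer_gpu()
    nlp = spacy.load(model_name)
    return nlp, on_gpu
```

Using prefer_gpu() rather than require_gpu() keeps the same script usable on machines without a GPU.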

Usage

# Load the model through huspacy.
import huspacy
nlp = huspacy.load()

# Or load the model using spacy.load().
import spacy
nlp = spacy.load("hu_core_news_lg")

# Or load the model directly as a module.
import hu_core_news_lg
nlp = hu_core_news_lg.load()

# Either way you get the same model and can start processing your texts.
doc = nlp('Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.')

For a detailed guide on usage, check spaCy's documentation.

Available Models

Currently, we support only a single large model, which offers a good balance between accuracy and speed. You can explore the tool's capabilities in this interactive demo.

hu_core_news_lg provides tokenization, sentence splitting, part-of-speech tagging (UD labels with detailed morphosyntactic features), lemmatization, dependency parsing, and named entity recognition, and ships with pretrained word vectors.
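
These annotations are exposed through spaCy's standard Doc and Token attributes. The sketch below shows one way to collect them; the helper names (token_rows, entity_rows, sentence_texts, analyse) are hypothetical and not part of HuSpaCy, and the actual labels depend on the model version.

```python
def token_rows(doc):
    """Per token: text, lemma, UD part-of-speech tag, morphological features, dependency label."""
    return [(t.text, t.lemma_, t.pos_, str(t.morph), t.dep_) for t in doc]


def entity_rows(doc):
    """Named entities as (text, label) pairs."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def sentence_texts(doc):
    """Sentences found by the sentence splitter."""
    return [sent.text for sent in doc.sents]


def analyse(text, model_name="hu_core_news_lg"):
    """Run the pipeline on `text` and return all three views of the result."""
    import spacy  # imported lazily; the helpers above only need a Doc-like object

    nlp = spacy.load(model_name)  # assumes the model is already installed
    doc = nlp(text)
    return token_rows(doc), entity_rows(doc), sentence_texts(doc)
```

For example, analyse('Csiribiri csiribiri zabszalma - négy csillag közt alszom ma.') returns the token table, the entity list, and the sentence list for the sample sentence used above.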

Changes to the models are recorded in the changelog.

Development

Installing requirements

  • poetry install will install all the dependencies
  • For better performance you might need to reinstall spacy with GPU support, e.g. poetry add spacy[cuda92] will add support for CUDA 9.2

Repository structure

├── .github            -- GitHub configuration files
├── data               -- Data files
│   ├── external       -- External models required to train models (e.g. word vectors)
│   ├── processed      -- Processed data ready to feed spacy
│   └── raw            -- Raw data, mostly corpora as they are obtained from the web
├── hu_core_news_lg    -- Spacy 3.x project files for building a model for news texts
│   ├── configs        -- Spacy pipeline configuration files
│   ├── project.lock               -- Auto-generated project script
│   ├── project.yml                -- Spacy3 Project file describing steps needed to build the model
│   └── README.md                  -- Instructions on building a model from scratch
├── huspacy            -- subproject for the PyPI distributable package
├── tools              -- Source package for tools
│   └── cli            -- Command line scripts (Python)
├── models             -- Trained models and their metadata
├── resources          -- Resource files
├── scripts            -- Bash scripts
├── tests              -- Test files 
├── CHANGELOG.md       -- Keeps the changelog
├── LICENSE            -- License file
├── poetry.lock        -- Locked poetry dependencies files
├── poetry.toml        -- Poetry configurations
├── pyproject.toml     -- Python project configuration, including dependencies managed with Poetry
└── README.md          -- This file

Citing

If you use the models or this library in your research, please cite this paper.
Additionally, please indicate the version of the model you used so that your research can be reproduced.
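
To find the version to report, the installed package metadata can be queried. This is a minimal sketch, assuming the model is installed under the distribution name hu_core_news_lg; installed_version and version_report are hypothetical helper names.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_report(packages=("huspacy", "hu_core_news_lg", "spacy")):
    """Map each package name to its installed version (None when missing)."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, ver in version_report().items():
        print(f"{name}: {ver or 'not installed'}")
```

With a loaded pipeline, nlp.meta["version"] reports the model version as well.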

License

This library is released under the Apache 2.0 License. See the LICENSE file for more details.

The trained models have their own license as described on the models hub.

Contact

For feature requests and bug reports, please use the GitHub Issue Tracker. Otherwise, please use the Discussion Forums.

Acknowledgments

The project was supported by the Ministry of Innovation and Technology NRDI Office within the framework of the Artificial Intelligence National Laboratory Program.

