A collection of Ukrainian language datasets

ua_datasets

ua_datasets is a collection of Ukrainian language datasets. Our aim is to build a benchmark for research related to natural language processing in Ukrainian.

This library is provided for research purposes by FIdo.ai, the machine learning research division of FIdo, a non-profit student organization at the National University of Kyiv-Mohyla Academy.

Installation

The library can be installed from PyPI into your virtual environment (e.g. venv or a conda env):

pip install ua_datasets
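
To confirm that the installation succeeded, the installed version can be looked up from Python. This is a minimal sketch that uses only the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of ua_datasets, and on some Python versions the distribution name may need to be spelled ua-datasets.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    found = installed_version("ua_datasets")
    if found is None:
        print("ua_datasets is not installed in this environment")
    else:
        print(f"ua_datasets version: {found}")
```

Run it with the interpreter of the same virtual environment that pip installed into; otherwise the lookup reports the package as missing even though it is installed elsewhere.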

Available Datasets

Text classification

Token classification

Question Answering

Contribution

If you would like to contribute (update any part of the library or add your own dataset), please get in touch by opening a GitHub issue. Thanks in advance for your contribution! Let's make the Ukrainian language even greater!

Citation

@article{ua_datasets_2021,
  title={ua_datasets: a collection of Ukrainian language datasets},
  author={Bogdan Ivanyuk-Skulskiy and Anton Zaliznyi and Oleksand Reshetar and Oleksiy Protsyk and Bohdan Romanchuk and Vladyslav Shpihanovych},
  year={2021}
}

Download files

Source Distribution

ua-datasets-0.0.5.tar.gz (609.9 kB)
