A collection of Ukrainian language datasets

ua_datasets

UA-datasets is a collection of Ukrainian language datasets. Our aim is to build a benchmark for research related to natural language processing in Ukrainian.

This library is provided for research purposes by FIdo.ai, the machine learning research division of FIdo, a non-profit student organization at the National University of Kyiv-Mohyla Academy.

Installation

The library can be installed from PyPI into your virtual environment (e.g. venv or a conda env):

pip install ua_datasets
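
To confirm that the installation succeeded, the installed version can be read from the package metadata. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is illustrative and is not part of ua_datasets.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "ua_datasets") -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing.

    `installed_version` is a hypothetical helper written for this example;
    it only queries the metadata that pip records at install time.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "ua_datasets is not installed")
```

If the helper returns `None`, the package is not visible to the interpreter that ran the check, which usually means a different virtual environment is active.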

Latest Updates

05.07.22 - Added HuggingFace API for Q&A (UA-SQuAD) and Text Classification (UA-News) datasets

Available Datasets

  • Question Answering (UA-SQuAD)
  • Text Classification (UA-News)
  • Token Classification (Mova Institute Part of Speech)

Contribution

If you would like to contribute (update any part of the library or add your own dataset), do not hesitate to get in touch by opening a GitHub issue. Thanks in advance for your contribution! Let's make the Ukrainian language even greater!

Citation

@software{ua_datasets_2021,
  author = {Ivanyuk-Skulskiy, Bogdan and Zaliznyi, Anton and Reshetar, Oleksand and Protsyk, Oleksiy and Romanchuk, Bohdan and Shpihanovych, Vladyslav},
  month = oct,
  title = {ua_datasets: a collection of Ukrainian language datasets},
  url = {https://github.com/fido-ai/ua-datasets},
  version = {0.0.1},
  year = {2021}
}

Download files

Source Distribution

ua_datasets-0.1.2.tar.gz (9.3 kB)

Uploaded Source

Built Distribution

ua_datasets-0.1.2-py3-none-any.whl (9.9 kB)

Uploaded Python 3

File details

Details for the file ua_datasets-0.1.2.tar.gz.

File metadata

  • Download URL: ua_datasets-0.1.2.tar.gz
  • Upload date:
  • Size: 9.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for ua_datasets-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       ab5191c0bff6d5cb24eaf38a8443b0145850a505089dc2accd7891335a7271a7
MD5          e0888aecc0c3a05c0a64a9aebd1271b0
BLAKE2b-256  5a62523be412d52a4479e6f1ed850a827f5c7be457d43e1456e2a9dddda8467a

File details

Details for the file ua_datasets-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: ua_datasets-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 9.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for ua_datasets-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       5ebb02e9a33a7ef9a2f2f5e48da08002ba60dc7b204f93e1d819760520c4df70
MD5          6802641aab10511e579cad2934483989
BLAKE2b-256  56f350718a5b5b471b9e4ef7fef1a20324f88b648318fb5e084cd453a50a18ac
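
A manually downloaded file can be checked against the SHA256 digests listed above before installing it. The following is a minimal sketch using the standard `hashlib` module; the names `EXPECTED_SHA256`, `sha256_of`, and `matches_published_hash` are illustrative, and the script assumes the files keep their published names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file listings above.
EXPECTED_SHA256 = {
    "ua_datasets-0.1.2.tar.gz":
        "ab5191c0bff6d5cb24eaf38a8443b0145850a505089dc2accd7891335a7271a7",
    "ua_datasets-0.1.2-py3-none-any.whl":
        "5ebb02e9a33a7ef9a2f2f5e48da08002ba60dc7b204f93e1d819760520c4df70",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `matches_published_hash(Path("ua_datasets-0.1.2.tar.gz"))` returns `True` only when the local file is byte-for-byte identical to the published source distribution. A file with an unrecognized name is treated as a mismatch rather than raising an error.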
