A collection of Ukrainian language datasets
ua_datasets
UA-datasets is a collection of Ukrainian language datasets. Our aim is to build a benchmark for research related to natural language processing in Ukrainian.
This library is provided for research purposes by FIdo.ai, the machine learning research division of FIdo, a non-profit student organization at the National University of Kyiv-Mohyla Academy.
Installation
The library can be installed from PyPI into your virtual environment (e.g. venv or a conda environment):
pip install ua_datasets
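To confirm that the install landed in the active environment, a minimal sketch like the one below can query the installed version. It relies only on the standard-library `importlib.metadata` module (Python 3.8+); the helper name `installed_version` is hypothetical and not part of ua_datasets.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "ua_datasets") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("ua_datasets is not installed in this environment")
    else:
        print(f"ua_datasets {found} is installed")
```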
Latest Updates
05.07.22 - Added a Hugging Face API for the Question Answering (UA-SQuAD) and Text Classification (UA-News) datasets
Available Datasets
- Question Answering (UA-SQuAD)
- Text Classification (UA-News)
- Token Classification (Mova Institute Part of Speech)
Contribution
If you would like to contribute (for example, by updating any part of the library or adding your own dataset), feel free to open a GitHub issue. Thanks in advance for your contribution! Let's make the Ukrainian language even greater!
Citation
@software{ua_datasets_2021,
author = {Ivanyuk-Skulskiy, Bogdan and Zaliznyi, Anton and Reshetar, Oleksand and Protsyk, Oleksiy and Romanchuk, Bohdan and Shpihanovych, Vladyslav},
month = oct,
title = {ua_datasets: a collection of Ukrainian language datasets},
url = {https://github.com/fido-ai/ua-datasets},
version = {0.0.1},
year = {2021}
}
Download files
Source distribution: ua_datasets-0.1.2.tar.gz
Built distribution: ua_datasets-0.1.2-py3-none-any.whl
File details
Details for the file ua_datasets-0.1.2.tar.gz
File metadata
- Download URL: ua_datasets-0.1.2.tar.gz
- Upload date:
- Size: 9.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | ab5191c0bff6d5cb24eaf38a8443b0145850a505089dc2accd7891335a7271a7
MD5 | e0888aecc0c3a05c0a64a9aebd1271b0
BLAKE2b-256 | 5a62523be412d52a4479e6f1ed850a827f5c7be457d43e1456e2a9dddda8467a
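A downloaded copy of the source distribution can be checked against the digests listed above. The sketch below is one possible way to do it with the standard-library `hashlib` module; it assumes the archive was saved as `ua_datasets-0.1.2.tar.gz` in the current directory, and the helpers `file_digests` and `verify_file` are hypothetical names for illustration. BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# Digests published above for the source distribution ua_datasets-0.1.2.tar.gz.
EXPECTED_SDIST = {
    "sha256": "ab5191c0bff6d5cb24eaf38a8443b0145850a505089dc2accd7891335a7271a7",
    "md5": "e0888aecc0c3a05c0a64a9aebd1271b0",
    "blake2b_256": "5a62523be412d52a4479e6f1ed850a827f5c7be457d43e1456e2a9dddda8467a",
}


def file_digests(path, chunk_size=1 << 16):
    """Compute SHA256, MD5 and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        # BLAKE2b-256 means BLAKE2b with a 32-byte (256-bit) digest.
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_file(path, expected):
    """Return True only if every expected digest matches the file's digest."""
    actual = file_digests(path)
    return all(actual[name] == digest.lower() for name, digest in expected.items())


if __name__ == "__main__":
    archive = Path("ua_datasets-0.1.2.tar.gz")  # assumed local download path
    if archive.exists():
        print("OK" if verify_file(archive, EXPECTED_SDIST) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

Checking SHA256 alone is usually sufficient; MD5 is included only because it appears in the table.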
File details
Details for the file ua_datasets-0.1.2-py3-none-any.whl
File metadata
- Download URL: ua_datasets-0.1.2-py3-none-any.whl
- Upload date:
- Size: 9.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5ebb02e9a33a7ef9a2f2f5e48da08002ba60dc7b204f93e1d819760520c4df70
MD5 | 6802641aab10511e579cad2934483989
BLAKE2b-256 | 56f350718a5b5b471b9e4ef7fef1a20324f88b648318fb5e084cd453a50a18ac
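The published SHA256 digests can also be fed to pip's hash-checking mode, which pip enables automatically when a requirement carries `--hash` options. The sketch below simply builds such a requirements-file line for version 0.1.2, allowing either the source or the wheel file; the helper `pinned_requirement` is a hypothetical name used only for illustration.

```python
# SHA256 digests published above for the two ua_datasets 0.1.2 files.
SDIST_SHA256 = "ab5191c0bff6d5cb24eaf38a8443b0145850a505089dc2accd7891335a7271a7"
WHEEL_SHA256 = "5ebb02e9a33a7ef9a2f2f5e48da08002ba60dc7b204f93e1d819760520c4df70"


def pinned_requirement(name, version, sha256_digests):
    """Build a requirements-file line that pins a version and its allowed hashes."""
    hashes = " ".join(f"--hash=sha256:{d}" for d in sha256_digests)
    return f"{name}=={version} {hashes}"


if __name__ == "__main__":
    # Print the line; save it to requirements.txt to use it with pip.
    print(pinned_requirement("ua_datasets", "0.1.2", [SDIST_SHA256, WHEEL_SHA256]))
```

Saved to a `requirements.txt`, the line can be installed with `pip install --require-hashes -r requirements.txt`. In hash-checking mode pip also expects every dependency to be pinned with its own hashes, so any dependencies of ua_datasets would need entries of the same form.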