common-datasets

common_datasets

Development Status
- 3 - Alpha
License
- OSI Approved :: MIT License
Programming Language
- Python
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

common-datasets: common machine learning datasets

This package provides an unofficial collection of datasets widely used to evaluate machine learning techniques, mainly small and imbalanced datasets for binary classification, multiclass classification and regression. The datasets are provided in the usual sklearn.datasets format, with missing data imputed and categorical and ordinal features encoded. The authors of this repository do not own any licenses for the datasets; the goal of the project is to provide a standardized collection of datasets for research purposes.

PLEASE DO NOT CITE OR REFER TO THIS PACKAGE IN ANY FORM!

If you use data through this repository, please cite the original works publishing and specifying these datasets:

@article{keel,
  author={Alcala-Fdez, J. and Fernandez, A. and Luengo, J. and Derrac, J. and Garcia, S.
          and Sanchez, L. and Herrera, F.},
  title={KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms
          and Experimental Analysis Framework},
  journal={Journal of Multiple-Valued Logic and Soft Computing},
  volume={17},
  number={2-3},
  year={2011},
  pages={255-287}}

@misc{uci,
  author = "Dua, Dheeru and Karra Taniskidou, Efi",
  year = "2017",
  title = "{UCI} Machine Learning Repository",
  url = "http://archive.ics.uci.edu/ml",
  institution = "University of California, Irvine, School of Information and Computer Sciences"}

@article{krnn,
  author={X. J. Zhang and Z. Tari and M. Cheriet},
  title={{KRNN}: k {Rare-class Nearest Neighbor} classification},
  journal={Pattern Recognition},
  year={2017},
  volume={62},
  number={2},
  pages={33--44}
  }

For each individual dataset, the citation key referring to its publisher, or to a relevant publication in which the dataset has been used in the given configuration, is provided as part of the dataset. For example:

# binary classification
>>> import common_datasets.binary_classification as binclas

>>> dataset = binclas.load_abalone19()
>>> dataset['citation_key']
'keel'
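
Since every loaded dataset carries its own citation key, the list of works to cite for an experiment can be gathered programmatically. The sketch below relies only on the 'citation_key' entry shown above. The helper name collect_citation_keys and the two stand-in loaders are hypothetical and exist only so the example runs without the package installed; in practice, pass real loaders such as binclas.load_abalone19.

```python
def collect_citation_keys(loaders):
    """Call each loader and return the sorted, de-duplicated citation keys."""
    keys = set()
    for loader in loaders:
        dataset = loader()
        keys.add(dataset['citation_key'])
    return sorted(keys)


# Hypothetical stand-ins for real loaders; they only mimic the
# 'citation_key' entry of a loaded dataset.
def fake_loader_keel():
    return {'citation_key': 'keel'}


def fake_loader_krnn():
    return {'citation_key': 'krnn'}


citation_keys = collect_citation_keys(
    [fake_loader_keel, fake_loader_krnn, fake_loader_keel]
)
print(citation_keys)
```

The resulting keys correspond to the BibTeX entries listed above.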

Introduction

The package contains 119 binary classification, 23 multiclass classification and 23 regression datasets.

Installation

The package can be cloned from GitHub in the usual way, and the latest stable version is also available in the PyPI repository:

pip install common_datasets

Use cases

Loading a dataset

# binary classification
import common_datasets.binary_classification as binclas

dataset = binclas.load_abalone19()

# multiclass classification
import common_datasets.multiclass_classification as multclas

dataset = multclas.load_abalone()

# regression
from common_datasets import regression

dataset = regression.load_treasury()

Querying all dataset loaders and loading a dataset

# binary classification
import common_datasets.binary_classification as binclas

data_loaders = binclas.get_data_loaders()

dataset_0 = data_loaders[0]()

# multiclass classification
import common_datasets.multiclass_classification as multclas

data_loaders = multclas.get_data_loaders()

dataset_0 = data_loaders[0]()

# regression
from common_datasets import regression

data_loaders = regression.get_data_loaders()

dataset_0 = data_loaders[0]()
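
Because the loaders are plain callables, a whole collection can be processed in a loop. The sketch below counts the class labels of each dataset, which is a quick way to see how imbalanced it is. It assumes that a loaded dataset exposes its labels under a 'target' key, as in the sklearn.datasets layout; that key, the helper name class_counts and the toy loader are assumptions made for illustration, not confirmed parts of the package.

```python
from collections import Counter


def class_counts(loaders):
    """Map each loader's function name to the class counts of its target."""
    counts = {}
    for loader in loaders:
        dataset = loader()
        # 'target' is an assumed key, borrowed from the sklearn.datasets layout.
        counts[loader.__name__] = Counter(dataset['target'])
    return counts


# Hypothetical stand-in for a real loader: a tiny imbalanced dataset.
def load_toy_imbalanced():
    return {'data': [[0.1], [0.4], [0.3], [0.9]], 'target': [0, 0, 0, 1]}


result = class_counts([load_toy_imbalanced])
print(result['load_toy_imbalanced'])
```

With the real package, the list returned by binclas.get_data_loaders() would take the place of the toy loader list.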

Querying the loaders of the 5 smallest datasets regarding the total number of records

# binary classification
import common_datasets.binary_classification as binclas

data_loaders = binclas.get_filtered_data_loaders(n_smallest=5, sorting='n')

dataset_0 = data_loaders[0]()

# multiclass classification
import common_datasets.multiclass_classification as multclas

data_loaders = multclas.get_data_loaders(n_smallest=5, sorting='n')

dataset_0 = data_loaders[0]()

# regression
from common_datasets import regression

data_loaders = regression.get_data_loaders(n_smallest=5, sorting='n')

dataset_0 = data_loaders[0]()
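
To make the idea of "smallest by number of records" concrete, the selection can also be written by hand. This is only a sketch of the concept, not the package's implementation: it assumes the records are stored under a 'data' key (as in the sklearn.datasets layout), loads every dataset just to measure it, and uses hypothetical stand-in loaders so that it runs without the package installed.

```python
def n_smallest_loaders(loaders, n_smallest):
    """Return the loaders of the n_smallest datasets by number of records."""
    def n_records(loader):
        # 'data' is an assumed key, borrowed from the sklearn.datasets layout.
        return len(loader()['data'])

    return sorted(loaders, key=n_records)[:n_smallest]


# Hypothetical stand-ins for real loaders, with 3, 1 and 2 records.
def load_three():
    return {'data': [[1.0], [2.0], [3.0]]}


def load_one():
    return {'data': [[1.0]]}


def load_two():
    return {'data': [[1.0], [2.0]]}


smallest = n_smallest_loaders([load_three, load_one, load_two], n_smallest=2)
print([loader.__name__ for loader in smallest])
```

The built-in filtering shown above is preferable in practice, since this manual version has to load each dataset once just to count its records.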

Documentation

For detailed documentation, including the parameters of the functions, see http://common_datasets.readthedocs.io.

Release history

- 0.3.10: Nov 18, 2023
- 0.3.9: Nov 18, 2023
- 0.3.8: Aug 17, 2023
- 0.3.6: Aug 17, 2023
- 0.3.5: Feb 19, 2023 (this version)
- 0.3.4: Jan 9, 2023
- 0.3.3: Jan 4, 2023
- 0.3.2: Dec 5, 2022
- 0.3.0: Dec 2, 2022
- 0.2.8: Aug 28, 2022
- 0.2.7: Aug 27, 2022
- 0.2.6: Aug 23, 2022
- 0.2.5: Aug 22, 2022
- 0.2.4: Aug 21, 2022
- 0.2.3: Aug 20, 2022
- 0.2.2: Aug 19, 2022
- 0.2.1: Aug 19, 2022

Download files

Source Distribution

common_datasets-0.3.5.tar.gz (14.6 MB)

Uploaded Feb 19, 2023 Source

Built Distribution

common_datasets-0.3.5-py3-none-any.whl (15.4 MB)

Uploaded Feb 19, 2023 Python 3

File details

Details for the file common_datasets-0.3.5.tar.gz.

File metadata

Download URL: common_datasets-0.3.5.tar.gz
Upload date: Feb 19, 2023
Size: 14.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for common_datasets-0.3.5.tar.gz
Algorithm	Hash digest
SHA256	`f6905b2755c0f154f7887ecd04e5c26bab9cbb6d4a7b8d152386d8c86eb0a00d`
MD5	`9d2d77af7c09a7c808071d94431f7ff6`
BLAKE2b-256	`f0407f3f422064402a4990aa437ce444f7da907add13642b2872f041d89f2ed7`
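
A downloaded file can be checked against the published digests with Python's standard hashlib module. The following is a minimal sketch: the helper names and the local file path in the final comment are assumptions, while the expected value is the SHA256 digest listed above for the source distribution.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest listed above for the source distribution.
EXPECTED_SDIST_SHA256 = (
    'f6905b2755c0f154f7887ecd04e5c26bab9cbb6d4a7b8d152386d8c86eb0a00d'
)

# Example call; the local path is an assumption about where the file was saved:
# matches_published_hash('common_datasets-0.3.5.tar.gz', EXPECTED_SDIST_SHA256)
```

The same helper works for the wheel when given the SHA256 digest listed for that file.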

File details

Details for the file common_datasets-0.3.5-py3-none-any.whl.

File metadata

Download URL: common_datasets-0.3.5-py3-none-any.whl
Upload date: Feb 19, 2023
Size: 15.4 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.2

File hashes

Hashes for common_datasets-0.3.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`104f2a25b0e1b577ac1e310e6f4564817b16f4758651d10012ffc1570caed064`
MD5	`0c7e67b824c1104e2e712dfb25becce3`
BLAKE2b-256	`cac65725eb997f06eba3b207eab3a070da6f5bac1f2b976b8b64eb5d28887928`
