
KeelDS is a package for loading datasets from the KEEL repository, with some normalization applied and with options for splitting and discretization.

Project description

KeelDS

KeelDS: A Python package for loading datasets from the KEEL repository

KeelDS is a Python package that provides easy access to datasets from the KEEL repository, a popular source for machine learning datasets. This package simplifies the process of loading KEEL datasets, offering options for cross-validation and discretization.

Features

  • Load KEEL datasets with a single line of code
  • Access datasets pre-split into train/test cross-validation folds
  • Optional discretization using Fayyad and Irani's MDLP method
  • Support for both balanced and imbalanced datasets
  • Easy integration with machine learning workflows
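The MDLP discretization named above can be sketched as follows. This is a minimal, from-scratch illustration of Fayyad and Irani's recursive entropy split with the MDL stopping rule for a single continuous feature; it is not KeelDS's own implementation, and the names `entropy` and `mdlp_cut_points` are hypothetical. It recomputes entropies for every candidate cut, so it is quadratic and meant for reading rather than speed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlp_cut_points(values, labels):
    """Hypothetical helper: MDLP cut points for one continuous feature."""
    pairs = sorted(zip(values, labels))
    xs = [v for v, _ in pairs]
    ys = [y for _, y in pairs]
    cuts = []

    def split(lo, hi):
        # Work on the half-open slice [lo, hi) of the sorted data.
        n = hi - lo
        if n < 2:
            return
        seg_y = ys[lo:hi]
        ent_s = entropy(seg_y)
        best = None  # (weighted class entropy, split index)
        for i in range(lo + 1, hi):
            if xs[i] == xs[i - 1]:
                continue  # only cut between distinct values
            left, right = ys[lo:i], ys[i:hi]
            w = (len(left) * entropy(left) + len(right) * entropy(right)) / n
            if best is None or w < best[0]:
                best = (w, i)
        if best is None:
            return
        w, i = best
        left, right = ys[lo:i], ys[i:hi]
        gain = ent_s - w
        k, k1, k2 = len(set(seg_y)), len(set(left)), len(set(right))
        delta = math.log2(3 ** k - 2) - (
            k * ent_s - k1 * entropy(left) - k2 * entropy(right)
        )
        # MDL acceptance test: keep the cut only if the gain pays for its cost.
        if gain > (math.log2(n - 1) + delta) / n:
            cuts.append((xs[i - 1] + xs[i]) / 2)  # midpoint between values
            split(lo, i)
            split(i, hi)

    split(0, len(xs))
    return sorted(cuts)
```

On a feature whose first six values belong to one class and last six to another, this returns a single cut at the boundary midpoint; on a short alternating sequence the gain does not pass the MDL test and no cut is made.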

Installation

Dependencies

  • Python (>= 3.12)
  • pandas (>= 2.2.2)

You can install KeelDS using pip:

pip install keel-ds

Usage

Here's a simple example of how to use KeelDS with a machine learning model (it also requires catboost, which is not a KeelDS dependency and must be installed separately):

from keel_ds import load_data
import numpy as np
from catboost import CatBoostClassifier

file_name = 'iris'
folds = load_data(file_name)  # list of (x_train, y_train, x_test, y_test) tuples

evaluations = []
for x_train, y_train, x_test, y_test in folds:
    model = CatBoostClassifier(verbose=False)  # fresh model for each fold
    model.fit(x_train, y_train)
    evaluation = model.score(x_test, y_test)  # accuracy on the test fold
    evaluations.append(evaluation)

print(np.mean(evaluations))  # Output: 0.933333333333
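Because the loop above relies only on `fit` and `score`, it can be factored into a reusable helper that works with any scikit-learn-style estimator. This is a sketch under that assumption; `evaluate_folds` and `MajorityClassifier` are hypothetical names, the latter a trivial stand-in so the example runs without CatBoost.

```python
import statistics
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in estimator with the fit/score interface used above."""

    def fit(self, x, y):
        self.label_ = Counter(y).most_common(1)[0][0]  # most frequent class
        return self

    def predict(self, x):
        return [self.label_] * len(x)

    def score(self, x, y):
        # Accuracy: fraction of predictions equal to the true labels.
        preds = self.predict(x)
        return sum(p == t for p, t in zip(preds, y)) / len(y)


def evaluate_folds(folds, make_model):
    """Hypothetical helper: return (mean, population stdev, per-fold scores)."""
    scores = []
    for x_train, y_train, x_test, y_test in folds:
        model = make_model()  # new instance so no fitted state leaks between folds
        model.fit(x_train, y_train)
        scores.append(model.score(x_test, y_test))
    return statistics.mean(scores), statistics.pstdev(scores), scores
```

With the package installed, this might be called as `evaluate_folds(load_data('iris'), lambda: CatBoostClassifier(verbose=False))`. Passing a factory rather than a model instance guarantees that every fold is trained from scratch.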

API Reference

load_data(data, imbalanced=False, raw=False)

Load a dataset from the KEEL repository.

  • data (str): Name of the dataset to load
  • imbalanced (bool): If True, load from imbalanced datasets. Default is False.
  • raw (bool): If True, return the raw dataset. Default is False.

Returns a list of tuples (x_train, y_train, x_test, y_test) for each fold.
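To make this return shape concrete, the sketch below builds the same list of (x_train, y_train, x_test, y_test) tuples from in-memory arrays using plain, non-stratified, shuffled k-fold splitting. `make_folds` is a hypothetical helper; it is not how KeelDS produces its folds, and the package's fold count and splitting scheme are not specified here.

```python
import numpy as np


def make_folds(x, y, k=5, seed=0):
    """Hypothetical helper: split (x, y) into k (x_train, y_train, x_test, y_test) tuples."""
    x, y = np.asarray(x), np.asarray(y)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))  # shuffle row indices once
    folds = []
    for test_idx in np.array_split(idx, k):
        # Every row not in this fold's test slice goes to training.
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        folds.append((x[train_mask], y[train_mask], x[test_idx], y[test_idx]))
    return folds
```

Each row lands in exactly one test fold, so iterating over the result visits every sample once as test data, which is the property the usage loop above depends on.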

list_data()

List all available datasets.

Returns a dictionary with two keys: 'balanced' and 'imbalanced', each containing a list of available dataset names.
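Since `list_data()` groups dataset names under 'balanced' and 'imbalanced', its result can be used to choose the `imbalanced` flag for `load_data`. A minimal sketch, assuming the dictionary shape documented above; `imbalanced_flag` and the sample catalog are hypothetical.

```python
def imbalanced_flag(catalog, name):
    """Hypothetical helper: map a dataset name to load_data's imbalanced flag.

    `catalog` is assumed to look like list_data()'s documented result:
    {'balanced': [...], 'imbalanced': [...]}.
    """
    if name in catalog.get('balanced', []):
        return False
    if name in catalog.get('imbalanced', []):
        return True
    raise KeyError(f'{name!r} is not listed in the catalog')


# Hypothetical stand-in catalog, for illustration only.
sample_catalog = {'balanced': ['iris'], 'imbalanced': ['ecoli1']}
```

With the package installed, this might be used as `load_data(name, imbalanced=imbalanced_flag(list_data(), name))`, failing early with a clear error for unknown names.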

Contributing

Contributions to KeelDS are welcome! Please feel free to submit a Pull Request.

License

[Add license information here]

Contact

For any queries or issues, please open an issue on the GitHub repository.





Download files


Source Distribution

keel_ds-0.1.19.tar.gz (26.3 MB)

Uploaded Source

Built Distribution


keel_ds-0.1.19-py3-none-any.whl (26.6 MB)

Uploaded Python 3

File details

Details for the file keel_ds-0.1.19.tar.gz.

File metadata

  • Download URL: keel_ds-0.1.19.tar.gz
  • Upload date:
  • Size: 26.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/6.8.0-41-generic

File hashes

Hashes for keel_ds-0.1.19.tar.gz
  • SHA256: 473b0f0d764709aa05eaf551565d908ab325da60d6f1159bb05c7292a5bfece2
  • MD5: c5350815c08f0584869e3492f8fdcb8a
  • BLAKE2b-256: 275f1834ebe26da81a80ec2434d0dece7c74c7a50ddf4e4008804b13f71721e7
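A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. A minimal sketch; `sha256_of` is a hypothetical helper, and the usage line assumes the sdist was saved in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hypothetical helper: SHA256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        # Read fixed-size chunks until read() returns b'' at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


# Digest copied from the list above for keel_ds-0.1.19.tar.gz.
EXPECTED_SDIST_SHA256 = '473b0f0d764709aa05eaf551565d908ab325da60d6f1159bb05c7292a5bfece2'

# Usage (assumes the file is in the current directory):
# assert sha256_of('keel_ds-0.1.19.tar.gz') == EXPECTED_SDIST_SHA256
```

pip can enforce the same check itself in hash-checking mode: list `keel-ds==0.1.19 --hash=sha256:<digest>` in a requirements file and install with `--require-hashes`. Since pip hashes whichever file it actually downloads, the wheel and sdist digests may both need to be listed.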


File details

Details for the file keel_ds-0.1.19-py3-none-any.whl.

File metadata

  • Download URL: keel_ds-0.1.19-py3-none-any.whl
  • Upload date:
  • Size: 26.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.12.3 Linux/6.8.0-41-generic

File hashes

Hashes for keel_ds-0.1.19-py3-none-any.whl
  • SHA256: c69e0015b5a2cd5a56d5ab2f5fbd1cda2d9e99080a16004099b74148a2ecc6ac
  • MD5: 344dde719a64e60de65d7ae2b785098a
  • BLAKE2b-256: 101c8675dc5306927d0ede2be15530de2064c9a4afed9dff155dc98f942b3775

