ucimlrepo package

Package to easily import datasets from the UC Irvine Machine Learning Repository into scripts and notebooks.
Current Version: 0.0.7

Installation

In a Jupyter notebook, install with the command

!pip3 install -U ucimlrepo 

Restart the kernel and import the module ucimlrepo.

Example Usage

from ucimlrepo import fetch_ucirepo, list_available_datasets

# check which datasets can be imported
list_available_datasets()

# import dataset
heart_disease = fetch_ucirepo(id=45)
# alternatively: fetch_ucirepo(name='Heart Disease')

# access data
X = heart_disease.data.features
y = heart_disease.data.targets
# train model e.g. sklearn.linear_model.LinearRegression().fit(X, y)

# access metadata
print(heart_disease.metadata.uci_id)
print(heart_disease.metadata.num_instances)
print(heart_disease.metadata.additional_info.summary)

# access variable info in tabular format
print(heart_disease.variables)

fetch_ucirepo

Loads a dataset from the UCI ML Repository, including the dataframes and metadata information.

Parameters

Provide either a dataset ID or a dataset name as a keyword (named) argument. The function does not accept both at once.

  • id: Dataset ID for UCI ML Repository
  • name: Dataset name, or substring of name
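The either/or rule above can be checked before any request is made. The following is a minimal sketch, not part of the package: `load_dataset` is a hypothetical wrapper name, and the only ucimlrepo call it relies on is `fetch_ucirepo` with the `id` and `name` keywords described here.

```python
def load_dataset(id=None, name=None):
    """Hypothetical wrapper: require exactly one of `id` or `name`."""
    # True when both are missing or both are given; either case is invalid.
    if (id is None) == (name is None):
        raise ValueError("Provide exactly one of 'id' or 'name'.")

    # Imported lazily so the argument check runs without the package installed.
    from ucimlrepo import fetch_ucirepo

    if id is not None:
        return fetch_ucirepo(id=id)
    return fetch_ucirepo(name=name)
```

Calling `load_dataset(id=45)` would then behave like `fetch_ucirepo(id=45)`, while `load_dataset(id=45, name='Heart Disease')` fails early with a `ValueError`.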

Returns

  • dataset
    • data: Contains dataset matrices as pandas dataframes
      • ids: Dataframe of ID columns
      • features: Dataframe of feature columns
      • targets: Dataframe of target columns
      • original: Dataframe consisting of all IDs, features, and targets
      • headers: List of all variable names/headers
    • metadata: Contains metadata information about the dataset
      • See Metadata section below for details
    • variables: Contains variable details presented in a tabular/dataframe format
      • name: Variable name
      • role: Whether the variable is an ID, feature, or target
      • type: Data type e.g. categorical, integer, continuous
      • demographic: Indicates whether the variable represents demographic data
      • description: Short description of variable
      • units: Variable units for non-categorical data
      • missing_values: Whether there are missing values in the variable's column
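Because `features` and `targets` are returned as separate pandas dataframes, rows with missing entries usually need to be dropped from both together before modelling. A minimal sketch of one way to do that, assuming the two frames share the same row index: `complete_rows` is a hypothetical helper, and the small frames below are made-up stand-ins for `dataset.data.features` and `dataset.data.targets`, not real repository data.

```python
import pandas as pd


def complete_rows(features, targets):
    """Return (X, y) restricted to rows with no missing value in either frame."""
    combined = pd.concat([features, targets], axis=1)
    keep = combined.notna().all(axis=1)  # True for rows that are fully populated
    return features.loc[keep], targets.loc[keep]


# Made-up stand-ins for dataset.data.features and dataset.data.targets.
features = pd.DataFrame({"age": [63, 67, None], "chol": [233, 286, 229]})
targets = pd.DataFrame({"num": [0, 2, 1]})

X, y = complete_rows(features, targets)
print(X.shape, y.shape)
```

With a fetched dataset, the call would be `complete_rows(dataset.data.features, dataset.data.targets)`.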

list_available_datasets

Prints a list of datasets that can be imported via fetch_ucirepo.

Parameters

  • filter: Optional keyword argument to filter available datasets based on a category
    • Valid filters: aim-ahead
  • search: Optional keyword argument to search datasets whose name contains the search query

Returns

None
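The two optional keyword arguments can be illustrated with a short sketch. `show_dataset_listings` is a hypothetical wrapper name, and `"heart"` is an arbitrary search query chosen for illustration; `aim-ahead` is the only filter value listed above. Each call prints its listing and returns nothing, so there is no result to capture.

```python
def show_dataset_listings():
    """Print the dataset listing three ways; each call prints and returns None."""
    # Imported inside the function so that defining it needs no network access.
    from ucimlrepo import list_available_datasets

    # Every dataset that can be imported.
    list_available_datasets()

    # Only datasets whose name contains the search query.
    list_available_datasets(search="heart")

    # Only datasets in the documented filter category.
    list_available_datasets(filter="aim-ahead")
```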

Metadata

  • uci_id: Unique dataset identifier for UCI repository
  • name
  • abstract: Short description of dataset
  • area: Subject area e.g. life science, business
  • task: Associated machine learning tasks e.g. classification, regression
  • characteristics: Dataset types e.g. multivariate, sequential
  • num_instances: Number of rows or samples
  • num_features: Number of feature columns
  • feature_types: Data types of features
  • target_col: Name of target column(s)
  • index_col: Name of index column(s)
  • has_missing_values: Whether the dataset contains missing values
  • missing_values_symbol: Indicates what symbol represents the missing entries (if the dataset has missing values)
  • year_of_dataset_creation
  • dataset_doi: DOI registered for dataset that links to UCI repo dataset page
  • creators: List of dataset creator names
  • intro_paper: Information about dataset's published introductory paper
  • repository_url: Link to dataset webpage on the UCI repository
  • data_url: Link to raw data file
  • additional_info: Descriptive free text about dataset
    • summary: General summary
    • purpose: For what purpose was the dataset created?
    • funding: Who funded the creation of the dataset?
    • instances_represent: What do the instances in this dataset represent?
    • recommended_data_splits: Are there recommended data splits?
    • sensitive_data: Does the dataset contain data that might be considered sensitive in any way?
    • preprocessing_description: Was there any data preprocessing performed?
    • variable_info: Additional free text description for variables
    • citation: Citation Requests/Acknowledgements
  • external_url: URL to external dataset page. This field exists only for linked datasets, i.e. those not hosted by UCI
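Since `external_url` is present only for linked datasets, code that reads metadata should tolerate its absence. A minimal sketch, assuming only that metadata fields are read by attribute access as in the usage example: `summarize_metadata` is a hypothetical helper, and the `SimpleNamespace` below is a stand-in with illustrative values rather than a fetched object.

```python
from types import SimpleNamespace


def summarize_metadata(metadata):
    """Collect a few metadata fields into a dict; absent fields become None."""
    fields = ("uci_id", "name", "num_instances", "repository_url", "external_url")
    return {field: getattr(metadata, field, None) for field in fields}


# Stand-in for dataset.metadata; values are illustrative only.
example = SimpleNamespace(uci_id=45, name="Heart Disease", num_instances=303)

summary = summarize_metadata(example)
print(summary)
```

For a fetched dataset, the call would be `summarize_metadata(dataset.metadata)`; a `None` under `external_url` then indicates that no external page was reported.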
