ucimlrepo package
Package to easily import datasets from the UC Irvine Machine Learning Repository into scripts and notebooks.
Current Version: 0.0.7
Installation
In a Jupyter notebook, install with the command
!pip3 install -U ucimlrepo
Restart the kernel and import the module ucimlrepo.
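To confirm that the install took effect after restarting the kernel, the installed version can be read back with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of ucimlrepo.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if ucimlrepo is installed, otherwise None.
print(installed_version("ucimlrepo"))
```

If this prints None, the package is not visible to the running kernel, which usually means the install went to a different Python environment.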
Example Usage
from ucimlrepo import fetch_ucirepo, list_available_datasets
# check which datasets can be imported
list_available_datasets()
# import dataset
heart_disease = fetch_ucirepo(id=45)
# alternatively: fetch_ucirepo(name='Heart Disease')
# access data
X = heart_disease.data.features
y = heart_disease.data.targets
# train model e.g. sklearn.linear_model.LinearRegression().fit(X, y)
# access metadata
print(heart_disease.metadata.uci_id)
print(heart_disease.metadata.num_instances)
print(heart_disease.metadata.additional_info.summary)
# access variable info in tabular format
print(heart_disease.variables)
fetch_ucirepo
Loads a dataset from the UCI ML Repository, including the dataframes and metadata information.
Parameters
Provide either a dataset ID or a dataset name as a keyword (named) argument. Passing both is not accepted.
- id: Dataset ID for UCI ML Repository
- name: Dataset name, or substring of name
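The "one or the other, never both" rule can be checked before making a network request. The sketch below assumes a hypothetical helper, `build_fetch_kwargs`, that is not part of ucimlrepo; it only builds the keyword arguments to pass to fetch_ucirepo.

```python
def build_fetch_kwargs(dataset_id=None, name=None):
    """Return keyword arguments for fetch_ucirepo, enforcing exactly one selector."""
    provided = [arg is not None for arg in (dataset_id, name)]
    if sum(provided) != 1:
        raise ValueError("Provide exactly one of dataset_id or name")
    if dataset_id is not None:
        return {"id": dataset_id}
    return {"name": name}


# Usage (requires ucimlrepo and network access):
# from ucimlrepo import fetch_ucirepo
# heart_disease = fetch_ucirepo(**build_fetch_kwargs(dataset_id=45))
```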
Returns
- dataset
  - data: Contains dataset matrices as pandas dataframes
    - ids: Dataframe of ID columns
    - features: Dataframe of feature columns
    - targets: Dataframe of target columns
    - original: Dataframe consisting of all IDs, features, and targets
    - headers: List of all variable names/headers
  - metadata: Contains metadata information about the dataset
    - See Metadata section below for details
  - variables: Contains variable details presented in a tabular/dataframe format
    - name: Variable name
    - role: Whether the variable is an ID, feature, or target
    - type: Data type e.g. categorical, integer, continuous
    - demographic: Indicates whether the variable represents demographic data
    - description: Short description of variable
    - units: Variable units for non-categorical data
    - missing_values: Whether there are missing values in the variable's column
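The variables table can be used to pick out columns by role. The sketch below works on a list of row dictionaries, such as pandas produces with `DataFrame.to_dict("records")`. The helper `names_by_role` is hypothetical, the sample rows are made up for illustration, and the exact capitalization of role values in real datasets is an assumption, so the comparison is case-insensitive.

```python
def names_by_role(variable_records, role):
    """Return variable names whose 'role' matches, ignoring case and whitespace."""
    wanted = role.strip().lower()
    return [
        rec["name"]
        for rec in variable_records
        if str(rec.get("role", "")).strip().lower() == wanted
    ]


# Hypothetical rows shaped like the variables table described above.
sample = [
    {"name": "age", "role": "Feature", "type": "Integer"},
    {"name": "sex", "role": "Feature", "type": "Categorical"},
    {"name": "num", "role": "Target", "type": "Integer"},
]
print(names_by_role(sample, "feature"))  # → ['age', 'sex']
```

With a fetched dataset, the same call would look like `names_by_role(heart_disease.variables.to_dict("records"), "target")`.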
list_available_datasets
Prints a list of datasets that can be imported via fetch_ucirepo
Parameters
- filter: Optional keyword argument to filter available datasets based on a category
  - Valid filters: aim-ahead
- search: Optional keyword argument to search datasets whose name contains the search query
Returns
none
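Because the function prints its listing and returns nothing, a script that needs the listing as text has to capture standard output. The following is a minimal sketch using only the standard library; `capture_printed` is a hypothetical helper, not part of ucimlrepo.

```python
import contextlib
import io


def capture_printed(func, *args, **kwargs):
    """Call func and return whatever it printed to stdout as a string."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


# Usage (requires ucimlrepo and network access):
# from ucimlrepo import list_available_datasets
# listing = capture_printed(list_available_datasets, search="heart")
```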
Metadata
- uci_id: Unique dataset identifier for UCI repository
- name
- abstract: Short description of dataset
- area: Subject area e.g. life science, business
- task: Associated machine learning tasks e.g. classification, regression
- characteristics: Dataset types e.g. multivariate, sequential
- num_instances: Number of rows or samples
- num_features: Number of feature columns
- feature_types: Data types of features
- target_col: Name of target column(s)
- index_col: Name of index column(s)
- has_missing_values: Whether the dataset contains missing values
- missing_values_symbol: Indicates what symbol represents the missing entries (if the dataset has missing values)
- year_of_dataset_creation
- dataset_doi: DOI registered for dataset that links to UCI repo dataset page
- creators: List of dataset creator names
- intro_paper: Information about dataset's published introductory paper
- repository_url: Link to dataset webpage on the UCI repository
- data_url: Link to raw data file
- additional_info: Descriptive free text about dataset
  - summary: General summary
  - purpose: For what purpose was the dataset created?
  - funding: Who funded the creation of the dataset?
  - instances_represent: What do the instances in this dataset represent?
  - recommended_data_splits: Are there recommended data splits?
  - sensitive_data: Does the dataset contain data that might be considered sensitive in any way?
  - preprocessing_description: Was there any data preprocessing performed?
  - variable_info: Additional free text description for variables
  - citation: Citation Requests/Acknowledgements
- external_url: URL to external dataset page. This field only exists for linked datasets, i.e. those not hosted by UCI
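Since some fields, such as external_url, exist only for certain datasets, it can help to read metadata defensively. The sketch below assumes the metadata object supports attribute access, as the usage example's `heart_disease.metadata.uci_id` suggests. The helper `metadata_overview` and the stand-in object with made-up values are hypothetical.

```python
from types import SimpleNamespace

OVERVIEW_FIELDS = (
    "uci_id",
    "name",
    "num_instances",
    "num_features",
    "has_missing_values",
    "external_url",
)


def metadata_overview(metadata):
    """Collect selected metadata fields, using None for any that are absent."""
    return {field: getattr(metadata, field, None) for field in OVERVIEW_FIELDS}


# Stand-in object with invented values; a real one comes from fetch_ucirepo(...).metadata
fake_metadata = SimpleNamespace(
    uci_id=0, name="Example", num_instances=10, num_features=3, has_missing_values="no"
)
overview = metadata_overview(fake_metadata)
print(overview["external_url"])  # → None
```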
Download files
File details
Details for the file ucimlrepo-0.0.7.tar.gz.
File metadata
- Download URL: ucimlrepo-0.0.7.tar.gz
- Upload date:
- Size: 9.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4cff3f9e814367dd60956da999ace473197237b9fce4c07e9a689e77b4ffb59a
MD5 | e4d228c4b01fcea87d2a3a13afa877ef
BLAKE2b-256 | 877cf5a400cc99a5365d153609ebf803084f78b4638b0f7925aa31d9abb62b8e
File details
Details for the file ucimlrepo-0.0.7-py3-none-any.whl.
File metadata
- Download URL: ucimlrepo-0.0.7-py3-none-any.whl
- Upload date:
- Size: 8.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0a5ce7e21d7ec850a0da4427c47f9dd96fcc6532f1c7e95dcec63eeb40f08026
MD5 | 0d2573e037a2139365385e8588dbde52
BLAKE2b-256 | 3b071252560194df2b4fad1cb3c46081b948331c63eb1bb0b97620d508d12a53