Datazets is a Python package for importing well-known example data sets.
Star this repo if you like it! ⭐️
pip install datazets
Import datazets
# Import library
import datazets as dz
# Import data set
df = dz.get('titanic')
Data sets:
Dataset Name | Shape | Type | Description |
---|---|---|---|
meta | (1472, 20) | Continuous | Time series data |
bitcoin | (2522, 2) | Continuous | Time series data |
iris | (150, 3) | Continuous | Classic flower classification dataset with iris species measurements and coordinates |
------------------------ | ---------------------- | --------------------- | ----------------------------------------------------------------------------------------------- |
gas_prices | (6556, 2) | Mixed | Time series data |
ads | (10000, 10) | Discrete | Data on online ads, covering click-through rates and targeting information |
sprinkler | (1000, 4) | Discrete | Synthetic dataset with binary variables for rain and sprinkler probability illustration |
random_discrete | (1000, 5) | Discrete | Synthetic dataset with random discrete variables, useful for probability modeling |
------------------------ | ---------------------- | --------------------- | ----------------------------------------------------------------------------------------------- |
malicious_urls | (387588, 2) | Text | URLs labeled as malicious or benign, useful in cybersecurity |
malicious_phish | (651191, 4) | Text | URLs labeled as malicious or benign, defacement, phishing, malware (cybersecurity) |
------------------------ | ---------------------- | --------------------- | ----------------------------------------------------------------------------------------------- |
stormofswords | (352, 3) | Network | Character data from A Storm of Swords, with relationships, traits, and alliance info |
bigbang | (9, 3) | Network | Data on The Big Bang Theory episodes and characters |
energy | (68, 3) | Network | Data on building energy consumption |
------------------------ | ---------------------- | --------------------- | ----------------------------------------------------------------------------------------------- |
auto_mpg | (392, 8) | Mixed | Data on cars with features for predicting miles per gallon |
breast_cancer | (569, 30) | Mixed | Dataset for breast cancer diagnosis prediction using tumor cell features |
cancer | (4674, 9) | Mixed | Cancer patient data for classification and prediction of diagnosis outcome, with coordinates |
census_income | (32561, 15) | Mixed | US Census data with various demographic and economic factors for income prediction |
elections_rus | (94487, 23) | Mixed | Russian election data with demographic and political attributes |
elections_usa | (24611, 8) | Mixed | US election data with demographic and political attributes |
fifa | (128, 27) | Mixed | FIFA player stats including attributes like skill, position, country, and performance |
marketing_retail | (999, 8) | Mixed | Retail customer data for behavior and segmentation analysis |
predictive_maintenance | (10000, 14) | Mixed | Industrial equipment data for predictive maintenance |
student | (649, 33) | Mixed | Data on student performance with socio-demographic and academic factors |
surfspots | (9413, 4) | Mixed | Surf spot data with latitude/longitude coordinates |
tips | (244, 7) | Mixed | Restaurant tipping data with variables on meal size, day, and tip amount |
titanic | (891, 12) | Mixed | Titanic passenger data with demographic, class, and survival information |
waterpump | (59400, 41) | Mixed | Water pump data with features for predicting functionality and maintenance needs |
------------------------ | ---------------------- | --------------------- | ----------------------------------------------------------------------------------------------- |
cat_and_dog | None | Image | Images of cats and dogs for classification and object recognition |
digits | (1083, 65) | Image | Handwritten digit images (8x8 pixels) for recognition and classification |
faces | (400, 4097) | Image | Images of faces used in facial recognition and feature analysis |
flowers | None | Image | Various flower images for classification and image recognition |
img_peaks1 | (930, 930, 3) | Image | Synthetic peak images for image processing and analysis |
img_peaks2 | (125, 496, 3) | Image | Additional synthetic peak images for image processing |
mnist | (1797, 65) | Image | Handwritten digit images (8x8 pixels, consistent with the 65-column shape) for classification tasks |
scenes | None | Image | Scene images for scene classification tasks |
southern_nebula | None | Image | Images of the Southern Nebula, suitable for astronomical analysis |
------------------------ | ---------------------- | --------------------- | ----------------------------------------------------------------------------------------------- |
blobs | Custom | Continuous | Synthetic data of datapoints in blob shape |
moons | Custom | Continuous | Synthetic data of datapoints in moon shape |
circles | Custom | Continuous | Synthetic data of datapoints in circle shape |
anisotropic | Custom | Continuous | Synthetic data of datapoints with anisotropic shape |
globular | Custom | Continuous | Synthetic data of datapoints with globular shape |
uniform | Custom | Continuous | Synthetic data with uniform shape |
densities | Custom | Continuous | Synthetic data with different densities |
------------------------ | ---------------------- | --------------------- | ----------------------------------------------------------------------------------------------- |
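The shapes listed in the table can double as a quick sanity check after loading. The following is a minimal sketch under stated assumptions: `EXPECTED_SHAPES` copies a few rows from the table, `check_shape` and `demo` are hypothetical helpers that are not part of datazets, and the loader is passed in as a callable so the check makes no claim about how `dz.get` fetches its data. Treating an object without a `.shape` attribute as having shape `None` is also an assumption.

```python
from typing import Any, Callable

# A few (name -> shape) entries copied from the table above.
EXPECTED_SHAPES = {
    "titanic": (891, 12),
    "tips": (244, 7),
    "sprinkler": (1000, 4),
}


def check_shape(name: str, loader: Callable[[str], Any]) -> bool:
    """Hypothetical helper: load `name` and compare its shape to the table.

    Raises KeyError if `name` has no entry in EXPECTED_SHAPES.
    """
    expected = EXPECTED_SHAPES[name]
    obj = loader(name)
    # Assumption: objects without a .shape attribute count as shape None.
    actual = getattr(obj, "shape", None)
    return actual == expected


def demo() -> None:
    """Not called automatically: requires datazets to be installed."""
    import datazets as dz

    for name in EXPECTED_SHAPES:
        print(name, check_shape(name, lambda n: dz.get(data=n)))
```

Calling `demo()` runs the check against the real package. A mismatch does not necessarily indicate a problem, since the table may describe a different release than the one installed.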
Example:
# Import library
import datazets as dz
# Import a data set by name
df = dz.get(data='titanic')
# Import from url
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data'
df = dz.get(url=url, sep=',')
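The `sep` argument in the URL example names the field delimiter of the remote file. As a rough illustration of what a delimiter setting controls, the sketch below parses a small in-memory text with Python's standard `csv` module. It is not how datazets is implemented: `parse_delimited` is a hypothetical helper, the sample rows are made up for illustration and were not fetched from the URL, and skipping the space after each delimiter is an assumption about comma-plus-space files.

```python
import csv
import io
from typing import List

# Made-up rows in a headerless, comma-plus-space style (illustration only).
SAMPLE = "39, clerk, 40\n50, manager, 13\n"


def parse_delimited(text: str, sep: str = ",") -> List[List[str]]:
    """Hypothetical helper: split each non-empty line of `text` on `sep`."""
    reader = csv.reader(io.StringIO(text), delimiter=sep, skipinitialspace=True)
    # csv.reader yields an empty list for blank lines; drop those.
    return [row for row in reader if row]


rows = parse_delimited(SAMPLE, sep=",")
print(rows)
```

All fields come back as strings here; converting columns to numeric types is left out of the sketch.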
Maintainer
- Erdogan Taskesen, github: erdogant
Contribute
- All kinds of contributions are welcome!
- If you would like to buy me a coffee for this work, it is much appreciated :)
Licence
See LICENSE for details.