A curated collection of datasets for data analysis & machine learning, downloadable with a single Python command

opendatasets

opendatasets is a Python library for downloading datasets from online sources like Kaggle and Google Drive using a simple Python command.

Installation

Install the library using pip:

pip install opendatasets --upgrade

Usage - Downloading a dataset

Datasets can be downloaded within a Jupyter notebook or Python script using the opendatasets.download helper function. Here's some sample code for downloading the US Elections Dataset:

import opendatasets as od
dataset_url = 'https://www.kaggle.com/tunguz/us-elections-dataset'
od.download(dataset_url)

dataset_url can also point to a public Google Drive link or a raw file URL.
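Since a single argument can refer to several kinds of sources, it can help to check which kind a URL is before calling opendatasets.download. The following is a minimal sketch using only the standard library. The function name classify_source and the labels it returns are hypothetical, and this is not how opendatasets itself decides; it only illustrates the three source types mentioned above.

```python
from urllib.parse import urlparse


def classify_source(dataset_url):
    """Label a URL as 'kaggle', 'google-drive', 'url', or 'not-a-url'.

    Hypothetical helper for illustration; not part of opendatasets.
    """
    parsed = urlparse(dataset_url)
    if not parsed.scheme or not parsed.netloc:
        # No scheme or host, so this is not a full URL.
        return "not-a-url"
    host = parsed.netloc.lower()
    if host == "kaggle.com" or host.endswith(".kaggle.com"):
        return "kaggle"
    if host == "drive.google.com":
        return "google-drive"
    # Anything else is treated as a plain file URL.
    return "url"


print(classify_source("https://www.kaggle.com/tunguz/us-elections-dataset"))
```

A check like this is useful mainly for deciding in advance whether Kaggle credentials will be needed for a given download.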

Kaggle Credentials

opendatasets uses the official Kaggle API for downloading datasets from Kaggle. Follow these steps to find your API credentials:

  1. Sign in to https://kaggle.com/, then click on your profile picture on the top right and select "My Account" from the menu.

  2. Scroll down to the "API" section and click "Create New API Token". This will download a file kaggle.json with the following contents:

{"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_KEY"}
  3. When you run opendatasets.download, you will be asked to enter your username and Kaggle API key, which you can get from the file downloaded in step 2.

Note that you need to download the kaggle.json file only once. You can also place the kaggle.json file in the same directory as the Jupyter notebook, and the credentials will be read automatically.
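To confirm that a kaggle.json file is well formed before relying on it, you can load it yourself. The sketch below is a hypothetical helper (read_kaggle_credentials is not part of opendatasets) and assumes only the two fields shown in step 2, "username" and "key".

```python
import json
from pathlib import Path


def read_kaggle_credentials(path="kaggle.json"):
    """Return (username, key) from a kaggle.json file.

    Hypothetical helper for illustration; raises ValueError if a
    required field is missing or empty.
    """
    data = json.loads(Path(path).read_text())
    missing = [field for field in ("username", "key") if not data.get(field)]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    return data["username"], data["key"]


# Example usage (assumes kaggle.json is in the current directory):
# username, key = read_kaggle_credentials()
```

Treat the key like a password: avoid printing it in shared notebooks or committing kaggle.json to version control.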

Some interesting datasets

You can find interesting datasets on Kaggle: https://www.kaggle.com/datasets

You can also create a new dataset on Kaggle by uploading a CSV file here: https://www.kaggle.com/datasets?new=true (make sure to keep your dataset public, otherwise it will not be downloadable)

Other sources to look for datasets:

If you use an external source other than Kaggle, you can create a new dataset on Kaggle by uploading the CSV file here: https://www.kaggle.com/datasets?new=true (make sure to keep your dataset public, otherwise it will not be downloadable using opendatasets)

Curated Datasets

opendatasets also provides some curated datasets that you can download by passing the Dataset ID to opendatasets.download. Here's an example:

import opendatasets
opendatasets.download('stackoverflow-developer-survey-2020')

The following datasets are available for download.

Dataset ID                          | Description                             | Source
stackoverflow-developer-survey-2020 | Stack Overflow Developer Survey 2020    | Stack Overflow
owid-covid-19-latest                | Covid-19 Stats by Our World in Data     | Our World in Data
state-of-javascript-2016            | State of Javascript Annual Survey 2016  | StateOfJS
state-of-javascript-2017            | State of Javascript Annual Survey 2017  | StateOfJS
state-of-javascript-2018            | State of Javascript Annual Survey 2018  | StateOfJS
state-of-javascript-2019            | State of Javascript Annual Survey 2019  | StateOfJS
countries-languages-spoken          | Languages Spoken in Different Countries | Infoplease

More datasets will be added soon.
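If you want several curated datasets at once, a small loop over the Dataset IDs is enough. The sketch below is one possible approach, not part of the library: download_many is a hypothetical helper that takes the download function as an argument, so a failure on one ID does not stop the rest. The IDs are copied from the list above.

```python
# Dataset IDs copied from the list of curated datasets.
CURATED_IDS = [
    "stackoverflow-developer-survey-2020",
    "owid-covid-19-latest",
    "state-of-javascript-2016",
    "state-of-javascript-2017",
    "state-of-javascript-2018",
    "state-of-javascript-2019",
    "countries-languages-spoken",
]


def download_many(dataset_ids, download):
    """Call download(dataset_id) for each ID and return the IDs that failed.

    Hypothetical helper; `download` is any callable that accepts a
    Dataset ID, such as opendatasets.download.
    """
    failed = []
    for dataset_id in dataset_ids:
        try:
            download(dataset_id)
        except Exception as exc:
            # Report the failure and continue with the remaining IDs.
            print("Could not download", dataset_id, "-", exc)
            failed.append(dataset_id)
    return failed


# Example usage (requires opendatasets and a network connection):
# import opendatasets as od
# failed = download_many(CURATED_IDS, od.download)
```

Passing the download function in as a parameter also makes the loop easy to try out with a stand-in function before fetching any real data.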

Contributing

This is an open source project and we welcome contributions.

Local Development Setup

  1. Clone the repository:
git clone https://github.com/JovianML/opendatasets.git
  2. Set up the Python environment for development:
conda create -n opendatasets python=3.5
conda activate opendatasets
pip install -r requirements.txt
  3. Open the project in VS Code and make your changes. Make sure to install the Python extension for VS Code and select the opendatasets conda environment.

This package is developed and maintained by the Jovian team.
