A package to download datasets from Kaggle.

Project description

KaggleDownloader

KaggleDownloader is a Python class for interacting with Kaggle. It lets users authenticate, search for datasets, download them, and extract them. The class can be used interactively in Jupyter Notebooks or from the command line.

Prerequisites

Before using the KaggleDownloader class, make sure you have a Kaggle API token file (kaggle.json) available.

Section 1: Using KaggleDownloader in a Jupyter Notebook

You can import and use KaggleDownloader directly within a Jupyter Notebook. Below is a step-by-step guide to authenticate and download datasets using the class methods.

1.1 Example Code

import kaggle_downloader as kd  # Assuming you've saved the class in kaggle_downloader.py
import pandas as pd

# Initialize KaggleDownloader
downloader = kd.KaggleDownloader(api_token_path="./kaggle.json")

# Authenticate with Kaggle API
downloader.authenticate_kaggle()

# Search for datasets related to a theme
downloader.search_datasets("netflix")

# Download a specific dataset by its slug
downloader.download_dataset("shivamb/netflix-shows")

df = pd.read_csv("./netflix_titles.csv", delimiter=',', encoding="utf-8", encoding_errors="replace")
df.head()

1.2 Available Methods

  • get_api_token_path(): Returns the path to the primary Kaggle API token file.
  • get_alternative_token_path(): Returns the path to the alternative Kaggle API token file.
  • get_path_downloads(): Returns the download directory path.
  • set_api_token_path(new_path): Sets a new path for the Kaggle API token.
  • set_alternative_token_path(new_path): Sets a new path for the alternative Kaggle API token.
  • set_path_downloads(new_path): Sets a new path for downloaded datasets.
  • authenticate_kaggle(): Authenticates with the Kaggle API by loading credentials from the token file.
  • authenticate_with_credentials(): Prompts the user to manually enter Kaggle credentials and saves them to a file.
  • search_datasets(dataset_theme): Searches Kaggle for datasets matching a given keyword or theme.
  • download_dataset(dataset_slug): Downloads a dataset from Kaggle to the specified directory.
  • extract_zip(zip_file): Extracts a downloaded zip file to the download directory.
  • check_kaggle_json(): Checks if the Kaggle API token file exists at either the primary or alternative path.
  • create_download_directory(path): Creates the directory where datasets will be saved, if it doesn't already exist.
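
As a rough illustration of what the token-related methods above might do internally, here is a minimal standalone sketch that uses only the standard library. The helper names find_token_file and load_credentials are hypothetical and are not part of KaggleDownloader. The sketch assumes that kaggle.json holds the username and key fields of a standard Kaggle API token, and it exports them as the KAGGLE_USERNAME and KAGGLE_KEY environment variables that the official Kaggle client reads. The real class may work differently.

```python
import json
import os
from pathlib import Path
from typing import Optional


def find_token_file(primary: str, alternative: str) -> Optional[Path]:
    """Return the first token path that exists, or None (hypothetical helper)."""
    for candidate in (Path(primary), Path(alternative)):
        resolved = candidate.expanduser()
        if resolved.is_file():
            return resolved
    return None


def load_credentials(token_path: Path) -> dict:
    """Read kaggle.json and export its credentials as environment variables."""
    with open(token_path, encoding="utf-8") as fh:
        token = json.load(fh)
    # A standard Kaggle token is assumed to contain exactly these two fields.
    missing = {"username", "key"} - token.keys()
    if missing:
        raise ValueError(f"Token file is missing fields: {sorted(missing)}")
    os.environ["KAGGLE_USERNAME"] = token["username"]
    os.environ["KAGGLE_KEY"] = token["key"]
    return token
```

Checking the primary path first and falling back to the alternative mirrors the description of check_kaggle_json() above; returning None lets the caller decide whether to prompt for credentials instead.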

Section 2: Using KaggleDownloader via Command-Line Interface (CLI)

Alternatively, you can use the KaggleDownloader class via the command line. The main() method allows users to run the class and download datasets by specifying the dataset slug as an argument.

2.1 Example CLI Usage

  1. Optionally, make the script executable (this is not required when you run it through python, as in step 2):

    chmod +x kaggle_downloader.py
    
  2. Use the following command to download a dataset from Kaggle:

    python kaggle_downloader_package/kaggle_downloader.py benroshan/ecommerce-data
    

This will authenticate with Kaggle (based on your kaggle.json token file) and download the dataset to the directory specified in path_downloads (or the current working directory by default).

2.2 CLI Arguments

  • dataset_slug: The Kaggle dataset identifier (slug) that you want to download, e.g., benroshan/ecommerce-data.
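
To make the single positional argument concrete, the following is a minimal sketch of how such a CLI could be parsed with argparse. It is not the package's actual main(); the function names parse_args and slug_type are hypothetical, and the owner/dataset format check is an assumption added for illustration.

```python
import argparse
from typing import List, Optional


def slug_type(value: str) -> str:
    """Accept only values shaped like 'owner/dataset' (assumed format check)."""
    owner, sep, name = value.partition("/")
    if not (owner and sep and name) or "/" in name:
        raise argparse.ArgumentTypeError(
            f"expected a slug of the form owner/dataset, got {value!r}"
        )
    return value


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Download a dataset from Kaggle.")
    parser.add_argument(
        "dataset_slug",
        type=slug_type,
        help="Kaggle dataset identifier, e.g. benroshan/ecommerce-data",
    )
    return parser.parse_args(argv)
```

Calling parse_args(["benroshan/ecommerce-data"]) yields a namespace whose dataset_slug attribute holds the slug; a malformed value makes argparse print a usage message and exit.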

Notes

  • Ensure you have a Kaggle API token in place (kaggle.json).
  • You can specify alternative token paths in case the default one isn't used.
  • If you prefer to enter your Kaggle username and key manually, KaggleDownloader will prompt for them when it cannot find the kaggle.json file.
  • Large datasets will be automatically unzipped if downloaded as zip files.
  • The CLI parses its arguments and invokes the necessary functions for you.
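
The automatic unzip step mentioned in the notes above can be sketched with the standard zipfile module. This is only an illustration under stated assumptions: unzip_to is a hypothetical standalone function, not the class's extract_zip() method, and the real implementation may handle paths or cleanup differently.

```python
import os
import zipfile
from typing import List


def unzip_to(zip_file: str, download_dir: str = ".") -> List[str]:
    """Extract zip_file into download_dir and return the archive member names."""
    # Create the target directory if it does not exist yet.
    os.makedirs(download_dir, exist_ok=True)
    with zipfile.ZipFile(zip_file) as archive:
        archive.extractall(download_dir)
        return archive.namelist()
```

Returning the member names makes it easy to locate the extracted CSV afterwards, for example before passing it to pd.read_csv.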

Customization:

  • Replace kaggle_downloader.py with the actual file name if different.
  • Adjust the class import path (from kaggle_downloader import KaggleDownloader) if you organize your code differently.

Contributing

Feel free to contribute to this project by submitting issues, feature requests, or pull requests on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Mariano Gobea Alcoba
Email: gobeamariano@gmail.com

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kaggle_downloader_package-0.1.6.tar.gz (5.3 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file kaggle_downloader_package-0.1.6.tar.gz.

File metadata

  • Download URL: kaggle_downloader_package-0.1.6.tar.gz
  • Upload date:
  • Size: 5.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.12.3 Darwin/23.5.0

File hashes

Hashes for kaggle_downloader_package-0.1.6.tar.gz

Algorithm    Hash digest
SHA256       61f4aee7c3c860a9753a36e97dfa07921e0635b7cf7ee8ca82074c0185a36230
MD5          32b9177bb793ca50fedf239aab50ddac
BLAKE2b-256  ff6f5d1fe733bd7fd337cb5985085c95824c77464dcae1138e5818db09eec9d0

File details

Details for the file kaggle_downloader_package-0.1.6-py3-none-any.whl.

File hashes

Hashes for kaggle_downloader_package-0.1.6-py3-none-any.whl

Algorithm    Hash digest
SHA256       b15908a1da75c86eb0bf3517400ab954a904f649b7dd11c3ad3bebdf6e843aa5
MD5          9e356300d55c0b87c5abc16b45949a7e
BLAKE2b-256  e34f0ce4c72af8a15310dedd80b452ef28e4f6033d4452e01973732deb09dd79
