A package to download datasets from Kaggle.

KaggleDownloader

KaggleDownloader is a Python class for working with Kaggle: it authenticates, searches for datasets, downloads them, and extracts the downloaded archives. The class can be used either interactively in a Jupyter Notebook or from the command line.

Prerequisites

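Before using the KaggleDownloader class, make sure you have a Kaggle API token file (kaggle.json) available. The Notes section describes what happens when the file cannot be found.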

Section 1: Using KaggleDownloader in a Jupyter Notebook

You can import and use KaggleDownloader directly within a Jupyter Notebook. Below is a step-by-step guide to authenticate and download datasets using the class methods.

1.1 Example Code

import kaggle_downloader as kd  # Assuming you've saved the class in kaggle_downloader.py
import pandas as pd

# Initialize KaggleDownloader
downloader = kd.KaggleDownloader(api_token_path="./kaggle.json")

# Authenticate with Kaggle API
downloader.authenticate_kaggle()

# Search for datasets related to a theme
downloader.search_datasets("netflix")

# Download a specific dataset by its slug
downloader.download_dataset("shivamb/netflix-shows")

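# Load the downloaded CSV into a DataFrame; undecodable bytes are replaced rather than raising an error
df = pd.read_csv("./netflix_titles.csv", delimiter=',', encoding="utf-8", encoding_errors="replace")
df.head()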
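
The file name netflix_titles.csv in the example above is specific to that dataset. As a minimal sketch for other datasets, the helper below lists the CSV files in the download directory so you can pick the one to load. The name list_csv_files is hypothetical and is not part of KaggleDownloader; the sketch assumes the files were extracted directly into the directory you pass in.

```python
from pathlib import Path


def list_csv_files(download_dir="."):
    """Return the CSV files found directly inside download_dir, sorted by name.

    Hypothetical helper, not part of KaggleDownloader.
    """
    return sorted(Path(download_dir).glob("*.csv"))


if __name__ == "__main__":
    # Print the candidate files so the right one can be passed to pd.read_csv.
    for csv_path in list_csv_files("."):
        print(csv_path.name)
```

Only the top level of the directory is searched; datasets that unpack into subfolders would need rglob instead of glob.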

1.2 Available Methods

  • get_api_token_path(): Returns the path to the primary Kaggle API token file.
  • get_alternative_token_path(): Returns the path to the alternative Kaggle API token file.
  • get_path_downloads(): Returns the download directory path.
  • set_api_token_path(new_path): Sets a new path for the Kaggle API token.
  • set_alternative_token_path(new_path): Sets a new path for the alternative Kaggle API token.
  • set_path_downloads(new_path): Sets a new path for downloaded datasets.
  • authenticate_kaggle(): Authenticates with the Kaggle API by loading credentials from the token file.
  • authenticate_with_credentials(): Prompts the user to manually enter Kaggle credentials and saves them to a file.
  • search_datasets(dataset_theme): Searches Kaggle for datasets matching a given keyword or theme.
  • download_dataset(dataset_slug): Downloads a dataset from Kaggle to the specified directory.
  • extract_zip(zip_file): Extracts a downloaded zip file to the download directory.
  • check_kaggle_json(): Checks if the Kaggle API token file exists at either the primary or alternative path.
  • create_download_directory(path): Creates the directory where datasets will be saved, if it doesn't already exist.
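
To make the behaviour described for check_kaggle_json(), create_download_directory(path), and extract_zip(zip_file) more concrete, here is a stand-alone sketch that uses only the standard library. It is an illustration of the general idea, not the package's actual implementation. The function names find_token_sketch and extract_zip_sketch are hypothetical, and the order in which the real class checks the two token paths is an assumption.

```python
import zipfile
from pathlib import Path


def find_token_sketch(primary, alternative):
    """Return the first token path that exists, or None if neither does.

    Assumption: the primary path is checked before the alternative one.
    """
    for candidate in (primary, alternative):
        if candidate is not None and Path(candidate).is_file():
            return Path(candidate)
    return None


def extract_zip_sketch(zip_file, download_dir):
    """Create download_dir if needed, extract zip_file into it, and
    return the names of the archive members."""
    target = Path(download_dir)
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    with zipfile.ZipFile(zip_file) as archive:
        archive.extractall(target)
        return archive.namelist()
```

For example, find_token_sketch("./kaggle.json", "/some/other/kaggle.json") returns whichever of the two files is present, and extract_zip_sketch("data.zip", "./downloads") unpacks the archive into ./downloads.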

Section 2: Using KaggleDownloader via Command-Line Interface (CLI)

Alternatively, you can use the KaggleDownloader class via the command line. The main() method allows users to run the class and download datasets by specifying the dataset slug as an argument.

2.1 Example CLI Usage

  1. Optionally, make the script executable (this matters only if you run the file directly instead of passing it to python as in step 2):

    chmod +x kaggle_downloader.py
    
  2. Use the following command to download a dataset from Kaggle:

    python kaggle_downloader_package/kaggle_downloader.py benroshan/ecommerce-data
    

This will authenticate with Kaggle (based on your kaggle.json token file) and download the dataset to the directory specified in path_downloads (or the current working directory by default).

2.2 CLI Arguments

  • dataset_slug: The Kaggle dataset identifier (slug) that you want to download, e.g., benroshan/ecommerce-data.
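
A CLI with a single positional dataset_slug argument could be parsed along the following lines. This is a sketch under assumptions, not the package's main(): the helper names slug and build_parser are hypothetical, and the regular expression is only a rough guess at the owner/dataset-name shape, not Kaggle's official rule.

```python
import argparse
import re

# Assumption: a slug looks like "owner/dataset-name". The allowed characters
# here are a guess for illustration only.
SLUG_PATTERN = re.compile(r"[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def slug(value):
    """argparse type function that rejects values without an owner/name shape."""
    if not SLUG_PATTERN.fullmatch(value):
        raise argparse.ArgumentTypeError(
            f"expected owner/dataset-name, got {value!r}"
        )
    return value


def build_parser():
    """Build a parser with one positional argument, dataset_slug."""
    parser = argparse.ArgumentParser(
        description="Download a dataset from Kaggle."
    )
    parser.add_argument(
        "dataset_slug",
        type=slug,
        help="Kaggle dataset identifier, e.g. benroshan/ecommerce-data",
    )
    return parser
```

Calling build_parser().parse_args(["benroshan/ecommerce-data"]) yields a namespace whose dataset_slug attribute holds the slug. A value without a slash makes argparse print a usage error and exit.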

Notes

  • Ensure you have a Kaggle API token in place (kaggle.json).
  • You can specify an alternative token path (see set_alternative_token_path) for cases where the token is not at the primary path.
  • If you prefer to enter your Kaggle username and key manually, KaggleDownloader prompts for them when it cannot find the kaggle.json file.
  • Large datasets will be automatically unzipped if downloaded as zip files.
  • The CLI parses its arguments and calls the necessary methods, so a single command authenticates and downloads the dataset.
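
The token handling described in these notes can be sketched as follows. A kaggle.json file is a small JSON object with username and key fields. The functions load_credentials and save_credentials are hypothetical names used for illustration, not the package's own methods, and restricting the file to its owner with mode 0o600 is a precaution assumed here rather than something the package is known to do.

```python
import json
import os
from pathlib import Path


def load_credentials(token_path):
    """Return {'username': ..., 'key': ...} from token_path, or None if
    the file does not exist."""
    path = Path(token_path)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        data = json.load(handle)
    if "username" not in data or "key" not in data:
        raise ValueError(f"{path} must contain 'username' and 'key'")
    return {"username": data["username"], "key": data["key"]}


def save_credentials(username, key, token_path):
    """Write manually entered credentials to token_path in kaggle.json format."""
    path = Path(token_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        json.dump({"username": username, "key": key}, handle)
    os.chmod(path, 0o600)  # readable and writable by the owner only
    return path
```

A caller would try load_credentials first and fall back to asking the user (for example with input and getpass.getpass) followed by save_credentials when it returns None.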

Customization:

  • Replace kaggle_downloader.py with the actual file name if different.
  • Adjust the class import path (from kaggle_downloader import KaggleDownloader) if you organize your code differently.

Contributing

Feel free to contribute to this project by submitting issues, feature requests, or pull requests on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Mariano Gobea Alcoba
Email: gobeamariano@gmail.com
