KaggleEase

The fastest, notebook-first way to load Kaggle datasets into pandas with one line.

Installation

pip install kaggleease

Quick Start

from kaggleease import load
df = load("titanic")

Advanced Features

Progress Indication

Large downloads show progress bars automatically:

from kaggleease import load
df = load("large-dataset", timeout=600)  # Progress shown for large files

Thread-Safe Authentication

Multiple concurrent operations are supported safely:

import threading
from kaggleease import load

def load_dataset(dataset):
    return load(dataset)

# Safe to run concurrently; pass the function and its argument directly
threads = [threading.Thread(target=load_dataset, args=("dataset",)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
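
Bare threads discard the return value of load(). When the loaded DataFrames are needed back in the calling code, a thread pool is one way to collect them. The following is a minimal sketch, assuming load() is thread-safe as stated above. load_many is a hypothetical helper, not part of KaggleEase, and the loader is passed in as an argument so the pattern works with any callable:

```python
from concurrent.futures import ThreadPoolExecutor


def load_many(loader, handles, max_workers=5):
    """Call loader(handle) for each handle concurrently.

    Returns a dict mapping each handle to its result. `loader` is any
    callable that takes one handle, for example kaggleease.load.
    """
    handles = list(handles)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs results correctly.
        results = list(pool.map(loader, handles))
    return dict(zip(handles, results))


# Usage (requires kaggleease and Kaggle credentials; handles are placeholders):
# from kaggleease import load
# frames = load_many(load, ["owner/dataset-a", "owner/dataset-b"])
```

If any call raises, the exception propagates out of load_many when its result is collected.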

Retry Logic with Exponential Backoff

Failed network requests are retried automatically, with the delay doubling after each attempt:

  • First retry after 1s
  • Second retry after 2s
  • Third retry after 4s
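
The schedule above follows the usual exponential backoff pattern. As an illustration only, and not KaggleEase's actual implementation, here is a minimal sketch of that pattern. The function name retry_with_backoff, its parameters, and the choice of exceptions to retry on are all assumptions:

```python
import time


def retry_with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on a network-style failure, wait and try again.

    The wait before retry k (counting from 0) is base_delay * 2**k, so the
    defaults give waits of 1s, 2s and 4s. Which exceptions count as
    retryable is an assumption made for this sketch.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except (ConnectionError, TimeoutError):
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The sleep function is injectable so the schedule can be checked without actually waiting.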

Notebook Magic

KaggleEase provides a %kaggle magic for use in notebooks:

# Load the titanic dataset into a variable named 'df'
%kaggle load titanic --as df --timeout 600

# Preview the first few rows of a dataset
%kaggle preview titanic --timeout 30

# Search for datasets
%kaggle search "credit risk" --timeout 30

# Load specific file from dataset
%kaggle load kaggle/titanic --file train.csv --as train_df

Command Line Interface

KaggleEase also provides a command-line interface:

# Load a dataset with custom timeout
kaggleease load kaggle/titanic --timeout 600

# Preview a dataset
kaggleease preview kaggle/titanic

# Search for datasets with result limit
kaggleease search "credit risk" --top 10

# Load specific file from dataset
kaggleease load kaggle/titanic --file train.csv

API Reference

load(dataset_handle, file=None, timeout=300)

Load a Kaggle dataset into a pandas DataFrame with automatic progress indication for large files.

Parameters:

  • dataset_handle (str): The Kaggle dataset handle in the format "owner/dataset-name"
  • file (str, optional): Specific file to load from the dataset
  • timeout (int): Timeout in seconds for API calls and downloads (default: 300)
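
To make the documented "owner/dataset-name" format concrete, the sketch below splits a handle into its two parts. split_handle is a hypothetical helper, not part of KaggleEase, and it checks only the documented two-part form. It would reject a short name such as "titanic", which the examples elsewhere on this page pass to load():

```python
def split_handle(handle):
    """Split "owner/dataset-name" into (owner, name).

    Raises ValueError if the handle does not have exactly two
    non-empty parts separated by a single slash.
    """
    owner, sep, name = handle.partition("/")
    if not sep or not owner or not name or "/" in name:
        raise ValueError(f"expected 'owner/dataset-name', got {handle!r}")
    return owner, name
```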

Returns:

  • pandas.DataFrame: The loaded dataset

Features:

  • Automatic progress indication for files larger than 100 MB
  • Thread-safe authentication
  • Retry logic with exponential backoff
  • Structured logging

search(query, top=5, timeout=30)

Search for Kaggle datasets.

Parameters:

  • query (str): Search query
  • top (int): Maximum number of results to return (default: 5)
  • timeout (int): Timeout in seconds for the search operation (default: 30)

Returns:

  • list: List of dataset information dictionaries

ProgressBar and show_progress

Progress indication utilities for large downloads:

from kaggleease.progress import ProgressBar

progress = ProgressBar(1000, "Download")
progress.update(500)  # Update with downloaded bytes
progress.complete()

Troubleshooting

Authentication Error

If you encounter authentication errors, make sure you have your Kaggle API credentials set up:

  1. Go to https://www.kaggle.com/account and download your kaggle.json file
  2. Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
  3. Restrict the file to owner read/write only, for example with chmod 600 ~/.kaggle/kaggle.json on Linux or macOS
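
The steps above can be verified from Python before calling load(). This is a minimal sketch using only the standard library. check_kaggle_credentials is a hypothetical helper, not part of KaggleEase, and the permission check applies only on POSIX systems:

```python
import os
import stat
from pathlib import Path


def check_kaggle_credentials(path=None):
    """Return a list of problems with the kaggle.json file (empty if none)."""
    path = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    # Permission bits are only meaningful on POSIX systems.
    if os.name == "posix":
        mode = stat.S_IMODE(path.stat().st_mode)
        if mode & 0o077:  # any group or other access bit is set
            problems.append(
                f"{path} has mode {mode:o}; run chmod 600 to restrict it"
            )
    return problems
```

Calling check_kaggle_credentials() with no argument inspects the default location under the home directory.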

Large Dataset Warning

When loading large datasets (>5GB), you'll see a warning. Consider whether you need the entire dataset or can work with a subset.

Timeout Errors

If you're experiencing timeout errors, try increasing the timeout value:

from kaggleease import load

# Increase the timeout to 600 seconds
df = load("titanic", timeout=600)

Progress Indication

For large downloads, progress is shown automatically. For manual control:

from kaggleease.progress import show_progress
progress = show_progress(250, 1000, "Download")  # 250 bytes of 1000 total

Thread Safety

The library is thread-safe for concurrent operations. Multiple threads can safely call load() simultaneously.

Security Considerations

  • Authentication credentials are stored securely with proper file permissions (600)
  • Thread-safe authentication prevents credential corruption in concurrent environments
  • All network operations include timeout protection
