KaggleEase

The fastest, notebook-first way to load Kaggle datasets into pandas with one line.


Installation

pip install kaggleease

Quick Start

from kaggleease import load
# Intelligent handle resolution - searches and finds 'kaggle/titanic' for you
df = load("titanic") 

✨ New in v1.3.0: Intelligent Loader

The loader now resolves imperfect handles instead of failing outright, and its errors tell you how to fix the call.

  • Fuzzy Handle Matching: Made a typo? load("titanik") will suggest titanic.
  • Implicit Slugs: No need for owners. load("housing") finds the most relevant dataset.
  • Actionable Errors: In notebooks, errors appear as clean Markdown blocks with 💡 Fix Suggestions.
  • Zero-Dependency REST: Replaced the heavy kaggle library (~10MB) with a lightweight 50KB REST core.
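
To illustrate the idea behind fuzzy handle matching, here is a minimal sketch using the standard library's difflib. It is not KaggleEase's actual implementation; the function name suggest_handle and the candidate list are hypothetical, and a real resolver would presumably draw its candidates from a Kaggle search.

```python
import difflib

def suggest_handle(query, known_slugs, cutoff=0.6):
    """Return the known slug closest to `query`, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(query, known_slugs, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# Hypothetical candidate list, used only for illustration.
candidates = ["titanic", "housing", "iris"]
print(suggest_handle("titanik", candidates))  # prints "titanic"
```

The cutoff controls how forgiving the match is: a lower value accepts more distant typos at the cost of more wrong suggestions.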

Usage in Google Colab

1. Install the package

In a Colab cell, run:

!pip install kaggleease

2. Use the module

from kaggleease import load

df = load("titanic")
print(df.head())

3. Magic commands in Colab

# Load dataset into a variable named 'df'
%kaggle load titanic --as df

# Preview a dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"

Usage in Local Jupyter Notebooks

1. Install the package

pip install kaggleease

2. Set up Kaggle credentials

  • Go to https://www.kaggle.com/account
  • Download your kaggle.json file
  • Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
  • Set file permissions to 600 (read/write for owner only)
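
The permission step can also be done from Python, which is handy inside a notebook. This is a small sketch for POSIX systems; the helper name restrict_to_owner is hypothetical and not part of KaggleEase. On Windows, os.chmod only toggles the read-only flag, so this does not apply there.

```python
import os
import stat
from pathlib import Path

def restrict_to_owner(path):
    """Set mode 600 (owner read/write only) on `path` and return the resulting mode bits."""
    path = Path(path).expanduser()
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # equivalent to 0o600
    return stat.S_IMODE(path.stat().st_mode)

# Typical call once kaggle.json is in place:
# restrict_to_owner("~/.kaggle/kaggle.json")
```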

3. Use the module

from kaggleease import load

df = load("titanic")
print(df.head())

4. Magic commands in Jupyter

# Load dataset into a specific variable
%kaggle load titanic --as df

# Preview dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"

Advanced Features

Progress Indication

Large downloads show progress bars automatically:

from kaggleease import load
df = load("large-dataset", timeout=600)  # Progress shown for large files

Thread-Safe Authentication

Multiple concurrent operations are supported safely:

import threading
from kaggleease import load

results = {}

def load_dataset(dataset):
    # Store each DataFrame so it is not lost when the thread finishes
    results[dataset] = load(dataset)

# Safe to run concurrently
datasets = ["titanic", "housing"]
threads = [threading.Thread(target=load_dataset, args=(name,)) for name in datasets]
for t in threads:
    t.start()
for t in threads:
    t.join()

Retry Logic with Exponential Backoff

Network failures are automatically retried:

  • First retry after 1s
  • Second retry after 2s
  • Third retry after 4s
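
The schedule above (1s, 2s, 4s) is a doubling delay. The sketch below shows one way such a retry loop could look; it is an illustration under stated assumptions, not KaggleEase's internal code. The name retry_with_backoff, its parameters, and the choice of which exceptions count as network failures are all hypothetical.

```python
import time

def retry_with_backoff(func, retries=3, base_delay=1.0,
                       exceptions=(ConnectionError, TimeoutError),
                       sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
```

Passing sleep as a parameter keeps the loop easy to test, since a test can record the delays instead of actually waiting.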

Notebook Magic

KaggleEase registers a %kaggle line magic with load, preview, and search subcommands:

# Load the titanic dataset into a variable named 'df'
%kaggle load titanic --as df --timeout 600

# Preview the first few rows of a dataset
%kaggle preview titanic --timeout 30

# Search for datasets
%kaggle search "credit risk" --timeout 30

# Load specific file from dataset
%kaggle load kaggle/titanic --file train.csv --as train_df

Command Line Interface

The kaggleease command exposes the same operations from a shell:

# Load a dataset with custom timeout
kaggleease load kaggle/titanic --timeout 600

# Preview a dataset
kaggleease preview kaggle/titanic

# Search for datasets with result limit
kaggleease search "credit risk" --top 10

# Load specific file from dataset
kaggleease load kaggle/titanic --file train.csv

# 🚀 NEW: Set up shell completion
kaggleease completion --shell zsh # or bash/fish

API Reference

load(dataset_handle, file=None, timeout=300)

Load a Kaggle dataset into a pandas DataFrame with automatic progress indication for large files.

Parameters:

  • dataset_handle (str): The Kaggle dataset handle, either in the full "owner/dataset-name" format or as a bare slug such as "titanic", which is resolved by search
  • file (str, optional): Specific file to load from the dataset
  • timeout (int): Timeout in seconds for API calls and downloads (default: 300)

Returns:

  • pandas.DataFrame: The loaded dataset

Features:

  • Automatic progress indication for files > 100MB
  • Thread-safe authentication
  • Retry logic with exponential backoff
  • Structured logging

search(query, top=5, timeout=30)

Search for Kaggle datasets.

Parameters:

  • query (str): Search query
  • top (int): Maximum number of results to return (default: 5)
  • timeout (int): Timeout in seconds for the search operation (default: 30)

Returns:

  • list: List of dataset information dictionaries

ProgressBar and show_progress

Progress indication utilities for large downloads:

from kaggleease.progress import ProgressBar

progress = ProgressBar(1000, "Download")
progress.update(500)  # Update with downloaded bytes
progress.complete()

Troubleshooting

Authentication Error

If you encounter authentication errors, make sure you have your Kaggle API credentials set up:

  1. Go to https://www.kaggle.com/account and download your kaggle.json file
  2. Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
  3. Ensure the file has restricted permissions (600)

Large Dataset Warning

When loading large datasets (>5GB), you'll see a warning. Consider whether you need the entire dataset or can work with a subset, for example by loading a single file with the file parameter.

Timeout Errors

If you're experiencing timeout errors, try increasing the timeout value:

from kaggleease import load

# Increase timeout to 600 seconds
df = load("titanic", timeout=600)

Progress Indication

For large downloads, progress is shown automatically. For manual control:

from kaggleease.progress import show_progress
progress = show_progress(250, 1000, "Download")  # 250 bytes of 1000 total
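
For readers curious what a helper of this kind typically computes, here is a standalone sketch that turns a byte count into a text bar. It is not the library's show_progress; the name format_progress, the bar width, and the output layout are assumptions made for illustration.

```python
def format_progress(done, total, label, width=20):
    """Render a text progress bar for `done` out of `total` bytes."""
    fraction = min(done / total, 1.0) if total else 1.0
    filled = int(width * fraction)
    bar = "#" * filled + "-" * (width - filled)
    return f"{label} [{bar}] {fraction:.0%}"

print(format_progress(250, 1000, "Download"))
# prints: Download [#####---------------] 25%
```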

Thread Safety

The library is thread-safe for concurrent operations. Multiple threads can safely call load() simultaneously.
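
If you want the loaded DataFrames back from concurrent calls, a thread pool is often more convenient than raw threads. The following is a sketch only: load_many is a hypothetical helper, not part of KaggleEase, and it takes the loader as an argument so it makes no assumptions about the library's internals.

```python
from concurrent.futures import ThreadPoolExecutor

def load_many(handles, loader, max_workers=4):
    """Call loader(handle) for each handle in parallel; return {handle: result}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each handle with its own result
        return dict(zip(handles, pool.map(loader, handles)))

# Intended use (requires kaggleease and Kaggle credentials):
# from kaggleease import load
# frames = load_many(["titanic", "housing"], load)
```

Any exception raised inside a worker is re-raised when its result is collected, so a failed load surfaces in the calling thread.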

Security Considerations

  • Authentication credentials are stored securely with proper file permissions (600)
  • Thread-safe authentication prevents credential corruption in concurrent environments
  • All network operations include timeout protection

