KaggleEase

The fastest, notebook-first way to load Kaggle datasets into pandas with one line.


Installation

pip install kaggleease

Quick Start

from kaggleease import load
# Intelligent handle resolution - searches and finds 'kaggle/titanic' for you
df = load("titanic") 

✨ New in v1.3.0: Intelligent Loader

As of v1.3.0, KaggleEase resolves dataset handles intelligently. Instead of simply failing, it helps you fix the mistake.

  • Fuzzy Handle Matching: Made a typo? load("titanik") will suggest titanic.
  • Implicit Slugs: Owner prefixes are optional. load("housing") finds the most relevant dataset.
  • Actionable Errors: In notebooks, errors appear as clean Markdown blocks with 💡 Fix Suggestions.
  • Zero-Dependency REST: Replaced the heavy kaggle library (~10MB) with a lightweight 50KB REST core.
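
To make the idea of fuzzy handle matching concrete, here is a minimal sketch using the standard library's difflib. It is not KaggleEase's actual resolver: the suggest_handle function and the KNOWN_SLUGS list are hypothetical, and a real implementation would presumably compare against Kaggle search results rather than a fixed list.

```python
from difflib import get_close_matches

# Hypothetical list of known slugs; a real resolver would query Kaggle's search.
KNOWN_SLUGS = ["titanic", "housing", "credit-risk", "iris"]

def suggest_handle(query, candidates=KNOWN_SLUGS, cutoff=0.6):
    """Return the candidate closest to `query`, or None if nothing is similar enough."""
    matches = get_close_matches(query.lower(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest_handle("titanik"))  # titanic
print(suggest_handle("zzzz"))     # None
```

The cutoff of 0.6 is difflib's default similarity threshold. Raising it makes suggestions stricter.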

Usage in Google Colab

1. Install the package

In a Colab cell, run:

!pip install kaggleease

2. Use the module

from kaggleease import load

df = load("titanic")
print(df.head())

3. Magic commands in Colab

# Load dataset into a variable named 'df'
%kaggle load titanic --as df

# Preview a dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"

Usage in Local Jupyter Notebooks

1. Install the package

pip install kaggleease

2. Set up Kaggle credentials

  • Go to https://www.kaggle.com/account
  • Download your kaggle.json file
  • Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
  • Set file permissions to 600 (read/write for owner only), for example with chmod 600 ~/.kaggle/kaggle.json on Linux or macOS
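
If you prefer to set the permissions from Python (for instance inside a notebook), the following sketch does the same thing with the standard library. The secure_credentials helper is hypothetical and not part of KaggleEase. It assumes the file already exists at the path you pass in.

```python
import os
import stat
from pathlib import Path

def secure_credentials(path):
    """Restrict a credentials file to owner read/write (0o600) and return its mode."""
    path = Path(path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No credentials file at {path}")
    if os.name != "nt":
        # POSIX only: Windows does not use Unix permission bits
        path.chmod(stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(path.stat().st_mode)

# Usage (uncomment once kaggle.json is in place):
# print(oct(secure_credentials("~/.kaggle/kaggle.json")))
```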

3. Use the module

from kaggleease import load

df = load("titanic")
print(df.head())

4. Magic commands in Jupyter

# Load dataset into a specific variable
%kaggle load titanic --as df

# Preview dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"

Advanced Features

Progress Indication

Large downloads show progress bars automatically:

from kaggleease import load
df = load("large-dataset", timeout=600)  # Progress shown for large files

Thread-Safe Authentication

Multiple concurrent operations are supported safely:

import threading
from kaggleease import load

def load_dataset(dataset):
    # Note: threading.Thread discards this return value
    return load(dataset)

# Safe to run concurrently
threads = [threading.Thread(target=load_dataset, args=("titanic",)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
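
Because threading.Thread discards the return value of its target, the DataFrames loaded above are not available afterwards. One way to collect them is concurrent.futures from the standard library. The load_many helper below is a hypothetical sketch, not part of KaggleEase. It takes the loader as an argument so it can be tried without network access.

```python
from concurrent.futures import ThreadPoolExecutor

def load_many(handles, loader, max_workers=4):
    """Call loader(handle) for each handle in parallel and return {handle: result}."""
    handles = list(handles)  # materialize so we can iterate twice
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each handle with its result
        return dict(zip(handles, pool.map(loader, handles)))

# With KaggleEase installed, pass its load function as the loader:
# from kaggleease import load
# frames = load_many(["titanic", "housing"], load)
```

If any loader call raises, the exception propagates when its result is consumed, so a failed download is not silently dropped.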

Retry Logic with Exponential Backoff

Network failures are automatically retried:

  • First retry after 1s
  • Second retry after 2s
  • Third retry after 4s
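
The schedule above follows the usual exponential backoff pattern, where the delay doubles after each failed attempt. The sketch below shows that pattern in general terms. It is not KaggleEase's internal code, and the retry_with_backoff name and its parameters are assumptions made for illustration.

```python
import time

def retry_with_backoff(func, retries=3, base_delay=1.0,
                       exceptions=(OSError,), sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
```

The sleep parameter is injectable so the schedule can be checked in tests without actually waiting.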

Notebook Magic

KaggleEase comes with powerful notebook magics:

# Load the titanic dataset into a variable named 'df'
%kaggle load titanic --as df --timeout 600

# Preview the first few rows of a dataset
%kaggle preview titanic --timeout 30

# Search for datasets
%kaggle search "credit risk" --timeout 30

# Load specific file from dataset
%kaggle load kaggle/titanic --file train.csv --as train_df

Command Line Interface

KaggleEase provides a comprehensive CLI:

# Load a dataset with custom timeout
kaggleease load kaggle/titanic --timeout 600

# Preview a dataset
kaggleease preview kaggle/titanic

# Search for datasets with result limit
kaggleease search "credit risk" --top 10

# Load specific file from dataset
kaggleease load kaggle/titanic --file train.csv

# 🚀 NEW: Set up shell completion
kaggleease completion --shell zsh # or bash/fish

API Reference

load(dataset_handle, file=None, timeout=300)

Load a Kaggle dataset into a pandas DataFrame with automatic progress indication for large files.

Parameters:

  • dataset_handle (str): The Kaggle dataset handle, either in the full "owner/dataset-name" format or as a bare slug such as "titanic", which is resolved by search
  • file (str, optional): Specific file to load from the dataset
  • timeout (int): Timeout in seconds for API calls and downloads (default: 300)

Returns:

  • pandas.DataFrame: The loaded dataset

Features:

  • Automatic progress indication for files > 100MB
  • Thread-safe authentication
  • Retry logic with exponential backoff
  • Structured logging

search(query, top=5, timeout=30)

Search for Kaggle datasets.

Parameters:

  • query (str): Search query
  • top (int): Maximum number of results to return (default: 5)
  • timeout (int): Timeout in seconds for the search operation (default: 30)

Returns:

  • list: List of dataset information dictionaries

ProgressBar and show_progress

Progress indication utilities for large downloads:

from kaggleease.progress import ProgressBar

progress = ProgressBar(1000, "Download")
progress.update(500)  # Update with downloaded bytes
progress.complete()

Troubleshooting

Authentication Error

If you encounter authentication errors, make sure you have your Kaggle API credentials set up:

  1. Go to https://www.kaggle.com/account and download your kaggle.json file
  2. Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
  3. Ensure the file has restricted permissions (600)

Large Dataset Warning

When loading large datasets (>5GB), you'll see a warning. Consider whether you need the entire dataset or can work with a subset, such as a single file selected with the file parameter.

Timeout Errors

If you're experiencing timeout errors, try increasing the timeout value:

from kaggleease import load

# Increase timeout to 600 seconds
df = load("titanic", timeout=600)

Progress Indication

For large downloads, progress is shown automatically. For manual control:

from kaggleease.progress import show_progress
progress = show_progress(250, 1000, "Download")  # 250 bytes of 1000 total

Thread Safety

The library is thread-safe for concurrent operations. Multiple threads can safely call load() simultaneously.

Security Considerations

  • Authentication credentials are stored securely with proper file permissions (600)
  • Thread-safe authentication prevents credential corruption in concurrent environments
  • All network operations include timeout protection
