KaggleEase

The fastest, notebook-first way to load Kaggle datasets into pandas with one line.

Installation

pip install kaggleease

Quick Start

# In a notebook cell (the leading "!" runs a shell command)
!pip install kaggleease --upgrade
from kaggleease import load, search, __version__
print(__version__)  # Should be 1.3.8

df = load("titanic")  # Competition handles are detected automatically

Usage in Google Colab

1. Install the package

In a Colab cell, run:

!pip install kaggleease

2. Use the module

from kaggleease import load

df = load("titanic")
print(df.head())

3. Magic commands in Colab

# Load dataset into a variable named 'df'
%kaggle load titanic --as df

# Preview a dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"

Usage in Local Jupyter Notebooks

1. Install the package

pip install kaggleease

2. Set up Kaggle credentials

  • Go to https://www.kaggle.com/account
  • In the API section, create a new API token; this downloads your kaggle.json file
  • Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
  • On macOS/Linux, set file permissions to 600 (read/write for owner only): chmod 600 ~/.kaggle/kaggle.json
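
The credential steps above can be checked from Python before the first load() call. The following is a minimal sketch using only the standard library; check_kaggle_credentials is a hypothetical helper name and is not part of KaggleEase.

```python
import os
import stat
from pathlib import Path


def check_kaggle_credentials(path=None):
    """Return a list of notes about the credentials file (empty if all is well).

    Hypothetical helper for illustration; not part of the kaggleease API.
    """
    # Default location from the setup steps: ~/.kaggle/kaggle.json
    cred = Path(path) if path else Path.home() / ".kaggle" / "kaggle.json"
    if not cred.is_file():
        return [f"missing: {cred}"]

    notes = []
    # Permission bits are only meaningful on POSIX systems (macOS/Linux).
    if os.name == "posix":
        mode = stat.S_IMODE(cred.stat().st_mode)
        if mode & 0o077:  # any group/other bit set means the file is too open
            cred.chmod(0o600)  # tighten to read/write for the owner only
            notes.append(f"permissions were {oct(mode)}, reset to 0o600")
    return notes
```

Calling check_kaggle_credentials() with no arguments inspects the default location; an empty list means the file exists and is already restricted to the owner.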

3. Use the module

from kaggleease import load

df = load("titanic")
print(df.head())

4. Magic commands in Jupyter

# Load dataset into a specific variable
%kaggle load titanic --as df

# Preview dataset
%kaggle preview titanic

# Search for datasets
%kaggle search "credit risk"

Advanced Features

Progress Indication

Large downloads show progress bars automatically:

from kaggleease import load
df = load("large-dataset", timeout=600)  # Progress shown for large files

Thread-Safe Authentication

Multiple concurrent operations are supported safely:

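import threading
from kaggleease import load

results = {}

def load_dataset(dataset, slot):
    # A thread's return value is discarded, so store the DataFrame explicitly
    results[slot] = load(dataset)

# Safe to run concurrently ("dataset" is a placeholder handle)
threads = [threading.Thread(target=load_dataset, args=("dataset", i)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()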
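
For readers wondering what "thread-safe authentication" usually involves, here is one common pattern: a lock-guarded, one-time initialization. This is an illustrative sketch only, not KaggleEase's actual implementation; LazyAuth and the authenticate callable are hypothetical names.

```python
import threading


class LazyAuth:
    """Run an expensive authenticate() call at most once, even under concurrency.

    Hypothetical illustration; KaggleEase's internals may differ.
    """

    def __init__(self, authenticate):
        self._authenticate = authenticate  # assumed: callable returning a non-None client
        self._lock = threading.Lock()
        self._client = None

    def get(self):
        # Fast path: already authenticated, no lock needed.
        if self._client is None:
            with self._lock:
                # Re-check inside the lock; another thread may have finished first.
                if self._client is None:
                    self._client = self._authenticate()
        return self._client
```

Every thread calls get(); only the first one to take the lock performs the authentication, and the rest reuse the cached client.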

Retry Logic with Exponential Backoff

Network failures are automatically retried:

  • First retry after 1s
  • Second retry after 2s
  • Third retry after 4s
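
The schedule above (1s, 2s, 4s) corresponds to a delay of base_delay * 2**attempt. A minimal sketch of that pattern follows; retry_with_backoff is a hypothetical helper, and the set of exceptions treated as retryable is an assumption, since this document does not say which errors KaggleEase retries.

```python
import time


def retry_with_backoff(operation, retries=3, base_delay=1.0, sleep=time.sleep,
                       retry_on=(ConnectionError, TimeoutError)):
    """Call operation(); on a retryable error wait 1s, 2s, 4s, ... and try again.

    Hypothetical helper; retry_on is an assumed set of network-style errors.
    """
    for attempt in range(retries + 1):
        try:
            return operation()
        except retry_on:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s for attempts 0, 1, 2
```

The sleep parameter is injectable so the schedule can be verified without waiting; in normal use the default time.sleep applies.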

Notebook Magic

KaggleEase provides a %kaggle line magic with load, preview, and search subcommands:

# Load the titanic dataset into a variable named 'df'
%kaggle load titanic --as df --timeout 600

# Preview the first few rows of a dataset
%kaggle preview titanic --timeout 30

# Search for datasets
%kaggle search "credit risk" --timeout 30

# Load specific file from dataset
%kaggle load kaggle/titanic --file train.csv --as train_df

Command Line Interface

The kaggleease command offers the same load, preview, and search operations from the shell:

# Load a dataset with custom timeout
kaggleease load kaggle/titanic --timeout 600

# Preview a dataset
kaggleease preview kaggle/titanic

# Search for datasets with result limit
kaggleease search "credit risk" --top 10

# Load specific file from dataset
kaggleease load kaggle/titanic --file train.csv

API Reference

load(dataset_handle, file=None, timeout=300)

Load a Kaggle dataset into a pandas DataFrame with automatic progress indication for large files.

Parameters:

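  • dataset_handle (str): The Kaggle dataset handle in the format "owner/dataset-name"; a bare slug such as "titanic" is also accepted and resolved to the best match (see Intelligence Features)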
  • file (str, optional): Specific file to load from the dataset
  • timeout (int): Timeout in seconds for API calls and downloads (default: 300)

Returns:

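  • pandas.DataFrame: The loaded dataset. If the dataset contains images or other non-tabular files, the local directory path is returned instead (see Intelligence Features)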

Features:

  • Automatic progress indication for files > 100MB
  • Thread-safe authentication
  • Retry logic with exponential backoff
  • Structured logging

search(query, top=5, timeout=30)

Search for Kaggle datasets.

Parameters:

  • query (str): Search query
  • top (int): Maximum number of results to return (default: 5)
  • timeout (int): Timeout in seconds for the search operation (default: 30)

Returns:

  • list: List of dataset information dictionaries

ProgressBar and show_progress

Progress indication utilities for large downloads:

from kaggleease.progress import ProgressBar

progress = ProgressBar(1000, "Download")
progress.update(500)  # Update with downloaded bytes
progress.complete()

Troubleshooting

Authentication Error

If you encounter authentication errors, make sure you have your Kaggle API credentials set up:

  1. Go to https://www.kaggle.com/account and download your kaggle.json file
  2. Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
  3. Ensure the file has restricted permissions (600)

Large Dataset Warning

Loading a dataset larger than 5 GB triggers a warning. Consider whether you need the entire dataset or can work with a subset, for example by loading a single file with the file parameter.

Timeout Errors

If you're experiencing timeout errors, try increasing the timeout value:

from kaggleease import load

# Increase timeout to 600 seconds (the default is 300)
df = load("titanic", timeout=600)

Progress Indication

For large downloads, progress is shown automatically. For manual control:

from kaggleease.progress import show_progress
progress = show_progress(250, 1000, "Download")  # 250 bytes of 1000 total

Thread Safety

The library is thread-safe for concurrent operations. Multiple threads can safely call load() simultaneously.

Intelligence Features (v1.3.8+)

As of v1.3.8, KaggleEase handles a wider range of inputs without failing:

  • Automatic Competition Detection: load("titanic") now recognizes competition handles and loads them without extra arguments.
  • Universal Formats: Native support for CSV, Parquet, JSON, Excel (.xlsx, .xls), and SQLite (.sqlite, .db).
  • No-Crash Fallback: If a dataset contains images or other non-tabular files, load() returns the local directory path instead of raising an error.
  • Deep Scanning: Data files are found even when they are nested in subdirectories.
  • Implicit Resolution: load("slug") still resolves to the best owner/slug match.
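
Because of the No-Crash Fallback, the value returned by load() may be either a DataFrame or a directory path, so calling code may want to branch on the type. The sketch below shows one way to do that with the standard library alone; describe_load_result is a hypothetical helper, and it assumes the fallback path arrives as a str or os.PathLike object.

```python
import os
from pathlib import Path


def describe_load_result(result):
    """Return ("files", [paths]) for a directory path, else ("frame", result).

    Hypothetical helper; assumes the fallback path is a str or os.PathLike.
    """
    if isinstance(result, (str, os.PathLike)):
        # Fallback case: list every file under the directory, including
        # files nested in subdirectories.
        files = sorted(p for p in Path(result).rglob("*") if p.is_file())
        return "files", files
    # Anything else is assumed to be a pandas.DataFrame.
    return "frame", result
```

A caller could then write kind, value = describe_load_result(load("owner/dataset-name")) and handle the "files" case by opening the images or other files directly.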
