KaggleEase
The fastest, notebook-first way to load Kaggle datasets into pandas with one line.
Installation
pip install kaggleease
Quick Start
from kaggleease import load
# Intelligent handle resolution - searches and finds 'kaggle/titanic' for you
df = load("titanic")
✨ New in v1.3.0: Intelligent Loader
KaggleEase is now "Intelligent." It doesn't just fail; it helps you fix your mistakes.
- Fuzzy Handle Matching: Made a typo? load("titanik") will suggest titanic.
- Implicit Slugs: No need for owners. load("housing") finds the most relevant dataset.
- Actionable Errors: In notebooks, errors appear as clean Markdown blocks with 💡 Fix Suggestions.
- Zero-Dependency REST: Purged the heavy kaggle library (~10MB) for a lightweight 50KB core.
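How KaggleEase ranks candidate handles is not described here. As a rough illustration of fuzzy handle matching in general, the sketch below uses Python's standard difflib module. The KNOWN_SLUGS list and the suggest_handle helper are hypothetical stand-ins and are not part of the KaggleEase API.

```python
import difflib

# Hypothetical candidate slugs; a real resolver would obtain these from a search call.
KNOWN_SLUGS = ["titanic", "housing", "credit-risk"]


def suggest_handle(query, candidates=KNOWN_SLUGS, cutoff=0.6):
    """Return the candidate slug closest to `query`, or None if nothing is close enough."""
    matches = difflib.get_close_matches(query.lower(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(suggest_handle("titanik"))  # prints: titanic
```

Raising cutoff makes suggestions stricter; lowering it tolerates more typos but risks unrelated matches.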
Usage in Google Colab
1. Install the package
In a Colab cell, run:
!pip install kaggleease
2. Use the module
from kaggleease import load
df = load("titanic")
print(df.head())
3. Magic commands in Colab
# Load dataset into a variable named 'df'
%kaggle load titanic --as df
# Preview a dataset
%kaggle preview titanic
# Search for datasets
%kaggle search "credit risk"
Usage in Local Jupyter Notebooks
1. Install the package
pip install kaggleease
2. Set up Kaggle credentials
- Go to https://www.kaggle.com/account
- Download your kaggle.json file
- Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
- Set file permissions to 600 (read/write for owner only)
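The placement and permission steps can also be scripted. The following is a minimal sketch using only the Python standard library; install_kaggle_json is a hypothetical helper name, and the config_dir parameter exists only so the function can be pointed somewhere other than ~/.kaggle.

```python
import os
import shutil
from pathlib import Path


def install_kaggle_json(downloaded, config_dir=None):
    """Copy a downloaded kaggle.json into config_dir and restrict it to mode 600."""
    target_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / "kaggle.json"
    shutil.copyfile(downloaded, target)
    # Owner read/write only. On Windows, os.chmod can only toggle the read-only flag.
    os.chmod(target, 0o600)
    return target


# Example call (path to the downloaded file is an assumption):
# install_kaggle_json(Path.home() / "Downloads" / "kaggle.json")
```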
3. Use the module
from kaggleease import load
df = load("titanic")
print(df.head())
4. Magic commands in Jupyter
# Load dataset into a specific variable
%kaggle load titanic --as df
# Preview dataset
%kaggle preview titanic
# Search for datasets
%kaggle search "credit risk"
Advanced Features
Progress Indication
Large downloads show progress bars automatically:
from kaggleease import load
df = load("large-dataset", timeout=600) # Progress shown for large files
Thread-Safe Authentication
Multiple concurrent operations are supported safely:
import threading
from kaggleease import load

def load_dataset(dataset):
    # A Thread discards the return value; store it somewhere shared if you need it
    return load(dataset)

# Safe to run concurrently
threads = [threading.Thread(target=load_dataset, args=("dataset",)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
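The internals of KaggleEase's thread-safe authentication are not shown here. One common way to get this behavior is lock-guarded lazy loading, sketched below. CredentialCache and fake_loader are hypothetical names used for illustration only.

```python
import threading


class CredentialCache:
    """Load credentials at most once, even when many threads ask at the same time."""

    def __init__(self, loader):
        self._loader = loader  # callable that reads credentials (hypothetical)
        self._lock = threading.Lock()
        self._credentials = None

    def get(self):
        # The lock serializes the first load; later calls return the cached value.
        with self._lock:
            if self._credentials is None:
                self._credentials = self._loader()
            return self._credentials


load_count = 0


def fake_loader():
    """Stand-in for reading kaggle.json; counts how often it is called."""
    global load_count
    load_count += 1
    return {"username": "demo", "key": "not-a-real-key"}


cache = CredentialCache(fake_loader)
threads = [threading.Thread(target=cache.get) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(load_count)  # prints: 1
```

Because the check and the assignment both happen while the lock is held, no two threads can run the loader at once.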
Retry Logic with Exponential Backoff
Network failures are automatically retried:
- First retry after 1s
- Second retry after 2s
- Third retry after 4s
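The schedule above doubles the wait after each failure. A minimal sketch of that pattern follows. The retry_with_backoff helper is hypothetical, and treating OSError subclasses as the retryable failures is an assumption rather than KaggleEase's documented behavior.

```python
import time


def retry_with_backoff(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt, then retry up to `retries` times."""
    for attempt in range(retries + 1):
        try:
            return func()
        except OSError:  # built-in network errors such as ConnectionError derive from OSError
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s with the defaults
```

The sleep parameter is injectable so the schedule can be checked without actually waiting.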
Notebook Magic
KaggleEase comes with powerful notebook magics:
# Load the titanic dataset into a variable named 'df'
%kaggle load titanic --as df --timeout 600
# Preview the first few rows of a dataset
%kaggle preview titanic --timeout 30
# Search for datasets
%kaggle search "credit risk" --timeout 30
# Load specific file from dataset
%kaggle load kaggle/titanic --file train.csv --as train_df
Command Line Interface
KaggleEase provides a comprehensive CLI:
# Load a dataset with custom timeout
kaggleease load kaggle/titanic --timeout 600
# Preview a dataset
kaggleease preview kaggle/titanic
# Search for datasets with result limit
kaggleease search "credit risk" --top 10
# Load specific file from dataset
kaggleease load kaggle/titanic --file train.csv
# 🚀 NEW: Set up shell completion
kaggleease completion --shell zsh # or bash/fish
API Reference
load(dataset_handle, file=None, timeout=300)
Load a Kaggle dataset into a pandas DataFrame with automatic progress indication for large files.
Parameters:
- dataset_handle (str): The Kaggle dataset handle, either "owner/dataset-name" or, since v1.3.0, a bare name such as "titanic" that is resolved automatically
- file (str, optional): Specific file to load from the dataset
- timeout (int): Timeout in seconds for API calls and downloads (default: 300)
Returns:
pandas.DataFrame: The loaded dataset
Features:
- Automatic progress indication for files > 100MB
- Thread-safe authentication
- Retry logic with exponential backoff
- Structured logging
search(query, top=5, timeout=30)
Search for Kaggle datasets.
Parameters:
- query (str): Search query
- top (int): Maximum number of results to return (default: 5)
- timeout (int): Timeout in seconds for the search operation (default: 30)
Returns:
list: List of dataset information dictionaries
ProgressBar and show_progress
Progress indication utilities for large downloads:
from kaggleease.progress import ProgressBar
progress = ProgressBar(1000, "Download")
progress.update(500) # Update with downloaded bytes
progress.complete()
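What ProgressBar renders is not shown here. To make the idea concrete, the sketch below formats a one-line text bar from a byte count and a total. format_progress is a hypothetical helper, and its output format is an assumption, not the library's actual display.

```python
def format_progress(downloaded, total, label="Download", width=20):
    """Render a one-line text progress bar for `downloaded` of `total` bytes."""
    fraction = min(downloaded / total, 1.0) if total > 0 else 1.0
    filled = int(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"{label} [{bar}] {fraction:.0%}"


print(format_progress(500, 1000))  # prints: Download [##########----------] 50%
```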
Troubleshooting
Authentication Error
If you encounter authentication errors, make sure you have your Kaggle API credentials set up:
- Go to https://www.kaggle.com/account and download your kaggle.json file
- Place it in ~/.kaggle/kaggle.json (or %USERPROFILE%\.kaggle\kaggle.json on Windows)
- Ensure the file has restricted permissions (600)
Large Dataset Warning
When loading large datasets (>5GB), you'll see a warning. Consider whether you need the entire dataset or can work with a subset, for example a single file selected with the file parameter.
Timeout Errors
If you're experiencing timeout errors, try increasing the timeout value:
# Increase timeout to 600 seconds
df = load("titanic", timeout=600)
Progress Indication
For large downloads, progress is shown automatically. For manual control:
from kaggleease.progress import show_progress
progress = show_progress(250, 1000, "Download") # 250 bytes of 1000 total
Thread Safety
The library is thread-safe for concurrent operations. Multiple threads can safely call load() simultaneously.
Security Considerations
- Authentication credentials are stored securely with proper file permissions (600)
- Thread-safe authentication prevents credential corruption in concurrent environments
- All network operations include timeout protection
(GIF placeholder: A short animation showing the %kaggle load titanic magic in action)
Download files
File details
Details for the file kaggleease-1.3.1.tar.gz.
File metadata
- Download URL: kaggleease-1.3.1.tar.gz
- Upload date:
- Size: 18.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4d8b9e4bb9b9b5264aec903250dc94005164f2df95abd8bc25a28af7f7198cd4 |
| MD5 | 2244222251887142aa18d09568b793c2 |
| BLAKE2b-256 | 902bb8810cad87fdaa913e3940f5b64eaeb8b405d32b516d584e9e770c5130f9 |
File details
Details for the file kaggleease-1.3.1-py3-none-any.whl.
File metadata
- Download URL: kaggleease-1.3.1-py3-none-any.whl
- Upload date:
- Size: 20.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 45f620ecd42f9d5de16bcd244facdaeba76d5e3b78d8ecd21b6f8de76578110a |
| MD5 | b98f092296008842bf923c420dc5fc7f |
| BLAKE2b-256 | 3a312df5753f5dd6b03bc062b27e0d5b2e9a0d052b31e184560a0a7f9b0fe9f4 |
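A downloaded file can be checked against the published SHA256 digests with Python's standard hashlib module. This is a minimal sketch; sha256_of and matches_published_hash are hypothetical helper names, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published value, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, using the wheel's SHA256 value listed above:
# matches_published_hash(
#     "kaggleease-1.3.1-py3-none-any.whl",
#     "45f620ecd42f9d5de16bcd244facdaeba76d5e3b78d8ecd21b6f8de76578110a",
# )
```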