Auto-sync Kaggle notebook outputs to Google Drive or local machine via ngrok

kgout

Auto-sync Kaggle notebook outputs to Google Drive or your local machine.

License: MIT. Requires Python 3.8+.

When running long ML experiments on Kaggle, kernels can time out or sessions expire — and your output files disappear. kgout watches /kaggle/working/ in the background and automatically syncs new or modified files to Google Drive or exposes them via an ngrok tunnel for instant local download.

Drop it into any notebook as a single cell.

Install

# With local/ngrok tunnel support (recommended)
# (quotes keep shells such as zsh from expanding the brackets)
pip install "kgout[local]"

# With Google Drive support
pip install "kgout[gdrive]"

# Everything
pip install "kgout[all]"

Quick Start

Local Download via ngrok (Recommended)

Exposes your /kaggle/working/ directory as a public URL — open it in any browser on your phone, laptop, anywhere. Every new file appears instantly.

import os
os.environ["NGROK_AUTH_TOKEN"] = "your_token_here"  # free at ngrok.com

from kgout import KgOut

with KgOut("local") as kg:
    # ┌────────────────────────────────────────────────┐
    # │  kgout — files available at:                   │
    # │  https://abc123.ngrok-free.app                 │
    # └────────────────────────────────────────────────┘

    # ... your training code ...
    # Every new file saved to /kaggle/working/ is instantly
    # browsable and downloadable from the URL above.
    pass

How it works: kgout starts a file server on localhost, creates an ngrok tunnel to it, and gives you the public URL. The file server serves your watch directory live — any file your notebook saves appears immediately in the browser. A background watcher thread logs every new file and its direct download link.
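
As a rough illustration of the file-server half of that description, here is a minimal standard-library sketch. It is not kgout's actual implementation: the helper name start_file_server is hypothetical, and the ngrok tunnel step is left out. It only shows a server bound to localhost that serves a directory live from a background thread.

```python
import functools
import http.server
import threading


def start_file_server(directory, port=8384):
    """Serve `directory` over HTTP on localhost only, from a daemon thread.

    Hypothetical helper for illustration; kgout's internals may differ.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    # Bind to 127.0.0.1 so only processes on this machine (such as a
    # tunnel agent) can connect to the server directly.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


# Example usage (not run here):
# server = start_file_server("/kaggle/working")
# ... point a tunnel at http://127.0.0.1:8384 ...
# server.shutdown()
# server.server_close()
```

Because the handler reads from disk on every request, a file saved after the server starts is visible on the next page load without any restart.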

Google Drive Auto-Upload

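Every new CSV, checkpoint, or plot auto-uploads to a Drive folder shortly after it's saved: on the next scan (every 30 seconds by default) once the file has gone 2 seconds without being modified.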

from kgout import KgOut

with KgOut(
    "gdrive",
    folder_id="1ABCxyz_your_drive_folder_id",
    credentials="/kaggle/input/my-secrets/service_account.json",
) as kg:
    # ... your training code ...
    pass

Both at Once

with KgOut(
    dest=["local", "gdrive"],
    folder_id="1ABCxyz",
    credentials="/path/to/sa.json",
) as kg:
    pass

Manual start/stop (no context manager)

from kgout import KgOut

kg = KgOut("local")
kg.start()

# ... long training ...

print(kg.stats)  # {'files_tracked': 12, 'events_fired': 5}

kg.stop()

Configuration

Parameter          Default          Description
dest               "local"          "local", "gdrive", or ["local", "gdrive"]
watch_dir          /kaggle/working  Directory to watch (recursive)
interval           30               Seconds between scans (min: 5)
ignore             see below        Glob patterns for files to skip
snapshot_existing  True             If True, skip files that exist before start()
folder_id          -                Google Drive folder ID (required for gdrive)
credentials        -                Service account JSON path (required for gdrive)
ngrok_token        -                ngrok auth token (or set NGROK_AUTH_TOKEN env var)
port               8384             Local file server port
verbose            True             Enable logging output

Environment Variables

Instead of passing tokens directly, you can set these environment variables:

Variable                  Used by             Description
NGROK_AUTH_TOKEN          local destination   ngrok authentication token
KGOUT_GDRIVE_CREDENTIALS  gdrive destination  Path to service account JSON

See .env.example in the repo for a template.
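
The fallback from an explicit argument to an environment variable could look roughly like the sketch below. The function name resolve_ngrok_token is hypothetical, and the precedence (argument first, then NGROK_AUTH_TOKEN) is an assumption, not something the kgout documentation states.

```python
import os


def resolve_ngrok_token(ngrok_token=None, environ=None):
    """Return the token from the argument, else from NGROK_AUTH_TOKEN.

    Hypothetical helper; the precedence order is an assumption.
    """
    env = os.environ if environ is None else environ
    token = ngrok_token or env.get("NGROK_AUTH_TOKEN")
    if not token:
        raise ValueError(
            "No ngrok token found: pass ngrok_token=... or set NGROK_AUTH_TOKEN"
        )
    return token
```

Failing early with a clear message is usually friendlier in a notebook than letting the tunnel setup fail later with an authentication error.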

Default Ignore Patterns

These files are never synced:

  • *.ipynb, *.pyc, *.tmp, *.lock, *.log, *.swp, *.swo
  • .DS_Store, Thumbs.db
  • Hidden files (starting with .)
  • Directories: .ipynb_checkpoints, __pycache__, .git

Override with ignore=["*.csv"] or pass ignore=[] to sync everything.
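
To make the matching behaviour concrete, here is a small sketch of glob-based filtering with the standard fnmatch module. The name is_ignored and the choice to test every path component against every pattern are assumptions made for illustration; kgout's own matching rules may differ in detail.

```python
import fnmatch
from pathlib import PurePosixPath

# Mirrors the defaults listed above; ".*" stands in for "hidden files".
DEFAULT_IGNORE = [
    "*.ipynb", "*.pyc", "*.tmp", "*.lock", "*.log", "*.swp", "*.swo",
    ".DS_Store", "Thumbs.db",
    ".*",
    ".ipynb_checkpoints", "__pycache__", ".git",
]


def is_ignored(rel_path, patterns=DEFAULT_IGNORE):
    """Return True if any component of rel_path matches any glob pattern.

    Hypothetical helper; rel_path is relative to the watch directory
    and uses forward slashes.
    """
    parts = PurePosixPath(rel_path).parts
    return any(
        fnmatch.fnmatchcase(part, pattern)
        for part in parts
        for pattern in patterns
    )
```

With this sketch, is_ignored("results.csv", patterns=["*.csv"]) is True, and passing patterns=[] ignores nothing.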

Setting Up ngrok (for local destination)

  1. Create a free account at ngrok.com
  2. Copy your auth token from the dashboard
  3. In your Kaggle notebook:
    import os
    os.environ["NGROK_AUTH_TOKEN"] = "your_token"
    
    Or pass it directly: KgOut("local", ngrok_token="your_token")

Tip: On Kaggle, you can store the token as a Kaggle Secret and load it with:

import os
from kaggle_secrets import UserSecretsClient

os.environ["NGROK_AUTH_TOKEN"] = UserSecretsClient().get_secret("NGROK_AUTH_TOKEN")

Setting Up Google Drive (for gdrive destination)

  1. Go to Google Cloud Console
  2. Create a project (or use existing) and enable the Google Drive API
  3. Go to IAM & Admin > Service Accounts > Create a service account
  4. Create a key (JSON) > download it
  5. Upload the JSON to Kaggle as a private dataset (e.g., my-secrets)
  6. In Google Drive, right-click your target folder > Share > paste the service account email (the client_email field in the JSON) > give it Editor access
  7. Copy the folder ID from the Drive URL: https://drive.google.com/drive/folders/THIS_PART_IS_THE_ID
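
If you would rather not copy the ID by hand, the last step can be done with a few lines of standard-library code. This is a convenience sketch, not part of kgout; drive_folder_id is a hypothetical helper that returns the path segment following "folders" in a Drive folder URL.

```python
from urllib.parse import urlparse


def drive_folder_id(url):
    """Extract the folder ID from a Google Drive folder URL.

    Hypothetical helper: returns the path segment after "folders".
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if "folders" not in segments:
        raise ValueError(f"Not a Drive folder URL: {url!r}")
    index = segments.index("folders") + 1
    if index >= len(segments):
        raise ValueError(f"No folder ID found in: {url!r}")
    return segments[index]
```

Parsing the URL first means any query string, such as a sharing parameter appended by the browser, does not end up in the ID.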

Security

kgout takes the following security measures:

  • Localhost-only binding: The HTTP file server binds to 127.0.0.1, not 0.0.0.0. Only the ngrok tunnel can reach it — not other devices on the same network.
  • Path traversal protection: Requests that attempt to escape the served directory (e.g., /../../../etc/passwd) are blocked.
  • Security headers: All HTTP responses include X-Content-Type-Options: nosniff, X-Frame-Options: DENY, and a Content Security Policy.
  • No symlink following: The watcher uses followlinks=False to prevent symlink-based escapes.
  • Dangerous directory guard: Attempting to watch /, /etc, /home, or other sensitive paths raises a ValueError.
  • Credential masking: ngrok tokens are redacted from error messages.
  • Partial file guard: Files are only synced after they haven't been modified for 2 seconds, preventing sync of half-written files.
  • Minimal GDrive scope: Uses drive.file scope — the service account can only access files it created, not your entire Drive.

See SECURITY.md for the full security policy and vulnerability reporting.
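
Two of the measures listed above, path traversal protection and credential masking, can be sketched in a few lines. This is one common way to implement such checks, shown under stated assumptions; the helper names resolve_inside and redact are hypothetical and kgout's own code may work differently.

```python
import os


def resolve_inside(root, request_path):
    """Map a URL path to a file under root, or return None if it escapes.

    Hypothetical helper. realpath() collapses ".." segments and resolves
    symlinks, so the comparison is made on the final location.
    """
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(
        os.path.join(root_real, request_path.lstrip("/"))
    )
    if os.path.commonpath([root_real, candidate]) != root_real:
        return None
    return candidate


def redact(message, secret):
    """Replace every occurrence of secret in message with a placeholder."""
    return message.replace(secret, "***") if secret else message
```

A request for /../../../etc/passwd resolves outside the served directory, so resolve_inside returns None and the server can answer with an error instead of the file.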

How It Works

  1. Snapshot: On start(), kgout fingerprints all existing files (mtime + size) so they don't trigger syncs
  2. Poll: A daemon thread scans the watch directory every N seconds
  3. Settle check: Files modified in the last 2 seconds are skipped (still being written)
  4. Compare: Each file's fingerprint is compared against the snapshot
  5. Sync: New or modified files are sent to the configured destination(s)
  6. Cleanup: On stop() (or context manager exit), watcher thread and tunnels shut down

The watcher runs as a daemon thread — it won't block your notebook or prevent kernel shutdown.
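
The snapshot, settle check, and compare steps can be sketched with the standard library as follows. This is an illustration of the polling technique described above, not kgout's source; the function names and the SETTLE_SECONDS constant are assumptions.

```python
import os
import time

SETTLE_SECONDS = 2  # files modified more recently than this are skipped


def fingerprint(path):
    """Return (mtime, size) for path."""
    st = os.stat(path)
    return (st.st_mtime, st.st_size)


def snapshot(watch_dir):
    """Fingerprint every file under watch_dir, without descending into
    symlinked directories."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(watch_dir, followlinks=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                result[full] = fingerprint(full)
            except OSError:
                continue  # file vanished between listing and stat
    return result


def scan_once(watch_dir, known, now=None):
    """Return settled files that are new or changed; update known in place."""
    now = time.time() if now is None else now
    changed = []
    for path, fp in snapshot(watch_dir).items():
        mtime, _size = fp
        if now - mtime < SETTLE_SECONDS:
            continue  # probably still being written
        if known.get(path) != fp:
            known[path] = fp
            changed.append(path)
    return sorted(changed)
```

A watcher loop would call snapshot() once at start, then call scan_once() every interval seconds from a daemon thread and hand the returned paths to the configured destinations.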

Known Limitations

  • Polling-based: Uses periodic scanning, not filesystem events — there's a configurable delay (interval)
  • ngrok free tier: Limited to 1 tunnel; sessions may disconnect after ~2 hours
  • GDrive flat upload: Subdirectories are flattened to filenames (e.g., subdir_file.csv) in v1.0
  • Public URL: Anyone with the ngrok URL can download files. Don't share it with untrusted parties.
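
To illustrate the flat-upload limitation, the sketch below turns a relative path into a single filename. The underscore separator is inferred from the subdir_file.csv example above, and flat_name is a hypothetical helper, not kgout's actual function.

```python
from pathlib import PurePosixPath


def flat_name(rel_path):
    """Join the components of a relative path with underscores.

    Hypothetical helper; the separator is inferred from the README example.
    """
    return "_".join(PurePosixPath(rel_path).parts)
```

One consequence of any scheme like this is that distinct paths can collide: under this sketch, a_b/c.csv and a/b_c.csv both map to a_b_c.csv, so unique filenames across subdirectories are the safer choice.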

Development

git clone https://github.com/vybhavchaturvedi/kgout
cd kgout
pip install -e ".[dev,all]"
pytest tests/ -v

License

MIT — see LICENSE
