Skip to main content

Run Ollama coding models on your own Kaggle GPU instance

Project description

KaggleLLM-IDE

Run frontier coding models on your own free Kaggle GPU — no shared server, no VRAM limits on your laptop.

Each user brings their own Kaggle account. The app spins up a private GPU kernel, installs Ollama, pulls your chosen model, and tunnels the inference endpoint directly to your browser via ngrok. Your prompts never touch anyone else's server.

You → Setup form → Kaggle API → Your GPU kernel → ngrok tunnel → Chat UI

Why this exists

Most coding LLMs worth running (Qwen2.5-Coder 32B, DeepSeek-Coder-V2) need 16–20 GB of VRAM. Most laptops have 4–8 GB. Kaggle gives every verified account 30 free GPU hours per week on P100/T4 instances. This app is the bridge.


Supported models

Model VRAM Best for
qwen2.5-coder:7b ~5 GB Fast completions, everyday coding
qwen2.5-coder:14b ~9 GB Balanced speed + quality
qwen2.5-coder:32b ~20 GB Best quality (P100 only)
deepseek-coder-v2:16b ~10 GB Strong reasoning + code
codellama:13b ~8 GB General purpose

Prerequisites

You need three things before starting:

1. Kaggle account + API key

  1. Sign up at kaggle.com
  2. Go to Settings → API → Create New Token — this downloads kaggle.json
  3. Your username and key are inside that file

2. Phone verification (required for GPU)

Kaggle requires phone verification to unlock GPU kernels. Complete it at: kaggle.com/settings/phone-verification

3. ngrok auth token

  1. Sign up at ngrok.com (free)
  2. Go to Dashboard → Your Authtoken
  3. Copy the token

Getting started

Backend

cd backend
pip install -r requirements.txt
uvicorn main:app --reload
# Runs on http://localhost:8000

Frontend

cd frontend
npm install
npm run dev
# Runs on http://localhost:5173

Open http://localhost:5173, fill in your credentials, pick a model, and hit Launch GPU Kernel.

First launch takes 5–8 minutes (Ollama install + model download). Subsequent launches of the same model are faster since Kaggle caches the environment.


Project structure

kaggle-gpu-ide/
├── backend/
│   ├── main.py              # FastAPI app — session lifecycle endpoints
│   ├── kaggle_runner.py     # Kaggle API client — push, poll, stop kernels
│   ├── auth.py              # Credential + GPU access verification
│   └── requirements.txt
├── frontend/
│   ├── index.html
│   ├── vite.config.js
│   ├── package.json
│   └── src/
│       ├── App.jsx          # Setup / launching / chat screens
│       ├── index.css
│       ├── main.jsx
│       ├── hooks/
│       │   ├── useSession.js   # Session state + polling
│       │   └── useChat.js      # Message history + inference
│       └── lib/
│           └── api.js          # Backend + tunnel API clients
├── kernel/
│   └── kernel_bootstrap.py  # Runs ON Kaggle GPU — installs Ollama + ngrok
└── pyproject.toml

How it works

1. You enter Kaggle credentials + ngrok token in the setup form

2. Backend calls POST /auth/verify
   ├── Checks API key is valid
   └── Confirms GPU access is enabled on your account

3. Backend calls Kaggle API to push kernel_bootstrap.py
   to your Kaggle account with GPU + internet enabled

4. kernel_bootstrap.py runs on Kaggle's GPU:
   ├── Installs Ollama
   ├── Pulls your chosen model
   ├── Starts a FastAPI inference server on port 8000
   └── Opens an ngrok tunnel → prints [TUNNEL_URL]https://...[/TUNNEL_URL]

5. Backend polls the kernel log every 8s until the URL appears

6. Frontend receives the tunnel URL → all prompts go DIRECTLY
   to your Kaggle kernel — no inference traffic through any shared server

API reference

Backend (http://localhost:8000)

Method Path Description
POST /auth/verify Validate credentials + GPU access
POST /session/start Push kernel + begin polling
GET /session/{id}/status launching / ready / error
DELETE /session/{id} Stop kernel + clear session
GET /health Liveness check

Kernel tunnel (https://<ngrok-url>)

Method Path Description
GET /health Kernel liveness
GET /models List pulled Ollama models
POST /generate Run inference

Environment variables

Create a .env file in backend/ (see .env.example):

# Not strictly required — credentials are passed per-request from the frontend.
# Set these if you want to hardcode a single-user deployment.
KAGGLE_USERNAME=
KAGGLE_KEY=
NGROK_AUTHTOKEN=
SECRET_KEY=

Kaggle GPU quota

  • 30 hours/week per verified account
  • Resets every Sunday
  • Kernels auto-terminate after 12 hours of continuous runtime
  • GPU types: Tesla P100 (16 GB VRAM) or Tesla T4 (15 GB VRAM) — assigned automatically

Contributing

# Fork → clone → create a feature branch
git checkout -b feat/your-feature

# Make changes, then open a PR against dev (not main)

Please keep PRs focused. One feature or fix per PR.


License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kaggle_gpu_ide-0.1.0.tar.gz (42.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

kaggle_gpu_ide-0.1.0-py3-none-any.whl (28.5 kB view details)

Uploaded Python 3

File details

Details for the file kaggle_gpu_ide-0.1.0.tar.gz.

File metadata

  • Download URL: kaggle_gpu_ide-0.1.0.tar.gz
  • Upload date:
  • Size: 42.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kaggle_gpu_ide-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f4438459ed481c876344edb45625f0b35440101c86332d6a98da9cc423c915ae
MD5 8f24a4776fed5db08404df350b04f722
BLAKE2b-256 05a776e210b6ac17b9f95da0d61fb65fdf7aa22cb18f8cfa79d3a423e47cf9d9

See more details on using hashes here.

File details

Details for the file kaggle_gpu_ide-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: kaggle_gpu_ide-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 28.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kaggle_gpu_ide-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9b2b7f40f6decf31204e4c5d944438e6e17c345a2e6a879fdfde67befbc8abaf
MD5 db513426b5b0e09185def0c325b4af03
BLAKE2b-256 f4a56cc35d14c8ace90b880689f7a83d4eb02f87ba6f2e0b22ed4536c393a941

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page