Skip to main content

Use Kaggle free GPU notebooks as an LLM inference backend.

Project description

Tokenless

Run LLMs on Kaggle's free GPU notebooks with an OpenAI-compatible local client.

CI License: MIT Python

What It Does

tokenless starts a private Kaggle GPU script kernel, installs Ollama, pulls the requested model, opens a temporary Cloudflare tunnel, and lets your local Python process call the model through the OpenAI-compatible API.

For gpt-oss:20b, the model is downloaded once during llm.start(). Repeated llm.send(...) calls reuse the running Kaggle kernel until llm.stop().

Quick Start

Install from source:

pip install -e .

Set your Kaggle credentials:

export KAGGLE_USERNAME="your_username"
export KAGGLE_KEY="your_api_key"

On PowerShell:

$env:KAGGLE_USERNAME = "your_username"
$env:KAGGLE_KEY = "your_api_key"

Run your first prompt:

from tokenless import TokenlessLLM

llm = TokenlessLLM(model="gpt-oss:20b")
llm.start()
msg = llm.send("Explain asyncio in two sentences.")
print(msg)
llm.stop()

llm.start() shows a terminal progress bar while Kaggle starts the GPU kernel, installs Ollama, downloads the model, and opens the public endpoint. Pass show_progress=False to disable it.

Supported Models

Model Backend Notes
gpt-oss:20b Ollama persistent kernel Downloads once per start()
llama3.1-8b Endpoint mode / smoke path Coming Soon
mistral-7b Endpoint mode / smoke path Coming Soon
gemma-2-9b Endpoint mode / smoke path Coming Soon
qwen2.5-7b Endpoint mode / smoke path Coming Soon

Agent Integrations

OpenAI Agents SDK

pip install -e ".[agents]"
from agents import Agent, Runner, function_tool
from tokenless import TokenlessLLM

@function_tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

with TokenlessLLM(model="gpt-oss:20b") as llm:
    agent = Agent(
        name="Tokenless assistant",
        instructions="Use tools when helpful. Keep answers concise.",
        model=llm.as_agents_model(),
        tools=[word_count],
    )
    result = Runner.run_sync(agent, "Explain asyncio, then count the words.")
    print(result.final_output)

If you are running in Jupyter, IPython, VS Code interactive, or another environment that already has an event loop, use the async runner instead:

from agents import Agent, Runner, function_tool
from tokenless import TokenlessLLM

@function_tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

llm = TokenlessLLM(model="gpt-oss:20b")
llm.start()

agent = Agent(
    name="Tokenless assistant",
    instructions="Use tools when helpful. Keep answers concise.",
    model=llm.as_agents_model(),
    tools=[word_count],
)

result = await Runner.run(agent, "Explain asyncio, then count the words.")
print(result.final_output)

llm.stop()

Tokenless uses the Agents SDK chat-completions model adapter because Ollama exposes an OpenAI-compatible Chat Completions endpoint.

Strands Agents

pip install -e ".[strands]"
from tokenless import TokenlessLLM
from strands import Agent, tool

@tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

with TokenlessLLM(model="gpt-oss:20b") as llm:
    agent = Agent(
        model=llm.as_strands_model(params={"temperature": 0.2}),
        tools=[word_count],
        system_prompt="Use tools when helpful. Keep answers concise.",
    )
    result = agent("Explain asyncio, then count the words in your explanation.")
    print(result)

LangChain

pip install -e ".[langchain]"
from tokenless import TokenlessLLM

with TokenlessLLM(model="gpt-oss:20b") as llm:
    chat = llm.as_langchain_llm(temperature=0.2, max_tokens=512)
    response = chat.invoke("Explain asyncio in two sentences.")
    print(response.content)

For a LangGraph ReAct agent:

from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent
from tokenless import TokenlessLLM

@tool
def word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

with TokenlessLLM(model="gpt-oss:20b") as llm:
    chat = llm.as_langchain_llm(temperature=0.2, max_tokens=512)
    agent = create_react_agent(chat, tools=[word_count])
    result = agent.invoke(
        {
            "messages": [
                (
                    "user",
                    "Explain asyncio in two sentences, then count the words "
                    "in your explanation.",
                )
            ]
        }
    )
    print(result["messages"][-1].content)

Install all optional integrations with:

pip install -e ".[all]"

How It Works

  1. TokenlessLLM.start() uploads a long-running private Kaggle script kernel.
  2. The kernel installs dependencies, starts Ollama, and pulls gpt-oss:20b.
  3. The kernel starts a Cloudflare tunnel for Ollama's OpenAI-compatible endpoint.
  4. The kernel posts the tunnel URL back through a one-time rendezvous topic.
  5. llm.send(...) sends prompts to the running endpoint.
  6. llm.stop() deletes the Kaggle kernel.

Security Notes

  • Do not commit Kaggle credentials, .env files, or ~/.kaggle/kaggle.json.
  • The Cloudflare tunnel URL is temporary but public while the kernel is running.
  • Stop the kernel with llm.stop() when you are done.

Development

pip install -e ".[dev]"
python -m pytest
python -m ruff check .
python -m py_compile tokenless/client.py tokenless/notebook.py

CI runs offline tests only and does not require Kaggle credentials. To run the live Kaggle smoke test locally:

export KAGGLE_USERNAME="your_username"
export KAGGLE_KEY="your_api_key"
python -m pytest --run-live-kaggle -m live_kaggle

License

MIT. See LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tokenless-0.1.0.tar.gz (20.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tokenless-0.1.0-py3-none-any.whl (22.3 kB view details)

Uploaded Python 3

File details

Details for the file tokenless-0.1.0.tar.gz.

File metadata

  • Download URL: tokenless-0.1.0.tar.gz
  • Upload date:
  • Size: 20.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for tokenless-0.1.0.tar.gz
Algorithm Hash digest
SHA256 145263a8880f52c1b34c551dfa9071a67c19fa9a7cb69f3fd0ddda5bbb1c0523
MD5 5ca3ad8f70f1bffaac044fbeaf0b18a1
BLAKE2b-256 3c0c2fec827ec5b44783bfeb65a8ce6054837caf654ffe4a9bb609765e7ed08d

See more details on using hashes here.

File details

Details for the file tokenless-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: tokenless-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 22.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for tokenless-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9eb49d5ec9a9fb9419a3fec0d6d9734984155354488adcdaa6473c3a98b8d0c8
MD5 363f9427d1ce8572c8cb4886a947052f
BLAKE2b-256 9d8f75e0eba79ca486d0e54b0e88bdb8e71a47b264b6619f373475bce3229ccc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page