Get to da cluster! Remote function execution for Databricks.

Choppa

Get to da cluster

Run Python in Databricks straight from your laptop

Because Running Code Shouldn't Be Hard

So you want to run something in Databricks? Strap in, because they expect you to build jobs with their nifty homebrew orchestrator, deploy environments using better-than-Terraform bundles, develop in their hosted Monaco UI (which is waaay better than whatever VS Code has), and, oh. Remote development? Like, from your laptop? Did we mention their hosted notebooks already? They come with AI and serverless!

You don't want to do any of that. You want to write some code and run it. Like a normal person.

Installation

pip install choppa

Configuration

Choppa needs to know which cluster to run stuff on. In order of precedence, Choppa will use the cluster:

- set via the cluster_id parameter when you instantiate Choppa
- whatever you put in the environment variable DATABRICKS_CLUSTER_ID
- the value of cluster_id in ~/.databrickscfg
  - if the environment variable DATABRICKS_CONFIG_PROFILE is set, using that profile
  - otherwise using the DEFAULT profile
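If you want to reason about which cluster you'll actually land on, the lookup order above can be sketched with the standard library. This is a minimal illustration under assumptions, not Choppa's real code: `resolve_cluster_id` and its `config_path` parameter are hypothetical names, and it assumes ~/.databrickscfg is the usual INI file with one section per profile.

```python
import configparser
import os
from pathlib import Path
from typing import Optional


def resolve_cluster_id(cluster_id: Optional[str] = None,
                       config_path: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper mirroring the documented lookup order."""
    # 1. An explicit cluster_id argument always wins.
    if cluster_id:
        return cluster_id

    # 2. Next, the DATABRICKS_CLUSTER_ID environment variable.
    from_env = os.environ.get("DATABRICKS_CLUSTER_ID")
    if from_env:
        return from_env

    # 3. Finally, cluster_id from ~/.databrickscfg, in the profile named by
    #    DATABRICKS_CONFIG_PROFILE if it is set, otherwise the DEFAULT profile.
    path = Path(config_path) if config_path else Path.home() / ".databrickscfg"
    profile = os.environ.get("DATABRICKS_CONFIG_PROFILE") or "DEFAULT"
    parser = configparser.ConfigParser(interpolation=None)  # tokens may contain '%'
    parser.read(path)  # a missing file is silently skipped

    if profile == "DEFAULT":
        return parser.defaults().get("cluster_id")
    if parser.has_section(profile):
        # Note: configparser lets named sections inherit keys from [DEFAULT].
        return parser.get(profile, "cluster_id", fallback=None)
    return None
```

For example, with a `[DEFAULT]` section and a `[dev]` section that each set cluster_id, exporting DATABRICKS_CONFIG_PROFILE=dev picks the dev cluster, unless DATABRICKS_CLUSTER_ID or an explicit argument overrides it. Whether Choppa also inherits keys from `[DEFAULT]` into named profiles is an assumption of this sketch.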

Usage

from choppa import Choppa

dutch = Choppa()

@dutch.remote
def add(a: int, b: int) -> int:
    return a + b

add(1, 2)  # 3

Donezo. You can probably stop reading now, because that covers 99% of the frustration of Databricks development with just a freaking decorator.

Advanced Usage

Scope

Choppa only instantiates remote environments for contexts that can be scoped without inspecting frames or messing with function ASTs. Or, put another way: only functions and their arguments are in scope.

from choppa import Choppa

EXPONENT = 10

dutch = Choppa()

# This version works but is pretty boring
@dutch.remote
def an_option(a: int, exponent: int) -> int:
    return a ** exponent

# This one uses ONE WEIRD TRICK to always produce the exact same result!
@dutch.remote
def another_option(a: int) -> int:
    return a ** EXPONENT
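To see why the second version goes sideways, you can simulate locally what a fresh remote process sees if only a function's code and its arguments make the trip. The sketch below rebuilds each function around an empty globals dictionary; it assumes, per the Scope rule above, that module globals like EXPONENT are not shipped. `run_isolated` is a hypothetical helper and this is not how Choppa actually serializes functions.

```python
import builtins
import types

EXPONENT = 10


def run_isolated(fn, *args, **kwargs):
    """Call fn with none of its module's globals: only builtins and its own arguments."""
    isolated = types.FunctionType(
        fn.__code__,
        {"__builtins__": builtins},  # a "fresh process": no EXPONENT, no imports
        fn.__name__,
        fn.__defaults__,
        fn.__closure__,
    )
    return isolated(*args, **kwargs)


def an_option(a: int, exponent: int) -> int:
    return a ** exponent


def another_option(a: int) -> int:
    return a ** EXPONENT


print(run_isolated(an_option, 2, 3))  # 8
try:
    run_isolated(another_option, 2)
except NameError as exc:
    # EXPONENT only exists in this module's globals, which were not carried over.
    print(f"NameError: {exc}")
```

Locally, another_option(2) happily returns 1024; in isolation it can't find EXPONENT. Passing the value as an argument, like an_option does, is the fix.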

Caching

Consider this straightforward workflow:

from databricks.connect import DatabricksSession
from pyspark.sql import Row

spark = DatabricksSession.builder.getOrCreate()

def get_stuff() -> list[Row]:
    return spark.table("huge_table").limit(1_000_000_000).collect()

def analyze_stuff(rows: list[Row]):
    return len(rows)

data = get_stuff()
result = analyze_stuff(data)

I bet you're having fun downloading those billion rows from Databricks! Haven't even gotten to your analysis yet and you're already wishing computers came with hardware...

You might get to analyze_stuff() sooner if you could cache the result on Databricks and pass only a reference over the network:

from choppa import Choppa
from pyspark.sql import Row  # used only for the type hints

choppa = Choppa(
    artifact_dir="/Workspace/Users/you@company/artifacts",
    result_size_max=2**10  # 1 KiB
)

@choppa.remote
def get_stuff() -> list[Row]:
    return spark.table("huge_table").limit(1_000_000_000).collect()

def analyze_stuff(rows: list[Row]):
    return len(rows)


ref = get_stuff() # type: ArtifactRef

Since a literal billion rows will blow way past 1 KiB (2**10 bytes), the result isn't returned. Instead you get this ArtifactRef thing, but analyze_stuff() needs your actual data. You could always materialize the artifact and run your analysis locally:

data = ref.dereference() # hahaha, that's right- it's C all over again. sucker!
result = analyze_stuff(data)

Yeah, it's a cute trick, but it doesn't have a lot of value since you still need to download the data eventually. Hmm... I know! You could let Choppa automagically deal with ArtifactRefs behind the scenes (it does), run everything on Databricks (you should), and just run your code (the freakin' dream):

from choppa import Choppa
from pyspark.sql import Row  # used only for the type hints

choppa = Choppa(
    artifact_dir="/Workspace/Users/you@company/artifacts",
    result_size_max=2**10
)

@choppa.remote
def get_stuff() -> list[Row]:
    return spark.table("huge_table").limit(1_000_000_000).collect()

@choppa.remote
def analyze_stuff(rows: list[Row]):
    return len(rows)

data = get_stuff()
result = analyze_stuff(data)
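The "behind the scenes" part can be pictured as a pre-call step: before the function body runs, any ArtifactRef argument is swapped for the value it points to. Here's a local sketch of that idea. The `ArtifactRef` dataclass with a `path` field, pickle files on disk, and the `resolve_refs` decorator are all hypothetical stand-ins, not Choppa's real classes or storage format. In the real setup the swap would presumably happen on the cluster, so the data never crosses the network; here everything runs on one machine.

```python
import functools
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ArtifactRef:
    """Hypothetical pointer to a pickled value stored under an artifact directory."""
    path: Path

    def dereference(self):
        return pickle.loads(self.path.read_bytes())


def resolve_refs(fn):
    """Replace every ArtifactRef argument with its stored value before calling fn."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        args = [a.dereference() if isinstance(a, ArtifactRef) else a for a in args]
        kwargs = {k: v.dereference() if isinstance(v, ArtifactRef) else v
                  for k, v in kwargs.items()}
        return fn(*args, **kwargs)
    return wrapper


@resolve_refs
def analyze_stuff(rows: list) -> int:
    return len(rows)


artifact_dir = Path(tempfile.mkdtemp())  # stands in for the workspace artifact_dir
ref_path = artifact_dir / "get_stuff.pkl"
ref_path.write_bytes(pickle.dumps(list(range(5_000))))
ref = ArtifactRef(ref_path)

print(analyze_stuff(ref))  # 5000: the function body never sees the ArtifactRef
print(analyze_stuff([1, 2, 3]))  # 3: plain values pass through untouched
```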

There are actually two decorators you can use if you want more certainty about what comes back as a reference:

choppa.artifact always caches results and returns an ArtifactRef object.

choppa.remote opportunistically returns your data but falls back to an ArtifactRef if the serialized value is larger than result_size_max. If you don't set result_size_max, or set it to None, choppa.remote will always return your data.
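The difference between the two decorators boils down to a size check on the serialized result. A minimal sketch, assuming results are serialized with pickle and measured in bytes; `ArtifactRef`, `store_artifact`, and `package_result` are hypothetical names, and Choppa's real serializer and storage may well differ.

```python
import pickle
import tempfile
import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional


@dataclass(frozen=True)
class ArtifactRef:
    """Hypothetical pointer to a pickled result on disk."""
    path: Path


def store_artifact(payload: bytes, artifact_dir: Path) -> ArtifactRef:
    path = artifact_dir / f"{uuid.uuid4().hex}.pkl"
    path.write_bytes(payload)
    return ArtifactRef(path)


def package_result(value: Any, artifact_dir: Path, result_size_max: Optional[int],
                   always_artifact: bool = False) -> Any:
    """Mimic the documented behavior of choppa.remote (default) and choppa.artifact."""
    payload = pickle.dumps(value)
    if always_artifact:
        # choppa.artifact: always cache, always hand back a reference
        return store_artifact(payload, artifact_dir)
    if result_size_max is None or len(payload) <= result_size_max:
        # choppa.remote: no limit, or small enough, so return the data itself
        return value
    # choppa.remote: too big, fall back to a reference
    return store_artifact(payload, artifact_dir)


artifact_dir = Path(tempfile.mkdtemp())
small = package_result(3, artifact_dir, result_size_max=2**10)
big = package_result(list(range(100_000)), artifact_dir, result_size_max=2**10)
unlimited = package_result(list(range(100_000)), artifact_dir, result_size_max=None)
forced = package_result(3, artifact_dir, result_size_max=2**10, always_artifact=True)
```

Here `small` and `unlimited` come back as plain values, while `big` and `forced` are ArtifactRefs pointing at files in the temporary directory.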

Context Managers

There aren't a ton of savings to be had, but you can use a context manager to group remote calls together. This doesn't change what I said about variables not being in scope. What you get is faster execution, because the remote process is reused across multiple function calls. You could probably get cute, create globals inside remote functions, and have them persist in memory without writing them to disk or sending them over the network. That's actually a pretty good idea. I'll think about it for version 2. Anyway, here's an example:

from choppa import Choppa

dutch = Choppa()

@dutch.remote
def some_math(a: int, b: int) -> int:
    return a + b

with dutch.session():
    x = [some_math(y, 1) for y in range(1_000)]

Async / Fire-and-Forget

And because my wife loves the idea of me turning off my laptop on occasion, you can also yeet a hard job at Databricks and walk away for a while. Easy peasy:

import time

from choppa import Choppa

dutch = Choppa()

@dutch.submit
def slow_job():
    results = ...  # placeholder for hours of processing
    return results

# Returns immediately
handle = slow_job()  # type: RemoteHandle

# Later...
ref = handle.wait()
data = dutch.dereference(ref)

# or another option
while not handle.done():
    time.sleep(60)  # check back every minute instead of pegging a CPU core
ref = handle.get_pointer()  # type: ArtifactRef
data = dutch.dereference(ref)
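A handle like this behaves a lot like a future: submit the work, return immediately, then poll or block later. Here's a local sketch using `concurrent.futures`, with a thread pool standing in for the cluster. This `RemoteHandle` and `submit` are hypothetical and only mirror the `done()`/`wait()` shape of the example above; unlike Choppa's handle, this `wait()` returns the value directly rather than an ArtifactRef.

```python
import functools
import time
from concurrent.futures import Future, ThreadPoolExecutor

_pool = ThreadPoolExecutor(max_workers=4)  # stand-in for the cluster


class RemoteHandle:
    """Hypothetical handle exposing done()/wait() over a concurrent.futures.Future."""

    def __init__(self, future: Future) -> None:
        self._future = future

    def done(self) -> bool:
        return self._future.done()

    def wait(self, timeout=None):
        # Blocks until the job finishes and re-raises any exception it hit.
        return self._future.result(timeout=timeout)


def submit(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs) -> RemoteHandle:
        return RemoteHandle(_pool.submit(fn, *args, **kwargs))  # returns immediately
    return wrapper


@submit
def slow_job(n: int) -> int:
    time.sleep(0.5)  # pretend this is hours of processing
    return sum(range(n))


handle = slow_job(10)
while not handle.done():
    time.sleep(0.1)  # poll politely instead of spinning the CPU
print(handle.wait())  # 45
```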

Requirements

Python 3.10+
databricks-sdk >= 0.20.0
Authenticated workspace (env vars, profile, or Azure CLI)

License

MIT

Hey, boss, I just made literally every researcher's job easier, made them more productive, made them happier. That's every IC who works for you and a significant chunk of the data science folks across the BU. I'm just thinking out loud here, but maybe now I can get that promotion?

(huh? what are 'people skills'...)

Release history

0.2.0: Feb 6, 2026

0.1.0: Dec 22, 2025 (this version)

Download files

Source Distribution

choppa-0.1.0.tar.gz (18.1 kB)

Uploaded Dec 22, 2025 Source

Built Distribution

choppa-0.1.0-py3-none-any.whl (18.4 kB)

Uploaded Dec 22, 2025 Python 3

File details

Details for the file choppa-0.1.0.tar.gz.

File metadata

Download URL: choppa-0.1.0.tar.gz
Upload date: Dec 22, 2025
Size: 18.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for choppa-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`79d1c588f30447bbce8a7dbb374a5c1da2589867bcdd0343996bba7c162bc39a`
MD5	`f0315b28eeb5d16d26edad0bfbe8feff`
BLAKE2b-256	`7f7a000160ca7a97120ea150ab19f923f90c261a7c524536b80efc5405ca5a9a`

File details

Details for the file choppa-0.1.0-py3-none-any.whl.

File metadata

Download URL: choppa-0.1.0-py3-none-any.whl
Upload date: Dec 22, 2025
Size: 18.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.5

File hashes

Hashes for choppa-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d0f0493c487b580375efa5383dd81a6bf42635262d0c91669bb7fb90d9ec0373`
MD5	`aa8cc72691aba6858647229611515e76`
BLAKE2b-256	`dfb5c06991c01eebd1d0f60dec1405bef0fc507bd87a9aabfbe11da1899999d6`
