Get to da cluster! Remote function execution for Databricks.

Choppa

Get to da cluster

Run Python in Databricks straight from your laptop

Because Running Code Shouldn't Be Hard

So you want to run something in Databricks? Strap in, because they expect you to build jobs with their nifty homebrew orchestrator, deploy environments using better-than-Terraform bundles, develop in their hosted Monaco UI (which is waaay better than whatever VS Code has), and, oh. Remote development? Like from your laptop? Did we mention their hosted notebooks already? They come with AI and serverless.

You don't want to do any of that. You want to write some code and run it. Like a normal person.

Installation

pip install choppa

Configuration

Choppa uses the first cluster ID it finds, checking these places in order:

  • DATABRICKS_CLUSTER_ID environment variable
  • The cluster_id for that profile in ~/.databrickscfg, if the DATABRICKS_CONFIG_PROFILE environment variable is set
  • The cluster_id defined in ~/.databrickscfg's DEFAULT profile
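
If you want to see that lookup order spelled out, here is a minimal sketch. It is not Choppa's actual code: resolve_cluster_id is a hypothetical helper, and it assumes ~/.databrickscfg parses as a plain INI file with the standard library's configparser.

```python
import configparser
import os
from pathlib import Path


def resolve_cluster_id(env=None, cfg_path=None):
    """Hypothetical helper that mirrors the documented lookup order."""
    env = os.environ if env is None else env
    cfg_path = Path(cfg_path) if cfg_path else Path.home() / ".databrickscfg"

    # 1. The environment variable wins outright.
    if env.get("DATABRICKS_CLUSTER_ID"):
        return env["DATABRICKS_CLUSTER_ID"]

    parser = configparser.ConfigParser()
    parser.read(cfg_path)  # a missing file is silently skipped

    # 2. The named profile, when DATABRICKS_CONFIG_PROFILE is set.
    profile = env.get("DATABRICKS_CONFIG_PROFILE")
    if profile and parser.has_section(profile):
        cluster_id = parser.get(profile, "cluster_id", fallback=None)
        if cluster_id:
            return cluster_id

    # 3. The DEFAULT profile.
    return parser.defaults().get("cluster_id")
```

Calling resolve_cluster_id() with no arguments reads your real environment and config file; it returns None when nothing is configured.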

You can also set the cluster manually whenever you want:

choppa.set_cluster(cluster_id="8675309")

Quickstart

import choppa

@choppa.remote
def add(a: int, b: int) -> int:
    return a + b

add(1, 2)  # 3

Donezo. You can probably stop reading now, because that covers 99% of the frustration of Databricks development with just a freaking decorator.

Slowstart

Scope

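Global variables are handy, but they don't work with Choppa. A remote function can't see names defined at module level on your laptop, so pass everything it needs as arguments.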

Do this

@choppa.remote
def some_math(a: int, exponent: int) -> int:
    return a ** exponent

some_math(2, 10) # 1024

Don't do this

EXPONENT = 10

@choppa.remote
def some_math(a: int) -> int:
    return a ** EXPONENT

some_math(2)  # RemoteExecutionFailed: name 'EXPONENT' is not defined
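
Want to catch this before a round trip to the cluster? Here is a small local check using the standard library's inspect.getclosurevars. The helper name global_refs is made up for this example and is not part of Choppa; the functions are left undecorated so the snippet runs without a cluster.

```python
import inspect

EXPONENT = 10


def good(a: int, exponent: int) -> int:
    return a ** exponent


def bad(a: int) -> int:
    return a ** EXPONENT


def global_refs(func) -> set[str]:
    """Module-level names the function body refers to (hypothetical helper)."""
    return set(inspect.getclosurevars(func).globals)


print(global_refs(good))  # set()
print(global_refs(bad))   # {'EXPONENT'}
```

One caveat: getclosurevars matches on names, so an attribute that happens to share a name with a module-level global shows up too. Treat a hit as a hint to look, not as proof.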

Context Managers

Normally, each call to a @choppa.remote function uses its own execution context on your cluster. If that's confusing, just pretend I said 'process' instead; it's close enough. You can group work into a single context with a context manager:

Do this

@choppa.remote 
def some_math(a: int, b: int) -> int:
    return a + b

# 1 context
with choppa.session():
    x = [some_math(y, 1) for y in range(1_000)] 

Don't do this

@choppa.remote 
def some_math(a: int, b: int) -> int:
    return a + b

# 1 bajillion contexts
x = [some_math(y, 1) for y in range(1_000)] 
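
To make the difference concrete, here is a toy model that only counts contexts. ToyCluster is invented for illustration: it runs everything locally and says nothing about how Choppa or Databricks manage execution contexts under the hood.

```python
from contextlib import contextmanager


class ToyCluster:
    """Toy stand-in for a cluster; it only counts execution contexts."""

    def __init__(self):
        self.contexts_created = 0
        self._in_session = False

    def _open_context(self):
        self.contexts_created += 1

    @contextmanager
    def session(self):
        # Open one context up front and reuse it for every call in the block.
        self._open_context()
        self._in_session = True
        try:
            yield
        finally:
            self._in_session = False

    def remote(self, func):
        def wrapper(*args, **kwargs):
            if not self._in_session:
                self._open_context()  # no session: a fresh context per call
            return func(*args, **kwargs)  # runs locally in this toy
        return wrapper


cluster = ToyCluster()


@cluster.remote
def some_math(a: int, b: int) -> int:
    return a + b


with cluster.session():
    x = [some_math(y, 1) for y in range(1_000)]
print(cluster.contexts_created)  # 1

x = [some_math(y, 1) for y in range(1_000)]
print(cluster.contexts_created)  # 1001
```

Same results either way, but one version sets up a context once and the other does it a thousand times.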

Requirements

  • Python 3.10+
  • databricks-sdk >= 0.20.0
  • cloudpickle (for serializing arguments and results)

License

MIT


Hey, boss, I just made literally every researcher's job easier, made them more productive, made them happier. Everyone who works for you and a significant chunk of data science people across the BU. I'm just talking out loud here but maybe now I can get that promotion?

(huh? what are 'people skills'...)

Source Distribution

choppa-0.2.0.tar.gz (10.4 kB view details)

Uploaded Source

Built Distribution

choppa-0.2.0-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file choppa-0.2.0.tar.gz.

File metadata

  • Download URL: choppa-0.2.0.tar.gz
  • Upload date:
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for choppa-0.2.0.tar.gz
Algorithm Hash digest
SHA256 43b751e7554f107c5a9d828da4916445a29b3225c36a127dc2e73b0ea7162bb6
MD5 80acf711be5fd8b69dc017c17eebef15
BLAKE2b-256 4f1cf069025dd481c8eacf7817fc3cfaacb7c9e29b5e58322fca8d4a4761cd49

File details

Details for the file choppa-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: choppa-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 11.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for choppa-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 86c330ee4bb5319784199d794b68ac58fb176801df38fe01e98eaf6cb679eda2
MD5 9d8c25bea46f579d8b61c28734c6ff10
BLAKE2b-256 7f4a8cdb2bc02e13857243b643f567e0d82231d580ee60d60783f3610bd7963c
