Update libraries on Databricks

Project description

Apparate

Make your libraries magically appear in Databricks.

Note 6/15/20: Our team previously had a tradition of naming projects with terms or characters from the Harry Potter series, but we are disappointed by J.K. Rowling’s persistent transphobic comments. In response, we will be renaming this repository, and are working to develop an inclusive solution that minimizes disruption to our users.

Why we built this

When our team started setting up CI/CD for the various packages we maintain, we encountered some difficulties integrating Jenkins with Databricks.

We write a lot of Python + PySpark packages in our data science work, and we often deploy these as batch jobs run on a schedule using Databricks. However, each time we merged in a new change to one of these libraries we would have to manually create an egg, upload it using the Databricks GUI, go find all the jobs that used the library, and update each one to point to the new library. As our team and set of libraries and jobs grew, this became unsustainable (not to mention a big break from the CI/CD philosophy...).

As we set out to automate this using the Databricks library API, we realized that this task required using two versions of the API and many dependent API calls. Instead of trying to recreate that logic in each Jenkinsfile, we wrote apparate. Now you can enjoy the magic as well!

Apparate now works for both .egg and .jar files to support Python + PySpark and Scala + Spark libraries. To take advantage of apparate's ability to update jobs, make sure your files follow one of these naming conventions:

new_library-1.0.0-py3.6.egg
new_library-1.0.0-SNAPSHOT-py3.6.egg
new_library-1.0.0-SNAPSHOT-my-branch-py3.6.egg
new_library-1.0.0.egg
new_library-1.0.0-SNAPSHOT.egg
new_library-1.0.0-SNAPSHOT-my-branch.egg
new_library-1.0.0.jar
new_library-1.0.0-SNAPSHOT.jar
new_library-1.0.0-SNAPSHOT-my-branch.jar

In each case, the first number in the version (here, 1) is the major version, and a change to it signals breaking changes.
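As an illustration of these naming conventions, here is a minimal Python sketch that splits a filename into its parts and extracts the major version. It is not apparate's own parser: the regular expression, the `LibraryFile` type, and the `parse_library_filename` function are hypothetical names written for this example, and the sketch assumes library names contain only letters, digits, and underscores.

```python
import re
from typing import NamedTuple, Optional


class LibraryFile(NamedTuple):
    name: str
    version: str
    major_version: int
    suffix: Optional[str]  # e.g. "SNAPSHOT" or "SNAPSHOT-my-branch"
    extension: str         # "egg" or "jar"


# Hypothetical pattern covering the nine example filenames above.
_FILENAME = re.compile(
    r"^(?P<name>[A-Za-z0-9_]+)"                 # library name (assumed charset)
    r"-(?P<version>\d+\.\d+\.\d+)"              # major.minor.patch
    r"(?:-(?P<suffix>SNAPSHOT"                  # optional SNAPSHOT marker
    r"(?:-(?!py\d)[A-Za-z0-9_-]+?)?))?"         # optional branch name
    r"(?:-py\d+(?:\.\d+)*)?"                    # optional Python tag, e.g. -py3.6
    r"\.(?P<extension>egg|jar)$"
)


def parse_library_filename(filename: str) -> LibraryFile:
    """Split a library filename into name, version, suffix, and extension."""
    match = _FILENAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized library filename: {filename!r}")
    version = match.group("version")
    return LibraryFile(
        name=match.group("name"),
        version=version,
        major_version=int(version.split(".")[0]),
        suffix=match.group("suffix"),
        extension=match.group("extension"),
    )


if __name__ == "__main__":
    print(parse_library_filename("new_library-1.0.0-SNAPSHOT-my-branch-py3.6.egg"))
```

A filename that does not fit the pattern raises `ValueError`, which in a CI job is usually preferable to silently uploading a library that cannot be matched to jobs later.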

What it does

Apparate is a tool to manage libraries in Databricks in an automated fashion. It allows you to move away from the point-and-click interface for your development work and for deploying production-level libraries for use in scheduled Databricks jobs.

For a more detailed API and tutorials, check out the docs.

Installation

Note: apparate requires Python 3 and currently only works on Databricks accounts that run on AWS (not Azure).

Apparate is hosted on PyPI, so to get the latest version simply install via pip:

pip install apparate

You can also install from source, by cloning the git repository https://github.com/ShopRunner/apparate.git and installing via easy_install:

git clone https://github.com/ShopRunner/apparate.git
cd apparate
easy_install .

Setup

Configuration

Apparate uses a .apparatecfg file to store information about your Databricks account and setup. To create this file, run:

apparate configure

You will be asked for your Databricks host name (the URL you use to access the account, something like https://my-organization.cloud.databricks.com), an access token, and your production folder. The production folder should be one your team creates to hold production-ready libraries. By isolating production-ready libraries in their own folder, you ensure that apparate will never update a job to use a library still in development/testing.
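To make the three settings concrete, the sketch below parses an INI-style configuration holding a host, a token, and a production folder. The file layout and the key names (`host`, `token`, `prod_folder` under `[DEFAULT]`) are assumptions made for illustration, as are the placeholder token and folder path; check the file that `apparate configure` actually writes before relying on them.

```python
import configparser

# Assumed layout and key names; all values are placeholders.
SAMPLE_CONFIG = """\
[DEFAULT]
host = https://my-organization.cloud.databricks.com
token = dapiXXXXXXXXXXXXXXXX
prod_folder = /Shared/production
"""

REQUIRED_KEYS = ("host", "token", "prod_folder")


def parse_config(text: str) -> dict:
    """Parse INI-style text and return the three settings as a dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = parser["DEFAULT"]
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise ValueError(f"missing configuration keys: {missing}")
    return {
        # Drop any trailing slash so URLs can be built consistently.
        "host": settings["host"].rstrip("/"),
        "token": settings["token"],
        "prod_folder": settings["prod_folder"],
    }


if __name__ == "__main__":
    config = parse_config(SAMPLE_CONFIG)
    print(config["host"], config["prod_folder"])
```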

Databricks API token

The API tokens can be generated in Databricks under Account Settings -> Access Tokens. To upload an egg to any folder in Databricks, you can use any token. To update jobs, you will need a token with admin permissions, which can be created in the same manner by an admin on the account.
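For context on how such a token is used, the Databricks REST API expects it as a bearer token in the `Authorization` header. The sketch below only builds a request object and does not send it; `build_request` is a hypothetical helper, the token is a placeholder, and `jobs/list` under `/api/2.0/` is used purely as an example endpoint. This is not necessarily how apparate issues its calls.

```python
import urllib.request


def build_request(host: str, token: str, endpoint: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated Databricks REST API request."""
    url = f"{host.rstrip('/')}/api/2.0/{endpoint.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    request = build_request(
        "https://my-organization.cloud.databricks.com",
        "dapiXXXXXXXXXXXXXXXX",  # placeholder token
        "jobs/list",
    )
    print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the call; whether it succeeds depends on the permissions attached to the token.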

Usage notes

While libraries can be uploaded to folders other than your specified production folder, no libraries outside of this folder will ever be deleted and no jobs using libraries outside of this folder will be updated.

If you try to upload a library to Databricks that already exists there with the same version, a warning will be printed instructing the user to update the version if a change has been made. Without a version change the new library will not be uploaded.
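The two rules above can be sketched as a pair of predicates. This is an illustrative model, not apparate's implementation: the `Library` type and both function names are hypothetical, and the requirement that a job is only moved to a newer version within the same major version is an assumption inferred from the note that major versions signal breaking changes.

```python
from typing import Iterable, NamedTuple, Tuple


class Library(NamedTuple):
    name: str
    version: Tuple[int, int, int]  # (major, minor, patch)
    folder: str


def should_upload(new: Library, existing_in_folder: Iterable[Library]) -> bool:
    """Skip the upload when the same name and version already exist."""
    return not any(
        lib.name == new.name and lib.version == new.version
        for lib in existing_in_folder
    )


def job_can_be_updated(current: Library, new: Library, prod_folder: str) -> bool:
    """Decide whether a job using `current` may be pointed at `new`."""
    return (
        current.folder == prod_folder                # never touch jobs outside production
        and new.folder == prod_folder
        and current.name == new.name
        and current.version[0] == new.version[0]     # assumed: same major version only
        and current.version < new.version            # only move forward
    )


if __name__ == "__main__":
    prod = "/Shared/production"  # placeholder folder
    old = Library("new_library", (1, 0, 0), prod)
    new = Library("new_library", (1, 1, 0), prod)
    print(should_upload(new, [old]), job_can_be_updated(old, new, prod))
```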

Contributing

See a way for apparate to improve? We welcome contributions in the form of issues or pull requests!

Please check out the contributing page for more information.

License

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Project details

Release history

Version	Release date
2.3.0 (this version)	Jul 23, 2020
2.2.3	Jun 15, 2020
2.2.2	Feb 15, 2019
2.2.0	Dec 4, 2018
2.1.0	Oct 11, 2018
2.0.0	Oct 11, 2018

Download files

Source Distribution

apparate-2.3.0.tar.gz (13.4 kB)

Uploaded Jul 23, 2020 Source

Built Distribution

apparate-2.3.0-py3-none-any.whl (14.4 kB)

Uploaded Jul 23, 2020 Python 3

File details

Details for the file apparate-2.3.0.tar.gz.

File metadata

Download URL: apparate-2.3.0.tar.gz
Upload date: Jul 23, 2020
Size: 13.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.7.6

File hashes

Hashes for apparate-2.3.0.tar.gz
Algorithm	Hash digest
SHA256	`a8cfef3f5ed1cd19faaf9022820360369ad77924b8349d5cce515845ba5b8426`
MD5	`ecfe67e3c828900ffc798ba1f4312c83`
BLAKE2b-256	`0213678fe4d27d623b3d8d738d401e43b8eb9c9db41d5eef5ad3c4f73ba1d0bd`

File details

Details for the file apparate-2.3.0-py3-none-any.whl.

File metadata

Download URL: apparate-2.3.0-py3-none-any.whl
Upload date: Jul 23, 2020
Size: 14.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.7.6

File hashes

Hashes for apparate-2.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a0d523e0968e20741bbec2f16738a8a9603ac83d8f7fe96bc2d61d586b9d3561`
MD5	`56b9612a23ca84608ff0b71580b91e74`
BLAKE2b-256	`8428b5643b4678fdf48eb6fc44eb59cf7cc3d9d17d343265882e381dcaca3952`
