
Pyspark tools for everyday use

Project description

pyspark-me

Pyspark and Databricks tools for everyday life

Synopsis

Create Databricks connection

# Get Databricks workspace connection
dbc = pysparkme.databricks.connect(
        bearer_token='dapixyzabcd09rasdf',
        url='https://westeurope.azuredatabricks.net')
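In practice you will not want the bearer token in source control. A minimal sketch (the `connect` call is commented out so it matches the snippet above; `DATABRICKS_BEARER_TOKEN` is the same variable the CLI section below uses):

```python
import os

# Take the bearer token from the environment instead of hard-coding it.
token = os.environ.get('DATABRICKS_BEARER_TOKEN', '')
url = 'https://westeurope.azuredatabricks.net'

# dbc = pysparkme.databricks.connect(bearer_token=token, url=url)
```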

DBFS

# Get list of items at path /FileStore
dbc.dbfs.ls('/FileStore')

# Check if file or directory exists
dbc.dbfs.exists('/path/to/heaven')

# Make a directory and its parents
dbc.dbfs.mkdirs('/path/to/heaven')

# Delete a directory recursively
dbc.dbfs.rm('/path', recursive=True)

# Read a 2048-byte block starting at offset 1024
dbc.dbfs.read('/data/movies.csv', 1024, 2048)

# Download entire file
dbc.dbfs.read_all('/data/movies.csv')
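For files too large for a single `read_all`, the offset/length form of `read` can be looped. A hypothetical sketch, assuming `read(path, offset, length)` returns the raw bytes read and an empty result signals end of file:

```python
def read_in_chunks(dbc, path, chunk_size=1024 * 1024):
    """Stream a DBFS file in fixed-size blocks (sketch, not the library's API)."""
    offset = 0
    while True:
        block = dbc.dbfs.read(path, offset, chunk_size)
        if not block:  # empty block: nothing left to read
            break
        yield block
        offset += len(block)
```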

Databricks workspace

# List root workspace directory
dbc.workspace.ls('/')

# Check if workspace item exists
dbc.workspace.exists('/explore')

# Check if workspace item is a directory
dbc.workspace.is_directory('/')
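Combining `ls` and `is_directory` gives a simple recursive walk. A hypothetical sketch, assuming `ls` returns child item paths as strings (the actual return shape may differ):

```python
def walk_workspace(dbc, path='/'):
    """Recursively yield every workspace item path under `path` (sketch)."""
    for item in dbc.workspace.ls(path):
        yield item
        if dbc.workspace.is_directory(item):
            yield from walk_workspace(dbc, item)
```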

# Export notebook in default (SOURCE) format
dbc.workspace.export('/my_notebook')

# Export notebook in HTML format
dbc.workspace.export('/my_notebook', 'HTML')
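To keep the exported notebook, write the returned content to disk. A sketch, assuming `export` returns the notebook body as bytes or text:

```python
def export_to_file(dbc, workspace_path, local_path, fmt='SOURCE'):
    """Export a notebook and save it locally (sketch, return type assumed)."""
    content = dbc.workspace.export(workspace_path, fmt)
    mode = 'wb' if isinstance(content, bytes) else 'w'
    with open(local_path, mode) as f:
        f.write(content)
```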

Databricks CLI

Get CLI help

python -m pysparkme.databricks.cli --help

Export the whole Databricks workspace into the directory explore/export. The Databricks token is taken from the DATABRICKS_BEARER_TOKEN environment variable.

python -m pysparkme.databricks.cli workspace export -o explore/export ''

DBFS

# List items on DBFS
python -m pysparkme.databricks.cli dbfs ls --json-indent 2 ''
# Download a file and print to STDOUT
python -m pysparkme.databricks.cli dbfs get ml-latest-small/movies.csv
# Recursively download an entire directory and store it locally
python -m pysparkme.databricks.cli dbfs get -o ml-local ml-latest-small

Build and publish

python setup.py sdist bdist_wheel
python -m twine upload dist/*

