Interactive command line client for Databricks DBFS

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Introduction

fastdbfs is an interactive command line client for Databricks DBFS API with the aim of being fast, friendly and feature rich.

fastdbfs is still in alpha state. Anything may change at this point. You should expect bugs and even data corruption or data lost at times.

Comments, bug reports, ideas and patches or pull requests are very welcome.

Installation

As development of fastdbfs progresses, from time to time and at points where it is considered to be more or less stable, releases on PyPi are done.

Those versions can be installed using pip as follows:

pip install fastdbfs

But at this early development stage, getting fastdbfs directly from GitHub is still probably a better idea, even if it may be broken at times, you will enjoy newer features.

You can do it as follows:

git clone https://github.com/salva/fastdbfs.git
cd fastdbfs
python setup.py install --user

Usage

Once the program is installed, just invoke it from the command line as fastdbfs.

I don't recomend at this point the usage of the python modules directly as the interfaces are not stable yet.

Configuration

fastdbfs reads its configuration from ~/.config/fastdbfs and ~/.databrickscfg (on windows that translates to something like C:\Users\migueldcsa\.config\fastdbfs and C:\Users\migueldcsa\.databrickscfg respectively).

The official Databricks documentation contains instructions on how to create a token and how to obtain the cluster ID.

Sample configuration

[fastdbfs]
pager=less
editor=vi

[logging]
filename=/tmp/fastdbfs.log
level=INFO

[DEFAULT]
host = https://westeurope.azuredatabricks.net
token = dapi1234567890abcdef1234567890abcdef
cluster_id = 0111-222222-yoooh666

[other-environment]
host = https://westeurope.azuredatabricks.net
token = dapi0123456789abcdef0123456789abcdef
cluster_id = 0222-222222-yoooh666

A commented version of this configuration file is available here.

Commands

Once fastdbfs is launched, a prompt appears and the following commands can be used:

`open [profile]`

Sets the active Databricks profile used for communicating.

By default it uses DEFAULT.

`cd [directory]`

Changes the remote current directory.

`lcd [directory]`

Sets the local working directory.

`lpwd`

Shows the working local directory

`ls [OPTS] [directory]`:

List the contents of the remote directory.

The supported options are as follows:

-l, --long: Include file properties size and modification time.
-h, --human: Print file sizes in a human friendly manner.

`find [OPTS] [RULES] [directory]`

List files recursively according to a set of rules.

Supported options are as follows:

--nowarn: Do not report errors happening while walking the file system tree.
-l, --ls: Display file and directory properties.
-h, --human: Show sizes like 1K, 210M, 4G, etc.

In addition to those, find also supports the following set of options for selectively picking the files that should be displayed:
--min-size=SIZE, --max-size=SIZE: Picks files according to their size (this rule is ignored for directories).
--newer-than=date, --older-than=date: Picks entries according to their modification time.
--name=GLOB_PATTERN: Picks only files and directories whose basename matches the given glob pattern.
--iname=GLOB_PATTERN: Case insensitive version of name.
--re=REGULAR_EXPRESSION: Picks only entries whose basename matches the given regular expression.
--ire=REGULAR_EXPRESSION: Case insensitive version of re.
--wholere=REGULAR_EXPRESSION: Picks file names whose relative path matches the given regular expression.
--iwholere=REGULAR_EXPRESSION: Case insensitive version of wholere.
--external-filter=CMD: Filters entries using an external command.

Also, any of the rules above can be negated preceding it by --exclude, for instance --exclude-iname=*.jpeg

The rules above never cut the file system traversal. So for instance, a rule discarding some subdirectory, doesn't preclude that subdirectory for being traversed and its child entries picked.

The basename is the last component of a patch. Relative paths are considered relative to the root directory passed as an argument.

Example:

find --ls --newer-than=yesterday --max-size=100K --iname=*.jpg

`put src [dest]`

Copies the given local file to the remote system.

Supported options are:

-o, --overwrite: When a file already exists at the target location, it is overwritten.

`get [OPTS] src [dest]`

Copies the given remote file to the local system.

Supported options are as follows:

-o, --overwrite: When a file already exists at the target location, it is overwritten.

`rget [OPTS] [RULES] [src [target]]`

Copies the given remote directory to the local system recursively.

The options supported are as follows:

-v, --verbose: Display the names of the files being copied.
--nowarn: Do not show warnings.
-o, --overwrite: Overwrite existing files.
--sync: Copy only files that have changed.

Before transferring a file it is checked whether a local file already exists at the destination, if it is as new as the remote one and if the sizes are the same. The download is skipped whan all these conditions are true.

In addition to those, rget also accepts the same set of predicates as find for selecting the entries to copy.

Examples:

rget --iname *.png --exclude-iwholere=/temp/ . /tmp/pngs

`rput [OPTS] [src [dest]]`

Copies the given local directory to the remote system recursively.

Supported options are:

-o, --overwrite: Overwrites any previously existent remote file.

`rm [OPTS] path`

Delete the remote file or directory.

Supported options are as follows:

-R, --recursive: Delete files and directories recursively.

`mkdir dir`

Creates the directory (and any required parents).

`mkcd dir`

Creates the directory and sets it as the working directory.

`cat file`

Prints the contents of the remote file.

`show [OPTS] file`

Shows the contents of the remote file using the configured pager (for instance, more).

The supported options are as follows:

--pager=PAGER: Picks the pager for displaying the file contents.

The pager can be configured adding an entry pager inside the fastdbfs section in the configuration file. less is used by default.

The commands batcat, less and more are shortcuts that use the corresponding pager.

`edit [OPTS] file`

Retrieves the remote file and opens it using your favorite editor.

Once the editor is closed, the file is copied back to the remote system

The supported options are as follows:

-n, --new: Creates a new file.
--editor=EDITOR: Picks the editor.

By default fastdbfs picks the editor from the environment variable EDITOR. It can also be customized in the configuration file creating an editor entry inside the fastdbfs section.

The commands vi and mg are shortcuts for edit that will use the corresponding editors.

`!cmd ...`

Runs the given command locally.

`exit`

Exits the client.

External filters

TODO

Limitations

Development is primarily done on Linux and only from time to time is fastdbfs tested on Windows. Don't hesitate to report bugs related to this.
The DBFS API has some limitations that fastdbfs can not overcome:
- Directory listings timeout after 1min.
- The API has a throttling mechanism that slows down operations that require a high number of calls (i.e. find, rput, rget).
- The methods provided for uploading data are too simplistic. They can not be parallelized and in some edge cases transfers may become corrupted (fastdbfs tries to protect against that).
- The metadata available is limited to file size and modification time.
Glob expression checking is done using python fnmatch module that only supports a very small set of patterns. Specifically, it lacks support for alternations as in *.{jpeg,jpg,png}.

TODO

Add more commands: mget, mput, glob, etc.
Improve command line parsing allowing for command flags, pipes, redirections, etc.
Autocomplete.
Make history persistent between sessions.
Allow passing commands when the program is launched (for instance, setting the default profile).
Catch C-c during long tasks and perform an orderly cleanup (i.e. remove temporary files).
Improve code quality.

Development and support

The source code for this program is available from https://github.com/salva/fastdbfs.

You can also use the GitHub bug-tracker to report any issues.

Copyright

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

This version

0.5

May 8, 2021

0.4

May 7, 2021

0.3

May 6, 2021

0.2

Apr 30, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fastdbfs-0.5.tar.gz (38.7 kB view hashes)

Uploaded May 8, 2021 Source

Built Distribution

fastdbfs-0.5-py3-none-any.whl (37.0 kB view hashes)

Uploaded May 8, 2021 Python 3

Hashes for fastdbfs-0.5.tar.gz

Hashes for fastdbfs-0.5.tar.gz
Algorithm	Hash digest
SHA256	`a2afa59fe250a7f3e73a83d8cf5369bb791af34ff87b381e71b19952e7e14cff`
MD5	`0b269debf7add5c2b6c610a6d03f5bf0`
BLAKE2b-256	`3011c547542a6d6ddb4a10ae122cb8ec618f390145085555835450f1644dff19`

Hashes for fastdbfs-0.5-py3-none-any.whl

Hashes for fastdbfs-0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f82177449c55ac47ead6a5a5f7f83331009e853392e6a9f8c6d5faaacfc3aecd`
MD5	`23754c44daec128151846a7a5fe936a6`
BLAKE2b-256	`9c73cf606a260eb230387713e8bb088fa76996403c150bf2a8c1b42be69aa3c2`

fastdbfs 0.5

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Project description

Introduction

Installation

Usage

Configuration

Sample configuration

Commands

open [profile]

cd [directory]

lcd [directory]

lpwd

ls [OPTS] [directory]:

find [OPTS] [RULES] [directory]

put src [dest]

get [OPTS] src [dest]

rget [OPTS] [RULES] [src [target]]

rput [OPTS] [src [dest]]

rm [OPTS] path

mkdir dir

mkcd dir

cat file

show [OPTS] file

edit [OPTS] file

!cmd ...

exit

External filters

Limitations

TODO

Development and support

See also

Copyright

Project details

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

`open [profile]`

`cd [directory]`

`lcd [directory]`

`lpwd`

`ls [OPTS] [directory]`:

`find [OPTS] [RULES] [directory]`

`put src [dest]`

`get [OPTS] src [dest]`

`rget [OPTS] [RULES] [src [target]]`

`rput [OPTS] [src [dest]]`

`rm [OPTS] path`

`mkdir dir`

`mkcd dir`

`cat file`

`show [OPTS] file`

`edit [OPTS] file`

`!cmd ...`

`exit`