bucket-pull
A small CLI command to download a bucket directory locally.
It aims to mimic gsutil cp -r when copying from a bucket to a local path.
# download whole bucket to a local dir
bucket-pull gs://mybucketname ./mybucketname
# download a directory and all its contents to a local dir
bucket-pull gs://mybucketname/mydir ./
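As a rough illustration of how a gs:// argument could be split and how object names could map to local paths, here is a minimal sketch. The helper names split_gs_url and local_path_for are hypothetical and are not taken from bucket-pull's source. The mapping rule (keep the last segment of the copied directory) is an assumption based on the example output further down, where gs://smoss-tech-test-bucket/mydir copied to /tmp/ produces /tmp/mydir/...

```python
import posixpath
from pathlib import Path
from typing import Tuple
from urllib.parse import urlparse


def split_gs_url(url: str) -> Tuple[str, str]:
    """Split 'gs://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URL: {url!r}")
    return parsed.netloc, parsed.path.strip("/")


def local_path_for(object_name: str, prefix: str, dest: str) -> Path:
    """Map an object name to a local path, keeping the last prefix segment.

    Hypothetical rule: everything above the copied directory is dropped,
    so 'mydir/a/1.txt' copied from prefix 'mydir' lands in dest/mydir/a/1.txt.
    """
    parent = posixpath.dirname(prefix)
    relative = object_name[len(parent):].lstrip("/") if parent else object_name
    return Path(dest) / relative


if __name__ == "__main__":
    bucket, prefix = split_gs_url("gs://mybucketname/mydir")
    print(bucket, prefix)
    print(local_path_for("mydir/a/1.txt", prefix, "/tmp/"))
```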
Get it
pip install bucket-pull
https://pypi.org/project/bucket-pull/
Run it
# as cli
$ bucket-pull gs://mybucketname ./mybucketname
# as a module
$ python -m bucket_pull gs://mybucketname ./mybucketname
Auth & Permissions
The utility uses the Google SDK with Client-Provided Authentication.
To run with an exported service account (SA) key, set the GOOGLE_APPLICATION_CREDENTIALS environment variable:
GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-credentials.json bucket-pull gs://bucket/mydir /tmp/some/path
The account you connect with needs at least the storage.buckets.get permission on the bucket, which can be granted with the roles/storage.legacyBucketReader role:
gsutil iam ch serviceAccount:SERVICEACCOUNT@PROJECT.iam.gserviceaccount.com:legacyBucketReader gs://bucket
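If it is unclear which account an exported key belongs to (and therefore which account needs the role), a small pre-flight check can read it from the key file. This is a sketch under assumptions: service_account_email is a hypothetical helper, not part of bucket-pull, and it only handles service account keys, which carry "type" and "client_email" fields in their JSON.

```python
import json
import os
from typing import Mapping, Optional


def service_account_email(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the client_email of the exported key, or None if the variable is unset."""
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        # Without the variable, the client library falls back to its
        # other default credential sources.
        return None
    with open(key_path, encoding="utf-8") as handle:
        key = json.load(handle)
    if key.get("type") != "service_account":
        raise ValueError(f"{key_path} is not a service account key")
    return key["client_email"]


if __name__ == "__main__":
    print(service_account_email() or "GOOGLE_APPLICATION_CREDENTIALS is not set")
```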
Some notable differences
gsutil seems to have somewhat weird behaviour when the destination path doesn't exist:
$ gsutil cp -r gs://smoss-tech-test-bucket/mydir/ ./doesnotexist/actually/
$ echo $?
0
$ ls ./doesnotexist
ls: cannot access './doesnotexist': No such file or directory
It does not create the destination path (not that weird),
but it doesn't complain either and exits with 0.
Here bucket-pull diverges and throws an error instead.
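The stricter behaviour can be pictured as a check that runs before anything is downloaded. The sketch below is illustrative only: require_existing_dir is a hypothetical name, and the exception type bucket-pull actually raises is not stated here, so NotADirectoryError is an assumption.

```python
from pathlib import Path


def require_existing_dir(dest: str) -> Path:
    """Return dest as a Path, or fail loudly if it is not an existing directory."""
    path = Path(dest)
    if not path.is_dir():
        # Assumed error type; the point is failing instead of exiting with 0.
        raise NotADirectoryError(
            f"destination does not exist or is not a directory: {dest}"
        )
    return path


if __name__ == "__main__":
    try:
        require_existing_dir("./doesnotexist/actually/")
    except NotADirectoryError as error:
        print(f"error: {error}")
```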
Notes on multi-processing
With the -m flag we can enable multi-processing.
Here bucket-pull has opted for Python's threading module,
so this is not true parallelism and we only ever make use of one CPU.
However, since we are mostly IO bound (disk and network) there is
still some gain to be had by using multiple threads waiting for IO.
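To see why threads still help with IO-bound work, here is a self-contained sketch that replaces real downloads with time.sleep, which releases the GIL much like blocking network or disk IO does. It uses concurrent.futures.ThreadPoolExecutor (a thin layer over threading) for brevity; this is a swapped-in convenience, not necessarily how bucket-pull organises its threads, and fake_download and download_all are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name: str, seconds: float = 0.1) -> str:
    """Stand-in for a blocking download; sleeping releases the GIL like real IO."""
    time.sleep(seconds)
    return name


def download_all(names, workers: int = 1):
    """Run fake_download for every name and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps the input order even if tasks finish out of order.
        results = list(pool.map(fake_download, names))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    files = ["32mb.file", "64mb.file", "128mb.file", "a/1.txt", "a/b/2.txt"]
    _, serial = download_all(files, workers=1)
    _, threaded = download_all(files, workers=5)
    print(f"1 thread: {serial:.2f}s, 5 threads: {threaded:.2f}s")
```

With one worker the waits add up; with five workers they overlap, even though only one thread runs Python code at any moment.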
A "very" scientific comparison:
# with multithreading
time ./bucket-pull.py gs://smoss-tech-test-bucket/mydir /tmp/ -m
...
Downloading to /tmp/mydir/32mb.file
Downloading to /tmp/mydir/128mb.file
Downloading to /tmp/mydir/64mb.file
Downloading to /tmp/mydir/a/1.txt
Downloading to /tmp/mydir/a/b/2.txt
./bucket-pull.py gs://smoss-tech-test-bucket/mydir /tmp/ -m 5.86s user 5.24s system 22% cpu 50.149 total
# single thread
time ./bucket-pull.py gs://smoss-tech-test-bucket/mydir /tmp/
...
Downloading to /tmp/mydir/128mb.file
Downloading to /tmp/mydir/32mb.file
Downloading to /tmp/mydir/64mb.file
Downloading to /tmp/mydir/a/1.txt
Downloading to /tmp/mydir/a/b/2.txt
./bucket-pull.py gs://smoss-tech-test-bucket/mydir /tmp/ 4.80s user 4.38s system 12% cpu 1:13.83 total
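For a quick read of the two runs above, the wall-clock "total" fields can be converted to seconds and compared. This is plain arithmetic on the figures shown (a single measurement each, so treat it as indicative only); to_seconds is a hypothetical helper.

```python
def to_seconds(total: str) -> float:
    """Convert the shell's 'total' field ('50.149' or '1:13.83') to seconds."""
    seconds = 0.0
    for part in total.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


threaded = to_seconds("50.149")   # run with -m
single = to_seconds("1:13.83")    # run without -m

print(f"speedup: {single / threaded:.2f}x")                         # speedup: 1.47x
print(f"wall-clock time saved: {(single - threaded) / single:.0%}")  # wall-clock time saved: 32%
```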