Cloud bucket indexer
Cloud Index
Indexes Google Cloud Platform / Amazon S3 buckets into JSON.
Installation
pip3 install cloudindex
Authentication
- To index a Google Cloud Storage bucket, create a GCP service account (required permissions: Storage Legacy Bucket Reader and Storage Object Viewer), then create a key for it. Put the path to the key file in the GOOGLE_APPLICATION_CREDENTIALS environment variable, or pass it on the command line with -k.
- For Amazon S3, a JSON key file is also required. Amazon doesn't supply JSON keys, so create one manually:
{ "aws_access_key_id": "KEYID", "aws_secret_access_key": "SECRETKEY" }
- Additionally, for S3 you can set the region_name and endpoint_url fields in the JSON, e.g. to connect to DigitalOcean Spaces:
{ "aws_access_key_id": "KEYID", "aws_secret_access_key": "SECRETKEY", "region_name": "nyc3", "endpoint_url": "https://nyc3.digitaloceanspaces.com" }
- Don't forget to set permissions to 600 on all your key files.
- When running on Google Cloud or Amazon EC2, the key file may be omitted.
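The S3 key file described above can be created by hand, but a short script makes it easier to get the permissions right. The following is a minimal sketch, not part of cloudindex: the helper name write_s3_key_file and the file name s3-key.json are hypothetical, and only the four JSON fields come from the text above.

```python
import json
import os
import stat
import tempfile


def write_s3_key_file(path, key_id, secret, region_name=None, endpoint_url=None):
    """Write an S3 key file in the JSON layout shown above, owner-readable only.

    Hypothetical helper for illustration; cloudindex itself only reads the file.
    """
    key = {"aws_access_key_id": key_id, "aws_secret_access_key": secret}
    # region_name and endpoint_url are optional (e.g. for S3-compatible storage).
    if region_name:
        key["region_name"] = region_name
    if endpoint_url:
        key["endpoint_url"] = endpoint_url

    # Create the file with mode 0600 from the start, so the secret is never
    # readable by other users, even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump(key, fh, indent=2)
    # Tighten permissions as well in case the file already existed.
    os.chmod(path, 0o600)
    return path


# Demo in a temporary directory; KEYID and SECRETKEY are placeholders.
demo_dir = tempfile.mkdtemp()
demo_path = write_s3_key_file(
    os.path.join(demo_dir, "s3-key.json"),
    "KEYID",
    "SECRETKEY",
    region_name="nyc3",
    endpoint_url="https://nyc3.digitaloceanspaces.com",
)
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
```

The resulting file can then be passed to the indexer with -k together with -s s3.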
Usage
CLI tool
usage: cloud-index [-h] [--version] [-p DIR] [-k FILE] [-s TYPE] [-r]
                   [-x FILE] [-T DIR] [--fetch-meta] [-c] [-M FILE:field]
                   BUCKET

Cloud Storage indexer

positional arguments:
  BUCKET                bucket to index

optional arguments:
  -h, --help            show this help message and exit
  --version             Print version and exit
  -p DIR, --prefix DIR  Bucket object prefix (default: /)
  -k FILE, --key-file FILE
                        Get CS key from file (default for gcs: from
                        GOOGLE_APPLICATION_CREDENTIALS environment variable)
  -s TYPE, --cloud-storage TYPE
                        Cloud storage type: gcs for Google, s3 for Amazon S3
                        and compatible (default: gcs)
  -r, --recursive       Recursively include "subdirectories" and their objects
  -x FILE, --exclude FILE
                        Files (masks) to exclude)
  -T DIR, --time-format DIR
                        Time format (default: %Y-%m-%d %H:%M)
  --fetch-meta          Fetch custom metadata (for S3), for GCS meta data is
                        always included
  -c, --checksums       Get checksums from md5sums, sha1sums and sha256sums
  -M FILE:field, --meta-file FILE:field
                        Additional "<info> <file>" file, e.g. SSL certs
                        fingerprints etc.
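The default value of -T, %Y-%m-%d %H:%M, looks like a strftime pattern. Assuming the indexer passes the value to strftime (the help text does not say so explicitly), this sketch shows what the default and a custom pattern would produce. The function name format_timestamp is hypothetical.

```python
from datetime import datetime

# Default taken from the help text above.
DEFAULT_TIME_FORMAT = "%Y-%m-%d %H:%M"


def format_timestamp(ts, time_format=DEFAULT_TIME_FORMAT):
    """Render a timestamp the way a strftime-based -T option would (assumption)."""
    return ts.strftime(time_format)


modified = datetime(2020, 2, 15, 23, 51, 0)
print(format_timestamp(modified))                       # 2020-02-15 23:51
print(format_timestamp(modified, "%d.%m.%Y %H:%M:%S"))  # 15.02.2020 23:51:00
```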
The "-M" option lets you include custom meta info. For example, you can display fingerprints of SSL certificates hosted in the bucket if they are stored, as "FINGERPRINT FILENAME" lines (one per line), in a file named "FINGERPRINTS" (case-insensitive):
cloud-index ..... -M FINGERPRINTS:fingerprint
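To make the "<info> <file>" layout concrete, here is a minimal sketch of how such a file could be parsed and attached to index entries. It is not the cloudindex implementation: the function names and the shape of the index dictionary are assumptions made for illustration.

```python
def parse_meta_file(text):
    """Parse "<info> <file>" lines into a {filename: info} mapping.

    Blank lines and lines without a filename are skipped.
    """
    result = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # split on the first whitespace run
        if len(parts) != 2:
            continue
        info, filename = parts
        result[filename] = info
    return result


def apply_meta(index, meta, field):
    """Store each info value under `field` for files present in `index`.

    `index` is a hypothetical {filename: attributes} dictionary.
    """
    for filename, info in meta.items():
        if filename in index:
            index[filename][field] = info
    return index


fingerprints = parse_meta_file("AA:BB:CC  server.crt\nDD:EE:FF client.crt\n")
index = apply_meta({"server.crt": {"size": 10}}, fingerprints, "fingerprint")
print(index)
```

Here "fingerprint" plays the role of the field part in -M FINGERPRINTS:fingerprint.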
If the "-c" option is given, the additional file attributes "md5", "sha1" and "sha256" appear in the index. The indexer fills them in if files named "md5sums", "sha1sums" or "sha256sums" (case-insensitive) are present in the current directory. The file format is the standard one: "CHECKSUM FILENAME" (one per line). This option is equivalent to:
cloud-index ..... -M sha256sums:sha256 -M md5sums:md5 -M sha1sums:sha1
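If the checksum files do not exist yet, they can be produced with the usual md5sum, sha1sum and sha256sum tools, or with a few lines of Python. The sketch below builds the content of such a file with hashlib and also spells out the -M arguments that "-c" stands for. The helper names are hypothetical, and the two-space separator follows the coreutils output convention rather than anything stated above.

```python
import hashlib

# Checksum file name -> index field, as in the equivalent command above.
CHECKSUM_FILES = {"sha256sums": "sha256", "md5sums": "md5", "sha1sums": "sha1"}


def build_sums_file(files, algorithm):
    """Return "CHECKSUM FILENAME" lines for a {filename: bytes} mapping."""
    lines = []
    for name in sorted(files):
        digest = hashlib.new(algorithm, files[name]).hexdigest()
        lines.append(f"{digest}  {name}")  # two spaces, as sha256sum prints
    return "\n".join(lines) + "\n"


def checksum_meta_args():
    """Expand "-c" into the equivalent list of -M FILE:field arguments."""
    args = []
    for sums_file, field in CHECKSUM_FILES.items():
        args += ["-M", f"{sums_file}:{field}"]
    return args


print(build_sums_file({"empty.bin": b""}, "sha256"), end="")
print(" ".join(checksum_meta_args()))
```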
Library
Refer to the cloudindex library pydoc for the available functions and their arguments.
Nuts and bolts
- Because buckets don't have real folders, the indexer sometimes can't determine a proper modification date or calculate a folder's size.
- All object meta variables are indexed as well. If an object has checksum meta data ("md5"/"sha1"/"sha256") and the "-c" option is used, the meta data value takes priority over the value from uploaded old-style meta info files.
- If a meta variable called "local-creation-time" is set, the object's creation date/time is overridden with it. You can set it, e.g., when copying a file to the bucket:
gsutil -h "x-goog-meta-local-creation-time:2020-02-15 23:51:00"
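The precedence rules in the notes above can be summarised in a short sketch. This is an illustration under assumptions, not the library's code: the function name resolve_attributes, the "created" key and the dictionary shapes are hypothetical, and the timestamp pattern is inferred from the example value 2020-02-15 23:51:00.

```python
from datetime import datetime

# Assumed pattern, matching the example value "2020-02-15 23:51:00".
LOCAL_CREATION_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def resolve_attributes(created, object_meta, sums_from_files):
    """Combine attribute sources for one object.

    sums_from_files: values read from md5sums/sha1sums/sha256sums files.
    object_meta:     custom meta variables stored on the object itself.
    Object meta wins over values from the checksum files.
    """
    attrs = dict(sums_from_files)  # lowest priority
    attrs.update(object_meta)      # all meta variables are indexed; they override
    override = object_meta.get("local-creation-time")
    if override:
        # The meta variable replaces the creation time reported by the bucket.
        created = datetime.strptime(override, LOCAL_CREATION_TIME_FORMAT)
    attrs["created"] = created
    return attrs


resolved = resolve_attributes(
    datetime(2021, 1, 1, 12, 0),
    {"sha256": "from-meta", "local-creation-time": "2020-02-15 23:51:00"},
    {"sha256": "from-file", "md5": "md5-from-file"},
)
print(resolved["sha256"], resolved["md5"], resolved["created"])
```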