hfsync
Sync Hugging Face Transformers models to cloud storage (GCS and S3, with more providers coming soon)
Quickstart
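# Install the latest version from GitHub (the leading ! is notebook syntax; drop it in a terminal)
!pip install --upgrade git+https://github.com/trisongz/hfsync.git
# Or install the published release from PyPI
!pip install --upgrade hfsync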
Usage
from hfsync import GCSAuth, S3Auth # AZAuth (not yet supported)
from hfsync import Sync
# Note: create only one auth client, matching your cloud provider.
# Pick up credentials implicitly from environment variables
auth_client = GCSAuth()
auth_client = S3Auth()
# Or pass credentials directly / explicitly
auth_client = GCSAuth(service_account='service_account', token='gcs_token')
auth_client = S3Auth(access_key='access_key', secret_key='secret_key', session_token='token')
# Local and Cloud Paths
local_path = '/content/model'
cloud_path = 'gs://bucket/model/experiment' # or 's3://bucket/model/experiment'
sync_client = Sync(local_path=local_path, cloud_path=cloud_path, auth_client=auth_client)
# Or omit auth_client and let Sync work out the provider from your cloud path
sync_client = Sync(local_path=local_path, cloud_path=cloud_path)
# After the training loop, save your model locally and sync it to the cloud in one call.
# model and tokenizer are your trained Transformers objects; there is no need to call
# model.save_pretrained(path) yourself, since this method does that automatically.
results = sync_client.save_pretrained(model, tokenizer)
# results = {
# '/content/model/pytorch_model.bin': 'gs://bucket/model/experiment/pytorch_model.bin'
# ...
# }
# Pull files down from your bucket to the local path
results = sync_client.sync_to_local(overwrite=False)
# results = {
# 'gs://bucket/model/experiment/pytorch_model.bin': '/content/model/pytorch_model.bin'
# ...
# }
# Or pass explicit paths when they change, for example a different directory for each checkpoint
new_local = '/content/model2'
results = sync_client.sync_to_local(local_path=new_local, cloud_path=cloud_path, overwrite=False)
# results = {
# 'gs://bucket/model/experiment/pytorch_model.bin': '/content/model2/pytorch_model.bin'
# ...
# }
# Or update the paths stored on the client
new_cloud = 'gs://bucket/model/experiment2'
sync_client.set_paths(local_path=new_local, cloud_path=new_cloud)
results = sync_client.save_pretrained(model, tokenizer)
# results = {
# '/content/model2/pytorch_model.bin': 'gs://bucket/model/experiment2/pytorch_model.bin'
# ...
# }
# You can also use the underlying filesystem to copy a single file directly,
# either implicitly (to the configured path) or explicitly (to a given dest)
filename = '/content/model2/pytorch_model.bin'
sync_client.copy(filename) # Copies to the set cloud_path variable -> 'gs://bucket/model/experiment2/pytorch_model.bin'
sync_client.copy(filename, dest='gs://bucket/model/experiment3') # Copies to dest variable -> 'gs://bucket/model/experiment3/pytorch_model.bin'
sync_client.copy(filename, dest='gs://bucket/model/experiment3/model.bin') # Copies to dest variable -> 'gs://bucket/model/experiment3/model.bin'
filename = 'gs://bucket/model/experiment2/pytorch_model.bin'
sync_client.copy(filename) # Copies to the set local_path variable -> '/content/model2/pytorch_model.bin'
sync_client.copy(filename, dest='/content/model3') # Copies to dest variable -> '/content/model3/pytorch_model.bin'
sync_client.copy(filename, dest='/content/model3/model.bin') # Copies to dest variable -> '/content/model3/model.bin'
# Copy with an explicit source and destination
src_file = '/content/mydataset.pb'
dest_file = 's3://bucket/data/dataset.pb'
sync_client.copy(src_file, dest_file)
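The copy() examples above follow a simple destination rule: with no dest, the file lands in the client's configured path; a dest that looks like a directory keeps the source file name; a dest that looks like a file path is used as-is. The sketch below reproduces that rule with the standard library only. It is not hfsync's implementation: `resolve_dest`, `is_dir_like` and `map_files` are hypothetical helpers, and treating a last path component without a '.' as a directory is an assumed heuristic.

```python
import posixpath
from typing import Dict, Iterable, Optional


def is_dir_like(path: str) -> bool:
    """Assumed heuristic: a trailing '/' or a last component without '.' means a directory."""
    if path.endswith("/"):
        return True
    last = path.rsplit("/", 1)[-1]
    return "." not in last


def resolve_dest(src: str, dest: Optional[str], default_dir: str) -> str:
    """Return where `src` should be copied, following the copy() examples above."""
    filename = posixpath.basename(src)
    if dest is None:
        # No dest: fall back to the directory configured on the client.
        return posixpath.join(default_dir, filename)
    if is_dir_like(dest):
        # dest names a directory: keep the source file name.
        return posixpath.join(dest, filename)
    # dest names a file: use it unchanged (this also allows renaming).
    return dest


def map_files(files: Iterable[str], dest_dir: str) -> Dict[str, str]:
    """Build a {source: destination} dict shaped like the `results` comments (flat directory assumed)."""
    return {src: resolve_dest(src, None, dest_dir) for src in files}
```

For example, `resolve_dest('/content/model2/pytorch_model.bin', 'gs://bucket/model/experiment3', 'unused')` yields `'gs://bucket/model/experiment3/pytorch_model.bin'`. The dot heuristic misfires on directory names that contain dots (such as a bucket named my.bucket), which is why a trailing '/' is honored as an explicit directory marker.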
Environment Variables Used
Google Cloud Storage
GOOGLE_API_TOKEN
GOOGLE_APPLICATION_CREDENTIALS
AWS S3
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_SESSION_TOKEN
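When no auth_client is passed, Sync works out the provider from the cloud path, and the auth classes can read the variables listed above. A minimal sketch of that lookup, assuming the provider is chosen from the URL scheme (gs:// or s3://) and that only variables actually set are passed on; `infer_provider` and `credentials_from_env` are hypothetical names, not part of hfsync.

```python
import os
from typing import Dict, List, Mapping, Optional
from urllib.parse import urlparse

# Environment variables listed above, grouped by provider.
ENV_VARS: Dict[str, List[str]] = {
    "gcs": ["GOOGLE_API_TOKEN", "GOOGLE_APPLICATION_CREDENTIALS"],
    "s3": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN"],
}

# URL scheme -> provider key (assumption: only gs:// and s3:// are handled).
SCHEMES: Dict[str, str] = {"gs": "gcs", "s3": "s3"}


def infer_provider(cloud_path: str) -> str:
    """Map a cloud path such as 'gs://bucket/model' to a provider key."""
    scheme = urlparse(cloud_path).scheme
    if scheme not in SCHEMES:
        raise ValueError(f"unsupported cloud path: {cloud_path!r}")
    return SCHEMES[scheme]


def credentials_from_env(cloud_path: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the provider's variables that are actually set in `env` (os.environ by default)."""
    env = os.environ if env is None else env
    provider = infer_provider(cloud_path)
    return {name: env[name] for name in ENV_VARS[provider] if name in env}
```

Accepting `env` as a parameter lets you test the lookup with a plain dict instead of mutating the real process environment.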
Limitations
Where an overwrite=False option is available, the library tries to check for existing files before overwriting them, but this check is not yet supported consistently across all cloud providers.
Support for other cloud filesystems is a work in progress.
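The overwrite=False behavior amounts to checking each destination before transferring it. A minimal sketch of that check for a cloud-to-local sync, assuming remote files are listed as full gs:// or s3:// URLs under the cloud path and that local paths are POSIX-style; `plan_sync` and its `exists` parameter are hypothetical, not hfsync's API.

```python
import os
import posixpath
from typing import Callable, Dict, Iterable


def plan_sync(
    remote_files: Iterable[str],
    cloud_root: str,
    local_root: str,
    overwrite: bool = False,
    exists: Callable[[str], bool] = os.path.exists,
) -> Dict[str, str]:
    """Return {remote: local} for files to download, skipping existing ones unless overwrite=True."""
    prefix = cloud_root.rstrip("/") + "/"
    plan: Dict[str, str] = {}
    for remote in remote_files:
        if not remote.startswith(prefix):
            raise ValueError(f"{remote!r} is not under {cloud_root!r}")
        # Keep the path relative to the cloud root so subdirectories are preserved.
        local = posixpath.join(local_root, remote[len(prefix):])
        if not overwrite and exists(local):
            continue  # keep the existing local copy
        plan[remote] = local
    return plan
```

Passing `exists` in keeps the planner testable without touching the disk. Because the check and the later transfer are separate steps, a file created in between would still be overwritten, so a plan like this is a best-effort guard.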
Credits
Download files
Source Distribution
hfsync-0.0.2.tar.gz (5.2 kB)
Built Distribution
hfsync-0.0.2-py3-none-any.whl (6.2 kB)
File details
Details for the file hfsync-0.0.2.tar.gz
File metadata
- Download URL: hfsync-0.0.2.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0047d2d7ced343f9ea8191dc796bfe6c563e563e4c941eff9744c4524a995d77
MD5 | 0770b8600572bb1e0f0446a7020bad42
BLAKE2b-256 | f4b14f8cdaa71fa0ea81af34d08ee745585e7a304b627ffe277b1638323ec9b9
File details
Details for the file hfsync-0.0.2-py3-none-any.whl
File metadata
- Download URL: hfsync-0.0.2-py3-none-any.whl
- Upload date:
- Size: 6.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 86761a61213cd4e280d18a94dacd74bdfcf0204dbd57fa87ce48defa6016318d
MD5 | 5cc95b852cad7feb3cba2155202df14e
BLAKE2b-256 | dd485140adfd0f0f3d14bc5a846dc34c2aace9debf2d42fa64f78d51c3208b00
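To confirm that a downloaded distribution matches the SHA256 digest listed above, hash the file in chunks with Python's hashlib. A minimal sketch; `sha256_of` is a hypothetical helper name, and the path is wherever you saved the download.

```python
import hashlib

# SHA256 digests copied from the tables above.
EXPECTED = {
    "hfsync-0.0.2.tar.gz": "0047d2d7ced343f9ea8191dc796bfe6c563e563e4c941eff9744c4524a995d77",
    "hfsync-0.0.2-py3-none-any.whl": "86761a61213cd4e280d18a94dacd74bdfcf0204dbd57fa87ce48defa6016318d",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, `sha256_of("hfsync-0.0.2.tar.gz") == EXPECTED["hfsync-0.0.2.tar.gz"]` should hold for an intact copy of the source distribution. Reading in chunks keeps memory use constant regardless of file size.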