Read GCS and local paths with the same interface, clone of tensorflow.io.gfile
blobfile
This is a standalone clone of TensorFlow's `gfile`, supporting both local paths and `gs://` (Google Cloud Storage) paths.

The main function is `BlobFile`, a replacement for `GFile`. There are also a few additional functions, `basename`, `dirname`, and `join`, which mostly do the same thing as their `os.path` namesakes, except that they also support `gs://` paths.
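The following sketch shows one way `gs://`-aware `basename`, `dirname`, and `join` could be built on top of `posixpath`: split off the `gs://bucket` root, apply the ordinary POSIX path function to the rest, and reattach the root. The `sketch_*` helpers and `_split_gcs` are hypothetical and are not blobfile's implementation. Local paths are assumed to be POSIX-style, and blobfile's own functions may handle edge cases (trailing slashes, bucket-only paths, absolute components) differently.

```python
import posixpath

GCS_PREFIX = "gs://"


def _split_gcs(path: str):
    """Split 'gs://bucket/a/b' into ('gs://bucket', 'a/b'); return None for local paths."""
    if not path.startswith(GCS_PREFIX):
        return None
    bucket, _, blob = path[len(GCS_PREFIX):].partition("/")
    return GCS_PREFIX + bucket, blob


def sketch_basename(path: str) -> str:
    parts = _split_gcs(path)
    if parts is None:
        return posixpath.basename(path)  # local path: plain POSIX behavior
    return posixpath.basename(parts[1])


def sketch_dirname(path: str) -> str:
    parts = _split_gcs(path)
    if parts is None:
        return posixpath.dirname(path)
    root, blob = parts
    parent = posixpath.dirname(blob)
    # Never strip the bucket itself: the parent of 'gs://bucket/x' is 'gs://bucket'.
    return f"{root}/{parent}" if parent else root


def sketch_join(first: str, *rest: str) -> str:
    parts = _split_gcs(first)
    if parts is None:
        return posixpath.join(first, *rest)
    root, blob = parts
    # Join the object-name portion with POSIX rules, then reattach the bucket root.
    return f"{root}/{posixpath.join(blob, *rest)}"


print(sketch_join("gs://my-bucket-name", "cats", "tabby.txt"))  # gs://my-bucket-name/cats/tabby.txt
print(sketch_dirname("gs://my-bucket-name/cats/tabby.txt"))     # gs://my-bucket-name/cats
print(sketch_basename("gs://my-bucket-name/cats/tabby.txt"))    # tabby.txt
```

The design choice here is to reuse the standard library's POSIX rules for everything after the bucket name, since object names on GCS use `/` as a separator just like POSIX paths.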
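By default, reads copy the entire source file when the `BlobFile` is created, and writes copy the entire file when `close()` is called. Pass `streaming=True` to `BlobFile` to stream reads and writes instead. GCS files are written in large chunks, though, so be careful if you write a log file this way: the most recent data may still be buffered rather than written to GCS, so the end of the file could be truncated.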
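To see why chunked writes can leave the end of a file missing, here is a toy, in-memory model of a writer that only uploads complete chunks and sends the final partial chunk on `close()`. The class name `ChunkedUploadSketch` and the 8-byte chunk size are hypothetical, chosen for illustration; this is not blobfile's implementation, and real upload chunks are far larger.

```python
class ChunkedUploadSketch:
    """Toy model of a streaming writer that uploads data in fixed-size chunks.

    Hypothetical illustration only: `remote` stands in for the object on GCS,
    and nothing is actually uploaded anywhere.
    """

    def __init__(self, chunk_size: int) -> None:
        self.chunk_size = chunk_size
        self.buffer = bytearray()  # data written but not yet "uploaded"
        self.remote = bytearray()  # data that has reached the (simulated) store

    def write(self, data: bytes) -> None:
        self.buffer += data
        # Upload only complete chunks; any partial chunk stays buffered.
        while len(self.buffer) >= self.chunk_size:
            self.remote += self.buffer[: self.chunk_size]
            del self.buffer[: self.chunk_size]

    def close(self) -> None:
        # The final partial chunk is only sent when the file is closed.
        self.remote += self.buffer
        self.buffer.clear()


w = ChunkedUploadSketch(chunk_size=8)
w.write(b"step 1\n")
w.write(b"step 2\n")
print(bytes(w.remote))  # b'step 1\ns' (only the first full chunk is "uploaded")
w.close()
print(bytes(w.remote))  # b'step 1\nstep 2\n' (the full log exists only after close())
```

Until `close()` runs, the simulated remote copy ends partway through the second line, which is the kind of truncated tail the warning above refers to.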
Example usage:
```python
import blobfile as bf

# Write bytes to an object on GCS, just as you would with open()
with bf.BlobFile("gs://my-bucket-name/cats", "wb") as w:
    w.write(b"meow!")
```
Here are the functions:
* `BlobFile` - like `open()` but works with `gs://` paths too
* `copy` - copy a file from one path to another
* `exists` - returns `True` if the file or directory exists
* `glob` - return files matching a pattern; on GCS this only supports the `*` operator, and it can be slow if the `*` appears early in the pattern, since GCS can only do prefix matches and all additional filtering must happen locally
* `isdir` - returns `True` if the path is a directory
* `listdir` - list the contents of a directory
* `makedirs` - ensure that a directory and all parent directories exist
* `remove` - remove a file
* `rename` - move a file from one path to another (source and destination must both be local or both be on GCS); not atomic on GCS
* `copytree` - copy a directory tree from one path to another
* `rmtree` - remove a directory tree
* `stat` - get the size and modification time of a file
* `walk` - walk a directory tree, yielding `(dirpath, dirnames, filenames)` tuples
* `basename` - get the final component of a path
* `dirname` - get the path except for the final component
* `join` - join 2 or more paths together, inserting directory separators between components
* `cache_key` - returns a cache key that can be used for the path (this is not guaranteed to change when the content changes, though it is intended to)
* `get_url` - returns a URL for a path
* `md5` - get the MD5 hash for a path; this is fast for GCS, but may be slow for other backends
* `set_log_callback` - set a log callback function `log(msg: str)` to use instead of printing to stdout
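The `glob` caveat follows from how prefix-only listing works. This sketch models it on an in-memory list of object names: only the literal text before the first `*` can be used as a listing prefix, so every name under that prefix has to be listed and then filtered locally. `sketch_glob` and its helpers are hypothetical, and the rule that `*` does not match `/` is an assumption of this sketch rather than documented blobfile behavior.

```python
import re


def literal_prefix(pattern: str) -> str:
    """Text before the first '*': the only part a prefix-only listing API can use."""
    star = pattern.find("*")
    return pattern if star == -1 else pattern[:star]


def pattern_to_regex(pattern: str):
    # Assumption of this sketch: '*' matches any run of characters except '/'.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile("[^/]*".join(pieces) + r"\Z")


def sketch_glob(object_names, pattern):
    """Return (matches, number_of_names_listed) for a pattern over a flat list of names."""
    prefix = literal_prefix(pattern)
    # Step 1 (server side): a prefix listing returns every name under the prefix.
    listed = [name for name in object_names if name.startswith(prefix)]
    # Step 2 (client side): the rest of the pattern is checked locally.
    regex = pattern_to_regex(pattern)
    matches = [name for name in listed if regex.match(name)]
    return matches, len(listed)


names = ["cats/tabby.txt", "cats/siamese.txt", "cats/notes.md", "dogs/beagle.txt"]
print(sketch_glob(names, "cats/*.txt"))    # (['cats/tabby.txt', 'cats/siamese.txt'], 3)
print(sketch_glob(names, "*/beagle.txt"))  # (['dogs/beagle.txt'], 4)
```

With `cats/*.txt` only the three names under `cats/` are listed, but with `*/beagle.txt` the usable prefix is empty, so every object must be listed before a single match is found. On a large bucket, that difference is why an early `*` is slow.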