file_io
A deterministic file library that makes working with files across object storage easier.
Quickstart
# Install the development version from GitHub
!pip install --upgrade git+https://github.com/trisongz/file_io.git
# or install the release from PyPI
!pip install --upgrade file-io
from fileio import File, Auth
# Auth sets Application Default Credentials (ADC) if needed
Auth(adc='/path/to/adc.json')
'''
Recognized File Extensions
.json - json
.jsonl/.jsonlines - jsonlines
.csv - csv
.tsv - tsv with "\t" separator
.txt - txtlines
.pkl - pickle
.pt - pytorch
.tfrecords - tensorflow
'''
# Main auto classes
File.open(filename, mode='r', auto=True, device=None) # device applies only to PyTorch files; set auto=False to get a bare-bones POSIX-style handle via GFile
File.save(data, filename, overwrite=False) # if overwrite=False, attempts to append to newline-delimited files
File.load(filenames, device=None) # yields one generator per file, so the files can be of different types
File.download(url, dirpath=None, filename=None, overwrite=False) # downloads a single URL
File.gdown(url, extract=True, verbose=False) # uses the gdown library to fetch a Google Drive file
# Main I/O methods (text mode)
File.read(filename) # 'r'
File.write(filename) # 'w'
File.append(filename) # 'a'
# Binary
File.wb(filename) # 'wb'
File.rb(filename) # 'rb'
# Batch downloaders
File.batch_download(urls, directory=None, overwrite=False) # downloads all URLs into a directory, skipping files that already exist unless overwrite=True
File.batch_gdown(urls, directory=None, extract=True, verbose=False) # downloads all Google Drive URLs into a directory
# Extension Specific
# .json
File.jsonload(filename)
File.jsondump(dict, filename)
# .jsonl/.jsonlines (Single File)
File.jlg(filename)
File.jlw(data, filename, mode='auto', verbose=True)
# Multifile Readers
# .jsonl/.jsonlines
File.jgs(filenames)
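The idea suggested by the Recognized File Extensions list, picking a parser from the file extension, can be illustrated with a small standard-library sketch. This is not file_io's implementation: EXTENSION_FORMATS, detect_format, parse_records and load are hypothetical names, and only the text formats from the list are handled (.pkl, .pt and .tfrecords are left out).

```python
import csv
import io
import json
import os
from typing import Any, Iterator

# Hypothetical extension table mirroring the "Recognized File Extensions" list;
# binary formats (.pkl, .pt, .tfrecords) are deliberately omitted.
EXTENSION_FORMATS = {
    ".json": "json",
    ".jsonl": "jsonlines",
    ".jsonlines": "jsonlines",
    ".csv": "csv",
    ".tsv": "tsv",
    ".txt": "txtlines",
}


def detect_format(filename: str) -> str:
    """Map a filename (local path or gs:// URI) to a format name by extension."""
    ext = os.path.splitext(filename)[1].lower()
    if ext not in EXTENSION_FORMATS:
        raise ValueError(f"unrecognized extension: {ext!r}")
    return EXTENSION_FORMATS[ext]


def parse_records(filename: str, text: str) -> Iterator[Any]:
    """Yield records from already-read text, parsed per the filename's format."""
    fmt = detect_format(filename)
    if fmt == "json":
        yield json.loads(text)
    elif fmt == "jsonlines":
        # One JSON document per non-blank line.
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)
    elif fmt in ("csv", "tsv"):
        delimiter = "\t" if fmt == "tsv" else ","
        yield from csv.DictReader(io.StringIO(text), delimiter=delimiter)
    else:
        # txtlines: one string per line, newline characters removed.
        yield from text.splitlines()


def load(path: str) -> Iterator[Any]:
    """Read a local text file and yield its records."""
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    yield from parse_records(path, text)
```

For example, list(parse_records("rows.jsonl", '{"a": 1}\n{"a": 2}\n')) yields the two dicts. Separating parsing from reading keeps the dispatch logic independent of where the bytes came from (local disk or object storage).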
Upcoming Changes / APIs
- Support for setting the JSON serializer [simdjson by default]
- Support for Google Sheets manipulation [gspread]
- Support for compressed files [.zst, .zip, .tar, .gz, .tar.gz]
Changelogs
July 7, 2021 - v0.1.14
- Remove the explicit TensorFlow requirement from setup, though TensorFlow is still required at the moment.
  - This may help with macOS TensorFlow installations using tensorflow-macos
July 2, 2021 - v0.1.13
- Change .textread to return a string rather than a list. .textreadlines replaces the original function.
- Update .textlist to support options for stripping newlines and applying replacements:
  - strip_newlines = True will strip all newlines prior to returning
  - replacements: [ list | dict | str ] = None will iterate through and replace
- Update .base(filename, with_ext=True) to allow returning the name without the file extension
- Add .readfile method to return the .read() API
- Add .mod_fname(filename, new_name=None, prefix=None, suffix=None, ext=None, directory=None, create_dirs=True, filename_only=False)
  src = 'gs://mybucket/path/file.txt'
  res = File.mod_fname(src, new_name='newfile', ext='json', directory='/newdir', prefix='test_', suffix='_001')
  >> res = /newdir/test_newfile_001.json
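The filename rewriting described for .mod_fname can be sketched with posixpath, which handles gs:// paths as plain forward-slash strings. This is a hypothetical re-creation written to reproduce the changelog example, not the library's code; it uses the parameter name new_name from the listed signature and leaves out create_dirs so it has no filesystem side effects.

```python
import posixpath
from typing import Optional


def mod_fname(
    filename: str,
    new_name: Optional[str] = None,
    prefix: Optional[str] = None,
    suffix: Optional[str] = None,
    ext: Optional[str] = None,
    directory: Optional[str] = None,
    filename_only: bool = False,
) -> str:
    """Hypothetical sketch: build <directory>/<prefix><name><suffix><ext>.

    Any part not supplied falls back to the corresponding part of `filename`.
    Directory creation (create_dirs) is intentionally not implemented here.
    """
    src_dir, base = posixpath.split(filename)
    stem, src_ext = posixpath.splitext(base)
    name = new_name if new_name is not None else stem
    if ext is not None:
        # Accept both "json" and ".json".
        src_ext = ext if ext.startswith(".") else "." + ext
    new_base = f"{prefix or ''}{name}{suffix or ''}{src_ext}"
    if filename_only:
        return new_base
    target_dir = directory if directory is not None else src_dir
    return posixpath.join(target_dir, new_base)
```

Calling it with the changelog arguments (src = 'gs://mybucket/path/file.txt', new_name='newfile', ext='json', directory='/newdir', prefix='test_', suffix='_001') returns '/newdir/test_newfile_001.json'; omitting directory keeps the result in gs://mybucket/path.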
June 30, 2021 - v0.1.11
- Added Dill as default pickler if installed
- Ability to set any pickle method that supports .dumps/.loads calls with File.set_pickler(name='pickler') or File.set_pickler(function=cloudpickle)
- Hotfix to change method to dumps/loads
- Hotfix for .gsutil method which did not initialize properly.
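The pluggable-pickler pattern from v0.1.11 (prefer dill when installed, accept any object exposing dumps/loads) can be sketched as below. Serializer and default_pickler are hypothetical names, not file_io's API, and the name-based lookup of File.set_pickler(name=...) is not reproduced.

```python
import pickle
from typing import Any


def default_pickler() -> Any:
    """Prefer dill when it is installed, otherwise fall back to pickle."""
    try:
        import dill  # optional dependency
        return dill
    except ImportError:
        return pickle


class Serializer:
    """Hypothetical holder for any backend exposing dumps() and loads()."""

    def __init__(self, backend: Any = None) -> None:
        self.set_pickler(backend if backend is not None else default_pickler())

    def set_pickler(self, backend: Any) -> None:
        # Duck typing: pickle, dill and cloudpickle all provide dumps/loads.
        if not (hasattr(backend, "dumps") and hasattr(backend, "loads")):
            raise TypeError("pickler must provide dumps() and loads()")
        self.backend = backend

    def dumps(self, obj: Any) -> bytes:
        return self.backend.dumps(obj)

    def loads(self, data: bytes) -> Any:
        return self.backend.loads(data)
```

Checking for dumps/loads rather than a specific type is what lets a module such as cloudpickle be swapped in, for example with Serializer().set_pickler(cloudpickle) when that package is installed.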
June 11, 2021 - v0.1.8
- Hotfix for methods .split_file/.split_files
June 9, 2021 - v0.1.7
- Hotfix for Method .get_local
- Hotfix for method .jlgs
May 28, 2021 - v0.1.6
- Added Method to get User Dir
- File.userdir
May 21, 2021 - v0.1.5
- Added TSV/CSV Write Methods
- File.csvwrite
- File.tsvwrite
May 20, 2021 - v0.1.4
- Hotfix for the file.split_file(s) methods to also return the resulting filenames under the output_files key
May 20, 2021 - v0.1.3
- Py Version Requirement Fix
May 19, 2021 - v0.1.2
- Minor Fixes
- Added Methods for Splitting Files/Items
- File.calc_splits
- File.split_items
- File.split_file
- File.split_files
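The changelog does not document what File.calc_splits and File.split_items return. The sketch below assumes they divide a list into a given number of contiguous, near-equal chunks; the functions here are hypothetical stand-ins, not the library's implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def calc_splits(num_items: int, num_splits: int) -> List[int]:
    """Return chunk sizes that differ by at most one and sum to num_items."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    base, extra = divmod(num_items, num_splits)
    # The first `extra` chunks each take one additional item.
    return [base + 1 if i < extra else base for i in range(num_splits)]


def split_items(items: Sequence[T], num_splits: int) -> List[List[T]]:
    """Split `items` into `num_splits` contiguous, near-equal chunks."""
    chunks: List[List[T]] = []
    start = 0
    for size in calc_splits(len(items), num_splits):
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks
```

Using divmod keeps the chunk sizes balanced: 10 items in 3 splits gives sizes 4, 3 and 3 rather than 3, 3 and 4 left over.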
May 12, 2021 - v0.1.1
- Minor Fixes
- Added Method
- File.fmv
May 12, 2021 - v0.1.0
- Refactored Library
- Organized Methods
- Added MultiThreaded Wrapper
from fileio import MultiThreadPipeline
- Added gsutil wrapper method
- File.gsutil
- Added Methods for Yaml
- File.yload
- File.yloads
- File.ydump
- File.ydumps
- File.yparse
- Updated Methods for Json
- File.jsonload
- File.jsonloads
- File.jsondump
- File.jsondumps
- File.jp
- File.jwrite
- File.jg
- File.jgs
- Updated Methods for Jsonlines
- File.jll
- File.jlp
- File.jldumps
- File.jlwrite
- File.jlwrites
- File.jlg
- File.jlgs
- File.jlload
- File.jlw
- File.jlsample
- Updated Methods for Text
- File.textload
- File.textwrite
- File.textread
- File.textlist
- Added Methods for Requests
- File.rget
- File.rpost
- File.reqsess
- Added Methods for URL Encoding/Decoding
- File.urlencode
- File.urldecode
- Added Methods for Hashing
- File.hash
- File.checkhash
- Added Methods to Disable/Enable TQDM
- File.enable_progress
- File.disable_progress
- Added Utility Methods
- File.cat
- File.backup
- File.findir
- File.append_ext
- File.copydir
- File.dirglob
- File.absdir
- File.get_local
- File.finalize
- File.print
- File.set_printer
- Fixed/Updated Methods
- File.isfile
- File.download
- File.batch_download
- File.pexists
- File.whichpath
- File.copy
- File.bcopy
- Added TFDSIODataset
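The README does not show the interface of MultiThreadPipeline. As an assumption-labeled illustration of what a multithreaded wrapper of this kind typically does, threaded_map below (a hypothetical name) applies a function to every item on a thread pool and keeps results in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def threaded_map(func: Callable[[T], R], items: Iterable[T], num_threads: int = 8) -> List[R]:
    """Apply `func` to each item using a thread pool, preserving input order.

    Threads suit I/O-bound work such as downloads or object-storage reads;
    CPU-bound work gains little because of the GIL.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Executor.map yields results in the order of `items`.
        return list(pool.map(func, items))
```

For instance, threaded_map(lambda x: x * x, range(5), num_threads=2) returns [0, 1, 4, 9, 16].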
Download files
File details
Details for the file file_io-0.1.14.tar.gz.
File metadata
- Download URL: file_io-0.1.14.tar.gz
- Upload date:
- Size: 17.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c770c0f4452f77981a66304cfa3681c2996d19c2f321e43bc89610464977eab2 |
| MD5 | bde33c7e2bf7bc44a9b2eb14ca5e8e2f |
| BLAKE2b-256 | 83b03214c3eac73db9b298c44cdcee82b8d5e845647919e26a52b7a6701cd673 |
File details
Details for the file file_io-0.1.14-py3-none-any.whl.
File metadata
- Download URL: file_io-0.1.14-py3-none-any.whl
- Upload date:
- Size: 19.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d6fe656665fc0f24fe12412ffd4c192af3299a9bcf7e7c2b308c492a6c5c63a0 |
| MD5 | 5b227a7305e9170f1116f9336236e8aa |
| BLAKE2b-256 | a4b6e42d38a107b0966729af0fc28f4e3a61fa78d7a935c5a4cafe1f14f4eac0 |
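A downloaded distribution file can be checked against the digests listed above using Python's hashlib. The helper names stream_digest, digest_bytes and verify_file are illustrative, not part of file_io; the file is hashed in chunks so large archives are never held fully in memory.

```python
import hashlib
import io
from typing import BinaryIO


def stream_digest(stream: BinaryIO, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks and return the hex digest."""
    h = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()


def digest_bytes(data: bytes, algorithm: str = "sha256") -> str:
    """Hash content that is already in memory, such as a response body."""
    return stream_digest(io.BytesIO(data), algorithm)


def verify_file(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Return True if the file's digest matches a published hex digest."""
    with open(path, "rb") as f:
        return stream_digest(f, algorithm) == expected_hex.strip().lower()
```

For example, verify_file("file_io-0.1.14.tar.gz", "c770c0f4452f77981a66304cfa3681c2996d19c2f321e43bc89610464977eab2") should return True for an intact source archive. The BLAKE2b-256 column corresponds to hashlib.blake2b(digest_size=32), not to hashlib's default 64-byte BLAKE2b digest.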