file_io

A deterministic file library that makes working with files across object storage easier.

Quickstart

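# Install the latest version from GitHub... (notebook syntax; drop the leading "!" in a regular shell)
!pip install --upgrade git+https://github.com/trisongz/file_io.git
# ...or install the release from PyPI
!pip install --upgrade file-io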


from fileio import File, Auth

# The Auth object sets Google Application Default Credentials (ADC) if needed

Auth(adc='/path/to/adc.json')

'''
Recognized File Extensions

.json               - json
.jsonl/.jsonlines   - jsonlines
.csv                - csv
.tsv                - tsv with "\t" separator
.txt                - txtlines
.pkl                - pickle
.pt                 - pytorch
.tfrecords          - tensorflow
'''

# Main auto classes
File.open(filename, mode='r', auto=True, device=None) # device only applies to PyTorch files. Set auto=False to get a barebones POSIX-style file via GFile
File.save(data, filename, overwrite=False) # if overwrite=False, attempts to append to newline-delimited files
File.load(filenames, device=None) # yields one generator per file, so the files can be of different types
File.download(url, dirpath=None, filename=None, overwrite=False) # Downloads a single url
File.gdown(url, extract=True, verbose=False) # uses the gdown library to download a Google Drive file

# Main I/O methods (text mode)
File.read(filename) # 'r'
File.write(filename) # 'w'
File.append(filename) # 'a'

# Binary
File.wb(filename) # 'wb'
File.rb(filename) # 'rb'


# Batch downloaders
File.batch_download(urls, directory=None, overwrite=False) # downloads all URLs into a directory, skipping files that already exist unless overwrite=True
File.batch_gdown(urls, directory=None, extract=True, verbose=False) # downloads all Google Drive URLs into a directory

# Extension-specific methods

# .json
File.jsonload(filename)
File.jsondump(dict, filename)

# .jsonl/.jsonlines (Single File)
File.jlg(filename)
File.jlw(data, filename, mode='auto', verbose=True)

# Multifile Readers

# .jsonl/.jsonlines
File.jlgs(filenames)
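The automatic handling in File.open and File.load picks a reader from the file extension. As a rough illustration only, here is a stdlib-only sketch of that dispatch for the text and pickle formats in the list above. The function name `load_by_extension` is hypothetical; this is not the library's implementation and it does not handle object storage, PyTorch, or TensorFlow files.

```python
import csv
import json
import pickle
from pathlib import Path
from typing import Any


def load_by_extension(path: str) -> Any:
    """Hypothetical helper: choose a reader based on the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    if suffix in (".jsonl", ".jsonlines"):
        # One JSON object per non-empty line.
        with open(path, "r", encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix in (".csv", ".tsv"):
        delimiter = "\t" if suffix == ".tsv" else ","
        # newline="" lets the csv module handle line endings itself.
        with open(path, "r", newline="", encoding="utf-8") as f:
            return list(csv.reader(f, delimiter=delimiter))
    if suffix == ".txt":
        # "txtlines": a list of lines without trailing newlines.
        with open(path, "r", encoding="utf-8") as f:
            return f.read().splitlines()
    if suffix == ".pkl":
        with open(path, "rb") as f:
            return pickle.load(f)
    raise ValueError(f"Unrecognized extension: {suffix!r}")
```

Keeping the dispatch in one place means callers never pick a parser themselves, and unknown extensions fail loudly instead of being guessed.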

Upcoming Changes / APIs

  • Support for setting the JSON serializer [simdjson by default]
  • Support for Google Sheets manipulation [gspread]
  • Support for compressed files [.zst, .zip, .tar, .gz, .tar.gz]

Changelogs

June 30, 2021 - v0.1.11

  • Added dill as the default pickler, if installed
  • Added the ability to set any pickler that supports .dumps/.loads calls, via File.set_pickler(name='pickler') or File.set_pickler(function=cloudpickle)
  • Hotfix to change method to dumps/loads
  • Hotfix for the .gsutil method, which did not initialize properly
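The v0.1.11 pickler change means any object exposing .dumps/.loads can act as the serializer, with dill preferred when it is installed. A minimal sketch of that pattern, assuming a module-level default; `set_pickler`, `dumps`, and `loads` here are hypothetical stand-ins, not the code behind File.set_pickler.

```python
import importlib
import pickle
from typing import Any, Optional

# Default: prefer dill when it is installed, else fall back to stdlib pickle.
try:
    _pickler: Any = importlib.import_module("dill")
except ImportError:
    _pickler = pickle


def set_pickler(name: Optional[str] = None, function: Any = None) -> None:
    """Hypothetical: select a pickler by module name or by passing the object."""
    global _pickler
    candidate = importlib.import_module(name) if name else function
    # Anything with .dumps/.loads qualifies (e.g. pickle, dill, json).
    if not (hasattr(candidate, "dumps") and hasattr(candidate, "loads")):
        raise TypeError("pickler must provide .dumps and .loads")
    _pickler = candidate


def dumps(obj: Any) -> Any:
    """Serialize obj with the currently selected pickler."""
    return _pickler.dumps(obj)


def loads(data: Any) -> Any:
    """Deserialize data with the currently selected pickler."""
    return _pickler.loads(data)
```

Duck typing on .dumps/.loads is what makes the swap cheap: set_pickler(name='pickle') and set_pickler(function=json) both satisfy the contract, though json returns str rather than bytes.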

June 11, 2021 - v0.1.8

  • Hotfix for methods .split_file/.split_files

June 9, 2021 - v0.1.7

  • Hotfix for method .get_local
  • Hotfix for method .jlgs

May 28, 2021 - v0.1.6

  • Added a method to get the user directory
    • File.userdir

May 21, 2021 - v0.1.5

  • Added TSV/CSV Write Methods
    • File.csvwrite
    • File.tsvwrite
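File.csvwrite and File.tsvwrite differ only in their delimiter. Their exact signatures are not documented here, so the following is a stdlib sketch with hypothetical `delimited_write`, `csv_write`, and `tsv_write` functions, assuming rows arrive as a list of sequences with an optional header row.

```python
import csv
from typing import Iterable, Optional, Sequence


def delimited_write(rows: Iterable[Sequence], path: str, delimiter: str = ",",
                    header: Optional[Sequence[str]] = None) -> None:
    """Hypothetical helper: write rows to a delimited text file."""
    # newline="" lets the csv writer control line endings ("\r\n" by default).
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter=delimiter)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)


def csv_write(rows: Iterable[Sequence], path: str,
              header: Optional[Sequence[str]] = None) -> None:
    delimited_write(rows, path, ",", header)


def tsv_write(rows: Iterable[Sequence], path: str,
              header: Optional[Sequence[str]] = None) -> None:
    delimited_write(rows, path, "\t", header)
```

Routing both formats through the csv module (rather than joining strings) means fields containing the delimiter or quotes are escaped correctly.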

May 20, 2021 - v0.1.4

  • Hotfix for File.split_file(s) so it also returns the resulting filenames under the output_files key
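Per the v0.1.4 fix, File.split_file(s) report the files they create under an output_files key. A hedged sketch of splitting a newline-delimited file into N part files and returning that key; the `split_file` function here, its `num_splits` parameter, and the `.partN` naming scheme are assumptions, not the library's behavior.

```python
import math
from pathlib import Path
from typing import Dict, List


def split_file(path: str, num_splits: int) -> Dict[str, List[str]]:
    """Hypothetical: split a newline-delimited file into roughly equal parts."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    src = Path(path)
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)
    per_file = max(1, math.ceil(len(lines) / num_splits))
    output_files: List[str] = []
    for i in range(0, len(lines), per_file):
        # e.g. data.jsonl -> data.part0.jsonl, data.part1.jsonl, ...
        part = src.with_name(f"{src.stem}.part{len(output_files)}{src.suffix}")
        part.write_text("".join(lines[i:i + per_file]), encoding="utf-8")
        output_files.append(str(part))
    return {"output_files": output_files}
```

This version reads the whole file into memory, which keeps the sketch short but would not suit very large files.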

May 20, 2021 - v0.1.3

  • Fixed the Python version requirement

May 19, 2021 - v0.1.2

  • Minor Fixes
  • Added Methods for Splitting Files/Items
    • File.calc_splits
    • File.split_items
    • File.split_file
    • File.split_files
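File.calc_splits and File.split_items divide work into chunks. Their real signatures are not shown here; the sketch below assumes a simple contract (chunk sizes that differ by at most one, order preserved) with hypothetical `calc_splits(total, num_splits)` and `split_items(items, num_splits)` functions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def calc_splits(total: int, num_splits: int) -> List[int]:
    """Hypothetical: sizes of num_splits chunks that differ by at most one."""
    if num_splits < 1:
        raise ValueError("num_splits must be >= 1")
    base, extra = divmod(total, num_splits)
    # The first `extra` chunks each take one additional item.
    return [base + (1 if i < extra else 0) for i in range(num_splits)]


def split_items(items: Sequence[T], num_splits: int) -> List[List[T]]:
    """Hypothetical: split a sequence into order-preserving chunks."""
    chunks: List[List[T]] = []
    start = 0
    for size in calc_splits(len(items), num_splits):
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


# calc_splits(10, 3)                -> [4, 3, 3]
# split_items(list(range(10)), 3)   -> [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Computing sizes first and slicing second keeps the two concerns separate, so the same size calculation could also drive file splitting.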

May 12, 2021 - v0.1.1

  • Minor Fixes
  • Added Method
    • File.fmv

May 12, 2021 - v0.1.0

  • Refactored Library
  • Organized Methods
  • Added MultiThreaded Wrapper
    • from fileio import MultiThreadPipeline
  • Added gsutil wrapper method
    • File.gsutil
  • Added Methods for Yaml
    • File.yload
    • File.yloads
    • File.ydump
    • File.ydumps
    • File.yparse
  • Updated Methods for Json
    • File.jsonload
    • File.jsonloads
    • File.jsondump
    • File.jsondumps
    • File.jp
    • File.jwrite
    • File.jg
    • File.jgs
  • Updated Methods for Jsonlines
    • File.jll
    • File.jlp
    • File.jldumps
    • File.jlwrite
    • File.jlwrites
    • File.jlg
    • File.jlgs
    • File.jlload
    • File.jlw
    • File.jlsample
  • Updated Methods for Text
    • File.textload
    • File.textwrite
    • File.textread
    • File.textlist
  • Added Methods for Requests
    • File.rget
    • File.rpost
    • File.reqsess
  • Added Methods for URL Encoding/Decoding
    • File.urlencode
    • File.urldecode
  • Added Methods for Hashing
    • File.hash
    • File.checkhash
  • Added Methods to Disable/Enable TQDM
    • File.enable_progress
    • File.disable_progress
  • Added Utility Methods
    • File.cat
    • File.backup
    • File.findir
    • File.append_ext
    • File.copydir
    • File.dirglob
    • File.absdir
    • File.get_local
    • File.finalize
    • File.print
    • File.set_printer
  • Fixed/Updated Methods
    • File.isfile
    • File.download
    • File.batch_download
    • File.pexists
    • File.whichpath
    • File.copy
    • File.bcopy
  • Added TFDSIODataset
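v0.1.0 added a multithreaded wrapper (from fileio import MultiThreadPipeline), but its interface is not documented here. Rather than guess at it, the following is a generic stand-in built on concurrent.futures.ThreadPoolExecutor that applies a function to many items on a thread pool while preserving input order; `threaded_map` is a hypothetical name. Threads suit I/O-bound work such as downloads or object-storage reads, while CPU-bound work stays limited by the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def threaded_map(func: Callable[[T], R], items: Iterable[T],
                 max_workers: int = 8) -> List[R]:
    """Hypothetical: run func over items on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order and re-raises a call's
        # exception when that call's result is retrieved.
        return list(pool.map(func, items))
```

For example, threaded_map(some_download_function, urls) would fetch URLs concurrently and return the results in the same order as urls.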

Download files

Source Distribution

file_io-0.1.11.tar.gz (16.2 kB)

Built Distribution

file_io-0.1.11-py3-none-any.whl (18.1 kB)

File details

Details for the file file_io-0.1.11.tar.gz.

File metadata

  • Download URL: file_io-0.1.11.tar.gz
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.10

File hashes

Hashes for file_io-0.1.11.tar.gz
Algorithm Hash digest
SHA256 7463af079eab6e2e5287699fa2ebc565d4b3069ac2b18aa85530da41d89425de
MD5 dfa790df91e8b0f64966a7b58aa42528
BLAKE2b-256 e62961e3d7f41623857266b20445e88ebc115b7ff720bda60f79e468cd60c242
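The digests above can be used to check a downloaded archive before installing it. A minimal sketch using hashlib, streaming the file in chunks so large files are never read into memory at once; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare against an expected hex digest; lower() normalizes case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Usage, with the SHA256 listed above for the source distribution:
# verify("file_io-0.1.11.tar.gz",
#        "7463af079eab6e2e5287699fa2ebc565d4b3069ac2b18aa85530da41d89425de")
```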

File details

Details for the file file_io-0.1.11-py3-none-any.whl.

File metadata

  • Download URL: file_io-0.1.11-py3-none-any.whl
  • Size: 18.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.10

File hashes

Hashes for file_io-0.1.11-py3-none-any.whl
Algorithm Hash digest
SHA256 cc34e5de2c16b7c84724827485147bff54209fe836adcd1cf42093b1c59b5144
MD5 840d5176196cdb27c7207ceaf23a093e
BLAKE2b-256 d6bfa038944d971294d877610576f201a60d07ef07db369e5bf2d124590798bd
