A Python class that supports Data Science projects.

Project description

resumableds

resumableds helps you write data science scripts with save/resume functionality.

Data can be saved and resumed, avoiding unnecessary retrievals of raw data from data storage.

The data directory structure is inspired by cookiecutter-data-science (https://drivendata.github.io/cookiecutter-data-science/).

The class also supports the statement 'Analysis is a DAG' (https://drivendata.github.io/cookiecutter-data-science/#analysis-is-a-dag).

resumableds is written in pure Python and is intended to be used within Jupyter notebooks. However, it can also be useful in Python scripts or script pipelines.

Example

import pandas as pd
# Assumes RdsProject has already been imported from the resumableds package.

proj1 = RdsProject('project1')  # create the project object (creates the directory if it doesn't exist yet)
proj1.raw.df1 = pd.DataFrame()  # store a DataFrame as an attribute of proj1.raw (RdsFs 'raw')
proj1.defs.variable1 = 'foo'  # store a simple object as an attribute of proj1.defs (RdsFs 'defs')
proj1.save()  # save the attributes of all RdsFs objects in proj1 to disk

This will result in the following directory structure (plus some overhead of internals):

  • <output_dir>/defs/var_variable1.pkl
  • <output_dir>/raw/df1.pkl
  • <output_dir>/raw/df1.csv

Note that pandas DataFrames are always dumped twice: as pickle for further processing and as CSV for easy exploration. The CSV files are never read back.
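The dual-dump idea described above can be sketched with plain pandas. This is only an illustration under stated assumptions, not the resumableds implementation: the helper names dump_dataframe and load_dataframe are hypothetical, and a temporary directory stands in for the project's output directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def dump_dataframe(df: pd.DataFrame, directory: Path, name: str) -> None:
    """Hypothetical helper: write df as a pickle (for reloading) and a CSV (for inspection)."""
    directory.mkdir(parents=True, exist_ok=True)
    df.to_pickle(directory / f"{name}.pkl")  # keeps dtypes and index; used for resuming
    df.to_csv(directory / f"{name}.csv")     # human-readable copy; never read back


def load_dataframe(directory: Path, name: str) -> pd.DataFrame:
    """Hypothetical helper: restore the DataFrame from the pickle only; the CSV is ignored."""
    return pd.read_pickle(directory / f"{name}.pkl")


with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"  # stand-in for <output_dir>/raw
    original = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
    dump_dataframe(original, raw_dir, "df1")
    written = sorted(p.name for p in raw_dir.iterdir())
    restored = load_dataframe(raw_dir, "df1")
```

Reading back only the pickle avoids the dtype and index guessing that a CSV round trip would involve, which is one plausible reason to keep the CSV purely for exploration.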

Later on, or in another Python session, you can do this:

proj2 = RdsProject('project1') # vars and data are read back to their original names
proj2.defs.variable1 == 'foo' # ==> True
isinstance(proj2.raw.df1, pd.DataFrame) # ==> True
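To make the save/resume round trip more concrete, here is a minimal standard-library sketch of the general pattern: each attribute is pickled to its own file and later restored under its original name. It is not how resumableds is implemented; save_namespace and resume_namespace are hypothetical helpers, and the var_ file prefix is borrowed from the directory listing above purely for illustration.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace

PREFIX = "var_"  # assumption: file prefix mirroring the listing above


def save_namespace(ns: SimpleNamespace, directory: Path) -> None:
    """Pickle every attribute of ns to its own file, named after the attribute."""
    directory.mkdir(parents=True, exist_ok=True)
    for name, value in vars(ns).items():
        with open(directory / f"{PREFIX}{name}.pkl", "wb") as fh:
            pickle.dump(value, fh)


def resume_namespace(directory: Path) -> SimpleNamespace:
    """Rebuild a namespace from the pickles found in directory."""
    ns = SimpleNamespace()
    for path in sorted(directory.glob(f"{PREFIX}*.pkl")):
        name = path.stem[len(PREFIX):]  # strip the prefix to recover the attribute name
        with open(path, "rb") as fh:
            setattr(ns, name, pickle.load(fh))
    return ns


with tempfile.TemporaryDirectory() as tmp:
    defs_dir = Path(tmp) / "defs"  # stand-in for <output_dir>/defs
    save_namespace(SimpleNamespace(variable1="foo", threshold=0.5), defs_dir)
    defs = resume_namespace(defs_dir)  # attributes come back under their original names
```

Storing one file per attribute means a later session can restore everything without knowing the attribute names in advance, since they are recovered from the file names.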

Download files

Source Distribution

resumableds-1.0.0.tar.gz (15.9 kB)

Uploaded Source

Built Distribution

resumableds-1.0.0-py3-none-any.whl (19.0 kB)

Uploaded Python 3

File details

Details for the file resumableds-1.0.0.tar.gz.

File metadata

  • Download URL: resumableds-1.0.0.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.9

File hashes

Hashes for resumableds-1.0.0.tar.gz:

  • SHA256: 50fa37de478da40a3429e440f5dd3bc7a58bbef456705ede0904035f60621413
  • MD5: 6e03e4bc5a5164bae452a221448cdf4c
  • BLAKE2b-256: 672edff19507eba62e9f48333f49acb5dee536b8e9728e90923e871d2ff71997

File details

Details for the file resumableds-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: resumableds-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 19.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.9

File hashes

Hashes for resumableds-1.0.0-py3-none-any.whl:

  • SHA256: 2915838a4545cebef2eaf4ab8385e9d9db1b84528387a611caea6dac119ce0ee
  • MD5: 0a8535107a8e6c6cc357858195ca0fb0
  • BLAKE2b-256: b84c0d19e94b6a53e55e700dcc4a36f3eb7523e4f51121f93fc79de9245085f1
