PyDDS: data-driven programming
dds_py - Data driven software
Data-driven software (Python implementation)
Introduction
The DDS package solves the synchronization problem between code and data. It allows programmers, scientists and data scientists to integrate code with data and data with code without fear of stale data, disparate storage frameworks or concurrency issues. DDS allows quick collaboration and data software reuse without the complexity. In short, you do not have to think about changes in your data pipelines.
How to use
This package is not published on PyPI yet. To use the latest version, run:
pip install -U git+https://github.com/tjhunter/dds_py
This package is known to work on Python 3.6, 3.7 and 3.8. No other versions are officially supported, although Python 3.4 and 3.5 might work.
Plotting dependencies: If you want to plot the graph of data dependencies, you must separately install the pydotplus package, which requires graphviz on your system to work properly. Consult the documentation of the pydotplus package for more details. The pydotplus package is only required with the dds_export_graph option.
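Before relying on the graph export, it can help to confirm that both optional pieces are present. The following is a minimal sketch using only the Python standard library; the helper name plotting_dependencies_status is hypothetical and is not part of dds_py, and it assumes that graphviz exposes its usual dot executable on the PATH.

```python
import importlib.util
import shutil
from typing import Dict


def plotting_dependencies_status() -> Dict[str, bool]:
    """Report whether the optional plotting dependencies can be found.

    Hypothetical helper for illustration; it is not part of dds_py.
    """
    return {
        # pydotplus is the Python package named in the paragraph above.
        "pydotplus": importlib.util.find_spec("pydotplus") is not None,
        # Assumption: graphviz is installed when its `dot` executable is on the PATH.
        "graphviz_dot": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, found in plotting_dependencies_status().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If either entry is reported as missing, install that piece before using the dds_export_graph option.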
Databricks users: If you want to use this package with Databricks, some specific hooks for Spark are available. See this notebook for a complete example:
Example
In the world of data-driven software, executing code leads to the creation of data artifacts, which can be of any sort and shape that the work requires:
- datasets: collections of data items
- models: compact representations of datasets for specific tasks (classification, ...)
- insights: information about datasets and models that provide human-relatable cues about other artifacts
Combining software with data is currently a hard challenge, because existing programming paradigms aim at being universal and are not tuned to the specific challenges of combining data and code within a single product. DDS provides the low-level foundations to do that, in the spirit of Karpathy's Software 2.0 directions (TODO: cite). dds_py is a software implementation of these ideas.
Here is the Hello world example (using type annotations for clarity):
    import dds
    import requests


    @dds.dds_function("/hello_data")
    def data() -> str:
        # Download a small weather dataset (CSV) and return it as text.
        url = "https://gist.githubusercontent.com/bigsnarfdude/515849391ad37fe593997fe0db98afaa/raw/f663366d17b7d05de61a145bbce7b2b961b3b07f/weather.csv"
        # Note: verify=False disables TLS certificate verification.
        return requests.get(url=url, verify=False).content.decode("utf-8")


    data()
This example does the following:
- it defines a source of data, here a piece of weather data from the internet. This source is defined as the function data
- it calls this function, which returns the data produced by the source and also stores it at a path in a storage system (/hello_data)
The DDS library guarantees the following after evaluation of the code:
- the path /hello_data contains a copy of the data returned by data, as if the function data had been called at this moment
- the function data is only evaluated when its inputs, or its code, are modified (referential transparency)
Programming model
This model has profound consequences for programmers:
- computationally expensive data functions (such as building models) can be composed and built upon very cheaply, as if they were variables. DDS alleviates the need to decompose data pipelines into multiple stages because of technological requirements.
At its core, the programming model of DDS is very simple:
- functions are assumed to be idempotent, if not pure
- functions are referentially transparent (they can be replaced with their output)
- artifacts of any sort (models, data, statistics) are stored in a central repository
- the programming model is assumed to be hermetic (only the I/O tracked within the framework is expected to happen)
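The distinction between pure and idempotent functions in the list above can be made concrete with a small sketch. The names used here (repository, double_all, store_doubled, append_doubled) are hypothetical and unrelated to the dds_py API; the dictionary merely stands in for external storage.

```python
from typing import Dict, List

# Hypothetical in-memory "repository" standing in for external storage.
repository: Dict[str, List[int]] = {}


def double_all(values: List[int]) -> List[int]:
    """Pure: the result depends only on the input and nothing else is touched."""
    return [2 * v for v in values]


def store_doubled(path: str, values: List[int]) -> List[int]:
    """Idempotent but not pure: it writes to the repository, yet repeating
    the call leaves the repository in the same state."""
    repository[path] = double_all(values)
    return repository[path]


def append_doubled(path: str, values: List[int]) -> List[int]:
    """Neither pure nor idempotent: every call grows the stored list."""
    repository.setdefault(path, []).extend(double_all(values))
    return repository[path]


# Running the idempotent function twice gives the same stored state as running it once.
store_doubled("/numbers", [1, 2])
store_doubled("/numbers", [1, 2])
idempotent_state = list(repository["/numbers"])

# Running the non-idempotent function twice changes the stored state each time.
append_doubled("/log", [1, 2])
append_doubled("/log", [1, 2])
appended_state = list(repository["/log"])

print(idempotent_state)  # prints: [2, 4]
print(appended_state)    # prints: [2, 4, 2, 4]
```

Functions shaped like append_doubled fall outside the assumptions listed above, because replacing a call with its stored output would not reproduce the state that a rerun creates.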