# Entropic

From chaos, information.
Entropic is a data pipeline framework designed to provide scientists with a simple and efficient way to access data from their experiments. You can find the full documentation here.
## Requirements

Entropic requires Python 3.9 or later.
## Installation

You can install Entropic using pip:

```shell
pip install entropic
```
## Usage

### Example

The most basic data pipeline that can be created with entropic consists of a `Pipeline` subclass that defines the directories containing the experiment results and a function used to read each result file and create a pandas DataFrame from it:
```python
import pandas as pd

from entropic.process import Pipeline
from entropic import results


class Process(Pipeline):
    source_paths = ["experiments/iteration_1", "experiments/iteration_2"]
    extract_with = pd.read_csv


p = Process()
p.run()

if __name__ == "__main__":
    for iteration in results.all:
        for sample in iteration.samples:
            print(sample.data.raw.head())
```
The main parts of this example are:

- Define your data processing class by inheriting from `Pipeline`:

  ```python
  class Process(Pipeline):
      source_paths = ["experiments/iteration_1", "experiments/iteration_2"]
      extract_with = pd.read_csv
  ```

  The `source_paths` variable points to the folders that contain the results for each iteration. Within entropic, an iteration can be thought of as a set of initial conditions for which you performed an experiment and took various samples with various results. `extract_with` defines the function that reads each sample file and creates a DataFrame from it. This example uses `pandas.read_csv`, but it can be any function you want; you can even define a custom one and pass it to `extract_with`.

- Instantiate and run the pipeline:

  ```python
  p = Process()
  p.run()
  ```

- Access your results using the `results` API:

  ```python
  if __name__ == "__main__":
      for iteration in results.all:
          for sample in iteration.samples:
              print(sample.data.raw.head())
  ```
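As a sketch of the custom-reader option mentioned above, here is a hypothetical `read_experiment_csv` function that skips `#` comment lines before parsing; the function name and the comment convention are illustrative, not part of entropic:

```python
import io

import pandas as pd


def read_experiment_csv(path_or_buffer):
    """Hypothetical custom reader: ignore '#' comment lines in result files."""
    return pd.read_csv(path_or_buffer, comment="#")


# Quick check with an in-memory file standing in for a result file:
df = read_experiment_csv(io.StringIO("# run 1\nx,t\n1.0,0.5\n2.0,1.0\n"))
print(df.shape)  # (2, 2)
```

Any callable with this shape (path in, DataFrame out) could then be assigned to `extract_with` in place of `pd.read_csv`.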
In this example, results are accessed in the same file that runs the pipeline. For performance reasons, however, you might want to split the processing and the analysis into two separate files. In that case you only need to run the processing part once, and your data will be loaded into a JSON-based database.
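One possible layout for such a split might look like this (file and folder names here are illustrative, not prescribed by entropic):

```
project/
├── process.py     # defines Process(Pipeline) and calls p.run(); run once per new data
├── analyze.py     # iterates over results.all; run as often as needed
└── experiments/
    ├── iteration_1/
    └── iteration_2/
```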
### Example upgrade

A more realistic example involves custom iterations and samples, which need custom logic for extracting, transforming, or loading them into the database.
```python
import pandas as pd

from entropic import results
from entropic.sources import BaseSample, Iteration
from entropic.process import Pipeline
from entropic.sources.fields import DataSource


class KinematicSample(BaseSample):
    data: DataSource
    speed: float = 0
    points_in_data: int = 0


class KinematicExperiment(Iteration):
    average_speed: float = 0
    sample = KinematicSample


class Process(Pipeline):
    source_paths = ["experiments/initial_condition_1"]
    iteration = KinematicExperiment

    def extract(self, source_path):
        iteration = self.get_iteration_by_path(source_path)
        for file_path in self.get_files_from_path(source_path):
            raw = pd.read_csv(file_path)
            data_source = DataSource(file_path=file_path, raw=raw)
            sample = self.get_sample()(data=data_source, points_in_data=raw.shape[0])
            iteration.upsert_sample(sample)
        return iteration

    def transform(self, iteration):
        average = 0
        for sample in iteration.samples:
            sample.speed = (sample.data.raw["x"] / sample.data.raw["t"]).mean()
            average += sample.speed
        iteration.average_speed = average / len(iteration.samples)


p = Process()
p.run()

results.set_iteration(KinematicExperiment)

if __name__ == "__main__":
    for iteration in results.all:
        print(f"Iteration average speed={iteration.average_speed}")
        for i, sample in enumerate(iteration.samples):
            print(f"Sample {i+1}")
            print(f"speed={sample.speed}")
            print(f"rows={sample.points_in_data}")
            print()
```
A few things changed from the previous example:

- Custom iteration and sample classes were created:

  ```python
  class KinematicSample(BaseSample):
      data: DataSource
      speed: float = 0
      points_in_data: int = 0


  class KinematicExperiment(Iteration):
      average_speed: float = 0
      sample = KinematicSample
  ```

- Instead of setting an `extract_with` function, an `extract` method is defined. Calculations can also be performed on a given iteration using the `transform` method:

  ```python
  class Process(Pipeline):
      source_paths = ["experiments/initial_condition_1"]
      iteration = KinematicExperiment

      def extract(self, source_path):
          iteration = self.get_iteration_by_path(source_path)
          for file_path in self.get_files_from_path(source_path):
              raw = pd.read_csv(file_path)
              data_source = DataSource(file_path=file_path, raw=raw)
              sample = self.get_sample()(data=data_source, points_in_data=raw.shape[0])
              iteration.upsert_sample(sample)
          return iteration

      def transform(self, iteration):
          average = 0
          for sample in iteration.samples:
              sample.speed = (sample.data.raw["x"] / sample.data.raw["t"]).mean()
              average += sample.speed
          iteration.average_speed = average / len(iteration.samples)
  ```

  Note that `KinematicExperiment` is set as the iteration for the `Process` class. You can access the iteration and sample classes using `self.get_iteration()` and `self.get_sample()`. Don't access `self.iteration` and `self.sample` directly, as that might break!

- In order to properly display results, the custom iteration has to be "added" to the results API:

  ```python
  results.set_iteration(KinematicExperiment)
  ```
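The speed calculation inside `transform` can be checked in isolation with plain pandas; the values below are made up for illustration:

```python
import pandas as pd

# Made-up sample data: position x recorded at time t.
raw = pd.DataFrame({"x": [1.0, 2.0, 3.0], "t": [1.0, 1.0, 1.0]})

# Same formula as in transform(): per-row speed x/t, then the mean.
speed = (raw["x"] / raw["t"]).mean()
print(speed)  # 2.0
```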