Structure for large data-sets in science
Science data structure
This library makes it straightforward to create a folder tree structure for large data-sets. For now it supports only numpy arrays, but I plan to add support for pandas, CSV, tab-separated and Excel files soon.
The idea behind the library is to make a data-set browsable with a normal file browser. The components can be rearranged using Python, the terminal or a simple file browser.
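Because a data-set is meant to be an ordinary folder tree, it can be inspected with nothing but the standard library. The sketch below is not part of this library: `format_tree` is a hypothetical helper, and the path `./example.struct` is only an assumed location of a data-set that was written earlier.

```python
from pathlib import Path


def format_tree(root: Path, indent: str = "") -> list:
    """Return one text line per entry under `root`, indented by depth."""
    lines = []
    # Sort by name so the listing is stable between runs
    for entry in sorted(root.iterdir(), key=lambda p: p.name):
        if entry.is_dir():
            lines.append(f"{indent}{entry.name}/")
            # Recurse into sub-folders (branches) with a deeper indent
            lines.extend(format_tree(entry, indent + "    "))
        else:
            lines.append(f"{indent}{entry.name}")
    return lines


if __name__ == "__main__":
    # Assumed location of a data-set folder; adjust to your own path
    root = Path("./example.struct")
    if root.is_dir():
        print("\n".join(format_tree(root)))
```

The exact file names inside a data-set depend on how the library stores its leaves, so the printed listing may differ from what you expect.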
Install
Install through pip
pip install science-data-structure
Examples
Simple data-set
In this simple example a data-set is created with a single branch, parabola. Two leaves, x and y, are added to this branch. At the end of the example the data-set is written to disk.
import science_data_structure.structures as structures
from pathlib import Path
import numpy
# Initialize an empty data-set
data_set = structures.StructuredDataSet(Path("./"), "example", {})
# add data to the data-set
data_set["parabola"]["x"] = numpy.linspace(-2, 2, 100)
data_set["parabola"]["y"] = data_set["parabola"]["x"].data ** 2
# write the data to disk
data_set.write()
Branch overriding
What happens when a branch or a leaf is overwritten with another leaf or branch? This example extends the previous one:
data_set["parabola"]["x"] = None
The code above tries to delete the variable x, but it raises a PermissionError. This protection makes sure that data in a data-set is not simply overwritten. The user must explicitly ask to override the branch or leaf. In the case above, a simple solution is:
data_set.overwrite = True
data_set["parabola"]["x"] = None
data_set.overwrite = False
data_set.write(exists_ok=True)
The last protection in place is the exists_ok argument of the data_set.write() function. It makes sure that an existing data-set on disk is not accidentally overwritten.
Reading an existing data-set
Often you want to read a data-set, use it, adapt it, and write the results back to disk. The following script does just that.
import science_data_structure.structures as structures
from pathlib import Path
import numpy
# Read an existing data-set from disk
data_set = structures.StructuredDataSet.read(Path("./example.struct"))
a = 2
b = 4
data_set["linear"]["x"] = numpy.linspace(-2, 2, 100)
data_set["linear"]["y"] = data_set["linear"]["x"].data * a + b
data_set.write(exists_ok=True)
Note that we must again set exists_ok=True, otherwise the data-set cannot be written to disk.