Structure for large data-sets in science
Project description
Science data structure
This library makes it straightforward to build a folder tree for large data-sets. For now it supports only numpy arrays; support for pandas, CSV, tab-separated and Excel files is planned.
The idea behind the library is to make a data-set browsable with a normal file browser. The components can be rearranged using Python, the terminal or a simple file browser.
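To illustrate the idea of a browsable folder tree, here is a minimal sketch that uses only numpy and pathlib. The helper `write_tree` and the one-`.npy`-file-per-leaf layout are assumptions made for this illustration; they are not the library's API and may differ from its actual on-disk format.

```python
import tempfile
from pathlib import Path

import numpy as np


def write_tree(root: Path, tree: dict) -> None:
    """Write a nested dict to disk: dicts become folders, arrays become .npy files."""
    root.mkdir(parents=True, exist_ok=True)
    for name, node in tree.items():
        if isinstance(node, dict):
            # a branch maps to a sub-folder
            write_tree(root / name, node)
        else:
            # a leaf maps to a single file inside the branch folder
            np.save(root / f"{name}.npy", np.asarray(node))


x = np.linspace(-2, 2, 100)
tree = {"parabola": {"x": x, "y": x ** 2}}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "test_set"
    write_tree(root, tree)
    # every leaf is an ordinary file that any file browser can show or move
    written = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.npy"))
    reloaded_y = np.load(root / "parabola" / "y.npy")
```

Because each leaf in this sketch is a plain file, moving or renaming it in a file browser is enough to rearrange the data-set.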
Install
Install through pip
pip install science-data-structure
Manual installation
python setup.py install
Command line tools
This library is bundled with command line tools. To create a system-wide author, run:
science_data_structure global create author "<name>"
or
science_data_structure global create author
and you will be prompted for the name of the author. You only have to run one of the above commands once; the data is stored in a configuration file (its location depends on your OS). From the command line you can create a dataset:
science_data_structure create dataset "<name>" "<description>"
The author you have created for your system is added to this dataset. Go into the folder of the dataset and execute:
science_data_structure list author
to view all the authors in this dataset. Alternatively, you can list the entire meta file:
science_data_structure list meta
Examples
Simple data-set
In this simple example a data-set is created with a single branch, parabola. Two "leaves", x and y, are added to this branch. At the end of the example the data-set is written to disk.
Before we can create a dataset we need a meta file containing an author; you can create one with the command line tools described above.
import science_data_structure.structures as structures
from pathlib import Path
import numpy
# initialize the empty data-set
data_set = structures.StructuredDataSet.create_dataset(Path("."), "test_set")
# add data to the data-set
data_set["parabola"]["x"] = numpy.linspace(-2, 2, 100)
data_set["parabola"]["y"] = data_set["parabola"]["x"].data ** 2
# write the data to disk
data_set.write()
Branch overriding
What happens when a branch or a leaf is overwritten with another leaf or branch? This example extends the previous one:
data_set["parabola"]["x"] = None
In this case the variable x stored in the branch parabola will be deleted upon the first write.
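The deferred deletion described above can be pictured with a small stand-alone sketch. `TinyBranch` is a hypothetical class written for this illustration (it stores text files rather than numpy arrays) and is not how the library is implemented; it only shows the behaviour where assigning None marks a leaf and the file is removed on the following write.

```python
import tempfile
from pathlib import Path


class TinyBranch:
    """Hypothetical stand-in for a branch: maps leaf names to text values."""

    def __init__(self, folder: Path):
        self.folder = folder
        self.leaves = {}  # name -> value, or None when marked for deletion

    def __setitem__(self, name, value):
        # assignment only updates memory; nothing touches the disk yet
        self.leaves[name] = value

    def write(self):
        self.folder.mkdir(parents=True, exist_ok=True)
        for name, value in list(self.leaves.items()):
            path = self.folder / f"{name}.txt"
            if value is None:
                # the deletion happens only when the branch is written
                path.unlink(missing_ok=True)
                del self.leaves[name]
            else:
                path.write_text(str(value))


with tempfile.TemporaryDirectory() as tmp:
    branch = TinyBranch(Path(tmp) / "parabola")
    branch["x"] = "1.0"
    branch.write()
    existed_after_first_write = (branch.folder / "x.txt").exists()

    branch["x"] = None  # mark the leaf for deletion
    still_there_before_write = (branch.folder / "x.txt").exists()
    branch.write()
    gone_after_write = not (branch.folder / "x.txt").exists()
```

In this sketch the file survives the assignment itself and disappears only once `write()` runs, which mirrors the behaviour the text describes.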
Project details
Download files
File details
Details for the file science_data_structure-0.0.4.tar.gz.
File metadata
- Download URL: science_data_structure-0.0.4.tar.gz
- Upload date:
- Size: 12.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9c3ebb382043cf2bffeb83e553b8b6d2ccf07ddecd3e487ee196cec328f14111
MD5 | 589007671b168ddbd514de5c7eb1f447
BLAKE2b-256 | 5aba4991ed060791d46bb6bf6a3d2ddf42483f097272627e38ee7893b6fdf4f3
File details
Details for the file science_data_structure-0.0.4-py3-none-any.whl.
File metadata
- Download URL: science_data_structure-0.0.4-py3-none-any.whl
- Upload date:
- Size: 19.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 27ff060ccb552d2cfa0894acafade7283e295ad06a35e9e919c920e87fe829e2
MD5 | d1c53eb94335ded1436838832d57b40c
BLAKE2b-256 | 0a8911e455463014aef5cc4de22c95a6c17c5ee4baf1886f57f36795773ea2fe