Structure for large data-sets in science

Science data structure

This library makes it straightforward to create a folder tree for large data-sets. For now it supports numpy arrays only; support for pandas, CSV, tab-separated and Excel files is planned.

The idea behind the library is to make a data-set browsable with a normal file browser. Its components can be rearranged with Python, the terminal or a simple file browser.
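
To make the idea concrete, here is a minimal sketch in plain numpy and pathlib of how a nested mapping of arrays can be laid out as folders and files. It is an illustration only: the helper name write_tree and the layout it produces (one folder per branch, one .npy file per leaf) are assumptions, not this library's actual on-disk format.

```python
from pathlib import Path
import tempfile

import numpy


def write_tree(root: Path, tree: dict) -> None:
    """Write a nested dict to disk: dicts become folders, arrays become .npy files.

    Hypothetical helper for illustration; not part of science_data_structure.
    """
    root.mkdir(parents=True, exist_ok=True)
    for name, value in tree.items():
        if isinstance(value, dict):
            # a nested dict plays the role of a branch (a sub-folder)
            write_tree(root / name, value)
        else:
            # anything else is treated as a leaf and stored as a numpy file
            numpy.save(root / f"{name}.npy", numpy.asarray(value))


x = numpy.linspace(-2, 2, 100)
tree = {"parabola": {"x": x, "y": x ** 2}}

with tempfile.TemporaryDirectory() as tmp:
    write_tree(Path(tmp) / "test_set", tree)
    # collect the files that were written, relative to the temporary folder
    written = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.npy"))
```

With a layout like this, moving a branch is the same as moving a folder, which is what makes the data-set usable from a file browser.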

Install

Install through pip

pip install science-data-structure

Manual installation

python setup.py install

Command line tools

This library is bundled with command line tools. Use them to create a system-wide author:

science_data_structure global create author "<name>"

or

science_data_structure global create author

and you will be prompted for the name of the author. You only have to run one of the above commands once; the author is stored in a configuration file (its location depends on your OS). From the command line you can create a dataset:

science_data_structure create dataset "<name>" "<description>"

The author you created for your system is added to this dataset. Go into the folder of the dataset and execute:

science_data_structure list author

to view all the authors in this dataset. Alternatively, you can list the entire meta file:

science_data_structure list meta

Examples

Simple data-set

In this simple example a data-set is created with a single branch, parabola. Two leaves, x and y, are added to this branch. At the end of the example the data-set is written to disk.

Before you can create a data-set you need a meta file containing an author; you can create one with the command line tools described above.

import science_data_structure.structures as structures
from pathlib import Path
import numpy

# initialize the empty data-set in the current directory
data_set = structures.StructuredDataSet.create_dataset(Path("."), "test_set")

# add data to the data-set: the branch "parabola" gets two leaves, x and y
data_set["parabola"]["x"] = numpy.linspace(-2, 2, 100)
data_set["parabola"]["y"] = data_set["parabola"]["x"].data ** 2

# write the data to disk
data_set.write()
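
Because the data-set lives in ordinary folders, the result of a write can be inspected with a file browser or with a few lines of Python. The sketch below uses only the standard library; list_tree is a hypothetical helper, and the demo folder it runs on is a stand-in whose file names are made up, not the names the library writes.

```python
from pathlib import Path
import tempfile


def list_tree(root: Path) -> list:
    """Return every file and folder below root as sorted, POSIX-style relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


# Demonstration on a stand-in folder; the names below are invented for the demo
# and do not reflect the files the library actually produces.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "parabola").mkdir()
    (root / "parabola" / "x.bin").write_bytes(b"")
    entries = list_tree(root)
```

Pointing list_tree at the folder of a real data-set would show whatever structure the library created there.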

Branch overriding

What happens when a branch or a leaf is overwritten with another leaf or branch? This example extends the previous one:

data_set["parabola"]["x"] = None

In this case the variable x stored in the branch parabola is deleted on the first write that follows.
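
One way such deferred deletion could work is to record assignments in memory and apply them only when write is called. The toy class below sketches that idea under stated assumptions: Branch, pending and stored are hypothetical names, a plain dict stands in for the disk, and nothing here is taken from the library's implementation.

```python
class Branch:
    """Toy in-memory branch; hypothetical, not the library's class."""

    def __init__(self):
        self.pending = {}  # assignments made since the last write
        self.stored = {}   # stands in for what is on disk

    def __setitem__(self, name, value):
        # nothing touches the "disk" yet; the change is only recorded
        self.pending[name] = value

    def write(self):
        for name, value in self.pending.items():
            if value is None:
                # assigning None marks the leaf for deletion
                self.stored.pop(name, None)
            else:
                self.stored[name] = value
        self.pending.clear()


branch = Branch()
branch["x"] = [1, 2, 3]
branch.write()

branch["x"] = None
still_there = "x" in branch.stored  # the deletion has not been applied yet
branch.write()
gone = "x" not in branch.stored     # the deletion took effect on write
```

The point of the sketch is the timing: the leaf stays on disk until write is called.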
