Reproducible, tailorable archives for computational chemistry

Reptar

A tool for storing and analyzing computational chemistry data in a file-agnostic way

Motivation

Computational chemistry is falling behind in providing the open raw and processed data used to draw scientific conclusions. Often a lack of time, expertise, and options pushes researchers to overlook the importance of reproducible findings. Projects such as QCArchive, Materials Project, Pitt Quantum Repository, ioChem-BD, and many others provide a rigid data framework for a specific purpose (e.g., quantum chemistry and material properties). In other words, data that do not directly fit into their paradigm are incompatible (for good reason).

Alternatively, you could use file formats such as JSON, XML, YAML, npz, etc. for a specific project. These are great options for customizable data storage with their own advantages and disadvantages. However, you often must choose between (1) a standardized parser that might not support your workflow or (2) writing your own.

Reptar provides customizable parsers and data storage frameworks for whatever an individual project demands. Data is stored in one of the supported file types, and generalized routines are used to access and store it. All data is stored as key-value pairs, where users can use predetermined definitions or include their own. Whether you are running nudged elastic band calculations in VASP, free energy perturbation simulations in GROMACS, or gradient calculations in Psi4, you can store data easily with reptar by selecting a parser and specifying the desired file type. The result is a user-specified data file streamlined for analysis in Python and optimized for archival on places such as GitHub and Zenodo.
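To make the key-value idea concrete, here is a minimal sketch that uses only the Python standard library. It is not reptar's API: the helpers put and get, the slash-separated key convention, and every key name and value are assumptions made up for illustration.

```python
import json


def put(store, key, value):
    """Store value under a slash-separated key, creating nested groups as needed.

    Hypothetical helper for illustration; not part of reptar.
    """
    *groups, name = key.strip("/").split("/")
    node = store
    for group in groups:
        node = node.setdefault(group, {})
    node[name] = value


def get(store, key):
    """Retrieve the value stored under a slash-separated key."""
    node = store
    for part in key.strip("/").split("/"):
        node = node[part]
    return node


store = {}
# A mix of "predetermined" style keys and a user-defined one (all names and values made up).
put(store, "psi4_calc/energy_scf", -76.0266)
put(store, "psi4_calc/atomic_numbers", [8, 1, 1])
put(store, "psi4_calc/my_project_note", "custom, user-defined entry")

# Round-trip through JSON text, as one would when archiving small data.
text = json.dumps(store, indent=2)
restored = json.loads(text)
```

The same access pattern, get(restored, "psi4_calc/energy_scf"), works regardless of how the nested dictionary was serialized, which is the kind of uniformity a single interface over several file types aims for.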

File types

Reptar supports three file types with a single interface: exdir, JSON, and npz. JSON is a text format for storing key-value pairs with few dimensions (i.e., no large arrays). NumPy's npz format is useful for arrays; however, no nesting is possible, and loaded data often requires postprocessing because scalars and strings come back as 0D arrays (e.g., np.array('data')).
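The 0D-array postprocessing mentioned above can be seen with plain NumPy. The following is a small sketch, independent of reptar; the key names and values are made up, and an in-memory buffer stands in for a file on disk.

```python
import io

import numpy as np

# Write an npz archive to an in-memory buffer (a file path would work the same way).
buffer = io.BytesIO()
np.savez(buffer, energy=-76.0266, basis="def2-svp", coords=np.zeros((3, 3)))
buffer.seek(0)

with np.load(buffer) as data:
    raw_basis = data["basis"]        # comes back as a 0D string array, not a str
    basis = raw_basis.item()         # postprocess: unwrap to a plain Python str
    energy = data["energy"].item()   # postprocess: unwrap to a plain Python float
    coords = data["coords"]          # real arrays need no unwrapping
```

Note also that npz keys are flat: there is no built-in way to group "basis" and "energy" under a parent entry, which is the nesting limitation described above.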

Exdir is a simple yet powerful open file format that mimics the HDF5 format, with metadata and data stored in directories of YAML and npy files instead of a single binary file. This provides several advantages, such as mixing human-readable YAML with binary NumPy files, being easier to version control, and loading only the requested portions of datasets into memory. For more detailed information, see the Frontiers in Neuroinformatics article about exdir.
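The directory-based idea can be imitated with NumPy and the standard library alone. This sketch does not use the exdir package and does not claim to reproduce its exact on-disk specification; the directory names, the file names attributes.yaml and data.npy, and the array contents are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical layout: one directory per dataset, with text metadata next to binary data.
root = Path(tempfile.mkdtemp()) / "archive.exdir"
dataset_dir = root / "trajectory" / "coords"
dataset_dir.mkdir(parents=True)

# Human-readable metadata, written by hand as a one-line YAML document.
(dataset_dir / "attributes.yaml").write_text("units: angstrom\n")

# Binary array data: 100 frames of 10 atoms with 3 coordinates each (placeholder values).
np.save(dataset_dir / "data.npy", np.arange(3000, dtype=float).reshape(100, 10, 3))

# Memory-map the file so that only the requested slice is read from disk.
coords = np.load(dataset_dir / "data.npy", mmap_mode="r")
first_frame = np.array(coords[0])  # copies just one frame into memory
```

Because the metadata is a small text file and each dataset is a separate npy file, a version control system can diff the YAML and track the arrays individually, which a single monolithic binary file does not allow.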

Installation

You can install reptar with pip install reptar, or install the latest version directly from the GitHub repository:

git clone https://github.com/aalexmmaldonado/reptar
cd reptar
pip install .

License

Distributed under the MIT License. See LICENSE for more information.

Download files

Source distribution: reptar-0.0.2.tar.gz (61.7 kB)
