Reproducible, tailorable archives for computational chemistry

reptar

Reproducible, tailorable archive

Motivation

Computational chemistry is falling behind in providing the open raw and processed data used to draw scientific conclusions. Often it is a lack of time, expertise, and options that pushes researchers to overlook the importance of reproducible findings. Projects such as QCArchive, Materials Project, Pitt Quantum Repository, ioChem-BD, and many others provide a rigid data framework for a specific purpose (e.g., quantum chemistry and material properties). In other words, data that does not fit directly into their paradigm is incompatible, and for good reason.

Alternatively, you could use file formats such as JSON, XML, YAML, npz, etc. for a specific project. These are great options for customizable data storage with their own advantages and disadvantages. However, you often must choose between (1) a standardized parser that might not support your workflow or (2) writing your own.

Reptar provides customizable parsers and data storage frameworks for whatever an individual project demands. Data is stored in one of the supported file types, and generalized routines are used to access and store it. All data is stored as key-value pairs, where users can rely on predetermined definitions or include their own. Whether you are running nudged elastic band calculations in VASP, free energy perturbation simulations in GROMACS, or gradient calculations in Psi4, you can store data easily with reptar by selecting a parser and specifying the desired file type. The result is a user-specified data file streamlined for analysis in Python and optimized for archival on platforms such as GitHub and Zenodo.
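To make the key-value idea concrete, here is a minimal sketch using only the Python standard library. It is not reptar's API: the `put` and `get` helpers, the slash-separated key layout, and every key name and value are hypothetical placeholders chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def put(store, key, value):
    """Store a value under a slash-separated key, creating nested groups as needed."""
    *groups, name = key.strip("/").split("/")
    node = store
    for group in groups:
        node = node.setdefault(group, {})
    node[name] = value


def get(store, key):
    """Return the value stored under a slash-separated key."""
    node = store
    for part in key.strip("/").split("/"):
        node = node[part]
    return node


# Hypothetical keys and placeholder values (not results from a real calculation).
store = {}
put(store, "water/energy_scf", -76.0)
put(store, "water/atomic_numbers", [8, 1, 1])
put(store, "water/my_custom_note", "user-defined key")

# Round-trip through a JSON file on disk.
path = Path(tempfile.mkdtemp()) / "data.json"
path.write_text(json.dumps(store, indent=2))
loaded = json.loads(path.read_text())

print(get(loaded, "water/atomic_numbers"))
```

Nesting keys this way keeps predefined and user-defined entries side by side in the same file, which is the flexibility the paragraph above describes.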

About

Reptar is essentially a collection of tools for managing computational chemistry data using a variety of formats such as JSON, npz, and exdir.

Exdir is a simple yet powerful open file format that mimics the HDF5 format, with metadata and data stored in directories of YAML and npy files instead of a single binary file. This provides several advantages: human-readable YAML can be mixed with binary NumPy files, the archive is easier to keep under version control, and only the requested portions of a dataset are loaded into memory. For more detailed information, please read the Front. Neuroinform. article about exdir.
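The directory-based idea can be sketched with plain NumPy. This is a simplified illustration, not the exdir on-disk specification and not the exdir or reptar API: the file names `attributes.json` and `coords.npy` are hypothetical, JSON stands in for YAML so that only the standard library and NumPy are needed, and the array holds placeholder numbers.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

# A group is just a directory; a dataset is a file inside it.
root = Path(tempfile.mkdtemp()) / "archive"
group = root / "trajectory"
group.mkdir(parents=True)

# Human-readable metadata stored next to the binary data.
(group / "attributes.json").write_text(json.dumps({"units": "angstrom"}, indent=2))

# Placeholder array: 1000 frames of 3 atoms with 3 Cartesian coordinates each.
coords = np.arange(1000 * 3 * 3, dtype=np.float64).reshape(1000, 3, 3)
np.save(group / "coords.npy", coords)

# Memory-map the file so that only the slices we index are read from disk.
lazy = np.load(group / "coords.npy", mmap_mode="r")
first_frame = np.array(lazy[0])  # copies a single frame into memory

attributes = json.loads((group / "attributes.json").read_text())
print(first_frame.shape, attributes["units"])
```

Because metadata lives in small text files and each array in its own binary file, a version control system can diff the metadata while treating the arrays as opaque blobs.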

License

Distributed under the MIT License. See LICENSE for more information.

Download files

Source Distribution

reptar-0.0.1.tar.gz (44.0 kB)
