Exabyte Parser

License: Apache

Exabyte Parser (ExaParser)

Exabyte Parser is a Python package that extracts materials modeling data (e.g. Density Functional Theory, Molecular Dynamics) from disk and converts it to the ESSE/EDC format.

Functionality

As below:

  • Extract structural information and material properties from simulation data
  • Serialize extracted information according to ESSE/EDC
  • Store serialized data on disk or remote databases
  • Support for multiple simulation engines, including:

The package is modular and easy to extend to additional applications and properties of interest. Contributions are welcome in the form of new functionality and bug/issue reports.

Installation

ExaParser can be installed as below.

  1. Install git-lfs in order to pull the files stored on Git LFS.

  2. Clone repository:

    git clone git@github.com:Exabyte-io/exaparser.git
    
  3. Install virtualenv using pip if not already present:

    pip install virtualenv
    
  4. Create virtual environment and install required packages:

    cd exaparser
    virtualenv venv
    source venv/bin/activate
    export GIT_LFS_SKIP_SMUDGE=1
    pip install -r requirements.txt
    

Usage

  1. ExaParser looks in the following locations for the config file and uses the first one it finds:

    • The existing file in the root of this repository, if installed as editable source. This won't work for production installs, and is just for testing scenarios.
    • Your user's home directory at ~/.exabyte/exaparser/config
    • A global system configuration at /etc/exabyte/exaparser/config

    Copy the config file from the root of this repo to one of the above locations and edit it.

  2. Edit the config file and adjust parameters as necessary. The most important ones are listed below.

    • Add ExabyteRESTfulAPI to the data_handlers parameter list (comma-separated), if not already present. This enables uploading the data to an Exabyte.io account.

      • New users can register here to obtain an Exabyte.io account.
    • Set owner_slug, project_slug, api_account_id, and api_auth_token if ExabyteRESTfulAPI is enabled.

    • Adjust workflow_template_name parameter in case a different template should be used.

    • Adjust the properties parameter to select the properties to extract; the parser attempts to extract every property listed.

  3. Run the below commands to extract the data.

source venv/bin/activate
exaparser -w PATH_TO_JOB_WORKING_DIRECTORY

or call exaparser directly from the virtualenv's bin directory, without activating the environment:

venv/bin/exaparser -w PATH_TO_JOB_WORKING_DIRECTORY
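
The config lookup order from step 1 can be sketched in Python as below. This is a minimal illustration, not ExaParser's actual implementation: the function names are hypothetical, and the file name "config" in the repository root is an assumption (the home-directory and /etc paths are the ones listed above). The split_list helper shows how a comma-separated parameter such as data_handlers could be turned into a list.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_config_paths(repo_root: Optional[Path] = None) -> List[Path]:
    """Return candidate config locations in the order described in step 1."""
    paths: List[Path] = []
    if repo_root is not None:
        # Only meaningful for an editable (source) install.
        # The file name "config" is an assumption.
        paths.append(repo_root / "config")
    paths.append(Path.home() / ".exabyte" / "exaparser" / "config")
    paths.append(Path("/etc/exabyte/exaparser/config"))
    return paths


def find_config(paths: Iterable[Path]) -> Optional[Path]:
    """Return the first path that exists as a file, or None if none exist."""
    for path in paths:
        if path.is_file():
            return path
    return None


def split_list(value: str) -> List[str]:
    """Split a comma-separated parameter value, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

For example, find_config(candidate_config_paths()) returns the user-level config if it exists, otherwise the system-wide one, otherwise None.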

Tests

Run the following command to run the tests.

./run-tests.sh -p=PYTHON_BIN -v=VENV_NAME -t=TEST_TYPE

All the passed parameters are optional, with the defaults being python3, venv, and unit, respectively.

The script will create a virtual environment and populate it, so there's no need to create one manually for testing.

Note that the testing virtualenv uses the requirements-dev.txt file, whereas production usage should use the requirements.txt file. This avoids installing test dependencies when they are not needed.

Contribution

This repository is an open-source work in progress and we welcome contributions. We suggest forking this repository and making your changes in the fork. The changes can then be considered for merging into this repository, as explained in GitHub Standard Fork and Pull Request Workflow.

Architecture

The following diagram presents the package architecture.

[Figure: ExaParser architecture diagram]

Here's an example flow of data/events:

  • User invokes the parser with a path to a job working directory.

  • The parser initializes a Job class to extract and serialize the job.

  • Job class uses Workflow parser to extract and serialize the workflow.

  • The Workflow is initialized with a Template that helps the parser construct the workflow.

    • Users can add new templates or adjust the current ones to support complex workflows.
  • Workflow parser iterates over the Units to extract

    • application-related data
    • input and output files
    • materials (initial/final structures) and properties
  • The job utilizes Compute classes to extract compute configuration from the resource management system.

  • Once the job is formed, it is passed to the Data Handler classes, which handle the data, e.g. by storing it on the Exabyte platform.
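
The flow above can be sketched with a few toy classes. Everything here is hypothetical and heavily simplified: the class names mirror the concepts in the list (Job, Workflow, Unit, Data Handler), but the attributes, method names, and JSON keys are assumptions for illustration and do not reflect ExaParser's real classes or the ESSE/EDC schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Unit:
    name: str
    work_dir: str

    def to_json(self) -> Dict[str, Any]:
        # A real unit parser would also collect application data,
        # input/output files, materials and properties.
        return {"name": self.name, "workDir": self.work_dir}


@dataclass
class Workflow:
    units: List[Unit]

    def to_json(self) -> Dict[str, Any]:
        return {"units": [unit.to_json() for unit in self.units]}


@dataclass
class Job:
    work_dir: str
    workflow: Workflow

    def to_json(self) -> Dict[str, Any]:
        return {"workDir": self.work_dir, "workflow": self.workflow.to_json()}


class InMemoryHandler:
    """Hypothetical data handler that only records what it receives."""

    def __init__(self) -> None:
        self.stored: List[Dict[str, Any]] = []

    def handle(self, job_json: Dict[str, Any]) -> None:
        self.stored.append(job_json)


def parse(work_dir: str, template: Dict[str, Any], handlers: List[Any]) -> Dict[str, Any]:
    """Build a job from a template, serialize it, and pass it to each handler."""
    units = [Unit(u["name"], u["workDir"]) for u in template["units"]]
    job = Job(work_dir, Workflow(units))
    payload = job.to_json()
    for handler in handlers:
        handler.handle(payload)
    return payload
```

Keeping serialization (to_json) separate from handling means new handlers, such as one that writes to disk or uploads to a remote service, can be added without touching the parsing code.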

Templates

Workflow templates help the parser extract the data, since users follow different approaches to naming their input/output files and organizing their job directories. Readers are referred to the Exabyte.io Documentation for more information about the structure of workflows. As explained above, a Shell Workflow Template is used by default to construct the workflow. For each unit of the workflow, one should specify stdoutFile (the relative path to the file containing the standard output of the job), workDir (the relative path to the directory containing the data for the unit), and the names of the input files.
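
As a rough illustration of the per-unit fields, the sketch below checks a unit description for the required entries and for relative paths. The keys stdoutFile and workDir come from the text above; the key name inputFiles, the function name check_unit, and the use of a plain dict are assumptions made for this example and may not match the actual template format.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List

# "inputFiles" is an assumed key name for the list of input file names.
REQUIRED_KEYS = ("stdoutFile", "workDir", "inputFiles")


def check_unit(unit: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a unit description (empty if none)."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in unit]
    for key in ("stdoutFile", "workDir"):
        value = unit.get(key)
        # Both paths are expected to be relative to the job working directory.
        if isinstance(value, str) and PurePosixPath(value).is_absolute():
            problems.append(f"{key} must be a relative path: {value}")
    return problems
```

A unit such as {"stdoutFile": "job.out", "workDir": ".", "inputFiles": ["pw.in"]} passes the check with no problems reported.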

TODO List

Desirable features for implementation:

  • Implement PBS/Torque and SLURM compute parsers
  • Implement VASP and Espresso execution unit parsers
  • Add other data handlers
  • Add complex workflow templates

Links

  1. Exabyte Source of Schemas and Examples (ESSE), GitHub repository
  2. Vienna Ab-initio Simulation Package (VASP), official website
  3. Quantum ESPRESSO, official website
