# Exabyte Parser (ExaParser)
ExaParser is a Python package that extracts materials modeling data (e.g. Density Functional Theory, Molecular Dynamics) from disk and converts it to the ESSE/EDC format.
## Functionality

- Extract structural information and material properties from simulation data
- Serialize extracted information according to ESSE/EDC
- Store serialized data on disk or in remote databases
- Support multiple simulation engines, including:
  - VASP
  - Quantum ESPRESSO
  - others, to be added
The package is written in a modular way and is easy to extend to additional applications and properties of interest. Contributions can take the form of new functionality or bug/issue reports.
## Installation

ExaParser can be installed as follows:

1. Install git-lfs in order to pull the files stored on Git LFS.

2. Clone the repository:

   ```bash
   git clone git@github.com:Exabyte-io/exaparser.git
   ```

3. Install virtualenv using pip if not already present:

   ```bash
   pip install virtualenv
   ```

4. Create a virtual environment and install the required packages:

   ```bash
   cd exaparser
   virtualenv venv
   source venv/bin/activate
   export GIT_LFS_SKIP_SMUDGE=1
   pip install -r requirements.txt
   ```
## Usage

1. ExaParser looks for the `config` file in the following locations and uses the first one it finds:

   - The existing file in the root of this repository, if installed as editable source. This works only for testing scenarios, not for production installs.
   - Your user's home directory, at `~/.exabyte/exaparser/config`.
   - A global system configuration, at `/etc/exabyte/exaparser/config`.

   Copy the `config` file from the root of this repository to one of the above locations and edit it.

2. Edit the config file and adjust parameters as necessary. The most important ones are listed below.

   - Add `ExabyteRESTfulAPI` to the `data_handlers` parameter list (comma-separated), if not already present. This enables uploading the data to your Exabyte.io account. New users can register to obtain an Exabyte.io account.
   - Set `owner_slug`, `project_slug`, `api_account_id`, and `api_auth_token` if `ExabyteRESTfulAPI` is enabled. See the RESTful API Documentation to learn how to obtain the authentication parameters.
   - Adjust the `workflow_template_name` parameter in case a different template should be used. By default a Shell Workflow is constructed; see the Templates section for more details.
   - Adjust the `properties` parameter to extract the desired properties; extraction will be attempted for all listed properties.
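For illustration, the settings above might be laid out as in the following sketch. The key names follow the parameters listed in this section, but the section headers and placeholder values are hypothetical; consult the `config` file shipped in the repository root for the authoritative layout.

```ini
; Hypothetical sketch of the relevant config entries
[global]
data_handlers = DiskHandler,ExabyteRESTfulAPI
workflow_template_name = shell.json
properties = total_energy,fermi_energy,pressure

[exabyte_api]
owner_slug = YOUR_OWNER_SLUG
project_slug = YOUR_PROJECT_SLUG
api_account_id = YOUR_ACCOUNT_ID
api_auth_token = YOUR_AUTH_TOKEN
```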
3. Run the below commands to extract the data:

   ```bash
   source venv/bin/activate
   exaparser -w PATH_TO_JOB_WORKING_DIRECTORY
   ```

   or call `exaparser` through the explicit path to the virtualenv binary:

   ```bash
   venv/bin/exaparser -w PATH_TO_JOB_WORKING_DIRECTORY
   ```
## Tests

Run the following command to run the tests:

```bash
./run-tests.sh -p=PYTHON_BIN -v=VENV_NAME -t=TEST_TYPE
```

All parameters are optional, with defaults of `python3`, `venv`, and `unit`, respectively. The script creates and populates a virtual environment, so there is no need to create one manually for testing. Note that the testing virtualenv uses the `requirements-dev.txt` file, whereas production usage should use the `requirements.txt` file; this avoids installing test dependencies when they are not needed.
## Contribution

This repository is an open-source work in progress and we welcome contributions. We suggest forking this repository and introducing your adjustments there; the changes in the fork can then be considered for merging into this repository, as explained in the GitHub Standard Fork and Pull Request Workflow.
## Architecture

The following diagram presents the package architecture.

Here's an example flow of data/events:

1. The user invokes the parser with a path to a job working directory.
2. The parser initializes a `Job` class to extract and serialize the job.
3. The `Job` class uses the `Workflow` parser to extract and serialize the workflow.
4. The `Workflow` is initialized with a `Template` to help the parser construct the workflow. Users can add new templates or adjust the current ones to support complex workflows.
5. The `Workflow` parser iterates over the units to extract:
   - application-related data
   - input and output files
   - materials (initial/final structures) and properties
6. The job utilizes `Compute` classes to extract the compute configuration from the resource management system.
7. Once the job is formed, it is passed to data handler classes to handle the data, e.g. storing it in the Exabyte platform.
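The flow above can be sketched in Python as follows. The class names mirror the components named in the text (`Job`, `Workflow`, units, data handlers), but the methods and signatures here are hypothetical stand-ins, not the actual ExaParser API:

```python
# Illustrative sketch of the parsing flow; names mirror the components
# described above, but the API shown is hypothetical.

class Unit:
    """One workflow unit: holds extracted data for a single step."""

    def __init__(self, name, stdout_file):
        self.name = name
        self.stdout_file = stdout_file

    def extract(self):
        # A real parser would read application output, input/output files,
        # materials, and properties from disk here.
        return {"name": self.name, "stdout": self.stdout_file}


class Workflow:
    """Built from a template; iterates over units to extract data."""

    def __init__(self, template):
        self.units = [Unit(u["name"], u["stdoutFile"]) for u in template["units"]]

    def serialize(self):
        return [u.extract() for u in self.units]


class Job:
    """Top-level object tying the workflow to compute and data handlers."""

    def __init__(self, work_dir, template):
        self.work_dir = work_dir
        self.workflow = Workflow(template)

    def serialize(self):
        return {"workDir": self.work_dir, "units": self.workflow.serialize()}


class PrintHandler:
    """Stand-in for a data handler (e.g. disk storage or RESTful API upload)."""

    def handle(self, job_data):
        return job_data  # a real handler would store or upload the data here


# Example: a minimal shell-workflow template with one unit.
template = {"units": [{"name": "pw_scf", "stdoutFile": "pw.out"}]}
job = Job("/path/to/job", template)
result = PrintHandler().handle(job.serialize())
print(result["units"][0]["stdout"])
```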
## Templates

Workflow templates are used to help the parser extract the data, since users follow different approaches to naming their input/output files and organizing their job directories. Readers are referred to the Exabyte.io Documentation for more information about the structure of workflows. As explained above, a Shell Workflow template is used by default to construct the workflow. For each unit of the workflow one should specify `stdoutFile`, the relative path to the file containing the standard output of the job; `workDir`, the relative path to the directory containing data for the unit; and the names of the `input` files.
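As a sketch, a single-unit template using the keys named above might look like the following. The unit name and file names are placeholders, and the exact template schema is defined by the package, so treat this as an assumption rather than the canonical format:

```json
{
    "units": [
        {
            "name": "pw_scf",
            "stdoutFile": "pw_scf.out",
            "workDir": "",
            "input": [
                {
                    "name": "pw_scf.in"
                }
            ]
        }
    ]
}
```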
## TODO List

Desirable features for implementation:

- Implement PBS/Torque and SLURM compute parsers
- Implement VASP and Espresso execution unit parsers
- Add other data handlers
- Add complex workflow templates
## File details

Details for the file `exaparser-2024.2.16.post2.tar.gz`.

### File metadata

- Download URL: exaparser-2024.2.16.post2.tar.gz
- Upload date:
- Size: 1.7 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/7.0.1 pkginfo/1.9.6 requests/2.31.0 requests-toolbelt/1.0.0 tqdm/4.66.2 CPython/3.12.2

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9f1834123c1577002b9e0d2bfd9f0917c4c9151aac893741a51992beff452de2` |
| MD5 | `70cfb773a917c3054c2aeb19d2d509b8` |
| BLAKE2b-256 | `40d1a9f7ca7dd393977841c9c89c79685cbecd87b2f41aeca537f42165773260` |
## File details

Details for the file `exaparser-2024.2.16.post2-py3-none-any.whl`.

### File metadata

- Download URL: exaparser-2024.2.16.post2-py3-none-any.whl
- Upload date:
- Size: 27.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.1 importlib_metadata/7.0.1 pkginfo/1.9.6 requests/2.31.0 requests-toolbelt/1.0.0 tqdm/4.66.2 CPython/3.12.2

### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `d8a6b8d39198aa752523fb76e245870fd0d8ba0f425779fa1182b8f48b22e1bd` |
| MD5 | `6bbbd5bac01c8f4aed4d2d85fdd8a7dd` |
| BLAKE2b-256 | `6a31f993aa4aa0d7fbf791ed4f22340d5ec8846960af98d992cc6f9d6fcfac02` |