
A Python package for pre- and post-processing VASP/Quantum ESPRESSO data and converting it into machine learning interatomic potential (MLIP) format.


arXiv Python 3.6+ Release License: MIT

AtomProNet: Atomic Data Processing for Neural Network


This package demonstrates a data-processing workflow built from Bash scripts and Python conversion scripts that automatically pre- and post-processes VASP/Quantum ESPRESSO data and converts it into machine learning interatomic potential (MLIP) training formats (extxyz or npz).

AtomProNet
    │
    ├── Data collection from materials project database
    │   │
    │   ├── Atomic energy, position, lattice parameters                   
    │   └── Supercell formation  
    │
    │
    ├── Data generation using DFT simulation (VASP/Quantum ESPRESSO)
    │   │
    │   ├── Batch job preparation  
    │   ├── Batch job submission                   
    │   └── Batch data collection            
    │
    │
    ├── Pre-processing for Neural Network  (Post-processing of DFT simulation)             
    │   │
    │   └── DFT folders 
    │       │       
    │       ├── energy
    │       ├── forces
    │       ├── pressure      
    │       └── lattice parameters            
    │            │
    │            └── extxyz/npz format
    │
    │
    └── Post-processing
        ├── Machine Learning Interatomic Potential (MLIP)         
        │   │                 
        │   ├── Parity plots
        │   └── Cumulative distributions
        │
        └── Classical Molecular Dynamics (LAMMPS) 
            │   
            └── Computational Performance Assessment
                ├── Simulation cell size
                └── CPU allocation

Tutorial

An example notebook that walks through AtomProNet's four modules is available: Open in Colab

Installation and Usage Guide

This guide provides detailed instructions on how to install and use the AtomProNet package.

Prerequisites

  • Python 3.6 or later
  • pip (the Python package manager)
  • Bash Shell (e.g., Git Bash, Cygwin, or WSL on Windows) to execute .sh scripts.

Installation

  1. Install Using Git:

    • Open a command prompt or terminal.
    • Navigate to the directory where you want to clone the repository.
    • Clone the repository and install the package by running:
      git clone https://github.com/MusannaGalib/AtomProNet.git
      cd AtomProNet
      pip install .
      
  2. Install Using PyPI:

    • AtomProNet can also be installed from PyPI:
      pip install AtomProNet
      

Either method installs the package along with its dependencies.

Using the Package

Example Usage

Example datasets are provided in the 'example_dataset' folder. To try them, run the Python wrapper script:

cd AtomProNet
python3 process_and_run_script.py

Workflow Overview

  1. Bash Scripts (.sh files):

    • Take a user-provided file path and prepare and submit VASP and Quantum ESPRESSO jobs.
    • Take a user-provided file path and loop over all VASP and Quantum ESPRESSO simulation folders.
    • Collect all the required information (energy, forces, atomic positions, pressure in eV, lattice parameters).
  2. Python Converter (.py files):

    • Processes the files generated by the Bash script.
    • Outputs the converted npz and extxyz files.
    • Post-processes MLIP results to produce parity plots and cumulative distributions.
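
To show what the extxyz target format looks like, here is a minimal, stdlib-only sketch that formats one frame. It is not AtomProNet's converter: the function name write_extxyz_frame is hypothetical, and the per-frame keys (energy, forces) are common conventions that your MLIP code may name differently. In practice, ASE's ase.io.write(..., format="extxyz") writes this format.

```python
# Hypothetical helper for illustration; not part of AtomProNet's API.
def write_extxyz_frame(symbols, positions, forces, lattice, energy):
    """Format one periodic structure as an extended-XYZ frame string."""
    if not len(symbols) == len(positions) == len(forces):
        raise ValueError("symbols, positions and forces must have equal length")
    # The comment line carries the cell (9 numbers, row by row) and per-frame data.
    lattice_str = " ".join(f"{x:.8f}" for row in lattice for x in row)
    header = (
        f'Lattice="{lattice_str}" '
        "Properties=species:S:1:pos:R:3:forces:R:3 "
        f'energy={energy:.8f} pbc="T T T"'
    )
    lines = [str(len(symbols)), header]
    # One line per atom: species, Cartesian position, then force components.
    for s, pos, frc in zip(symbols, positions, forces):
        values = " ".join(f"{v:.8f}" for v in (*pos, *frc))
        lines.append(f"{s} {values}")
    return "\n".join(lines) + "\n"


frame = write_extxyz_frame(
    ["Al", "O"],
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    [[0.0, 0.0, 0.0], [0.1, 0.0, -0.1]],
    [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
    -10.5,
)
print(frame)
```

Writing several such frames one after another into the same file gives a multi-structure extxyz dataset.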

Options

The package is driven by an interactive menu with the following options:

Choose an option:
1. Data from Materials Project
2. Pre-processing for DFT simulation
3. Pre-processing for Neural Network
4. Post-processing

Option 1

Enter your choice (1/2/3/4 or 'exit'): 1
Enter your Materials Project API key (press Enter to use default): 
Enter the material ID (e.g., mp-1234), compound formula (e.g., Al2O3), or elements (e.g., Li, O, Mn): Al2O3
Do you want to create supercells for all structures? (yes/no): yes
Enter the supercell size (e.g., 2 2 2): 2 3 4
Do you want to download energy+lattice data for the materials? (yes/no): yes
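
Answering 2 3 4 above requests a 2 x 3 x 4 supercell. The sketch below illustrates what that replication means for a cell given as lattice vectors plus fractional coordinates. It is a stdlib-only illustration with a hypothetical function name, not AtomProNet's implementation; libraries such as pymatgen (Structure.make_supercell) or ASE (ase.build.make_supercell) do this in practice.

```python
from itertools import product


# Hypothetical helper for illustration; not part of AtomProNet's API.
def make_supercell(lattice, frac_coords, symbols, size):
    """Replicate a cell na x nb x nc times along its lattice vectors."""
    na, nb, nc = size
    # Each lattice vector is stretched by its repetition count.
    new_lattice = [[v * n for v in vec] for vec, n in zip(lattice, size)]
    new_symbols, new_frac = [], []
    for i, j, k in product(range(na), range(nb), range(nc)):
        for s, (a, b, c) in zip(symbols, frac_coords):
            new_symbols.append(s)
            # Shift by the image index, then rescale into the larger cell.
            new_frac.append([(a + i) / na, (b + j) / nb, (c + k) / nc])
    return new_lattice, new_frac, new_symbols


lat, frac, syms = make_supercell(
    [[4, 0, 0], [0, 4, 0], [0, 0, 4]],
    [[0, 0, 0], [0.5, 0.5, 0.5]],
    ["Al", "O"],
    (2, 3, 4),
)
print(lat, len(syms))
```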

Option 2

Enter your choice (1/2/3/4 or 'exit'): 2
Options:
1: VASP
        Enter your choice: 1
        VASP Options:
        1: Prepare VASP job submission folders
               1. Enter the full path to the folder containing multiple POSCAR files
               2. Do you want to strain hydrostatically one POSCAR structure
                     Do you want to modify the EXX range in the script? (yes/no): yes
                     Enter the new range for EXX:
                     Start (e.g., -0.05): 
                     Step size (e.g., 0.01): 
                     End (e.g., 0.05): 
               3. Do you want to strain volumetrically one POSCAR structure
                     Do you want to modify the EXX, EYY, and EZZ ranges in the script? (yes/no): yes
                     Enter the new range for EXX, EYY, and EZZ:
                     Start (e.g., -0.05): 
                     Step size (e.g., 0.01): 
                     End (e.g., 0.05): 
        2: VASP job submission
        3: Post-processing of VASP jobs 
        4: Convergence check of VASP jobs      
        q: Quit
2: Quantum ESPRESSO
        Enter your choice: 2
        Quantum ESPRESSO Options:
        1: Prepare Quantum ESPRESSO job submission folders
        2: Quantum ESPRESSO job submission
        3: Post-processing of Quantum ESPRESSO jobs
        q: Quit
q: Quit
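
The EXX (and EYY, EZZ) prompts define a range of strain values from a start, a step size and an end. Below is a hedged sketch of how such a range could be generated and applied as a diagonal strain to the Cartesian components of the lattice vectors. The function names are hypothetical, and whether the volumetric mode loops over every (EXX, EYY, EZZ) combination is not stated here.

```python
# Hypothetical helpers for illustration; not AtomProNet's strain scripts.
def strain_values(start, step, end):
    """Inclusive list of strains start, start + step, ..., end."""
    n_steps = int(round((end - start) / step))
    # Rounding removes floating-point drift such as 0.030000000000000002.
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


def strain_lattice(lattice, exx, eyy=None, ezz=None):
    """Scale the x, y, z Cartesian components of each lattice vector.

    With only exx given, the same strain is used in all three directions;
    separate eyy/ezz values give an independent strain along each axis.
    """
    eyy = exx if eyy is None else eyy
    ezz = exx if ezz is None else ezz
    scale = (1.0 + exx, 1.0 + eyy, 1.0 + ezz)
    return [[c * s for c, s in zip(vec, scale)] for vec in lattice]


cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
for e in strain_values(-0.05, 0.01, 0.05):
    print(e, strain_lattice(cell, e)[0][0])
```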

Instruction for preparing VASP jobs:

  • "INCAR", "KPOINTS", "vasp_jobsub.sh" files must be outside of the folder containing all the POSCAR files
  • POTCAR files must be provided as POTCAR_$atomsymbol (e.g. POTCAR_Al, POTCAR_O)
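
VASP expects a single POTCAR that concatenates the per-element potentials in the same order as the species line of the POSCAR, which is why per-element POTCAR_$atomsymbol files are needed. The sketch below shows that assembly step. It assumes a VASP 5+ POSCAR (species names on line 6), and the function names are hypothetical rather than AtomProNet's own.

```python
from pathlib import Path


# Hypothetical helpers for illustration; not part of AtomProNet's API.
def poscar_species(poscar_text):
    """Element symbols from the species line (line 6) of a VASP 5+ POSCAR."""
    return poscar_text.splitlines()[5].split()


def assemble_potcar(poscar_text, potcar_dir):
    """Concatenate POTCAR_<symbol> files in POSCAR species order."""
    parts = []
    for symbol in poscar_species(poscar_text):
        path = Path(potcar_dir) / f"POTCAR_{symbol}"
        if not path.is_file():
            raise FileNotFoundError(f"{path} not found")
        parts.append(path.read_text())
    return "".join(parts)


# Example: Path("POTCAR").write_text(assemble_potcar(Path("POSCAR").read_text(), "."))
```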

Instruction for preparing Quantum ESPRESSO jobs:

  • The code prepares input_template and qe_jobsub.sh in the directory one level above the provided POSCAR files
  • Update the input_template and qe_jobsub.sh as needed
  • Pseudopotential files must be provided as $atomsymbol_*.UPF (e.g. li_pbe_v1.4.uspp.F.UPF, O.pbe-n-kjpaw_psl.0.1.UPF)
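
Because the examples above mix letter case and separators (li_pbe_... and O.pbe-...), matching a pseudopotential file to an element needs some care. One possible rule, sketched below: the element symbol must start the file name (case-insensitive) and be followed by a non-letter, so that O never picks up an Os file. Both the function and this matching rule are assumptions, not AtomProNet's documented behaviour.

```python
import re


# Hypothetical helper for illustration; the matching rule is an assumption.
def find_pseudopotential(symbol, filenames):
    """Return the first .UPF file name that belongs to the element symbol.

    The symbol must start the name (case-insensitive) and be followed by a
    non-letter or directly by the .UPF extension, so "O" never matches "Os...".
    """
    pattern = re.compile(
        rf"^{re.escape(symbol)}(?:[^A-Za-z].*)?\.upf$", re.IGNORECASE
    )
    for name in sorted(filenames):
        if pattern.match(name):
            return name
    return None


files = ["li_pbe_v1.4.uspp.F.UPF", "O.pbe-n-kjpaw_psl.0.1.UPF", "Os.pbe.UPF"]
print(find_pseudopotential("Li", files))
```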

Option 3

Enter your choice (1/2/3/4 or 'exit'): 3
Do you want to run the first step (execute post-processing script)? (yes/no): yes
        Select the system for post-processing:
        1. VASP
              Enter your choice (1/2): 1
                Select the extraction type for VASP:
                1. Extract ionic last step (Self-Consistent simulations)
                   Do you want to split the Data files? (yes/no):
                2. Extract all ionic steps (Ab-initio MD)
                   Do you want to split the Data files? (yes/no):
        2. Quantum ESPRESSO
           Do you want to split the Data files? (yes/no): 
Do you want to split the dataset into train, test, and validation sets? (yes/no): yes
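
The two VASP extraction types differ in which ionic steps are kept. As a rough illustration, not AtomProNet's Bash extraction, the sketch below reads free energies from OUTCAR text: each ionic step is summarised by a "free  energy   TOTEN" line (two spaces after "free"), whereas the per-electronic-step lines use a single space and are ignored. A real converter would also collect positions, forces and stress for the same steps; the function name is hypothetical.

```python
import re

# Hypothetical helper for illustration; not AtomProNet's extraction script.
# Ionic-step summary lines use "free  energy   TOTEN" (two spaces after "free");
# electronic-step lines ("free energy    TOTEN") do not match and are skipped.
TOTEN_RE = re.compile(r"free  energy   TOTEN\s*=\s*(-?\d+\.\d+)")


def ionic_energies(outcar_text, mode="last"):
    """Free energies (eV) of the last ionic step ("last") or of every step ("all")."""
    energies = [float(m.group(1)) for m in TOTEN_RE.finditer(outcar_text)]
    if mode == "last":
        return energies[-1:]
    if mode == "all":
        return energies
    raise ValueError("mode must be 'last' or 'all'")


# e.g. ionic_energies(open("OUTCAR").read(), mode="all") for an ab-initio MD run
```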

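For the train/test/validation split, a reproducible approach is to shuffle the structure indices with a fixed seed and slice them. The 80/10/10 ratios and the function name below are assumptions for illustration; AtomProNet's own defaults are not stated here.

```python
import random


# Hypothetical helper for illustration; ratios are assumed, not AtomProNet's defaults.
def split_indices(n, train=0.8, val=0.1, seed=0):
    """Shuffle 0..n-1 with a fixed seed and split into train/val/test lists."""
    if not 0 < train + val < 1:
        raise ValueError("train + val must be between 0 and 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed keeps the split reproducible
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))
```

The remaining indices after the train and validation slices form the test set, so no structure is dropped by rounding.
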
Option 4

Enter your choice (1/2/3/4 or 'exit'): 4
Post-Processing Options:
1. Post-Processing of MLIP
2. Post-Processing of LAMMPS
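
Parity plots compare MLIP predictions against DFT reference values (points on the y = x line are perfect predictions), and cumulative distributions show what fraction of samples fall below a given absolute error. The stdlib-only sketch below computes the numbers behind both plots; plotting itself (e.g. with matplotlib) is left out, and the function names are hypothetical rather than AtomProNet's.

```python
import math


# Hypothetical helpers for illustration; not part of AtomProNet's API.
def error_metrics(reference, predicted):
    """MAE and RMSE between DFT reference values and MLIP predictions."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - r for r, p in zip(reference, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


def cumulative_distribution(reference, predicted):
    """Sorted absolute errors paired with the fraction of samples at or below each."""
    abs_err = sorted(abs(p - r) for r, p in zip(reference, predicted))
    n = len(abs_err)
    return [(e, (i + 1) / n) for i, e in enumerate(abs_err)]


# e.g. error_metrics(dft_energies_per_atom, mlip_energies_per_atom)
ref, pred = [1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 4.5]
print(error_metrics(ref, pred))
```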

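For the LAMMPS performance assessment across simulation cell sizes and CPU allocations, common figures of merit are speedup, parallel efficiency and throughput in atom-steps per second (which normalises for cell size). The sketch below derives them from measured wall times, for example the "Loop time of ..." line that LAMMPS prints in its log. The function name and input layout are assumptions, not AtomProNet's interface.

```python
# Hypothetical helper for illustration; not part of AtomProNet's API.
def scaling_metrics(wall_times, n_atoms, n_steps):
    """Speedup, parallel efficiency and throughput for each CPU count.

    wall_times maps CPU count -> wall-clock seconds for the same run;
    speedup and efficiency are relative to the smallest CPU count measured.
    """
    base_cpus = min(wall_times)
    base_time = wall_times[base_cpus]
    rows = []
    for cpus in sorted(wall_times):
        t = wall_times[cpus]
        speedup = base_time / t
        efficiency = speedup * base_cpus / cpus
        throughput = n_atoms * n_steps / t  # atom-steps per second
        rows.append((cpus, speedup, efficiency, throughput))
    return rows


rows = scaling_metrics({1: 100.0, 4: 30.0, 8: 20.0}, n_atoms=1000, n_steps=100)
for row in rows:
    print(row)
```
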
Pre-processing for DFT simulation (VASP)

To strain a structure hydrostatically or volumetrically, INCAR, KPOINTS, POTCAR, POSCAR and vasp_job.sh must be in the same folder as hydrostatic_strain.sh/volumetric_strain.sh.
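
A quick pre-flight check can catch a missing input before the strain scripts run. This hypothetical helper only tests for the file names listed above; it assumes POTCAR has already been assembled into a single file.

```python
from pathlib import Path

# Hypothetical pre-flight check; file names are taken from the text above,
# and POTCAR is assumed to be a single, already-assembled file.
REQUIRED = ("INCAR", "KPOINTS", "POTCAR", "POSCAR", "vasp_job.sh")


def missing_strain_inputs(folder):
    """Return the required input files that are absent from folder."""
    folder = Path(folder)
    return [name for name in REQUIRED if not (folder / name).is_file()]


# e.g. print(missing_strain_inputs("."))  # [] when everything is in place
```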

VASP/Quantum ESPRESSO job submission

Maximum number of job submissions:

    job_submission.sh
    └── max_jobs=${1:-999}  (defaults to 999 submissions when no argument is passed; adjust to your server's limit)

2: VASP job submission:

  • last_job.txt records how many jobs have been submitted. When 2: VASP job submission is rerun, it reads last_job.txt to continue submitting the remaining jobs.
  • job_submission.log records which jobs failed to submit, so that they can be resubmitted later.
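
The resume behaviour described above can be sketched in Python as follows. This is a hypothetical illustration, not the logic of job_submission.sh: submit stands in for the actual scheduler call (for example sbatch or qsub), last_job.txt is assumed to store the index of the next job to submit, and max_jobs mirrors the max_jobs=${1:-999} default.

```python
from pathlib import Path


# Hypothetical sketch; not the logic of AtomProNet's job_submission.sh.
def submit_remaining(job_dirs, submit, state_file="last_job.txt",
                     log_file="job_submission.log", max_jobs=999):
    """Resume submission after the index stored in state_file.

    submit(job_dir) -> bool is a caller-supplied stand-in for the scheduler
    call. Only successful submissions count toward max_jobs (an assumption);
    failures are appended to log_file for later resubmission.
    """
    state = Path(state_file)
    start = int(state.read_text().strip()) if state.exists() else 0
    submitted = 0
    with open(log_file, "a") as log:
        for i in range(start, len(job_dirs)):
            if submitted >= max_jobs:
                break
            if submit(job_dirs[i]):
                submitted += 1
            else:
                log.write(f"FAILED {job_dirs[i]}\n")
            state.write_text(str(i + 1))  # index of the next job to try
    return submitted
```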

Authors

This software is developed by Musanna Galib.

Citing This Work

If you use this software in your research, please cite the following paper:

BibTeX entry:
@misc{galib2025atompronetdataflowmachine,
      title={AtomProNet: Data flow to and from machine learning interatomic potentials in materials science}, 
      author={Musanna Galib and Mewael Isiet and Mauricio Ponga},
      year={2025},
      eprint={2501.14039},
      archivePrefix={arXiv},
      url={https://doi.org/10.48550/arXiv.2501.14039}, 
}

Contact, questions, and contributing

If you have questions, please don't hesitate to reach out to galibubc[at]student[dot]ubc[dot]ca

If you find a bug or have a proposal for a feature, please post it in the Issues. If you have a question, topic, or issue that isn't obviously one of those, try our GitHub Discussions.

If your post is related to the framework/package, please post in the issues/discussion on that repository.


Download files


Source Distribution

atompronet-0.0.1.tar.gz (39.9 kB)

Uploaded Source

Built Distribution


AtomProNet-0.0.1-py3-none-any.whl (48.2 kB)

Uploaded Python 3

File details

Details for the file atompronet-0.0.1.tar.gz.

File metadata

  • Download URL: atompronet-0.0.1.tar.gz
  • Upload date:
  • Size: 39.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for atompronet-0.0.1.tar.gz
Algorithm Hash digest
SHA256 7b40697a0cd8b4eb3363d5409ce2e9dc917cb4e6c3b4ca361f81b8b8f94b30fe
MD5 eb01647b45e3c82f1d91db0b2ec129a4
BLAKE2b-256 dcd408820311915b98be1c664aa09327d263df6e0c63c2d7f13e1e976a02b206


File details

Details for the file AtomProNet-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: AtomProNet-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 48.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.9

File hashes

Hashes for AtomProNet-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 76b07701b65699c57b648cc49577b2b0daec1a4de721614ab2845bb77b3f8a3a
MD5 3799a2aeec97fd6fb46b0a2c233bf2aa
BLAKE2b-256 493ad0c0116a6c3590bab2a3bf2fd16472d155bc466073aac74fbb094d31aae3

