Skip to main content

Implementation of the Vector-GC method and its regression framework

Project description

Vector-GC - Modeling Dipolar Molecules with PCP-SAFT

repository

Vector-GC is a group-contribution method for PCP-SAFT yielding parameters for non-polar and dipolar substances based on their molecular structure, e.g. given as SMILES. The method considers the dipolar term on a physical basis, thus improving predictions for dipolar substances and distinguishing cis- and trans-isomers.

The method and the corresponding regression framework is published as pypi package and can be installed via pip as vectorGC. The regression framework utilizes rapid phase calculations and automatic differentiation through FeOs-Torch. The framework can be used to regress additional substance classes for the Vector-GC method or to develop novel group-contribution methods for PCP-SAFT (by implementing new sum rules).

Using the Vector-GC Method

from vectorGC import vectorGC

# get PCP-SAFT parameters from a SMILES
m, sigma, epsilon, mu = vectorGC.from_smiles("CCCCBr")

Currently, the Vector-GC method is parametrized for alkanes, alkenes, oxygenated substances, and halogenated substances. The parametrized groups and bonds are presented in our publication.

Using the Regression Framework

import pandas as pd
import vectorGC.regression_framework.models as models
from vectorGC.regression_framework.solver import scipy_solver

# Use scipy solver without printing progress
solver = scipy_solver(show_progress=False)

# Load data required for the model fit
with open("path_to_group_info.json", "r") as f:
        group_info = json.load(f)
chem_id = pd.read_json("path_to_chemid.json").set_index("mol_id")
phys_data = pd.read_json("path_to_physdata.json").set_index("mol_id")

# Initialize vectorGC model to be fitted; use square root as normalization between properties
vecGC = models.vectorGC(chem_id, group_info, phys_data=phys_data, normalize="sqrt")

# Fit vectorGC model and generate results dictionaries
solver.fit(vecGC)
vecGC.get_fit_results()

# Perform leave-one-out cross-validation and generate results dictionaries (Attention: This can take up a lot of time & resources!)
solver.loocv(vecGC)
vecGC.get_loocv_results()

# Save the model in a pickle file
vecGC.save(path2outdir,filename)

Required Inputs

The model classes for PCP-SAFT regression require information on the chemical structure (chem_id), the groups and bonds that should be parametrized (group_info), and physical property data for vapor pressures and liquid densities (phys_data). See also example_preprocessing_halogenated_compounds.ipynb and example_preprocessing_oxygenated_compounds.ipynb for examples how to create inputs.

chem_id dataframe:


mol_id isomeric_smiles mw exp_dipole_moment iupac_name (or other additional identifier)
(Mandatory) Unique identifier for each considered substance (Mandatory) Isomeric Smiles of each considered substance (Mandatory) Molar weight of each considered substance (Optional) Known experimental dipole moment of each substance (Optional) Additional identifier / information for each substance

group_info dictionary:


{
  "smarts" : dict,            
  "bonds" : dict,            
  "initvals" : dict,        
  "known_groups" : dict,
  "scaling_factors" : dict
}
smarts dictionary: Used to define considered groups, known and to be fitted
Keys Considered group as String
Values Corresponding SMART as String
Example:
"smarts" : {
  "-CH3" : "[CH3;!R;!$([CH3][CH3]);!$([CH3][OH]);!$([CH3][NH2])]",
  "-CH2-": "[$([CX4H2]);!R;!$([CH2]=O)]",
  "F": "[$(F[C])]",
  ...
}
bonds dictionary: Used to define considered bonds, known and to be fitted
Keys Considered bond as String
Values Array defining the bond with ["Atom with higher electronegativitiy", "Atom with lower electronegativity", Identifier for bond type: 1.0 for SINGLE, 1.5 for AROMATIC, 2.0 for DOUBLE bond]
Example:
"bonds" : {
  "C-H" : ["C","H",1.0],
  "F-C" : ["F","C",1.0],
  ...
}
initvals dictionary: Used to define to be fitted groups and bonds and their initial values
Keys Considered bond or group as String; must match with defined Strings from smarts and bonds dictionaries
Values Initial value(s) for the considered bond or group; Float in case of bonds; dict in case of groups
Example:
"initvals" : {
  "F-C" : 1.5,
  "F" : {"m" : 0.3, "sigma" : 3.5, "epsilon" : 350.0},
  ...
}
known_groups dictionary: Used to define known groups or bonds that should not be fitted
Keys Known, considered bond or group as String; must match with defined Strings from smarts and bonds dictionaries
Values Known value(s) for considered bond or group; Float in case of bonds; dict in case of groups
Example:
"known_groups" : {
  "C-H" : 0.0,,
  "-CH3": {'m': 0.6119800000000001, 'sigma': 3.7202, 'epsilon': 229.9},
  "-CH2-": {'m': 0.45606, 'sigma': 3.89, 'epsilon': 239.01},
  ...
}
scaling_factors dictionary: Used to scale fit variables (group contributions) to same order of magnitude for solver
Keys PCP-SAFT parameters as String
Values Factor that is used to scale group variables to same order of magnitude for solver
Example:
"scaling_factors" : {
  "m" : 1.0,
  "sigma": 4.0,
  "epsilon": 400.0,
  "mu" : 3.0,
  ...
}

phys_data dataframe:


mol_id property phase temperature pressure value
(Mandatory) Unique identifier for each considered substance, must match to chem_id (Mandatory) Property of data point: pressure for vapor pressures or density for densities (Mandatory) Phase of data point: Vapor, Vapor (VLE), Liquid, Liquid (VLE), or Critical Point (Mandatory) Temperature of the data point in $\mathrm{K}$ (If applicable) Pressure of the data point in $\mathrm{Pa}$ - only required for Densities (property==density) in liquid state (phase=Liquid) (Mandatory) Value of the property: Pressures in $\mathrm{Pa}$ and Densities in $\mathrm{kg}/\mathrm{m}^3$

Implementing new sum rules

See example_implement_Sauer2014_sumrules.ipynb for an example how to use the pcsaftGC model class to define other sum rules. Here, we show implementing the sum rules for $\mu$, $\varepsilon^{A_iB_i}$, and $\kappa^{A_iB_i}$ defined by Sauer et al., 2014. Re-implementing sum rules for $m$, $\sigma$, and $\varepsilon$ works equivalently by overloading the function sum_rules_pcpsaft, see models source code.

Running regression & loading model

Examples for running the regression and loading models can be found in the examples folder. The script run_model_regression.py additionally yields an executable script that can be run as a background job.

Publication data

The regressed group parameters from our publication are available as csv and json files in the raw data folder: Hemprich2024.

Cite us

If you find Vector-GC or the developed regression framework useful for your own research, consider citing our publication from which this repository resulted.

@article{hemprich2024,
  author = {Hemprich, Carl and Rehner, Philipp and Esper, Timm and Gross, Joachim and Roskosch, Dennis and Bardow, André},
  title = {Modeling Dipolar Molecules with PCP-SAFT: A Vector Group-Contribution Method}
  journal = {ACS Omega},
  volume = {XX},
  number = {XX},
  pages = {XX},
  year = {2024}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vectorgc-0.1.2.tar.gz (5.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vectorGC-0.1.2-py3-none-any.whl (5.4 kB view details)

Uploaded Python 3

File details

Details for the file vectorgc-0.1.2.tar.gz.

File metadata

  • Download URL: vectorgc-0.1.2.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for vectorgc-0.1.2.tar.gz
Algorithm Hash digest
SHA256 d8858a335116c166857532a44692a34fbbb4e43adc3297ea1cf5ea91f9363ccb
MD5 dff598632564f4d429afba12b4d944ad
BLAKE2b-256 727e65d792dac5c578d638bbfe75c178785db99339bd20ccf204692c60b56c36

See more details on using hashes here.

File details

Details for the file vectorGC-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: vectorGC-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 5.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for vectorGC-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 eaa141b9a88c09dd3de1f5154642fe6c6ee41a89ec285ac5677b579864330b8b
MD5 8f8ed90b4bd547ee949bb758c5e67c3e
BLAKE2b-256 2246f040eefedda6e16c02ca48bbc80efa9103c3fbaeaee5049e17a5a7f10ff5

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page