Implementation of the Vector-GC method and its regression framework
Vector-GC - Modeling Dipolar Molecules with PCP-SAFT
Vector-GC is a group-contribution method for PCP-SAFT yielding parameters for non-polar and dipolar substances based on their molecular structure, e.g. given as SMILES. The method considers the dipolar term on a physical basis, thus improving predictions for dipolar substances and distinguishing cis- and trans-isomers.
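The effect of treating the dipole moment as a vector quantity can be illustrated with a minimal sketch. It assumes that each polar bond contributes a dipole vector along the bond and that the molecular dipole moment is the magnitude of the vector sum. The function names, the bond magnitude of 1.5, and the idealized planar angles are hypothetical and are not taken from the package.

```python
import math


def bond_vector(magnitude, angle_deg):
    """Planar bond dipole with the given magnitude and direction (hypothetical helper)."""
    angle = math.radians(angle_deg)
    return (magnitude * math.cos(angle), magnitude * math.sin(angle), 0.0)


def net_dipole(bond_vectors):
    """Magnitude of the vector sum of all bond dipoles."""
    x = sum(v[0] for v in bond_vectors)
    y = sum(v[1] for v in bond_vectors)
    z = sum(v[2] for v in bond_vectors)
    return math.sqrt(x * x + y * y + z * z)


MU_BOND = 1.5  # assumed magnitude of one polar bond contribution

# Idealized geometry: in the cis isomer both polar bonds point to the same side,
# in the trans isomer they point in exactly opposite directions.
cis = [bond_vector(MU_BOND, 60.0), bond_vector(MU_BOND, 120.0)]
trans = [bond_vector(MU_BOND, 60.0), bond_vector(MU_BOND, 240.0)]

mu_cis = net_dipole(cis)
mu_trans = net_dipole(trans)
```

In this idealized picture the trans contributions cancel while the cis contributions add up, which is why a scalar sum of group contributions could not tell the two isomers apart.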
The method and the corresponding regression framework are published as a PyPI package and can be installed via pip as vectorGC. The regression framework uses rapid phase calculations and automatic differentiation through FeOs-Torch. The framework can be used to regress additional substance classes for the Vector-GC method or to develop novel group-contribution methods for PCP-SAFT (by implementing new sum rules).
Using the Vector-GC Method
from vectorGC import vectorGC
# get PCP-SAFT parameters from a SMILES
m, sigma, epsilon, mu = vectorGC.from_smiles("CCCCBr")
Currently, the Vector-GC method is parametrized for alkanes, alkenes, oxygenated substances, and halogenated substances. The parametrized groups and bonds are presented in our publication.
Using the Regression Framework
import json
import pandas as pd
import vectorGC.regression_framework.models as models
from vectorGC.regression_framework.solver import scipy_solver
# Use scipy solver without printing progress
solver = scipy_solver(show_progress=False)
# Load data required for the model fit
with open("path_to_group_info.json", "r") as f:
    group_info = json.load(f)
chem_id = pd.read_json("path_to_chemid.json").set_index("mol_id")
phys_data = pd.read_json("path_to_physdata.json").set_index("mol_id")
# Initialize vectorGC model to be fitted; use square root as normalization between properties
vecGC = models.vectorGC(chem_id, group_info, phys_data=phys_data, normalize="sqrt")
# Fit vectorGC model and generate results dictionaries
solver.fit(vecGC)
vecGC.get_fit_results()
# Perform leave-one-out cross-validation and generate results dictionaries (Attention: This can take up a lot of time & resources!)
solver.loocv(vecGC)
vecGC.get_loocv_results()
# Save the model in a pickle file (path2outdir and filename are placeholders to be set by the user)
vecGC.save(path2outdir, filename)
Required Inputs
The model classes for PCP-SAFT regression require information on the chemical structure (chem_id), the groups and bonds that should be parametrized (group_info), and physical property data for vapor pressures and liquid densities (phys_data). See also example_preprocessing_halogenated_compounds.ipynb and example_preprocessing_oxygenated_compounds.ipynb for examples of how to create the inputs.
chem_id dataframe:
| mol_id | isomeric_smiles | mw | exp_dipole_moment | iupac_name (or other additional identifier) |
|---|---|---|---|---|
| (Mandatory) Unique identifier for each considered substance | (Mandatory) Isomeric Smiles of each considered substance | (Mandatory) Molar weight of each considered substance | (Optional) Known experimental dipole moment of each substance | (Optional) Additional identifier / information for each substance |
group_info dictionary:
{
"smarts" : dict,
"bonds" : dict,
"initvals" : dict,
"known_groups" : dict,
"scaling_factors" : dict
}
smarts dictionary: Used to define considered groups, known and to be fitted
| Keys | Considered group as String |
| Values | Corresponding SMARTS pattern as String |
| Example: |
"smarts" : {
"-CH3" : "[CH3;!R;!$([CH3][CH3]);!$([CH3][OH]);!$([CH3][NH2])]",
"-CH2-": "[$([CX4H2]);!R;!$([CH2]=O)]",
"F": "[$(F[C])]",
...
}
bonds dictionary: Used to define considered bonds, known and to be fitted
| Keys | Considered bond as String |
| Values | Array defining the bond with ["Atom with higher electronegativity", "Atom with lower electronegativity", Identifier for bond type: 1.0 for SINGLE, 1.5 for AROMATIC, 2.0 for DOUBLE bond] |
| Example: |
"bonds" : {
"C-H" : ["C","H",1.0],
"F-C" : ["F","C",1.0],
...
}
initvals dictionary: Used to define the groups and bonds to be fitted and their initial values
| Keys | Considered bond or group as String; must match the Strings defined in the smarts and bonds dictionaries |
| Values | Initial value(s) for the considered bond or group; Float in case of bonds; dict in case of groups |
| Example: |
"initvals" : {
"F-C" : 1.5,
"F" : {"m" : 0.3, "sigma" : 3.5, "epsilon" : 350.0},
...
}
known_groups dictionary: Used to define known groups or bonds that should not be fitted
| Keys | Known, considered bond or group as String; must match the Strings defined in the smarts and bonds dictionaries |
| Values | Known value(s) for considered bond or group; Float in case of bonds; dict in case of groups |
| Example: |
"known_groups" : {
"C-H" : 0.0,
"-CH3": {"m": 0.6119800000000001, "sigma": 3.7202, "epsilon": 229.9},
"-CH2-": {"m": 0.45606, "sigma": 3.89, "epsilon": 239.01},
...
}
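Because the keys of initvals and known_groups must match entries in smarts or bonds, a small consistency check before starting a fit can catch typos early. The following is a minimal sketch and not part of the package: the function name check_group_info is hypothetical, and treating an entry that appears in both initvals and known_groups as an error is an assumption.

```python
def check_group_info(group_info):
    """Return a list of consistency problems found in group_info (empty if none)."""
    defined = set(group_info["smarts"]) | set(group_info["bonds"])
    problems = []

    # Every fitted or known entry must refer to a defined group or bond.
    for section in ("initvals", "known_groups"):
        for name in group_info.get(section, {}):
            if name not in defined:
                problems.append(f"{section}: '{name}' is not defined in smarts or bonds")

    # Assumption: an entry is either fitted or known, never both.
    fitted = set(group_info.get("initvals", {}))
    known = set(group_info.get("known_groups", {}))
    for name in sorted(fitted & known):
        problems.append(f"'{name}' is listed in both initvals and known_groups")

    return problems


example = {
    "smarts": {
        "-CH3": "[CH3;!R;!$([CH3][CH3]);!$([CH3][OH]);!$([CH3][NH2])]",
        "-CH2-": "[$([CX4H2]);!R;!$([CH2]=O)]",
        "F": "[$(F[C])]",
    },
    "bonds": {"C-H": ["C", "H", 1.0], "F-C": ["F", "C", 1.0]},
    "initvals": {"F-C": 1.5, "F": {"m": 0.3, "sigma": 3.5, "epsilon": 350.0}},
    "known_groups": {
        "C-H": 0.0,
        "-CH3": {"m": 0.61198, "sigma": 3.7202, "epsilon": 229.9},
        "-CH2-": {"m": 0.45606, "sigma": 3.89, "epsilon": 239.01},
    },
}

problems = check_group_info(example)
```

For the example entries shown in this section the list of problems is empty; an initvals key such as "Cl-C" without a matching bonds entry would be reported.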
scaling_factors dictionary: Used to scale the fit variables (group contributions) to the same order of magnitude for the solver
| Keys | PCP-SAFT parameters as String |
| Values | Factor that is used to scale the group variables to the same order of magnitude for the solver |
| Example: |
"scaling_factors" : {
"m" : 1.0,
"sigma": 4.0,
"epsilon": 400.0,
"mu" : 3.0,
...
}
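The purpose of the scaling factors can be sketched with the example values above. This sketch assumes that scaling means dividing each group parameter by its factor before it is handed to the solver and multiplying afterwards; the actual convention inside the framework may differ, and the helper names scale and unscale are hypothetical.

```python
scaling_factors = {"m": 1.0, "sigma": 4.0, "epsilon": 400.0, "mu": 3.0}


def scale(params, factors):
    """Map physical group parameters to solver variables (assumed: division)."""
    return {name: value / factors[name] for name, value in params.items()}


def unscale(variables, factors):
    """Map solver variables back to physical group parameters."""
    return {name: value * factors[name] for name, value in variables.items()}


# Initial values of the "F" group from the initvals example
f_group = {"m": 0.3, "sigma": 3.5, "epsilon": 350.0}

scaled = scale(f_group, scaling_factors)
restored = unscale(scaled, scaling_factors)
```

With these factors, parameters that differ by three orders of magnitude in physical units (0.3, 3.5, 350.0) all become variables below one, which generally helps gradient-based solvers.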
phys_data dataframe:
| mol_id | property | phase | temperature | pressure | value |
|---|---|---|---|---|---|
| (Mandatory) Unique identifier for each considered substance, must match to chem_id | (Mandatory) Property of data point: pressure for vapor pressures or density for densities | (Mandatory) Phase of data point: Vapor, Vapor (VLE), Liquid, Liquid (VLE), or Critical Point | (Mandatory) Temperature of the data point in $\mathrm{K}$ | (If applicable) Pressure of the data point in $\mathrm{Pa}$ - only required for Densities (property==density) in liquid state (phase=Liquid) | (Mandatory) Value of the property: Pressures in $\mathrm{Pa}$ and Densities in $\mathrm{kg}/\mathrm{m}^3$ |
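The column rules for phys_data can be expressed as a simple per-row check. The sketch below works on plain dictionaries to stay dependency-free; the function name check_phys_data_row and the numerical values in the example rows are hypothetical, and the package itself may validate its inputs differently.

```python
ALLOWED_PROPERTIES = {"pressure", "density"}
ALLOWED_PHASES = {"Vapor", "Vapor (VLE)", "Liquid", "Liquid (VLE)", "Critical Point"}


def check_phys_data_row(row):
    """Return a list of problems for one phys_data row (empty if the row looks valid)."""
    problems = []
    if row.get("property") not in ALLOWED_PROPERTIES:
        problems.append(f"unknown property: {row.get('property')!r}")
    if row.get("phase") not in ALLOWED_PHASES:
        problems.append(f"unknown phase: {row.get('phase')!r}")

    temperature = row.get("temperature")
    if temperature is None or temperature <= 0:
        problems.append("temperature in K is mandatory and must be positive")

    # Pressure is only required for densities in the (single-phase) liquid state.
    needs_pressure = row.get("property") == "density" and row.get("phase") == "Liquid"
    if needs_pressure and row.get("pressure") is None:
        problems.append("pressure in Pa is required for liquid densities")

    if row.get("value") is None:
        problems.append("value is mandatory")
    return problems


# Hypothetical example rows (values are illustrative, not measured data)
good_row = {"mol_id": 1, "property": "density", "phase": "Liquid",
            "temperature": 298.15, "pressure": 101325.0, "value": 1270.0}
bad_row = {"mol_id": 1, "property": "density", "phase": "Liquid",
           "temperature": 298.15, "pressure": None, "value": 1270.0}

good_problems = check_phys_data_row(good_row)
bad_problems = check_phys_data_row(bad_row)
```

Applied to a DataFrame, the same function could be called on each row converted to a dictionary, for example via `DataFrame.to_dict("records")`.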
Implementing new sum rules
See example_implement_Sauer2014_sumrules.ipynb for an example of how to use the pcsaftGC model class to define other sum rules. That example implements the sum rules for $\mu$, $\varepsilon^{A_iB_i}$, and $\kappa^{A_iB_i}$ defined by Sauer et al., 2014. Sum rules for $m$, $\sigma$, and $\varepsilon$ can be re-implemented in the same way by overloading the function sum_rules_pcpsaft; see the models source code.
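To make the notion of a sum rule concrete, the sketch below combines group parameters into molecular PCP-SAFT parameters. It assumes the homosegmented form commonly used with such group parameters, namely $m = \sum_\alpha n_\alpha m_\alpha$, $m\sigma^3 = \sum_\alpha n_\alpha m_\alpha \sigma_\alpha^3$, and $m\varepsilon = \sum_\alpha n_\alpha m_\alpha \varepsilon_\alpha$. Whether sum_rules_pcpsaft uses exactly this form is an assumption that should be checked against the models source code; the function name homosegmented_sum_rules is hypothetical.

```python
def homosegmented_sum_rules(group_counts, group_params):
    """Combine group contributions into molecular m, sigma, and epsilon.

    group_counts: mapping of group name to number of occurrences in the molecule
    group_params: mapping of group name to {"m", "sigma", "epsilon"}
    """
    m = 0.0
    m_sigma3 = 0.0
    m_epsilon = 0.0
    for group, count in group_counts.items():
        p = group_params[group]
        m += count * p["m"]
        m_sigma3 += count * p["m"] * p["sigma"] ** 3
        m_epsilon += count * p["m"] * p["epsilon"]
    sigma = (m_sigma3 / m) ** (1.0 / 3.0)
    epsilon = m_epsilon / m
    return m, sigma, epsilon


# Group parameters from the known_groups example above
group_params = {
    "-CH3": {"m": 0.61198, "sigma": 3.7202, "epsilon": 229.9},
    "-CH2-": {"m": 0.45606, "sigma": 3.89, "epsilon": 239.01},
}

# n-butane (SMILES "CCCC") consists of two -CH3 and two -CH2- groups
m, sigma, epsilon = homosegmented_sum_rules({"-CH3": 2, "-CH2-": 2}, group_params)
```

Under these rules the segment number is additive, while the segment diameter and energy parameter are segment-weighted averages and therefore always lie between the values of the contributing groups.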
Running regression & loading model
Examples for running the regression and loading models can be found in the examples folder. The script run_model_regression.py additionally provides an executable script that can be run as a background job.
Publication data
The regressed group parameters from our publication are available as csv and json files in the raw data folder: Hemprich2024.
Cite us
If you find Vector-GC or the developed regression framework useful for your own research, consider citing our publication from which this repository resulted.
@article{hemprich2024,
author = {Hemprich, Carl and Rehner, Philipp and Esper, Timm and Gross, Joachim and Roskosch, Dennis and Bardow, André},
title = {Modeling Dipolar Molecules with PCP-SAFT: A Vector Group-Contribution Method},
journal = {ACS Omega},
volume = {XX},
number = {XX},
pages = {XX},
year = {2024}
}