ML-ready quantum mechanical datasets
openQDC - Open Quantum Data Commons
Installing openQDC
git clone git@github.com:OpenDrugDiscovery/openQDC.git
cd openQDC
# use mamba/conda
mamba env create -n openqdc -f env.yml
# activate the environment so the package installs into it
mamba activate openqdc
pip install -e .
Tests
You can run tests locally with:
pytest
Documentation
You can build the documentation locally with:
mkdocs serve
Downloading Datasets
A command-line interface is available to download datasets or to list which datasets are available. For more information, run openqdc --help.
# Display the available datasets
openqdc datasets
# Display the help message for the download command
openqdc download --help
# Download the Spice and QMugs datasets
openqdc download Spice QMugs
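The download command above can also be scripted from Python. The following is a minimal sketch that only shells out to the openqdc CLI shown above; the helper names build_download_command and download_datasets are hypothetical and are not part of openQDC.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_download_command(datasets: Sequence[str]) -> List[str]:
    """Return the argument list for `openqdc download <datasets...>`."""
    if not datasets:
        raise ValueError("at least one dataset name is required")
    return ["openqdc", "download", *datasets]


def download_datasets(datasets: Sequence[str]) -> None:
    """Run the download command; raise if the CLI is missing or exits with an error."""
    if shutil.which("openqdc") is None:
        raise RuntimeError("openqdc CLI not found on PATH; install openQDC first")
    subprocess.run(build_download_command(datasets), check=True)


# Show the command that would be executed for the example above.
print(" ".join(build_download_command(["Spice", "QMugs"])))
```

Calling download_datasets(["Spice", "QMugs"]) would then run the same command as the shell example, assuming openQDC is installed in the active environment.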
Overview of Datasets
We provide support for the following publicly available QM Potential Energy Datasets.
Potential Energy
Dataset | # Molecules | # Conformers | Average Conformers per Molecule | Force Labels | Atom Types | QM Level of Theory | Off-Equilibrium Conformations |
---|---|---|---|---|---|---|---|
ANI | 57,462 | 20,000,000 | 348 | No | 4 | ωB97x:6-31G(d) | Yes |
GEOM | 450,000 | 37,000,000 | 82 | No | 18 | GFN2-xTB | No |
Molecule3D | 3,899,647 | 3,899,647 | 1 | No | 5 | B3LYP/6-31G* | No |
NablaDFT | 1,000,000 | 5,000,000 | 5 | No | 6 | ωB97X-D/def2-SVP | |
OrbNet Denali | 212,905 | 2,300,000 | 11 | No | 16 | GFN1-xTB | Yes |
PCQM_PM6 | | | 1 | No | | PM6 | No |
PCQM_B3LYP | 85,938,443 | 85,938,443 | 1 | No | | B3LYP/6-31G* | No |
QMugs | 665,000 | 2,000,000 | 3 | No | 10 | GFN2-xTB, ωB97X-D/def2-SVP | No |
QM7X | 6,950 | 4,195,237 | 603 | Yes | 7 | PBE0+MBD | Yes |
SN2RXN | 39 | 452,709 | 11,600 | Yes | 6 | DSD-BLYP-D3(BJ)/def2-TZVP | |
SolvatedPeptides | | 2,731,180 | | Yes | | revPBE-D3(BJ)/def2-TZVP | |
Spice | 19,238 | 1,132,808 | 59 | Yes | 15 | ωB97M-D3(BJ)/def2-TZVPPD | Yes |
tmQM | 86,665 | 86,665 | 1 | No | | TPSSh-D3BJ/def2-SVP | |
Transition1X | | 9,654,813 | | Yes | | ωB97x/6-31G(d) | Yes |
WaterClusters | 1 | 4,464,740 | | No | 2 | TTM2.1-F | Yes |
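The "Average Conformers per Molecule" column appears to be the conformer count divided by the molecule count, rounded. The sketch below checks this for a few rows; the figures are copied from the table, and the function name average_conformers is a hypothetical helper. Because several counts in the table are themselves rounded, the ratios are expected to match only approximately.

```python
# (molecules, conformers, reported average) copied from the table above.
ROWS = {
    "ANI": (57_462, 20_000_000, 348),
    "GEOM": (450_000, 37_000_000, 82),
    "OrbNet Denali": (212_905, 2_300_000, 11),
    "QMugs": (665_000, 2_000_000, 3),
}


def average_conformers(molecules: int, conformers: int) -> float:
    """Return the mean number of conformers per molecule."""
    return conformers / molecules


for name, (n_molecules, n_conformers, reported) in ROWS.items():
    average = average_conformers(n_molecules, n_conformers)
    print(f"{name}: computed {average:.1f}, table reports {reported}")
```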
Interaction energy
We also provide support for the following publicly available QM Noncovalent Interaction Energy Datasets.
Dataset |
---|
DES370K |
DES5M |
Metcalf |
DESS66 |
DESS66x8 |
Splinter |
X40 |
L7 |
CI Status
The CI runs tests and performs code quality checks for the following combinations:
- The three major platforms: Windows, OSX and Linux.
- The four latest Python versions.
The following checks run on the main branch:
- Lib build & Testing
- Code Sanity (linting and type analysis)
- Documentation Build
- Pre-Commit
How to cite
All data presented in OpenQDC have already been published in scientific journals, and the full reference to the respective paper is attached to each dataset class. When citing data obtained from OpenQDC, cite both the original paper(s) the data come from and our paper on OpenQDC itself. The reference is:
ADD REF HERE LATER
File details
Details for the file openqdc-0.1.2.tar.gz.
File metadata
- Download URL: openqdc-0.1.2.tar.gz
- Upload date:
- Size: 4.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 694f4376ede84e1ae4246dcb71dc3d4cdefd7639cd598d674afe1f651b49d44c |
MD5 | ecfcc582e52841ef91f2e273b9a8b93b |
BLAKE2b-256 | ac93c795462c7971ca7594cbac08b0040e33bc8af551db8c26c6512bf974746a |
File details
Details for the file openqdc-0.1.2-py3-none-any.whl.
File metadata
- Download URL: openqdc-0.1.2-py3-none-any.whl
- Upload date:
- Size: 183.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.5
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | ae8c9f66a61fb40f4cf2b485d166ffc0861572496f7c8ef13e54f96d69a06880 |
MD5 | 3f42db490bcc34e94f85d0c268bf09be |
BLAKE2b-256 | 473f1a8b65cd558126c73323c651387beeac3b3407578c067da033b42f71d88c |
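The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard hashlib module; the function names sha256_of and verify are hypothetical helpers, and the file is assumed to keep its original name on disk.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "openqdc-0.1.2.tar.gz": "694f4376ede84e1ae4246dcb71dc3d4cdefd7639cd598d674afe1f651b49d44c",
    "openqdc-0.1.2-py3-none-any.whl": "ae8c9f66a61fb40f4cf2b485d166ffc0861572496f7c8ef13e54f96d69a06880",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Compare a file's digest with the published one for its file name."""
    path = Path(path)
    expected = EXPECTED_SHA256[path.name]  # KeyError for unknown file names
    return sha256_of(path) == expected
```

For example, verify("openqdc-0.1.2.tar.gz") should return True for an intact copy of the source distribution and False for a corrupted or modified one.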