Skip to main content

package used to access the ANI-1 data set

Project description

ANI-1 dataset support repository

This repository contains the scripts needed to access the ANI-1 data set. The structure of this repository has been updated so the tools can be installed easily using pip. This modified repository is a fork of https://github.com/isayev/ANI1_dataset created by Olexandr Isayev

If you use ANI-1 dataset please cite the following two papers:

Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg. ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science, 2017, DOI: 10.1039/C6SC05720A

Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg. ANI-1, A data set of 20 million calculated off-equilibrium conformations for organic molecules. Scientific Data, 4, Article number: 170193, DOI: 10.1038/sdata.2017.193 https://www.nature.com/articles/sdata2017193

Required software

Python3.5 or better Numpy H5PY

Included extraction software

pyanitools.py -Contains a class called "anidataloader" for loading and parsing the ANI-1 data set.

example_data_sampler.py -Example of how to sample data from the anidataloader class.

Installation instructions

  1. run pip install ANI1datatools

Description

The downloaded file (https://doi.org/10.6084/m9.figshare.c.3846712) can be extracted on a Unix based system with the “tar -xzf ani-1_dataset.tar.gz” command. Once extracted, a folder named “ANI-1_release” is the root directory for all files. The individual data files are separated into 8 HDF5 files (extension .h5) named ani_gdb_s0x.h5 where x is a number between 1 and 8 representing the number of heavy atoms (CNO) in the molecules contained in the file. The README file contains information about the data set and scripts included. The folder named “readers” has a code sample for reading the HDF5 file called “example_data_sampler.py” and “lib/pyanitools.py”, which contains classes for loading and storing data in our in-house format. File format

The ANI-1 data set is stored in the HDF5 [http://www.hdfgroup.org/HDF5] file format. Two python classes are included with the data set’s compressed archive in the python file: “ANI-1_release/readers/lib/pyanitools.py”. These classes are only tested for python version 3.5 and greater, and requires the h5py library [http://www.h5py.org/]. An example script for reading the data from the HDF5 files is given in: “ANI-1_release/readers/example_data_sampler.py”.

Data Units

Coordinates: Angstroms Energies: Hartrees

Self-interaction atomic energies

H = -0.500607632585 C = -37.8302333826 N = -54.5680045287 O = -75.0362229210

Related work

Developing the ANI-1x potential with active learning:

Justin S. Smith, Ben Nebgen, Nicholas Lubbers, Olexandr Isayev, Adrian E. Roitberg. Less is more: sampling chemical space with active learning. arXiv, 2018, DOI: [arXiv:1801.09319] (https://arxiv.org/abs/1801.09319)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ANI1datatools-0.0.3.tar.gz (4.9 kB view details)

Uploaded Source

Built Distribution

ANI1datatools-0.0.3-py3-none-any.whl (5.5 kB view details)

Uploaded Python 3

File details

Details for the file ANI1datatools-0.0.3.tar.gz.

File metadata

  • Download URL: ANI1datatools-0.0.3.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.12

File hashes

Hashes for ANI1datatools-0.0.3.tar.gz
Algorithm Hash digest
SHA256 5b31ffa04d7e6a46d3264872a18d4e4e2c78165eded5f904f7d6cd824a27d128
MD5 a783c10f19d5e3150f140917526d1dcc
BLAKE2b-256 3db6efe2ec40d57448daed35d6e6ca27ef989ec7288b0a715e9251cf5ca64485

See more details on using hashes here.

File details

Details for the file ANI1datatools-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for ANI1datatools-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 ce91345387ab75dc4e65c63af0313c9591beb9c7411c68481bacc8029aca2dc2
MD5 a807b4d207671e8453fe5791900fa676
BLAKE2b-256 81c051bbd21d73c59f112f1f707210edcb4d4b1774adbb72ba8872f22423036d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page