
DECIMER 2.0: Deep Learning for Chemical Image Recognition using Efficient-Net V2 + Transformer


DECIMER Image Transformer V2: Deep Learning for Chemical Image Recognition using Efficient-Net V2 + Transformer


Abstract

The DECIMER 2.0 [5] (Deep lEarning for Chemical ImagE Recognition) project [1] was launched to address the optical chemical structure recognition (OCSR) problem with the latest computational intelligence methods and to provide an automated, open-source software solution.

Training the original GPU-based implementation of DECIMER [1] takes a long time on datasets of more than 1 million images. A common way to shorten training is to adapt the training script to run on multiple GPUs. We went a step further and adapted our code to run on Google's machine-learning hardware, the Tensor Processing Unit (TPU) [2].


Method and model changes

  • DECIMER now uses EfficientNet-V2 [3] for image feature extraction and a transformer model [4] for predicting the SMILES.
  • The SMILES used during training and predictions
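An autoregressive transformer decoder typically predicts a SMILES string one token at a time, so each training SMILES has to be split into tokens and mapped to integer ids. The sketch below shows one common way to do this, using an atom-level regular expression widely used in SMILES sequence models. It is an illustration under that assumption, not DECIMER's own tokenizer; the names `tokenize_smiles`, `build_vocab`, `encode` and the special tokens `<pad>`, `<start>`, `<end>` are hypothetical.

```python
import re
from typing import Dict, List

# Atom-level SMILES pattern: bracket atoms, two-letter halogens, organic-subset
# atoms, aromatic atoms, bonds, branches and ring-closure digits.
# Assumption: DECIMER may tokenize SMILES differently.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into the tokens a decoder would predict one by one."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # Reject input containing characters the pattern does not cover:
    # the tokens must rebuild the original string exactly.
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(smiles_list: List[str]) -> Dict[str, int]:
    """Assign an integer id to every token, reserving ids 0-2 for special tokens."""
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
    for smiles in smiles_list:
        for token in tokenize_smiles(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles: str, vocab: Dict[str, int]) -> List[int]:
    """Wrap the token ids in start/end markers to form a decoder target sequence."""
    ids = [vocab[token] for token in tokenize_smiles(smiles)]
    return [vocab["<start>"]] + ids + [vocab["<end>"]]
```

Matching two-letter elements such as `Cl` and `Br` as single tokens keeps the model from having to learn that `C` followed by `l` means chlorine.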

Changes in the training method

  • We converted our datasets into TFRecord files, a binary file format that TPUs can read much faster than many small files. The same files can also be used for training on GPUs. Using TFRecords speeds up training by removing the bottleneck of reading many individual files from disk.
  • We moved our data to Google Cloud Storage buckets, an efficient storage solution in the Google Cloud environment that any Google Cloud VM can access quickly. (For the highest throughput, the storage bucket and the VM should be in the same region.)
  • We adopted the TensorFlow data pipeline (tf.data) to load all TFRecord files from the Cloud Storage buckets to the TPUs.
  • We modified the main training code to run on TPUs using the TPU strategy introduced in TensorFlow 2.0.
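A TFRecord file frames each record as an 8-byte little-endian length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. A reader can therefore stream many examples sequentially from one large file instead of opening thousands of small ones. DECIMER presumably writes its records through TensorFlow itself (for example `tf.io.TFRecordWriter` with serialized `tf.train.Example` protos); the standard-library sketch below only illustrates the on-disk framing, and the helpers `crc32c`, `masked_crc`, `write_records` and `read_records` are hypothetical.

```python
import struct
from typing import Iterator, List

# CRC-32C (Castagnoli) lookup table for the reflected polynomial 0x82F63B78.
_CRC32C_TABLE = []
for _i in range(256):
    _crc = _i
    for _ in range(8):
        _crc = (_crc >> 1) ^ 0x82F63B78 if _crc & 1 else _crc >> 1
    _CRC32C_TABLE.append(_crc)


def crc32c(data: bytes) -> int:
    """Plain CRC-32C checksum of `data`."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    """TFRecord stores the CRC-32C rotated right by 15 bits plus a constant."""
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def write_records(path: str, records: List[bytes]) -> None:
    """Append each payload with its length header and both checksums."""
    with open(path, "wb") as f:
        for data in records:
            header = struct.pack("<Q", len(data))
            f.write(header)
            f.write(struct.pack("<I", masked_crc(header)))
            f.write(data)
            f.write(struct.pack("<I", masked_crc(data)))


def read_records(path: str) -> Iterator[bytes]:
    """Yield payloads in order, verifying both checksums of every record."""
    with open(path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                return
            (length,) = struct.unpack("<Q", header)
            (header_crc,) = struct.unpack("<I", f.read(4))
            if header_crc != masked_crc(header):
                raise ValueError("corrupt record length header")
            data = f.read(length)
            (data_crc,) = struct.unpack("<I", f.read(4))
            if data_crc != masked_crc(data):
                raise ValueError("corrupt record payload")
            yield data
```

Because the checksums are verified on read, a corrupted record is detected instead of being fed silently into training. Files framed this way should also be readable with `tf.data.TFRecordDataset`, which yields the raw payload bytes.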

How to use DECIMER?

  • Python package documentation
  • The model library can be found here: DOI

We suggest using DECIMER inside a Conda environment, which makes installing the dependencies easy.

  • Conda can be downloaded as part of the Anaconda or Miniconda platforms (Python 3.7). We recommend installing Miniconda3. On Linux, you can get it with:
$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh

Instructions

Python Package Installation

Use a conda environment for a clean installation:

$ sudo apt update
$ sudo apt install unzip
$ conda create --name DECIMER
$ conda activate DECIMER
$ conda install pip
$ python3 -m pip install -U pip

Install the latest code from GitHub with:

$ pip install git+https://github.com/Kohulan/DECIMER-Image_Transformer.git

Install in development mode with:

$ git clone https://github.com/Kohulan/DECIMER-Image_Transformer.git decimer
$ cd decimer/
$ pip install -e .
  • Here, -e installs the package in "editable" mode.

Install from PyPI

$ pip install decimer

How to use DECIMER inside your own Python script

from DECIMER import predict_SMILES

# Chemical depiction to SMILES translation
image_path = "path/to/imagefile"
SMILES = predict_SMILES(image_path)
print(SMILES)
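`predict_SMILES` handles one image per call. To translate a whole folder of chemical depictions, the call can be wrapped in a loop. The sketch below takes the predictor as a parameter so that it can be tried without loading the model; `predict_directory`, `write_tsv` and the set of accepted file extensions are hypothetical helpers, not part of the DECIMER package.

```python
from pathlib import Path
from typing import Callable, Dict

# File extensions treated as images here (an assumption; adjust to your data).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def predict_directory(folder: str, predict: Callable[[str], str]) -> Dict[str, str]:
    """Call `predict` (e.g. DECIMER's predict_SMILES) on every image file in `folder`.

    Returns a mapping from file name to predicted SMILES, in sorted file order.
    Subdirectories and files with other extensions are skipped.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = predict(str(path))
    return results


def write_tsv(results: Dict[str, str], out_path: str) -> None:
    """Save the predictions as tab-separated 'file name <TAB> SMILES' lines."""
    with open(out_path, "w", encoding="utf-8") as f:
        for name, smiles in results.items():
            f.write(f"{name}\t{smiles}\n")
```

With DECIMER installed, you would import `predict_SMILES` as shown above and call `predict_directory("path/to/images", predict_SMILES)`.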

On macOS, if you do not have an NVIDIA GPU, install tensorflow==2.7.1.

License:

  • This project is licensed under the MIT License - see the LICENSE file for details

Citation

References

  1. Rajan K, Zielesny A, Steinbeck C (2020) DECIMER: towards deep learning for chemical image recognition. J Cheminform 12:65. https://doi.org/10.1186/s13321-020-00469-w
  2. Norrie T, Patil N, Yoon DH, Kurian G, Li S, Laudon J, Young C, Jouppi N, Patterson D (2021) The Design Process for Google's Training Chips: TPUv2 and TPUv3. IEEE Micro 41:56–63
  3. Tan M, Le QV (2021) EfficientNetV2: Smaller Models and Faster Training. arXiv [cs.CV]
  4. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I (2017) Attention Is All You Need. arXiv [cs.CL]
  5. Rajan K, Zielesny A, Steinbeck C (2021) DECIMER 1.0: deep learning for chemical image recognition using transformers. J Cheminform 13:61. https://doi.org/10.1186/s13321-021-00538-8

Acknowledgement

  • We thank Charles Tapley Hoyt for his valuable advice and help in improving the DECIMER repository.
  • We are grateful to Google for making free computing time on their TensorFlow Research Cloud infrastructure available to us.

Author: Kohulan


Download files

Source distribution: decimer-2.0.2.tar.gz (65.7 kB)

  • Uploaded via: twine/4.0.1 CPython/3.9.13 (Trusted Publishing: no)
  • SHA256: 499a72c05c186ff0913f5e42e2a6912a690e12143942374a85a6b95d552ca4b8
  • MD5: 8d5236488d0b32151c70a2ff81f1fecc
  • BLAKE2b-256: 0b08d7ea100297a1a3b356fd82b6065f10cad885c3e38ef5e328a2ae62761873

Built distribution: decimer-2.0.2-py3-none-any.whl (85.2 kB, Python 3)

  • Uploaded via: twine/4.0.1 CPython/3.9.13 (Trusted Publishing: no)
  • SHA256: cd22b1646e423eb1d3ccc09178f43b1942d2231a1aad1c161a2677c1c9b8765d
  • MD5: dac56283ed1c27b58bac43440ef1597c
  • BLAKE2b-256: 1ed831c07f9714ddcc657c91e4d0f02e48a4cabd167a1b55d3bb33f92b278594
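Before installing a downloaded archive, you can check it against the digests published for the release. A minimal standard-library sketch; the helper names `file_digest` and `verify` are hypothetical.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hex digest of a file, read in chunks so large archives need little memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

For an untampered download, `verify("decimer-2.0.2.tar.gz", "499a72c05c186ff0913f5e42e2a6912a690e12143942374a85a6b95d552ca4b8")` should return True.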
