A Simple Yet Effective Scanning Tunneling Microscope Image Simulator

Project description

SiPMai: A Simple Yet Effective Scanning Probe Microscope Auto Image Generator for Deep Learning

This project provides a streamlined pipeline for generating and handling molecular data, specifically for use in machine learning models. The toolkit involves generating SMILES representations, using ray tracing to create images, and preparing dataset indices for training, validation, and testing sets.

Table of Contents

  1. Project Description
  2. Installation
  3. Usage
  4. Credits
  5. License

Project Description

This repository contains several Python scripts that together form a pipeline for the generation and management of molecular data. Specifically, it includes:

  1. gen_data/smile_generation.py: A script for generating SMILES representations of molecules. It requires a CSV file containing molecule data as input and produces a JSON file containing the generated SMILES strings.
  2. gen_data/ray_generation.py: A script that uses ray tracing to generate images of the molecules described by the SMILES strings. It offers several customization options, such as resolution, blur, motion blur, and Gaussian noise.
  3. gen_data/prepare_dataset.py: A script that creates indices for the generated molecules and splits them into training, validation, and testing sets. It creates JSON files containing these indices.

The scripts are designed to be used in sequence, but can also be used independently if needed.
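
As an illustration of the index-splitting step, here is a minimal sketch. It is not the code in gen_data/prepare_dataset.py. The function names split_indices and write_splits, the 80/10/10 ratios, the seed, and the output file names are all assumptions made for this example.

```python
import json
import random
from pathlib import Path


def split_indices(num_molecules, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle molecule indices and split them into train/val/test lists."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    indices = list(range(num_molecules))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_train = int(num_molecules * ratios[0])
    n_val = int(num_molecules * ratios[1])
    return {
        "train": indices[:n_train],
        "val": indices[n_train:n_train + n_val],
        "test": indices[n_train + n_val:],  # the remainder goes to the test set
    }


def write_splits(splits, out_dir):
    """Write one JSON file per split (the file names are hypothetical)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, idx in splits.items():
        (out_dir / f"{name}_indices.json").write_text(json.dumps(idx))
```

Seeding the shuffle keeps the three index sets stable across runs, so models trained at different times see the same test molecules.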

Installation

This project is written in Python. Please note that Python > 3.10 is not supported (due to Ray).

It requires the following Python libraries:

        numpy>=1.16,<1.24
        torch>=1.4.0
        packaging
        tqdm
        scikit-learn
        matplotlib
        scipy
        pandas
        opencv-python
        numba
        rdkit
        ray

You can install the package, together with these libraries, using pip:

pip install SiPMai

Alternatively, build from source by running the following commands in a terminal:

git clone https://github.com/GilesLuo/SiPMai.git
cd SiPMai
python setup.py install
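
Since Python versions above 3.10 are not supported, it may help to check the interpreter before installing. The following is a small sketch using only the standard library. The helper name python_version_supported is hypothetical and is not part of SiPMai.

```python
import sys


def python_version_supported(version_info=None, max_minor=10):
    """Return True if the interpreter is Python 3.x with x <= max_minor."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor <= max_minor


if __name__ == "__main__":
    if not python_version_supported():
        print("Warning: Python > 3.10 is not supported by this project.")
```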

Usage

Data generation:

In a terminal, run:

generate_pubchem

This generates a 100k dataset of molecules with 39 <= num_atom <= 200 in the directory from which the command is run.

You can modify the generation configuration by passing arguments:

generate_pubchem --your_args

Please refer to SiPMai/gen_data/gen_all_data_pipeline.py for a complete list of arguments.

Equivalently, you can call the main() function directly from a Python script, for example:

import SiPMai
from SiPMai.gen_data.gen_all_data_pipeline import gen_all_data, main

main()  # generate with preset arguments

# or

# many_args is a placeholder for your own arguments; see
# SiPMai/gen_data/gen_all_data_pipeline.py for the complete list
gen_all_data(many_args)   # generate with user-defined arguments

I get MemoryError: Unable to allocate internal buffer.

This typically happens because the code uses all of your CPU cores for generation by default. The error can occur when the data being serialized is too large to fit into memory, or when your system is running low on available memory.

A simple fix is to limit num_cpus. For example, you may use

generate_pubchem --num_cpus 4
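
If you launch generation from a script, one way to choose a conservative value is sketched below. The helper build_generate_command and its policy of leaving one core free are assumptions made for illustration. Only the generate_pubchem command and the --num_cpus flag come from the instructions above.

```python
import os


def build_generate_command(num_cpus=None, reserve=1):
    """Build the generate_pubchem command with a capped --num_cpus value."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    if num_cpus is None:
        num_cpus = available - reserve  # leave some cores free (assumed policy)
    num_cpus = max(1, min(num_cpus, available))  # clamp to [1, available]
    return ["generate_pubchem", "--num_cpus", str(num_cpus)]
```

The returned list can be passed to subprocess.run(). If memory is still exhausted, lowering the value further is the first thing to try.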

Loading Data

We also provide a PyTorch DataLoader template for loading the generated datasets. For details, please refer to SiPMai/utils/dataloader.

More features are under development. Please feel free to raise issues and participate in developing this tool.

Credits

This project was made possible thanks to the contributions of the team members and the use of multiple open-source libraries.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

SiPMai-0.0.20.tar.gz (6.7 MB)

Uploaded Source

Built Distribution

SiPMai-0.0.20-py3-none-any.whl (7.1 MB)

Uploaded Python 3

File details

Details for the file SiPMai-0.0.20.tar.gz.

File metadata

  • Download URL: SiPMai-0.0.20.tar.gz
  • Upload date:
  • Size: 6.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for SiPMai-0.0.20.tar.gz
Algorithm Hash digest
SHA256 95f755adb22b7c515fe5d2ed3a4231126dbd13a08632a95ecd50ecf2624459cf
MD5 815604bb1561db21050cc8946091e535
BLAKE2b-256 6b1980fd3c7bcee8a620fd02b9a97395ae78cb9fd8c9d53315789342d6d8eb3f

File details

Details for the file SiPMai-0.0.20-py3-none-any.whl.

File metadata

  • Download URL: SiPMai-0.0.20-py3-none-any.whl
  • Upload date:
  • Size: 7.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for SiPMai-0.0.20-py3-none-any.whl
Algorithm Hash digest
SHA256 dbf27627d1139464790eb68b6e1881bd17376cffa146df5a8c1e877d80bfcb7c
MD5 54d8b05023dac541065db810e86c7a48
BLAKE2b-256 993a23600deab2b118863e4bd508371dd07c44c3619bb7dcc93bc7271737ef0f
