A small package for loading data from MoleculeNet

molnet-python

An easy-to-use reimplementation of deepchem's data loading library, with minimal dependencies. With this package, you avoid the long warnings that deepchem prints, and the code is short enough that you can understand what it does after a brief review.

This simple library loads and splits data according to MoleculeNet.

I first wrote this code for my own use, because it is tedious to rewrite the same data-loading code every time. So, if you find this before you start working on molecule property prediction tasks, you're lucky. Please enjoy!

Installation

Currently, this package can be installed only with pip:

pip install molnet-python

NOTE: one of the dependencies, the rdkit package, can be installed only through conda, so please install rdkit separately, either before or after installing this package.
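Because rdkit is installed separately, it can help to confirm that it is visible in the active environment before loading any data. The following is a minimal sketch using only the standard library; the helper name has_module is hypothetical and is not part of molnet-python.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be found.

    Hypothetical helper for illustration; it only looks the module up
    and does not import it.
    """
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("rdkit"):
        print("rdkit found")
    else:
        print("rdkit not found; install it first (for example with conda)")
```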

How to use

You can get split datasets directly from the load function. The supported dataset names are listed in the molnet_config.py file; all datasets except PCBA are supported.

import molnet

datasets = molnet.load(name, datadir, save_whole_dataset=False,
                       save_split=False, split=None, seed=None)
  • name: dataset name
  • datadir: directory in which to save downloaded, extracted, and cached dataset files
  • save_whole_dataset: whether to save the whole dataset as a pickle binary file; useful when you have a large number of SMILES strings but need rdkit.Chem.Mol objects
  • save_split: whether to save the split dataset. This guarantees consistency between different runs.
  • split: which data split to perform
    • (float, float, float): train/valid/test split, returns 3 datasets
    • float: train/test split, returns 2 datasets
    • int: K-fold cross-validation split, returns K datasets
  • seed: seed for numpy (this has no effect when the dataset requires a Scaffold split)
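To make the three forms of the split argument concrete, here is a rough sketch of how each form could map to index partitions. It is not the package's implementation: the helper split_indices is hypothetical, it uses the standard library's random module instead of numpy to stay dependency-free, and returning a single partition for split=None is an assumption.

```python
import random
from typing import List, Optional, Tuple, Union

SplitSpec = Union[Tuple[float, float, float], float, int, None]


def split_indices(n: int, split: SplitSpec,
                  seed: Optional[int] = None) -> List[List[int]]:
    """Hypothetical helper: partition range(n) according to `split`."""
    indices = list(range(n))
    # A seeded shuffle makes the partition reproducible across runs.
    random.Random(seed).shuffle(indices)

    if split is None:
        # Assumption: no split requested, so return everything as one part.
        return [indices]
    if isinstance(split, bool):
        raise TypeError("split must be a tuple, a float, or an int")
    if isinstance(split, int):
        # K-fold: fold i takes every K-th shuffled index, starting at i.
        if not 2 <= split <= n:
            raise ValueError("K must be between 2 and n")
        return [indices[i::split] for i in range(split)]
    if isinstance(split, float):
        # Train/test: `split` is the training fraction.
        if not 0.0 < split < 1.0:
            raise ValueError("fraction must be in (0, 1)")
        cut = int(n * split)
        return [indices[:cut], indices[cut:]]

    # Train/valid/test: three fractions that sum to 1.
    train_frac, valid_frac, test_frac = split
    if abs(train_frac + valid_frac + test_frac - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    train_end = int(n * train_frac)
    valid_end = train_end + int(n * valid_frac)
    return [indices[:train_end], indices[train_end:valid_end],
            indices[valid_end:]]


if __name__ == "__main__":
    train, valid, test = split_indices(10, (0.8, 0.1, 0.1), seed=0)
    print(len(train), len(valid), len(test))
```

A Scaffold split groups molecules by their scaffold rather than shuffling them, which is why the seed does not matter in that case; this sketch covers only the random case.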

If you want to use the lower-level functions, please read the code. I promise it won't take you more than half an hour :-)
