
Purchasable SMILES filter


molbloom

Can I buy this molecule? Returns results in about 500 ns and consumes about 100MB of RAM (or 2 GB if using all ZINC20).

pip install molbloom
from molbloom import buy
buy('CCCO')
# True
buy('ONN1CCCC1')
# False

If buy returns True, the molecule may be purchasable; the measured false positive rate is 0.0003. If it returns False, the molecule is not in the catalog and so is not purchasable. The catalog information is from ZINC20. Add canonicalize=True if your SMILES are not canonicalized (requires installing rdkit).
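The one-sided error described above (True means "maybe", False means "definitely not") is a general property of Bloom filters. The following is a minimal pure-Python sketch of that behavior. It is not molbloom's implementation: the class name TinyBloom, the SHA-256-based hashing, and the sizes used are assumptions chosen only for illustration.

```python
import hashlib


class TinyBloom:
    """Illustrative Bloom filter (hypothetical; not molbloom's code)."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        # Set every bit position derived from the item.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # All bits set: "maybe present". Any bit clear: "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


catalog = ["CCCO", "CCO", "c1ccccc1"]
bf = TinyBloom(n_bits=1024, n_hashes=3)
for smiles in catalog:
    bf.add(smiles)

# Every added SMILES is always found, so there are no false negatives.
in_catalog = all(s in bf for s in catalog)
```

A string that was never added can still come back as present if its bit positions happen to collide with bits set by other entries. That is the source of the false positive rate, and it is why a True result only means "may be purchasable". Lookups are exact string matches, which is why non-canonical SMILES need canonicalizing first.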

There are other available catalogs; list the options with molbloom.catalogs(). Most catalogs require an initial download. buy('CCCO', catalog='zinc-instock-mini') uses a catalog that is included in the package, so it needs no download. It is useful for testing, but has a high false positive rate of 1%.

Querying Small World

To find similar purchasable molecules:

from molbloom import buy_similar
buy_similar('CCCO')

This queries ZINC Small World, defaulting to the Enamine REAL-22Q1-4.5B database, and returns a list of hits and their similarities to the query by a few different measures.

Custom Filter

Do you have your own list of SMILES? There are two ways to build a filter. You can use a C tool that is very fast (about 1M SMILES per second) if your SMILES are in a file and already canonical, or you can use the Python API to programmatically build a filter and canonicalize as you go. Both are described below.

Once built:

from molbloom import BloomFilter
bf = BloomFilter('myfilter.bloom')
# usage:
'CCCO' in bf

Build with C Tool

You can build your own filter using the code in the tool/ directory.

cd tool
make
./molbloom-bloom <MB of filter> <filter name> <approx number of compounds> <input file 1> <input file 2> ...

where each input file has one SMILES per line in the first column and is already canonicalized. The larger the filter in MB, the lower the false positive rate. If you want to choose the false positive rate rather than the size, you can use the equation:

$$ M = - \frac{N \ln \epsilon}{(\ln 2)^2} $$

where $M$ is the size in bits, $N$ is the number of compounds, and $\epsilon$ is the false positive rate.
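As a worked example of the equation, the sketch below computes the filter size for 1,000,000 compounds at a 1% false positive rate. The function names are hypothetical helpers, not part of molbloom. The conversion assumes 1 MB = 10^6 bytes, which may differ from how the C tool interprets its size argument. The equation itself assumes an optimal number of hash functions.

```python
import math


def bloom_size_bits(n_compounds: int, fp_rate: float) -> int:
    """Filter size in bits from M = -N ln(eps) / (ln 2)^2, rounded up."""
    if n_compounds <= 0 or not 0.0 < fp_rate < 1.0:
        raise ValueError("need n_compounds > 0 and 0 < fp_rate < 1")
    return math.ceil(-n_compounds * math.log(fp_rate) / math.log(2) ** 2)


def bits_to_megabytes(n_bits: int) -> float:
    """Convert bits to MB, assuming 1 MB = 10**6 bytes."""
    return n_bits / 8 / 1e6


n = 1_000_000   # N: number of compounds
eps = 0.01      # epsilon: target false positive rate
m_bits = bloom_size_bits(n, eps)
m_mb = bits_to_megabytes(m_bits)
print(f"{m_bits} bits, about {m_mb:.2f} MB")
```

This works out to roughly 9.6 bits per compound, or about 1.2 MB in total. Because M depends on the logarithm of the false positive rate, each tenfold reduction in that rate adds about 4.8 bits per compound.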

Build with Python

from molbloom import CustomFilter, canon
bf = CustomFilter(100, 1000, 'myfilter')
bf.add('CCCO')
# canonicalize one
s = canon("CCCOC")
bf.add(s)
# save it
bf.save('test.bloom')
