Purchasable SMILES filter
molbloom
Can I buy this molecule? A query returns in about 500 ns and the filter uses about 100 MB of RAM (or 2 GB if using all of ZINC20).
pip install molbloom
from molbloom import buy
buy('CCCO')
# True
buy('ONN1CCCC1')
# False
If buy returns True, the molecule may be purchasable, with a measured false positive rate of 0.0003. If it returns False, the molecule is not purchasable.
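The asymmetry between True and False comes from how a Bloom filter works: adding an item sets several bits, and a lookup checks whether all of those bits are set. The toy class below is only a sketch of that idea. The name TinyBloom, the SHA-256 based hashing, and the sizes are assumptions chosen for illustration, not molbloom's actual implementation.

```python
import hashlib


class TinyBloom:
    """Toy Bloom filter for illustration only (not molbloom's implementation)."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting the item with an index.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # Every bit set: "maybe present". Any bit clear: "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


catalog = ["CCCO", "CCO", "c1ccccc1"]
bloom = TinyBloom(n_bits=1024, n_hashes=4)
for smiles in catalog:
    bloom.add(smiles)

# Items that were added are always found: there are no false negatives.
print(all(s in bloom for s in catalog))
```

A molecule that was never added can still hit bits set by other molecules, which is why True only means "may be purchasable", while a single clear bit is enough to make False definitive.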
The catalog information is from ZINC20. Add canonicalize=True if your SMILES are not canonicalized (requires installing rdkit).
If you want to query the broader catalog, which also includes molecules that are not in stock:
buy('CCCO', instock=False)
The reference for that catalog is all of ZINC20 as of October 2021. The first call with instock=False downloads 2.0 GB of data to a cache directory.
Custom Filter
Do you have your own list of SMILES? There are two ways to build a filter. You can use a C tool that is very fast (about 1M SMILES per second) if your SMILES are in a file and already canonical, or you can use the Python API to build a filter programmatically and canonicalize as you go. Both are described below.
Once built:
from molbloom import BloomFilter
bf = BloomFilter('myfilter.bloom')
# usage:
'CCCO' in bf
Build with C Tool
You can build your own filter using the code in the tool/ directory.
cd tool
make
./molbloom-bloom <MB of filter> <filter name> <approx number of compounds> <input file 1> <input file 2> ...
where each input file has one SMILES per line in the first column and is already canonicalized. The larger the filter in MB, the lower the false positive rate. If you want to choose the false positive rate rather than the size, you can use the equation:
$$ M = - \frac{N \ln \epsilon}{(\ln 2)^2} $$
where $M$ is the size in bits, $N$ is the number of compounds, and $\epsilon$ is the false positive rate.
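As a worked example of the equation above, the sketch below converts a compound count and a target false positive rate into a filter size. The helper names are hypothetical, and the conversion assumes 1 MB = 10^6 bytes, which may not match how the C tool interprets its MB argument.

```python
import math


def filter_size_bits(n_compounds: int, false_positive_rate: float) -> float:
    """Size M in bits from M = -N ln(eps) / (ln 2)^2."""
    return -n_compounds * math.log(false_positive_rate) / (math.log(2) ** 2)


def bits_to_mb(bits: float) -> float:
    """Convert bits to megabytes, assuming 1 MB = 10^6 bytes."""
    return bits / 8 / 1e6


# Example: one million compounds at the false positive rate quoted above.
n = 1_000_000
eps = 0.0003
size_bits = filter_size_bits(n, eps)
print(f"{size_bits / n:.2f} bits per compound")
print(f"{bits_to_mb(size_bits):.2f} MB total")
```

For these inputs the formula gives roughly 17 bits per compound, so about 2 MB for one million compounds. Round up when passing a size to the tool, since a smaller filter raises the false positive rate.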
Build with Python
from molbloom import CustomFilter, canon
bf = CustomFilter(100, 1000, 'myfilter')
bf.add('CCCO')
# canonicalize one
s = canon("CCCOC")
bf.add(s)
# save it
bf.save('test.bloom')