Purchasable SMILES filter

molbloom

Can I buy this molecule? molbloom answers in about 500 ns and uses about 100 MB of RAM (or 2 GB if using all of ZINC20).

pip install molbloom

from molbloom import buy
buy('CCCO')
# True
buy('ONN1CCCC1')
# False

If buy returns True, the molecule may be purchasable; the measured false positive rate is 0.0003. If it returns False, the molecule is not purchasable. The catalog information is from ZINC20. Add canonicalize=True if your SMILES are not canonicalized (this requires installing rdkit).
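The asymmetry between True and False follows from how a Bloom filter works. The sketch below is a toy filter written for illustration only; it is not molbloom's implementation, and the class name `ToyBloomFilter`, the SHA-256 based hashing, and the sizes are all assumptions chosen to keep the example short.

```python
import hashlib


class ToyBloomFilter:
    """A tiny Bloom filter for strings (illustration only, not molbloom's code)."""

    def __init__(self, n_bits: int, n_hashes: int) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive n_hashes bit positions by salting a single hash function.
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True only if every position is set. A single unset bit proves the
        # item was never added; all bits set only means "probably added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


catalog = ["CCCO", "CCO", "c1ccccc1"]
toy = ToyBloomFilter(n_bits=1024, n_hashes=3)
for smiles in catalog:
    toy.add(smiles)

# Every stored string is always found: a Bloom filter has no false negatives.
print(all(s in toy for s in catalog))
```

A string that was never added usually tests as absent, but it can collide with bits set by other entries, which is where the false positive rate comes from.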

Other catalogs are available; list them with molbloom.catalogs(). Most catalogs require an initial download. buy('CCCO', catalog='zinc-instock-mini') uses a catalog that is included in the package and requires no download. It is useful for testing, but has a high false positive rate of 1%.

Simple Reagents

By default, buy first checks against common organic reagents like water, ether, etc. You can disable this check by adding check_common=False.

Querying Small World

Just because buy returns True doesn't mean you can buy the molecule. You should follow up with a real query at ZINC, or you can use the search feature in SmallWorld to find similar purchasable molecules.

from smallworld_api import SmallWorld
sw = SmallWorld()

aspirin = 'O=C(C)Oc1ccccc1C(=O)O'
results = sw.search(aspirin, dist=5, db=sw.REAL_dataset)

This will query ZINC Small World.
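The two-stage workflow described in this section (a fast filter first, a real query only for the hits) can be sketched as follows. The helper `split_candidates` and the stub catalog are hypothetical names introduced for illustration; they are not part of molbloom or smallworld_api. The filter is passed in as a callable so the sketch runs without any download.

```python
from typing import Callable, Iterable


def split_candidates(
    smiles_list: Iterable[str],
    maybe_purchasable: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Split SMILES into (candidates needing a real query, rejected)."""
    candidates, rejected = [], []
    for smiles in smiles_list:
        # A True from the filter is only "maybe", so it goes to follow-up.
        if maybe_purchasable(smiles):
            candidates.append(smiles)
        else:
            rejected.append(smiles)
    return candidates, rejected


# Stand-in for the real filter (assumption: any callable str -> bool works).
stub_catalog = {"CCCO", "CCO"}
candidates, rejected = split_candidates(
    ["CCCO", "ONN1CCCC1", "CCO"], stub_catalog.__contains__
)
print(candidates)  # ['CCCO', 'CCO']
print(rejected)    # ['ONN1CCCC1']
```

In real use, the second argument would be molbloom's buy, and only the candidates list would be sent on to ZINC or SmallWorld.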

Custom Filter

Do you have your own list of SMILES? There are two ways to build a filter: you can use a C tool that is very fast (1M SMILES/s) if your SMILES are in a file and already canonical, or you can use the Python API to programmatically build a filter and canonicalize as you go. Both are described below.

Once your custom filter is built:

from molbloom import BloomFilter
bf = BloomFilter('myfilter.bloom')
# usage:
'CCCO' in bf

Build with C Tool

You can build your own filter using the code in the tool/ directory.

cd tool
make
./molbloom-bloom <MB of final filter> <filter name> <approx number of compounds> <input file 1> <input file 2> ...

where each input file has one SMILES per line in the first column and is already canonicalized. The higher the MB, the lower the rate of false positives. If you want to choose the false positive rate rather than the size, you can use the equation:

$$ M = - \frac{N \ln \epsilon}{(\ln 2)^2} $$

where $M$ is the size in bits, $N$ is the number of compounds, and $\epsilon$ is the false positive rate.
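Because the equation gives $M$ in bits while the tool takes a size in MB, a small conversion helps. The sketch below evaluates the formula for an example case; the function names are made up for illustration, and treating 1 MB as 10^6 bytes is an assumption (use 2**20 if the tool counts mebibytes).

```python
import math


def filter_size_bits(n_compounds: int, fp_rate: float) -> float:
    """M = -N ln(eps) / (ln 2)^2, the filter size in bits."""
    return -n_compounds * math.log(fp_rate) / (math.log(2) ** 2)


def bits_to_megabytes(bits: float) -> float:
    # Assumption: 1 MB = 10**6 bytes.
    return bits / 8 / 1e6


n = 1_000_000  # example number of compounds
eps = 0.01     # example target false positive rate (1%)
m_bits = filter_size_bits(n, eps)
print(f"{m_bits:.3e} bits, about {bits_to_megabytes(m_bits):.2f} MB")
```

For a 1% false positive rate this works out to roughly 9.6 bits per compound, so one million compounds need a little over 1 MB.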

Build with Python

You can also build a filter using Python as follows:

from molbloom import CustomFilter, canon
bf = CustomFilter(size=100, n=1000, name='myfilter')
bf.add('CCCO')
# canonicalize one record
s = canon("CCCOC")
bf.add(s)
# finalize filter into a file
bf.save('test.bloom')

Citation

@article{medina2023bloom,
  title={Bloom filters for molecules},
  author={Medina, Jorge and White, Andrew D},
  journal={Journal of Cheminformatics},
  volume={15},
  number={1},
  pages={95},
  year={2023},
  publisher={Springer}
}
