Streamlit app to explore chemical clustering!

Project description

- ChemCluster -

ChemCluster is an interactive web application for cheminformatics and molecular analysis, built with Streamlit, RDKit, and scikit-learn. It focuses on forming and visualizing molecular clusters.

Final project for the course Practical Programming in Chemistry — EPFL CH-200

📦 Package overview

ChemCluster is an interactive cheminformatics platform developed at EPFL in 2025 as part of the Practical Programming in Chemistry course. It is a user-friendly web application for exploring and analyzing chemical structures, either one molecule at a time (through its generated conformers) or as a dataset.

This tool enables users to compute key molecular properties, visualize 2D and 3D structures, and perform clustering based on molecular similarity or conformer geometry. It also offers filtering options to help select clusters matching specific physicochemical criteria.
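
As an illustration of the property calculation described above, here is a minimal sketch using RDKit's `Descriptors` module. The function name `compute_descriptors` and the dictionary keys are assumptions chosen for this example; they are not taken from the ChemCluster source code.

```python
from rdkit import Chem
from rdkit.Chem import Descriptors


def compute_descriptors(smiles):
    """Return basic descriptors for a SMILES string, or None if it cannot be parsed.

    Hypothetical helper for illustration; not part of the ChemCluster API.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for invalid SMILES
        return None
    return {
        "MW": Descriptors.MolWt(mol),            # molecular weight (g/mol)
        "logP": Descriptors.MolLogP(mol),        # Crippen logP estimate
        "TPSA": Descriptors.TPSA(mol),           # topological polar surface area
        "HBD": Descriptors.NumHDonors(mol),      # H-bond donors
        "HBA": Descriptors.NumHAcceptors(mol),   # H-bond acceptors
    }


if __name__ == "__main__":
    print(compute_descriptors("CCO"))  # ethanol
```

Returning `None` for unparsable input lets a caller skip bad rows in an uploaded dataset instead of stopping at the first error.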

🌟 Features

  • Load molecule files (.sdf, .mol) or SMILES in .csv format
  • Compute key molecular descriptors (MW, logP, TPSA, H-bond donors/acceptors, etc.)
  • Visualize 2D molecular structures with RDKit
  • Generate 3D conformers and visualize them interactively using py3Dmol
  • Apply PCA for dimensionality reduction
  • Cluster molecules using KMeans with automatic silhouette score optimization
  • Click to view molecular properties directly from PCA plot
  • Export clusters and molecular data to .csv
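
The PCA and KMeans steps listed above could be sketched as follows with scikit-learn. This is an assumption about how silhouette-based selection of the number of clusters might work, not the actual ChemCluster implementation; the function name `cluster_with_silhouette`, the range of k values, and the choice to cluster on the 2D PCA coordinates are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_with_silhouette(features, k_min=2, k_max=8, random_state=0):
    """Reduce features to 2D with PCA, then keep the k with the best silhouette score.

    Hypothetical helper for illustration; not part of the ChemCluster API.
    """
    scaled = StandardScaler().fit_transform(features)
    coords = PCA(n_components=2, random_state=random_state).fit_transform(scaled)

    best_k, best_score, best_labels = None, -1.0, None
    # The silhouette score is only defined for 2 <= k <= n_samples - 1.
    for k in range(k_min, min(k_max, len(coords) - 1) + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(coords)
        score = silhouette_score(coords, labels)
        if score > best_score:
            best_k, best_score, best_labels = k, score, labels
    return coords, best_k, best_labels


if __name__ == "__main__":
    # Synthetic stand-in for a descriptor matrix: three well-separated groups.
    rng = np.random.default_rng(0)
    centers = np.array([[0, 0, 0, 0], [10, 10, 0, 0], [-10, 10, 5, 5]], dtype=float)
    features = np.vstack([c + rng.normal(scale=0.5, size=(30, 4)) for c in centers])
    coords, k, labels = cluster_with_silhouette(features)
    print("selected k:", k)
```

In practice the feature matrix would hold molecular descriptors or fingerprints rather than synthetic points. Scaling before PCA keeps descriptors with large numeric ranges, such as MW, from dominating the projection.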

🛠️ Installation

  1. Clone the repository:
     git clone https://github.com/erubbia/ChemCluster.git
     cd ChemCluster
  2. Create and activate the conda environment:
     conda env create -f environment.yml
     conda activate chemcluster-env
  3. Run the Streamlit application:
     streamlit run app.py

📖 Usage

After launching the app, open the local URL that Streamlit prints in the terminal (by default http://localhost:8501).

You can:

  • Analyze a single molecule by inputting a SMILES string or drawing the structure
  • Upload a dataset of molecules to perform PCA and clustering
  • Click on any point in the scatter plot to view its structure and properties
  • Use filters to identify clusters with desirable properties (e.g., high logP, low MW)
  • Export selected clusters as CSV files for further analysis
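
The filtering and export steps above might look like the sketch below, written with the Python standard library only. It assumes each molecule is a dictionary with `smiles`, `cluster`, `MW`, and `logP` keys and that a cluster is selected when its mean values meet the thresholds; the function names, keys, and default thresholds are hypothetical and may differ from what the app does.

```python
import csv
from collections import defaultdict
from statistics import mean


def select_clusters(rows, min_mean_logp=3.0, max_mean_mw=400.0):
    """Return sorted ids of clusters whose mean logP and mean MW meet the thresholds.

    Hypothetical helper for illustration; not part of the ChemCluster API.
    """
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row)
    return sorted(
        cid
        for cid, members in by_cluster.items()
        if mean(m["logP"] for m in members) >= min_mean_logp
        and mean(m["MW"] for m in members) <= max_mean_mw
    )


def export_clusters(rows, cluster_ids, path):
    """Write the rows of the selected clusters to a CSV file."""
    keep = set(cluster_ids)
    fields = ["smiles", "cluster", "MW", "logP"]
    with open(path, "w", newline="") as handle:
        # extrasaction="ignore" drops any keys not listed in `fields`.
        writer = csv.DictWriter(handle, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(r for r in rows if r["cluster"] in keep)


if __name__ == "__main__":
    # Illustrative placeholder values, not computed descriptors.
    rows = [
        {"smiles": "CCCCCCCC", "cluster": 0, "MW": 114.2, "logP": 3.4},
        {"smiles": "CCCCCCCCC", "cluster": 0, "MW": 128.3, "logP": 3.8},
        {"smiles": "CCO", "cluster": 1, "MW": 46.1, "logP": -0.1},
    ]
    selected = select_clusters(rows)
    export_clusters(rows, selected, "selected_clusters.csv")
```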

📂 License

MIT License


👨‍🔬 Developers

  • Elisa Rubbia, Master's student in Molecular and Biological Chemistry at EPFL (GitHub: erubbia)

  • Romain Guichonnet, Master's student in Molecular and Biological Chemistry at EPFL (GitHub: Romainguich)

  • Flavia Zabala Perez, Master's student in Molecular and Biological Chemistry at EPFL (GitHub: Flaviazab)
