🧪 MoleculaPy 🧪
A command-line application that utilizes the RDKit library to compute molecular descriptors and fingerprints, aiding in the analysis and characterization of chemical structures.
Key Features • Installation • How To Use • Contact • Credits • License
MoleculaPy is a powerful command-line interface (CLI) application developed in Python, designed for chemoinformatics enthusiasts and researchers. Leveraging the renowned RDKit library, MoleculaPy empowers users to effortlessly compute a diverse set of molecular descriptors and fingerprints for compounds specified in the SMILES (Simplified Molecular Input Line Entry System) format, all within the convenience of their terminal.
Key Features
- 📝 SMILES Compatibility: MoleculaPy processes chemical data in the SMILES format, a widely used line notation for representing molecular structures.
- 🧬 Comprehensive Descriptors: The application computes 209 molecular descriptors in total, covering a broad range of properties and characteristics of chemical compounds.
- 🔍 Fingerprint Generation: MoleculaPy generates molecular fingerprints, a key input for tasks such as similarity analysis and virtual screening. Supported types are n-dimensional Atom, Morgan, RDKit and Topological fingerprints, plus 166-dimensional MACCS.
- 📁 CSV File Support: Import and process large datasets of compounds through CSV files, which suits high-throughput data analysis.
- 🧪 Scientific Accuracy: MoleculaPy relies on the RDKit library, which is widely used in chemoinformatics research, for all underlying calculations.
- 🖥️ User-Friendly Command Line: The command-line interface is designed to be intuitive for both seasoned researchers and newcomers to the field.
- 🧂 Salt Removal Option: Users can choose whether salts are removed from molecules during processing. This is particularly valuable for complex chemical datasets, where stripping counter-ions gives cleaner input for analysis.
- 📄 Logging for Transparency: MoleculaPy keeps detailed logs of application activity, which helps with debugging, progress tracking, auditing, and reproducibility.
Installation
To install the app, run the following command in your terminal:
pip install moleculapy
Then verify that the installation succeeded by typing moleculapy -h in the CLI.
>>> moleculapy -h
usage: MoleculaPy [-h] [--method {descriptors,fingerprints}] [--fp_type {Atom,MACCS,Morgan,Topological,RDKit}]
                  [--remove_salt | --no-remove_salt] [--n_bits N_BITS]
                  input_file output_file

Calculate molecular descriptors and fingerprints for molecules provided in a CSV file.

positional arguments:
  input_file            Path to the input file
  output_file           Path to the output file

options:
  -h, --help            show this help message and exit
  --method {descriptors,fingerprints}
                        (Optional) Calculation method: descriptors or fingeprints (default: descriptors)
  --fp_type {Atom,MACCS,Morgan,Topological,RDKit}
                        (Optional) Fingerprint type (default: Morgan)
  --remove_salt, --no-remove_salt
                        (Optional) Remove salts from SMILE. (default: --remove_salt)
  --n_bits N_BITS       (Optional) Number of bits of a given fingerprints type (default: 2048)
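For readers who want to see how a command line like this is typically declared, here is a minimal sketch of an argparse parser that accepts the same arguments as the help text above. It is an illustration only and is not taken from MoleculaPy's source; the function name build_parser is hypothetical. The paired --remove_salt / --no-remove_salt flags are modelled with argparse.BooleanOptionalAction, which is available from Python 3.9.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the help text; not MoleculaPy's actual code."""
    parser = argparse.ArgumentParser(
        prog="MoleculaPy",
        description=(
            "Calculate molecular descriptors and fingerprints "
            "for molecules provided in a CSV file."
        ),
    )
    # Two required positional paths, in this order.
    parser.add_argument("input_file", help="Path to the input file")
    parser.add_argument("output_file", help="Path to the output file")
    parser.add_argument(
        "--method",
        choices=["descriptors", "fingerprints"],
        default="descriptors",
    )
    parser.add_argument(
        "--fp_type",
        choices=["Atom", "MACCS", "Morgan", "Topological", "RDKit"],
        default="Morgan",
    )
    # BooleanOptionalAction generates both --remove_salt and --no-remove_salt.
    parser.add_argument(
        "--remove_salt",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    parser.add_argument("--n_bits", type=int, default=2048)
    return parser


args = build_parser().parse_args(["in.csv", "out.csv", "--no-remove_salt"])
print(args.method, args.fp_type, args.remove_salt, args.n_bits)
```

Parsing the sample argument list leaves every option at its default except remove_salt, which becomes False.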
How To Use
The application is fully compatible with Python 3.9+.
Setting Up
By default, the program requires two positional arguments: input_file and output_file. Both are paths: the first points to the CSV file containing SMILES strings, the second to the output file.
Suppose we have a file smiles_sample.csv that contains SMILES strings (along with other columns, which do not matter here). The column containing the SMILES strings must be named "SMILES" (case-insensitive).
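As a rough illustration of the expected input, the sketch below builds a small CSV in memory and locates the SMILES column without regard to case. The sample rows, the extra "name" column and the helper find_smiles_column are made up for this example; how MoleculaPy itself performs the lookup is not shown in this document.

```python
import csv
import io

# Hypothetical sample data; "name" stands in for any extra columns.
sample = io.StringIO(newline="")
writer = csv.writer(sample)
writer.writerow(["name", "SMILES"])
writer.writerow(["ethanol", "CCO"])
writer.writerow(["sodium acetate", "CC(=O)[O-].[Na+]"])


def find_smiles_column(header):
    """Return the header entry that equals "SMILES" ignoring case, or None."""
    for column in header:
        if column.strip().lower() == "smiles":
            return column
    return None


# Read the data back and pull out the SMILES strings.
sample.seek(0)
reader = csv.DictReader(sample)
column = find_smiles_column(reader.fieldnames)
smiles = [row[column] for row in reader]
print(column, smiles)
```

To produce a real file instead, replace the StringIO object with open("smiles_sample.csv", "w", newline="").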
Calculate molecular descriptors
To calculate molecular descriptors, no optional parameters are needed. It is sufficient to call:
moleculapy .\smiles_sample.csv .\smiles_desc_output.csv
By default, MoleculaPy removes salts from chemical compounds. To disable this, pass the --no-remove_salt flag:
moleculapy .\smiles_sample.csv .\smiles_desc_output.csv --no-remove_salt
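To give a feel for what salt removal means for a SMILES string, here is a deliberately naive, standard-library-only sketch. In SMILES, disconnected components such as counter-ions are separated by a dot, so the sketch keeps the longest dot-separated fragment. This is a stand-in heuristic chosen for illustration; it is not how RDKit or MoleculaPy strips salts, and the function name keep_largest_fragment is hypothetical.

```python
def keep_largest_fragment(smiles: str) -> str:
    """Naive stand-in for salt removal: keep the longest dot-separated fragment.

    Assumption: string length approximates fragment size. Chemically aware
    salt stripping (as in RDKit) matches known salt patterns instead.
    """
    fragments = smiles.split(".")
    return max(fragments, key=len)


# Sodium acetate: the acetate fragment is kept, the sodium ion is dropped.
print(keep_largest_fragment("CC(=O)[O-].[Na+]"))
# Ethanol has no counter-ion, so the string is returned unchanged.
print(keep_largest_fragment("CCO"))
```

With --no-remove_salt, the equivalent of this step would be skipped and the full string, counter-ion included, would be used as the input structure.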
Calculate fingerprints
With MoleculaPy, you can calculate various n-dimensional vectors of molecules, known as fingerprints: n-dimensional Atom, Morgan, RDKit, Topological and 166-dimensional MACCS.
To do this, you need to set two optional arguments: --method and --fp_type. The first specifies the calculation method (molecular descriptors or fingerprints), and the second specifies the fingerprint type.
For example, if you want to calculate 2048-dimensional Morgan fingerprints:
moleculapy .\smiles_sample.csv .\smiles_morgan_output.csv --method fingerprints --fp_type Morgan
Atom, Morgan, RDKit and Topological fingerprints are computed as 2048-dimensional vectors by default, and MACCS fingerprints as 166-dimensional vectors. To change the size, specify the optional parameter --n_bits.
For example, to calculate 512-dimensional fingerprint vectors of the Atom type:
moleculapy .\smiles_sample.csv .\smiles_atom_output.csv --method fingerprints --fp_type Atom --n_bits 512
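When several fingerprint types are needed for the same input, it can be convenient to assemble the commands from a script. The sketch below builds argument lists using only the options listed in the help output, with input_file and output_file passed positionally as in the usage line. The helper build_command and the output file names are hypothetical, and the commands are only printed, not executed.

```python
import shlex


def build_command(input_file, output_file, method="descriptors",
                  fp_type=None, n_bits=None, remove_salt=True):
    """Assemble a moleculapy argument list from the options in the help text."""
    command = ["moleculapy", input_file, output_file, "--method", method]
    # --fp_type only matters when fingerprints are requested.
    if method == "fingerprints" and fp_type is not None:
        command += ["--fp_type", fp_type]
    if n_bits is not None:
        command += ["--n_bits", str(n_bits)]
    if not remove_salt:
        command.append("--no-remove_salt")
    return command


# One run per fingerprint type; file names are placeholders.
for fp in ["Morgan", "Atom"]:
    cmd = build_command(
        "smiles_sample.csv",
        f"smiles_{fp.lower()}_output.csv",
        method="fingerprints",
        fp_type=fp,
        n_bits=512,
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with paths that contain spaces.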
Logging
All calculations performed by the application are logged. The logs are stored in the logs folder under the path where the application was installed. The path to the logs is displayed in the CLI after the calculation session completes.
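The pattern described here (write to a logs folder, then report its path at the end) can be sketched with Python's standard logging module. This is a generic illustration, not MoleculaPy's logging code: the logger name, the session.log file name, the message format and the temporary directory are all assumptions made for the example.

```python
import logging
import tempfile
from pathlib import Path


def make_session_logger(log_dir: Path) -> logging.Logger:
    """Create a logger writing to <log_dir>/session.log (hypothetical file name)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("moleculapy_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_dir / "session.log", encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


# A temporary directory stands in for the installation path.
log_dir = Path(tempfile.mkdtemp()) / "logs"
logger = make_session_logger(log_dir)
logger.info("Calculated descriptors for %d molecules", 2)

# Report where the logs ended up once the session is done.
print(f"Logs written to {log_dir}")
```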
Contact
If you have any problems, ideas or general feedback, please don't hesitate to contact me at kam.pytlak@gmail.com. I'd really appreciate it!
Credits
This software uses the following open source packages:
License
MIT
GitHub @kamilpytlak · LinkedIn kamil-pytlak