🧪 MoleculaPy 🧪

A command-line application that utilizes the RDKit library to compute molecular descriptors and fingerprints, aiding in the analysis and characterization of chemical structures.

Key Features · Installation · How To Use · Contact · Credits · License

MoleculaPy is a command-line interface (CLI) application written in Python for chemoinformatics researchers and enthusiasts. Built on the RDKit library, it computes molecular descriptors and fingerprints for compounds given in the SMILES (Simplified Molecular Input Line Entry System) format, directly from the terminal.

Key Features

  • 📝 SMILES Compatibility: MoleculaPy seamlessly processes chemical data in the SMILES format, the industry-standard notation for representing molecular structures.

  • 🧬 Comprehensive Descriptors: The application computes 209 molecular descriptors, giving a broad view of the properties and characteristics of chemical compounds.

  • 🔍 Fingerprint Generation: MoleculaPy generates molecular fingerprints, which are used in tasks such as similarity analysis and virtual screening. Supported types are n-dimensional Atom, Morgan, RDKit and Topological fingerprints, and 166-dimensional MACCS.

  • 📁 CSV File Support: Import and process large datasets of compounds effortlessly with MoleculaPy's CSV file support, streamlining high-throughput data analysis.

  • 🧪 Scientific Accuracy: MoleculaPy relies on the RDKit library, known for its scientific rigor and reliability in chemoinformatics, ensuring trustworthy results for research and analysis.

  • 🖥️ User-Friendly Command Line: The CLI interface is designed to be user-friendly and intuitive, catering to both seasoned researchers and newcomers in the field.

  • 🧂 Salt Removal Option: MoleculaPy offers users the flexibility to choose whether they want to remove salts from molecules during processing. This feature is particularly valuable when working with complex chemical datasets, allowing for cleaner and more accurate analyses.

  • 📄 Logging for Transparency: MoleculaPy integrates a robust logging system that maintains detailed records of application activities. This ensures transparency and facilitates tasks such as debugging, progress tracking, auditing, and reproducibility.

Installation

To install the application, run the following command in your terminal:

pip install moleculapy

Then check that the installation succeeded by running moleculapy -h in the terminal.

>>> moleculapy -h

usage: MoleculaPy [-h] [--method {descriptors,fingerprints}] [--fp_type {Atom,MACCS,Morgan,Topological,RDKit}]
                  [--remove_salt | --no-remove_salt] [--n_bits N_BITS]
                  input_file output_file

Calculate molecular descriptors and fingerprints for molecules provided in a CSV file.

positional arguments:
  input_file            Path to the input file
  output_file           Path to the output file

options:
  -h, --help            show this help message and exit
  --method {descriptors,fingerprints}
                        (Optional) Calculation method: descriptors or fingeprints (default: descriptors)
  --fp_type {Atom,MACCS,Morgan,Topological,RDKit}
                        (Optional) Fingerprint type (default: Morgan)
  --remove_salt, --no-remove_salt
                        (Optional) Remove salts from SMILE. (default: --remove_salt)
  --n_bits N_BITS       (Optional) Number of bits of a given fingerprints type (default: 2048)
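
The usage text above has the shape that Python's argparse module prints. As a rough sketch only (this is not MoleculaPy's actual source, and build_parser is a hypothetical name), a parser with the same interface could look like the following. It shows that input_file and output_file are positional, so they are passed without a flag name, and that --no-remove_salt is the negative form generated for --remove_salt.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the help text; not the project's real code."""
    parser = argparse.ArgumentParser(
        prog="MoleculaPy",
        description="Calculate molecular descriptors and fingerprints "
                    "for molecules provided in a CSV file.",
    )
    # Positional arguments: given by position, without a --flag name.
    parser.add_argument("input_file", help="Path to the input file")
    parser.add_argument("output_file", help="Path to the output file")
    parser.add_argument("--method", choices=["descriptors", "fingerprints"],
                        default="descriptors")
    parser.add_argument("--fp_type",
                        choices=["Atom", "MACCS", "Morgan", "Topological", "RDKit"],
                        default="Morgan")
    # BooleanOptionalAction (Python 3.9+) registers both --remove_salt
    # and --no-remove_salt from a single definition.
    parser.add_argument("--remove_salt", action=argparse.BooleanOptionalAction,
                        default=True)
    parser.add_argument("--n_bits", type=int, default=2048)
    return parser


args = build_parser().parse_args(["smiles_sample.csv", "out.csv", "--no-remove_salt"])
print(args.method, args.fp_type, args.remove_salt, args.n_bits)
# descriptors Morgan False 2048
```

Because the negative flag is derived from the option name, it keeps the underscore: --no-remove_salt, not --no-remove-salt.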

How To Use

The application is fully compatible with Python 3.9+.

Setting Up

The program requires two positional arguments, input_file and output_file: the path to the CSV file containing SMILES strings and the path to the output file, respectively.

Suppose we have a file smiles_sample.csv that contains SMILES strings (it may also contain other columns, which do not matter here). The column containing SMILES must be named "SMILES" (case-insensitive).
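
To illustrate what a case-insensitive "SMILES" column means in practice, here is a minimal sketch using only the standard library. The function name read_smiles and the sample data are assumptions made for illustration; MoleculaPy's own CSV handling may differ.

```python
import csv
import io


def read_smiles(csv_file) -> list[str]:
    """Return the values of the SMILES column, matching its name case-insensitively."""
    reader = csv.DictReader(csv_file)
    # Accept "SMILES", "smiles", "Smiles", ... as the column name.
    column = next(
        (name for name in (reader.fieldnames or []) if name.lower() == "smiles"),
        None,
    )
    if column is None:
        raise ValueError('input CSV needs a column named "SMILES"')
    return [row[column] for row in reader]


# Extra columns (id, name) are simply ignored.
sample = io.StringIO("id,Smiles,name\n1,CCO,ethanol\n2,c1ccccc1,benzene\n")
print(read_smiles(sample))  # ['CCO', 'c1ccccc1']
```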

Calculate molecular descriptors

Descriptors are the default calculation method, so no optional parameters are needed. Pass the input and output paths as positional arguments:

moleculapy .\smiles_sample.csv .\smiles_desc_output.csv

By default, MoleculaPy removes salts from chemical compounds. To keep them, use the --no-remove_salt parameter:

moleculapy .\smiles_sample.csv .\smiles_desc_output.csv --no-remove_salt

Calculate fingerprints

With MoleculaPy, you can calculate various n-dimensional vectors of molecules, known as fingerprints: n-dimensional Atom, Morgan, RDKit, Topological and 166-dimensional MACCS.

To do this, set two optional arguments: --method and --fp_type. The first specifies the calculation method (molecular descriptors or fingerprints), and the second specifies the fingerprint type.

For example, if you want to calculate 2048-dimensional Morgan fingerprints:

moleculapy .\smiles_sample.csv .\smiles_morgan_output.csv --method fingerprints --fp_type Morgan

Atom, Morgan, RDKit and Topological fingerprints are computed as 2048-dimensional vectors by default, and MACCS fingerprints as 166-dimensional vectors. To change the size, specify the optional parameter --n_bits.

For example, to calculate 512-dimensional Atom fingerprints:

moleculapy .\smiles_sample.csv .\smiles_atom_output.csv --method fingerprints --fp_type Atom --n_bits 512
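
When several fingerprint types are needed for the same input, the command can be assembled and run from a Python script. The sketch below assumes the moleculapy command is on the PATH and follows the positional form shown in the usage text; build_command and the output file names are hypothetical and chosen for illustration.

```python
import subprocess


def build_command(input_file, output_file, fp_type, n_bits=None):
    """Return the argument list for one fingerprint run of the moleculapy CLI."""
    cmd = ["moleculapy", input_file, output_file,
           "--method", "fingerprints", "--fp_type", fp_type]
    if n_bits is not None:
        # Only pass --n_bits when overriding the default size.
        cmd += ["--n_bits", str(n_bits)]
    return cmd


for fp_type in ["Atom", "Morgan", "RDKit", "Topological"]:
    cmd = build_command("smiles_sample.csv",
                        f"smiles_{fp_type.lower()}_512.csv", fp_type, n_bits=512)
    print(" ".join(cmd))
    # Uncomment to actually run each calculation:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.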

Logging

All calculations performed by the application are logged. The logs are stored in the logs folder in the path where the application was installed. The path to the logs will be displayed in the CLI after the calculation session is completed.
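
As an illustration of this kind of session logging, the sketch below writes timestamped messages to a file inside a logs folder and prints the file's path at the end. It is an assumption about the general approach, not MoleculaPy's actual implementation; the function name, logger name, file naming scheme and message text are all hypothetical.

```python
import logging
import tempfile
from datetime import datetime
from pathlib import Path


def setup_logger(log_dir: Path) -> tuple[logging.Logger, Path]:
    """Create log_dir if needed and attach a file handler for this session."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"session_{datetime.now():%Y%m%d_%H%M%S}.log"
    logger = logging.getLogger("moleculapy_sketch")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger, log_path


# A temporary directory stands in for the installation path here.
logger, log_path = setup_logger(Path(tempfile.mkdtemp()) / "logs")
logger.info("Calculated descriptors for %d molecules", 3)
print(f"Log written to {log_path}")
```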

Contact

If you have any problems, ideas or general feedback, please don't hesitate to contact me at kam.pytlak@gmail.com. I'd really appreciate it!

Credits

This software uses the following open source packages:

License

MIT


GitHub @kamilpytlak  ·  LinkedIn kamil-pytlak
