Generate molecular fingerprints with guaranteed collision‑free bits.

bit_collision_free_MF

A Python package for generating molecular fingerprints without bit collisions.

Description

bit_collision_free_MF generates Morgan fingerprints without bit collisions. In a fixed-length fingerprint, distinct structural features can be hashed to the same bit, which makes that bit ambiguous and can reduce the accuracy and reliability of fingerprints in cheminformatics applications. The package automatically determines the optimal fingerprint length, so that each structural feature maps to a unique bit in the fingerprint.
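
To make the idea of a bit collision concrete, here is a minimal pure-Python sketch. It assumes that each structural feature has an integer identifier and that its bit position is that identifier modulo the fingerprint length, which is how folded hashed fingerprints generally work. The function names and the simple upward search are illustrative assumptions, not this package's actual algorithm.

```python
def has_collision(feature_ids, n_bits):
    """Return True if two distinct feature IDs fold onto the same bit."""
    used_bits = set()
    for fid in set(feature_ids):
        bit = fid % n_bits  # assumed folding rule: ID modulo length
        if bit in used_bits:
            return True
        used_bits.add(bit)
    return False


def smallest_collision_free_length(feature_ids):
    """Search upward for the first length at which no two IDs share a bit."""
    unique_ids = set(feature_ids)
    n_bits = max(len(unique_ids), 1)  # need at least one bit per feature
    while has_collision(unique_ids, n_bits):
        n_bits += 1
    return n_bits


ids = [3, 11, 19, 4]  # made-up identifiers, for illustration only
print(has_collision(ids, 8))                # True: 3, 11 and 19 all fold onto bit 3
print(smallest_collision_free_length(ids))  # 6
```

With real 32-bit identifiers a naive search like this could be slow; it is meant only to show what "collision-free" means.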

Installation

Requirements

  • Python 3.9 or higher
  • numpy
  • pandas
  • rdkit

Simple Installation

pip install bit_collision_free_MF

This will automatically install all dependencies, including RDKit.

Manual Installation

# Install dependencies
pip install numpy pandas rdkit

# Install the package
pip install bit_collision_free_MF

For a development installation:

# Clone the repository
git clone https://github.com/yourusername/bit_collision_free_MF.git
cd bit_collision_free_MF

# Install in development mode
pip install -e .

Troubleshooting

If you encounter issues installing RDKit:

  1. Verify Python version: This package requires Python 3.9 or higher.
  2. Alternative installation methods:
    • For older Python versions: pip install rdkit-pypi
    • Using conda: conda install -c conda-forge rdkit
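
Before trying the alternatives above, a quick check of the environment can narrow down the problem. The sketch below uses only the standard library; the helper names are hypothetical and are not part of this package.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the requirements


def python_is_supported(version_info=sys.version_info):
    """Compare the (major, minor) version against the stated minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def rdkit_is_installed():
    """Return True if an importable 'rdkit' module can be located."""
    return importlib.util.find_spec("rdkit") is not None


print("Python OK:", python_is_supported())
print("RDKit found:", rdkit_is_installed())
```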

Features

  • Automatically determines the optimal fingerprint length to avoid bit collisions
  • Supports custom fingerprint radius
  • Option to remove zero-value columns
  • Easy CSV export with customizable headers
  • Seamless integration with pandas and NumPy
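
As an illustration of what removing zero-value columns means, the sketch below drops every column that is zero for all molecules and keeps the column names aligned. It uses plain lists so that it runs without dependencies; the function name is hypothetical and this is not the package's own code. With a NumPy array X, the equivalent selection is X[:, X.any(axis=0)].

```python
def drop_zero_columns(rows, names):
    """Keep only the columns that have at least one non-zero entry."""
    keep = [j for j, column in enumerate(zip(*rows)) if any(column)]
    kept_rows = [[row[j] for j in keep] for row in rows]
    kept_names = [names[j] for j in keep]
    return kept_rows, kept_names


rows = [
    [1, 0, 0, 1],
    [0, 0, 1, 1],
]
names = ["bit_0", "bit_1", "bit_2", "bit_3"]  # placeholder column names

kept_rows, kept_names = drop_zero_columns(rows, names)
print(kept_names)  # ['bit_0', 'bit_2', 'bit_3']
print(kept_rows)   # [[1, 0, 1], [0, 1, 1]]
```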

Usage

Basic Usage

from bit_collision_free_MF import generate_fingerprints, save_fingerprints
import pandas as pd

# Load your data
data = pd.read_csv('your_molecules.csv')

# Generate fingerprints
fingerprints, fp_generator = generate_fingerprints(
    data,
    smiles_column='smiles',    # column that holds the SMILES strings
    radius=1,                  # Morgan fingerprint radius
    remove_zero_columns=True   # drop zero-value columns
)

# Save fingerprints to CSV
save_fingerprints(
    fingerprints,
    fp_generator,
    output_path='path/to/output.csv',
    include_header=True
)
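
The example above reads a CSV file with a column named smiles. A minimal sketch of preparing such a file with the standard library follows; it assumes that no other columns are required, and the three molecules are arbitrary examples.

```python
import csv
import os
import tempfile

smiles = ["CCO", "c1ccccc1", "CC(=O)O"]  # ethanol, benzene, acetic acid

# Write a one-column CSV whose header matches smiles_column='smiles'.
path = os.path.join(tempfile.mkdtemp(), "your_molecules.csv")
with open(path, "w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(["smiles"])
    writer.writerows([s] for s in smiles)

# Read it back to confirm the layout.
with open(path, newline="") as handle:
    loaded = [row["smiles"] for row in csv.DictReader(handle)]
print(loaded)  # ['CCO', 'c1ccccc1', 'CC(=O)O']
```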

Using the CollisionFreeMorganFP Class Directly

from bit_collision_free_MF import CollisionFreeMorganFP
import pandas as pd

# Load your data
data = pd.read_csv('your_molecules.csv')
smiles_list = data['smiles'].tolist()

# Create and fit the fingerprint generator
fp_generator = CollisionFreeMorganFP(radius=1)
fp_generator.fit(smiles_list)

# Generate fingerprints
fingerprints = fp_generator.transform(smiles_list, remove_zero_columns=True)

# Get feature names
feature_names = fp_generator.get_feature_names()

# Create a DataFrame with the fingerprints
result_df = pd.DataFrame(fingerprints, columns=feature_names)

# Save to CSV
result_df.to_csv('fingerprints.csv', index=False)
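
The fit/transform pattern above can be pictured with a toy stand-in. The sketch below assumes that fitting collects every feature identifier seen in the data and gives each one its own column, so no two features can share a bit. It works on made-up integer identifiers instead of SMILES, and the class is hypothetical; it is not how CollisionFreeMorganFP is actually implemented.

```python
class ToyCollisionFreeEncoder:
    """Toy stand-in: one dedicated column per feature ID seen during fit."""

    def fit(self, feature_sets):
        # Collect the vocabulary of all feature IDs in the fitting data.
        vocabulary = sorted(set().union(*feature_sets))
        self.column_of = {fid: j for j, fid in enumerate(vocabulary)}
        return self

    def transform(self, feature_sets):
        rows = []
        for features in feature_sets:
            row = [0] * len(self.column_of)
            for fid in features:
                if fid in self.column_of:  # skip IDs unseen during fit
                    row[self.column_of[fid]] = 1
            rows.append(row)
        return rows

    def get_feature_names(self):
        return [f"feature_{fid}" for fid in self.column_of]


feature_sets = [{3, 11}, {11, 19}]  # made-up feature IDs for two molecules
encoder = ToyCollisionFreeEncoder().fit(feature_sets)
print(encoder.get_feature_names())      # ['feature_3', 'feature_11', 'feature_19']
print(encoder.transform(feature_sets))  # [[1, 1, 0], [0, 1, 1]]
```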

License

This software is currently not open source. All rights reserved. Redistribution, modification, or use of this software in any form is not permitted until the associated research article is formally accepted and published.

Upon acceptance, the software will be released under the MIT License.

Contact

For academic inquiries or collaboration, please contact:

