
Native handling of OpenEye objects in Pandas


OEPandas


Deep integration of OpenEye objects into Pandas DataFrames with native support for molecules and design units.


🚀 Quick Start

pip install oepandas

import oepandas as oepd
from openeye import oechem

# Load molecule data from various formats
df = oepd.read_sdf("molecules.sdf")
df = oepd.read_oeb("molecules.oeb.gz")  
df = oepd.read_molecule_csv("data.csv", molecule_columns="SMILES")

# Use pandas normally with molecules
df["num_oxygens"] = df.Molecule.apply(lambda mol: oechem.OECount(mol, oechem.OEIsOxygen()))

✨ Features

  • Native OpenEye Integration: Store OEGraphMol and OEDesignUnit objects directly in pandas DataFrames
  • Multiple File Formats: Read SDF, OEB, CSV, SMI, OEDB, and OEDU files directly into DataFrames
  • Pandas Extensions: Rich accessor methods for molecular operations (.to_smiles(), .depict(), .get_mols(), etc.)
  • Type Safety: Full type hints and PyCharm IDE support
  • Performance: Optimized for large molecular datasets


🔧 Installation

Requirements

  • Python 3.10+
  • pandas 2.1.0+
  • numpy
  • OpenEye Toolkits 2023.1.0+
  • more-itertools

Note on OpenEye Toolkits License: The OpenEye Toolkits require a commercial license, but free licenses are available to academic and non-profit institutions. Visit OpenEye Scientific or contact OpenEye directly to request an academic license.
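The Requirements list can be checked before installing. The sketch below is a hypothetical helper (it is not part of OEPandas) that uses only the standard library. It assumes the OpenEye Toolkits are installed from PyPI under the distribution name `openeye-toolkits`, and its version parsing keeps only the leading numeric components, so pre-release suffixes such as `rc1` are ignored.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the Requirements list; None means "any version".
# ASSUMPTION: the OpenEye Toolkits are installed from PyPI as "openeye-toolkits".
MINIMUMS = {
    "pandas": (2, 1, 0),
    "numpy": None,
    "openeye-toolkits": (2023, 1, 0),
    "more-itertools": None,
}


def parse_version(text, width=3):
    """Keep the leading numeric part of a version ('2023.1.1rc2' -> (2023, 1, 1)).

    Shorter versions are zero-padded so that '2.1' compares equal to (2, 1, 0).
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    numbers = [int(n) for n in match.group(0).split(".")] if match else []
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def check_requirements(minimums=MINIMUMS):
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python 3.10+ required, found {sys.version.split()[0]}")
    for dist, minimum in minimums.items():
        try:
            found = version(dist)
        except PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if minimum is not None and parse_version(found) < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{dist} {wanted}+ required, found {found}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements() or ["all requirements satisfied"]:
        print(problem)
```

Zero-padding matters because Python compares tuples element by element, and a shorter tuple sorts first: without it, `(2, 1)` would be reported as older than the required `(2, 1, 0)`.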

Install from PyPI

pip install oepandas

Development Installation

git clone https://github.com/scott-arne/oepandas.git
cd oepandas
pip install -e ".[dev]"

📚 Basic Usage

Reading Molecular Data

OEPandas provides readers for SDF, SMILES, and CSV files as well as OpenEye's own formats: binary OEB files, OEDU design unit files, and OEDB record databases:

import oepandas as oepd

# SDF files - molecules with properties
df = oepd.read_sdf("molecules.sdf")
print(df.head())
#                          Molecule      Title    MolWt
# 0  <oechem.OEGraphMol; proxy...>    Aspirin  180.157
# 1  <oechem.OEGraphMol; proxy...>  Ibuprofen  206.281

# CSV files with SMILES
df = oepd.read_molecule_csv("data.csv", molecule_columns="SMILES")

# OEB files (binary format)
df = oepd.read_oeb("molecules.oeb.gz")

# Design unit files
df = oepd.read_oedu("complexes.oedu")

# OERecord databases
df = oepd.read_oedb("records.oedb")

Working with Molecules

Once loaded, use pandas normally with rich molecular extensions:

from openeye import oechem

# Standard pandas operations work
filtered_df = df[df.MolWt > 200]

# Rich molecular accessors
smiles = df.Molecule.to_smiles()
images = df.Molecule.depict(width=300, height=200)

# Apply OpenEye functions
df["oxygen_count"] = df.Molecule.apply(lambda mol: oechem.OECount(mol, oechem.OEIsOxygen()))
df["has_ring"] = df.Molecule.apply(lambda mol: any(atom.IsInRing() for atom in mol.GetAtoms()))

# Generate canonical SMILES by passing an explicit SMILES flavor
df["canonical_smiles"] = df.Molecule.to_smiles(flavor=oechem.OESMILESFlag_Canonical)

Design Units

Work with protein-ligand complexes:

# Read design unit file
df = oepd.read_oedu("protein_ligand_complexes.oedu")

# Extract components
df["Ligand"] = df.Design_Unit.get_ligands()
df["Protein"] = df.Design_Unit.get_proteins()

# Analyze components
df["ligand_mw"] = df.Ligand.apply(oechem.OECalculateMolecularWeight)
df["protein_residues"] = df.Protein.apply(lambda mol: sum(1 for _ in oechem.OEHierView(mol).GetResidues()))

🔥 Advanced Features

Custom Accessors

OEPandas automatically registers accessors on molecule columns, design unit columns, and DataFrames:

# Molecular property accessors
df.Molecule.copy_molecules()          # Deep copy molecules
df.Molecule.to_smiles()              # Generate SMILES strings
df.Molecule.depict()                 # Generate 2D depictions
df.Molecule.as_molecule()            # Convert to different formats

# Design unit accessors  
df.Design_Unit.get_ligands()         # Extract ligand molecules
df.Design_Unit.get_proteins()        # Extract protein molecules
df.Design_Unit.copy_design_units()   # Deep copy design units

# DataFrame-level accessors
df.oechem.write_sdf("output.sdf")    # Write to SDF file
df.oechem.write_oeb("output.oeb")    # Write to OEB file

📘 Examples

Comprehensive Jupyter notebooks are available in the examples/ directory:

  • 01_getting_started.ipynb - Basic usage, molecular calculations, and data manipulation
  • 02_advanced_features.ipynb - File I/O, design units, performance optimization, and ML integration

📋 API Reference

File Readers

  • read_sdf(filename, **kwargs) - Read SDF files
  • read_oeb(filename, **kwargs) - Read OEB files
  • read_oedu(filename, **kwargs) - Read OEDU files
  • read_molecule_csv(filename, molecule_columns, **kwargs) - Read CSV with molecules
  • read_smi(filename, **kwargs) - Read SMILES files
  • read_oedb(filename, **kwargs) - Read OEDB files

Core Classes

  • MoleculeArray / MoleculeDtype - Pandas extension for molecules
  • DesignUnitArray / DesignUnitDtype - Pandas extension for design units
  • DisplayArray / DisplayDtype - Pandas extension for molecular displays

🛠 Development

Running Tests

invoke test
# or
pytest

Building Package

invoke build
# or  
python -m build

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


👤 Author

Scott Arne Johnson


🔗 Related Projects

  • OpenEye Toolkits - The underlying cheminformatics toolkit
  • Pandas - Data analysis library that OEPandas extends



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


oepandas-2.0.1-py3-none-any.whl (35.3 kB)

Uploaded Python 3

File details

Details for the file oepandas-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: oepandas-2.0.1-py3-none-any.whl
  • Upload date:
  • Size: 35.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11
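The "Tags: Python 3" entry comes from the wheel's filename, which follows the layout `{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl` defined in PEP 427. The sketch below is a hypothetical, stdlib-only parser for that layout; for real tooling, `packaging.utils.parse_wheel_filename` from the `packaging` library also normalizes and validates the fields.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tags: tuple
    abi_tags: tuple
    platform_tags: tuple


def parse_wheel_filename(filename: str) -> WheelName:
    """Split '{dist}-{version}(-{build})?-{python}-{abi}-{platform}.whl'."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # Wheel filenames escape '-' inside names/versions as '_', so '-' is a safe separator.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    # A tag field may hold several dot-separated alternatives, e.g. 'py2.py3'.
    return WheelName(
        dist, ver, build,
        tuple(py.split(".")), tuple(abi.split(".")), tuple(plat.split(".")),
    )


print(parse_wheel_filename("oepandas-2.0.1-py3-none-any.whl"))
# WheelName(distribution='oepandas', version='2.0.1', build=None, python_tags=('py3',), abi_tags=('none',), platform_tags=('any',))
```

`py3-none-any` marks a pure-Python wheel: it runs on any Python 3 interpreter, has no compiled-extension ABI requirement, and works on any platform, which is why a single wheel is published for this release.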

File hashes

Hashes for oepandas-2.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 755315d6bb1862c03effadb3ddd185e70f54f52e9256c4fafe83f3a3551bb5ed
MD5 fe015c64fc9df575223988095d548f19
BLAKE2b-256 ea92c8b57f895494c60e5ce2545dad24f6ab488431928de4584a06162b554cdb
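A downloaded wheel can be checked against the published SHA256 digest before it is installed. The following is a minimal stdlib sketch; the helper names `sha256_of` and `verify` are illustrative, and it assumes the wheel was saved under its original filename in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for oepandas-2.0.1-py3-none-any.whl.
EXPECTED_SHA256 = "755315d6bb1862c03effadb3ddd185e70f54f52e9256c4fafe83f3a3551bb5ed"


def sha256_of(path, chunk_size=64 * 1024):
    """Hash a file in fixed-size chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 hex digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


wheel = Path("oepandas-2.0.1-py3-none-any.whl")
if wheel.exists():
    print("hash OK" if verify(wheel) else "hash MISMATCH: do not install")
else:
    print(f"{wheel} not found; download it first")
```

pip can enforce the same check at install time through its hash-checking mode: put `oepandas==2.0.1 --hash=sha256:755315d6bb1862c03effadb3ddd185e70f54f52e9256c4fafe83f3a3551bb5ed` in a requirements file and run `pip install --require-hashes -r requirements.txt`. In that mode pip also requires pinned hashes for every dependency.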

