Native handling of OpenEye objects in Pandas

OEPandas

Python 3.11+ OpenEye Toolkits Pandas 2.2+

Deep integration of OpenEye objects into Pandas DataFrames with native support for molecules and design units.


🚀 Quick Start

pip install oepandas

import oepandas as oepd
from openeye import oechem

# Load molecule data from various formats
df = oepd.read_sdf("molecules.sdf")
df = oepd.read_oeb("molecules.oeb.gz")  
df = oepd.read_molecule_csv("data.csv", molecule_columns="SMILES")

# Use pandas normally with molecules
df["num_oxygens"] = df.Molecule.apply(lambda mol: oechem.OECount(mol, oechem.OEIsOxygen()))

✨ Features

  • Native OpenEye Integration: Store OEGraphMol and OEDesignUnit objects directly in pandas DataFrames
  • Multiple File Formats: Read SDF, OEB, CSV, SMI, OEDB, and OEDU files seamlessly
  • Pandas Extensions: Rich accessor methods for molecular operations (.to_smiles(), .depict(), .get_mols(), etc.)
  • Type Safety: Full type hints and PyCharm IDE support
  • Performance: Optimized for large molecular datasets

🔧 Installation

Requirements

  • Python 3.10+
  • pandas 2.1.0+
  • numpy
  • OpenEye Toolkits 2023.1.0+
  • more-itertools

Note on the OpenEye Toolkits license: The OpenEye Toolkits require a commercial license, but free licenses are available for academic and non-profit institutions. Visit OpenEye Scientific or contact OpenEye directly to request an academic license.

Install from PyPI

pip install oepandas

Development Installation

git clone https://github.com/scott-arne/oepandas.git
cd oepandas
pip install -e ".[dev]"

📚 Basic Usage

Reading Molecular Data

OEPandas provides readers for all major chemical file formats supported by the OpenEye Toolkits, including their proprietary formats and record files:

import oepandas as oepd

# SDF files - molecules with properties
df = oepd.read_sdf("molecules.sdf")
print(df.head())
#                                   Molecule    Title      MolWt
# 0  <oechem.OEGraphMol; proxy...>   Aspirin   180.157
# 1  <oechem.OEGraphMol; proxy...> Ibuprofen   206.281

# CSV files with SMILES
df = oepd.read_molecule_csv("data.csv", molecule_columns="SMILES")

# OEB files (binary format)
df = oepd.read_oeb("molecules.oeb.gz")

# Design unit files
df = oepd.read_oedu("complexes.oedu")

# OERecord databases
df = oepd.read_oedb("records.oedb")

Working with Molecules

Once loaded, the DataFrame supports ordinary pandas operations alongside the molecule accessors:

from openeye import oechem

# Standard pandas operations work
filtered_df = df[df.MolWt > 200]

# Rich molecular accessors
smiles = df.Molecule.to_smiles()
images = df.Molecule.depict(width=300, height=200)

# Apply OpenEye functions
df["oxygen_count"] = df.Molecule.apply(lambda mol: oechem.OECount(mol, oechem.OEIsOxygen()))
df["has_ring"] = df.Molecule.apply(lambda mol: oechem.OECount(mol, oechem.OEAtomIsInRing()) > 0)

# Convert to different formats
df["canonical_smiles"] = df.Molecule.to_smiles(flavor=oechem.OESMILESFlag_Canonic)

Design Units

Work with protein-ligand complexes:

# Read design unit file
df = oepd.read_oedu("protein_ligand_complexes.oedu")

# Extract components
df["Ligand"] = df.Design_Unit.get_ligands()
df["Protein"] = df.Design_Unit.get_proteins()

# Analyze components
df["ligand_mw"] = df.Ligand.apply(oechem.OECalculateMolecularWeight)
# Count alpha carbons as a proxy for the number of amino-acid residues
df["protein_residues"] = df.Protein.apply(lambda mol: oechem.OECount(mol, oechem.OEIsCAlpha()))

🔥 Advanced Features

Custom Accessors

OEPandas registers many useful pandas accessors automatically:

# Molecular property accessors
df.Molecule.copy_molecules()          # Deep copy molecules
df.Molecule.to_smiles()              # Generate SMILES strings
df.Molecule.depict()                 # Generate 2D depictions
df.Molecule.as_molecule()            # Convert to different formats

# Design unit accessors  
df.Design_Unit.get_ligands()         # Extract ligand molecules
df.Design_Unit.get_proteins()        # Extract protein molecules
df.Design_Unit.copy_design_units()   # Deep copy design units

# DataFrame-level accessors
df.oechem.write_sdf("output.sdf")    # Write to SDF file
df.oechem.write_oeb("output.oeb")    # Write to OEB file
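
For readers unfamiliar with how accessors come to exist at all, the sketch below uses the public pandas hook `register_series_accessor`. It is not OEPandas code, and how OEPandas registers its own accessors is not shown in this README. The `smi` namespace, the `SmilesAccessor` class, and the `n_components` method are hypothetical names, and the Series holds plain SMILES strings so the example runs without the OpenEye Toolkits.

```python
import pandas as pd


# "smi" is a hypothetical accessor namespace chosen for this sketch.
@pd.api.extensions.register_series_accessor("smi")
class SmilesAccessor:
    """Toy accessor for a Series of SMILES strings (not part of OEPandas)."""

    def __init__(self, series: pd.Series):
        self._series = series

    def n_components(self) -> pd.Series:
        # In SMILES, "." separates disconnected components (for example, salts).
        return self._series.str.count(r"\.") + 1


smiles = pd.Series(["CCO", "[Na+].[Cl-]", "c1ccccc1"], name="SMILES")
component_counts = smiles.smi.n_components()
```

Once the class is registered, the namespace is available on every Series in the session, including DataFrame columns.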

📘 Examples

Comprehensive Jupyter notebooks are available in the examples/ directory:

  • 01_getting_started.ipynb - Basic usage, molecular calculations, and data manipulation
  • 02_advanced_features.ipynb - File I/O, design units, performance optimization, and ML integration

📋 API Reference

File Readers

  • read_sdf(filename, **kwargs) - Read SDF files
  • read_oeb(filename, **kwargs) - Read OEB files
  • read_oedu(filename, **kwargs) - Read OEDU files
  • read_molecule_csv(filename, molecule_columns, **kwargs) - Read CSV with molecules
  • read_smi(filename, **kwargs) - Read SMILES files
  • read_oedb(filename, **kwargs) - Read OEDB files
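
When a script has to accept several of these formats, one option is to pick the reader from the file extension. The following is a minimal sketch that uses only the reader names listed above. The helpers `reader_name_for` and `read_any` are hypothetical and not part of OEPandas. Stripping a trailing `.gz` for every format is an assumption, since only `.oeb.gz` appears in the examples.

```python
from pathlib import Path

# Extension -> reader name, using only the reader functions listed above.
READERS = {
    ".sdf": "read_sdf",
    ".oeb": "read_oeb",
    ".oedu": "read_oedu",
    ".csv": "read_molecule_csv",
    ".smi": "read_smi",
    ".oedb": "read_oedb",
}


def reader_name_for(path: str) -> str:
    """Return the name of the reader that matches the file extension."""
    suffixes = [s.lower() for s in Path(path).suffixes]
    # Assumption: treat "name.ext.gz" like "name.ext".
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes or suffixes[-1] not in READERS:
        raise ValueError(f"No reader known for {path!r}")
    return READERS[suffixes[-1]]


def read_any(path: str, **kwargs):
    """Load a file with the matching reader (needs oepandas and a licensed toolkit)."""
    import oepandas as oepd  # imported lazily so the lookup works without it

    return getattr(oepd, reader_name_for(path))(path, **kwargs)
```

For CSV input, `read_any("data.csv", molecule_columns="SMILES")` would forward the required `molecule_columns` argument to `read_molecule_csv`.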

Core Classes

  • MoleculeArray / MoleculeDtype - Pandas extension for molecules
  • DesignUnitArray / DesignUnitDtype - Pandas extension for design units
  • DisplayArray / DisplayDtype - Pandas extension for molecular displays
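
These classes build on the pandas extension-array interface. The sketch below shows the general shape of such a dtype/array pair using the public `ExtensionDtype` and `ExtensionArray` base classes. It is not the OEPandas implementation: `SmilesDtype` and `SmilesArray` are hypothetical names, and the array stores plain strings in place of molecule objects so it runs without the OpenEye Toolkits.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import (
    ExtensionArray,
    ExtensionDtype,
    register_extension_dtype,
    take,
)


@register_extension_dtype
class SmilesDtype(ExtensionDtype):
    """Hypothetical dtype describing one element of SmilesArray."""

    name = "smiles_sketch"
    type = object
    na_value = None

    @classmethod
    def construct_array_type(cls):
        return SmilesArray


class SmilesArray(ExtensionArray):
    """Hypothetical array holding arbitrary Python objects (strings here)."""

    def __init__(self, values):
        self._data = np.empty(len(values), dtype=object)
        for i, value in enumerate(values):
            self._data[i] = value

    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        return cls(list(scalars))

    @classmethod
    def _from_factorized(cls, values, original):
        return cls(values)

    @property
    def dtype(self):
        return SmilesDtype()

    @property
    def nbytes(self):
        return self._data.nbytes

    def __len__(self):
        return len(self._data)

    def __getitem__(self, item):
        if isinstance(item, (int, np.integer)):
            return self._data[item]  # scalar access returns the stored object
        item = pd.api.indexers.check_array_indexer(self, item)
        return type(self)(self._data[item])

    def isna(self):
        return np.array([value is None for value in self._data], dtype=bool)

    def take(self, indices, allow_fill=False, fill_value=None):
        result = take(self._data, indices, allow_fill=allow_fill, fill_value=fill_value)
        return type(self)(result)

    def copy(self):
        return type(self)(self._data.copy())

    @classmethod
    def _concat_same_type(cls, to_concat):
        return cls(np.concatenate([arr._data for arr in to_concat]))


series = pd.Series(SmilesArray(["CCO", None, "c1ccccc1"]))
present = series[series.notna()]
```

Because the array reports its own missing values through `isna`, ordinary pandas operations such as `notna` and boolean filtering work on the column without any special handling.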

🛠 Development

Running Tests

invoke test
# or
pytest

Building Package

invoke build
# or  
python -m build

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


📄 License

This project is licensed under the MIT License. See the LICENSE file for details.


👤 Author

Scott Arne Johnson


🔗 Related Projects

  • OpenEye Toolkits - The underlying cheminformatics toolkit
  • Pandas - Data analysis library that OEPandas extends
