Native handling of OpenEye objects in Pandas
OEPandas
Deep integration of OpenEye objects into Pandas DataFrames with native support for molecules and design units.
🚀 Quick Start
pip install oepandas
import oepandas as oepd
from openeye import oechem
# Load molecule data from various formats
df = oepd.read_sdf("molecules.sdf")
df = oepd.read_oeb("molecules.oeb.gz")
df = oepd.read_molecule_csv("data.csv", molecule_columns="SMILES")
# Use pandas normally with molecules
df["num_oxygens"] = df.Molecule.apply(lambda mol: oechem.OECount(mol, oechem.OEIsOxygen()))
✨ Features
- Native OpenEye Integration: Store OEGraphMol and OEDesignUnit objects directly in pandas DataFrames
- Multiple File Formats: Read SDF, OEB, CSV, SMI, OEDB, and OEDU files seamlessly
- Pandas Extensions: Rich accessor methods for molecular operations (.to_smiles(), .depict(), .get_mols(), etc.)
- Type Safety: Full type hints and PyCharm IDE support
- Performance: Optimized for large molecular datasets
🔧 Installation
Requirements
- Python 3.10+
- pandas 2.1.0+
- numpy
- OpenEye Toolkits 2023.1.0+
- more-itertools
Note on OpenEye Toolkits License: The OpenEye Toolkits require a commercial license, although free licenses are available for academic and non-profit institutions. Visit OpenEye Scientific or contact OpenEye directly to request an academic license.
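The requirements above can be checked before installing. The following is a minimal sketch using only the standard library; the helper names parse_version and check_requirements are hypothetical, and the OpenEye check only tests whether the openeye module is importable, not its version or license status.

```python
import sys
from importlib import metadata, util

# Minimum versions taken from the requirements list; None means "any version".
MINIMUMS = {"pandas": (2, 1, 0), "numpy": None, "more-itertools": None}


def parse_version(text):
    """Turn '2.1.0' or '2.2.0rc1' into a 3-tuple of ints (hypothetical helper)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.1' compares equal to '2.1.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def check_requirements():
    """Return a dict mapping each requirement to True (satisfied) or False."""
    report = {"python": sys.version_info >= (3, 10)}
    for dist, minimum in MINIMUMS.items():
        try:
            installed = parse_version(metadata.version(dist))
        except metadata.PackageNotFoundError:
            report[dist] = False
            continue
        report[dist] = minimum is None or installed >= minimum
    # Only checks that the module can be found, nothing about licensing.
    report["openeye"] = util.find_spec("openeye") is not None
    return report


report = check_requirements()
for name, ok in report.items():
    print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

Any entry reported as missing or too old would need to be installed or upgraded before pip install oepandas can be expected to work.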
Install from PyPI
pip install oepandas
Development Installation
git clone https://github.com/scott-arne/oepandas.git
cd oepandas
pip install -e ".[dev]"
📚 Basic Usage
Reading Molecular Data
OEPandas provides readers for all major chemical file formats supported by the OpenEye Toolkits, including their proprietary formats and record files:
import oepandas as oepd
# SDF files - molecules with properties
df = oepd.read_sdf("molecules.sdf")
print(df.head())
# Molecule Title MolWt
# 0 <oechem.OEGraphMol; proxy...> Aspirin 180.157
# 1 <oechem.OEGraphMol; proxy...> Ibuprofen 206.281
# CSV files with SMILES
df = oepd.read_molecule_csv("data.csv", molecule_columns="SMILES")
# OEB files (binary format)
df = oepd.read_oeb("molecules.oeb.gz")
# Design unit files
df = oepd.read_oedu("complexes.oedu")
# OERecord databases
df = oepd.read_oedb("records.oedb")
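The read_molecule_csv call above names the column that holds SMILES strings through molecule_columns. As a rough illustration of the kind of input it expects, the sketch below builds a hypothetical two-row CSV in memory and parses it with plain pandas.read_csv. The file contents and the Title column are assumptions, and the conversion of the SMILES column into molecule objects (the part OEPandas adds) is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical contents of "data.csv"; rows and the Title column are illustrative.
CSV_TEXT = """SMILES,Title
CC(=O)OC1=CC=CC=C1C(=O)O,Aspirin
CC(C)CC1=CC=C(C=C1)C(C)C(=O)O,Ibuprofen
"""

# Plain pandas leaves the SMILES column as ordinary strings.
raw = pd.read_csv(io.StringIO(CSV_TEXT))
print(raw.dtypes)
print(raw)
```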
Working with Molecules
Once loaded, use pandas normally with rich molecular extensions:
from openeye import oechem
# Standard pandas operations work
filtered_df = df[df.MolWt > 200]
# Rich molecular accessors
smiles = df.Molecule.to_smiles()
images = df.Molecule.depict(width=300, height=200)
# Apply OpenEye functions
df["oxygen_count"] = df.Molecule.apply(lambda mol: oechem.OECount(mol, oechem.OEIsOxygen()))
df["has_ring"] = df.Molecule.apply(lambda mol: oechem.OEDetermineRingMembership(mol) > 0)
# Convert to different formats
df["canonical_smiles"] = df.Molecule.to_smiles(flavor=oechem.OESMILESFlag_Canonical)
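The filtering and apply idioms above are ordinary pandas, so they can be tried without an OpenEye license. The sketch below is a stand-in only: it uses SMILES strings in place of the Molecule column, reuses the titles and molecular weights from the sample output shown earlier, and replaces the OEChem oxygen count with a crude character count (the helper count_oxygen_letters is hypothetical).

```python
import pandas as pd

# Stand-in data: SMILES strings instead of OEGraphMol objects.
df = pd.DataFrame(
    {
        "SMILES": [
            "CC(=O)OC1=CC=CC=C1C(=O)O",
            "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
        ],
        "Title": ["Aspirin", "Ibuprofen"],
        "MolWt": [180.157, 206.281],
    }
)

# Boolean-mask filtering, exactly as with a molecule-backed DataFrame.
filtered_df = df[df.MolWt > 200]


def count_oxygen_letters(smiles: str) -> int:
    """Crude stand-in for an OEChem atom count: counts 'O' characters."""
    return smiles.count("O")


# Element-wise apply over a column.
df["oxygen_count"] = df.SMILES.apply(count_oxygen_letters)

print(filtered_df.Title.tolist())  # ['Ibuprofen']
print(df[["Title", "oxygen_count"]])
```

With real molecules, the apply call would receive molecule objects rather than strings, so an OEChem function would take the place of the string helper.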
Design Units
Work with protein-ligand complexes:
# Read design unit file
df = oepd.read_oedu("protein_ligand_complexes.oedu")
# Extract components
df["Ligand"] = df.Design_Unit.get_ligands()
df["Protein"] = df.Design_Unit.get_proteins()
# Analyze components
df["ligand_mw"] = df.Ligand.apply(oechem.OECalculateMolecularWeight)
df["protein_residues"] = df.Protein.apply(lambda mol: oechem.OECount(mol, oechem.OEIsResidue()))
🔥 Advanced Features
Custom Accessors
OEPandas registers many useful pandas accessors automatically:
# Molecular property accessors
df.Molecule.copy_molecules() # Deep copy molecules
df.Molecule.to_smiles() # Generate SMILES strings
df.Molecule.depict() # Generate 2D depictions
df.Molecule.as_molecule() # Convert to different formats
# Design unit accessors
df.Design_Unit.get_ligands() # Extract ligand molecules
df.Design_Unit.get_proteins() # Extract protein molecules
df.Design_Unit.copy_design_units() # Deep copy design units
# DataFrame-level accessors
df.oechem.write_sdf("output.sdf") # Write to SDF file
df.oechem.write_oeb("output.oeb") # Write to OEB file
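For readers unfamiliar with pandas accessors, the general registration mechanism can be sketched with pandas' public register_series_accessor decorator. This is an illustration of the pandas feature only: the smiles_demo namespace and its has_ring method are hypothetical, operate on plain strings, and are not part of OEPandas, whose own wiring is not described in this document.

```python
import pandas as pd


@pd.api.extensions.register_series_accessor("smiles_demo")
class SmilesDemoAccessor:
    """Hypothetical accessor for illustration; not part of OEPandas."""

    def __init__(self, series: pd.Series) -> None:
        self._series = series

    def has_ring(self) -> pd.Series:
        # Crude heuristic: ring closures in SMILES are written with digits.
        return self._series.map(lambda smi: any(ch.isdigit() for ch in smi))


smiles = pd.Series(["CCO", "c1ccccc1"], name="SMILES")
ring_flags = smiles.smiles_demo.has_ring()
print(ring_flags.tolist())  # [False, True]
```

Once the class is registered, every Series exposes the smiles_demo namespace, which is the same pattern that lets a DataFrame expose a namespace such as df.oechem.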
📘 Examples
Comprehensive Jupyter notebooks are available in the examples/ directory:
- 01_getting_started.ipynb - Basic usage, molecular calculations, and data manipulation
- 02_advanced_features.ipynb - File I/O, design units, performance optimization, and ML integration
📋 API Reference
File Readers
- read_sdf(filename, **kwargs) - Read SDF files
- read_oeb(filename, **kwargs) - Read OEB files
- read_oedu(filename, **kwargs) - Read OEDU files
- read_molecule_csv(filename, molecule_columns, **kwargs) - Read CSV with molecules
- read_smi(filename, **kwargs) - Read SMILES files
- read_oedb(filename, **kwargs) - Read OEDB files
Core Classes
- MoleculeArray/MoleculeDtype - Pandas extension for molecules
- DesignUnitArray/DesignUnitDtype - Pandas extension for design units
- DisplayArray/DisplayDtype - Pandas extension for molecular displays
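The classes above come in Array/Dtype pairs, which matches the naming of pandas' extension-type interface (ExtensionArray and ExtensionDtype). Assuming that is the mechanism in use, the sketch below shows what an extension-backed column looks like from the outside, using pandas' built-in category dtype as a stand-in since the OEPandas classes need the OpenEye Toolkits.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype

# A column backed by a pandas extension type (built-in stand-in).
categories = pd.Series(["ligand", "protein", "ligand"], dtype="category")

# The same values stored as generic Python objects.
plain = pd.Series(["ligand", "protein", "ligand"], dtype=object)

print(isinstance(categories.dtype, ExtensionDtype))  # True
print(isinstance(plain.dtype, ExtensionDtype))       # False
print(type(categories.array).__name__)               # Categorical
```

A column with an extension dtype keeps its specialised storage through indexing and filtering, which is what lets typed columns survive normal DataFrame operations.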
🛠 Development
Running Tests
invoke test
# or
pytest
Building Package
invoke build
# or
python -m build
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
👤 Author
Scott Arne Johnson
- Email: scott.arne.johnson@gmail.com
🔗 Related Projects
- OpenEye Toolkits - The underlying cheminformatics toolkit
- Pandas - Data analysis library that OEPandas extends