biomapper

A unified Python toolkit for biological data harmonization and ontology mapping. biomapper provides a single interface for standardizing identifiers and mapping between various biological ontologies, making multi-omic data integration more accessible and reproducible.

Features

Core Functionality

  • ID Standardization: Unified interface for standardizing biological identifiers
  • Ontology Mapping: Mapping between ontologies using major biological databases (ChEBI, UniChem, UniProt, RefMet) and retrieval-augmented generation (RAG)
  • Data Validation: Robust validation of input data and mappings
  • Extensible Architecture: Easy integration of new data sources and mapping services
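
As a rough illustration of what standardizing and validating an identifier can involve, the sketch below zero-pads legacy five-digit HMDB accessions to the current seven-digit form and checks strings against the accession pattern published in the UniProt help pages. The function names are hypothetical and are not part of biomapper's API.

```python
import re

# UniProt accession pattern as published in the UniProt help pages.
UNIPROT_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# HMDB accessions: "HMDB" followed by 5 (legacy) or 7 (current) digits.
HMDB_RE = re.compile(r"HMDB(\d{5}|\d{7})")


def standardize_hmdb(raw: str) -> str:
    """Return the seven-digit HMDB form, e.g. HMDB00064 -> HMDB0000064."""
    cleaned = raw.strip().upper()
    match = HMDB_RE.fullmatch(cleaned)
    if match is None:
        raise ValueError(f"not an HMDB accession: {raw!r}")
    return "HMDB" + match.group(1).zfill(7)


def is_uniprot_accession(raw: str) -> bool:
    """Check whether the string has the shape of a UniProt accession."""
    return UNIPROT_RE.fullmatch(raw.strip().upper()) is not None
```

A shape check like this only confirms that the string looks like an accession; it does not confirm that the record exists in the database.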

Supported Systems

ID Standardization Tools

  • RaMP-DB: Integration with the Rapid Mapping Database for metabolites and pathways

Mapping Services

  • ChEBI: Chemical Entities of Biological Interest database integration
  • UniChem: Cross-referencing of chemical structure identifiers
  • UniProt: Protein-focused mapping capabilities
  • RefMet: Reference list of metabolite names and identifiers
  • RAG-Based Mapping: AI-powered mapping using Retrieval-Augmented Generation (RAG)
  • Multi-Provider RAG: Combining multiple data sources for improved mapping accuracy
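
One simple way to picture the multi-provider idea is a vote across sources: each provider proposes candidate identifiers for a name, and candidates proposed by more providers rank higher. The sketch below shows this under stated assumptions. The provider names and returned identifiers are made-up placeholders, and biomapper's actual combination logic may differ.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A provider is any callable that maps a name to candidate identifiers.
Provider = Callable[[str], List[str]]


def combine_providers(
    name: str, providers: Dict[str, Provider]
) -> List[Tuple[str, int]]:
    """Rank candidate identifiers by how many providers proposed them."""
    votes: Counter = Counter()
    for provider in providers.values():
        # Use a set so a provider cannot vote twice for the same candidate.
        votes.update(set(provider(name)))
    # Sort by vote count (descending), then by identifier for a stable order.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Placeholder providers returning made-up identifiers (not real database IDs).
providers = {
    "source_a": lambda name: ["EX:0001", "EX:0002"],
    "source_b": lambda name: ["EX:0001"],
    "source_c": lambda name: ["EX:0003", "EX:0001"],
}
ranked = combine_providers("glucose", providers)
```

Here `ranked` lists `EX:0001` first because all three placeholder providers proposed it.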

Installation

Using pip

pip install biomapper
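
To confirm that the installation worked, the installed version can be read with the standard library. This is a minimal sketch; `installed_version` is a hypothetical helper name, and it returns `None` when the distribution is not installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("biomapper"))
```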

Development Setup

  1. Install Python 3.11 with pyenv, if not already installed (the commands below assume a Debian/Ubuntu system and bash):
# Install pyenv dependencies
sudo apt-get update
sudo apt-get install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev \
libncursesw5-dev xz-utils tk-dev libffi-dev liblzma-dev python3-openssl

# Install pyenv
curl https://pyenv.run | bash

# Add to your shell configuration
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init -)"' >> ~/.bashrc

# Reload shell configuration
source ~/.bashrc

# Install Python 3.11
pyenv install 3.11.7
pyenv local 3.11.7
  2. Install Poetry (if not already installed):
curl -sSL https://install.python-poetry.org | python3 -

# Add Poetry to your PATH
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
  3. Clone and set up the project:
git clone https://github.com/yourusername/biomapper.git
cd biomapper

# Install dependencies with Poetry
poetry install

Quick Start

from biomapper.mapping import MetaboliteNameMapper, RagMapper, UniProtFocusedMapper
from biomapper.standardization import RaMPClient

# Example 1: Using UniProt-focused mapping
uniprot_mapper = UniProtFocusedMapper()
protein_mapping = uniprot_mapper.map_identifier("P12345")

# Example 2: Using Metabolite Name Mapping
metabolite_mapper = MetaboliteNameMapper()
metabolite_mapping = metabolite_mapper.map_name("glucose")

# Example 3: Using RaMP-DB
# Initialize the RaMP client
ramp_client = RaMPClient()

# Get database versions
versions = ramp_client.get_source_versions()

# Get pathways for metabolites
# Example: Get pathways for Creatine (HMDB0000064)
pathways = ramp_client.get_pathways_from_analytes(["hmdb:HMDB0000064"])

# Example 4: Using RAG-based mapping
rag_mapper = RagMapper()
rag_results = rag_mapper.map_name("alpha-D-glucose")
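
The RaMP example passes analytes as prefixed identifiers such as `hmdb:HMDB0000064`. When identifiers arrive without a prefix, a small helper can add one before the call. This is only a sketch: `to_prefixed` is a hypothetical helper, not part of biomapper, and the prefixes RaMP-DB accepts should be checked against its documentation.

```python
from typing import Iterable, List


def to_prefixed(ids: Iterable[str], prefix: str) -> List[str]:
    """Add a lowercase source prefix to bare IDs, skipping blanks and duplicates."""
    prefix = prefix.lower()
    seen = set()
    result = []
    for raw in ids:
        bare = raw.strip()
        if not bare:
            continue
        # Leave identifiers that already carry a prefix untouched.
        curie = bare if ":" in bare else f"{prefix}:{bare}"
        if curie not in seen:
            seen.add(curie)
            result.append(curie)
    return result


analytes = to_prefixed(
    ["HMDB0000064", " HMDB0000064 ", "", "hmdb:HMDB0000122"], "hmdb"
)
```

The resulting list could then be passed to `ramp_client.get_pathways_from_analytes(analytes)`, as in Example 3.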

Development

Using Poetry

# Activate virtual environment
poetry shell

# Run a command in the virtual environment
poetry run python script.py

# Add a new dependency
poetry add package-name

# Add a development dependency
poetry add --group dev package-name

# Update dependencies
poetry update

# Show currently installed packages
poetry show

# Build the package
poetry build

Running Tests

# Run tests
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=biomapper
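
For orientation, pytest by default collects functions whose names start with `test_` from files named `test_*.py` or `*_test.py`. The example below is a hypothetical, self-contained test file. `normalize_name` is an illustrative helper defined inline, not a biomapper function.

```python
# tests/test_normalize.py (hypothetical file name)


def normalize_name(name: str) -> str:
    """Collapse whitespace and lowercase a metabolite name."""
    return " ".join(name.split()).lower()


def test_normalize_name_collapses_whitespace():
    assert normalize_name("  Alpha-D-Glucose ") == "alpha-d-glucose"


def test_normalize_name_is_idempotent():
    once = normalize_name("D  Glucose")
    assert normalize_name(once) == once
```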

Code Quality

# Format code with black
poetry run black .

# Run linting
poetry run flake8 .

# Type checking
poetry run mypy .

Project Structure

biomapper/
├── biomapper/                # Main package directory
│   ├── core/                 # Core functionality
│   │   ├── metadata.py       # Metadata handling
│   │   └── validators.py     # Data validation
│   ├── standardization/      # ID standardization components
│   ├── mapping/              # Ontology mapping components
│   ├── utils/                # Utility functions
│   └── schemas/              # Data schemas and models
├── tests/                    # Test files
├── docs/                     # Documentation
├── scripts/                  # Utility scripts
├── pyproject.toml            # Poetry configuration and dependencies
└── poetry.lock               # Lock file for dependencies
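
To make the roles of `schemas/` and `core/validators.py` more concrete, here is one way a mapping result could be modeled and checked. The class and field names are hypothetical illustrations, not biomapper's actual schema, and the identifier shown is a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingResult:
    """One source-to-target identifier mapping with a confidence score."""

    source_id: str
    target_id: str
    target_db: str
    confidence: float = 1.0

    def __post_init__(self) -> None:
        # Reject empty identifiers and out-of-range scores early.
        if not self.source_id or not self.target_id:
            raise ValueError("source_id and target_id must be non-empty")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")


# Placeholder values for illustration only.
result = MappingResult("glucose", "EX:0001", "example_db", confidence=0.9)
```

Validating in `__post_init__` keeps invalid records from being constructed at all, so downstream code does not need to re-check them.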

License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

Support

For support, please open an issue in the GitHub issue tracker.

Roadmap

  • Initial release with core functionality
  • Implement RAG-based mapping capabilities
  • Add support for major chemical/biological databases (ChEBI, UniChem, UniProt)
  • Add caching layer for improved performance
  • Expand RAG capabilities with more specialized models
  • Add batch processing capabilities
  • Develop REST API interface

Acknowledgments
