A modular Python package for fetching, enriching, and analyzing omics metadata and publications.

omix

A Python package that:

  • Fetches comprehensive metadata from public databases (currently ENA, with other sources planned).
  • Enriches coordinates, dates, host/environment categories, and experimental protocols.
  • Searches across multiple publication sources (Crossref, Europe PMC, NCBI, Semantic Scholar, etc.).
  • Optionally extracts methodology from full-text articles using LLMs.
  • Validates findings against reference databases (e.g., primer databases for 16S).
  • Can be extended to any omics type via plugins.
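As an illustration of validating findings against a primer database, the sketch below compares a primer sequence reported in a paper with reference primers, treating IUPAC degenerate bases (Y, M, N, ...) as sets of nucleotides. It is a hypothetical example, not omix's validator: the function names and the two-entry REFERENCE_PRIMERS table (the widely used 515F/806R 16S V4 primers) are assumptions, and a real primer database would presumably be loaded from the file configured under paths.primer_db.

```python
# Hypothetical sketch, not omix's API: IUPAC-aware primer matching.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"}, "B": {"C", "G", "T"},
    "D": {"A", "G", "T"}, "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}

# Illustrative reference table; a real primer database would be read from disk.
REFERENCE_PRIMERS = {
    "515F": "GTGYCAGCMGCCGCGGTAA",
    "806R": "GGACTACNVGGGTWTCTAAT",
}


def bases_compatible(a: str, b: str) -> bool:
    """Two IUPAC codes are compatible if the nucleotides they denote overlap."""
    return bool(IUPAC.get(a, set()) & IUPAC.get(b, set()))


def primer_matches(query: str, reference: str) -> bool:
    """Position-by-position comparison of equal-length primers."""
    query, reference = query.upper(), reference.upper()
    if len(query) != len(reference):
        return False
    return all(bases_compatible(q, r) for q, r in zip(query, reference))


def identify_primer(query: str) -> list[str]:
    """Names of all reference primers compatible with the query sequence."""
    return [name for name, seq in REFERENCE_PRIMERS.items() if primer_matches(query, seq)]


print(identify_primer("GTGCCAGCAGCCGCGGTAA"))  # ['515F']
```

Treating each code as a set keeps the comparison symmetric, so a degenerate base in either the reported or the reference primer is handled the same way.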

Installation

pip install omix
# with LLM support (quote the extra so shells such as zsh do not expand the brackets):
pip install "omix[llm]"

Quick Start

Command Line

# Enrich a metadata file with ENA data
omix fetch-metadata samples.tsv --email you@example.com

# Fetch publications for one or more accessions
omix fetch-publications PRJNA864623 --omics 16S --api-key $LLM_KEY

# Run the full metadata cleaning and enrichment pipeline
omix run-pipeline metadata.csv -o enriched.csv

# NEW: Unified pipeline (metadata + publications + validation + integration)
omix enrich-with-publications samples.csv -o enriched_complete.csv --config config.yaml
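One part of the metadata cleaning and enrichment pipeline is coordinate enrichment. ENA and other INSDC records often store location as a single lat_lon string such as "47.94 N 28.12 W"; the sketch below converts that form into signed decimal degrees. It is a hypothetical helper, not omix's implementation, and it deliberately returns None for placeholders like "not collected" or out-of-range values rather than guessing.

```python
import re

# INSDC/ENA-style lat_lon values: "<lat> <N|S> <lon> <E|W>", e.g. "47.94 N 28.12 W".
LAT_LON = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*([NS])\s+(\d+(?:\.\d+)?)\s*([EW])\s*$",
    re.IGNORECASE,
)


def parse_lat_lon(value):
    """Return (latitude, longitude) in signed decimal degrees, or None if unparseable."""
    if value is None:
        return None
    match = LAT_LON.match(str(value))
    if match is None:
        return None  # placeholders such as "not collected" or "missing"
    lat, ns, lon, ew = match.groups()
    lat, lon = float(lat), float(lon)
    if lat > 90 or lon > 180:
        return None  # out of range: treat as invalid rather than guessing
    latitude = -lat if ns.upper() == "S" else lat
    longitude = -lon if ew.upper() == "W" else lon
    return latitude, longitude


print(parse_lat_lon("47.94 N 28.12 W"))  # (47.94, -28.12)
print(parse_lat_lon("not collected"))    # None
```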

Unified Metadata + Publications Pipeline

The enrich-with-publications command provides an end-to-end workflow:

  1. Metadata Enrichment: Fetches comprehensive data from ENA (sequences, samples, runs)
  2. Publication Discovery: Searches 12+ publication APIs (Crossref, Europe PMC, NCBI, Semantic Scholar, arXiv, bioRxiv, CORE, DataCite, DOAJ, PLOS, Unpaywall, Zenodo)
  3. Publication Validation: Keeps only publications that mention the accession directly
  4. Integration: Merges publication counts and DOIs into the enriched metadata
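The validation step (item 3) can be approximated by a whole-token text search. The sketch below assumes each candidate publication carries some retrieved text (abstract or full text); the Publication class and the function names are hypothetical, and omix's actual matching rules may differ, for example in how it treats case or related run and sample accessions.

```python
import re
from dataclasses import dataclass


@dataclass
class Publication:
    doi: str
    text: str  # abstract or full text retrieved from a publication API


def mentions_accession(text: str, accession: str) -> bool:
    """True if the accession appears as a standalone token in the text."""
    # The lookarounds stop "PRJNA864623" from matching inside "PRJNA8646230".
    pattern = rf"(?<![A-Za-z0-9]){re.escape(accession)}(?![A-Za-z0-9])"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def validate_publications(publications, accession):
    """Keep only publications whose text explicitly mentions the accession."""
    return [pub for pub in publications if mentions_accession(pub.text, accession)]


candidates = [
    Publication("10.1000/example.1", "Reads were deposited under BioProject PRJNA864623."),
    Publication("10.1000/example.2", "Data are available as PRJNA8646230."),
    Publication("10.1000/example.3", "A related study with no accession listed."),
]
kept = validate_publications(candidates, "PRJNA864623")
print([pub.doi for pub in kept])  # ['10.1000/example.1']
```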

Output includes all ENA metadata fields plus:

  • publication_count: Number of validated publications per study
  • publication_dois: Semicolon-separated list of publication DOIs

# Basic usage
omix enrich-with-publications input.csv -o output.csv

# With debug config for faster testing
omix enrich-with-publications input.csv -o output.csv --config config.debug.yaml

# Skip validation (keep all publications found)
omix enrich-with-publications input.csv -o output.csv --no-validate

# With LLM-based methodology extraction
omix enrich-with-publications input.csv -o output.csv --api-key $LLM_KEY
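A minimal sketch of how the publication_count and publication_dois columns could be derived from validated (study, DOI) pairs, using plain dictionaries instead of a DataFrame. The study_accession key, the helper names, and the exact separator formatting are assumptions for illustration; rows with no validated publications get a count of 0 and an empty DOI string.

```python
from collections import defaultdict


def summarize_publications(links):
    """Group validated (study_accession, doi) pairs into the two output columns."""
    dois_by_study = defaultdict(list)
    for study, doi in links:
        if doi not in dois_by_study[study]:  # drop duplicate DOIs, keep first-seen order
            dois_by_study[study].append(doi)
    return {
        study: {"publication_count": len(dois), "publication_dois": ";".join(dois)}
        for study, dois in dois_by_study.items()
    }


def add_publication_columns(rows, summary, study_column="study_accession"):
    """Attach publication_count and publication_dois to each metadata row (a dict)."""
    empty = {"publication_count": 0, "publication_dois": ""}
    for row in rows:
        row.update(summary.get(row[study_column], empty))
    return rows


summary = summarize_publications([
    ("PRJNA864623", "10.1000/example.1"),
    ("PRJNA864623", "10.1000/example.2"),
    ("PRJNA864623", "10.1000/example.1"),  # duplicate hit from a second API
])
rows = add_publication_columns(
    [{"study_accession": "PRJNA864623"}, {"study_accession": "OTHER_STUDY"}], summary
)
print(rows)
```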

Python API

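import asyncio

from omix import Config
from omix.metadata.file_workflow import enrich_metadata_from_path

# Build a configuration with your contact email
config = Config(email="you@example.com")

# enrich_metadata_from_path is a coroutine, so run it with asyncio.run()
df = asyncio.run(enrich_metadata_from_path("samples.csv", config=config))
print(df.head())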

Configuration

omix can be configured via a YAML file:

credentials:
  email: "your.email@example.com"
  ena_email: "ena@example.com"
  llm_api_key: "sk-..."
  ncbi_api_key: "..."

apis:
  sequence:
    ena:
      enabled: true
      max_concurrent: 5
      batch_size: 100
      cache_ttl_days: 30
      fetch_phases: true

metadata:
  sample_id_column: "#sampleid"
  exclude_host: false

paths:
  cache_dir: ".cache"
  logs_dir: "logs"
  primer_db: null

Pass the file with --config my_config.yaml, or set environment variables such as OMIX_EMAIL.
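The max_concurrent and batch_size settings suggest a common pattern: split accessions into batches and cap the number of in-flight requests with a semaphore. The sketch below shows that pattern with asyncio; fetch_batch is a hypothetical stand-in that makes no network calls, and this is not necessarily how omix schedules its ENA requests.

```python
import asyncio

MAX_CONCURRENT = 5  # mirrors apis.sequence.ena.max_concurrent in the YAML above
BATCH_SIZE = 100    # mirrors apis.sequence.ena.batch_size


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def fetch_batch(batch):
    """Hypothetical placeholder for one ENA request; it only echoes the accessions."""
    await asyncio.sleep(0)
    return {acc: {"accession": acc} for acc in batch}


async def fetch_all(accessions, fetch=fetch_batch,
                    max_concurrent=MAX_CONCURRENT, batch_size=BATCH_SIZE):
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(batch):
        async with semaphore:  # at most max_concurrent batches in flight at once
            return await fetch(batch)

    results = await asyncio.gather(*(limited(b) for b in batched(accessions, batch_size)))
    merged = {}
    for part in results:
        merged.update(part)
    return merged


accessions = [f"SRR{n:07d}" for n in range(250)]  # made-up run accessions
records = asyncio.run(fetch_all(accessions))
print(len(records))  # 250
```

Passing the fetch function as a parameter keeps the scheduling logic separate from the HTTP code, so the concurrency cap can be tested without touching the network.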

Download files

Source Distribution

omix-0.1.0.tar.gz (89.4 kB)

Uploaded Source

Built Distribution

omix-0.1.0-py3-none-any.whl (115.1 kB)

Uploaded Python 3

File details

Details for the file omix-0.1.0.tar.gz.

File metadata

  • Download URL: omix-0.1.0.tar.gz
  • Upload date:
  • Size: 89.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for omix-0.1.0.tar.gz
Algorithm Hash digest
SHA256 3e330f2e8834e8286247c494782c50490903fbce010b60364b69867b49a42f55
MD5 02663e6509459372fe12ce994ee52ab0
BLAKE2b-256 056aaf5af1181bda87561c094c47560eafb8d95a5179d6b51ff14c0917765fab

Provenance

The following attestation bundles were made for omix-0.1.0.tar.gz:

Publisher: python-publish.yml on heathermacgregor/omix

File details

Details for the file omix-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: omix-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 115.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for omix-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 11c741c2aabe7f3d29ab953239982507ddbc183308ab1ca98bc9163cf2e89d6c
MD5 4b618f304a4ce9f0b380ab76b2a98df6
BLAKE2b-256 c98c7f0156528a58910ea5e8c3c51dea50d5d70d6f64697816a82eec14622726

Provenance

The following attestation bundles were made for omix-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on heathermacgregor/omix
