A Python library for efficiently fetching and processing gene homolog data from the Phytozome database.

Maintainer: KrisKari

Project description

PhytoMiner

PhytoMiner is a Python library for efficiently fetching and processing gene homolog data from the Phytozome database via its InterMine API.

This library is designed to simplify complex, iterative bioinformatic queries, allowing researchers to trace gene homology across multiple species with ease.

Features

Three-Step Pipeline: A clear, sequential workflow for fetching homologs, merging local data, and retrieving detailed gene information.
Iterative Search: Automatically performs chained searches using homologs found in previous steps to build a comprehensive dataset.
Parallel Processing: Utilizes multithreading for efficient, parallel data fetching, significantly speeding up large queries.
Checkpointing: Automatically saves and loads intermediate results to prevent losing progress and allow for easy resumption of long-running jobs.
Data Processing & Visualization: Includes functions to clean, de-duplicate, and enrich data, plus a utility to quickly generate a heatmap of homolog distribution.
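
The parallel-fetching and checkpointing features can be pictured with a minimal sketch like the one below. It is not PhytoMiner's actual implementation: `fetch_one`, `fetch_all`, and the JSON checkpoint file are hypothetical names chosen for illustration, and the network request is replaced by a stub.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fetch_one(gene_id):
    """Stand-in for a network request (hypothetical; no real API call)."""
    return {"gene": gene_id, "homologs": [f"{gene_id}_h1", f"{gene_id}_h2"]}


def fetch_all(gene_ids, checkpoint, max_workers=4):
    """Fetch genes in parallel, skipping any already saved in the checkpoint."""
    checkpoint = Path(checkpoint)
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    todo = [g for g in gene_ids if g not in done]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `todo`.
        for gene_id, result in zip(todo, pool.map(fetch_one, todo)):
            done[gene_id] = result
    checkpoint.write_text(json.dumps(done))  # save progress for later runs
    return done


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "checkpoint.json"
    first = fetch_all(["AT1G01090", "AT1G01120"], path)
    # A second run reuses the saved entries and only fetches the new gene.
    second = fetch_all(["AT1G01090", "AT1G01120", "ATCG00520"], path)
    print(sorted(second))
```

Threads suit this kind of workload because the time is spent waiting on I/O; writing the checkpoint after each batch means an interrupted run loses at most that batch.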

Installation

You can install the latest PhytoMiner release directly from PyPI:

pip install phytominer
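
After installing, one way to confirm which version is available in the current environment is the standard library's `importlib.metadata`. The helper name `installed_version` is only an illustration and is not part of PhytoMiner.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("phytominer"))
```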

Usage

Here is a complete example of the three-step workflow:

Define a set of known genes in a source organism (e.g., A. thaliana).
Run homologs_pipe to find homologs in other species.
Run join_tsvs to combine the homolog data with local metadata from TSV files.
Run genes_pipe to fetch detailed gene data for the final homolog set.

import logging
# Import the three workflow functions that are called below
from phytominer.workflow import homologs_pipe, join_tsvs, genes_pipe

# It's highly recommended to configure logging to see the progress
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# 1. Define initial search parameters
# The initial organism to start the search from
initial_organism = "A. thaliana TAIR10"

# A dictionary of initial gene IDs and their corresponding subunit names
initial_genes = {
    "AT1G01090": "NDHA",
    "AT1G01120": "NDHB",
    "ATCG00520": "NDHC",
}

# A list of other organisms to find homologs in
subsequent_organisms = [
    "S. bicolor v3.1.1",
    "O. sativa Kitaake v3.1",
    "S. viridis v2.1"
]

# 2. Run the three-step pipeline
# Step 1: Fetch all homolog data, starting with the initial organism
# and iterating through the subsequent ones.
step1_df = homologs_pipe(
    initial_organism=initial_organism,
    initial_genes_dict=initial_genes,
    subsequent_organisms=subsequent_organisms
)

# Step 2: Merge the homolog data with local TSV files containing additional metadata.
# This step assumes you have a directory with TSV files (e.g., 'data/tsv/').
step2_df = join_tsvs()

# Step 3: Fetch detailed gene data (e.g., expression, sequence) for the homologs found.
step3_df = genes_pipe()

# The final DataFrames are saved to CSV files at each step (e.g., step1output.csv).
print("PhytoMiner workflow complete!")
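
Step 2 conceptually joins homolog rows with metadata from local TSV files. The sketch below shows that kind of join using only the standard library. It does not call `join_tsvs`, and the column names (`gene_id`, `organism`, `validated`) and sample rows are made up for illustration.

```python
import csv
import io

# Hypothetical homolog rows, standing in for the Step 1 output.
homologs = [
    {"gene_id": "G1", "organism": "S. bicolor v3.1.1"},
    {"gene_id": "G2", "organism": "S. viridis v2.1"},
]

# Hypothetical TSV metadata; a real file would be opened with open(path, newline="").
tsv_text = "gene_id\tvalidated\nG1\tyes\n"


def merge_with_tsv(rows, tsv_file, key="gene_id"):
    """Left-join rows with TSV records that share the same key column."""
    metadata = {rec[key]: rec for rec in csv.DictReader(tsv_file, delimiter="\t")}
    # Rows without a metadata match are kept unchanged.
    return [{**row, **metadata.get(row[key], {})} for row in rows]


merged = merge_with_tsv(homologs, io.StringIO(tsv_text))
for row in merged:
    print(row)
```

A left join keeps every homolog even when no metadata exists for it, which avoids silently dropping genes that have not been annotated locally.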

API Overview

The phytominer library is structured around a sequential, three-step workflow.

Workflow Functions

These are the main functions you'll use, found in phytominer.workflow.

homologs_pipe(...): Orchestrates the entire homolog search. It starts with an initial set of genes, finds their homologs, and then iteratively searches for homologs of the results in other specified organisms. It handles checkpointing and produces a final, processed DataFrame of homolog data.
join_tsvs(...): Takes the output from Step 1 and merges it with local TSV files containing supplementary data (e.g., subunit validation).
genes_pipe(...): Takes the output from Step 2 and fetches detailed gene information (sequences, expression data, etc.) for all unique homologs identified in the pipeline.
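
The iterative search described for homologs_pipe can be illustrated with a toy example. This is a sketch of the general idea only, not the library's code: `lookup`, `chained_search`, and the in-memory `TOY_HOMOLOGS` table are hypothetical, and all identifiers in the table are invented.

```python
# Toy homolog table: (organism, gene) -> {target organism: [homolog genes]}.
# All identifiers here are invented for illustration.
TOY_HOMOLOGS = {
    ("A", "a1"): {"B": ["b1"], "C": ["c1"]},
    ("B", "b1"): {"C": ["c1", "c2"]},
    ("C", "c1"): {},
    ("C", "c2"): {},
}


def lookup(organism, gene):
    """Stand-in for a remote homolog query (hypothetical)."""
    return TOY_HOMOLOGS.get((organism, gene), {})


def chained_search(initial_organism, initial_genes, subsequent_organisms):
    """Search from the initial genes, then from the homologs found in each
    subsequent organism, collecting unique (source, target) pairs."""
    found = set()
    # Genes known per organism, seeded with the initial set.
    known = {initial_organism: set(initial_genes)}
    for organism in [initial_organism, *subsequent_organisms]:
        for gene in sorted(known.get(organism, ())):
            for target_org, homologs in lookup(organism, gene).items():
                for homolog in homologs:
                    found.add(((organism, gene), (target_org, homolog)))
                    known.setdefault(target_org, set()).add(homolog)
    return found


pairs = chained_search("A", ["a1"], ["B", "C"])
print(len(pairs))
```

Storing results in a set de-duplicates pairs that are reached by more than one route, which matters once searches are chained across several organisms.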

Utility Functions

These helper functions are available in phytominer.utils.

pivotmap(dataframe, ...): Generates a pivot table and a corresponding heatmap to visualize the count of homologs across different species and subunits.
log_summary(df, ...): Logs a concise summary of a DataFrame's shape, columns, memory usage, and other key statistics.
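
To make the pivot idea concrete, here is a small standard-library sketch of counting homologs per species and subunit. It does not use `pivotmap` and draws no heatmap; `pivot_counts` and the sample records are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical homolog records: (organism, subunit) per homolog found.
records = [
    ("S. bicolor v3.1.1", "NDHA"),
    ("S. bicolor v3.1.1", "NDHA"),
    ("S. bicolor v3.1.1", "NDHB"),
    ("S. viridis v2.1", "NDHA"),
]


def pivot_counts(rows):
    """Return {organism: {subunit: count}} with zeros for missing pairs."""
    counts = Counter(rows)
    organisms = sorted({org for org, _ in rows})
    subunits = sorted({sub for _, sub in rows})
    return {org: {sub: counts[(org, sub)] for sub in subunits} for org in organisms}


table = pivot_counts(records)
for organism, row in table.items():
    print(organism, row)
```

Filling missing pairs with zero keeps every row the same width, which is the shape a heatmap needs.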

Continuous Integration & Deployment

This project uses GitHub Actions for automated testing and publishing.

Automated Testing:
Every push to the main branch triggers the test suite using Python 3.9.
Automated Publishing:
When a new release is published on GitHub, the package is automatically built and uploaded to PyPI.

You can find the workflow configuration in .github/workflows/python-publish.yml.

Contributing

Contributions are welcome! If you have a suggestion or find a bug, please open an issue. Pull requests are also encouraged.

Fork the repository.
Create your feature branch (git checkout -b feature/AmazingFeature).
Commit your changes (git commit -m 'Add some AmazingFeature').
Push to the branch (git push origin feature/AmazingFeature).
Open a Pull Request.

Running Tests Locally

To run the test suite locally:

pip install -e .[dev]
pytest

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

Author: Kris Kari
Email: toffe.kari@gmail.com

Project details

Release history

Version	Date
0.2.0 (this version)	Jul 27, 2025
0.1.4	Jul 22, 2025
0.1.3	Jul 7, 2025
0.1.1	Jul 6, 2025
0.1.0	Jul 4, 2025

Download files

Source Distribution

phytominer-0.2.0.tar.gz (16.9 kB), uploaded Jul 27, 2025, Source

Built Distribution

phytominer-0.2.0-py3-none-any.whl (15.2 kB), uploaded Jul 27, 2025, Python 3

File details

Details for the file phytominer-0.2.0.tar.gz.

File metadata

Download URL: phytominer-0.2.0.tar.gz
Upload date: Jul 27, 2025
Size: 16.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for phytominer-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`337ec1530a8db00964858c4c1f3dce81db26d2d8db208e0e347dcc3410895d5a`
MD5	`1425c9fb783b4fb41f9575194ab1165a`
BLAKE2b-256	`44b08119a3e3a106707c81a879de41382ea47ed4e39442d8b336980114e0ff99`
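
A downloaded file can be checked against the SHA256 value in the table above. The following is a minimal sketch using only `hashlib`; the helper names `sha256_of` and `matches` are illustrative, and the demo runs on a temporary file rather than the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with an expected value, ignoring letter case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file; for the real check, pass the downloaded
# phytominer-0.2.0.tar.gz and the SHA256 value from the table above.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    ok = matches(sample, hashlib.sha256(b"hello").hexdigest())
    bad = matches(sample, "0" * 64)
    print(ok, bad)
```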

Provenance

The following attestation bundles were made for phytominer-0.2.0.tar.gz:

Publisher: python-publish.yml on boffus/PhytoMiner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phytominer-0.2.0.tar.gz
- Subject digest: 337ec1530a8db00964858c4c1f3dce81db26d2d8db208e0e347dcc3410895d5a
- Sigstore transparency entry: 315671633
- Sigstore integration time: Jul 27, 2025
Source repository:
- Permalink: boffus/PhytoMiner@90958be1b50e442f9125594553d44fc4e9f627d8
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/boffus
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@90958be1b50e442f9125594553d44fc4e9f627d8
- Trigger Event: release

File details

Details for the file phytominer-0.2.0-py3-none-any.whl.

File metadata

Download URL: phytominer-0.2.0-py3-none-any.whl
Upload date: Jul 27, 2025
Size: 15.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for phytominer-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3e0e65c6a24982551e172f94e279ca1a8bb4f9bfead7ed3ec8420d7b9672f6bc`
MD5	`b2fd56ac3b077ce2ffb5b75c742674bc`
BLAKE2b-256	`83dba690b69a757b9b35655471e161acf79a69f732d3116ebbb0d1df38c37119`

Provenance

The following attestation bundles were made for phytominer-0.2.0-py3-none-any.whl:

Publisher: python-publish.yml on boffus/PhytoMiner

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: phytominer-0.2.0-py3-none-any.whl
- Subject digest: 3e0e65c6a24982551e172f94e279ca1a8bb4f9bfead7ed3ec8420d7b9672f6bc
- Sigstore transparency entry: 315671636
- Sigstore integration time: Jul 27, 2025
Source repository:
- Permalink: boffus/PhytoMiner@90958be1b50e442f9125594553d44fc4e9f627d8
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/boffus
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@90958be1b50e442f9125594553d44fc4e9f627d8
- Trigger Event: release
