Pipeline for processing GEO data and uploading it to the PEPHub

geopephub

Automatic uploader of GEO metadata projects to PEPhub.

This repository contains the geopephub CLI, which uploads GEO projects to PEPhub automatically, either for a given date range or on a schedule using GitHub Actions. The CLI also includes a download command that retrieves projects from a specified namespace directly from the PEPhub database. This is particularly helpful for downloading all GEO projects at once.

Installation

To install geopephub use this command:

pip install git+https://github.com/pepkit/geopephub.git

Overview:

geopephub provides four main functions:

  1. Queuer: Scans GEO for new projects, creates a new cycle for the current run, and logs details for each GEO project. It sets each project's status to queued and adds it to the database.
  2. Uploader: Checks whether there are any queued cycles in the cycle_status table. It retrieves the list of queued projects, runs GEOfetch to download them, and uploads the results to the PEPhub database using pepdbagent. geopephub updates the project upload status at each step, so a failed upload can later be traced to the step where it failed.
  3. Checker: Examines previous cycles, verifies their status, and determines whether they were executed. If a cycle was not executed or was unsuccessful, it triggers a rerun. If only some projects in a cycle were unsuccessful, it attempts to upload them again. If the cycle does not exist, it creates one using the Queuer and uploads files using the Uploader.
  4. Downloader: Retrieves projects from the specified namespace, filters them by upload or update date, and optionally sorts them by name or date. It also allows setting a limit on the number of downloaded projects. Projects can be downloaded locally or to a specified S3 bucket. For more information, use the geopephub --help command.
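The Downloader's filter, sort, and limit behavior described in item 4 can be illustrated with a minimal sketch. This is not geopephub's code: `ProjectRecord`, `select_projects`, and all parameter names are hypothetical, and the records live in a plain list rather than in the PEPhub database.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ProjectRecord:
    """Hypothetical stand-in for one project entry in a namespace."""
    name: str
    last_update: date


def select_projects(records: List[ProjectRecord],
                    since: Optional[date] = None,
                    until: Optional[date] = None,
                    order_by: Optional[str] = None,
                    limit: Optional[int] = None) -> List[ProjectRecord]:
    """Filter by date window, optionally sort by name or date, then apply a limit."""
    selected = [
        r for r in records
        if (since is None or r.last_update >= since)
        and (until is None or r.last_update <= until)
    ]
    if order_by == "name":
        selected.sort(key=lambda r: r.name)
    elif order_by == "date":
        selected.sort(key=lambda r: r.last_update)
    return selected if limit is None else selected[:limit]


# Made-up records, used only to exercise the function.
records = [
    ProjectRecord("gse3", date(2024, 3, 1)),
    ProjectRecord("gse1", date(2024, 1, 1)),
    ProjectRecord("gse2", date(2024, 2, 1)),
]
picked = select_projects(records, since=date(2024, 1, 15), order_by="name", limit=1)
print([r.name for r in picked])
```

The limit is applied after filtering and sorting in this sketch, so it caps the final result rather than the candidate set; whether geopephub orders these steps the same way is an assumption.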

More information about these processes can be found in the flowcharts and overview below.

Queuer Flowchart:

%%{init: {'theme':'forest'}}%%
stateDiagram-v2
    s1 --> s2 
    s2 --> s3
    s3 --> s4
    s4 --> s5
    s1: Create a new cycle
    s2: Find GEO updated projects with geofetch Finder
    s3: Add projects to the queue in sample status table
    s4: Change cycle status to queued
    s5: Exit
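
The Queuer steps in this flowchart can be sketched in a few lines of Python. The sketch keeps everything in memory and takes the project finder as a plain callable; `Cycle`, `run_queuer`, `fake_finder`, and the status strings are hypothetical names chosen for illustration, not geopephub's actual API or database schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Dict, List


@dataclass
class Cycle:
    """Hypothetical in-memory stand-in for one cycle and its project statuses."""
    namespace: str
    run_date: date
    status: str = "initial"
    projects: Dict[str, str] = field(default_factory=dict)  # accession -> status


def run_queuer(namespace: str, run_date: date,
               find_updated: Callable[[date], List[str]]) -> Cycle:
    """Mirror the flowchart: create a cycle, find projects, queue them, mark the cycle."""
    cycle = Cycle(namespace=namespace, run_date=run_date)  # s1: create a new cycle
    accessions = find_updated(run_date)                    # s2: find updated projects
    for gse in accessions:                                 # s3: add projects to the queue
        cycle.projects[gse] = "queued"
    cycle.status = "queued"                                # s4: change cycle status
    return cycle                                           # s5: exit


def fake_finder(day: date) -> List[str]:
    """Placeholder for a real GEO search; returns made-up accessions."""
    return ["GSE000001", "GSE000002"]


cycle = run_queuer("geo", date(2024, 1, 1), fake_finder)
print(cycle.status, cycle.projects)
```

Passing the finder in as an argument keeps the sketch runnable without network access; in the real pipeline this role is played by the geofetch Finder.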

Uploader Flowchart:

%%{init: {'theme':'forest'}}%%
stateDiagram-v2
    s1 --> s2 
    s2 --> s3
    s3 --> s4
    s4 --> s5
    s5 --> s6
    s6 --> s7
    s7 --> s8

    s7 --> s2
    s6 --> s3

    s1: Get queued cycles by specifying namespace
    s2: Get each queued cycle from the list and change its status
    s3: Get each project (GSE) from one cycle
    s4: Change status of the project in project_status_table
    s5: Get specified project by running Geofetcher
    s6: Add the project to the DB using pepdbagent and change its status in project_status_table
    s7: Change status of cycle in cycle_status_table
    s8: Exit
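
As a rough illustration of the Uploader loop, the sketch below walks queued cycles and records a status for every project at each step. The fetch and upload steps are injected as callables so the example runs without GEOfetch or pepdbagent; `run_uploader`, the dictionary layout, and the status strings are assumptions made for this sketch. It also assumes that a cycle is marked successful once the loop completes, with per-project failures recorded separately.

```python
from typing import Callable, Dict, List


def run_uploader(cycles: List[dict],
                 fetch: Callable[[str], dict],
                 upload: Callable[[str, dict], None]) -> None:
    """Process every queued cycle, updating project and cycle statuses in place."""
    for cycle in cycles:
        if cycle["status"] != "queued":          # s1: only queued cycles are processed
            continue
        cycle["status"] = "processing"           # s2: change status of the cycle
        for gse in list(cycle["projects"]):      # s3: each project (GSE) in the cycle
            cycle["projects"][gse] = "processing"        # s4: mark project as started
            try:
                project = fetch(gse)                     # s5: download the project
                upload(gse, project)                     # s6: add the project to the DB
                cycle["projects"][gse] = "success"
            except Exception as err:
                # Keep the reason so a later check can see why the upload failed.
                cycle["projects"][gse] = f"failure: {err}"
        cycle["status"] = "success"              # s7: the cycle ran to completion


def fake_fetch(gse: str) -> dict:
    """Placeholder download step that fails for one made-up accession."""
    if gse == "GSE000002":
        raise RuntimeError("download failed")
    return {"name": gse}


uploaded: Dict[str, dict] = {}


def fake_upload(gse: str, project: dict) -> None:
    """Placeholder upload step that stores projects in a dictionary."""
    uploaded[gse] = project


cycles = [{"status": "queued",
           "projects": {"GSE000001": "queued", "GSE000002": "queued"}}]
run_uploader(cycles, fake_fetch, fake_upload)
print(cycles[0])
```

Catching the exception per project means one failed download does not stop the rest of the cycle, which matches the flowchart's loop from s6 back to s3.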

Checker Flowchart:

graph TD
    A[Choose cycle to check] --> B{Did it run?}
    B -->|Yes| C{Was it successful?}
    B -->|No| D[Run Queuer for the cycle]
    C -->|Yes| E{Did all samples succeed?}
    C -->|No| D

    D --> D1[Run Uploader for the cycle]
    D1 --> K

    E --> |Yes| K[Exit]
    E --> |No| G[Retrieve failed samples]

    G --> H[Run Queuer for samples]
    H --> F[Run Uploader for queued samples]
    
    F --> I[Change samples status in the table]

    I --> J[Change cycle status in the table]

    J --> K[Exit]
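
The decision logic of the Checker flowchart reduces to a small function. The sketch below only decides what to do and returns the action as a string; `check_cycle`, the action names, and the status strings are hypothetical and chosen for illustration, and a missing cycle is represented by `None`.

```python
from typing import List, Optional, Tuple


def check_cycle(cycle: Optional[dict]) -> Tuple[str, List[str]]:
    """Return the action for a cycle and the projects it applies to."""
    # "Did it run?": a missing or still-queued cycle never ran.
    if cycle is None or cycle["status"] in ("initial", "queued"):
        return "rerun_cycle", []
    # "Was it successful?": an unsuccessful cycle is rerun as a whole.
    if cycle["status"] != "success":
        return "rerun_cycle", []
    # "Did all samples succeed?": collect the ones that did not.
    failed = [gse for gse, status in cycle["projects"].items()
              if status != "success"]
    if failed:
        return "retry_projects", failed
    return "exit", []


print(check_cycle(None))
print(check_cycle({"status": "failure", "projects": {}}))
print(check_cycle({"status": "success",
                   "projects": {"GSE000001": "success",
                                "GSE000002": "failure"}}))
```

A caller would map `rerun_cycle` to running the Queuer and then the Uploader for the whole cycle, and `retry_projects` to queuing and uploading only the returned accessions.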

Download an entire namespace:

How to run it on Rivanna:

# install geopephub from dev branch
pip install git+https://github.com/pepkit/geopephub.git@dev

# set all env vars 

# run:
geopephub auto-download --destination /project/shefflab/brickyard/datasets_downloaded/pephub/geo

