Merges MAGE-Tab files considering covariates

MAGE-Tab Merger

This package facilitates merging of MAGE-Tab components at different levels.

Note: IDF merging is still work in progress.

SDRF with no considerations on metadata

This functionality produces a single new SDRF from all the SDRFs provided, preserving the MAGE graph structure encoded in each of them.

usage: merge_sdrfs.py [-h] -d DIRECTORY_WITH_SDRFS -o OUTPUT [--accessions-file ACCESSIONS_FILE] [-a ACCESSIONS_LIST]

optional arguments:
  -h, --help            show this help message and exit
  -d DIRECTORY_WITH_SDRFS, --directory-with-sdrfs DIRECTORY_WITH_SDRFS
                        Directory with SDRFs to merge
  -o OUTPUT, --output OUTPUT
                        Path for output sdrf.
  --accessions-file ACCESSIONS_FILE
                        File with comma separated list of accessions to use only. Overrides accessions list.
  -a ACCESSIONS_LIST, --accessions-list ACCESSIONS_LIST
                        Comma-separated list of accessions to use only.
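
The effect of the accession options can be pictured with a short sketch. This is not the package's own code: the `<accession>.sdrf.txt` file naming, the function names `parse_accessions` and `select_sdrfs`, and the accessions shown are all assumptions made for illustration.

```python
def parse_accessions(text):
    """Split a comma-separated accession string into a clean list."""
    return [a.strip() for a in text.split(",") if a.strip()]


def select_sdrfs(filenames, accessions=None, suffix=".sdrf.txt"):
    """Keep SDRF file names whose accession prefix is in `accessions`.

    The `<accession>.sdrf.txt` naming is an assumption for illustration.
    With no accessions given, every SDRF file is kept.
    """
    sdrfs = [f for f in filenames if f.endswith(suffix)]
    if not accessions:
        return sorted(sdrfs)
    wanted = set(accessions)
    return sorted(f for f in sdrfs if f[: -len(suffix)] in wanted)


# Made-up directory listing and accession list.
files = ["E-MTAB-1.sdrf.txt", "E-MTAB-2.sdrf.txt", "E-GEOD-3.sdrf.txt", "notes.txt"]
selected = select_sdrfs(files, parse_accessions("E-MTAB-1, E-GEOD-3"))
print(selected)
```

Only the SDRFs for the two requested accessions remain; without an accession list, all three SDRF files would be merged.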

Merge condensed SDRFs based on meta-data relations

Meta-analysis algorithms that run across multiple experiments often require certain links between studies in terms of a metadata field. For instance, if the main covariate when merging studies is expected to be the organism part (so that you can answer questions such as "what is the expression of gene X in organism part Y, based on all studies?"), then each study being merged needs to have samples in an organism part that is also present in at least one of the other studies.
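
The condition can be checked directly on the sets of covariate values per study. The following is a minimal sketch under stated assumptions, not the package's implementation: the function name `unlinked_studies` and the study labels are hypothetical, and each study is reduced to the set of organism parts found in its samples.

```python
def unlinked_studies(covariate_values):
    """Return the studies that share no covariate value with any other study."""
    unlinked = []
    for study, values in covariate_values.items():
        # Collect every covariate value seen in the other studies.
        others = set()
        for other, other_values in covariate_values.items():
            if other != study:
                others |= set(other_values)
        if not set(values) & others:
            unlinked.append(study)
    return sorted(unlinked)


# Made-up studies and organism parts.
studies = {
    "A": {"lung", "liver", "pancreas"},
    "B": {"liver", "kidney"},
    "C": {"brain"},
}
print(unlinked_studies(studies))
```

Study C has no organism part in common with A or B, so it fails the condition and could not take part in a merge with them.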

This functionality takes condensed SDRFs for multiple studies (which can be generated with the condensed_sdrf.pl script, part of the atlas-perl-modules conda package) and suggests (and merges) the largest group of studies that can be merged while satisfying the metadata condition explained above.

usage: merge_condensed_sdrfs.py [-h] -d INPUT_PATH -a ACCESSIONS -o OUTPUT -n NEW_ACCESSION [-b BATCH] [-t BATCH_TYPE] [-c COVARIATE] [--covariate-type COVARIATE_TYPE] [--covariate-skip-values COVARIATE_SKIP_VALUES]

optional arguments:
  -h, --help            show this help message and exit
  -d INPUT_PATH, --input-path INPUT_PATH
                        Directory with condensed SDRFs to merge
  -a ACCESSIONS, --accessions ACCESSIONS
                        List of accessions to process, comma separated
  -o OUTPUT, --output OUTPUT
                        Path for output. <new-accession>.condensed.sdrf.tsv and <new-accession>.selected_studies.txt will be created there.
  -n NEW_ACCESSION, --new-accession NEW_ACCESSION
                        New accession for the output
  -b BATCH, --batch BATCH
                        Header for storing batch or study
  -t BATCH_TYPE, --batch-type BATCH_TYPE
                        Type for batch, usually characteristic
  -c COVARIATE, --covariate COVARIATE
                        Header for main covariate, usually organism part
  --covariate-type COVARIATE_TYPE
                        Type for main covariate, usually characteristic
  --covariate-skip-values COVARIATE_SKIP_VALUES
                        Covariate values to skip when assessing the studies connectivity; a comma separated list of values

This will compute a graph with studies as nodes. Two studies are connected if they share at least one value of the covariate field across any of their samples. For instance, if study A has organism parts lung, liver and pancreas, and study B has organism parts liver and kidney, then studies A and B are connected by one edge because both have liver. From this graph, the largest connected component is selected and merged into a single condensed SDRF.
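
The graph construction and component selection described above can be sketched with the standard library alone. This is an illustration under assumptions, not the script's actual code: the function names are hypothetical, and studies C and D are made up to extend the A/B example.

```python
from itertools import combinations


def build_study_graph(covariate_values):
    """Connect two studies when they share at least one covariate value."""
    graph = {study: set() for study in covariate_values}
    for a, b in combinations(covariate_values, 2):
        if set(covariate_values[a]) & set(covariate_values[b]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def largest_connected_component(graph):
    """Return the largest set of mutually reachable studies."""
    seen = set()
    best = set()
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal from an unvisited study.
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


studies = {
    "A": {"lung", "liver", "pancreas"},
    "B": {"liver", "kidney"},
    "C": {"brain"},
    "D": {"kidney", "heart"},
}
graph = build_study_graph(studies)
selected = largest_connected_component(graph)
print(sorted(selected))
```

Note that A and D share no organism part directly, yet both end up in the selected component because each is linked to B.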

Two files will be created in the output directory:

  • <new-accession>.condensed.sdrf.tsv
  • <new-accession>.selected_studies.txt

The stdout will contain useful information about the main connected components.

Because some experiments may contain covariate values that are not useful for linking studies, such as "whole organism" for organism part, the --covariate-skip-values option allows such values to be skipped when building the graph.
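
To see why skipping matters, consider two studies whose only common value is "whole organism". The sketch below is an assumption-laden illustration (the helper names `parse_skip_values` and `shared_values` are hypothetical, and so are the studies); it shows the edge disappearing once the value is skipped.

```python
def parse_skip_values(text):
    """Turn a comma-separated list of values into a set."""
    return {v.strip() for v in text.split(",") if v.strip()}


def shared_values(values_a, values_b, skip_values=()):
    """Covariate values common to both studies, ignoring skipped ones."""
    skip = set(skip_values)
    return (set(values_a) - skip) & (set(values_b) - skip)


study_a = {"lung", "whole organism"}
study_b = {"kidney", "whole organism"}

# Without skipping, the uninformative value links the two studies.
print(shared_values(study_a, study_b))
# With skipping, nothing is shared, so no edge would be drawn.
print(shared_values(study_a, study_b, parse_skip_values("whole organism")))
```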

If you need an SDRF with the equivalent merged content, use the first script described here (merge_sdrfs.py), limited to the accessions that were selected by this process.
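
Chaining the two scripts could look like the sketch below. It assumes, without confirmation from the package, that the selected studies file lists one accession per line; the helper name `merge_sdrfs_command`, the accessions and the paths are hypothetical. Only the flags -d, -o and -a come from the usage text of merge_sdrfs.py.

```python
def merge_sdrfs_command(selected_studies_text, sdrf_dir, output_path):
    """Build the merge_sdrfs.py argument list for the selected accessions.

    Assumes one accession per line in the selected studies file.
    """
    accessions = [
        line.strip() for line in selected_studies_text.splitlines() if line.strip()
    ]
    return [
        "merge_sdrfs.py",
        "-d", sdrf_dir,
        "-o", output_path,
        "-a", ",".join(accessions),
    ]


# Made-up file content and paths.
selected_text = "E-MTAB-1\nE-GEOD-3\n"
command = merge_sdrfs_command(selected_text, "sdrfs", "merged.sdrf.txt")
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` or pasted into a shell once the file format assumption has been checked against a real output.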
