Retrieve a column for each in a set of tables, placing them in a single output table.

collect-columns

This tool retrieves one column from each table in a set of tables and compiles them into a single table. Optionally, additional attributes from an associated GTF/GFF file may be added to the output table.

Installation

Install from PyPI: pip install collect-columns

Install from GitHub:

  • Clone the repository: git clone https://github.com/biowdl/collect-columns.git
  • Enter the repository: cd collect-columns
  • Install using pip: pip install .

Usage

collect-columns output_path input_files...

The tool assumes that all input count tables are in the same format. By default, the tables are assumed to be headerless and tab-separated, with the first column containing the feature identifiers and the second column the values of interest. The output table uses the same separator as the input tables and contains a header. Its feature column contains the feature identifiers. The value columns are named after the input files, or after the names given through the -n option, which takes a list of names as its argument.

Please note that if an input table contains multiple rows with the same feature identifier, these values overwrite each other by default, so only the last value ends up in the output table. See also the -S flag.
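
The default behaviour described above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function names `collect_default` and `write_table`, the row order (first seen) and the empty string used for missing values are all assumptions made for the example.

```python
import csv
import io
import warnings


def collect_default(tables):
    """Merge headerless, tab-separated tables by feature id.

    `tables` maps a column name to a file-like object. Column 0 holds the
    feature id and column 1 the value. A repeated feature id within one
    table overwrites the earlier value, so the last one wins.
    """
    merged = {}  # feature id -> {column name: value}
    for name, handle in tables.items():
        seen = set()
        for row in csv.reader(handle, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            feature, value = row[0], row[1]
            if feature in seen:
                warnings.warn(f"duplicate feature {feature!r} in {name}")
            seen.add(feature)
            merged.setdefault(feature, {})[name] = value
    return merged


def write_table(merged, names, out):
    """Write a header plus one row per feature (assumed: first-seen order)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["feature", *names])
    for feature, values in merged.items():
        # Assumption: a feature missing from a table gets an empty cell.
        writer.writerow([feature, *(values.get(n, "") for n in names)])


tables = {
    "s1.tsv": io.StringIO("MSTRG.1\t10\nMSTRG.2\t60\n"),
    "s2.tsv": io.StringIO("MSTRG.1\t11\nMSTRG.2\t5\nMSTRG.2\t12\n"),
}
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the duplicate warning in this demo
    merged = collect_default(tables)
out = io.StringIO()
write_table(merged, list(tables), out)
print(out.getvalue())
```

In this demo, MSTRG.2 occurs twice in the second table, so only its last value (12) is kept.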

To use a different input format, the following options can be given:

option  arguments    definition
-f      a number     The index of the column containing the feature identifiers.
-c      a number     The index of the column containing the values/counts.
-s      a character  The separator.
-H                   Indicates that the table has a header.
-S                   Indicates that values should be added up if multiple rows exist with the same feature id. The values will become floats if this flag is set. By default only the last value will be taken and a warning will be given.
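
As a rough sketch of what these options control, the hypothetical function below reads one table with a configurable feature column, value column, separator, header and duplicate handling. The keyword arguments are illustrative stand-ins for -f, -c, -s, -H and -S, not the tool's real API, and the column indices are assumed to be zero-based.

```python
import csv
import io


def read_column(handle, feature_col=0, value_col=1, sep="\t",
                header=False, sum_duplicates=False):
    """Return {feature id: value} for one table.

    The keyword arguments are hypothetical stand-ins for -f, -c, -s, -H
    and -S. Column indices are assumed to be zero-based.
    """
    rows = csv.reader(handle, delimiter=sep)
    if header:
        next(rows, None)  # discard the header line
    values = {}
    for row in rows:
        if not row:
            continue  # skip blank lines
        feature = row[feature_col]
        if sum_duplicates:
            # Summing turns the values into floats, as noted for -S.
            values[feature] = values.get(feature, 0.0) + float(row[value_col])
        else:
            values[feature] = row[value_col]  # last value wins
    return values


data = "id,name,count\ng1,a,1.5\ng2,b,4\ng1,a,2.5\n"
last = read_column(io.StringIO(data), value_col=2, sep=",", header=True)
summed = read_column(io.StringIO(data), value_col=2, sep=",",
                     header=True, sum_duplicates=True)
print(last)    # {'g1': '2.5', 'g2': '4'}
print(summed)  # {'g1': 4.0, 'g2': 4.0}
```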

To add attributes from a GTF/GFF file, the following options can be given:

option  arguments        definition
-a      a list of words  The attributes to be added to the output table.
-g      a path           The GTF/GFF file from which the attributes will be retrieved.
-F      a word           The attribute used to map rows in the input tables to GTF records. Defaults to gene_id.
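
The attribute lookup can be pictured roughly as follows. This is a sketch under stated assumptions rather than the tool's own code: `gtf_attribute_map` is a hypothetical helper, it handles only GTF-style `key "value";` attributes (not GFF3 `key=value` pairs), and it keeps the first value it sees for each attribute.

```python
import io
import re

# Matches GTF-style attributes such as: gene_id "MSTRG.1";
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def gtf_attribute_map(handle, key="gene_id", wanted=("gene_name",)):
    """Map each value of `key` to the requested attributes found in a GTF."""
    mapping = {}
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        if key not in attributes:
            continue
        entry = mapping.setdefault(attributes[key], {})
        for name in wanted:
            if name in attributes:
                # Assumption: the first value seen for an attribute is kept.
                entry.setdefault(name, attributes[name])
    return mapping


gtf = io.StringIO(
    'chr1\tStringTie\ttranscript\t1\t100\t.\t+\t.\t'
    'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; '
    'ref_gene_id "g_1"; gene_name "gene_1";\n'
)
attribute_map = gtf_attribute_map(gtf, wanted=("ref_gene_id", "gene_name"))
print(attribute_map)
# {'MSTRG.1': {'ref_gene_id': 'g_1', 'gene_name': 'gene_1'}}
```

Each feature identifier from the input tables would then be looked up in this mapping to fill the extra columns.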

Examples

HTSeq-count

Using the output from HTSeq-count as input, the following command:

collect-columns all.tsv s1.tsv s2.tsv

will result in a table like:

feature  s1.tsv  s2.tsv
MSTRG.1  10      11
MSTRG.2  60      12
...      ...     ...

StringTie

Using StringTie abundance output as input, the following command:

collect-columns all.FPKM s1.abundance s2.abundance \
    -c 7 \
    -H \
    -a ref_gene_id gene_name \
    -g merged.gtf \
    -n sample1 sample2 \
    -S  # StringTie may return multiple rows for one gene; these values can simply be summed.

will result in a table like:

feature  ref_gene_id  gene_name  sample1        sample2
MSTRG.1  g_1          gene_1     185151.953125  151.964231
MSTRG.2  g_2          gene_2     100160.070312  1160.030213
...      ...          ...        ...            ...
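
A table like the one above can be loaded for downstream analysis with Python's standard csv module. The snippet below is a small sketch that uses an in-memory copy of the example rows instead of a real all.FPKM file, and assumes the output is tab-separated because the StringTie input tables are.

```python
import csv
import io

# In-memory stand-in for all.FPKM, using the example rows shown above.
table = io.StringIO(
    "feature\tref_gene_id\tgene_name\tsample1\tsample2\n"
    "MSTRG.1\tg_1\tgene_1\t185151.953125\t151.964231\n"
    "MSTRG.2\tg_2\tgene_2\t100160.070312\t1160.030213\n"
)

rows = list(csv.DictReader(table, delimiter="\t"))

# Build {feature id: value} for one sample, converting values to floats.
sample1 = {row["feature"]: float(row["sample1"]) for row in rows}
print(sample1["MSTRG.1"])  # 185151.953125
```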

