Retrieve a column from each of a set of tables and place them in a single output table.
collect-columns
This tool retrieves a column from each of a set of tables and compiles them into a single table. Optionally, additional attributes from an associated GTF/GFF file can be added to the output table.
Installation
Install from PyPI: pip install collect-columns
Install from GitHub:
- Clone the repository:
git clone https://github.com/biowdl/collect-columns.git
- Enter the repository:
cd collect-columns
- Install using pip:
pip install .
Usage
collect-columns output_path input_files...
collect-columns assumes that all input tables are in the same format.
By default the format is assumed to be headerless and tab separated, with the
first column being the feature identifiers and the second the values of interest.
The output table will use the same separator as the input tables and contain
a header. The feature column will contain the feature identifiers; the value
columns will be named after the input files or according to the names given
through the -n option, which takes a list of names as its argument.
Please note that if multiple rows with the same feature identifier exist in an input table, these values will overwrite each other in the output table by default. See also the -S flag.
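The difference between the default behaviour and the -S flag can be illustrated with a small Python sketch. This is not the tool's own code: the function `collect_values` and its `sum_duplicates` parameter are hypothetical names used only to show the two behaviours described above.

```python
import warnings


def collect_values(rows, sum_duplicates=False):
    """Map feature identifiers to values from (feature, value) pairs.

    Hypothetical helper: with sum_duplicates=False the last value seen for
    a feature wins and a warning is issued; with sum_duplicates=True the
    values for a feature are added up as floats.
    """
    values = {}
    for feature, value in rows:
        if sum_duplicates:
            values[feature] = values.get(feature, 0.0) + float(value)
        else:
            if feature in values:
                warnings.warn(f"duplicate feature id {feature!r}: keeping last value")
            values[feature] = value
    return values


# MSTRG.1 occurs twice in this made-up input.
rows = [("MSTRG.1", "10"), ("MSTRG.2", "60"), ("MSTRG.1", "5")]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the duplicate warning for the demo
    last_wins = collect_values(rows)

summed = collect_values(rows, sum_duplicates=True)
```

In this sketch `last_wins` keeps the value `"5"` for MSTRG.1, while `summed` holds the float total of both rows.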
In order to use a different input format the following options can be given:
option | arguments | definition |
---|---|---|
-f | a number | The index of the column containing the feature identifiers. |
-c | a number | The index of the column containing the values/counts. |
-s | a character | The separator. |
-H | | Indicates that the table has a header. |
-S | | Indicates that values should be added up if multiple rows exist with the same feature id. The values will become floats if this flag is set. By default only the last value will be taken and a warning will be given. |
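To make the roles of these options concrete, here is a minimal Python sketch of reading one table with a configurable feature column, value column, separator and header. The function `read_column` and its parameter names are hypothetical, and the sketch assumes zero-based column indices (so the defaults 0 and 1 correspond to the first and second columns); it is not taken from the tool's source.

```python
def read_column(lines, feature_column=0, value_column=1, separator="\t", header=False):
    """Return (feature, value) pairs from the lines of one table.

    Hypothetical helper mirroring -f, -c, -s and -H; column indices are
    assumed to be zero-based.
    """
    rows = [line.rstrip("\n").split(separator) for line in lines if line.strip()]
    if header:
        rows = rows[1:]  # skip the header row
    return [(row[feature_column], row[value_column]) for row in rows]


# A made-up comma-separated table with a header and the counts in the third column.
table = [
    "gene,length,count\n",
    "MSTRG.1,1200,10\n",
    "MSTRG.2,800,60\n",
]
pairs = read_column(table, feature_column=0, value_column=2, separator=",", header=True)
```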
To add additional attributes from a GTF/GFF, the following options can be given:
option | arguments | definition |
---|---|---|
-a | a list of words | The attributes to be added to the output table. |
-g | a path | The GTF file from which the attributes will be retrieved. |
-F | a word | The attribute used to map rows in the input tables to GTF records. Defaults to gene_id. |
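As a rough illustration of how the attribute lookup could work, the following Python sketch builds a mapping from the value of a key attribute (gene_id here) to the requested attributes. It assumes GTF-style attributes of the form `key "value";` in the ninth column and does not handle GFF3 `key=value` pairs. The function `gtf_attribute_lookup`, the choice to keep the first value seen, and the example records are all hypothetical rather than the tool's actual implementation.

```python
import re

# Matches GTF-style attribute pairs such as: gene_id "MSTRG.1";
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def gtf_attribute_lookup(lines, attributes, key="gene_id"):
    """Map each value of `key` to a dict of the requested attributes.

    Hypothetical helper: the first value seen for an attribute is kept.
    """
    lookup = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        record = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        if key not in record:
            continue
        entry = lookup.setdefault(record[key], {})
        for name in attributes:
            if name in record:
                entry.setdefault(name, record[name])
    return lookup


# Made-up GTF records for illustration only.
gtf_lines = [
    "# example annotation\n",
    'chr1\tStringTie\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; ref_gene_id "g_1"; gene_name "gene_1";\n',
    'chr1\tStringTie\ttranscript\t2000\t2900\t.\t-\t.\t'
    'gene_id "MSTRG.2"; transcript_id "MSTRG.2.1"; ref_gene_id "g_2"; gene_name "gene_2";\n',
]
lookup = gtf_attribute_lookup(gtf_lines, ["ref_gene_id", "gene_name"])
```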
Examples
HTSeq-count
Using the output from HTSeq-count as input, the following command:
collect-columns all.tsv s1.tsv s2.tsv
will result in a table like:
feature | s1.tsv | s2.tsv |
---|---|---|
MSTRG.1 | 10 | 11 |
MSTRG.2 | 60 | 12 |
... | ... | ... |
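The merge behind a table like the one above can be sketched in Python as follows. This is an illustration only: `merge_columns` is a hypothetical function, and both the row order (order of first appearance) and the handling of features missing from a table (an empty cell) are assumptions rather than documented behaviour of the tool.

```python
def merge_columns(tables, names, separator="\t"):
    """Combine per-sample {feature: value} dicts into output lines.

    Hypothetical helper: features are listed in order of first appearance
    and a feature missing from a table gets an empty cell (an assumption).
    """
    features = []
    seen = set()
    for table in tables:
        for feature in table:
            if feature not in seen:
                seen.add(feature)
                features.append(feature)
    # Header row: the feature column followed by one column per input.
    lines = [separator.join(["feature"] + list(names))]
    for feature in features:
        cells = [str(table.get(feature, "")) for table in tables]
        lines.append(separator.join([feature] + cells))
    return lines


s1 = {"MSTRG.1": "10", "MSTRG.2": "60"}
s2 = {"MSTRG.1": "11", "MSTRG.2": "12"}
merged = merge_columns([s1, s2], ["s1.tsv", "s2.tsv"])
```

Here `merged` holds a header line followed by one tab-separated line per feature, matching the first two rows of the table above.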
Stringtie
Using stringtie abundance output as input, the following command:
collect-columns all.FPKM s1.abundance s2.abundance \
-c 7 \
-H \
-a ref_gene_id gene_name \
-g merged.gtf \
-n sample1 sample2 \
-S # StringTie may return multiple rows for one gene; these values can simply be summed.
will result in a table like:
feature | ref_gene_id | gene_name | sample1 | sample2 |
---|---|---|---|---|
MSTRG.1 | g_1 | gene_1 | 185151.953125 | 151.964231 |
MSTRG.2 | g_2 | gene_2 | 100160.070312 | 1160.030213 |
... | ... | ... | ... | ... |