
A Python tool for working with GEO MINiML formatted family files

Project description

geo-factory

A Python tool for working with GEO MINiML formatted family files.

Features:

1. Merge the per-sample tbl files
2. Convert probe IDs to gene symbols using the platform file
3. Extract sample phenotype information from the family XML

Installation

Install with pip:

$ pip install geo-factory

Install from source:

$ git clone git@github.com:taishengxin/geo-factory.git
$ cd geo-factory
$ python setup.py install

Merging the per-sample tbl files

$ geo-factory merge-tbls --help
Usage: geo-factory merge-tbls [OPTIONS]

  Merge tbl files into a probe expression matrix

Options:
  -w, --wildcard TEXT  Wildcard pattern matching the MINiML tbl files, e.g.
                       'GSE124647/GSM*txt'; the pattern must be quoted
                       [required]

  -o, --outfile PATH   Output probe expression matrix file  [required]
  --help               Show this message and exit.

Example:

$ geo-factory merge-tbls -w 'GSE124647/GSM*txt' -o probe_exp_GSE124647.tsv

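About the output probe expression matrix file:

1. Each row is a probe and each column is a sample
2. The first column holds the probe ID
3. Fields are tab-separated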
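The merge step can be pictured with the minimal standard-library sketch below. It is not geo-factory's implementation: it assumes each GSM tbl file is tab-separated with no header row, holds the probe ID in column 1 and the value in column 2, and is named like `<sample ID>-tbl-1.txt`, so the text before the first hyphen is used as the sample ID. The function name `merge_tbls` and the `ID_REF` header are hypothetical.

```python
import csv
import glob
import os


def merge_tbls(pattern, outfile):
    # Hypothetical sketch, not geo-factory's code. Assumes every tbl file is
    # tab-separated, has no header, and stores probe ID and value in columns 1-2.
    paths = sorted(glob.glob(pattern))
    samples = []      # output column names, one per file
    matrix = {}       # probe ID -> {sample ID: value}
    probe_order = []  # probes in first-seen order
    for path in paths:
        # Assumed naming "<sample ID>-tbl-1.txt": keep the part before the first "-".
        sample = os.path.basename(path).split("-")[0]
        samples.append(sample)
        with open(path, newline="") as fh:
            for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                probe, value = row[0], row[1]
                if probe not in matrix:
                    matrix[probe] = {}
                    probe_order.append(probe)
                matrix[probe][sample] = value
    with open(outfile, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["ID_REF"] + samples)  # "ID_REF" header is an assumption
        for probe in probe_order:
            # Probes absent from a sample become empty cells.
            writer.writerow([probe] + [matrix[probe].get(s, "") for s in samples])
    return samples, probe_order
```

For example, `merge_tbls('GSE124647/GSM*txt', 'probe_exp_GSE124647.tsv')` mirrors the CLI example above. In this sketch a probe missing from some sample is kept with an empty cell rather than dropped; whether geo-factory does the same is not documented.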

Converting probe IDs to gene symbols with the platform file

$ geo-factory probe2gene --help
Usage: geo-factory probe2gene [OPTIONS]

  Convert a probe expression matrix file into a gene expression matrix file using a GEO platform file

Options:
  -p, --probe-expression-matrix-file PATH
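                                  Probe expression matrix file  [required]
  -g, --geo-platform-file PATH    GEO platform file  [required]
  -c, --col INTEGER               Column of the GEO platform file that holds
                                  the gene symbol  [required]
  -a, --aggregation-function [min|max|first|last|mean|median]
                                  Method used to combine multiple probes that
                                  map to the same gene (default: median)
  -o, --outfile PATH              Output gene expression matrix file  [required]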
  --help                          Show this message and exit.

Example:

$ geo-factory probe2gene -p probe_exp_GSE124647.tsv -g GSE124647/GPL96-tbl-1.txt -c 11 -o gene_exp_GSE124647.tsv
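How `-c` selects the gene symbol column can be sketched as follows. This assumes the platform tbl file is tab-separated with the probe ID in the first column, and treats `-c` as a 1-based index, which would match the example if Gene Symbol is the 11th column of GPL96-tbl-1.txt. Both points are assumptions, as is the hypothetical function name `load_probe2symbol`.

```python
import csv


def load_probe2symbol(platform_file, col):
    # Hypothetical sketch: probe ID assumed in column 1, `col` assumed 1-based.
    mapping = {}
    with open(platform_file, newline="") as fh:
        for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) < col:
                continue  # row too short to contain the symbol column
            symbol = row[col - 1].strip()
            if symbol:  # skip probes without an annotated gene symbol
                mapping[row[0]] = symbol
    return mapping
```

Symbols such as `A /// B` (probes annotated to several genes) are kept verbatim in this sketch; how geo-factory handles them is not stated.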

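About the output gene expression matrix file:

1. Each row is a gene and each column is a sample
2. The first column holds the gene symbol
3. Fields are tab-separated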
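The `-a` option can be illustrated with this sketch of collapsing probes to genes. The dictionary keys mirror the documented choices (min, max, first, last, mean, median); the function name `collapse_probes`, the in-memory input format, and the reading of first and last as file order are assumptions, not documented behavior.

```python
import statistics

# Keys mirror the -a choices; "first"/"last" are assumed to follow file order.
AGGREGATORS = {
    "min": min,
    "max": max,
    "first": lambda xs: xs[0],
    "last": lambda xs: xs[-1],
    "mean": statistics.mean,
    "median": statistics.median,
}


def collapse_probes(probe_rows, probe2symbol, method="median"):
    """Hypothetical sketch: collapse probe-level rows to gene-level rows.

    probe_rows: list of (probe_id, [value per sample]) in file order.
    probe2symbol: dict mapping probe ID -> gene symbol.
    """
    agg = AGGREGATORS[method]
    grouped = {}  # gene symbol -> list of per-probe value lists
    for probe, values in probe_rows:
        symbol = probe2symbol.get(probe)
        if symbol is None:
            continue  # probes without a symbol are dropped
        grouped.setdefault(symbol, []).append(values)
    # zip(*rows) yields one tuple per sample holding every probe's value.
    return {symbol: [agg(list(col)) for col in zip(*rows)]
            for symbol, rows in grouped.items()}
```

With the default median, two probes of the same gene measuring 1.0 and 3.0 in one sample collapse to 2.0 for that sample.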

Extracting sample phenotype information from the family XML

$ geo-factory parse-pheno --help
Usage: geo-factory parse-pheno [OPTIONS]

  Extract phenotype information from the family XML

Options:
  -f, --family-xml-file PATH  Family XML file  [required]
  -o, --outfile PATH          Output phenotype information file  [required]
  --help                      Show this message and exit.

Example:

$ geo-factory parse-pheno -f GSE124647/GSE124647_family.xml -o pheno_GSE124647.tsv

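About the output phenotype file:

1. Each row is a sample and each column is a phenotype attribute (e.g. sex, age, survival time)
2. Fields are tab-separated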
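Phenotype extraction can be pictured as a walk over the Sample elements of the family XML. This sketch assumes the MINiML namespace `http://www.ncbi.nlm.nih.gov/geo/info/MINiML`, that each Sample carries its GSM ID in the `iid` attribute, and that phenotype attributes are `Channel/Characteristics` elements named by their `tag` attribute. The function names `parse_pheno` and `write_pheno` and the `sample` header are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET

# Assumed MINiML default namespace.
NS = {"m": "http://www.ncbi.nlm.nih.gov/geo/info/MINiML"}


def parse_pheno(xml_path):
    """Hypothetical sketch: return {sample ID: {attribute: value}}."""
    root = ET.parse(xml_path).getroot()
    rows = {}
    for sample in root.findall("m:Sample", NS):
        attrs = {}
        for ch in sample.findall("m:Channel/m:Characteristics", NS):
            tag = ch.get("tag")
            if tag:  # untagged free-text characteristics are skipped here
                # In multi-channel samples a later channel overwrites an earlier one.
                attrs[tag] = (ch.text or "").strip()
        rows[sample.get("iid")] = attrs
    return rows


def write_pheno(rows, outfile):
    """Write samples as rows and attributes as tab-separated columns."""
    columns = []
    for attrs in rows.values():
        for tag in attrs:
            if tag not in columns:
                columns.append(tag)
    with open(outfile, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["sample"] + columns)
        for sample_id, attrs in rows.items():
            writer.writerow([sample_id] + [attrs.get(c, "") for c in columns])
    return columns
```

Samples lacking an attribute get an empty cell, so the output keeps one row per sample and one column per attribute seen anywhere in the series.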



Download files


Source Distribution

geo-factory-1.0.0.tar.gz (6.4 kB)

Built Distribution

geo_factory-1.0.0-py2.py3-none-any.whl (5.8 kB; Python 2 and Python 3)

File details

Details for the file geo-factory-1.0.0.tar.gz.

File metadata

  • Download URL: geo-factory-1.0.0.tar.gz
  • Size: 6.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.10

File hashes

Hashes for geo-factory-1.0.0.tar.gz
Algorithm    Hash digest
SHA256       f1cafdbf5d2c10147c019d5e6ef1fa5a185a542fea19f9c665e8ed7c8fa43bb9
MD5          9c2d5d54977f72b75206a56c88675030
BLAKE2b-256  858a20da69f89659427ec7d24258742d4f80b5551d13f37afb07ae65f3970d87


File details

Details for the file geo_factory-1.0.0-py2.py3-none-any.whl.

File metadata

  • Download URL: geo_factory-1.0.0-py2.py3-none-any.whl
  • Size: 5.8 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/3.6.10

File hashes

Hashes for geo_factory-1.0.0-py2.py3-none-any.whl
Algorithm    Hash digest
SHA256       612df40fd5e7c674ebdaa5adecdfb151b9caf5e53dcb23e593e9eb49cfd08ae2
MD5          6b10e4b16d652c7492b9828b4644cb65
BLAKE2b-256  d91870cbf85f61d4abe8e5be05bd9e10ff4fd2c3483b6217079caf950d7a85da

