A Python library and command line tool that turns GEO data into gold.
geo-alchemy
why geo-alchemy
GEO is like a gold mine that contains a huge amount of gold ore. But processing this ore (GEO series) into gold (expression matrices, clinical data) is not easy:
- How do you map microarray probes to genes?
- What if multiple probes map to the same gene?
- How do you get clinical data?
- ...
geo-alchemy was created to deal with these problems.
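To make the second question concrete: when several probes map to the same gene, a common strategy is to collapse the probe rows into one row per gene, for example by averaging them. The sketch below only illustrates that idea and is not necessarily what geo-alchemy does internally. The function name `collapse_probes`, the probe IDs and the values are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(expression, probe_to_gene):
    """Average probe-level values into one row per gene.

    expression: dict mapping probe ID -> list of values (one per sample)
    probe_to_gene: dict mapping probe ID -> gene symbol
    Probes without a gene mapping are dropped.
    """
    grouped = defaultdict(list)
    for probe, values in expression.items():
        gene = probe_to_gene.get(probe)
        if gene:  # skip unmapped probes
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical probe-level data for two samples
expression = {
    'probe_a': [2.0, 4.0],
    'probe_b': [4.0, 8.0],
    'probe_c': [1.0, 1.0],
    'probe_d': [9.0, 9.0],  # has no gene mapping
}
probe_to_gene = {'probe_a': 'TP53', 'probe_b': 'TP53', 'probe_c': 'EGFR'}

gene_expression = collapse_probes(expression, probe_to_gene)
print(gene_expression)
```

Other strategies, such as keeping only the probe with the highest mean signal, fit the same shape: group the probes by gene, then reduce each group to a single row.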
installation
If you only want to use it as a Python library:
pip install geo-alchemy
If you also want to use it as command line software:
pip install 'geo-alchemy[cmd]'
use as Python library
parse metadata from GEO
parse platform
from geo_alchemy import PlatformParser
parser = PlatformParser.from_accession('GPL570')
platform1 = parser.parse()
# or
platform2 = PlatformParser.from_accession('GPL570').parse()
print(platform1 == platform2)
# get platform annotation data
platform = PlatformParser.from_accession('GPL570', view='full').parse()
print(platform.internal_data)
parse sample
from geo_alchemy import SampleParser
parser = SampleParser.from_accession('GSM1885279')
sample1 = parser.parse()
# or
sample2 = SampleParser.from_accession('GSM1885279').parse()
print(sample1 == sample2)
parse series
from geo_alchemy import SeriesParser
parser = SeriesParser.from_accession('GSE73091')
series1 = parser.parse()
# or
series2 = SeriesParser.from_accession('GSE73091').parse()
print(series1 == series2)
print(series1.platforms)
print(series1.samples)
print(series1.organisms)
serialization and deserialization
To make saving convenient, every object in geo-alchemy can be converted to a dict, and that dict can be saved directly to a file as JSON.
geo-alchemy also provides methods to convert these dicts back into objects.
from geo_alchemy import SeriesParser
series1 = SeriesParser.from_accession('GSE73091').parse()
data = series1.to_dict()
series2 = SeriesParser.parse_dict(data)
print(series1 == series2)
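Because the dict can be written out as JSON, a round trip through a file might look like the sketch below. To run without network access it uses a small hand-written dict as a stand-in for the result of `series1.to_dict()`. The keys in that dict and the helper names `save_dict` and `load_dict` are assumptions for illustration, not part of geo-alchemy.

```python
import json
import os
import tempfile


def save_dict(data, path):
    """Write a dict to a JSON file."""
    with open(path, 'w') as fp:
        json.dump(data, fp)


def load_dict(path):
    """Read a dict back from a JSON file."""
    with open(path) as fp:
        return json.load(fp)


# Stand-in for series1.to_dict(); the real keys may differ.
data = {'accession': 'GSE73091', 'title': 'placeholder title'}

path = os.path.join(tempfile.mkdtemp(), 'GSE73091.json')
save_dict(data, path)
restored = load_dict(path)
print(restored == data)
```

With a real series, the loaded dict would then be handed to `SeriesParser.parse_dict` as in the example above.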
use as command line software
using OCM
OCM (object command mapping) is a Python framework that maps Python objects to command line software. It can capture the intermediate results of a command. You can enable OCM output like this:
geo-alchemy xxx --ocmir
probe reannotation
Prerequisites:
- NCBI BLAST must be installed.
- A BLAST index must be generated.
For more details, refer to this page.
geo-alchemy -d reanno -p GPL15303 -s 9 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13
- -p GPL15303: probe reannotation for GPL15303
- -s 9: the 9th column of the platform annotation file is the probe sequence
- -d xxx: location of the BLAST indexes
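The column number passed to -s appears to be counted from 1 ("the 9th column"). A minimal sketch of selecting such a column from a tab-delimited annotation table is shown below. It assumes tab delimiters and a header row, uses a made-up table, and the helper `column_values` is hypothetical rather than part of geo-alchemy.

```python
import csv
import io


def column_values(table_text, column_number):
    """Return the values of a 1-based column from a tab-delimited table.

    The first row is treated as the header and skipped.
    """
    reader = csv.reader(io.StringIO(table_text), delimiter='\t')
    next(reader)  # skip header row
    index = column_number - 1  # convert 1-based column number to 0-based index
    return [row[index] for row in reader if len(row) > index]


# Made-up three-column table; real annotation files have more columns.
table = (
    'ID\tSYMBOL\tSEQUENCE\n'
    'probe_1\tGENE_A\tACGTACGT\n'
    'probe_2\tGENE_B\tTTGGCCAA\n'
)
sequences = column_values(table, 3)
print(sequences)
```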
If your reference sequences were downloaded from GENCODE, enabling --gencode
lets geo-alchemy extract the gene symbol from the gene ID:
geo-alchemy -d reanno -p GPL15303 -s 9 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13 --gencode
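To illustrate what extracting a gene symbol from a GENCODE ID can involve: GENCODE transcript FASTA headers are pipe-delimited, with the gene symbol in the sixth field. The sketch below parses such a header, assuming the sequence IDs in the BLAST index keep that format. It is not taken from geo-alchemy's source, and the function name `gene_symbol_from_gencode_id` is hypothetical.

```python
def gene_symbol_from_gencode_id(sequence_id):
    """Return the gene symbol from a GENCODE-style sequence ID, or None."""
    fields = sequence_id.lstrip('>').split('|')
    # Expected fields: transcript ID, gene ID, two HAVANA IDs,
    # transcript name, gene symbol, then further fields
    if len(fields) < 6 or not fields[5]:
        return None
    return fields[5]


header = (
    'ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|'
    'OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|'
)
print(gene_symbol_from_gencode_id(header))
```

IDs that do not follow the pipe-delimited layout return None here, so a caller could fall back to the raw ID.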
preprocessing (microarray series only)
Download metadata over the network:
geo-alchemy pp -s GSE174772 -p GPL570 -g 11
- -s GSE174772: preprocessing for GSE174772
- -p GPL570: preprocess the samples of GSE174772 that use GPL570
- -g 11: column 11 of the GPL570 annotation file is the gene
This command generates 2 files in the current directory:
- clinical file: GSE174772_clinical.txt
- gene expression file: GSE174772_expression.txt
Use existing series metadata:
import json
from geo_alchemy import SeriesParser
series = SeriesParser.from_accession('GSE174772').parse()
data = series.to_dict()
with open('GSE174772.json', 'w') as fp:
    json.dump(data, fp)
geo-alchemy pp -sf GSE174772.json -g 11
Use an existing probe-gene mapping file.
Usually you run geo-alchemy reanno to do probe reannotation,
which gives you a probe-gene mapping file. You can then pass that file to pp with -m:
geo-alchemy reanno -p GPL6480 -s 17 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13 --gencode
geo-alchemy pp -s GSE12435 -m GPL6480_reanno.txt