geo-alchemy

a Python library and command line tool to make GEO data into gold.

  1. why geo-alchemy
  2. installation
  3. use as Python library
  4. use as command line software

why geo-alchemy

GEO is like a gold mine that contains a huge amount of gold ore. But processing this ore (GEO series) into gold (expression matrix, clinical data) is not easy:

  1. how to map microarray probes to genes?
  2. what to do when multiple probes map to the same gene?
  3. how to get clinical data?
  4. ...

geo-alchemy was created to deal with these problems.
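To make the second question concrete, here is a minimal sketch of one common way to handle several probes that map to the same gene: average their expression values per sample. It is an illustration only. The text above does not say which strategy geo-alchemy uses, the helper `collapse_probes_to_genes` is hypothetical, and the probe IDs and values are made up.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes_to_genes(expression, probe_to_gene):
    """Average the rows of all probes that map to the same gene.

    expression:    {probe_id: [value per sample, ...]}
    probe_to_gene: {probe_id: gene_symbol}; probes missing here are dropped.
    """
    rows_by_gene = defaultdict(list)
    for probe_id, values in expression.items():
        gene = probe_to_gene.get(probe_id)
        if gene:  # skip probes without a gene annotation
            rows_by_gene[gene].append(values)

    # zip(*rows) walks the grouped rows sample by sample
    return {
        gene: [mean(column) for column in zip(*rows)]
        for gene, rows in rows_by_gene.items()
    }


# hypothetical data: two probes for TP53, one for GAPDH, one unannotated
expression = {
    "probe_1": [2.0, 4.0],
    "probe_2": [4.0, 8.0],
    "probe_3": [10.0, 12.0],
    "probe_4": [1.0, 1.0],
}
probe_to_gene = {"probe_1": "TP53", "probe_2": "TP53", "probe_3": "GAPDH"}

gene_expression = collapse_probes_to_genes(expression, probe_to_gene)
print(gene_expression)
```

Other strategies, such as keeping the probe with the highest mean or the median across probes, fit the same grouping step; only the aggregation function changes.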

installation

If you only want to use it as a Python library:

pip install geo-alchemy

If you also want to use it as command line software:

pip install 'geo-alchemy[cmd]'

use as Python library

parse metadata from GEO

parse platform

from geo_alchemy import PlatformParser


parser = PlatformParser.from_accession('GPL570')
platform1 = parser.parse()


# or
platform2 = PlatformParser.from_accession('GPL570').parse()


print(platform1 == platform2)

# get platform annotation data
platform = PlatformParser.from_accession('GPL570', view='full').parse()
print(platform.internal_data)

parse sample

from geo_alchemy import SampleParser


parser = SampleParser.from_accession('GSM1885279')
sample1 = parser.parse()

# or
sample2 = SampleParser.from_accession('GSM1885279').parse()

print(sample1 == sample2)

parse series

from geo_alchemy import SeriesParser


parser = SeriesParser.from_accession('GSE73091')
series1 = parser.parse()

# or
series2 = SeriesParser.from_accession('GSE73091').parse()


print(series1 == series2)
print(series1.platforms)
print(series1.samples)
print(series1.organisms)
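The three parsers correspond to the three GEO accession prefixes used above: GPL for platforms, GSM for samples and GSE for series. As a small sketch, a helper such as the hypothetical `parser_name_for` below (not part of geo-alchemy) could pick the matching parser class name from an accession string before `from_accession` is called.

```python
import re

# GEO accession prefixes mapped to the geo_alchemy parser class names used above
PARSER_BY_PREFIX = {
    "GPL": "PlatformParser",
    "GSM": "SampleParser",
    "GSE": "SeriesParser",
}

ACCESSION_PATTERN = re.compile(r"^(GPL|GSM|GSE)\d+$")


def parser_name_for(accession):
    """Return the name of the parser class that handles this accession."""
    match = ACCESSION_PATTERN.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GPL/GSM/GSE accession: {accession!r}")
    return PARSER_BY_PREFIX[match.group(1)]


for accession in ("GPL570", "GSM1885279", "GSE73091"):
    print(accession, "->", parser_name_for(accession))
```

With geo-alchemy installed, the returned name could be resolved with `getattr(geo_alchemy, name)` and used exactly like the parsers shown above.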

serialization and deserialization

For convenient storage, every object in geo-alchemy can be converted to a dict, and that dict can be saved directly to a file as JSON.

geo-alchemy also provides methods to convert these dicts back into objects.

from geo_alchemy import SeriesParser


series1 = SeriesParser.from_accession('GSE73091').parse()
data = series1.to_dict()
series2 = SeriesParser.parse_dict(data)


print(series1 == series2)
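The round trip through a JSON file can be sketched as follows. The dict here is a placeholder standing in for the result of `to_dict()`; its keys are invented and do not reflect geo-alchemy's real schema, and `save_metadata` / `load_metadata` are hypothetical helper names.

```python
import json
import os
import tempfile


def save_metadata(data, path):
    """Write a metadata dict (e.g. the result of to_dict()) to a JSON file."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, indent=2)


def load_metadata(path):
    """Read a metadata dict back from a JSON file."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# placeholder dict; a real one would come from series.to_dict()
data = {"accession": "GSE73091", "note": "placeholder, not the real schema"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "GSE73091.json")
    save_metadata(data, path)
    restored = load_metadata(path)

print(restored == data)
```

The restored dict could then be handed to `SeriesParser.parse_dict` as in the example above.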

use as command line software

using OCM

OCM (object command mapping) is a Python framework that maps Python objects to command line software. It can capture the intermediate results of a command. Enable OCM output like this:

geo-alchemy xxx --ocmir

probe reannotation

Prerequisites:

  1. NCBI BLAST must be installed.
  2. BLAST Index must be generated.

for more details, refer to this page.
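For the second prerequisite, a nucleotide BLAST index is typically built with the `makeblastdb` tool that ships with NCBI BLAST+. The sketch below assembles that command and, unless asked otherwise, does not run it. The FASTA filename and output prefix are placeholders, `build_blast_index` is a hypothetical helper, and the page referenced above remains the authoritative description of what geo-alchemy expects.

```python
import shutil
import subprocess


def build_blast_index(fasta_path, index_prefix, dry_run=True):
    """Assemble (and optionally run) makeblastdb for a nucleotide FASTA file."""
    command = [
        "makeblastdb",
        "-in", fasta_path,      # reference sequences in FASTA format
        "-dbtype", "nucl",      # build a nucleotide database
        "-out", index_prefix,   # prefix shared by the generated index files
    ]
    if dry_run:
        return command
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb not found; install NCBI BLAST+ first")
    subprocess.run(command, check=True)
    return command


# placeholder paths; dry_run=True only builds the command without running it
command = build_blast_index("reference.fa", "blast-indexes/reference")
print(" ".join(command))
```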

geo-alchemy -d reanno -p GPL15303 -s 9 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13

  1. -p GPL15303: reannotate the probes of GPL15303
  2. -s 9: the 9th column of the platform annotation file holds the probe sequence
  3. -d xxx: location of the BLAST indexes
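To illustrate what the sequence column option refers to, the sketch below pulls the probe ID and the probe sequence out of a tab-delimited annotation table and writes them as FASTA records, the query format BLAST accepts. It assumes that the probe ID sits in the first column, that column numbers are 1-based, and that lines starting with `#` are comments. None of this is taken from geo-alchemy's own code, and the sample table is invented.

```python
import csv


def probes_to_fasta(annotation_text, sequence_column, id_column=1):
    """Turn a tab-delimited annotation table into FASTA text.

    Column numbers are 1-based; the first non-comment line is the header.
    """
    lines = [
        line for line in annotation_text.splitlines()
        if line and not line.startswith("#")
    ]
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row

    records = []
    for row in reader:
        if len(row) < max(sequence_column, id_column):
            continue  # row too short to contain both columns
        probe_id = row[id_column - 1].strip()
        sequence = row[sequence_column - 1].strip()
        if probe_id and sequence:  # probes without a sequence are skipped
            records.append(f">{probe_id}\n{sequence}")
    return "\n".join(records) + "\n" if records else ""


# invented table: the sequence is in the 3rd column here
table = (
    "#platform annotation (made up)\n"
    "ID\tGENE\tSEQUENCE\n"
    "P1\tTP53\tACGTACGT\n"
    "P2\t\t\n"
    "P3\tGAPDH\tTTGGCCAA\n"
)
fasta = probes_to_fasta(table, sequence_column=3)
print(fasta, end="")
```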

If your reference sequences were downloaded from GENCODE, enabling --gencode extracts the gene symbol from the gene ID:

geo-alchemy -d reanno -p GPL15303 -s 9 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13 --gencode
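As an illustration of what such an extraction involves, GENCODE transcript FASTA headers are pipe-separated, with the gene symbol in the sixth field. The sketch below shows that lookup. It is an assumption about the general approach, not geo-alchemy's actual implementation, and `gene_symbol_from_gencode_id` is a hypothetical helper.

```python
def gene_symbol_from_gencode_id(sequence_id):
    """Return the gene symbol (6th pipe-separated field) of a GENCODE header."""
    fields = sequence_id.lstrip(">").split("|")
    if len(fields) < 6 or not fields[5]:
        raise ValueError(f"not a GENCODE-style sequence ID: {sequence_id!r}")
    return fields[5]


# example header in the GENCODE transcript FASTA layout:
# transcript ID | gene ID | Havana gene | Havana transcript |
# transcript name | gene symbol | length | biotype
header = (
    ">ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|"
    "OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|"
)
symbol = gene_symbol_from_gencode_id(header)
print(symbol)
```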

preprocessing (microarray series only)

download metadata over the network:

geo-alchemy pp -s GSE174772 -p GPL570 -g 11

  1. -s GSE174772: preprocess GSE174772
  2. -p GPL570: preprocess only the samples of GSE174772 that use GPL570
  3. -g 11: the 11th column of the GPL570 annotation file holds the gene

This command generates 2 files in the current directory:

  1. clinical file GSE174772_clinical.txt
  2. gene expression file GSE174772_expression.txt

use existing series metadata:

import json
from geo_alchemy import SeriesParser


series = SeriesParser.from_accession('GSE174772').parse()
data = series.to_dict()


with open('GSE174772.json', 'w') as fp:
    json.dump(data, fp)

then pass the saved file to the command:

geo-alchemy pp -sf GSE174772.json -g 11

use an existing probe-gene mapping file. Typically you first run geo-alchemy reanno to reannotate the probes, which produces a probe-gene mapping file, and then pass that file to pp with -m:

geo-alchemy reanno -p GPL6480 -s 17 -d /Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13 --gencode
geo-alchemy pp -s GSE12435 -m GPL6480_reanno.txt
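The two steps can also be scripted. The sketch below only assembles both commands from the flags shown above. It assumes, based on the example, that reanno writes its mapping to `<platform>_reanno.txt` in the current directory; `reanno_command` and `pp_command` are hypothetical helper names.

```python
import shlex


def reanno_command(platform, sequence_column, blast_index, gencode=False):
    """Build the probe reannotation command for one platform."""
    command = [
        "geo-alchemy", "reanno",
        "-p", platform,
        "-s", str(sequence_column),
        "-d", blast_index,
    ]
    if gencode:
        command.append("--gencode")
    return command


def pp_command(series, mapping_file):
    """Build the preprocessing command that reuses a probe-gene mapping file."""
    return ["geo-alchemy", "pp", "-s", series, "-m", mapping_file]


platform = "GPL6480"
steps = [
    reanno_command(
        platform, 17,
        "/Users/dev/Data/blast-indexes/GRCh38.p13/GRCh38.p13",
        gencode=True,
    ),
    # assumed output name of the reanno step, inferred from the example above
    pp_command("GSE12435", f"{platform}_reanno.txt"),
]

for step in steps:
    print(shlex.join(step))
```

Each list can be passed to `subprocess.run(step, check=True)` so that the second step only starts once the first has succeeded.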
