Skip to main content

a Python library and command line tool to make GEO data into gold.

Project description

geo-alchemy

a Python library and command line tool to make GEO data into gold.

  1. why geo-alchemy
  2. installation
  3. usage

why geo-alchemy

GEO is like a gold mine that contains a huge many gold ore. But processing these gold ore(GEO series) into gold(expression matrix, clinical data) is not very easy:

  1. how to map microarray probe to gene?
  2. how about multiple probes map to same gene?
  3. hot to get clinical data?
  4. ...

geo-alchemy was born to deal with it.

installation

pip install geo-alchemy

usage

parse metadata from GEO

parse platform

from geo_alchemy import PlatformParser


parser = PlatformParser.from_accession('GPL570')
platform1 = parser.parse()


# or
platform2 = PlatformParser.from_accession('GPL570').parse()


print(platform1 == platform2)

parse sample

from geo_alchemy import SampleParser


parser = SampleParser.from_accession('GSM1885279')
sample1 = parser.parse()

# or
sample2 = SampleParser.from_accession('GSM1885279').parse()

print(sample1 == sample2)

parse series

from geo_alchemy import SeriesParser


parser = SeriesParser.from_accession('GSE73091')
series1 = parser.parse()


# don't parse samples, samples attribute will be a blank list
series2 = parser.parse(parse_samples=False)
print(series2.samples == [])


# or
series3 = SeriesParser.from_accession('GSE73091').parse()


print(series1 == series3)

additional computed attributes can be access by:

print(series.sample_count)  # how many samples
print(series.platforms)  # duplication removal platforms
print(series.organisms)  # duplication removal organisms

serialization and deserialization

For the convenience of saving, all objects in geo-alchemy can be converted to dict, and this dict can be directly saved to a file in json form.

Moreover, geo-alchemy also provides methods to convert these dicts into objects.

from geo_alchemy import SeriesParser


series1 = SeriesParser.from_accession('GSE73091').parse()
data = series1.to_dict()
series2 = SeriesParser.parse_dict(data)


print(series1 == series2)

crawl all GEO metadata

geo-alchemy also support whole site crawling, support features:

  1. crawling all platform metadata.
  2. crawling all sample metadata.
  3. crawling all series metadata.
  4. incremental crawling.
  5. crawling sample metadata based on existing platform metadata.
  6. crawling series metadata based on existing platform metadata and series metadata.
  7. multiple processes support.

crawl platform metadata

denovo crawling
geo-alchemy crawl platforms -o platforms.jl
incremental crawling
geo-alchemy crawl platforms -cf platforms.jl -o new-platforms.jl

crawl sample metadata

denovo crawling
geo-alchemy crawl samples -o samples.jl
incremental crawling
geo-alchemy crawl samples -cf samples.jl -o new-samples.jl
based on existing platform metadata jsonlines file

When parse sample metadata, the specified platform metadata will also be crawled. If you want to omit these network accessing, you can specify the platform metadata jsonlines file crawled.

geo-alchemy crawl samples -cpf platforms.jl -cf samples.jl -o new-samples.jl

crawl series metadata

denovo crawling
geo-alchemy crawl series -o series.jl
incremental crawling
geo-alchemy crawl series -cf series.jl -o new-series.jl
based on existing platform and sample metadata jsonlines file

like sample, you can:

geo-alchemy crawl series -cpf platforms.jl -csf samples.jl -cf series.jl -o new-series.jl

multiple processes

To accelerate crawling, geo-alchemy support multiple processes.

eg, use 10 processes:

geo-alchemy crawl platform -p 10 -o platforms.jl

what's jsonlines

Refer to this site for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

geo-alchemy-0.0.4.tar.gz (12.6 kB view hashes)

Uploaded Source

Built Distributions

geo_alchemy-0.0.4-py3.7.egg (26.1 kB view hashes)

Uploaded Source

geo_alchemy-0.0.4-py3-none-any.whl (12.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page