
The OmicIDX project collects, reprocesses, and then republishes metadata from multiple public genomics repositories, including the NCBI SRA, BioSample, and GEO databases. The metadata are published through the cloud data warehouse platform BigQuery, a set of performant search and retrieval APIs, and a set of JSON-format files for easy incorporation into other projects.

Project description

New process

Steps

  • Download XML
  • Create basic JSON
  • Upload JSON to S3
  • Munge basic JSON to Parquet
  • Munge Parquet to:
    • experiment (joined)
    • sample (joined)
    • run (joined)
    • study (with aggregates)
    • Include aggregates in the Spark jobs:
      • number of samples, experiments, and runs
      • sample, experiment, and run accessions (as arrays)
  • Save munged Spark data (JSON, Parquet)
  • Create Elasticsearch index mappings
  • Drop existing Elasticsearch mappings
  • Load Elasticsearch index mappings
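The per-study aggregates listed above (counts plus accession arrays) can be sketched in plain Python. This is a minimal illustration, not the project's Spark code: it assumes each input record is a flattened run with the hypothetical keys study_accession, experiment_accession, sample_accession, and accession. In Spark, the same result would come from a groupBy on the study accession using countDistinct and collect_set from pyspark.sql.functions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical record keys; the real OmicIDX schema may differ.
LEVELS = {
    "sample": "sample_accession",
    "experiment": "experiment_accession",
    "run": "accession",
}


def study_aggregates(runs: Iterable[dict]) -> Dict[str, dict]:
    """Group run records by study; count and list distinct accessions."""
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(
        lambda: {level: set() for level in LEVELS}
    )
    for record in runs:
        per_study = seen[record["study_accession"]]
        for level, key in LEVELS.items():
            per_study[level].add(record[key])

    result: Dict[str, dict] = {}
    for study, sets in seen.items():
        summary: dict = {}
        for level, accessions in sets.items():
            ordered: List[str] = sorted(accessions)
            summary[f"{level}_count"] = len(ordered)
            summary[f"{level}_accessions"] = ordered
        result[study] = summary
    return result
```

Collecting sets first and sorting at the end keeps the accession arrays free of duplicates and stable across runs, which matters when the output is diffed or re-indexed.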

Lambda

Package the handler and parser modules into a zip archive, then create the function:

zip lambdas.zip lambda_handlers.py sra_parsers.py

aws lambda create-function --function-name sra_to_json --zip-file fileb://lambdas.zip --handler lambda_handlers.lambda_return_full_experiment_json --runtime python3.6 --role arn:aws:iam::377200973048:role/lambda_s3_exec_role
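The --handler flag names a function in lambda_handlers.py. The sketch below shows one general shape such a Python handler could take: Lambda calls it with an event dict (here carrying an accession, as in the invoke example) and a context object. It is not the contents of lambda_handlers.py or sra_parsers.py; the XML fetcher is a hypothetical injected callable, and the parser reads only three fields (accession, title, study reference) from standard SRA EXPERIMENT XML.

```python
import re
import xml.etree.ElementTree as ET
from typing import Any, Callable, Dict

# Accepts SRA/ENA/DDBJ experiment accessions such as SRX000273.
EXPERIMENT_ACCESSION = re.compile(r"^[SED]RX\d+$")


def experiment_xml_to_dict(xml_text: str) -> Dict[str, Any]:
    """Pull a few top-level fields out of an SRA EXPERIMENT record."""
    root = ET.fromstring(xml_text)
    exp = root if root.tag == "EXPERIMENT" else root.find(".//EXPERIMENT")
    if exp is None:
        raise ValueError("no EXPERIMENT element found")
    study_ref = exp.find("STUDY_REF")
    return {
        "accession": exp.get("accession"),
        "title": exp.findtext("TITLE"),
        "study_accession": None if study_ref is None else study_ref.get("accession"),
    }


def make_handler(fetch_xml: Callable[[str], str]):
    """Build a Lambda-style handler around a (hypothetical) XML fetcher."""

    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        accession = event.get("accession", "")
        if not EXPERIMENT_ACCESSION.match(accession):
            raise ValueError(f"not an experiment accession: {accession!r}")
        # The Python runtime serializes a returned dict to JSON.
        return experiment_xml_to_dict(fetch_xml(accession))

    return handler
```

In a deployed module, a line such as lambda_return_full_experiment_json = make_handler(my_fetch) would expose the handler under the name passed to create-function; my_fetch is a placeholder for whatever retrieves the XML.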

Invoke

aws lambda invoke --function-name sra_to_json --log-type Tail --payload '{"accession":"SRX000273"}' /tmp/abc.txt
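The same invocation can be made from Python with the boto3 Lambda client. With LogType="Tail", the response's LogResult field holds the last 4 KB of the execution log, base64-encoded, and Payload is a stream containing the function's JSON result. This sketch takes the client as a parameter so it can be exercised without AWS credentials.

```python
import base64
import json
from typing import Any, Dict, Tuple


def invoke_sra_to_json(client: Any, accession: str) -> Tuple[Dict[str, Any], str]:
    """Synchronously invoke sra_to_json; return (payload, log tail)."""
    response = client.invoke(
        FunctionName="sra_to_json",
        InvocationType="RequestResponse",
        LogType="Tail",
        Payload=json.dumps({"accession": accession}).encode("utf-8"),
    )
    # Payload is a streaming body; read() returns the JSON bytes.
    payload = json.loads(response["Payload"].read())
    # LogResult holds the base64-encoded tail of the execution log.
    log_tail = base64.b64decode(response.get("LogResult", "")).decode("utf-8")
    return payload, log_tail
```

Typical use would be payload, log = invoke_sra_to_json(boto3.client("lambda"), "SRX000273").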

Concurrency

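By default, an account has a regional pool of 1,000 concurrent executions shared by all functions. Reserving concurrency for a function both guarantees it that many executions and caps it at that number, which is useful for limiting how hard a function can hit downstream services.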

aws lambda put-function-concurrency --function-name sra_to_json --reserved-concurrent-executions 20
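The arithmetic behind a reservation: reserved amounts are subtracted from the account's regional pool, and Lambda keeps at least 100 executions unreserved for all other functions. This sketch checks whether a reservation fits, assuming the default limit of 1,000; the function names in the usage are illustrative.

```python
from typing import Dict

UNRESERVED_MINIMUM = 100  # Lambda keeps at least this many executions unreserved


def unreserved_pool(account_limit: int, reserved: Dict[str, int]) -> int:
    """Concurrency left for functions without a reservation."""
    return account_limit - sum(reserved.values())


def can_reserve(account_limit: int, reserved: Dict[str, int],
                function: str, amount: int) -> bool:
    """True if setting function's reservation to amount keeps the minimum free."""
    proposed = dict(reserved)
    proposed[function] = amount  # replaces any existing reservation
    return unreserved_pool(account_limit, proposed) >= UNRESERVED_MINIMUM
```

With sra_to_json reserving 20, unreserved_pool(1000, {"sra_to_json": 20}) leaves 980, so another function could reserve up to 880.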

Timeout and memory

The timeout is given in seconds. Memory can be set in the same command with --memory-size (in MB).

aws lambda update-function-configuration --function-name sra_to_json --timeout 15
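Before updating the configuration, it can help to check values against Lambda's documented limits: a timeout of 1 to 900 seconds and memory of 128 to 10,240 MB (these limits may change). The helper below is a hypothetical convenience that validates the values and builds keyword arguments for boto3's update_function_configuration.

```python
from typing import Any, Dict, Optional

TIMEOUT_RANGE = (1, 900)        # seconds
MEMORY_RANGE = (128, 10_240)    # megabytes


def config_update(function_name: str, timeout: Optional[int] = None,
                  memory_mb: Optional[int] = None) -> Dict[str, Any]:
    """Build kwargs for the Lambda client's update_function_configuration."""
    kwargs: Dict[str, Any] = {"FunctionName": function_name}
    if timeout is not None:
        lo, hi = TIMEOUT_RANGE
        if not lo <= timeout <= hi:
            raise ValueError(f"timeout must be {lo}-{hi} s, got {timeout}")
        kwargs["Timeout"] = timeout
    if memory_mb is not None:
        lo, hi = MEMORY_RANGE
        if not lo <= memory_mb <= hi:
            raise ValueError(f"memory must be {lo}-{hi} MB, got {memory_mb}")
        kwargs["MemorySize"] = memory_mb
    return kwargs
```

The equivalent of the command above would be boto3.client("lambda").update_function_configuration(**config_update("sra_to_json", timeout=15)).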

Logging

Function logs can be followed with awslogs (https://github.com/jorgebastida/awslogs):

awslogs get /aws/lambda/sra_to_json ALL --watch
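Each invocation's log ends with a REPORT line giving its duration and peak memory, the numbers to look at when tuning the timeout and memory settings. A small parser, assuming the standard tab-separated REPORT format written by the Lambda runtime; any prefix before "REPORT RequestId:" (such as log group or stream names) is ignored.

```python
import re
from typing import Dict, Optional

# Matches "Name: 123.45 ms" or "Name: 128 MB" fields.
REPORT_FIELD = re.compile(r"(?P<name>[A-Za-z ]+): (?P<value>[\d.]+) (?:ms|MB)")


def parse_report(line: str) -> Optional[Dict[str, float]]:
    """Return numeric fields from a Lambda REPORT log line, else None."""
    start = line.find("REPORT RequestId:")
    if start == -1:
        return None
    return {
        m.group("name").strip(): float(m.group("value"))
        for m in REPORT_FIELD.finditer(line, start)
    }
```

Comparing "Max Memory Used" against "Memory Size" across many invocations shows how much headroom the configured memory leaves.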

DynamoDB

Count the items in the sra_experiment table:

aws dynamodb scan --table-name sra_experiment --select "COUNT"
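A Scan reads at most 1 MB of data per call, so a count over a large table arrives in pages. The AWS CLI paginates automatically; with boto3 you follow LastEvaluatedKey yourself. A sketch, with the client passed in so it can be tested without AWS:

```python
from typing import Any, Dict


def count_items(client: Any, table_name: str) -> int:
    """Count every item in a DynamoDB table using Select='COUNT' scans."""
    total = 0
    kwargs: Dict[str, Any] = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

Called as count_items(boto3.client("dynamodb"), "sra_experiment"). Note that a full scan consumes read capacity across the whole table, even when only counting.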

GEO

python -m omicidx.geometa --gse=GSE10

This prints JSON to stdout, one line per entity.
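Because the output is newline-delimited JSON, it can be consumed line by line. The sketch runs the module with the same --gse argument shown above and yields one parsed object per entity; it assumes omicidx is installed in the current interpreter and makes no assumption about the fields each entity contains.

```python
import json
import subprocess
import sys
from typing import Any, Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[Any]:
    """Parse newline-delimited JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def geometa_entities(gse: str) -> Iterator[Any]:
    """Run python -m omicidx.geometa --gse=<gse> and parse its stdout."""
    result = subprocess.run(
        [sys.executable, "-m", "omicidx.geometa", f"--gse={gse}"],
        capture_output=True, text=True, check=True,
    )
    return iter_entities(result.stdout.splitlines())
```

For example, for entity in geometa_entities("GSE10"): print(entity) reproduces the command above with each entity available as a Python object.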



Download files


Source Distribution

omicidx-0.3.8.tar.gz (34.5 kB)

Uploaded Source

Built Distribution

omicidx-0.3.8-py3-none-any.whl (42.1 kB)

Uploaded Python 3

File details

Details for the file omicidx-0.3.8.tar.gz.

File metadata

  • Download URL: omicidx-0.3.8.tar.gz
  • Upload date:
  • Size: 34.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.17 CPython/3.7.3 Darwin/18.2.0

File hashes

Hashes for omicidx-0.3.8.tar.gz
Algorithm Hash digest
SHA256 0b8b503a958c5da686e7694f9fc460083955a3bf3de639ca12f84f59bf4ab7ac
MD5 23d6588332eb77822e0cfb563f105e71
BLAKE2b-256 12bf536361131fd58081a3c930d2d464934b5d2087a195f23349a44fc09061cc


File details

Details for the file omicidx-0.3.8-py3-none-any.whl.

File metadata

  • Download URL: omicidx-0.3.8-py3-none-any.whl
  • Upload date:
  • Size: 42.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.17 CPython/3.7.3 Darwin/18.2.0

File hashes

Hashes for omicidx-0.3.8-py3-none-any.whl
Algorithm Hash digest
SHA256 e8e10230788eed839263e3015c61a5d165f3bef699f5a41b98c36dc1794cd62b
MD5 b1b0db9fc72c214ca53b92fbd4cc4165
BLAKE2b-256 a37e925cccd8b25b9122e6002d44f92e25c937faed28baa5d5482663b97db55e

