
The OmicIDX project collects, reprocesses, and republishes metadata from multiple public genomics repositories, including the NCBI SRA, BioSample, and GEO databases. The metadata are published through the BigQuery cloud data warehouse, a set of performant search and retrieval APIs, and a set of JSON-format files for easy incorporation into other projects.

Project description

New process

Steps

  • Download XML
  • Create basic JSON
  • Upload JSON to S3
  • Munge basic JSON to Parquet
  • Munge Parquet into:
    • experiment joined
    • sample joined
    • run joined
    • study with aggregates
    • Include aggregates in Spark jobs:
      • number of samples, experiments, runs
      • sample, experiment, and run accessions (as array)
  • Save munged Spark data (JSON, Parquet)
  • Create Elasticsearch index mappings
  • Drop existing Elasticsearch mappings
  • Load Elasticsearch index mappings
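
The study-level aggregates listed above (counts plus accession arrays) can be illustrated with a small sketch. The project computes these in Spark jobs; the plain-Python version below is only a stand-in for the same roll-up, and the record keys (`study`, `sample`, `experiment`, `run`), the output field names, and the function name `aggregate_by_study` are assumptions made for illustration, not the project's actual schema.

```python
from collections import defaultdict


def aggregate_by_study(records):
    """Roll run-level records up into one summary per study.

    Each record is a dict with the (hypothetical) keys 'study', 'sample',
    'experiment', and 'run', each holding an accession string.
    """
    levels = ("sample", "experiment", "run")
    # study accession -> level -> set of distinct accessions seen
    seen = defaultdict(lambda: {level: set() for level in levels})
    for rec in records:
        for level in levels:
            seen[rec["study"]][level].add(rec[level])

    summaries = []
    for study, by_level in sorted(seen.items()):
        summary = {"study_accession": study}
        for level in levels:
            # Distinct count and sorted accession array for each level.
            summary[level + "_count"] = len(by_level[level])
            summary[level + "_accessions"] = sorted(by_level[level])
        summaries.append(summary)
    return summaries
```

Sets are used so that a sample or experiment shared by several runs is counted once per study.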

Lambda

zip lambdas.zip lambda_handlers.py sra_parsers.py

aws lambda create-function --function-name sra_to_json --zip-file fileb://lambdas.zip --handler lambda_handlers.lambda_return_full_experiment_json --runtime python3.6 --role arn:aws:iam::377200973048:role/lambda_s3_exec_role

Invoke

aws lambda invoke --function-name sra_to_json --log-type Tail --payload '{"accession":"SRX000273"}' /tmp/abc.txt
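
The same invocation can be made from Python. The sketch below assumes `client` is a boto3 Lambda client (for example `boto3.client("lambda")`) and that the function returns a JSON document; the helper name `invoke_sra_to_json` is hypothetical.

```python
import base64
import json


def invoke_sra_to_json(client, accession):
    """Invoke sra_to_json for one accession; return (result, log_tail).

    `client` is assumed to be a boto3 Lambda client.
    """
    response = client.invoke(
        FunctionName="sra_to_json",
        LogType="Tail",  # ask Lambda to return the tail of the execution log
        Payload=json.dumps({"accession": accession}).encode("utf-8"),
    )
    # The response payload is a stream; this assumes the function returned JSON.
    result = json.loads(response["Payload"].read())
    # With LogType="Tail", the log tail arrives base64-encoded in LogResult.
    log_tail = base64.b64decode(response.get("LogResult", "")).decode("utf-8")
    return result, log_tail
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without AWS credentials.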

Concurrency

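An account has 1000 concurrent executions in total by default. Reserve concurrency for individual functions to cap how many instances of each can run at once: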

aws lambda put-function-concurrency --function-name sra_to_json --reserved-concurrent-executions 20

Timeout and memory

aws lambda update-function-configuration --function-name sra_to_json --timeout 15

Logging

Logs can be followed from the command line with awslogs (https://github.com/jorgebastida/awslogs):

awslogs get /aws/lambda/sra_to_json ALL --watch

DynamoDB

aws dynamodb scan --table-name sra_experiment --select "COUNT"
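
A rough Python equivalent of the count is sketched below, assuming `client` is a boto3 DynamoDB client (for example `boto3.client("dynamodb")`). A single scan call returns only one page of results, so the sketch sums `Count` across pages; the helper name `count_items` is hypothetical.

```python
def count_items(client, table_name):
    """Count the items in a DynamoDB table by scanning with Select="COUNT".

    `client` is assumed to be a boto3 DynamoDB client.
    """
    total = 0
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No LastEvaluatedKey means the scan reached the end of the table.
            return total
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

For example, `count_items(client, "sra_experiment")` would count the table used in the command above.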

GEO

python -m omicidx.geometa --gse=GSE10

This prints JSON to stdout, one line per entity.
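
Because each entity arrives on its own line, the output can be consumed as a stream. The following is a minimal sketch of a reader; the helper name `read_json_lines` is hypothetical, and nothing is assumed about the fields inside each entity.

```python
import json


def read_json_lines(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)
```

Piping the command above into a script that calls `read_json_lines(sys.stdin)` would process the entities one at a time without holding the whole output in memory.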


Files for omicidx, version 0.3.4

  • omicidx-0.3.4-py3-none-any.whl (43.7 kB): wheel, Python version py3
  • omicidx-0.3.4.tar.gz (35.7 kB): source distribution
