
The OmicIDX project collects, reprocesses, and republishes metadata from multiple public genomics repositories, including the NCBI SRA, BioSample, and GEO databases. The results are published through the BigQuery cloud data warehouse, a set of performant search and retrieval APIs, and a set of JSON files for easy incorporation into other projects.


New process

Steps

  • Download XML
  • Create basic JSON
  • Upload JSON to S3
  • Munge basic JSON to Parquet
  • Munge Parquet to:
    • experiment joined
    • sample joined
    • run joined
    • study with aggregates
    • Include aggregates in Spark jobs:
      • number of samples, experiments, runs
      • sample, experiment, and run accessions (as array)
  • Save munged Spark data (JSON, Parquet)
  • Create Elasticsearch index mappings
  • Drop existing Elasticsearch mappings
  • Load Elasticsearch index mappings
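
The study-level aggregates in the steps above (counts plus accession arrays) can be sketched in plain Python, standing in for the Spark job. The input layout (one dict per run with `study`, `sample`, `experiment`, and `run` keys) and the output field names are assumptions made for illustration, not the project's actual schema.

```python
from collections import defaultdict


def study_aggregates(run_records):
    """Roll run-level records up to one aggregate record per study.

    Each input record is assumed (hypothetically) to be a dict with
    "study", "sample", "experiment", and "run" accession keys.
    """
    grouped = defaultdict(
        lambda: {"sample": set(), "experiment": set(), "run": set()}
    )
    for rec in run_records:
        for level in ("sample", "experiment", "run"):
            grouped[rec["study"]][level].add(rec[level])

    aggregates = []
    for study, levels in sorted(grouped.items()):
        agg = {"study_accession": study}
        for level, accessions in levels.items():
            # Distinct count and sorted accession array for each level.
            agg[f"{level}_count"] = len(accessions)
            agg[f"{level}_accessions"] = sorted(accessions)
        aggregates.append(agg)
    return aggregates
```

Sets are used so that a sample shared by several runs is counted once per study.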

Lambda

zip lambdas.zip lambda_handlers.py sra_parsers.py

aws lambda create-function --function-name sra_to_json --zip-file fileb://lambdas.zip --handler lambda_handlers.lambda_return_full_experiment_json --runtime python3.6 --role arn:aws:iam::377200973048:role/lambda_s3_exec_role

Invoke

aws lambda invoke --function-name sra_to_json --log-type Tail --payload '{"accession":"SRX000273"}' /tmp/abc.txt
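
The same invocation can be made from Python. This is a minimal sketch: `invoke_sra_to_json` is a hypothetical helper, the `client` argument is expected to be a boto3 Lambda client, and the function's response is assumed to be a JSON document.

```python
import base64
import json


def invoke_sra_to_json(client, accession):
    """Invoke the sra_to_json function and return (record, log_tail).

    `client` is expected to behave like boto3.client("lambda").
    The response body is assumed to be JSON.
    """
    response = client.invoke(
        FunctionName="sra_to_json",
        LogType="Tail",  # ask Lambda to return the tail of the execution log
        Payload=json.dumps({"accession": accession}).encode("utf-8"),
    )
    # LogResult is base64-encoded text; Payload is a readable byte stream.
    log_tail = base64.b64decode(response.get("LogResult", "")).decode(
        "utf-8", errors="replace"
    )
    record = json.loads(response["Payload"].read())
    return record, log_tail


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   record, log_tail = invoke_sra_to_json(boto3.client("lambda"), "SRX000273")
```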

Concurrency

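An AWS account allows 1000 concurrent Lambda executions in total by default; reserve concurrency for specific functions to cap how many of them run at once.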

aws lambda put-function-concurrency --function-name sra_to_json --reserved-concurrent-executions 20
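
On the calling side, keeping the number of in-flight requests at or below the reserved concurrency (20 here) helps avoid throttled invocations. The following is a sketch of one way to do that with a thread pool; `invoke_many` and the `invoke_one` callable are hypothetical names, not part of the project.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke_many(invoke_one, accessions, max_workers=20):
    """Call invoke_one(accession) for each accession with bounded parallelism.

    Returns a dict mapping each accession to its result, or to the raised
    exception when a call fails, so one bad record does not abort the batch.
    """

    def safe_call(accession):
        try:
            return accession, invoke_one(accession)
        except Exception as exc:  # keep the failure alongside its accession
            return accession, exc

    # At most max_workers calls are in flight at any time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe_call, accessions))
```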

Timeout and memory

aws lambda update-function-configuration --function-name sra_to_json --timeout 15

Logging

https://github.com/jorgebastida/awslogs

awslogs get /aws/lambda/sra_to_json ALL --watch

DynamoDB

aws dynamodb scan --table-name sra_experiment --select "COUNT"
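
When the same count is taken from Python with boto3, `scan` returns one page at a time (each page reads at most 1 MB of data), so the per-page `Count` values have to be summed. A minimal sketch, where `count_items` is a hypothetical helper and `client` is expected to be a boto3 DynamoDB client:

```python
def count_items(client, table_name):
    """Sum the per-page counts of a DynamoDB COUNT scan.

    `client` is expected to behave like boto3.client("dynamodb").
    """
    total = 0
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return total
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   print(count_items(boto3.client("dynamodb"), "sra_experiment"))
```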

GEO

python -m omicidx.geometa --gse=GSE10

This prints JSON to stdout, one line per entity.
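
Line-delimited JSON output like this can be consumed one entity at a time. The sketch below assumes only that each non-empty line is a complete JSON object; `read_entities` is a hypothetical helper, and no particular field names are assumed.

```python
import json


def read_entities(stream):
    """Yield one parsed JSON object per non-empty line of a text stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Usage, with the command's output piped to this script's stdin:
#   import sys
#   for entity in read_entities(sys.stdin):
#       print(sorted(entity))
```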


