
The OmicIDX project collects, reprocesses, and republishes metadata from multiple public genomics repositories, including the NCBI SRA, BioSample, and GEO databases. The metadata is published through the cloud data warehouse platform BigQuery, a set of performant search and retrieval APIs, and a set of JSON-format files for easy incorporation into other projects.

Project description

New process


  • Download XML
  • Create basic JSON
  • Upload JSON to S3
  • Munge basic JSON to Parquet
  • Munge Parquet into:
    • experiment joined
    • sample joined
    • run joined
    • study with aggregates
    • Include aggregates in Spark jobs:
      • number of samples, experiments, and runs
      • sample, experiment, and run accessions (as arrays)
  • Save munged Spark data (JSON, Parquet)
  • Create Elasticsearch index mappings
  • Drop existing Elasticsearch mappings
  • Load Elasticsearch index mappings
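
The "study with aggregates" step above can be illustrated with a small sketch. This is plain Python rather than the Spark job itself, and the record keys (`study_accession`, `sample_accession`, `experiment_accession`, `run_accession`) and output field names are assumptions made for illustration, not the project's actual schema.

```python
from collections import defaultdict


def study_aggregates(runs):
    """Summarize run-level records into one record per study.

    Each input record is a dict with hypothetical keys:
    study_accession, sample_accession, experiment_accession, run_accession.
    """
    grouped = defaultdict(
        lambda: {"samples": set(), "experiments": set(), "runs": set()}
    )
    for rec in runs:
        agg = grouped[rec["study_accession"]]
        agg["samples"].add(rec["sample_accession"])
        agg["experiments"].add(rec["experiment_accession"])
        agg["runs"].add(rec["run_accession"])

    summaries = []
    for study, agg in sorted(grouped.items()):
        summaries.append(
            {
                "study_accession": study,
                # counts of distinct samples, experiments, and runs
                "sample_count": len(agg["samples"]),
                "experiment_count": len(agg["experiments"]),
                "run_count": len(agg["runs"]),
                # accessions stored as sorted arrays
                "sample_accessions": sorted(agg["samples"]),
                "experiment_accessions": sorted(agg["experiments"]),
                "run_accessions": sorted(agg["runs"]),
            }
        )
    return summaries
```

Sets are used so that an experiment or sample shared by several runs is counted once per study.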



aws lambda create-function --function-name sra_to_json --zip-file fileb:// --handler lambda_handlers.lambda_return_full_experiment_json --runtime python3.6 --role arn:aws:iam::377200973048:role/lambda_s3_exec_role


aws lambda invoke --function-name sra_to_json --log-type Tail --payload '{"accession":"SRX000273"}' /tmp/abc.txt
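
The same invocation can be made from Python. The sketch below assumes a boto3 Lambda client is passed in and that the function returns a JSON document; the helper name `invoke_sra_to_json` is hypothetical and not part of omicidx.

```python
import base64
import json


def invoke_sra_to_json(client, accession):
    """Invoke the sra_to_json function and return (result, log_tail).

    `client` is expected to behave like boto3.client("lambda").
    Assumes the function's response payload is JSON.
    """
    response = client.invoke(
        FunctionName="sra_to_json",
        LogType="Tail",  # ask Lambda to return the tail of the execution log
        Payload=json.dumps({"accession": accession}).encode("utf-8"),
    )
    # LogResult is base64-encoded when LogType="Tail" is requested
    log_tail = base64.b64decode(response.get("LogResult", "")).decode("utf-8")
    # Payload is a stream-like object holding the function's return value
    result = json.loads(response["Payload"].read())
    return result, log_tail
```

With boto3 installed and credentials configured, `invoke_sra_to_json(boto3.client("lambda"), "SRX000273")` would mirror the command above, returning the result in memory instead of writing it to /tmp/abc.txt.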


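By default an account is limited to 1000 concurrent Lambda executions in total. Reserve concurrency for specific functions to cap how many instances of each can run at once: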

aws lambda put-function-concurrency --function-name sra_to_json --reserved-concurrent-executions 20

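Timeout and memory are configured per function. The timeout is given in seconds (memory can be changed the same way with --memory-size):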

aws lambda update-function-configuration --function-name sra_to_json --timeout 15


awslogs get /aws/lambda/sra_to_json ALL --watch


aws dynamodb scan --table-name sra_experiment --select "COUNT"
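
For use inside a script, a count like the one above can be sketched with boto3. A single scan call only covers part of a large table, so the sketch follows `LastEvaluatedKey` and sums the per-page counts. The helper name `count_items` is hypothetical, and `client` is assumed to behave like `boto3.client("dynamodb")`.

```python
def count_items(client, table_name):
    """Count items in a DynamoDB table by paging through a COUNT scan."""
    total = 0
    kwargs = {"TableName": table_name, "Select": "COUNT"}
    while True:
        page = client.scan(**kwargs)
        total += page["Count"]
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            # no more pages to read
            return total
        # resume the scan where the previous page stopped
        kwargs["ExclusiveStartKey"] = last_key
```

For example, `count_items(boto3.client("dynamodb"), "sra_experiment")` would count the table named in the command above. A scan reads the whole table, so this can be slow and consume read capacity on large tables.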


python -m omicidx.geometa --gse=GSE10

This prints JSON to stdout, one line per entity.
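
Because the output is one JSON document per line, it can be consumed as a stream. The following is a minimal sketch of a reader; the helper name `iter_entities` is hypothetical, and nothing is assumed about the fields inside each entity.

```python
import json


def iter_entities(lines):
    """Yield one parsed JSON object per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield json.loads(line)
```

Saved in a script (here a hypothetical `consume.py`) that loops over `iter_entities(sys.stdin)`, it could be fed with `python -m omicidx.geometa --gse=GSE10 | python consume.py`.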
