
Routines for ingesting metadata to a CanDIG repository

Project description

Routines for ingesting metadata into a CanDIG 1.0 server. Requires candig-server, docopt, and pandas.

  • Free software: GNU General Public License v3

The steps below run the ingestion and test a server with the resulting repo. They require Python 2.7 for candig-server<1.0.0, or Python 3.6 for candig-server>=1.0.0; Python 3.7 is not currently supported.
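The version constraint above can be captured in a small shell helper, so that an install script fails early on an unsupported interpreter. This is only a sketch: the function name `candig_server_spec` is hypothetical and not part of candig-ingest or candig-server. It simply maps a "major.minor" Python version to the pip requirement stated above.

```shell
# Hypothetical helper (not part of candig-ingest): map a "major.minor"
# Python version to the candig-server requirement described above.
candig_server_spec() {
    case "$1" in
        2.7) echo 'candig-server<1.0.0' ;;
        3.6) echo 'candig-server>=1.0.0' ;;
        *)
            # Any other version, including 3.7, is treated as unsupported.
            echo "unsupported Python version: $1" >&2
            return 1
            ;;
    esac
}
```

Inside the activated virtual environment, one possible use is `pip install "$(candig_server_spec "$(python -c 'import sys; print("%d.%d" % sys.version_info[:2])')")"`, which installs the matching candig-server release or stops with an error message.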

# Install
virtualenv test_server # If you are running Python 2
python3 -m venv test_server # If you are running Python 3.6

cd test_server
source bin/activate
pip install --upgrade pip setuptools
pip install candig-server # Specify anything <1.0.0 for Python 2.7, or >=1.0.0 for Python 3.6.
pip install candig-ingest

# ingest data and make the repo
mkdir candig-example-data
ingest candig-example-data/registry.db <path to example data, like: mock_data/clinical_metadata_tier1.json>

# optional
# add peer site addresses
candig_repo add-peer candig-example-data/registry.db <peer site IP address, like:>

# optional
# create dataset for reads and variants
candig_repo add-dataset --description "Reads and variants dataset" candig-example-data/registry.db read_and_variats_dataset

# optional
# add reference set, data source: or
candig_repo add-referenceset candig-example-data/registry.db <path to downloaded reference set, like GRCh37-lite.fa> -d "GRCh37-lite human reference genome" --name GRCh37-lite --sourceUri ""

# optional
# add reads
candig_repo add-readgroupset -r -I <path to bam index file> -R GRCh37-lite candig-example-data/registry.db read_and_variats_dataset <path to bam file>

# optional
# add variants
candig_repo add-variantset -I <path to variants index file> -R GRCh37-lite candig-example-data/registry.db read_and_variats_dataset <path to vcf file>

# optional
# add sequence ontology set
# wget
candig_repo add-ontology candig-example-data/registry.db <path to sequence ontology set, like: so.obo> -n so-xp

# optional
# add features/annotations
## get the following scripts
## download the relevant annotation release from Gencode
## decompress
# gunzip gencode.v27.annotation.gff3.gz
## build the annotation database
# python -i gencode.v27.annotation.gff3 -o gencode.v27.annotation.db -v
# build index for your annotation database
# Run "CREATE INDEX name_type_index ON FEATURE (gene_name, type)" in Sqlite browser
# add featureset
candig_repo add-featureset candig-example-data/registry.db read_and_variats_dataset <path to the annotation.db> -R GRCh37-lite -O so-xp

# optional
# add phenotype association set from Monarch Initiative
# wget
candig_repo add-phenotypeassociationset candig-example-data/registry.db read_and_variats_dataset <path to the folder containing cdg.ttl>

# optional
# add disease ontology set, like: NCIT
# wget
candig_repo add-ontology -n NCIT candig-example-data/registry.db ncit.obo

# launch the server at a different IP and/or port:
candig_server --host --port 8000 -c NoAuth

and then, from another terminal:

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' \ \
    | jq '.'


{
  "datasets": [
    {
      "description": "PROFYLE test metadata",
      "id": "WyJQUk9GWUxFIl0",
      "name": "PROFYLE"
    }
  ]
}
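To confirm that ingestion worked, it can help to reduce the response to just the dataset names. The sketch below assumes the response has the shape of the example output here (a top-level `datasets` array whose entries carry a `name` field). The function name `dataset_names` is hypothetical, and the `sed` fallback is a crude pattern match intended only for simple responses like this one.

```shell
# Sample response, reconstructed from the example output in this README.
response='{
  "datasets": [
    {
      "description": "PROFYLE test metadata",
      "id": "WyJQUk9GWUxFIl0",
      "name": "PROFYLE"
    }
  ]
}'

# Hypothetical helper: read a datasets response on stdin, print one name per line.
dataset_names() {
    if command -v jq >/dev/null 2>&1; then
        # Preferred path: let jq walk the datasets array.
        jq -r '.datasets[].name'
    else
        # Fallback without jq: print the value of every line holding a "name" key.
        sed -n 's/.*"name": *"\([^"]*\)".*/\1/p'
    fi
}

printf '%s\n' "$response" | dataset_names
```

Against a live server, the same function could be placed at the end of the curl pipeline in place of `jq '.'`.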


Files for candig-ingest, version 1.3.1
  • candig_ingest-1.3.1-py3-none-any.whl (25.4 kB): wheel, Python version py3
  • candig-ingest-1.3.1.tar.gz (25.3 kB): source
