
Store observations of VCF variants in a MongoDB database

Project description

loqusdb


A small tool to set up a local variant database. If you find loqusdb useful in your work, please cite the article.

Currently loqusdb uses MongoDB as the backend for storing variants, but switching to another database manager should not require major changes.

Installation

These instructions were written and tested in a conda environment with Python >= 3.9, the minimum version required by the installer file (setup.py).

$ pip install loqusdb

or

$ git clone https://github.com/moonso/loqusdb
$ cd loqusdb
$ pip install --editable .

Idea

A tool to keep track of which variants have been seen and in which families they have been observed. This is NOT a tool to create a true frequency database. It basically counts the number of times a variant has been seen in any individual. It also keeps track of the variants that have been seen in a homozygous or hemizygous state.

Variants are stored by providing a VCF file and a family file (PED or PED-like).
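A PED file is a whitespace-separated table with six standard columns: family ID, individual ID, paternal ID, maternal ID, sex (1 = male, 2 = female, anything else = unknown) and phenotype. The sketch below reads such a file into per-individual records to illustrate the format. It is not loqusdb's own parser; the names `Individual` and `parse_ped` are hypothetical, and skipping `#` lines and ignoring extra columns are assumptions about what a "PED-like" file contains.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    family_id: str
    individual_id: str
    father: str     # "0" means unknown / not in the file
    mother: str     # "0" means unknown / not in the file
    sex: str        # "1" = male, "2" = female, anything else = unknown
    phenotype: str  # commonly "1" = unaffected, "2" = affected


def parse_ped(lines):
    """Parse PED (or PED-like) lines into Individual records (hypothetical helper)."""
    individuals = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and header/comment lines (assumption: these start with '#')
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 6:
            raise ValueError(f"Expected at least 6 columns, got {len(fields)}: {line!r}")
        # Only the six standard columns are used; extra PED-like columns are ignored
        individuals.append(Individual(*fields[:6]))
    return individuals
```

For a trio, `parse_ped(open("family.ped"))` would return three `Individual` objects sharing the same `family_id`.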

Loqusdb first runs a sanity check on the VCF file.
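The exact checks loqusdb performs are not listed here. The sketch below assumes a few plausible ones: the file starts with a `##fileformat=VCF` line, the `#CHROM` header line has the eight fixed columns defined by the VCF specification, and variants are sorted by position within each chromosome. The function name `check_vcf_lines` is hypothetical.

```python
# The eight fixed columns every VCF header line must start with
VCF_FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def check_vcf_lines(lines):
    """Return the sample names if the VCF text passes basic checks, else raise ValueError.

    Hypothetical checks for illustration; not loqusdb's actual validation.
    """
    lines = iter(lines)
    if not next(lines, "").startswith("##fileformat=VCF"):
        raise ValueError("First line must be ##fileformat=VCF...")
    samples = None
    last_pos = {}  # chromosome -> last position seen
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank or meta-information line
        fields = line.split("\t")
        if line.startswith("#"):
            if fields[:8] != VCF_FIXED_COLUMNS:
                raise ValueError("Malformed #CHROM header line")
            samples = fields[9:]  # columns after FORMAT are sample names
            continue
        if samples is None:
            raise ValueError("Variant line found before the #CHROM header")
        chrom, pos = fields[0], int(fields[1])
        # Assumption: positions must not decrease within a chromosome
        if pos < last_pos.get(chrom, 0):
            raise ValueError(f"Variants not sorted on {chrom} at position {pos}")
        last_pos[chrom] = pos
    if samples is None:
        raise ValueError("No #CHROM header line found")
    return samples
```

The returned sample names could then be compared against the individuals in the family file before any variants are loaded.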

The tool then checks, for each variant, whether it has been observed in any of the individuals in the family.
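One way to decide whether a variant is "observed" in an individual is to look at the genotype (GT) field: the variant is observed if the genotype carries at least one alternative allele. The sketch below classifies GT strings such as `0/1`, `1|1` or `./.`. Treating a single-allele call (for example `1` on chrX in a male) as hemizygous is an assumption, and `classify_genotype` and `observations_in_family` are hypothetical names, not loqusdb functions.

```python
import re


def classify_genotype(gt):
    """Classify a VCF GT value; return None if no alternative allele is present."""
    # Alleles are separated by "/" (unphased) or "|" (phased)
    raw = re.split(r"[/|]", gt)
    ploidy = len(raw)
    called = [a for a in raw if a not in (".", "")]
    if not any(a != "0" for a in called):
        return None  # reference-only or missing call: not observed
    if ploidy == 1:
        return "hemizygous"  # assumption: haploid call, e.g. chrX in a male
    if len(called) == ploidy and len(set(called)) == 1:
        return "homozygous"  # e.g. 1/1
    return "heterozygous"  # e.g. 0/1, 1/2, or a partially missing call like ./1


def observations_in_family(genotypes, family_members):
    """Map each family member carrying the variant to its zygosity."""
    result = {}
    for sample in family_members:
        zygosity = classify_genotype(genotypes.get(sample, "./."))
        if zygosity is not None:
            result[sample] = zygosity
    return result
```

For example, `observations_in_family({"child": "0/1", "father": "0/0", "mother": "1/1"}, ["child", "father", "mother"])` returns `{"child": "heterozygous", "mother": "homozygous"}`.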

When the variants are added:

  • Either the variant already exists, in which case its number of observations is increased by one
  • Or the variant has not been seen before, in which case it is added to the database
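The add-or-increment rule above can be sketched with a plain dictionary standing in for the variant collection. The helper `add_observation` is hypothetical and records one observation per call; it is not loqusdb's implementation.

```python
def add_observation(store, variant_id, family_id):
    """Record one observation of variant_id in family_id.

    `store` is an in-memory dict used as a stand-in for the database.
    """
    variant = store.get(variant_id)
    if variant is None:
        # The variant has not been seen before: add it with one observation
        store[variant_id] = {"_id": variant_id, "observations": 1, "families": [family_id]}
    else:
        # The variant already exists: increase the number of observations by one
        variant["observations"] += 1
        variant["families"].append(family_id)
    return store[variant_id]
```

With MongoDB, the same effect can be achieved in a single pymongo call such as `collection.update_one({"_id": variant_id}, {"$inc": {"observations": 1}, "$push": {"families": family_id}}, upsert=True)`; whether loqusdb uses exactly this update is not stated here.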

Command Line Interface

$ loqusdb
Usage: loqusdb [OPTIONS] COMMAND [ARGS]...

  loqusdb: manage a local variant count database.

Options:
  -db, --database TEXT            Defaults to 'loqusdb' if not specified
  -u, --username TEXT
  -p, --password TEXT
  -a, --authdb TEXT               If authentication should be done against
                                  another database than --database

  -port, --port INTEGER           Specify the port where to look for the mongo
                                  database.  [default: 27017]

  -h, --host TEXT                 Specify the host where to look for the mongo
                                  database.  [default: localhost]

  --uri TEXT                      Specify a mongodb uri
  -c, --config FILENAME           Use a config with db information
  -t, --test                      Used for testing. This will use a mongomock
                                  database.

  -g, --genome-build [GRCh37|GRCh38]
                                  Specify what genome build to use
  -v, --verbose
  --version                       Show the version and exit.
  --help                          Show this message and exit.

Commands:
  annotate  Annotate a VCF with observations
  cases     Display cases in database
  delete    Delete the variants of a family
  dump      Dump the database
  export    Export variants to VCF format
  identity  Search identity collection
  index     Add indexes to database
  load      Load the variants of a family
  migrate   Migrate an old loqusdb instance
  profile   Loads variants to be used in profiling
  restore   Restore database from dump
  update    Update an existing case with a new type of variants
  variants  Display variants in database
  wipe      Wipe a loqusdb instance

Database

Connecting

The connection can be specified on the command line with --database, --username, --password, --port, --host and/or --uri. Alternatively, the same options can be given in a config file passed with --config. A config file looks like:

uri: mongodb://loqusdb-username:loqusdb-pwd@localhost:27030/loqusdb-rd?authSource=admin
db_name: loqusdb_test

or

host: localhost
port: 27030
username: loqusdb-username
password: loqusdb-pwd
authdb: admin
db_name: loqusdb_test
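The two config styles above carry the same information. As a sketch, the host/port style can be turned into a MongoDB connection string like the one in the first example. The helper `build_mongo_uri` is hypothetical; it assumes the config has already been loaded into a dict, and it uses the defaults shown in the CLI help (database `loqusdb`, host `localhost`, port 27017). How loqusdb itself assembles the connection may differ.

```python
from urllib.parse import quote_plus


def build_mongo_uri(config):
    """Return (uri, db_name) from a loqusdb-style config dict (hypothetical helper).

    If "uri" is present it is used as-is; otherwise host/port/credentials are combined.
    """
    db_name = config.get("db_name", "loqusdb")
    if "uri" in config:
        return config["uri"], db_name
    host = config.get("host", "localhost")
    port = int(config.get("port", 27017))
    credentials = ""
    if config.get("username") and config.get("password"):
        # Percent-encode credentials so characters such as ':' or '@' stay valid in the URI
        credentials = f"{quote_plus(config['username'])}:{quote_plus(config['password'])}@"
    uri = f"mongodb://{credentials}{host}:{port}/"
    if config.get("authdb"):
        # authSource names the database used for authentication
        uri += f"?authSource={config['authdb']}"
    return uri, db_name
```

The resulting URI could then be passed to `pymongo.MongoClient(uri)`, with `client[db_name]` selecting the database.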

Mongo

The collections look like this:

Case

{
    'case_id': 'case_id',
    'vcf_path': 'path_to_vcf'
}

Variant

{
    '_id': 'variant_id',
    'chrom': 'CHROM',
    'start': position,
    'end': end position,
    'ref': reference base(s),
    'alt': alternative base(s),
    'homozygote': number_of_homozygotes,
    'hemizygote': number_of_hemizygotes,
    'observations': number_of_observations,
    'families': ['family_id', ...]
}
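The variant document above can be written as a typed dictionary. The sketch below builds the first document for a newly seen variant. The ID scheme `chrom_start_ref_alt`, the rule `end = start + len(ref) - 1`, and the names `VariantDocument` and `new_variant_document` are all assumptions for illustration; loqusdb's real `_id` format and end-coordinate convention may differ.

```python
from typing import List, TypedDict


class VariantDocument(TypedDict):
    _id: str
    chrom: str
    start: int
    end: int
    ref: str
    alt: str
    homozygote: int
    hemizygote: int
    observations: int
    families: List[str]


def new_variant_document(chrom, start, ref, alt, family_id, zygosity):
    """Build the first document for a variant seen in family_id (hypothetical helper)."""
    return VariantDocument(
        # Hypothetical ID scheme; loqusdb's real variant_id format may differ
        _id=f"{chrom}_{start}_{ref}_{alt}",
        chrom=chrom,
        start=start,
        # Assumption: end is the last reference base covered, so a SNV has end == start
        end=start + len(ref) - 1,
        ref=ref,
        alt=alt,
        homozygote=1 if zygosity == "homozygous" else 0,
        hemizygote=1 if zygosity == "hemizygous" else 0,
        observations=1,
        families=[family_id],
    )
```

Keeping the zygosity counters as separate integer fields lets later observations update them with simple increments.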



Download files


Source Distribution

loqusdb-2.7.4.tar.gz (492.9 kB)

Uploaded Source

Built Distribution

loqusdb-2.7.4-py3-none-any.whl (50.3 kB)

Uploaded Python 3

File details

Details for the file loqusdb-2.7.4.tar.gz.

File metadata

  • Download URL: loqusdb-2.7.4.tar.gz
  • Upload date:
  • Size: 492.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for loqusdb-2.7.4.tar.gz
Algorithm Hash digest
SHA256 f19fb07af41166b48f0f9884127b24bcd0a392fe6e6e62ac54dee3566e488a16
MD5 f35ba91c0368eaa65116ed60c0e94e70
BLAKE2b-256 6e482ec0499be2c29c5f0be6c15c3c826f9eb4f3a41ae4f10d850220fc2f48b0


File details

Details for the file loqusdb-2.7.4-py3-none-any.whl.

File metadata

  • Download URL: loqusdb-2.7.4-py3-none-any.whl
  • Upload date:
  • Size: 50.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.18

File hashes

Hashes for loqusdb-2.7.4-py3-none-any.whl
Algorithm Hash digest
SHA256 b305ec5f922a8a0a4a7a921e7caff582bc44d810e41d100e55fd5a85c30d4e2c
MD5 4df6fd2530d57d92974187e1365b84e0
BLAKE2b-256 bf46460e30d924b813fd940aca59673389816e1e94578edcfe957f8d1de1dc8a

