
OpenCRAVAT CSV processor for generating sample CSVs for aiva-database import


AIVA Sample CSV Processor


A Python package that processes OpenCRAVAT CSV output files and generates sample CSVs for database import. It streamlines the workflow from variant calling to database import by converting OpenCRAVAT annotations into structured CSV files that are ready for loading.

Features

  • Process OpenCRAVAT CSV files into structured database-ready formats
  • Generate VRS IDs for variants using the aiva-vrs library
  • Create separate CSV files for variants, transcript consequences, and sample variants
  • Support multi-sample VCF inputs
  • Handle zygosity, quality, and depth metrics
  • Optional compression of output files
  • Customizable sample metadata

Installation

Quick Install

pip install aiva-sample-processor

Development Install

git clone https://github.com/MHSPL/aiva-sample-processor.git
cd aiva-sample-processor
pip install -e .

Dependencies

This package requires:

  • Python 3.8+
  • open-cravat 2.2.0+
  • aiva-vrs 0.1.0+
  • pandas
  • tqdm
  • psycopg2-binary

pip installs all of these dependencies automatically.
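To confirm that the dependencies are present after installation, a small check along these lines can help. It is a minimal sketch that uses only the standard library; the distribution names are taken from the list above, and the helper name `installed_versions` is hypothetical, not part of this package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Distribution names as listed in the Dependencies section.
REQUIRED = ("open-cravat", "aiva-vrs", "pandas", "tqdm", "psycopg2-binary")


def installed_versions(names: Iterable[str] = REQUIRED) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

The check only reports what is installed; it does not compare versions against the minimums listed above.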

Usage

1. Run OpenCRAVAT

First, run OpenCRAVAT on your VCF file to generate the input CSV:

# Install OpenCRAVAT modules (first time only)
oc module install-base
oc module install csvreporter

# Run OpenCRAVAT
oc run input.vcf -l hg38 -t csv
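When this step is driven from a script, the same command can be assembled in Python. The sketch below only mirrors the `oc run` invocation shown above; the function names `build_oc_run_command` and `run_opencravat` are hypothetical helpers, not part of OpenCRAVAT or this package.

```python
import shlex
import shutil
import subprocess
from typing import List


def build_oc_run_command(vcf_path: str, genome: str = "hg38") -> List[str]:
    """Return the argument list for `oc run <vcf> -l <genome> -t csv`."""
    return ["oc", "run", vcf_path, "-l", genome, "-t", "csv"]


def run_opencravat(vcf_path: str, genome: str = "hg38") -> None:
    """Run OpenCRAVAT, failing early if the `oc` executable is not on PATH."""
    if shutil.which("oc") is None:
        raise RuntimeError("'oc' not found on PATH; is open-cravat installed?")
    # check=True raises CalledProcessError if OpenCRAVAT exits with an error.
    subprocess.run(build_oc_run_command(vcf_path, genome), check=True)


# Show the command without executing it.
print(shlex.join(build_oc_run_command("input.vcf")))  # oc run input.vcf -l hg38 -t csv
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.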

2. Process the OpenCRAVAT Output

Use the aiva-sample-processor command to process the OpenCRAVAT output:

aiva-sample-processor --input input.vcf.variant.csv --output-dir output_csvs
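The two steps can be chained from a script. The following sketch assumes that the OpenCRAVAT variant-level CSV is named by appending `.variant.csv` to the VCF file name, as in the example above (`input.vcf` becomes `input.vcf.variant.csv`). Both helper functions are hypothetical and exist only for illustration.

```python
from pathlib import Path
from typing import List


def expected_variant_csv(vcf_path: str) -> Path:
    """Assumed output name: <vcf file name> + '.variant.csv', in the same directory."""
    path = Path(vcf_path)
    return path.with_name(path.name + ".variant.csv")


def build_processor_command(vcf_path: str, output_dir: str) -> List[str]:
    """Argument list for aiva-sample-processor with its two required options."""
    return [
        "aiva-sample-processor",
        "--input", str(expected_variant_csv(vcf_path)),
        "--output-dir", output_dir,
    ]


# Pass the list to subprocess.run(..., check=True) to execute it.
print(build_processor_command("input.vcf", "output_csvs"))
```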

Command Line Options

usage: aiva-sample-processor [-h] --input INPUT --output-dir OUTPUT_DIR [--assembly {GRCh37,GRCh38}]
                           [--no-compress] [--owner-id OWNER_ID] [--group-id GROUP_ID]
                           [--is-public {true,false}] [--sample-type SAMPLE_TYPE]
                           [--status STATUS] [--review-status REVIEW_STATUS]
                           [--view-status VIEW_STATUS] [--archive-status ARCHIVE_STATUS]
                           [--clinical-notes CLINICAL_NOTES] [--phenotype-terms PHENOTYPE_TERMS]

Generate CSVs for importing sample variants into the database.

required arguments:
  --input INPUT         Path to OpenCRAVAT CSV file
  --output-dir OUTPUT_DIR
                        Directory to write output CSVs to

optional arguments:
  --assembly {GRCh37,GRCh38}
                        Genome assembly (default: GRCh38)
  --no-compress         Do not compress output files with gzip
  --owner-id OWNER_ID   User ID of the owner (default: user-1)
  --group-id GROUP_ID   Group ID for the samples
  --is-public {true,false}
                        Whether samples are public (default: false)
  --sample-type SAMPLE_TYPE
                        Type of sample (default: blood)
  --status STATUS       Sample processing status (default: processed)
  --review-status REVIEW_STATUS
                        Review status of the sample (default: not_reviewed)
  --view-status VIEW_STATUS
                        View status of the sample (default: none)
  --archive-status ARCHIVE_STATUS
                        Archive status of the sample (default: active)
  --clinical-notes CLINICAL_NOTES
                        Clinical notes for the sample
  --phenotype-terms PHENOTYPE_TERMS
                        JSON array of phenotype terms (default: [])
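Several of these options take constrained values: `--assembly` and `--is-public` accept fixed choices, and `--phenotype-terms` expects a JSON array. The sketch below shows one way to build those arguments safely from Python values. The helper `metadata_args` is hypothetical, and the assumption that phenotype terms are plain strings (here an illustrative HPO-style ID) is not confirmed by the help text.

```python
import json
from typing import List, Optional, Sequence

# Choices copied from the usage text.
ASSEMBLIES = ("GRCh37", "GRCh38")


def metadata_args(
    assembly: str = "GRCh38",
    is_public: bool = False,
    phenotype_terms: Sequence[str] = (),
    compress: bool = True,
    group_id: Optional[str] = None,
) -> List[str]:
    """Translate Python values into the documented command line options."""
    if assembly not in ASSEMBLIES:
        raise ValueError(f"assembly must be one of {ASSEMBLIES}, got {assembly!r}")
    args = [
        "--assembly", assembly,
        "--is-public", "true" if is_public else "false",
        # json.dumps produces a valid JSON array, including for an empty list.
        "--phenotype-terms", json.dumps(list(phenotype_terms)),
    ]
    if group_id is not None:
        args += ["--group-id", group_id]
    if not compress:
        args.append("--no-compress")
    return args


# Illustrative values only; the term format is an assumption.
print(metadata_args(phenotype_terms=["HP:0001250"], compress=False))
```

Appending this list to the required `--input` and `--output-dir` arguments gives a complete invocation.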

Output Files

The tool generates the following CSV files:

  1. variants.csv(.gz): Contains variant information with VRS IDs
  2. transcript_consequences.csv(.gz): Contains transcript consequences for each variant
  3. sample_variants.csv(.gz): Contains sample-specific variant information
  4. samples.csv: Contains sample metadata
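Because compression is optional, downstream scripts may need to accept either `.csv` or `.csv.gz`. The sketch below locates each output file in whichever form exists and reads its header row using only the standard library. The function names are hypothetical, and the sketch makes no assumption about the column names the tool writes.

```python
import csv
import gzip
from pathlib import Path
from typing import Dict, List, Optional

# File stems from the list above. samples.csv is listed without a .gz form,
# but checking for both forms is harmless.
OUTPUT_STEMS = ("variants", "transcript_consequences", "sample_variants", "samples")


def find_outputs(output_dir: str) -> Dict[str, Optional[Path]]:
    """Map each stem to its existing file (.csv.gz preferred), or None if missing."""
    base = Path(output_dir)
    found: Dict[str, Optional[Path]] = {}
    for stem in OUTPUT_STEMS:
        candidates = [base / f"{stem}.csv.gz", base / f"{stem}.csv"]
        found[stem] = next((p for p in candidates if p.is_file()), None)
    return found


def read_header(path: Path) -> List[str]:
    """Return the first CSV row, transparently handling gzip-compressed files."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])
```

A typical use is `find_outputs("output_csvs")` followed by `read_header(...)` on each path that is not `None`, as a quick sanity check before starting a database import.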

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • OpenCRAVAT for providing the annotation framework
  • GA4GH VRS for the variant representation standard
