OpenCRAVAT CSV processor for generating sample CSVs for aiva-database import
AIVA Sample CSV Processor
A Python package that processes OpenCRAVAT CSV output files and generates sample CSVs for database import. It streamlines the workflow from variant calling to database loading by converting OpenCRAVAT annotations into structured CSV files.
Features
- Process OpenCRAVAT CSV files into structured database-ready formats
- Generate VRS IDs for variants using the aiva-vrs library
- Create separate CSV files for variants, transcript consequences, and sample variants
- Support multi-sample VCF inputs
- Handle zygosity, quality, and depth metrics
- Optionally compress output files
- Customize sample metadata
Installation
Quick Install
pip install aiva-sample-processor
Development Install
git clone https://github.com/MHSPL/aiva-sample-processor.git
cd aiva-sample-processor
pip install -e .
Dependencies
This package requires:
- Python 3.8+
- open-cravat 2.2.0+
- aiva-vrs 0.1.0+
- pandas
- tqdm
- psycopg2-binary
Apart from Python itself, pip installs all of these dependencies automatically.
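To confirm that the listed dependencies are present in the active environment, a check along the following lines may help. This is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical and not part of aiva-sample-processor, and the distribution names are assumed to match the list above.

```python
from importlib import metadata

# Distribution names as listed in the Dependencies section (assumed to be
# the names used on the package index).
REQUIRED = ["open-cravat", "aiva-vrs", "pandas", "tqdm", "psycopg2-binary"]


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

The sketch only reports versions. It does not compare them against the minimum versions listed above.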
Usage
1. Run OpenCRAVAT
First, run OpenCRAVAT on your VCF file to generate the input CSV:
# Install OpenCRAVAT modules (first time only)
oc module install-base
oc module install csvreporter
# Run OpenCRAVAT
oc run input.vcf -l hg38 -t csv
2. Process the OpenCRAVAT Output
Use the aiva-sample-processor command to process the OpenCRAVAT output:
aiva-sample-processor --input input.vcf.variant.csv --output-dir output_csvs
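The two steps above can also be driven from Python. The sketch below only assembles the argument lists shown in this section. It assumes that OpenCRAVAT writes `<input>.variant.csv` next to the input VCF, as in the example file names above. The function names `opencravat_csv_path` and `pipeline_commands` are hypothetical helpers, not part of either tool.

```python
from pathlib import Path


def opencravat_csv_path(vcf_path):
    """Return the expected CSV path, following input.vcf -> input.vcf.variant.csv."""
    return Path(str(vcf_path) + ".variant.csv")


def pipeline_commands(vcf_path, output_dir, genome="hg38"):
    """Build the argument lists for step 1 (annotate) and step 2 (process)."""
    annotate = ["oc", "run", str(vcf_path), "-l", genome, "-t", "csv"]
    process = [
        "aiva-sample-processor",
        "--input", str(opencravat_csv_path(vcf_path)),
        "--output-dir", str(output_dir),
    ]
    return annotate, process


if __name__ == "__main__":
    for cmd in pipeline_commands("input.vcf", "output_csvs"):
        # To execute a step, pass the list to subprocess.run(cmd, check=True).
        print(" ".join(cmd))
```

Building argument lists rather than shell strings avoids quoting problems when file paths contain spaces.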
Command Line Options
usage: aiva-sample-processor [-h] --input INPUT --output-dir OUTPUT_DIR [--assembly {GRCh37,GRCh38}]
                             [--no-compress] [--owner-id OWNER_ID] [--group-id GROUP_ID]
                             [--is-public {true,false}] [--sample-type SAMPLE_TYPE]
                             [--status STATUS] [--review-status REVIEW_STATUS]
                             [--view-status VIEW_STATUS] [--archive-status ARCHIVE_STATUS]
                             [--clinical-notes CLINICAL_NOTES] [--phenotype-terms PHENOTYPE_TERMS]
Generate CSVs for importing sample variants into the database.
required arguments:
  --input INPUT         Path to OpenCRAVAT CSV file
  --output-dir OUTPUT_DIR
                        Directory to write output CSVs to
optional arguments:
  --assembly {GRCh37,GRCh38}
                        Genome assembly (default: GRCh38)
  --no-compress         Do not compress output files with gzip
  --owner-id OWNER_ID   User ID of the owner (default: user-1)
  --group-id GROUP_ID   Group ID for the samples
  --is-public {true,false}
                        Whether samples are public (default: false)
  --sample-type SAMPLE_TYPE
                        Type of sample (default: blood)
  --status STATUS       Sample processing status (default: processed)
  --review-status REVIEW_STATUS
                        Review status of the sample (default: not_reviewed)
  --view-status VIEW_STATUS
                        View status of the sample (default: none)
  --archive-status ARCHIVE_STATUS
                        Archive status of the sample (default: active)
  --clinical-notes CLINICAL_NOTES
                        Clinical notes for the sample
  --phenotype-terms PHENOTYPE_TERMS
                        JSON array of phenotype terms (default: [])
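When the metadata options are set from a script, the `--phenotype-terms` value has to be passed as a single JSON-encoded string. The sketch below shows one way to assemble the documented flags. `build_args` is a hypothetical helper, not part of the package, and the phenotype term in the usage example is purely illustrative, since the expected term format is not specified here.

```python
import json

# Metadata flags from the help text that take a single free-form value.
ALLOWED = {"owner_id", "group_id", "sample_type", "status", "review_status",
           "view_status", "archive_status", "clinical_notes"}


def build_args(input_csv, output_dir, *, assembly="GRCh38", compress=True,
               is_public=False, phenotype_terms=None, **metadata):
    """Translate keyword options into aiva-sample-processor command-line flags."""
    unknown = set(metadata) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported options: {sorted(unknown)}")
    args = ["aiva-sample-processor", "--input", str(input_csv),
            "--output-dir", str(output_dir), "--assembly", assembly,
            "--is-public", "true" if is_public else "false"]
    if not compress:
        args.append("--no-compress")
    if phenotype_terms is not None:
        # The flag expects a JSON array encoded as one string argument.
        args += ["--phenotype-terms", json.dumps(list(phenotype_terms))]
    for key, value in metadata.items():
        # owner_id -> --owner-id, sample_type -> --sample-type, and so on.
        args += ["--" + key.replace("_", "-"), str(value)]
    return args


if __name__ == "__main__":
    print(build_args("input.vcf.variant.csv", "output_csvs",
                     phenotype_terms=["HP:0001250"], owner_id="user-1"))
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Rejecting unknown keywords up front keeps a misspelled option from reaching the command line.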
Output Files
The tool generates the following CSV files:
- variants.csv(.gz): Contains variant information with VRS IDs
- transcript_consequences.csv(.gz): Contains transcript consequences for each variant
- sample_variants.csv(.gz): Contains sample-specific variant information
- samples.csv: Contains sample metadata
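Before loading the files into a database, it can be useful to check what was written. The sketch below lists the column names and row count of every CSV in the output directory, handling both plain and gzip-compressed files. It uses only the standard library and makes no assumption about the column names. The function names are hypothetical and not part of the package.

```python
import csv
import gzip
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed CSV file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", newline="", encoding="utf-8")
    return open(path, "r", newline="", encoding="utf-8")


def summarize(path):
    """Return (column names, number of data rows) for one output file."""
    with open_text(path) as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = sum(1 for _ in reader)
    return header, rows


def summarize_outputs(output_dir):
    """Summarize every *.csv and *.csv.gz file found in output_dir."""
    out = Path(output_dir)
    files = sorted(list(out.glob("*.csv")) + list(out.glob("*.csv.gz")))
    return {f.name: summarize(f) for f in files}


if __name__ == "__main__":
    for name, (header, rows) in summarize_outputs("output_csvs").items():
        print(f"{name}: {rows} rows, columns = {header}")
```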
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- OpenCRAVAT for providing the annotation framework
- GA4GH VRS for the variant representation standard
File details
Details for the file aiva_sample_processor-0.1.0.tar.gz.
File metadata
- Download URL: aiva_sample_processor-0.1.0.tar.gz
- Upload date:
- Size: 15.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e8d69f3d71eac7187072dcd90a92d5e229be4938a770d5dfa4ce8479b7384bb6 |
| MD5 | f4e38b156008961a988ba42b3cba7e8c |
| BLAKE2b-256 | 4d3475250c6c94061a83c9920dea7f4f8fcd07a6f26bc8e5d25376a1c6263b25 |
File details
Details for the file aiva_sample_processor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: aiva_sample_processor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 15.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e4320357f88b75066f1dda8727fb887ce28c274ce1ffc9ab90da42dea584616 |
| MD5 | 56374db5be9705b054c70d45fb673804 |
| BLAKE2b-256 | dc5d48300c9aa81ba40593b0ba581cd5cbd3e7719fe88869d3bf150cd99dcfe2 |
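A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using `hashlib` from the standard library. The helper names `sha256_of` and `matches_published` are hypothetical, and the sketch assumes the files were saved under their original names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
PUBLISHED_SHA256 = {
    "aiva_sample_processor-0.1.0.tar.gz":
        "e8d69f3d71eac7187072dcd90a92d5e229be4938a770d5dfa4ce8479b7384bb6",
    "aiva_sample_processor-0.1.0-py3-none-any.whl":
        "0e4320357f88b75066f1dda8727fb887ce28c274ce1ffc9ab90da42dea584616",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published=PUBLISHED_SHA256):
    """Return True if the file's digest equals the digest listed for its name."""
    expected = published.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

`matches_published` returns False both for a mismatched digest and for a file name that is not in the table, so a renamed download needs to be checked with `sha256_of` directly.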