Package to convert a vcf into a pandas dataframe.
vcf2pandas
vcf2pandas is a Python package that converts VCF files to pandas dataframes.
Install
pip install vcf2pandas
Dependencies
- pandas (2.1.0)
- pysam (0.21.0)
Usage
Selecting all columns
from vcf2pandas import vcf2pandas
import pandas
df_all = vcf2pandas("path_to_vcf.vcf")
Selecting custom columns and samples
info_fields = ["info_field_1", "info_field_2"]
sample_list = ["sample_name_1", "sample_name_2"]
format_fields = ["format_name_1", "format_name_2"]
df_selected = vcf2pandas(
    "path_to_vcf.vcf",
    info_fields=info_fields,
    sample_list=sample_list,
    format_fields=format_fields,
)
Custom column ordering
vcf2pandas can select custom/specific:
- INFO fields
- samples
- FORMAT fields
It orders the selected columns to match the input list.
For example, the following list:
info_fields = ["DP", "MQM", "QA"]
produces these columns, in that order:
INFO:DP INFO:MQM INFO:QA
Note that this ordering applies only to INFO and FORMAT fields. Samples are always ordered as they appear in the VCF, not as they appear in the input list.
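The ordering rule above can be sketched in plain Python. The helper below is hypothetical and not part of vcf2pandas; it only predicts column names. It assumes FORMAT columns are grouped by sample and then by field, which the description does not confirm, and the second sample name is made up for illustration.

```python
def expected_columns(info_fields, vcf_samples, sample_list, format_fields):
    """Predict INFO/FORMAT column names (hypothetical helper, not the package API)."""
    # INFO columns follow the order of the input list.
    columns = [f"INFO:{field}" for field in info_fields]
    # Samples keep their order from the VCF header, regardless of sample_list order.
    requested = set(sample_list)
    ordered_samples = [s for s in vcf_samples if s in requested]
    # Assumption: FORMAT columns are grouped per sample, fields in input-list order.
    for sample in ordered_samples:
        for field in format_fields:
            columns.append(f"FORMAT:{sample}:{field}")
    return columns


columns = expected_columns(
    info_fields=["DP", "MQM", "QA"],
    vcf_samples=["HG002", "HG003"],  # order in the VCF header (illustrative)
    sample_list=["HG003", "HG002"],  # requested in a different order
    format_fields=["GT", "AO"],
)
print(columns)
```

Even though HG003 is requested first, the HG002 columns come first because that is the order assumed for the VCF header.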
Output
INFO and FORMAT headings
INFO:INFO_FIELD e.g. INFO:DP
FORMAT:SAMPLE_NAME:FORMAT_FIELD e.g. FORMAT:HG002:GT
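Because the headings follow a fixed pattern, they can be split back into their parts. The function below is a hypothetical helper, not something vcf2pandas provides. The name "CHROM" for a non-INFO, non-FORMAT column is an assumption used only to show the fallback branch.

```python
def parse_heading(name):
    """Split a column heading into (kind, sample, field).

    Hypothetical helper based on the INFO:FIELD and FORMAT:SAMPLE:FIELD
    patterns described above.
    """
    if name.startswith("INFO:"):
        return ("INFO", None, name[len("INFO:"):])
    if name.startswith("FORMAT:"):
        rest = name[len("FORMAT:"):]
        # Split on the last colon so a sample name containing ":" stays intact.
        sample, _, field = rest.rpartition(":")
        return ("FORMAT", sample, field)
    # Anything else is treated as a plain column (assumed naming).
    return ("OTHER", None, name)


for heading in ["CHROM", "INFO:DP", "FORMAT:HG002:GT"]:
    print(heading, "->", parse_heading(heading))
```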
INFO fields not present for some variants
When an INFO field is not present for a variant, vcf2pandas inserts a . in that cell instead. For example, in vcf3_all.txt the INFO:GENE column contains . for the first 7 variants.
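For downstream analysis it can be convenient to turn the . placeholder into a proper missing value. The sketch below uses a small hand-made dataframe rather than real vcf2pandas output. The values are invented, and it assumes the cells are stored as strings, which the description does not state.

```python
import pandas as pd

# Toy frame imitating the described output; values are made up for illustration.
df = pd.DataFrame(
    {
        "INFO:DP": ["35", "41", "28"],
        "INFO:GENE": [".", ".", "BRCA1"],
    }
)

# Keep cells that are not the "." placeholder; the rest become missing (NaN).
cleaned = df.where(df != ".")

# Count how many variants lack each INFO field.
missing_per_column = cleaned.isna().sum()
print(missing_per_column)
```

After this step, standard pandas tools such as dropna() or fillna() treat those cells as missing.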
Examples
Example VCF files and their outputs (dataframes saved as .txt files) are available in examples/
Example Usage
df1 = vcf2pandas("examples/vcf1.vcf")
df2 = vcf2pandas("examples/vcf2.vcf")
df3_all = vcf2pandas("examples/vcf3.vcf")
info = ["DP"]
samples = ["HG002"]
format_fields = ["GT", "AO"]
df3_selected = vcf2pandas(
    "examples/vcf3.vcf",
    info_fields=info,
    sample_list=samples,
    format_fields=format_fields,
)
To print to a text file:
# df is any dataframe returned by vcf2pandas
with open("path_to_txt_file.txt", "w") as f:
    f.write(df.to_string())