
Package to convert a VCF file into a pandas dataframe.


vcf2pandas

vcf2pandas is a Python package for converting VCF files into pandas dataframes.

Install

pip install vcf2pandas

Dependencies

  • pandas (2.1.0)
  • pysam (0.21.0)

Usage

Selecting all columns

from vcf2pandas import vcf2pandas
import pandas

df_all = vcf2pandas("path_to_vcf.vcf")

Selecting custom columns and samples

info_fields = ["info_field_1", "info_field_2"]
sample_list = ["sample_name_1", "sample_name_2"]
format_fields = ["format_name_1", "format_name_2"]

df_selected = vcf2pandas(
    "path_to_vcf.vcf",
    info_fields=info_fields,
    sample_list=sample_list,
    format_fields=format_fields,
)

Custom column ordering

vcf2pandas can select specific:

  • INFO fields
  • samples
  • FORMAT fields

For INFO and FORMAT fields, the selected columns are ordered to match the input list.

For example, the following list:

info_fields = ["DP", "MQM", "QA"]

produces the following columns, in this order:

INFO:DP    INFO:MQM    INFO:QA

Note that this ordering applies only to INFO and FORMAT columns: samples are ordered as they appear in the VCF, not as they appear in the input list.
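The ordering rule can be illustrated with a small, self-contained sketch. `expected_columns` is a hypothetical helper, not part of vcf2pandas: it only reproduces the INFO/FORMAT headings and their order as described in this README. It assumes FORMAT columns are grouped by sample (all requested fields for one sample, then the next), which the README does not state, and it ignores any other columns the real dataframe may contain. The sample name HG003 is made up.

```python
def expected_columns(vcf_samples, info_fields, sample_list, format_fields):
    """Hypothetical helper: predict the INFO/FORMAT headings and their order.

    Assumption: FORMAT columns are grouped by sample. Any other columns the
    real vcf2pandas dataframe may contain are not modelled here.
    """
    # INFO columns keep the order of the input list.
    info_cols = [f"INFO:{field}" for field in info_fields]

    # Samples keep the order of the VCF header, not the order of sample_list.
    wanted = set(sample_list)
    samples_in_vcf_order = [s for s in vcf_samples if s in wanted]

    # FORMAT fields keep the order of the input list within each sample.
    format_cols = [
        f"FORMAT:{sample}:{field}"
        for sample in samples_in_vcf_order
        for field in format_fields
    ]
    return info_cols + format_cols


cols = expected_columns(
    vcf_samples=["HG002", "HG003"],   # order in a made-up VCF header
    info_fields=["DP", "MQM", "QA"],
    sample_list=["HG003", "HG002"],   # requested in the opposite order
    format_fields=["GT", "AO"],
)
print(cols)
```

Even though HG003 is requested first, the sketch places HG002 first because that is its position in the VCF header, matching the note above.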

Output

INFO and FORMAT headings

INFO:INFO_FIELD                     e.g. INFO:DP
FORMAT:SAMPLE_NAME:FORMAT_FIELD     e.g. FORMAT:HG002:GT
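Because the headings follow a fixed prefix pattern, they can be split back into parts or used to pick out one sample's columns. The sketch below is illustrative only: `split_heading` is a hypothetical helper, and the dataframe is a toy built by hand with made-up values in the heading style above, not real vcf2pandas output.

```python
import pandas as pd


def split_heading(column):
    """Hypothetical helper: split a heading into (kind, sample, field).

    sample is None for INFO headings.
    """
    kind, _, rest = column.partition(":")
    if kind == "INFO" and rest:
        return ("INFO", None, rest)
    if kind == "FORMAT" and ":" in rest:
        # Split on the last colon so a sample name containing ":" stays whole.
        sample, field = rest.rsplit(":", 1)
        return ("FORMAT", sample, field)
    raise ValueError(f"not an INFO or FORMAT heading: {column!r}")


# Toy dataframe with made-up values, using the documented heading style.
df = pd.DataFrame({
    "INFO:DP": [10, 20],
    "FORMAT:HG002:GT": ["0/1", "1/1"],
    "FORMAT:HG002:AO": [4, 19],
})

print(split_heading("FORMAT:HG002:GT"))  # ('FORMAT', 'HG002', 'GT')

# Keep only the FORMAT columns that belong to sample HG002.
hg002 = df.loc[:, df.columns.str.startswith("FORMAT:HG002:")]
print(list(hg002.columns))  # ['FORMAT:HG002:GT', 'FORMAT:HG002:AO']
```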

INFO fields not present for some variants

When an INFO field is absent for a variant, vcf2pandas puts a . in that cell. For example, in vcf3_all.txt the INFO:GENE column contains . for the first 7 variants.
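A . is an ordinary string, so pandas methods such as isna() do not treat it as missing. The sketch below shows one way to convert it. It runs on a made-up dataframe, and treating the cells as strings is an assumption for illustration, since this README does not state which dtypes vcf2pandas produces.

```python
import pandas as pd

# Made-up values; "." marks variants where the INFO field was absent.
df = pd.DataFrame({
    "INFO:GENE": [".", ".", "BRCA1"],
    "INFO:DP": ["12", ".", "30"],
})

# Replace the literal "." placeholder (regex=False is the default).
cleaned = df.replace(".", pd.NA)

# For a numeric field, errors="coerce" turns "." (or any unparsable value) into NaN.
cleaned["INFO:DP"] = pd.to_numeric(df["INFO:DP"], errors="coerce")

print(cleaned["INFO:GENE"].isna().tolist())  # [True, True, False]
```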

Examples

Example VCF files and their outputs (dataframes saved as .txt files) are available in examples/.

Example Usage

from vcf2pandas import vcf2pandas

df1 = vcf2pandas("examples/vcf1.vcf")
df2 = vcf2pandas("examples/vcf2.vcf")

df3_all = vcf2pandas("examples/vcf3.vcf")

info = ["DP"]
samples = ["HG002"]
format_fields = ["GT", "AO"]
df3_selected = vcf2pandas(
    "examples/vcf3.vcf",
    info_fields=info,
    sample_list=samples,
    format_fields=format_fields,
)

To write a dataframe to a text file:

# "w" opens the file for writing; the default mode "r" is read-only
with open("path_to_txt_file.txt", "w") as f:
    f.write(df.to_string())
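to_string() produces aligned text meant for reading, which is awkward to parse back into a dataframe. If you need a file you can load again, a tab-separated export is one option. The sketch below uses a made-up toy dataframe and a temporary path; swap in your own dataframe and file name.

```python
import os
import tempfile

import pandas as pd

# Toy dataframe with made-up values in the vcf2pandas heading style.
df = pd.DataFrame({"INFO:DP": [12, 30], "FORMAT:HG002:GT": ["0/1", "1/1"]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "variants.tsv")
    # Tab-separated, without the integer index, so the file holds only the data.
    df.to_csv(path, sep="\t", index=False)
    # Read it back; headings such as "INFO:DP" are ordinary column names.
    round_trip = pd.read_csv(path, sep="\t")

print(round_trip.columns.tolist())  # ['INFO:DP', 'FORMAT:HG002:GT']
```

read_csv does not treat . as missing by default; passing na_values="." converts those placeholders to missing values on load.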

