Skip to main content

A Python module for simple data file profiling.

Project description

Data File Profiler Utils

A Python module for simple data file profiling.

Usage

Installation

pip install data-file-profiler-utils

Integration

from import data_file_profiler_utils import Manager as ProfileManager

pm = ProfileManager()
pm.profile_file("/tmp/patient002.vcf")

Exported Console Script

Contents of sample data file:

cat -n sample.tsv
1  #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
2  1       12345   rs567   A       G       50      PASS    DP=30;AF=0.2;AN=1000;CSQ=missense_variant|HIGH|GeneA|ENSG00000112345|transcriptA|ENST00000234567|protein_coding|1/10|c.123C>T|p.Arg41Trp|123/1000|ensembl
3  2       56789   rs890   T       C       44      PASS    DP=25;AF=0.1;AN=1200;CSQ=synonymous_variant|MEDIUM|GeneB|ENSG00000123456|transcriptB|ENST00000345678|protein_coding|5/20|c.567A>G|p.Ala189Ala|567/1200|ensembl
4  3       98765   rs123   G       T       60      PASS    DP=40;AF=0.3;AN=800;CSQ=splice_acceptor_variant|HIGH|GeneC|ENSG00000134567|transcriptC|ENST00000456789|protein_coding|2/15|c.987+1G>T|p.?|987/800|ensembl
5  1       34567   rs456   C       A       55      PASS    DP=35;AF=0.15;AN=900;CSQ=frameshift_variant|HIGH|GeneX|ENSG00000145678|transcriptX|ENST00000567890|protein_coding|8/25|c.345_346insT|p.Leu116Phefs*12|345/900|ensembl

Invocation of the exported console script:

profile-data-file --infile /tmp/demo-data-file-profiler-utils/sample.tsv --verbose --outdir /tmp/demo-data-file-profiler-utils/
--logfile was not specified and therefore was set to '/tmp/demo-data-file-profiler-utils/profile_data_file.log'
Wrote profile metadata file '/tmp/demo-data-file-profiler-utils/sample.tsv.profile.txt'
The log file is '/tmp/demo-data-file-profiler-utils/profile_data_file.log'
Execution of '/tmp/data-file-profiler-utils/venv/lib/python3.10/site-packages/data_file_profiler_utils/profile_data_file.py' completed

Contents of the profile report:

cat -n /tmp/demo-data-file-profiler-utils/sample.tsv.profile.txt
1  ## method-profiled: /tmp/data-file-profiler-utils/venv/lib/python3.10/site-packages/data_file_profiler_utils/manager.py
2  ## date-profiled: 2025-02-15-142732
3  ## profiled-by: sundaram
4  file: /tmp/demo-data-file-profiler-utils/sample.tsv
5  md5sum: 786b82b2414d3acf7af34c068e358759
6  date_created: 2025-02-15 14:06:37.202165
7  file_size: 776
8  line_count: 5

History

0.1.0 (2024-02-10)

  • First release on PyPI.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page