A Python module for simple data file profiling.

Project description

Data File Profiler Utils

Usage

Installation

pip install data-file-profiler-utils

Integration

from data_file_profiler_utils import Manager as ProfileManager

# Create a profile manager and profile a single data file.
pm = ProfileManager()
pm.profile_file("/tmp/patient002.vcf")

Exported Console Script

Contents of sample data file:

cat -n sample.tsv
1  #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
2  1       12345   rs567   A       G       50      PASS    DP=30;AF=0.2;AN=1000;CSQ=missense_variant|HIGH|GeneA|ENSG00000112345|transcriptA|ENST00000234567|protein_coding|1/10|c.123C>T|p.Arg41Trp|123/1000|ensembl
3  2       56789   rs890   T       C       44      PASS    DP=25;AF=0.1;AN=1200;CSQ=synonymous_variant|MEDIUM|GeneB|ENSG00000123456|transcriptB|ENST00000345678|protein_coding|5/20|c.567A>G|p.Ala189Ala|567/1200|ensembl
4  3       98765   rs123   G       T       60      PASS    DP=40;AF=0.3;AN=800;CSQ=splice_acceptor_variant|HIGH|GeneC|ENSG00000134567|transcriptC|ENST00000456789|protein_coding|2/15|c.987+1G>T|p.?|987/800|ensembl
5  1       34567   rs456   C       A       55      PASS    DP=35;AF=0.15;AN=900;CSQ=frameshift_variant|HIGH|GeneX|ENSG00000145678|transcriptX|ENST00000567890|protein_coding|8/25|c.345_346insT|p.Leu116Phefs*12|345/900|ensembl

Invocation of the exported console script:

profile-data-file --infile /tmp/demo-data-file-profiler-utils/sample.tsv --verbose --outdir /tmp/demo-data-file-profiler-utils/
--logfile was not specified and therefore was set to '/tmp/demo-data-file-profiler-utils/profile_data_file.log'
Wrote profile metadata file '/tmp/demo-data-file-profiler-utils/sample.tsv.profile.txt'
The log file is '/tmp/demo-data-file-profiler-utils/profile_data_file.log'
Execution of '/tmp/data-file-profiler-utils/venv/lib/python3.10/site-packages/data_file_profiler_utils/profile_data_file.py' completed
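
To call the console script from a Python pipeline instead of a shell, one option is to wrap it with subprocess. The sketch below is only an illustration: build_command and run_profile are hypothetical helper names, not part of the package. It uses only the flags shown in the invocation above (--infile, --outdir, --verbose) and assumes the script exits with a non-zero status on failure.

```python
import shutil
import subprocess


def build_command(infile, outdir, verbose=False):
    """Build the argument list for the profile-data-file console script.

    Only the flags that appear in the README invocation are used.
    """
    cmd = ["profile-data-file", "--infile", str(infile), "--outdir", str(outdir)]
    if verbose:
        # --verbose is passed as a bare flag, as in the README invocation.
        cmd.append("--verbose")
    return cmd


def run_profile(infile, outdir, verbose=False):
    """Run the console script and return the CompletedProcess.

    Assumption: a non-zero exit status signals failure, so check=True
    raises subprocess.CalledProcessError in that case.
    """
    if shutil.which("profile-data-file") is None:
        raise FileNotFoundError(
            "profile-data-file is not on PATH; is the package installed?"
        )
    return subprocess.run(
        build_command(infile, outdir, verbose),
        check=True,
        capture_output=True,
        text=True,
    )
```

Keeping the command construction separate from the execution makes the argument list easy to log or inspect before anything is run.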

Contents of the profile report:

cat -n /tmp/demo-data-file-profiler-utils/sample.tsv.profile.txt
1  ## method-profiled: /tmp/data-file-profiler-utils/venv/lib/python3.10/site-packages/data_file_profiler_utils/manager.py
2  ## date-profiled: 2025-02-15-142732
3  ## profiled-by: sundaram
4  file: /tmp/demo-data-file-profiler-utils/sample.tsv
5  md5sum: 786b82b2414d3acf7af34c068e358759
6  date_created: 2025-02-15 14:06:37.202165
7  file_size: 776
8  line_count: 5
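
For readers who want to check a report by hand, the fields above can be recomputed with the Python standard library. This is a minimal sketch, not the package's actual implementation: sketch_profile is a hypothetical name, and the choice of st_ctime for date_created and the handling of a final line without a trailing newline are assumptions.

```python
import hashlib
import os
from datetime import datetime


def sketch_profile(path, chunk_size=65536):
    """Return a dict using the same field names as the sample report."""
    md5 = hashlib.md5()
    line_count = 0
    last_byte = b""
    # Read in binary chunks so large files are never loaded whole.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            line_count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # Assumption: a final line without a trailing newline still counts.
    if last_byte and last_byte != b"\n":
        line_count += 1
    stat = os.stat(path)
    return {
        "file": os.path.abspath(path),
        "md5sum": md5.hexdigest(),
        # Assumption: st_ctime stands in for the creation time. On Linux
        # it is the time of the last metadata change, not true creation.
        "date_created": str(datetime.fromtimestamp(stat.st_ctime)),
        "file_size": stat.st_size,
        "line_count": line_count,
    }
```

Comparing the md5sum and file_size values from such a recomputation against a stored report is a quick way to detect that a data file has changed since it was profiled.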

History

0.1.0 (2024-02-10)

  • First release on PyPI.
