Data File Profiler Utils
A Python module for simple data file profiling.
Usage
Installation
pip install data-file-profiler-utils
Integration
from data_file_profiler_utils import Manager as ProfileManager
pm = ProfileManager()
pm.profile_file("/tmp/patient002.vcf")
Exported Console Script
Contents of sample data file:
cat -n sample.tsv
1 #CHROM POS ID REF ALT QUAL FILTER INFO
2 1 12345 rs567 A G 50 PASS DP=30;AF=0.2;AN=1000;CSQ=missense_variant|HIGH|GeneA|ENSG00000112345|transcriptA|ENST00000234567|protein_coding|1/10|c.123C>T|p.Arg41Trp|123/1000|ensembl
3 2 56789 rs890 T C 44 PASS DP=25;AF=0.1;AN=1200;CSQ=synonymous_variant|MEDIUM|GeneB|ENSG00000123456|transcriptB|ENST00000345678|protein_coding|5/20|c.567A>G|p.Ala189Ala|567/1200|ensembl
4 3 98765 rs123 G T 60 PASS DP=40;AF=0.3;AN=800;CSQ=splice_acceptor_variant|HIGH|GeneC|ENSG00000134567|transcriptC|ENST00000456789|protein_coding|2/15|c.987+1G>T|p.?|987/800|ensembl
5 1 34567 rs456 C A 55 PASS DP=35;AF=0.15;AN=900;CSQ=frameshift_variant|HIGH|GeneX|ENSG00000145678|transcriptX|ENST00000567890|protein_coding|8/25|c.345_346insT|p.Leu116Phefs*12|345/900|ensembl
Invocation of the exported console script:
profile-data-file --infile /tmp/demo-data-file-profiler-utils/sample.tsv --verbose --outdir /tmp/demo-data-file-profiler-utils/
--logfile was not specified and therefore was set to '/tmp/demo-data-file-profiler-utils/profile_data_file.log'
Wrote profile metadata file '/tmp/demo-data-file-profiler-utils/sample.tsv.profile.txt'
The log file is '/tmp/demo-data-file-profiler-utils/profile_data_file.log'
Execution of '/tmp/data-file-profiler-utils/venv/lib/python3.10/site-packages/data_file_profiler_utils/profile_data_file.py' completed
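When the console script has to be driven from Python rather than from a shell, one option is the standard `subprocess` module. The following is only a sketch under stated assumptions: it uses just the `--infile`, `--outdir`, and `--verbose` options that appear in the invocation above, it assumes `profile-data-file` is on `PATH`, and the helper names `build_profile_command` and `run_profile` are hypothetical and not part of the package.

```python
import shutil
import subprocess


def build_profile_command(infile, outdir, verbose=False):
    """Assemble the argument list for the profile-data-file console script.

    Hypothetical helper: only the options shown in the README are used.
    """
    command = ["profile-data-file", "--infile", infile, "--outdir", outdir]
    if verbose:
        command.append("--verbose")
    return command


def run_profile(infile, outdir, verbose=False):
    """Run the console script and return its captured standard output."""
    # Fail early with a clear message if the script is not installed.
    if shutil.which("profile-data-file") is None:
        raise FileNotFoundError(
            "profile-data-file is not on PATH; is the package installed?"
        )
    completed = subprocess.run(
        build_profile_command(infile, outdir, verbose),
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the output as str
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.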
Contents of the profile report:
cat -n /tmp/demo-data-file-profiler-utils/sample.tsv.profile.txt
1 ## method-profiled: /tmp/data-file-profiler-utils/venv/lib/python3.10/site-packages/data_file_profiler_utils/manager.py
2 ## date-profiled: 2025-02-15-142732
3 ## profiled-by: sundaram
4 file: /tmp/demo-data-file-profiler-utils/sample.tsv
5 md5sum: 786b82b2414d3acf7af34c068e358759
6 date_created: 2025-02-15 14:06:37.202165
7 file_size: 776
8 line_count: 5
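The fields in the report above can be reproduced with the standard library alone. The following is a minimal sketch, not the package's actual implementation: the function name `profile_file_sketch` and the returned dictionary layout are assumptions made for illustration, and `date_created` is approximated with `st_ctime`, which on most Unix systems is the last metadata-change time rather than a true creation time.

```python
import hashlib
import os
from datetime import datetime


def profile_file_sketch(path, chunk_size=65536):
    """Collect basic file metadata similar to the report fields.

    Hypothetical helper for illustration; not the package's API.
    """
    md5 = hashlib.md5()
    line_count = 0
    last_byte = b""
    # Read in binary chunks so large files are not loaded into memory at once.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            line_count += chunk.count(b"\n")
            last_byte = chunk[-1:]
    # Count a final line that lacks a trailing newline.
    if last_byte and last_byte != b"\n":
        line_count += 1
    stat = os.stat(path)
    return {
        "file": os.path.abspath(path),
        "md5sum": md5.hexdigest(),
        "date_created": str(datetime.fromtimestamp(stat.st_ctime)),
        "file_size": stat.st_size,  # size in bytes
        "line_count": line_count,
    }
```

Calling `profile_file_sketch("sample.tsv")` returns a dictionary whose keys mirror the non-comment lines of the report.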
History
0.1.0 (2024-02-10)
First release on PyPI.
Download files
File details
Details for the file data_file_profiler_utils-0.1.7.tar.gz.
File metadata
- Download URL: data_file_profiler_utils-0.1.7.tar.gz
- Upload date:
- Size: 14.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | e78ada4c85bb514fa597ea50169d235fa8c156cb7591189de5cfb7fe1112b6ad
MD5 | bf9d9907ef467e7e4da590ce1cb38022
BLAKE2b-256 | 78e93b7d5c8202dea3e6856ce838564fdd9dbd7fd8f7d300ad05c7ab41173bc6
File details
Details for the file data_file_profiler_utils-0.1.7-py2.py3-none-any.whl.
File metadata
- Download URL: data_file_profiler_utils-0.1.7-py2.py3-none-any.whl
- Upload date:
- Size: 9.5 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 52d9ac6e474f74344926119d26561f1d9d317ce198ed88abb6a6b4bf15af2610
MD5 | 49d0ca19356b4eb040a9ff9eb1f45748
BLAKE2b-256 | c54d1726ae493b1fce7c6c8d0b440f6512fc50bd583a18fe006f4259a21381c0