Data File Utils

Collection of Python scripts/utils for file manipulation tasks.

Exported console scripts

The following exported console scripts are available:

  • analyze-record-tuples

  • archive-dir

  • backup-dir

  • backup-file

  • compare-tab-files

  • create-tmp-dir

  • delete-old-files

  • find-last-directory

  • find-last-file

  • jsonl2json

  • profile-data-file

  • tsv2json

  • xlsx2tsv

analyze-record-tuples

This script compares two tab-delimited files and reports the records that are present in one file but missing from the other. A specified number of columns makes up the unique tuple that identifies each line/record.
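The comparison described above can be sketched as a set difference over key tuples. This is only an illustration, not the script's actual implementation: the function names `load_key_tuples` and `missing_records` are hypothetical, and the sketch assumes the key is built from the first `key_columns` columns of each row.

```python
import csv


def load_key_tuples(path, key_columns):
    """Return the set of key tuples built from the first `key_columns` columns.

    Assumption: the key columns are the leading columns of each row.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        return {tuple(row[:key_columns]) for row in reader if row}


def missing_records(file_a, file_b, key_columns):
    """Return (keys found only in file_a, keys found only in file_b)."""
    keys_a = load_key_tuples(file_a, key_columns)
    keys_b = load_key_tuples(file_b, key_columns)
    return keys_a - keys_b, keys_b - keys_a
```

Using sets keeps the lookup cost low for large files, at the price of ignoring duplicate keys within a single file.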

archive-dir

This script will archive the directory in place using tar -zcvf, naming the archive after the directory with the suffix TIMESTAMP.tgz appended.

Sample invocation:

archive-dir /tmp/test-123/test-abc
test-abc/
test-abc/file-1
test-abc/file-2
test-abc/file-3
test-abc/file-4
Directory '/tmp/test-123/test-abc' successfully archived to 'test-abc_2024-01-02-212243.tgz'
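For readers who want the same effect from Python, here is a minimal sketch using the standard `tarfile` module instead of the `tar` command. The function name `archive_dir` is hypothetical, the timestamp format is inferred from the sample output above, and writing the archive next to the source directory is an assumption.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def archive_dir(directory, timestamp=None):
    """Create <name>_<TIMESTAMP>.tgz and return the archive path.

    Assumption: the archive is written next to the source directory.
    """
    source = Path(directory).resolve()
    if not source.is_dir():
        raise NotADirectoryError(str(source))
    # Timestamp format inferred from the sample output (e.g. 2024-01-02-212243).
    stamp = timestamp or datetime.now().strftime("%Y-%m-%d-%H%M%S")
    archive = source.parent / f"{source.name}_{stamp}.tgz"
    # "w:gz" writes a gzip-compressed tar, comparable to `tar -zcf`.
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    return archive
```

Passing `arcname=source.name` keeps the member paths relative (`test-abc/file-1`), matching the listing in the sample invocation.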

backup-dir

This script will back up the directory in place and will apply the suffix TIMESTAMP.bak to the directory name.

Sample invocation:

backup-dir /tmp/test-123
Backed-up '/tmp/test-123' to '/tmp/test-123.2024-01-02-210517.bak'

backup-file

This script will back up the file in place and will apply the suffix TIMESTAMP.bak to the file name.

Sample invocation:

backup-file setup.py
Backed-up 'setup.py' to 'setup.py.2024-01-02-205756.bak'
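The naming scheme shared by backup-dir and backup-file can be sketched as follows. The function names `backup_path` and `backup` are hypothetical, and the sketch assumes the backup is a copy that leaves the original in place; the real scripts may behave differently.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_path(path, timestamp=None):
    """Return <path>.<TIMESTAMP>.bak, using the format seen in the sample output."""
    stamp = timestamp or datetime.now().strftime("%Y-%m-%d-%H%M%S")
    return Path(f"{path}.{stamp}.bak")


def backup(path, timestamp=None):
    """Copy a file or directory to its timestamped .bak sibling.

    Assumption: the original is copied, not moved.
    """
    source = Path(path)
    target = backup_path(source, timestamp)
    if source.is_dir():
        shutil.copytree(source, target)
    else:
        shutil.copy2(source, target)  # copy2 also preserves file metadata
    return target
```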

compare-tab-files

This script will parse two tab-delimited files and generate a report to indicate which lines and columns are different.
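A cell-by-cell comparison of this kind might look like the sketch below. It is not the script's actual report logic: the names `read_rows` and `diff_tab_files` are hypothetical, and rows are assumed to be compared by position (line 1 against line 1, and so on).

```python
import csv
from itertools import zip_longest


def read_rows(path):
    """Read a tab-delimited file into a list of rows."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def diff_tab_files(file_a, file_b):
    """Return (line_number, column_number, value_a, value_b) for each differing cell.

    Line and column numbers are 1-based; a missing cell is reported as None.
    """
    differences = []
    rows = zip_longest(read_rows(file_a), read_rows(file_b), fillvalue=[])
    for line_number, (row_a, row_b) in enumerate(rows, start=1):
        cells = zip_longest(row_a, row_b, fillvalue=None)
        for column_number, (value_a, value_b) in enumerate(cells, start=1):
            if value_a != value_b:
                differences.append((line_number, column_number, value_a, value_b))
    return differences
```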

create-tmp-dir

This script will prompt the user for the following information and then create a temporary directory:

  • root directory (default is /tmp)

  • user directory (default is $USER)

  • purpose

Sample invocation:

create-tmp-dir
Enter the root directory: [/tmp]:
Enter the user directory: [sundaram]:
Enter the purpose of the directory: stock-checker
Created output directory '/tmp/sundaram/stock-checker/2024-01-02-213509'
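The resulting path layout (root, user, purpose, timestamp) can be reproduced non-interactively with a small helper. This is a sketch only: `build_tmp_dir` and its parameters are hypothetical names, and the timestamp format is inferred from the sample output.

```python
import os
from datetime import datetime


def build_tmp_dir(purpose, root="/tmp", user=None, timestamp=None):
    """Create <root>/<user>/<purpose>/<TIMESTAMP> and return its path."""
    # Fall back to $USER, mirroring the default shown in the prompt.
    user = user or os.environ.get("USER", "unknown")
    stamp = timestamp or datetime.now().strftime("%Y-%m-%d-%H%M%S")
    path = os.path.join(root, user, purpose, stamp)
    os.makedirs(path, exist_ok=True)
    return path
```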

delete-old-files

This script will delete all old files owned by the current user (or a specified username) in /tmp (or a specified directory).
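The selection step can be sketched as below. The source does not say how "old" is defined, so the age threshold `max_age_days` and the use of modification time are assumptions, as is the function name `find_old_files`. The sketch deliberately only lists candidates; the caller decides whether to delete them.

```python
import time
from pathlib import Path


def find_old_files(directory, max_age_days, username=None, now=None):
    """Return files under `directory` last modified more than `max_age_days` ago.

    Assumption: age is measured by modification time. If `username` is given,
    only files owned by that user are returned (Path.owner() is POSIX-only).
    """
    reference = now if now is not None else time.time()
    cutoff = reference - max_age_days * 86400
    matches = []
    for path in Path(directory).rglob("*"):
        if not path.is_file() or path.stat().st_mtime >= cutoff:
            continue
        if username is not None and path.owner() != username:
            continue
        matches.append(path)
    return sorted(matches)
```

A caller that has reviewed the list could then remove each entry with `path.unlink()`.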

jsonl2json

This script will parse a JSONL file and write a JSON file for each line in the JSONL file.
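A minimal sketch of the one-file-per-line conversion follows. The function name `jsonl_to_json_files` and the output naming scheme (`<line_number>.json`) are assumptions made for illustration; the script's real file names may differ.

```python
import json
from pathlib import Path


def jsonl_to_json_files(infile, outdir):
    """Write one pretty-printed JSON file per non-blank line of a JSONL file."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with open(infile, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Hypothetical naming: the output file is named after the line number.
            target = out / f"{line_number}.json"
            target.write_text(json.dumps(record, indent=2), encoding="utf-8")
            written.append(target)
    return written
```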

profile-data-file

This script will output the following attributes of a specified file:

  • date created

  • md5sum

  • line count

  • byte size

Sample invocation:

profile-data-file requirements.txt
File: /home/sundaram/projects/data-file-utils/requirements.txt
md5sum: 2063352be9cbfa5bd1f1425524dbb77b
create_date: 2023-12-18 11:35:54.242520
byte_size: 14
line_count: 2
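The same attributes can be gathered with the standard library, as in the sketch below. The function name `profile_file` is hypothetical and this is not necessarily how the script computes its values. Note in particular that `st_ctime` is the creation time only on Windows; on Linux it is the time of the last metadata change.

```python
import hashlib
import os
from datetime import datetime


def profile_file(path):
    """Return md5sum, creation date, byte size and line count for a file."""
    md5 = hashlib.md5()
    line_count = 0
    # Read in binary mode so the checksum matches the bytes on disk.
    with open(path, "rb") as handle:
        for line in handle:
            md5.update(line)
            line_count += 1
    stat = os.stat(path)
    return {
        "file": os.path.abspath(path),
        "md5sum": md5.hexdigest(),
        # Caveat: st_ctime is metadata-change time on Linux, not creation time.
        "create_date": datetime.fromtimestamp(stat.st_ctime),
        "byte_size": stat.st_size,
        "line_count": line_count,
    }
```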

tsv2json

This script will parse a tab-delimited file and write a JSON file.
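One plausible shape for this conversion is sketched below. It assumes the first line of the tab-delimited file is a header row and that the output is a JSON array of objects keyed by those headers; neither detail is stated in the description, and the function name `tsv_to_json` is hypothetical.

```python
import csv
import json


def tsv_to_json(infile, outfile):
    """Convert a tab-delimited file with a header row into a JSON array of objects."""
    with open(infile, newline="", encoding="utf-8") as handle:
        # Assumption: the first row holds the column names.
        records = list(csv.DictReader(handle, delimiter="\t"))
    with open(outfile, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)
    return records
```

All values are written as strings, since a tab-delimited file carries no type information.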

xlsx2tsv

This script will parse an Excel file and write a tab-delimited file for each worksheet.

Sample invocation:

xlsx2tsv --infile ~/projects/experiments/xlsx2tsv/genetics.xlsx
--config_file was not specified and therefore was set to '/home/sundaram/projects/experiments/xlsx2tsv/venv/lib/python3.10/site-packages/data_file_utils/conf/config.yaml'
--outdir was not specified and therefore was set to '/tmp/xlsx2tsv/2023-12-22-142224'
Created output directory '/tmp/xlsx2tsv/2023-12-22-142224'
--logfile was not specified and therefore was set to '/tmp/xlsx2tsv/2023-12-22-142224/xlsx2tsv.log'
Sheet 'genes' has been written to '/tmp/xlsx2tsv/2023-12-22-142224/genes.tsv'
Sheet 'transcripts' has been written to '/tmp/xlsx2tsv/2023-12-22-142224/transcripts.tsv'
Sheet 'proteins' has been written to '/tmp/xlsx2tsv/2023-12-22-142224/proteins.tsv'
The log file is '/tmp/xlsx2tsv/2023-12-22-142224/xlsx2tsv.log'
Execution of '/home/sundaram/projects/experiments/xlsx2tsv/venv/lib/python3.10/site-packages/data_file_utils/xlsx2tsv.py' completed

History

0.1.0 (2023-12-18)

  • First release on PyPI.
