A variety of smart tools to make analytics easy.

Project description

Dissector

Using the dissector command-line tool

Dissector is a command-line tool that analyzes each column in a delimited file. The input can be a single file or a directory with multiple files. The output is a table with the following statistics for each column in the input file:

  • strlen: minimum and maximum string length of the column.
  • nnull: count of NAs and empty strings.
  • nrow: number of rows.
  • nunique: number of unique values.
  • nvalue: number of rows with values.
  • freq: frequency distribution of top n values. n is configured in dissector_config.yaml.
  • sample: a sample of top n values. n is configured in dissector_config.yaml.
  • symbols: non-alphanumeric characters, i.e., characters not in [a-zA-Z0-9].
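
To make these statistics concrete, the sketch below computes most of them for a single column using only the Python standard library. It is an illustration, not dissector's implementation: the function name profile_column, the choice to treat "" and the literal string "NA" as null, the decision to measure strlen over non-null values only, and the default n=10 are all assumptions made for the example. The sample statistic is left out.

```python
from collections import Counter


def profile_column(values, n=10):
    """Compute dissector-style statistics for one column of string values.

    Assumption: "" and the literal "NA" count as null; the real tool's
    null handling may differ.
    """
    nulls = {"", "NA"}
    present = [v for v in values if v not in nulls]
    lengths = [len(v) for v in present]
    counts = Counter(present)
    # Characters outside [a-zA-Z0-9], collected across all non-null values.
    symbols = sorted(
        {ch for v in present for ch in v if not (ch.isascii() and ch.isalnum())}
    )
    return {
        "strlen": (min(lengths), max(lengths)) if lengths else (0, 0),
        "nnull": len(values) - len(present),
        "nrow": len(values),
        "nunique": len(counts),
        "nvalue": len(present),
        "freq": counts.most_common(n),
        "symbols": "".join(symbols),
    }


stats = profile_column(["a-1", "b_2", "", "a-1", "NA"], n=2)
print(stats)
```

With this input, nrow is 5, nnull is 2, and nvalue is 3, so nrow always equals nnull plus nvalue under these assumptions.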

Additionally, the output includes the following columns:

  • column: column name.
  • n: column order.
  • filename: name of the input file.
  • filetype: file type with which the file is associated (e.g., csv).
  • slice: the slice that the row represents.
  • timestamp: file modified date.
  • hash: MD5 hash of the input file.
  • size: file size in bytes.
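
The file-level columns can be derived from the file system alone. The following is a minimal sketch, assuming the file type is taken from the file extension and the timestamp from the file's modification time; the function name file_metadata is hypothetical and the real tool may derive these values differently.

```python
import hashlib
import os
import tempfile
from datetime import datetime


def file_metadata(path):
    """Return filename, filetype, timestamp, hash and size for a file."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
    info = os.stat(path)
    return {
        "filename": os.path.basename(path),
        # Assumption: the file type is the extension without the dot.
        "filetype": os.path.splitext(path)[1].lstrip(".").lower(),
        "timestamp": datetime.fromtimestamp(info.st_mtime),
        "hash": md5.hexdigest(),
        "size": info.st_size,
    }


# Usage on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "wb") as fh:
        fh.write(b"a,b\n1,2\n")
    meta = file_metadata(sample_path)
    print(meta)
```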

Running dissector -h prints the usage:

usage: dissector [-h] [-t--to {xlsx,json,csv}] [-s SEP]
                    [--slicers [SLICERS ...]] [-c [COLS ...]]
                    [--config CONFIG]
                    dir file

positional arguments:
  dir                   Input directory
  file                  Input file (for multiple files use wildcard)

optional arguments:
  -h, --help            show this help message and exit
  -t--to {xlsx,json,csv}
                        Dissected as one of: xlsx or json. Default is xlsx.
  -s SEP, --sep SEP     Column separator
  --slicers [SLICERS ...]
                        Informs how to slice data. Default is "" for no
                        slicing.
  -c [COLS ...], --cols [COLS ...]
                        If present, first row will not be used for column
                        names. No duplicates allowed.
  --config CONFIG       Config file for meta data. Defaults to
                        `.\config\dissector_config.yaml`
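
For readers unfamiliar with how these options combine, here is a reconstruction of a parser with the same shape, built with Python's argparse. It is a sketch based only on the usage text above, not the tool's source. In particular, splitting the output option into -t and --to is an assumption (the usage text prints it as -t--to), as is the default of [""] for --slicers.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the usage text (a reconstruction)."""
    parser = argparse.ArgumentParser(prog="dissector")
    parser.add_argument("dir", help="Input directory")
    parser.add_argument("file", help="Input file (for multiple files use wildcard)")
    # Assumption: -t and --to are two spellings of the same option.
    parser.add_argument("-t", "--to", choices=["xlsx", "json", "csv"], default="xlsx")
    parser.add_argument("-s", "--sep", help="Column separator")
    # nargs="*" lets one flag collect several values, e.g. --slicers "" "A == 'x'".
    parser.add_argument("--slicers", nargs="*", default=[""])
    parser.add_argument("-c", "--cols", nargs="*")
    parser.add_argument("--config", default=r".\config\dissector_config.yaml")
    return parser


args = build_parser().parse_args(
    [r"c:\temp", "myfile.text", "-s", ";", "--slicers", "", "COLUMN1 == 'VALUE'"]
)
print(args.sep, args.slicers)
```

Because --slicers and --cols accept a variable number of values, placing the positional dir and file arguments before them avoids having the positionals swallowed as option values.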

Before running the command, create a YAML config file and save it as .\config\dissector_config.yaml in the working directory.

---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false

Here are some examples:

Fetch *.csv from .\temp and dissect them with delimiter ,.

dissector .\temp *.csv -s ,

Fetch myfile.text from c:\temp and dissect the file with delimiter ;.

dissector c:\temp myfile.text -s ;

Fetch myfile.text from c:\temp and dissect the file with delimiter ;, producing one slice without a filter and one slice filtered on COLUMN1 == 'VALUE'.

dissector c:\temp myfile.text -s ; --slicers "" "COLUMN1 == 'VALUE'"

Fetch myfile.text from c:\temp and dissect the file with delimiter ;, producing one slice without a filter and one slice filtered on a column whose name contains a space (`COLUMN 1` == 'VALUE'). Wrap such column names in backticks.

dissector c:\temp myfile.text --sep ';' --slicers "" "`COLUMN 1` == 'VALUE'"
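
The slicer expressions, including the backticks around column names with spaces, resemble the syntax accepted by pandas DataFrame.query. Assuming that is how they are evaluated (the source does not confirm it), the sketch below shows what each slicer would select. The helper apply_slicer and the sample data are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "COLUMN1": ["VALUE", "OTHER", "VALUE"],
        "COLUMN 1": ["VALUE", "VALUE", "OTHER"],
    }
)


def apply_slicer(frame, slicer):
    """Return the rows selected by one slicer; "" selects every row."""
    # DataFrame.query rejects an empty expression, so handle "" explicitly.
    return frame if slicer == "" else frame.query(slicer)


slicers = ["", "COLUMN1 == 'VALUE'", "`COLUMN 1` == 'VALUE'"]
slices = {s: apply_slicer(df, s) for s in slicers}
for s, part in slices.items():
    print(repr(s), len(part))
```

Each slicer yields its own subset of rows, which would then be analyzed separately and labeled in the slice column of the output.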

Using the dissector Python library

TODO

morpher

usage: morpher [-h] [--sep SEP] [--replace] [--to {xlsx,json}] dir file

positional arguments:
  dir               Input directory
  file              Input file or files (wildcard)

optional arguments:
  -h, --help        show this help message and exit
  --sep SEP         Column separator
  --replace         Replace output file if it already exists
  --to {xlsx,json}  How to output dissected result: to_xls|to_json

banking

Download files

Source Distribution

smart_tools-0.8.tar.gz (10.4 kB)

Uploaded Source

Built Distribution

smart_tools-0.8-py3-none-any.whl (14.9 kB)

Uploaded Python 3

File details

Details for the file smart_tools-0.8.tar.gz.

File metadata

  • Download URL: smart_tools-0.8.tar.gz
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.9

File hashes

Hashes for smart_tools-0.8.tar.gz
Algorithm Hash digest
SHA256 ebc0d953aa4a9eead756fbecc97e8fe5c28e9855ac2ccf71b250b6f8ac36b2bf
MD5 3183a117fc5e56dc29cd03c07623b8c6
BLAKE2b-256 78b40b002b1959aa35a051b4bdcc0026d397481529597f1f7283a4024bdec430

File details

Details for the file smart_tools-0.8-py3-none-any.whl.

File metadata

  • Download URL: smart_tools-0.8-py3-none-any.whl
  • Size: 14.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.9

File hashes

Hashes for smart_tools-0.8-py3-none-any.whl
Algorithm Hash digest
SHA256 c000af1e3f165f70cec62efd2ef9ab21e9ca4e24c3f1a6d85cc2fbb4d0d57f22
MD5 27e35dfc684c5032425f4a334e86cca1
BLAKE2b-256 66bb342145413af724a6c98015c2ba93e762c31298a93642f48e0dfcc921641e
