A variety of smart tools to make analytics easy.

Dissector

Using the dissector command-line tool

Dissector is a command-line tool that analyzes each column in a delimited file. The input can be a single file or a directory containing multiple files. The output is a table with the following statistics for each column in the input file:

  • strlen: minimum and maximum string length of the column.
  • nnull: count of NAs and empty strings.
  • nrow: number of rows.
  • nunique: number of unique values.
  • nvalue: number of rows with values.
  • freq: frequency distribution of top n values. n is configured in dissector_config.yaml.
  • sample: a sample of top n values. n is configured in dissector_config.yaml.
  • symbols: non-alphanumeric characters, i.e. those not in [a-zA-Z0-9].
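The statistics listed above can be approximated in a few lines of standard-library Python. This is only a sketch of the idea, not dissector's actual implementation: the function names `profile_column` and `profile_text` are hypothetical, and it assumes that only empty strings and the literal `NA` count as null and that nulls are excluded from `nunique`.

```python
import csv
import io
from collections import Counter


def profile_column(values, top_n=10):
    """Compute dissector-style statistics for one column of string values."""
    # Assumption: only empty strings and the literal "NA" are treated as null.
    is_null = [v == "" or v == "NA" for v in values]
    present = [v for v, null in zip(values, is_null) if not null]
    lengths = [len(v) for v in present]
    counts = Counter(present)
    # Characters outside [a-zA-Z0-9], collected across all non-null values.
    symbols = sorted(
        {ch for v in present for ch in v if not (ch.isascii() and ch.isalnum())}
    )
    return {
        "strlen": (min(lengths), max(lengths)) if lengths else (0, 0),
        "nnull": sum(is_null),
        "nrow": len(values),
        "nunique": len(counts),
        "nvalue": len(present),
        "freq": counts.most_common(top_n),
        "symbols": "".join(symbols),
    }


def profile_text(text, sep=",", top_n=10):
    """Profile every column of delimited text whose first row is a header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]
    return {
        name: profile_column([row[i] for row in body], top_n)
        for i, name in enumerate(header)
    }


sample = "id;city\n1;Oslo\n2;\n3;New-York\n4;Oslo\n"
print(profile_text(sample, sep=";")["city"])
```

The real tool may define null values, uniqueness, and sampling differently, so treat the numbers from this sketch as illustrative only.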

Additionally, the output includes the following columns:

  • column: column name.
  • n: column order.
  • filename: name of the input file.
  • filetype: file type associated with the file (e.g., csv).
  • slice: the slice that the row represents.
  • timestamp: file modified date.
  • hash: MD5 hash of the input file.
  • size: file size in bytes.

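The file-level columns (filename, filetype, timestamp, hash, size) can be derived with the Python standard library alone. The sketch below is an illustration under assumptions, not dissector's code: `file_metadata` is a hypothetical helper, the file type is guessed from the extension, and the timestamp is formatted as ISO 8601, which may differ from the tool's output.

```python
import hashlib
import os
from datetime import datetime


def file_metadata(path, chunk_size=65536):
    """Return dissector-style metadata for a single input file."""
    md5 = hashlib.md5()
    # Hash the file in chunks so large inputs are not read into memory at once.
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
    stat = os.stat(path)
    name = os.path.basename(path)
    return {
        "filename": name,
        # Assumption: the file type is simply the lower-cased extension.
        "filetype": os.path.splitext(name)[1].lstrip(".").lower(),
        # Assumption: modified time rendered in local time as ISO 8601.
        "timestamp": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        "hash": md5.hexdigest(),
        "size": stat.st_size,
    }
```

Calling `file_metadata("myfile.csv")` on an existing file returns a dictionary with the five keys above.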
usage: dissector [-h] [-t--to {xlsx,json,csv}] [-s SEP]
                    [--slicers [SLICERS ...]] [-c [COLS ...]]
                    [--config CONFIG]
                    dir file

positional arguments:
  dir                   Input directory
  file                  Input file (for multiple files use wildcard)

optional arguments:
  -h, --help            show this help message and exit
  -t--to {xlsx,json,csv}
                        Dissected as one of: xlsx or json. Default is xlsx.
  -s SEP, --sep SEP     Column separator
  --slicers [SLICERS ...]
                        Informs how to slice data. Default is "" for no
                        slicing.
  -c [COLS ...], --cols [COLS ...]
                        If present, first row will not be used for column
                        names. No duplicates allowed.
  --config CONFIG       Config file for meta data. Defaults to
                        `.\config\dissector_config.yaml`

Before running the command, create a YAML config file and save it as .\config\dissector_config.yaml in the working directory.

---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false
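If you prefer to create the config file from a script, the following sketch writes the sample settings above to the expected location. `write_default_config` is a hypothetical helper, not part of smart_tools; it only copies the text shown in this README and leaves an existing file untouched.

```python
from pathlib import Path

# Settings copied verbatim from the sample configuration in this README.
DEFAULT_CONFIG = """\
---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false
"""


def write_default_config(base_dir="."):
    """Create <base_dir>/config/dissector_config.yaml unless it already exists."""
    path = Path(base_dir) / "config" / "dissector_config.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(DEFAULT_CONFIG, encoding="utf-8")
    return path
```

Run `write_default_config()` from the directory in which you intend to call dissector, or pass a different file to the tool with --config.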

Here are some examples:

Fetch *.csv from .\temp and dissect them with delimiter ,.

dissector .\temp *.csv -s ,

Fetch myfile.text from c:\temp and dissect the file with delimiter ;.

dissector c:\temp myfile.text -s ;

Fetch myfile.text from c:\temp and dissect the file with delimiter ;, producing two slices: one with no filter and one filtered on COLUMN1 == 'VALUE'.

dissector c:\temp myfile.text -s ; --slicers "" "COLUMN1 == 'VALUE'"

Fetch myfile.text from c:\temp and dissect the file with delimiter ;, producing two slices: one with no filter and one filtered on a column whose name contains a space. Wrap such a column name in backticks, as in `COLUMN 1` == 'VALUE'.

dissector c:\temp myfile.text --sep ';' --slicers "" "`COLUMN 1` == 'VALUE'"
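Quoting separators and slicer expressions differs between shells, so it can be easier to call the tool from Python with an argument list. The sketch below is one possible way to do that and uses only the positional arguments and the --sep and --slicers options shown in the usage text; `build_dissector_args` and `run_dissector` are hypothetical helpers, and the code assumes dissector is installed and on PATH.

```python
import subprocess


def build_dissector_args(directory, pattern, sep=None, slicers=None):
    """Build the argument list for a dissector call without shell quoting."""
    args = ["dissector", directory, pattern]
    if sep is not None:
        args += ["--sep", sep]
    if slicers:
        # Each slicer is passed as its own argument; "" means no filter.
        args += ["--slicers", *slicers]
    return args


def run_dissector(directory, pattern, sep=None, slicers=None):
    """Run dissector and raise CalledProcessError if it exits with an error."""
    args = build_dissector_args(directory, pattern, sep, slicers)
    return subprocess.run(args, check=True)


args = build_dissector_args(
    r"c:\temp",
    "myfile.text",
    sep=";",
    slicers=["", "`COLUMN 1` == 'VALUE'"],
)
print(args)
```

Because no shell is involved, the semicolon, the empty string, and the backticks reach dissector exactly as written.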

Using the dissector Python library

TODO

morpher

usage: morpher [-h] [--sep SEP] [--replace] [--to {xlsx,json}] dir file

positional arguments:
  dir               Input directory
  file              Input file or files (wildcard)

optional arguments:
  -h, --help        show this help message and exit
  --sep SEP         Column separator
  --replace         Replace output file if it already exists
  --to {xlsx,json}  How to output dissected result: to_xls|to_json

banking
