A variety of smart tools to make analytics easy.

Dissector

Using the dissector command-line tool

Dissector is a command-line tool that analyzes each column in a delimited file. The input can be a single file or a directory containing multiple files. The output is a table with the following statistics for each column in the input file:

  • strlen: minimum and maximum string length of the column.
  • nnull: count of NAs and empty strings.
  • nrow: number of rows.
  • nunique: number of unique values.
  • nvalue: number of rows with values.
  • freq: frequency distribution of top n values. n is configured in dissector_config.yaml.
  • sample: a sample of top n values. n is configured in dissector_config.yaml.
  • symbols: non-alphanumeric characters, i.e. characters that are not in [a-zA-Z0-9].
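
To make the statistics above concrete, here is a minimal standard-library sketch of how they could be computed for one column. It is not dissector's actual implementation. The names `profile_column`, `profile_csv` and `NULL_TOKENS` are hypothetical, and the choices to treat only "" and "NA" as null and to count unique values among non-null values only are assumptions.

```python
import csv
import io
from collections import Counter

# Assumption: these strings count as null; the real tool may use another set.
NULL_TOKENS = {"", "NA"}


def profile_column(values, top_n=10):
    """Compute dissector-style statistics for one column of string values."""
    non_null = [v for v in values if v not in NULL_TOKENS]
    lengths = [len(v) for v in values]
    counts = Counter(non_null)
    # Any character outside [a-zA-Z0-9] is reported as a symbol.
    symbols = sorted({ch for v in non_null for ch in v
                      if not (ch.isascii() and ch.isalnum())})
    return {
        "strlen": (min(lengths), max(lengths)) if lengths else (0, 0),
        "nnull": len(values) - len(non_null),
        "nrow": len(values),
        "nunique": len(counts),  # assumption: nulls are not counted as a value
        "nvalue": len(non_null),
        "freq": counts.most_common(top_n),
        "symbols": "".join(symbols),
    }


def profile_csv(text, sep=","):
    """Profile every column of delimited text whose first row is the header."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]
    return {name: profile_column([row[i] for row in body])
            for i, name in enumerate(header)}


sample = "id;city\n1;New-York\n2;\n3;Oslo\n4;Oslo\n"
report = profile_csv(sample, sep=";")
print(report["city"])
```

In this sketch `top_n` plays the role of the n that dissector reads from dissector_config.yaml.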

Additionally, the output includes the following columns:

  • column: column name.
  • n: column order.
  • filename: name of the input file.
  • filetype: file type associated with the file (e.g., csv).
  • slice: the slice that the row represents.
  • timestamp: file modified date.
  • hash: MD5 hash of the input file.
  • size: file size in bytes.

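
The file-level fields in this list can be derived with the standard library alone. The sketch below is illustrative and not dissector's own code: `file_metadata` is a hypothetical helper, and deriving filetype from the file extension is an assumption.

```python
import hashlib
import os
import tempfile
from datetime import datetime


def file_metadata(path):
    """Collect file-level fields similar to dissector's output columns."""
    with open(path, "rb") as fh:
        digest = hashlib.md5(fh.read()).hexdigest()
    stat = os.stat(path)
    name = os.path.basename(path)
    return {
        "filename": name,
        # Assumption: the file type is taken from the extension.
        "filetype": os.path.splitext(name)[1].lstrip(".").lower(),
        "timestamp": datetime.fromtimestamp(stat.st_mtime),  # modified date
        "hash": digest,
        "size": stat.st_size,  # bytes
    }


# Demonstrate on a small throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "wb") as fh:
        fh.write(b"a,b\n1,2\n")
    meta = file_metadata(demo_path)

print(meta["filename"], meta["filetype"], meta["size"])
```
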
usage: dissector [-h] [-t--to {xlsx,json,csv}] [-s SEP]
                    [--slicers [SLICERS ...]] [-c [COLS ...]]
                    [--config CONFIG]
                    dir file

positional arguments:
  dir                   Input directory
  file                  Input file (for multiple files use wildcard)

optional arguments:
  -h, --help            show this help message and exit
  -t--to {xlsx,json,csv}
                        Dissected as one of: xlsx or json. Default is xlsx.
  -s SEP, --sep SEP     Column separator
  --slicers [SLICERS ...]
                        Informs how to slice data. Default is "" for no
                        slicing.
  -c [COLS ...], --cols [COLS ...]
                        If present, first row will not be used for column
                        names. No duplicates allowed.
  --config CONFIG       Config file for meta data. Defaults to
                        `.\config\dissector_config.yaml`

Before running the command, make sure a YAML config file exists at .\config\dissector_config.yaml, relative to the working directory (or pass a different path with --config).

---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false
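
If you prefer to create the config file from a script, a small helper like the following could write the sample configuration to the expected location. This is a sketch only: `ensure_config` and `DEFAULT_CONFIG` are hypothetical names, and the file contents are copied from the sample above.

```python
import tempfile
from pathlib import Path

# Contents copied from the sample configuration in this README.
DEFAULT_CONFIG = """\
---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false
"""


def ensure_config(workdir="."):
    """Create config/dissector_config.yaml under workdir if it is missing."""
    path = Path(workdir) / "config" / "dissector_config.yaml"
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(DEFAULT_CONFIG, encoding="utf-8")
    return path


# Demonstrate in a temporary directory instead of the real working directory.
with tempfile.TemporaryDirectory() as tmp:
    config_path = ensure_config(tmp)
    created = config_path.exists()
    written = config_path.read_text(encoding="utf-8")

print(config_path.name, created)
```

Calling `ensure_config()` with no argument targets the current working directory, and an existing file is left untouched.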

Here are some examples:

Fetch all *.csv files from .\temp and dissect them using , as the delimiter.

dissector .\temp *.csv -s ,

Fetch myfile.text from c:\temp and dissect the file with delimiter ;.

dissector c:\temp myfile.text -s ;

Fetch myfile.text from c:\temp and dissect the file with delimiter ;, producing two slices: one unfiltered ("") and one filtered on COLUMN1 == 'VALUE'.

dissector c:\temp myfile.text -s ; --slicers "" "COLUMN1 == 'VALUE'"

Fetch myfile.text from c:\temp and dissect the file with delimiter ;, producing two slices: one unfiltered ("") and one filtered on a column whose name contains a space. Wrap such a column name in backticks: `COLUMN 1` == 'VALUE'.

dissector c:\temp myfile.text --sep ';' --slicers "" "`COLUMN 1` == 'VALUE'"
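
Characters such as ;, backticks and quotes are interpreted differently by different shells. One way to sidestep shell quoting is to call the tool from Python with an argument list. The sketch below uses only the flags shown in the usage text; `build_dissector_args` and `run_dissector` are hypothetical helpers, not part of smart_tools.

```python
import shutil
import subprocess


def build_dissector_args(directory, pattern, sep=None, slicers=None):
    """Build the dissector argument list from the flags in the usage text."""
    args = ["dissector", directory, pattern]
    if sep is not None:
        args += ["-s", sep]
    if slicers:
        # Each slicer is passed as its own argument; "" means no filter.
        args += ["--slicers", *slicers]
    return args


def run_dissector(args):
    """Run the command, assuming the dissector executable is on PATH."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError("dissector is not installed or not on PATH")
    return subprocess.run(args, check=True)


args = build_dissector_args(
    r"c:\temp", "myfile.text", sep=";",
    slicers=["", "`COLUMN 1` == 'VALUE'"],
)
print(args)
# run_dissector(args)  # uncomment once dissector is installed
```

Because no shell is involved, each list element reaches the tool exactly as written.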

Using the dissector Python library

TODO

morpher

usage: morpher [-h] [--sep SEP] [--replace] [--to {xlsx,json}] dir file

positional arguments:
  dir               Input directory
  file              Input file or files (wildcard)

optional arguments:
  -h, --help        show this help message and exit
  --sep SEP         Column separator
  --replace         Replace output file if it already exists
  --to {xlsx,json}  How to output dissected result: to_xls|to_json

banking
