A variety of smart tools to make analytics easy.

Dissector

Using the dissector command-line tool

Dissector is a command-line tool that analyzes each column in a delimited file. The input can be a single file or a directory with multiple files. The output is a table with the following statistics for each column in the input file:

  • strlen: minimum and maximum string length of the column.
  • nnull: count of NAs and empty strings.
  • nrow: number of rows.
  • nunique: number of unique values.
  • nvalue: number of rows with values.
  • freq: frequency distribution of top n values. n is configured in dissector_config.yaml.
  • sample: a sample of top n values. n is configured in dissector_config.yaml.
  • symbols: non-alphanumeric characters, that is, characters not in [a-zA-Z0-9].
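
To make the statistics above concrete, here is a minimal sketch in plain Python that computes most of them for a single column. It is not the dissector implementation: the function name profile_column, the choice to treat "" and "NA" as null, and the choice to measure strlen over non-null values only are all assumptions made for illustration. The sample field is left out.

```python
import csv
import io
from collections import Counter


def profile_column(values, top_n=10):
    """Compute dissector-style statistics for one column of string values.

    Hypothetical helper: null handling and strlen scope are assumptions.
    """
    # Assumption: empty strings and the literal "NA" count as null.
    non_null = [v for v in values if v not in ("", "NA")]
    lengths = [len(v) for v in non_null]
    counts = Counter(non_null)
    # Characters outside [a-zA-Z0-9], collected across all non-null values.
    symbols = sorted(
        {ch for v in non_null for ch in v if not (ch.isascii() and ch.isalnum())}
    )
    return {
        "strlen": (min(lengths), max(lengths)) if lengths else (0, 0),
        "nnull": len(values) - len(non_null),
        "nrow": len(values),
        "nunique": len(counts),
        "nvalue": len(non_null),
        "freq": counts.most_common(top_n),
        "symbols": "".join(symbols),
    }


# Small in-memory file with ';' as the delimiter.
data = "id;city\n1;New York\n2;\n3;Sao-Paulo\n4;New York\n"
rows = list(csv.DictReader(io.StringIO(data), delimiter=";"))
stats = profile_column([row["city"] for row in rows])
print(stats)
```

For the city column this reports 4 rows, 1 null, 3 values, 2 unique values, and the symbols " " and "-".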

The output also includes the following columns:

  • column: column name.
  • n: column order.
  • filename: name of the input file.
  • filetype: file type of the input file (e.g., csv).
  • slice: slice that the row represents.
  • timestamp: file modified date.
  • hash: md5 hash from the input file.
  • size: filesize in bytes.
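
The file-level fields in this list can be gathered with the standard library alone. The sketch below shows one way to do it; the function name file_metadata is hypothetical, and deriving filetype from the file extension is an assumption, not something the tool is confirmed to do.

```python
import hashlib
import os
import tempfile
from datetime import datetime


def file_metadata(path):
    """Collect file-level fields for one input file (illustrative helper)."""
    md5 = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
    info = os.stat(path)
    name = os.path.basename(path)
    return {
        "filename": name,
        # Assumption: the file type is taken from the extension.
        "filetype": os.path.splitext(name)[1].lstrip("."),
        "timestamp": datetime.fromtimestamp(info.st_mtime),
        "hash": md5.hexdigest(),
        "size": info.st_size,
    }


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "myfile.csv")
    with open(path, "wb") as fh:
        fh.write(b"a,b\n1,2\n")
    meta = file_metadata(path)
    print(meta)
```
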
usage: dissector [-h] [-t--to {xlsx,json,csv}] [-s SEP]
                    [--slicers [SLICERS ...]] [-c [COLS ...]]
                    [--config CONFIG]
                    dir file

positional arguments:
  dir                   Input directory
  file                  Input file (for multiple files use wildcard)

optional arguments:
  -h, --help            show this help message and exit
  -t--to {xlsx,json,csv}
                        Dissected as one of: xlsx or json. Default is xlsx.
  -s SEP, --sep SEP     Column separator
  --slicers [SLICERS ...]
                        Informs how to slice data. Default is "" for no
                        slicing.
  -c [COLS ...], --cols [COLS ...]
                        If present, first row will not be used for column
                        names. No duplicates allowed.
  --config CONFIG       Config file for meta data. Defaults to
                        `.\config\dissector_config.yaml`

Before running the command, create a YAML config file and save it as .\config\dissector_config.yaml in the working directory, or pass a different path with --config.

---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false
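
If you prefer to create the config file from a script, the following sketch writes the settings shown above to config/dissector_config.yaml under a chosen working directory. The helper name write_default_config is hypothetical and not part of smart_tools; the demo writes into a temporary directory so it does not touch an existing config.

```python
import tempfile
from pathlib import Path

# The same settings as the YAML shown above.
DEFAULT_CONFIG = """\
---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false
"""


def write_default_config(workdir="."):
    """Create config/dissector_config.yaml under workdir unless it exists."""
    target = Path(workdir) / "config" / "dissector_config.yaml"
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        # Never overwrite a config the user has already customized.
        target.write_text(DEFAULT_CONFIG, encoding="utf-8")
    return target


# Demo in a temporary directory; pass "." to use the current directory.
with tempfile.TemporaryDirectory() as tmp:
    created = write_default_config(tmp)
    text = created.read_text(encoding="utf-8")
    print(text)
```
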

Here are some examples:

Fetch *.csv from .\temp and dissect them with delimiter ,.

dissector .\temp *.csv -s ,
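
The dir and file arguments together select the input files. As a rough illustration of that matching, and not the tool's actual code, the sketch below resolves a wildcard against a directory with pathlib; the function name find_inputs is hypothetical. On shells that expand wildcards themselves, the pattern will likely need quoting so that it reaches the tool unchanged.

```python
import tempfile
from pathlib import Path


def find_inputs(directory, pattern):
    """Return the files in directory whose names match the wildcard pattern."""
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())


# Demo: two CSV files and one unrelated file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.csv", "b.csv", "notes.txt"):
        (Path(tmp) / name).write_text("x,y\n1,2\n", encoding="utf-8")
    matches = [p.name for p in find_inputs(tmp, "*.csv")]
    print(matches)
```
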

Fetch myfile.text from c:\temp and dissect the file with delimiter ;.

dissector c:\temp myfile.text -s ;

Fetch myfile.text from c:\temp and dissect the file with delimiter ;, producing two slices: one without a filter and one filtered on COLUMN1 == 'VALUE'.

dissector c:\temp myfile.text -s ; --slicers "" "COLUMN1 == 'VALUE'"

Fetch myfile.text from c:\temp and dissect the file with delimiter ;, producing two slices: one without a filter and one filtered on a column whose name contains a space. Wrap such a column name in backticks: `COLUMN 1` == 'VALUE'.

dissector c:\temp myfile.text -s ; --slicers "" "`COLUMN 1` == 'VALUE'"
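
The slicer expressions use the same syntax as pandas DataFrame.query, including backticks around column names that contain spaces. Assuming the slicers are evaluated that way (the source does not confirm it), the sketch below shows how a list of slicers could split a table. The function name apply_slicers is hypothetical. The empty slicer is handled separately because DataFrame.query rejects an empty expression.

```python
import pandas as pd


def apply_slicers(df, slicers):
    """Return a dict mapping each slicer expression to the rows it selects."""
    slices = {}
    for expr in slicers:
        # An empty expression means "no filter": keep every row.
        slices[expr] = df if expr == "" else df.query(expr)
    return slices


df = pd.DataFrame(
    {"COLUMN 1": ["VALUE", "OTHER", "VALUE"], "COLUMN2": ["a", "b", "c"]}
)
slices = apply_slicers(df, ["", "`COLUMN 1` == 'VALUE'"])
for expr, part in slices.items():
    print(repr(expr), len(part))
```

Here the unfiltered slice keeps all 3 rows and the filtered slice keeps the 2 rows where COLUMN 1 equals VALUE.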

Using the dissector Python library

TODO

morpher

usage: morpher [-h] [--sep SEP] [--replace] [--to {xlsx,json}] dir file

positional arguments:
  dir               Input directory
  file              Input file or files (wildcard)

optional arguments:
  -h, --help        show this help message and exit
  --sep SEP         Column separator
  --replace         Replace output file if it already exists
  --to {xlsx,json}  How to output dissected result: to_xls|to_json

banking
