A variety of smart tools to make analytics easy.
Dissector
Using the dissector command-line tool
Dissector is a command-line tool that runs analysis on each column in a delimited file. The input can be a single file or a directory with multiple files. The output is a table with the following statistics for each column in the input file:
- strlen: minimum and maximum string length of the column.
- nnull: count of NAs and empty strings.
- nrow: number of rows.
- nunique: number of unique values.
- nvalue: number of rows with values.
- freq: frequency distribution of the top n values, where n is configured in `dissector_config.yaml`.
- sample: a sample of the top n values, where n is configured in `dissector_config.yaml`.
- symbols: non-alphanumeric characters, i.e. characters not in [a-zA-Z0-9].
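The statistics above can be approximated in plain Python. The sketch below is a hypothetical illustration, not dissector's implementation: it reads delimited text with the standard `csv` module, treats empty strings and the literal `NA` as missing (an assumption), and uses a hard-coded `NSAMPLE` in place of the value from `dissector_config.yaml`. The names `column_stats` and `dissect_text` are made up for this example.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical stand-in for the n configured in dissector_config.yaml.
NSAMPLE = 10
# Any character outside [a-zA-Z0-9] counts as a symbol.
NON_ALNUM = re.compile(r"[^a-zA-Z0-9]")


def column_stats(values, nsample=NSAMPLE):
    """Compute dissector-style statistics for one column of string values."""
    # Assumption: empty strings and the literal "NA" are the missing markers.
    missing = [v for v in values if v in ("", "NA")]
    present = [v for v in values if v not in ("", "NA")]
    lengths = [len(v) for v in values]
    counts = Counter(values)
    symbols = sorted({ch for v in values for ch in NON_ALNUM.findall(v)})
    return {
        "strlen": (min(lengths), max(lengths)) if lengths else (0, 0),
        "nnull": len(missing),
        "nrow": len(values),
        "nunique": len(counts),
        "nvalue": len(present),
        "freq": counts.most_common(nsample),  # (value, count) pairs
        "sample": [v for v, _ in counts.most_common(nsample)],
        "symbols": "".join(symbols),
    }


def dissect_text(text, sep=","):
    """Return {column name: stats} for delimited text with a header row."""
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))
    header, body = rows[0], rows[1:]
    # Short rows are padded with empty strings so every column has nrow values.
    return {
        name: column_stats([row[i] if i < len(row) else "" for row in body])
        for i, name in enumerate(header)
    }
```

For example, `dissect_text("id,city\n1,New York\n2,\n3,New York\n")["city"]` reports one missing value, two distinct values, and a space as the only symbol.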
The table also contains the following columns:
- column: column name.
- n: column order.
- filename: name of the input file.
- filetype: file type associated with the file (e.g., csv).
- slice: the slice that the row represents (see `--slicers`).
- timestamp: file modification date.
- hash: MD5 hash of the input file.
- size: file size in bytes.
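The per-file columns can be derived from the file system alone. This is a minimal sketch under assumptions: `file_metadata` is a hypothetical helper, the file type is taken from the file extension, and the timestamp is the local modification time reported by the operating system.

```python
import hashlib
from datetime import datetime
from pathlib import Path


def file_metadata(path):
    """Sketch of the per-file columns: filename, filetype, timestamp, hash, size."""
    p = Path(path)
    md5 = hashlib.md5()
    with p.open("rb") as fh:
        # Hash in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(65536), b""):
            md5.update(chunk)
    stat = p.stat()
    return {
        "filename": p.name,
        # Assumption: the file type is the lower-cased extension without the dot.
        "filetype": p.suffix.lstrip(".").lower(),
        "timestamp": datetime.fromtimestamp(stat.st_mtime),
        "hash": md5.hexdigest(),
        "size": stat.st_size,
    }
```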
usage: dissector [-h] [-t--to {xlsx,json,csv}] [-s SEP]
                 [--slicers [SLICERS ...]] [-c [COLS ...]]
                 [--config CONFIG]
                 dir file

positional arguments:
  dir                   Input directory
  file                  Input file (for multiple files use wildcard)

optional arguments:
  -h, --help            show this help message and exit
  -t--to {xlsx,json,csv}
                        Dissected as one of: xlsx or json. Default is xlsx.
  -s SEP, --sep SEP     Column separator
  --slicers [SLICERS ...]
                        Informs how to slice data. Default is "" for no
                        slicing.
  -c [COLS ...], --cols [COLS ...]
                        If present, first row will not be used for column
                        names. No duplicates allowed.
  --config CONFIG       Config file for meta data. Defaults to
                        `.\config\dissector_config.yaml`
Before running the command, create a YAML config file and save it as `.\config\dissector_config.yaml` in the working directory, for example:
---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false
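Rather than creating the file by hand, a short script can write it. This sketch assumes the layout described above (a `config` folder in the working directory); `write_default_config` is a hypothetical helper, and the text it writes is the example configuration above with the `read_csv` options indented under their parent key. An existing config file is left untouched.

```python
from pathlib import Path

# The example configuration, with read_csv options nested under read_csv.
CONFIG_YAML = """\
---
nsample: 10
read_csv:
  skipheader: 0
  skipfooter: 0
  engine: python
  encoding: 'utf-8'
  quotechar: '"'
  on_bad_lines: 'warn'
  dtype: 'str'
  keep_default_na: false
"""


def write_default_config(workdir="."):
    """Create config/dissector_config.yaml under workdir unless it already exists."""
    # pathlib uses the platform separator, so this is config\... on Windows.
    path = Path(workdir) / "config" / "dissector_config.yaml"
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text(CONFIG_YAML, encoding="utf-8")
    return path
```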
Here are some examples:
Fetch `*.csv` from `.\temp` and dissect the files with delimiter `,`:
dissector .\temp *.csv -s ,
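The two positional arguments work together: `dir` names the folder and `file` is a file name or wildcard inside it. A hypothetical sketch of that expansion with `pathlib`, assuming only files directly inside `dir` are matched (not subfolders); `resolve_inputs` is not part of dissector.

```python
from pathlib import Path


def resolve_inputs(directory, pattern):
    """Return the files in `directory` matching `pattern`, sorted by name."""
    # glob() without "**" only looks at the top level of the directory.
    return sorted(p for p in Path(directory).glob(pattern) if p.is_file())
```

On a POSIX shell such as bash, quote the pattern (`'*.csv'`) so the shell does not expand it before `dissector` receives it; `cmd.exe` passes `*.csv` through unchanged.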
Fetch `myfile.text` from `c:\temp` and dissect the file with delimiter `;`:
dissector c:\temp myfile.text -s ;
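`-s` sets the column separator. Assuming parsing behaves like Python's `csv` module with `quotechar='"'` (mirroring the `quotechar` setting in the config file), a separator inside a quoted field is not treated as a column break. `read_delimited` is a hypothetical helper for illustration.

```python
import csv
import io


def read_delimited(text, sep):
    """Split delimited text into a header row and data rows."""
    # Fields wrapped in double quotes may contain the separator itself.
    reader = csv.reader(io.StringIO(text), delimiter=sep, quotechar='"')
    header, *rows = list(reader)
    return header, rows
```

For example, `read_delimited('id;name\n1;"Smith; John"\n', ";")` keeps `Smith; John` as a single value.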
Fetch `myfile.text` from `c:\temp` and dissect the file with delimiter `;`, slicing the data once without a filter and once with the filter `COLUMN1 == 'VALUE'`:
dissector c:\temp myfile.text -s ; --slicers "" "COLUMN1 == 'VALUE'"
Fetch `myfile.text` from `c:\temp` and dissect the file with delimiter `;`, slicing the data once without a filter and once with a filter on a column whose name contains a space. Wrap such a column name in backticks: `` `COLUMN 1` == 'VALUE' ``.
dissector c:\temp myfile.text --sep ";" --slicers "" "`COLUMN 1` == 'VALUE'"
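Slicing can be pictured as running the same analysis once per subset of rows and recording which subset each output row describes in the `slice` column. The slicer strings look like pandas `DataFrame.query` expressions, where backticks quote column names containing spaces (for example `` df.query("`COLUMN 1` == 'VALUE'") ``); whether dissector uses `query` internally is an assumption. The stdlib sketch below substitutes Python predicates for the expression strings, and `slice_rows` is a hypothetical name.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, str]
# A predicate decides whether a row belongs to a slice; None means "no filter".
Slicer = Optional[Callable[[Row], bool]]


def slice_rows(rows: List[Row], slicers: Dict[str, Slicer]) -> Iterator[Tuple[str, List[Row]]]:
    """Yield (slice label, rows in that slice) for each slicer, in order."""
    for label, predicate in slicers.items():
        if predicate is None:
            yield label, list(rows)
        else:
            yield label, [row for row in rows if predicate(row)]
```

Keying the slices by their original label makes it easy to fill the `slice` column with the same text the user passed to `--slicers`.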
Using the dissector Python library
TODO
morpher
usage: morpher [-h] [--sep SEP] [--replace] [--to {xlsx,json}] dir file

positional arguments:
  dir                Input directory
  file               Input file or files (wildcard)

optional arguments:
  -h, --help         show this help message and exit
  --sep SEP          Column separator
  --replace          Replace output file if it already exists
  --to {xlsx,json}   How to output dissected result: to_xls|to_json
banking
Download files
Source distribution: smart_tools-0.9.tar.gz
Built distribution: smart_tools-0.9-py3-none-any.whl
File details
Details for the file smart_tools-0.9.tar.gz.
File metadata
- Download URL: smart_tools-0.9.tar.gz
- Upload date:
- Size: 11.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | cc25db79eb99445cff44e34d6599d83aa8299b68440e05a239db3efb15e32460
MD5 | 4904a287b8386589c91fb618bcc58fd8
BLAKE2b-256 | 8efa2a67f076eec0289189f1243971ec7bae7fac0b3e7752e55227fd6ffd21ee
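To check a download against the digests listed here, compute the file's SHA256 locally. A minimal sketch with the standard `hashlib` module; `sha256_of` and `verify` are hypothetical helpers, not part of smart_tools.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an intact copy of the source distribution, `verify("smart_tools-0.9.tar.gz", "cc25db79eb99445cff44e34d6599d83aa8299b68440e05a239db3efb15e32460")` should return `True`.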
File details
Details for the file smart_tools-0.9-py3-none-any.whl.
File metadata
- Download URL: smart_tools-0.9-py3-none-any.whl
- Upload date:
- Size: 16.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0662f437eba666afca7f58a253d4883834f118c9c0e1d719c068ef37050fbb6c
MD5 | 1e36b61d9b7d2d09669a29982cb22676
BLAKE2b-256 | 0e5cd1d7e3b7d5fed21d946a23153176be86d46c251b7c0388542de47282ae83