Automatic generation of codebooks from dataframes.
Codebooks
Automatically generate codebooks from dataframes. Includes methods to:
- Infer variable type (as unique key, indicator, categorical, or continuous).
- Summarize values with histograms and KDEs.
- Generate a self-contained HTML report (may be extended to PDF or other formats in the future).
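To make the four variable types concrete, here is a minimal sketch of how a column might be classified. The function name, the thresholds, and the order of the checks are assumptions made for illustration; they are not taken from the codebooks source, whose actual rules may differ.

```python
from typing import Any, Sequence


def infer_variable_type(values: Sequence[Any], max_categories: int = 20) -> str:
    """Classify a column as 'indicator', 'unique key', 'categorical', or 'continuous'.

    Hypothetical heuristic for illustration only; not the rules used by codebooks.
    """
    # Ignore missing values when counting distinct entries.
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-missing values")
    distinct = set(present)

    # Assumption: at most two distinct values (0/1, yes/no) means an indicator.
    if len(distinct) <= 2:
        return "indicator"

    # Assumption: a key never repeats a value and is not a floating-point measurement.
    has_float = any(isinstance(v, float) for v in present)
    if len(distinct) == len(present) and not has_float:
        return "unique key"

    # Assumption: numeric columns with many distinct values are continuous.
    numeric = all(isinstance(v, (int, float)) for v in present)
    if numeric and len(distinct) > max_categories:
        return "continuous"

    # Everything else is treated as categorical.
    return "categorical"


if __name__ == "__main__":
    print(infer_variable_type([101, 102, 103, 104]))
    print(infer_variable_type([0, 1, 1, 0, None]))
    print(infer_variable_type(["red", "blue", "green", "red"]))
```

The checks run from the most specific type to the most general, so a two-valued column is reported as an indicator even when each value happens to appear only once.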
Usage:
codebooks -o output.html input.csv
Example
Adding variable descriptions
You can specify a CSV file that maps variable names to descriptions using:
codebooks --desc descriptions.csv -o output.html input.csv
The CSV file is expected to have two columns (variable, description).
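As a sketch of how such a file could be produced, the snippet below builds a two-column CSV with Python's standard csv module. The variable names and descriptions are made up, and writing a header row named variable,description is an assumption; check it against what the tool actually accepts.

```python
import csv
import io


def descriptions_to_csv(descriptions: dict) -> str:
    """Return CSV text with one (variable, description) row per entry."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    # Assumption: the file starts with a header row naming the two columns.
    writer.writerow(["variable", "description"])
    writer.writerows(descriptions.items())
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical variables, for illustration only.
    example = {
        "id": "Unique participant identifier",
        "enrolled": "1 if the participant enrolled, 0 otherwise",
        "age": "Age in years at intake",
    }
    # newline="" keeps the csv module's own line endings intact.
    with open("descriptions.csv", "w", newline="", encoding="utf-8") as handle:
        handle.write(descriptions_to_csv(example))
```

The resulting descriptions.csv can then be passed to the --desc option shown above. Using csv.writer rather than joining strings by hand means descriptions that contain commas are quoted correctly.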
License
3-Clause BSD (see LICENSE)
Tests
The test/ subdirectory contains a script to generate a synthetic data set, an integration test for the codebooks package, and a benchmark script used to test performance optimizations. You can run these with:
cd test
python dataset.py
codebooks --desc desc.csv dataset.csv
codebooks --desc desc.csv --parquet dataset.parquet
python benchmark.py
Authors
Mark Howison
http://mark.howison.org