Automatic generation of codebooks from dataframes.
Codebooks
Automatically generate codebooks from dataframes. Includes methods to:
- Infer variable type (as unique key, indicator, categorical, or continuous).
- Summarize values with histograms and KDEs.
- Generate a self-contained HTML report (may be extended to PDF or other formats in the future).
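To make the type-inference step in the list above concrete, here is a minimal sketch of one way a column could be sorted into the four types. It is an illustration only, not the codebooks implementation: the function name `infer_variable_type`, the `max_categories` threshold, and the ordering of the rules are all assumptions.

```python
from typing import Hashable, Sequence


def infer_variable_type(values: Sequence[Hashable], max_categories: int = 20) -> str:
    """Classify a column as 'unique key', 'indicator', 'categorical', or 'continuous'.

    Hypothetical heuristic for illustration; the real package may use different rules.
    """
    present = [v for v in values if v is not None]  # ignore missing values
    distinct = set(present)
    has_float = any(isinstance(v, float) for v in present)

    # Exactly two distinct values: treat as a yes/no style indicator.
    if len(distinct) == 2:
        return "indicator"

    # Every non-missing value differs and none is a float: looks like an identifier.
    if present and len(distinct) == len(present) and not has_float:
        return "unique key"

    # Numeric with many distinct values: treat as continuous.
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if present and all_numeric and len(distinct) > max_categories:
        return "continuous"

    # Everything else falls back to categorical.
    return "categorical"


print(infer_variable_type([0, 1, 1, 0, None]))
print(infer_variable_type(["a1", "a2", "a3"]))
print(infer_variable_type([i * 0.5 for i in range(50)]))
print(infer_variable_type(["red", "blue", "red", "green"]))
```

The rule order matters in this sketch: the indicator check runs first, so a two-row table of unique IDs would be labeled an indicator. A real implementation would likely need more careful thresholds.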
Usage:
codebooks -o output.html input.csv
Adding variable descriptions
You can specify a CSV file that maps variable names to descriptions using:
codebooks --desc descriptions.csv -o output.html input.csv
The CSV file is expected to have two columns: variable and description.
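As a sketch of how such a descriptions file could be produced, the snippet below writes a two-column CSV with Python's standard `csv` module. The variable names and descriptions are made up for illustration, and the header row (`variable,description`) is an assumption based on the column names above; check what the tool expects before relying on it.

```python
import csv

# Hypothetical variable names and descriptions, for illustration only.
descriptions = {
    "patient_id": "Unique identifier for each patient",
    "age": "Age in years at enrollment",
}


def write_descriptions(path, descriptions):
    """Write a two-column CSV mapping variable names to descriptions."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["variable", "description"])  # assumed header row
        for name, text in descriptions.items():
            writer.writerow([name, text])


write_descriptions("descriptions.csv", descriptions)
```

The resulting file can then be passed to the `--desc` option shown above.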
License
3-Clause BSD (see LICENSE)
Tests
The test/ subdirectory contains a script to generate a synthetic data set, an integration test for the codebooks package, and a benchmark script used to test performance optimizations. You can run these with:
cd test
python dataset.py
codebooks --desc desc.csv dataset.csv
codebooks --desc desc.csv --parquet dataset.parquet
python benchmark.py
Authors
Mark Howison
http://mark.howison.org
Download files
Source Distribution
codebooks-0.0.5.tar.gz (12.9 kB)
Built Distribution
codebooks-0.0.5-py3-none-any.whl (13.0 kB)
Hashes for codebooks-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d57dd1a639a875ba1e0e5e214ee29a0fd6be26205b3f974f5d8bab4674134b7c
MD5 | 2c462817d2e67c2f335f48030c03b0a1
BLAKE2b-256 | 2137eb5746c3c3a52816646947f419f6e547d9307d50ffae359a62fa6767b889