Automatic generation of codebooks from dataframes.

Codebooks

Automatically generate codebooks from dataframes. Includes methods to:

  • Infer variable type (as unique key, indicator, categorical, or continuous).
  • Summarize values with histograms and KDEs.
  • Generate a self-contained HTML report (may be extended to PDF or other formats in the future).
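
The type-inference step in the list above could be approximated with a few cardinality heuristics. The sketch below is not the package's actual implementation: the function name `infer_variable_type`, the `max_categories` threshold, and the rules themselves are assumptions chosen for illustration, and plain Python lists stand in for dataframe columns so the example has no dependencies.

```python
from typing import Hashable, Sequence


def infer_variable_type(values: Sequence[Hashable], max_categories: int = 20) -> str:
    """Classify a column as 'unique key', 'indicator', 'categorical', or 'continuous'.

    Hypothetical heuristic, not the codebooks package's own logic.
    """
    # Ignore missing values (represented here as None).
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-missing values")

    distinct = set(present)

    # At most two distinct values: treat as an indicator (e.g. 0/1, yes/no).
    if len(distinct) <= 2:
        return "indicator"

    # Every value differs and none is a float: likely an identifier column.
    has_float = any(isinstance(v, float) for v in present)
    if len(distinct) == len(present) and not has_float:
        return "unique key"

    # Numeric with many distinct values: treat as continuous.
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )
    if numeric and len(distinct) > max_categories:
        return "continuous"

    # Everything else falls back to categorical.
    return "categorical"
```

The order of the checks matters in this sketch: the indicator test runs first so that a two-row column is not mistaken for a key, and floats are excluded from the key test because measured values are often all distinct.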

Usage:

codebooks -o output.html input.csv

Adding variable descriptions

You can specify a CSV file that maps variable names to descriptions with the --desc option:

codebooks --desc descriptions.csv -o output.html input.csv

The CSV file is expected to have two columns (variable, description).
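
One way to produce such a file is with Python's standard csv module. This is a minimal sketch: the helper name `write_descriptions` is hypothetical, and writing a header row named variable,description is an assumption, since the text above states the two columns but not whether a header line is required.

```python
import csv
from typing import Mapping


def write_descriptions(path: str, descriptions: Mapping[str, str]) -> None:
    """Write a two-column CSV mapping variable names to descriptions."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header row; drop this line if the tool expects data rows only.
        writer.writerow(["variable", "description"])
        for variable, description in descriptions.items():
            # csv.writer quotes descriptions that contain commas or quotes.
            writer.writerow([variable, description])
```

Calling `write_descriptions("descriptions.csv", {"age": "Age in years"})` would then create a file to pass to --desc, provided the header assumption holds.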

License

3-Clause BSD (see LICENSE)

Tests

The test/ subdirectory contains a script to generate a synthetic data set, an integration test for the codebooks package, and a benchmark script used to test performance optimizations. You can run these with:

cd test
python dataset.py
codebooks --desc desc.csv dataset.csv
codebooks --desc desc.csv --parquet dataset.parquet
python benchmark.py
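
For readers who want a small input to experiment with, a synthetic data set covering all four variable types can be generated along the following lines. This is only an illustrative sketch and does not reproduce test/dataset.py; the column names (id, enrolled, region, income), the distributions, and the helper names are all assumptions.

```python
import csv
import random
from typing import Dict, List


def make_rows(n: int, seed: int = 0) -> List[Dict[str, object]]:
    """Build n synthetic records with one column per variable type."""
    rng = random.Random(seed)  # seeded so repeated runs give the same data
    regions = ["north", "south", "east", "west"]
    return [
        {
            "id": i,                                       # unique key
            "enrolled": rng.randint(0, 1),                 # indicator
            "region": rng.choice(regions),                 # categorical
            "income": round(rng.gauss(50000, 12000), 2),   # continuous
        }
        for i in range(n)
    ]


def write_csv(path: str, rows: List[Dict[str, object]]) -> None:
    """Write the records to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

For example, `write_csv("input.csv", make_rows(1000))` would write a file that can be passed to the codebooks command shown under Usage.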

Authors

Mark Howison
http://mark.howison.org

Download files

Source Distribution

codebooks-0.0.5.tar.gz (12.9 kB)

Built Distribution

codebooks-0.0.5-py3-none-any.whl (13.0 kB, Python 3)
