Codebooks

Automatically generate codebooks from dataframes. Includes methods to:

  • Infer variable type (as unique key, indicator, categorical, or continuous).
  • Summarize values with histograms and KDEs.
  • Generate a self-contained HTML report (may be extended to PDF or other formats in the future).
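The type inference in the first bullet can be illustrated with a small rule-based sketch. This is not the package's own logic, which is not described here. The function name `infer_variable_type`, the `max_categories` threshold, and the order of the rules are all assumptions chosen for illustration, and the sketch works on a plain list of column values rather than a dataframe.

```python
def infer_variable_type(values, max_categories=20):
    """Classify a column as 'unique key', 'indicator', 'categorical', or 'continuous'.

    Hypothetical heuristic for illustration only; the rules and the
    max_categories threshold are assumptions, not the codebooks package's own.
    """
    # Treat None as a missing value and ignore it.
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-missing values")

    distinct = set(present)

    # At most two distinct values: treat as an indicator (e.g. 0/1, yes/no).
    if len(distinct) <= 2:
        return "indicator"

    # Every value differs and all are ints or strings: likely an identifier.
    all_distinct = len(distinct) == len(present)
    if all_distinct and all(isinstance(v, (int, str)) for v in present):
        return "unique key"

    # Numeric with many distinct values: treat as continuous.
    is_numeric = all(isinstance(v, (int, float)) for v in present)
    if is_numeric and len(distinct) > max_categories:
        return "continuous"

    # Everything else falls back to categorical.
    return "categorical"


print(infer_variable_type(["id-1", "id-2", "id-3", "id-4"]))
print(infer_variable_type([0, 1, 1, 0, None]))
print(infer_variable_type(["red", "blue", "green", "red"]))
print(infer_variable_type([i * 0.5 for i in range(50)]))
```

Checking the indicator rule first keeps a short 0/1 column from being mistaken for a key. A real implementation would likely need more careful handling of dates, mixed types, and integer columns that happen to have all-distinct values.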

Usage:

codebooks -o output.html input.csv

Example

Screenshot of codebook for test dataset

Adding variable descriptions

You can supply a CSV file that maps variable names to descriptions using:

codebooks --desc descriptions.csv -o output.html input.csv

The CSV file is expected to have two columns: variable and description.
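A descriptions file of this shape could be produced with Python's standard csv module, as in the minimal sketch below. The variable names and description text are made up, and the helper `write_descriptions` is hypothetical. Whether the tool expects a header row is not stated here; the sketch assumes one, named after the two columns.

```python
import csv

# Made-up variable names and descriptions, for illustration only.
descriptions = {
    "age": "Age in years at enrollment",
    "sex": "Self-reported sex",
    "income": "Annual household income in USD",
}


def write_descriptions(path, descriptions):
    """Write a two-column (variable, description) CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumption: the file starts with a header row naming both columns.
        writer.writerow(["variable", "description"])
        for name, text in descriptions.items():
            writer.writerow([name, text])


write_descriptions("descriptions.csv", descriptions)
```

Using csv.writer rather than joining strings by hand means that descriptions containing commas or quotes are escaped correctly.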

License

3-Clause BSD (see LICENSE)

Tests

The test/ subdirectory contains a script to generate a synthetic data set, an integration test for the codebooks package, and a benchmark script used to test performance optimizations. You can run these with:

cd test
python dataset.py
codebooks --desc desc.csv dataset.csv
codebooks --desc desc.csv --parquet dataset.parquet
python benchmark.py

Authors

Mark Howison
http://mark.howison.org
