Empirical density estimation

Project description

Empirical density estimation in Python

empdens provides a unified interface to several density estimation packages, including an implementation of classifier-adjusted density estimation (CADE).

Applications of density estimation include

  • Detecting data drift: The reliability of a trained model's prediction at a new data point depends on how similar that point is to the training data. A density function fit to the training data can warn of data drift when the density evaluated at the new point is exceptionally low. One way to focus such an analysis is to fit and evaluate the density using only a few of the model's most important features.
  • Mode detection: Locating regions of high density is a first step toward efficiently allocating resources to address an epidemic, market a product, etc.
  • Feature engineering: The density at a point with respect to any subset of the dimensions of a feature space can encode useful information.
  • Anomaly/novelty/outlier detection: A "point of low density" is a common working definition of "anomaly", although it's not the only one. (In astrostatistics, for example, a density spike may draw attention as a possible galaxy.)
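
The drift and anomaly use cases above can be sketched in a few lines. This is a minimal illustration that does not use the empdens API: the one-dimensional Gaussian kernel density estimate, the bandwidth of 0.3, the 1st-percentile threshold, and the names gaussian_kde_logpdf and is_low_density are all assumptions made for the example.

```python
import math
import random


def gaussian_kde_logpdf(train, x, bandwidth):
    """Log density at x under a 1-D Gaussian kernel density estimate."""
    # Average of Gaussian kernels centred on each training point.
    norm = 1.0 / (len(train) * bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train)
    # Floor the density to avoid log(0) far away from all training data.
    return math.log(max(norm * total, 1e-300))


rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(500)]
bandwidth = 0.3  # hypothetical choice, not tuned

# Threshold (an assumption): the 1st percentile of the log densities
# that the training points themselves receive.
train_scores = sorted(gaussian_kde_logpdf(train, t, bandwidth) for t in train)
threshold = train_scores[int(0.01 * len(train_scores))]


def is_low_density(x):
    """Flag a new point whose estimated density falls below the threshold."""
    return gaussian_kde_logpdf(train, x, bandwidth) < threshold


print(is_low_density(0.1))  # a point near the bulk of the training data
print(is_low_density(8.0))  # a point far outside the training range
```

In practice the threshold would be chosen from the tolerable false-alarm rate, and the density would be fit on the few features that matter most to the model.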

Evaluating the performance of a density estimator is not straightforward. We rely on a mix of simulation, real-data sanity checks, and cross-validation in special cases, as detailed in our evaluation guide.
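
One common form of such a check, shown here only as a rough sketch and not necessarily the procedure in the evaluation guide, is to score candidate estimators by their mean log density on held-out data. The kernel density estimate, the candidate bandwidths, and the helper names below are assumptions for illustration.

```python
import math
import random


def kde_logpdf(train, x, bandwidth):
    """Log density at x under a 1-D Gaussian kernel density estimate."""
    norm = 1.0 / (len(train) * bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((x - t) / bandwidth) ** 2) for t in train)
    return math.log(max(norm * total, 1e-300))  # floor avoids log(0)


def mean_heldout_loglik(train, heldout, bandwidth):
    """Average log density assigned to points the estimator never saw."""
    return sum(kde_logpdf(train, x, bandwidth) for x in heldout) / len(heldout)


rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(400)]
train, heldout = data[:300], data[300:]

# Hypothetical candidates: too narrow, moderate, too wide.
candidates = [0.01, 0.3, 5.0]
scores = {h: mean_heldout_loglik(train, heldout, h) for h in candidates}
best = max(scores, key=scores.get)

for h in candidates:
    print(f"bandwidth={h}: mean held-out log-likelihood = {scores[h]:.3f}")
print("selected bandwidth:", best)
```

A higher held-out log-likelihood means the estimator places more mass where unseen data actually falls. The very narrow bandwidth overfits the training points and the very wide one oversmooths, so both score below the moderate choice.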

Installation

We're on PyPI, so pip install empdens.

To keep the package lean, several packages that it can use are optional rather than required dependencies. Depending on your usage, you may therefore get an error message reminding you to install one of the packages listed under the extras group in the pyproject.toml file.

Consider using the simplest possible virtual environment if working directly on this repo.

Related work

Wishlist

Infrastructure:

  • expand code testing coverage
  • define new simulations

Tutorials, starting with

  • how CADE works
  • density estimation trees

Density estimation:

  • Implement a dimensionality-reduction pre-processing method. Extreme multicollinearity is a potential failure mode in CADE: because the fake-data model assumes feature independence, the classifier can trivially distinguish fake data from real.
  • Merge the best of the tree-based methods of LightGBM, detpack, Schmidberger and Frank, and astropy.stats.bayesian_blocks.
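
The multicollinearity failure mode mentioned in the first item can be seen in a small simulation. This sketch does not use empdens. It assumes that fake data is drawn by shuffling each feature column independently (one way to sample from independent marginals), and the residual rule with its 0.05 tolerance is a hypothetical stand-in for a classifier.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


rng = random.Random(2)

# Real data: two nearly collinear features (x2 is about 2 * x1).
x1 = [rng.gauss(0.0, 1.0) for _ in range(1000)]
x2 = [2.0 * a + rng.gauss(0.0, 0.01) for a in x1]

# Fake data under feature independence: shuffle each column separately,
# which keeps the marginals but destroys the joint structure.
fake1 = rng.sample(x1, len(x1))
fake2 = rng.sample(x2, len(x2))

real_corr = pearson(x1, x2)
fake_corr = pearson(fake1, fake2)


def on_line(a, b, tol=0.05):
    """Trivial rule: does the row sit on the line x2 = 2 * x1?"""
    return abs(b - 2.0 * a) < tol


real_hit = sum(on_line(a, b) for a, b in zip(x1, x2)) / len(x1)
fake_hit = sum(on_line(a, b) for a, b in zip(fake1, fake2)) / len(fake1)

print(f"correlation: real={real_corr:.3f}, fake={fake_corr:.3f}")
print(f"rows on the line: real={real_hit:.1%}, fake={fake_hit:.1%}")
```

Nearly every real row satisfies the rule and almost no fake row does, so a classifier separates the two without learning anything useful about the density. Reducing the features to a less redundant representation first would remove this shortcut.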

