Skip to main content

No project description provided

Project description

Implementation of Flexible Conditional Density Estimator (FlexCode) in Python. See Izbicki, R.; Lee, A.B. Converting High-Dimensional Regression to High-Dimensional Conditional Density Estimation. Electronic Journal of Statistics, 2017 for details. Port of the original R package.

FlexCode

FlexCode is a general-purpose method for converting any conditional mean point estimator of $z$ to a conditional density estimator $(f(z \vert x))$, where $x$ represents the covariates. The key idea is to expand the unknown function $f(z \vert x)$ in an orthonormal basis ${\phi_i(z)}_{i}$:

$$f(z|x)=\sum_{i}\beta_{i }(x)\phi_i(z)$$

By the orthogonality property, the expansion coefficients are just conditional means

$$\beta_{i }(x) = \mathbb{E}\left[\phi_i(z)|x\right] \equiv \int f(z|x) \phi_i(z) dz$$

where the coefficients are estimated from data by an appropriate regression method.

Installation

git clone https://github.com/lee-group-cmu/FlexCode.git
pip install FlexCode[all]

Flexcode handles a number of regression models; if you wish to avoid installing all dependencies you can specify your desired regression methods using the optional requires in brackets. Targets include

  • xgboost
  • scikit-learn (for nearest neighbor regression, random forests)

A simple example

import numpy as np
import scipy.stats
import flexcode
from flexcode.regression_models import NN
import matplotlib.pyplot as plt

# Generate data p(z | x) = N(x, 1)
def generate_data(n_draws):
    x = np.random.normal(0, 1, n_draws)
    z = np.random.normal(x, 1, n_draws)
    return x.reshape((len(x), 1)), z.reshape((len(z), 1))

x_train, z_train = generate_data(10000)
x_validation, z_validation = generate_data(10000)
x_test, z_test = generate_data(10000)

# Parameterize model
model = flexcode.FlexCodeModel(NN, max_basis=31, basis_system="cosine",
                               regression_params={"k":20})

# Fit and tune model
model.fit(x_train, z_train)
model.tune(x_validation, z_validation)

# Estimate CDE loss
print(model.estimate_error(x_test, z_test))

# Calculate conditional density estimates
cdes, z_grid = model.predict(x_test, n_grid=200)

for ii in range(10):
    true_density = scipy.stats.norm.pdf(z_grid, x_test[ii], 1)
    plt.plot(z_grid, cdes[ii, :])
    plt.plot(z_grid, true_density, color = "green")
    plt.axvline(x=z_test[ii], color="red")
    plt.show()

FlexZBoost Buzzard Data

One particular realization of the FlexCode algorithm is FlexZBoost which uses XGBoost as the regression method. We apply this method to photo-z estimation in the LSST DESC DC-1. For members of the LSST DESC, you can find information on obtaining the data here.

import numpy as np
import pandas as pd
import flexcode
from flexcode.regression_models import XGBoost

# Read in data
def process_data(feature_file, has_z=False):
    """Processes buzzard data"""
    df = pd.read_table(feature_file, sep=" ")
    df["ug"] = df["u"] - df["g"]

    df.assign(ug = df.u - df.g,
              gr = df.g - df.r,
              ri = df.r - df.i,
              iz = df.i - df.z,
              zy = df.z - df.y,
              ug_err = np.sqrt(df['u.err'] ** 2 + df['g.err'] ** 2),
              gr_err = np.sqrt(df['g.err'] ** 2 + df['r.err'] ** 2),
              ri_err = np.sqrt(df['r.err'] ** 2 + df['i.err'] ** 2),
              iz_err = np.sqrt(df['i.err'] ** 2 + df['z.err'] ** 2),
              zy_err = np.sqrt(df['z.err'] ** 2 + df['y.err'] ** 2))

    if has_z:
        z = df.redshift.as_matrix()
        df.drop('redshift', axis=1, inplace=True)
    else:
        z = None

    return df.as_matrix(), z

x_data, z_data = process_data('buzzard_spec_witherrors_mass.txt', has_z=True)
x_test, _ = process_data('buzzard_phot_witherrors_mass.txt', has_z=False)

n_obs = x_data.shape[0]
n_train = round(n_obs * 0.8)
n_validation = n_obs - n_train

perm = np.random.permutation(n_obs)
x_train = x_data[perm[:n_train], :]
z_train = z_data[perm[:n_train]]
x_validation = x_data[perm[n_train:]]
z_validation = z_data[perm[n_train:]]

# Fit the model
model = flexcode.FlexCodeModel(XGBoost, max_basis=40, basis_system='cosine',
                               regression_params={"max_depth": 8})
model.fit(x_train, z_train)
model.tune(x_validation, z_validation)

# Make predictions
cdes, z_grid = model.predict(x_test, n_grid=200)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flexcode-0.2.3.tar.gz (58.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

flexcode-0.2.3-py3-none-any.whl (26.4 kB view details)

Uploaded Python 3

File details

Details for the file flexcode-0.2.3.tar.gz.

File metadata

  • Download URL: flexcode-0.2.3.tar.gz
  • Upload date:
  • Size: 58.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for flexcode-0.2.3.tar.gz
Algorithm Hash digest
SHA256 ad7542305e73fbe16a67a7990aafa9f8a752c0179557b93fdcd0b5c6851a8be3
MD5 59dd12395267e8d8f138d4762d37d6a7
BLAKE2b-256 575c90438e1b027dfc8dd6baad48ba35e58c933d2fc1f6f449589755e45cc64f

See more details on using hashes here.

File details

Details for the file flexcode-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: flexcode-0.2.3-py3-none-any.whl
  • Upload date:
  • Size: 26.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for flexcode-0.2.3-py3-none-any.whl
Algorithm Hash digest
SHA256 2c0f9b5efbb509e92cbff161ae400630ca5a0a61491bd6fa6f0d13ed63ab1a77
MD5 288d029bcabab8a747677a80ae81befc
BLAKE2b-256 0d9b67e2be610329a7e72e1e0bbd68906991d7df60b0085f0a203bdce5343484

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page