Demonstration of quantile regression
Most of us are familiar with the growth charts that pediatricians use, which show percentiles of weight and height as a function of age. Generating such a chart from a small sample of data requires quantile regression or a similar method. (With a large enough sample, one can instead bin the data, i.e., divide the x-axis into intervals and calculate percentiles independently for each interval. However, this approach uses the data inefficiently and is unworkable when the sample is small.)
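For comparison, the binning approach mentioned above can be sketched roughly as follows. This is only an illustration, not part of the script: the function name binned_percentiles, the number of bins, and the sample size are assumptions made for the example.

```python
import numpy as np


def binned_percentiles(x, y, n_bins, percentiles):
    """Compute percentiles of y independently within equal-width bins of x.

    Hypothetical helper for illustration; not taken from the script.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    idx = np.digitize(x, edges[1:-1])
    centers = 0.5 * (edges[:-1] + edges[1:])
    table = np.full((n_bins, len(percentiles)), np.nan)
    for b in range(n_bins):
        in_bin = y[idx == b]
        if in_bin.size:  # leave NaN where a bin is empty
            table[b] = np.percentile(in_bin, percentiles)
    return centers, table


# Artificial data, large enough that every bin is well populated.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10_000)
y = rng.normal(loc=-0.5 + x, scale=1.0 + 0.5 * x)

centers, table = binned_percentiles(x, y, n_bins=5, percentiles=[10, 50, 90])
```

Each row of table holds the 10th, 50th, and 90th percentiles for one bin. With only a few dozen points, most bins would hold too few observations for these estimates to be useful, which is the weakness noted above.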
Quantiles and percentiles are the same except for a factor of 100, e.g., the 30th percentile is the 0.3 quantile.
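A quick check of this equivalence in NumPy (the sample here is arbitrary and serves only as an illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)

# The 30th percentile and the 0.3 quantile are the same number.
p30 = np.percentile(sample, 30)
q03 = np.quantile(sample, 0.3)
same = bool(np.isclose(p30, q03))
```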
This Python script demonstrates that one can perform quantile regression using only Python, NumPy, and SciPy. The only other dependency is on matplotlib, which is used to plot the data and the quantile estimates.
In detail, the script does the following:
(1) Model parameters are assigned. (Currently, these are hard-coded.)
(2) The program generates an artificial bivariate sample of data (x, y) as follows:
- x is generated by drawing from a distribution that is uniform on [x_min, x_max], where x_min and x_max are currently 0 and 1, respectively.
- y is then generated according to a normal distribution having mean -0.5 + x and standard deviation 1.0 + 0.5 * x.
(All of this can be changed; e.g., one could choose to make the mean of y quadratic in x.)
(3) The code defines an objective function based on the tilted absolute value function (see references for motivation).
(4) The SciPy optimization package is then used to optimize (minimize) the objective function.
(5) Using the matplotlib module, the code plots a scatter diagram of the data with an overlay of percentile lines.
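Steps (2) through (4) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script itself: the sample size, the set of quantiles, the linear form of the fitted quantile curves, the Nelder-Mead method, and all function names are choices made for this example. Plotting (step 5) is omitted.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical settings; the script hard-codes its own values.
N_SAMPLES = 500
X_MIN, X_MAX = 0.0, 1.0
QUANTILES = (0.1, 0.5, 0.9)


def generate_sample(n, rng):
    """Step (2): x uniform on [X_MIN, X_MAX]; y normal with mean and sd depending on x."""
    x = rng.uniform(X_MIN, X_MAX, size=n)
    y = rng.normal(loc=-0.5 + x, scale=1.0 + 0.5 * x)
    return x, y


def tilted_abs(u, q):
    """Tilted absolute value: q * u for u >= 0, (q - 1) * u for u < 0."""
    return np.where(u >= 0, q * u, (q - 1.0) * u)


def objective(params, x, y, q):
    """Step (3): sum of tilted absolute residuals about a straight line."""
    intercept, slope = params
    residuals = y - (intercept + slope * x)
    return tilted_abs(residuals, q).sum()


def fit_quantile_line(x, y, q):
    """Step (4): minimize the objective for one quantile q.

    Assumption: Nelder-Mead is used here because the objective is not
    smooth; the script may use a different optimizer.
    """
    start = np.array([np.quantile(y, q), 0.0])
    result = minimize(objective, start, args=(x, y, q), method="Nelder-Mead")
    return result.x  # (intercept, slope)


rng = np.random.default_rng(0)
x, y = generate_sample(N_SAMPLES, rng)
fits = {q: fit_quantile_line(x, y, q) for q in QUANTILES}
```

For the data model in step (2), the true q-quantile of y given x is itself linear in x, with intercept -0.5 + z_q and slope 1 + 0.5 * z_q, where z_q is the q-quantile of the standard normal distribution. A straight-line fit is therefore adequate for this example. A useful sanity check is that the fraction of points lying below each fitted line is close to q.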