Tools for creating and analyzing confidence ellipses, including Hotelling's T-squared ellipses for multivariate statistical analysis and data visualization.
Project description
pyEllipse
A Python package for computing Hotelling's T² statistics and generating confidence ellipse/ellipsoid coordinates for multivariate data analysis and visualization.
Overview
pyEllipse provides three main functions for analyzing multivariate data:
hotelling_parameters- Calculate Hotelling's T² statistics and ellipse parametershotelling_coordinates- Generate Hotelling's ellipse/ellipsoid coordinates from PCA/PLS scoresconfidence_ellipse- Compute confidence ellipse/ellipsoid coordinates from raw data with grouping support
Installation
pip install pyEllipse
Usage Examples
Example 1: Hotelling's T² statistic and confidence ellipse from PCA Scores
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from pathlib import Path
from pyEllipse import hotelling_parameters, hotelling_coordinates, confidence_ellipse
def load_wine_data():
"""Load wine dataset and add cultivar labels"""
wine_df = pd.read_csv('data/wine.csv')
# Add cultivar labels based on standard Wine dataset structure
cultivar = []
for i in range(len(wine_df)):
if i < 59:
cultivar.append('Cultivar 1')
elif i < 130:
cultivar.append('Cultivar 2')
else:
cultivar.append('Cultivar 3')
wine_df['Cultivar'] = cultivar
return wine_df
wine_df = load_wine_data()
X = wine_df.drop('Cultivar', axis=1)
y = wine_df['Cultivar']
# Perform PCA
pca = PCA()
SS = StandardScaler()
X = SS.fit_transform(X)
pca_scores = pca.fit_transform(X)
explained_var = pca.explained_variance_ratio_
plt.style.use('bmh')
# Calculate T² statistics
results = hotelling_parameters(pca_scores, k=2)
t2 = results['Tsquared'].values
# Generate ellipse coordinates for plotting
ellipse_95 = hotelling_coordinates(pca_scores, pcx=1, pcy=2, conf_limit=0.95)
ellipse_99 = hotelling_coordinates(pca_scores, pcx=1, pcy=2, conf_limit=0.99)
# Plot the PCA scores with Hotelling's T² ellipse
plt.figure(figsize=(8, 6))
scatter = plt.scatter(
pca_scores[:, 0], pca_scores[:, 1],
c=t2, cmap='jet', alpha=0.85, s=70, label='Wine samples'
)
cbar = plt.colorbar(scatter)
cbar.set_label('Hotelling T² Statistic', rotation=270, labelpad=20)
plt.plot(ellipse_95['x'], ellipse_95['y'], 'r-', linewidth=1, label='95% Confidence level')
plt.plot(ellipse_99['x'], ellipse_99['y'], 'k-', linewidth=1, label='99% Confidence level')
plt.xlim(-1000, 1000)
plt.ylim(-50, 60)
plt.xlabel(f'PC1 ({explained_var[0]*100:.2f}%)', fontsize=14, labelpad=10, fontweight='bold')
plt.ylabel(f'PC2 ({explained_var[1]*100:.2f}%)', fontsize=14, labelpad=10, fontweight='bold')
plt.title("Hotelling's T² Ellipse from PCA Scores", fontsize=16, pad=10, fontweight='bold')
plt.legend(
loc='upper left', fontsize=10, frameon=True, framealpha=0.9,
edgecolor='black', shadow=True, facecolor='white', borderpad=1
)
plt.show()
Example 2: Grouped Confidence Ellipses
wine_df['PC1'] = pca_scores[:, 0]
wine_df['PC2'] = pca_scores[:, 1]
colors = ['red', 'blue', 'green']
cultivars = wine_df['Cultivar'].unique()
color_map = {cultivar: color for cultivar, color in zip(cultivars, colors)}
point_colors = wine_df['Cultivar'].map(color_map)
# Plott PCA scores with confidence ellipses for each cultivar
plt.figure(figsize=(8, 6))
for i, cultivar in enumerate(cultivars):
mask = wine_df['Cultivar'] == cultivar
plt.scatter(
wine_df.loc[mask, 'PC1'], wine_df.loc[mask, 'PC2'], # type: ignore
c=colors[i], alpha=0.6, s=70, label=cultivar
)
ellipse_coords = confidence_ellipse(
data=wine_df,
x='PC1',
y='PC2',
group_by='Cultivar',
conf_level=0.95,
robust=True,
distribution='hotelling'
)
for i, cultivar in enumerate(cultivars):
ellipse_data = ellipse_coords[ellipse_coords['Cultivar'] == cultivar]
plt.plot(
ellipse_data['x'], ellipse_data['y'],
color=colors[i], linewidth=1, linestyle='-', label=f'{cultivar} (95% CI)'
)
plt.xlim(-1000, 1000)
plt.ylim(-50, 60)
plt.xlabel(f'PC1 ({explained_var[0]*100:.2f}%)', fontsize=14, labelpad=10, fontweight='bold')
plt.ylabel(f'PC2 ({explained_var[1]*100:.2f}%)', fontsize=14, labelpad=10, fontweight='bold')
plt.title("PCA Scores with Cultivar Group Confidence Ellipses", fontsize=16, pad=10, fontweight='bold')
plt.legend(
loc='upper left', fontsize=10, frameon=True, framealpha=0.9,
edgecolor='black', shadow=True, facecolor='white', borderpad=1
)
plt.show()
Example 3: Grouped 3D Confidence Ellipsoids
wine_df['PC1'] = pca_scores[:, 0]
wine_df['PC2'] = pca_scores[:, 1]
wine_df['PC3'] = pca_scores[:, 2]
colors = ['red', 'blue', 'green']
light_colors = ['lightcoral', 'lightblue', 'lightgreen']
cultivars = wine_df['Cultivar'].unique()
ellipse_coords = confidence_ellipse(
data=wine_df,
x='PC1',
y='PC2',
z='PC3',
group_by='Cultivar',
conf_level=0.95,
robust=True,
distribution='hotelling'
)
fig = plt.figure(figsize=(10, 6), facecolor='white')
ax = fig.add_subplot(111, projection='3d', facecolor='white')
for i, cultivar in enumerate(cultivars):
mask = wine_df['Cultivar'] == cultivar
ax.scatter(
wine_df.loc[mask, 'PC1'],
wine_df.loc[mask, 'PC2'],
wine_df.loc[mask, 'PC3'], # type: ignore
c=colors[i],
alpha=0.8,
s=50,
label=cultivar,
edgecolors='black',
linewidth=0.5
)
ellipse_data = ellipse_coords[ellipse_coords['Cultivar'] == cultivar]
n_points = int(np.sqrt(len(ellipse_data)))
x_2d = ellipse_data['x'].values.reshape(n_points, -1)
y_2d = ellipse_data['y'].values.reshape(n_points, -1)
z_2d = ellipse_data['z'].values.reshape(n_points, -1)
ax.plot_surface(
x_2d,
y_2d,
z_2d,
color=light_colors[i],
alpha=0.4,
linewidth=0,
antialiased=True
)
ax.set_xlabel(f'PC1 ({explained_var[0]*100:.2f}%)', fontsize=12, labelpad=5, fontweight='bold')
ax.set_ylabel(f'PC2 ({explained_var[1]*100:.2f}%)', fontsize=12, labelpad=5, fontweight='bold')
ax.set_zlabel(f'PC3 ({explained_var[2]*100:.2f}%)', fontsize=12, labelpad=1, fontweight='bold')
ax.set_title('3D PCA Scores with 95% Confidence Ellipsoids', fontsize=16, fontweight='bold')
ax.legend(
loc='upper right', fontsize=10, frameon=True, framealpha=0.9,
edgecolor='black', shadow=True, facecolor='white', borderpad=1
)
ax.grid(True, alpha=0.3, color='gray')
ax.view_init(elev=20, azim=65)
plt.tight_layout()
plt.show()
Key Differences Between Functions
| Feature | hotelling_parameters |
hotelling_coordinates |
confidence_ellipse |
|---|---|---|---|
| Input | Component scores | Component scores | Raw data |
| Purpose | T² statistics | Plot coordinates | Plot coordinates |
| Grouping | -- | -- | Yes |
| Robust | -- | -- | Yes |
| 2D/3D | 2D only for ellipse params | Both | Both |
| Distribution | Hotelling only | Hotelling only | Normal or Hotelling |
| Use Case | Outlier detection, QC | Visualizing PCA | Exploratory data analysis |
When to Use Each Function
Use hotelling_parameters when:
- You need T² statistics for outlier detection
- You want confidence cutoff values
- You're performing quality control or process monitoring
- You need ellipse parameters (semi-axes lengths)
Use hotelling_coordinates when:
- You have PCA/PLS component scores
- You want to visualize confidence regions on score plots
- You need precise control over which components to plot
- You're creating publication-quality figures from multivariate models
Use confidence_ellipse when:
- You're working with raw data (not scores)
- You need to compare multiple groups
- You want robust estimation for outlier-resistant analysis
- You need flexibility in distribution choice (normal vs Hotelling)
References
- Hotelling, H. (1931). The generalization of Student's ratio. Annals of Mathematical Statistics, 2(3), 360-378.
- Brereton, R. G. (2016). Hotelling's T-squared distribution, its relationship to the F distribution and its use in multivariate space. Journal of Chemometrics, 30(1), 18-21.
- Raymaekers, J., & Rousseeuw, P. J. (2019). Fast robust correlation for high dimensional data. Technometrics, 63(2), 184-198.
- Jackson, J. E. (1991). A User's Guide to Principal Components. Wiley.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pyellipse-0.1.3.tar.gz.
File metadata
- Download URL: pyellipse-0.1.3.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.0 CPython/3.13.7 Darwin/24.6.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4a3add0fdd984af31f1f6d8abb6b5c4e0e75eb6f2bbfc607d5bd6d5ccc68508f
|
|
| MD5 |
5dd9054888f7ded92eaebc82964f0d65
|
|
| BLAKE2b-256 |
602a3aef4bc0aeecec019ac1de9d7e3636c3244ab9bfc80fda4f4ec1a4549bc2
|
File details
Details for the file pyellipse-0.1.3-py3-none-any.whl.
File metadata
- Download URL: pyellipse-0.1.3-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.2.0 CPython/3.13.7 Darwin/24.6.0
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
17d9428b9908c6d6fbfa15a72176b4b28564b247a0bce52facfef33d44321d27
|
|
| MD5 |
89d7cc752fff164a8e57375e7866f8e7
|
|
| BLAKE2b-256 |
58a51967237d6b8d5b8b88df080886b13c4ceedab66bab9c64ee5de5a098590a
|