hz
A collection of functions for generating synthetic data directly from real labeled data.
To install: pip install hz
Overview
The hz package provides a suite of functions for analyzing labeled datasets by computing statistical measures separately for each label. These per-label statistics help characterize subsets of the data and can serve as a basis for generating synthetic data that mimics the real dataset. The included functions calculate the mean, variance, median, and standard deviation of each feature within each label.
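The overview does not show how per-label statistics become synthetic data. One plausible approach is sketched below. It assumes that, within a label, each feature is independently normally distributed, and it samples new rows from a normal distribution with that label's per-feature mean and standard deviation. The function `sample_synthetic` is hypothetical and is not part of hz. It computes the statistics with plain numpy.

```python
import numpy as np


def sample_synthetic(X, y, n_per_label, seed=None):
    """Hypothetical helper (not part of hz): draw synthetic rows per label.

    Assumes features are independent and normally distributed within each
    label, so each label is summarized by its per-feature mean and std.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    samples, labels = [], []
    for label in np.unique(y):
        group = X[y == label]
        mean = group.mean(axis=0)
        std = group.std(axis=0)  # population std (ddof=0)
        # One row per synthetic sample, one column per feature.
        samples.append(rng.normal(mean, std, size=(n_per_label, X.shape[1])))
        labels.append(np.full(n_per_label, label))
    return np.vstack(samples), np.concatenate(labels)


X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
X_syn, y_syn = sample_synthetic(X, y, n_per_label=3, seed=0)
print(X_syn.shape, y_syn)  # (6, 2) [0 0 0 1 1 1]
```

Sampling each feature independently ignores correlations between features. Drawing from a multivariate normal with each label's covariance matrix would preserve them, at the cost of needing more samples per label to estimate that matrix.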
Functions
label_means
Computes the mean of each feature for each label in a labeled dataset. This gives a simplified representation of the dataset in which each label is represented by the mean feature values of its samples.
Usage Example:
import numpy as np
import hz
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
means = hz.label_means(X, y)
print(means)
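The following pure-numpy sketch makes the per-label mean concrete. It assumes the result has one row per unique label in sorted order, which is the order `np.unique` returns. The documentation above does not state the row order hz uses, so treat that as an assumption. The name `per_label_means` is hypothetical.

```python
import numpy as np


def per_label_means(X, y):
    """Hypothetical sketch: mean of each feature within each label.

    Rows follow the sorted order of np.unique(y); this ordering is an
    assumption, not documented hz behavior.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return np.array([X[y == label].mean(axis=0) for label in np.unique(y)])


X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
# Label 0 averages rows [1, 2] and [5, 6]; label 1 averages [3, 4] and [7, 8].
print(per_label_means(X, y))
# [[3. 4.]
#  [5. 6.]]
```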
label_variances
Calculates the variance of each feature for each label in a labeled dataset. This shows how widely the data points within each label group are spread around that group's mean.
Parameters:
- X (numpy.ndarray): The input features, assumed to be a 2D array.
- y (numpy.ndarray): The corresponding labels for the dataset.
Returns:
- numpy.ndarray: An array in which each row holds the feature variances for one label.
Usage Example:
import numpy as np
import hz
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
variances = hz.label_variances(X, y)
print(variances)
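The section does not say whether `label_variances` computes the population variance (dividing by n) or the sample variance (dividing by n - 1). The numpy-only sketch below, which does not use hz, shows that the choice matters for small groups. With two rows per label, the two conventions differ by a factor of two. The helper `per_label_variances` is hypothetical, so check the hz source before relying on either convention.

```python
import numpy as np


def per_label_variances(X, y, ddof=0):
    """Hypothetical sketch: per-feature variance within each label.

    ddof=0 gives the population variance (divide by n);
    ddof=1 gives the sample variance (divide by n - 1).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return np.array([X[y == label].var(axis=0, ddof=ddof) for label in np.unique(y)])


X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
print(per_label_variances(X, y, ddof=0))  # each entry is 4.0
print(per_label_variances(X, y, ddof=1))  # each entry is 8.0
```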
label_medians
Computes the median of each feature for each label in a labeled dataset. The median describes the central tendency of the data and, unlike the mean, is largely insensitive to outliers.
Usage Example:
import numpy as np
import hz
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
medians = hz.label_medians(X, y)
print(medians)
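The example data above has no outliers, so it does not show why the median is useful. The numpy-only sketch below adds one extreme row to label 0 and compares the per-label mean with the per-label median. The mean of label 0 shifts sharply, while its median stays put. The helper name `per_label_stat` is hypothetical and is not an hz function.

```python
import numpy as np


def per_label_stat(X, y, stat):
    """Hypothetical sketch: apply a column-wise statistic within each label."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    return np.array([stat(X[y == label], axis=0) for label in np.unique(y)])


# Same data as the usage example, plus one extreme row for label 0.
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [100, 200]])
y = np.array([0, 1, 0, 1, 0])

print(per_label_stat(X, y, np.mean))    # label 0 mean is pulled toward the outlier
print(per_label_stat(X, y, np.median))  # label 0 median stays at [5, 6]
```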
label_standard_deviations
Calculates the standard deviation of each feature for each label in a labeled dataset. Like the variance, it measures how much the features vary within each label, but it is expressed in the same units as the features.
Usage Example:
import numpy as np
import hz
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
std_devs = hz.label_standard_deviations(X, y)
print(std_devs)
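The standard deviation is the square root of the variance, which is why it shares the features' units. The numpy-only sketch below checks that relationship per label. It assumes both statistics use the same `ddof` (numpy's default of 0 here). Whether hz's `label_standard_deviations` and `label_variances` follow that convention is an assumption to confirm against hz itself.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=float)
y = np.array([0, 1, 0, 1])

labels = np.unique(y)
variances = np.array([X[y == k].var(axis=0) for k in labels])
std_devs = np.array([X[y == k].std(axis=0) for k in labels])

# std is the element-wise square root of var (both use ddof=0 here).
print(std_devs)  # each entry is 2.0
print(np.allclose(std_devs, np.sqrt(variances)))  # True
```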
Installation
You can install the hz package directly from PyPI:
pip install hz
This package requires numpy, which it uses for all of its mathematical computations.
Download files
File details
Details for the file hz-0.0.7.tar.gz.
File metadata
- Download URL: hz-0.0.7.tar.gz
- Upload date:
- Size: 6.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 70565dba78b87b2fc745bb52ac7c211bf56626d5e2d7a54bdab4c779a082ef20 |
| MD5 | 2edaa1d862fca0887771984acbc13d84 |
| BLAKE2b-256 | b0a682006230a6412bb6d6bb5c49bff502ef3e6527d6edae7ea28220003f2861 |
File details
Details for the file hz-0.0.7-py3-none-any.whl.
File metadata
- Download URL: hz-0.0.7-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 27d4effd9b7dbaa4a1d6d794ecac51e4f16871435985c116754f466c66050302 |
| MD5 | 3ab561a9ff79b8baedb27223413d73ba |
| BLAKE2b-256 | 9965bbb5597405ba448bbece32844e7c4ea738f85117e318400837711c1cc049 |
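The digests listed for each file let you check a download before installing it. Below is a minimal sketch using Python's standard `hashlib` module. The file name and expected SHA256 digest are copied from the details for hz-0.0.7.tar.gz, and the `sha256_of` helper is hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Hypothetical helper: return the hex SHA256 digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


expected = "70565dba78b87b2fc745bb52ac7c211bf56626d5e2d7a54bdab4c779a082ef20"
path = Path("hz-0.0.7.tar.gz")
if path.exists():
    print("OK" if sha256_of(path) == expected else "MISMATCH")
else:
    print(f"{path} not found in the current directory")
```

pip can also enforce these digests itself through hash-checking mode (`--require-hashes`) with a requirements file.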