hz

A collection of functions for generating synthetic data from real labeled data.

To install: pip install hz

Overview

The hz package provides functions that summarize a labeled dataset by computing per-label statistics of its features: means, variances, medians, and standard deviations. These summaries help characterize each labeled subset of the data and can serve as a basis for generating synthetic data that mimics the real dataset.
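To make the link between per-label statistics and synthetic data concrete, here is a minimal sketch written in plain NumPy. It does not use hz, and the helper name sample_synthetic is hypothetical. It assumes each feature within a label can be approximated by an independent normal distribution, which is only one of many possible ways to turn such statistics into samples.

```python
import numpy as np

def sample_synthetic(X, y, n_per_label, seed=0):
    """Hypothetical helper (not part of hz): draw synthetic rows per label
    from independent normal distributions fitted to each feature."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [], []
    for label in np.unique(y):
        rows = X[y == label]          # real samples carrying this label
        mean = rows.mean(axis=0)      # per-feature mean
        std = rows.std(axis=0)        # per-feature standard deviation
        X_parts.append(rng.normal(mean, std, size=(n_per_label, X.shape[1])))
        y_parts.append(np.full(n_per_label, label))
    return np.vstack(X_parts), np.concatenate(y_parts)

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
X_syn, y_syn = sample_synthetic(X, y, n_per_label=3)
print(X_syn.shape)  # (6, 2)
print(y_syn)        # [0 0 0 1 1 1]
```

Sampling each feature independently ignores correlations between features, so this sketch only reproduces the per-label mean and spread of the original data.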

Functions

label_means

Computes the mean of features for each label in a labeled dataset. This is useful for building a simplified representation of the dataset in which each label is represented by the mean feature values of its samples.

Usage Example:

import numpy as np
import hz

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # four samples, two features
y = np.array([0, 1, 0, 1])                      # one label per sample
means = hz.label_means(X, y)
print(means)
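For readers who want to see what a per-label mean amounts to, the following plain NumPy sketch groups the rows by label and averages each group. The helper per_label_means is hypothetical and is not the hz implementation; the return format of hz.label_means may differ.

```python
import numpy as np

def per_label_means(X, y):
    """Hypothetical helper (not part of hz): mean feature vector per label."""
    labels = np.unique(y)  # sorted distinct labels
    means = np.vstack([X[y == label].mean(axis=0) for label in labels])
    return labels, means

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
labels, means = per_label_means(X, y)
print(labels)  # [0 1]
print(means)   # row 0 is [3. 4.] (label 0), row 1 is [5. 6.] (label 1)
```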

label_variances

Calculates the variance of features for each label in a labeled dataset. This shows how widely the samples of each label are spread around that label's mean.

Parameters:

  • X (numpy.ndarray): The input features, assumed to be a 2D array.
  • y (numpy.ndarray): The corresponding labels for the dataset.

Returns:

  • numpy.ndarray: An array in which each row holds the per-feature variances for one label.

Usage Example:

import numpy as np
import hz

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # four samples, two features
y = np.array([0, 1, 0, 1])                      # one label per sample
variances = hz.label_variances(X, y)
print(variances)
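When comparing variance values against another tool, it helps to know which definition is in use. The plain NumPy sketch below computes per-label variances under both the population convention (ddof=0, NumPy's default) and the sample convention (ddof=1). Which convention hz.label_variances follows is not stated here, so treat this as an illustration only.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
labels = np.unique(y)

# Population variance: divide by n (NumPy's default, ddof=0).
pop_var = np.vstack([X[y == label].var(axis=0) for label in labels])
# Sample variance: divide by n - 1 (ddof=1).
sample_var = np.vstack([X[y == label].var(axis=0, ddof=1) for label in labels])

print(pop_var)     # every entry is 4.0 for this data
print(sample_var)  # every entry is 8.0 for this data
```

With only two samples per label, the two conventions differ by a factor of two, so the choice matters most for small groups.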

label_medians

Computes the median of features for each label in a labeled dataset. Because the median is far less sensitive to outliers than the mean, it gives a more robust picture of each label's central tendency.

Usage Example:

import numpy as np
import hz

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # four samples, two features
y = np.array([0, 1, 0, 1])                      # one label per sample
medians = hz.label_medians(X, y)
print(medians)
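The effect of an outlier on the mean versus the median can be seen in a small plain NumPy sketch. The data below is made up for illustration and the computation does not use hz.

```python
import numpy as np

# Label 0 contains an outlier (100) in the first feature.
X = np.array([[1, 10], [2, 20], [100, 30], [3, 10], [4, 20], [5, 30]])
y = np.array([0, 0, 0, 1, 1, 1])
labels = np.unique(y)

means = np.vstack([X[y == label].mean(axis=0) for label in labels])
medians = np.vstack([np.median(X[y == label], axis=0) for label in labels])

print(means[0])    # first feature is pulled up to about 34.3 by the outlier
print(medians[0])  # first feature stays at 2, second at 20
```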

label_standard_deviations

Calculates the standard deviation of features for each label in a labeled dataset. This measures how much each feature varies within a label, expressed in the same units as the feature itself.

Usage Example:

import numpy as np
import hz

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])  # four samples, two features
y = np.array([0, 1, 0, 1])                      # one label per sample
std_devs = hz.label_standard_deviations(X, y)
print(std_devs)
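The standard deviation is the square root of the variance. The short NumPy sketch below checks that relationship for one label, assuming the population convention (ddof=0); whether hz uses the same convention is not stated here.

```python
import numpy as np

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])

rows = X[y == 0]          # samples with label 0: [1, 2] and [5, 6]
var0 = rows.var(axis=0)   # population variance per feature
std0 = rows.std(axis=0)   # population standard deviation per feature

print(var0)  # 4.0 for both features
print(std0)  # 2.0 for both features
```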

Installation

You can install the hz package directly from PyPI:

pip install hz

This package requires numpy to be installed in your Python environment, since it relies on numpy for its numerical computations.
