A package for outlier detection in phenome datasets

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License

Project description

phenome-outlier-analysis

OutlierDetector Class Documentation

Overview

The OutlierDetector class detects outliers in the analyte columns of a pandas DataFrame. Values are normalized with one of two methods ('double_mad' or 'zscore') and compared against percentile-based cutoffs. Detection can run per segment (context-specific, for example by sex) or across the whole dataset (super-global).

Class Initialization

OutlierDetector(df, analyte_columns, segment_columns=['sex'])

Parameters:

df (pandas.DataFrame): The input DataFrame containing the data to be analyzed.
analyte_columns (list): A list of column names to be analyzed for outliers.
segment_columns (list, optional): A list of column names used for segmentation in context-specific outlier detection. Defaults to ['sex'].

Main Methods

1. perform_outlier_detection

perform_outlier_detection(lower_percentile=0.01, upper_percentile=0.99, method='double_mad', take_log=False)

This is the primary method to perform outlier detection on the given DataFrame.

Parameters:

lower_percentile (float): Lower percentile for cutoff calculation, given as a fraction (0.01 corresponds to the 1st percentile). Default is 0.01.
upper_percentile (float): Upper percentile for cutoff calculation, given as a fraction (0.99 corresponds to the 99th percentile). Default is 0.99.
method (str): Normalization method, either 'double_mad' or 'zscore'. Default is 'double_mad'.
take_log (bool): Whether to apply log transformation before normalization. Default is False.

Returns:

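A tuple of two dictionaries, in this order:

Context-specific results, keyed by (segment, value) tuples.
Super-global results, stored under the key ('global', 'global').

As the usage example shows, each dictionary value contains a 'binary_matrix' entry.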

2. context_specific_outlier_detection

context_specific_outlier_detection(method='double_mad', take_log=False)

Performs context-specific outlier detection by segmenting the DataFrame based on the segment_columns.
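
To illustrate the segmentation step, here is a minimal sketch that splits a DataFrame into one subset per (segment, value) pair. The key shape follows the usage example; treating each segment column independently (rather than combining them) is an assumption, and the column names and data are made up for illustration.

```python
import pandas as pd

# Hypothetical input data; column names are illustrative only.
df = pd.DataFrame({
    "sex": ["F", "F", "M", "M"],
    "age_group": ["young", "old", "young", "old"],
    "analyte1": [1.0, 2.0, 3.0, 4.0],
})

# Build one subset per (segment column, value) pair.
# Assumption: each segment column is split on its own, not crossed with the others.
segments = {}
for column in ["sex", "age_group"]:
    for value, subset in df.groupby(column):
        segments[(column, value)] = subset

for key, subset in segments.items():
    print(key, len(subset))
```

Each subset could then be normalized and checked for outliers on its own, so that a value is judged only against rows from the same context.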

3. super_global_outlier_detection

super_global_outlier_detection(method='double_mad', take_log=False)

Evaluates outliers on a global scale, considering all data points together.

Helper Methods

calculate_double_mad

Calculates left and right Median Absolute Deviations (MADs) from the median.
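
As a rough sketch of the idea, the double MAD can be computed by taking the median absolute deviation separately for the values at or below the median and for the values at or above it. The function name, the handling of the median point, and the absence of a scaling constant are assumptions made for illustration, not the package's confirmed behavior.

```python
import numpy as np

def double_mad(values):
    """Return (left_mad, right_mad) around the median.

    Hypothetical helper for illustration; not the package's own code.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]                       # ignore missing values
    med = np.median(x)
    abs_dev = np.abs(x - med)
    left_mad = np.median(abs_dev[x <= med])   # spread of the lower half
    right_mad = np.median(abs_dev[x >= med])  # spread of the upper half
    return float(left_mad), float(right_mad)

# A right-skewed sample: the upper half is far more spread out.
left, right = double_mad([1, 2, 3, 4, 10, 20, 100])
print(left, right)  # 1.5 11.0
```

Using two separate spreads keeps a long tail on one side from inflating the scale used for the other side, which is why this approach suits skewed data.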

normalize_series

Normalizes a series using the specified method (double_mad or zscore).
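
The two normalization options might look roughly like the sketch below. The function name and the exact formulas are assumptions: the z-score uses the pandas default sample standard deviation, and the double MAD variant divides each deviation from the median by the MAD of its own side, without any consistency constant.

```python
import numpy as np
import pandas as pd

def normalize(series, method="double_mad"):
    """Hypothetical sketch of the two normalization methods."""
    s = series.astype(float)
    if method == "zscore":
        # pandas std() uses ddof=1 (sample standard deviation) by default.
        return (s - s.mean()) / s.std()
    if method == "double_mad":
        med = s.median()
        dev = s - med
        left_mad = dev[s <= med].abs().median()
        right_mad = dev[s >= med].abs().median()
        # Pick the scale that matches the side of the median each value is on.
        # A MAD of zero would produce inf or NaN; this sketch does not guard against it.
        scale = pd.Series(np.where(s <= med, left_mad, right_mad), index=s.index)
        return dev / scale
    raise ValueError(f"Unknown method: {method!r}")

sample = pd.Series([1, 2, 3, 4, 10, 20, 100])
normalized_mad = normalize(sample, method="double_mad")
normalized_z = normalize(sample, method="zscore")
print(normalized_mad.round(3).tolist())
```

In this sample the value 100 stands out much more clearly under the double MAD scaling than under the z-score, because the extreme value itself inflates the standard deviation.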

calculate_percentile_cutoffs

Calculates global percentile cutoffs based on the specified columns of a DataFrame.
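
One plausible reading of "global" here is that the normalized values from all listed columns are pooled into a single distribution before the percentiles are taken. The sketch below follows that reading; the function name, the pooling, and the sample data are assumptions.

```python
import numpy as np
import pandas as pd

def percentile_cutoffs(df, columns, lower=0.01, upper=0.99):
    """Hypothetical sketch: pool all listed columns and take two quantiles."""
    pooled = df[columns].to_numpy(dtype=float).ravel()
    pooled = pooled[~np.isnan(pooled)]  # drop missing values before ranking
    return float(np.quantile(pooled, lower)), float(np.quantile(pooled, upper))

# Illustrative normalized scores for two analytes.
scores = pd.DataFrame({
    "a": [-3.0, -1.0, 0.0, 1.0, 2.0],
    "b": [-2.0, -0.5, 0.5, 1.5, 6.0],
})
low, high = percentile_cutoffs(scores, ["a", "b"], lower=0.1, upper=0.9)
print(low, high)
```

Because the values are normalized first, a single pair of cutoffs can be applied to analytes that were originally measured on different scales.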

create_binary_matrix

Creates a binary matrix indicating outliers based on specified cutoffs.
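
A binary outlier matrix of this kind can be sketched as follows: each cell is 1 when the value falls outside the cutoffs and 0 otherwise. The function name, the strict inequalities, and the treatment of missing values as non-outliers are assumptions.

```python
import pandas as pd

def binary_matrix(df, columns, low, high):
    """Hypothetical sketch: 1 marks a value below `low` or above `high`."""
    values = df[columns]
    # Comparisons with NaN are False, so missing values are never flagged.
    flags = (values < low) | (values > high)
    return flags.astype(int)

scores = pd.DataFrame({
    "a": [-3.0, 0.0, 2.0],
    "b": [0.5, 6.0, float("nan")],
})
matrix = binary_matrix(scores, ["a", "b"], low=-2.1, high=2.4)
print(matrix.sum())  # number of flagged values per column
```

Summing the matrix by column, as the usage example does, gives the number of outliers per analyte; summing by row would give the number of outlying analytes per sample.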

normalize_dataframe

Normalizes specified columns in a DataFrame.

detect_outliers

Detects outliers in the specified columns of a DataFrame.

get_global_cutoffs

Gets global cutoffs for outlier detection.

Usage Example

import pandas as pd
from outlier_detection import OutlierDetector

# Load your data
df = pd.read_csv('your_data.csv')

# Define columns
analyte_columns = ['column1', 'column2', 'column3']
segment_columns = ['sex', 'age_group']

# Create OutlierDetector instance
detector = OutlierDetector(df, analyte_columns, segment_columns)

# Perform outlier detection
context_results, global_results = detector.perform_outlier_detection(
    lower_percentile=0.01,
    upper_percentile=0.99,
    method='double_mad',
    take_log=True
)

# Analyze results
for (segment, value), result in context_results.items():
    print(f"Outliers for {segment}={value}:")
    print(result['binary_matrix'].sum())

print("Global outliers:")
print(global_results[('global', 'global')]['binary_matrix'].sum())

Notes

The class uses logging to provide information and warnings during the outlier detection process.
The tqdm library is used to show progress bars for long-running operations.
The class can handle both context-specific (segmented) and global outlier detection.
Two normalization methods are supported: 'double_mad' (double Median Absolute Deviation) and 'zscore'.
Log transformation can be applied before normalization if needed.
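
Since the class reports through Python's logging module, its messages may not appear unless logging is configured. The snippet below is a generic standard-library setup; the logger name used by the package is not documented here, so the root logger is configured instead.

```python
import logging

# Attach a basic handler to the root logger if none is configured yet.
logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s: %(message)s")

# Make sure INFO-level messages are let through.
root_logger = logging.getLogger()
root_logger.setLevel(logging.INFO)
root_logger.info("INFO-level messages will now be shown")
```

Run this before creating the detector so that informational messages and warnings emitted during detection are visible.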

Release history

This version

0.1.0

Aug 12, 2024

Download files

Source Distribution

phenome_outlier_analysis-0.1.0.tar.gz (5.4 kB view details)

Uploaded Aug 12, 2024 Source

Built Distribution

phenome_outlier_analysis-0.1.0-py3-none-any.whl (6.8 kB view details)

Uploaded Aug 12, 2024 Python 3

File details

Details for the file phenome_outlier_analysis-0.1.0.tar.gz.

File metadata

Download URL: phenome_outlier_analysis-0.1.0.tar.gz
Upload date: Aug 12, 2024
Size: 5.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for phenome_outlier_analysis-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`fadd5fbd5befc06f2e8f97c232dd3ae003b7da4afc58eded04181cda165fb0b5`
MD5	`bf68e33662e2a91f0f211f18369251c5`
BLAKE2b-256	`da67a4edc5c168a8fdd90d80c40708c66ad12443a9441cab2fc80458031d38ab`

File details

Details for the file phenome_outlier_analysis-0.1.0-py3-none-any.whl.

File metadata

Download URL: phenome_outlier_analysis-0.1.0-py3-none-any.whl
Upload date: Aug 12, 2024
Size: 6.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for phenome_outlier_analysis-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`37fce37970dc8e0aa6de056acba5719c8c578750b53aa975d0091dbbaf309f78`
MD5	`aaf8e8ca3974fb754999df418537654c`
BLAKE2b-256	`564b9ccddb69fcf58ca09bc5de16dfd9d7328efdb4afcd16c6a18df799300d5a`
