License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

AlertManager

Overview

AlertManager is an open-source Python library designed to streamline and enhance data validation processes for both local Pandas DataFrames and database tables. It provides a suite of decorators that seamlessly integrate data validation checks into your data processing functions. By automating common validation tasks such as range checks, value checks, statistical outlier detection, and custom logic application, AlertManager helps maintain data integrity and quality throughout your data pipeline.

The library consists of two main components:

LocalValidator: For validating data within Pandas DataFrames.
DatabaseValidator: For validating data directly within database tables using SQLAlchemy.

With AlertManager, you can configure flexible alerting and logging options, ensuring that data anomalies are caught and handled appropriately before they impact your system.

Table of Contents

- Features
- Installation
  - Using pip
  - Using conda
- Usage
  - 1. LocalValidator
  - 2. DatabaseValidator
- Configuration Options
  - Example Initialization with Custom Configuration
    - LocalValidator
    - DatabaseValidator
- Best Practices and Detailed Explanations
- Alert Management
- Error Handling
- Conclusion
- Contribution and Support
- License

Features

Seamless Integration with Decorators: Apply validations directly to your data processing functions using decorators, making data checks an integral part of your workflow.
Range Checks: Validate that numeric values fall within specified ranges in DataFrames or database tables.
Value Checks: Ensure that values in a column are within an allowed set or not within a disallowed set.
Statistical Outlier Detection: Detect outliers in both continuous and discrete data using appropriate statistical methods, with customizable sensitivity levels.
Custom Validation Logic: Implement custom validation rules using query strings or callable functions for specialized data checks.
Database Support: Validate data directly in your databases using SQLAlchemy, supporting various database backends.
Flexible Alerting and Logging: Store validation results with options for historical logging, unified or separate files, and multiple file formats (csv, xlsx, pkl, txt).
Configurable Alert Management: Control how and where alerts are stored, and specify identifiers for easy tracking of validation issues.

By incorporating AlertManager into your data processing pipeline, you can proactively manage data quality issues, reduce the risk of errors, and maintain high standards of data integrity.

Installation

You can install AlertManager using either pip or conda.

Using pip

AlertManager is published on PyPI, so you can install it directly:

pip install AlertManager

Using conda

If you prefer conda, you can install AlertManager into your active environment from the conda-forge channel:

conda install -c conda-forge AlertManager

Usage

AlertManager provides two main classes:

LocalValidator: For validating Pandas DataFrames.
DatabaseValidator: For validating data in database tables.

Each class offers the same set of validation decorators:

range_check
value_check
statistical
custom_check

1. LocalValidator

Initialization

To use LocalValidator, import the class and initialize it:

import pandas as pd
from AlertManager import LocalValidator  # Replace with the correct import path

# Initialize the LocalValidator
AlertManager = LocalValidator(
    store=True,
    history=False,
    united=True,
    identifier='id',
    path='./validation_logs',
    file_type='csv'
)

Parameters:

store (bool, default False): Whether to store validation results.
history (bool, default False): Whether to store logs with historical data (creates subdirectories based on date).
united (bool, default True): Whether to store all validations in one file (True) or separately (False).
identifier (str, optional): Column name to identify rows (e.g., primary key).
path (str, default './validation_logs'): Directory path where logs will be stored.
file_type (str, default 'pkl'): The file format for storing validation results. Options are 'csv', 'xlsx', 'pkl', 'txt'.

Decorators

1. range_check

Validates that the values in a specified column fall within given ranges.

@AlertManager.range_check(column='column_name', borders=[(lower_bound, upper_bound)], name='Validation Name')
def your_function(df):
    # Your data processing logic
    return df

Parameters:

column (str): The column in the DataFrame to be validated.
borders (list of tuple): A list of tuples, each containing two numeric values representing the lower and upper bounds.
name (str): A name for the validation, used in logging.
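
To make the borders semantics concrete, here is a minimal plain-Python sketch of a range-checking decorator. It is not AlertManager's implementation: the name range_check_sketch, the inclusive bounds, the rule that a value passes if it falls inside any one interval, and the choice to validate the function's return value are all assumptions made for illustration. Rows are plain dicts rather than a DataFrame so the sketch has no dependencies.

```python
from functools import wraps

flagged = []  # hypothetical in-memory alert log


def range_check_sketch(column, borders, name):
    """Hypothetical stand-in for a range-check decorator (not the library's code)."""
    def decorator(func):
        @wraps(func)
        def wrapper(rows):
            result = func(rows)
            for row in result:
                value = row[column]
                # Assumption: bounds are inclusive and any one interval is enough.
                if not any(lo <= value <= hi for lo, hi in borders):
                    flagged.append((name, row))
            return result
        return wrapper
    return decorator


@range_check_sketch(column='age', borders=[(0, 17), (65, 120)], name='Minor or Senior')
def process_rows(rows):
    # Data processing logic would go here
    return rows


rows = [{'id': 1, 'age': 12}, {'id': 2, 'age': 40}, {'id': 3, 'age': 70}]
process_rows(rows)
print(flagged)
```

Under these assumptions only the row with age 40 is flagged, because it lies outside both intervals.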

2. value_check

Validates that the values in a specified column belong to an allowed set, do not belong to a disallowed set, or both.

@AlertManager.value_check(column='column_name', allowed=['value1', 'value2'], not_allowed=['value3'], name='Validation Name')
def your_function(df):
  # Your data processing logic
  return df

Parameters:

column (str): The column in the DataFrame to be validated.
allowed (list, optional): A list of allowed values for the column.
not_allowed (list, optional): A list of not allowed values for the column.
name (str): A name for the validation, used in logging.
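
The interaction between allowed and not_allowed can be sketched in plain Python. The helper find_invalid_values below is hypothetical, and the rule it applies (a value is invalid if it is missing from allowed or present in not_allowed) is an assumption about how the two lists combine, not a statement of the library's behavior.

```python
def find_invalid_values(values, allowed=None, not_allowed=None):
    """Return the values that violate either list (hypothetical helper)."""
    invalid = []
    for value in values:
        # Assumption: each list is only enforced when it is provided.
        outside_allowed = allowed is not None and value not in allowed
        inside_blocked = not_allowed is not None and value in not_allowed
        if outside_allowed or inside_blocked:
            invalid.append(value)
    return invalid


statuses = ['active', 'inactive', 'pending', 'deleted']
only_allowed = find_invalid_values(statuses, allowed=['active', 'inactive'])
only_blocked = find_invalid_values(statuses, not_allowed=['deleted'])
print(only_allowed)
print(only_blocked)
```

An allow-list is the stricter choice when the set of valid values is known in advance; a block-list is useful when only a few values are known to be wrong.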

3. statistical

Applies statistical outlier detection on a DataFrame column.

@AlertManager.statistical(column='column_name', name='Validation Name', sensitivity='medium', data_type='continuous')
def your_function(df):
  # Your data processing logic
  return df

Parameters:

column (str): The column in the DataFrame to be validated.
name (str): A name for the validation, used in logging.
sensitivity (str, default 'medium'): Adjusts the strictness of outlier detection. Options are 'sensitive', 'medium', 'insensitive'.
data_type (str, optional): Specify 'continuous' or 'discrete'. If None, the type will be inferred.

4. custom_check

Applies custom validation logic on a DataFrame.

@AlertManager.custom_check(custom_logic='column_name > value', name='Validation Name')
def your_function(df):
  # Your data processing logic
  return df

Parameters:

custom_logic (str or callable): The custom logic for validation, can be a query string or a function.
name (str): A name for the validation, used in logging.

Examples for Each Decorator

The examples below use the following sample DataFrame:

import pandas as pd

data = {
    'id': range(1, 11),
    'age': [25, 38, 17, 120, 29, 41, -5, 30, 22, 300],
    'status': ['active', 'inactive', 'active', 'pending', 'active', 'inactive', 'unknown', 'active', 'active', 'inactive'],
    'salary': [50000, 60000, 55000, 70000, 65000, 52000, 48000, 1000000, 59000, 61000],
    'gender_id': [1, 2, 1, 2, 1, 3, 1, 2, 2, 2]
}

df = pd.DataFrame(data)

Example 1: range_check

Validate that the age column values are between 0 and 120.

@AlertManager.range_check(column='age', borders=[(0, 120)], name='Age Range Check')
def process_data(df):
  # Data processing logic
  return df

df = process_data(df)

Explanation: Any records where age is less than 0 or greater than 120 will be flagged and stored according to the AlertManager configuration. In the sample data, these are the rows with age -5 (id 7) and age 300 (id 10).

Example 2: value_check

Ensure that status is either 'active' or 'inactive'.

@AlertManager.value_check(column='status', allowed=['active', 'inactive'], name='Status Value Check')
def process_data(df):
  # Data processing logic
  return df

df = process_data(df)

Explanation: Records with a status of 'pending' or 'unknown' will be flagged.

Example 3: statistical

Detect outliers in the salary column, which is continuous data.

@AlertManager.statistical(column='salary', name='Salary Outlier Check', sensitivity='medium', data_type='continuous')
def process_data(df):
  # Data processing logic
  return df

df = process_data(df)

Explanation: Uses the z-score method to detect outliers based on the sensitivity level.
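
As a rough illustration of z-score detection on the sample salary values, the sketch below uses only the standard library. The cutoffs in THRESHOLDS are hypothetical; the source does not state which values AlertManager maps to 'sensitive', 'medium', and 'insensitive', nor whether it uses the population or the sample standard deviation (the population form is assumed here).

```python
from statistics import mean, pstdev

# Hypothetical mapping from sensitivity level to a z-score cutoff.
THRESHOLDS = {'sensitive': 2.0, 'medium': 2.5, 'insensitive': 3.0}


def zscore_outliers(values, sensitivity='medium'):
    """Return the values whose absolute z-score exceeds the cutoff."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    cutoff = THRESHOLDS[sensitivity]
    return [v for v in values if sigma > 0 and abs(v - mu) / sigma > cutoff]


salary = [50000, 60000, 55000, 70000, 65000, 52000, 48000, 1000000, 59000, 61000]
medium_hits = zscore_outliers(salary, 'medium')
insensitive_hits = zscore_outliers(salary, 'insensitive')
print(medium_hits)
print(insensitive_hits)
```

On this ten-row sample the salary of 1,000,000 has a population z-score just under 3, so it is flagged at the hypothetical 2.5 cutoff but not at 3.0. More generally, a population z-score in a sample of n values cannot exceed sqrt(n - 1), so very small samples can hide extreme values from strict cutoffs.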

For discrete data:

@AlertManager.statistical(column='gender_id', name='Gender ID Outlier Check', sensitivity='medium', data_type='discrete')
def process_data(df):
  # Data processing logic
  return df

df = process_data(df)

Explanation: Identifies rarely occurring gender_id values as outliers using frequency-based detection.
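
Frequency-based detection can be sketched as follows. The helper rare_values and its min_share cutoff of 15% are assumptions chosen for illustration; the source does not say which frequency threshold AlertManager applies at each sensitivity level.

```python
from collections import Counter


def rare_values(values, min_share=0.15):
    """Return the distinct values whose share of all rows is below min_share."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if c / total < min_share)


gender_id = [1, 2, 1, 2, 1, 3, 1, 2, 2, 2]
rare = rare_values(gender_id)
print(rare)
```

In the sample, the value 3 occurs once in ten rows (a 10% share), so it is the only value below the hypothetical cutoff.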

Example 4: custom_check

Using a query string:

@AlertManager.custom_check(custom_logic='age < 0 or age > 100', name='Custom Age Check')
def process_data(df):
  # Data processing logic
  return df

df = process_data(df)

Explanation: Flags records where age is less than 0 or greater than 100.

Using a callable function:

def custom_logic(df):
    return df[(df['salary'] < 30000) | (df['salary'] > 200000)]

@AlertManager.custom_check(custom_logic=custom_logic, name='Custom Salary Check')
def process_data(df):
  # Data processing logic
  return df

df = process_data(df)

Explanation: Flags records where salary is less than 30,000 or greater than 200,000.
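
The two forms of custom_logic can be compared side by side with plain pandas. The helper run_custom_check is hypothetical and only illustrates one way a query string and a callable could be dispatched; it assumes the query string is passed to DataFrame.query and that the callable returns the invalid rows.

```python
import pandas as pd


def run_custom_check(df, custom_logic):
    """Return the rows flagged by a query string or a callable (hypothetical helper)."""
    if callable(custom_logic):
        return custom_logic(df)
    # Assumption: string logic is evaluated with pandas query syntax.
    return df.query(custom_logic)


df = pd.DataFrame({
    'id': [1, 2, 3, 4],
    'age': [25, -5, 120, 41],
    'salary': [50000, 48000, 1000000, 20000],
})

by_query = run_custom_check(df, 'age < 0 or age > 100')
by_callable = run_custom_check(
    df, lambda d: d[(d['salary'] < 30000) | (d['salary'] > 200000)]
)
print(by_query['id'].tolist())
print(by_callable['id'].tolist())
```

A query string keeps simple conditions readable, while a callable is the better fit when the rule needs several steps or helper functions.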

2. DatabaseValidator

Initialization

To use DatabaseValidator, import the class and initialize it with a database connection string:

from AlertManager import DatabaseValidator  # Replace with the correct import path

# Database connection string
connection_string = 'sqlite:///my_database.db'  # Replace with your actual connection string

# Initialize the DatabaseValidator
alert_manager_db = DatabaseValidator(
    connection_string=connection_string,
    table_name='users',  # Replace with your actual table name
    schema=None,  # Replace with your schema if necessary
    store=True,
    history=False,
    united=True,
    identifier='id',
    path='./validation_logs',
    file_type='csv'
)

Parameters:

connection_string (str): The database connection string.
table_name (str): The name of the database table to be validated.
schema (str, optional): The schema of the table in the database.
store (bool, default False): Whether to store validation results.
history (bool, default False): Whether to store logs with historical data.
united (bool, default True): Whether to store all validations in one file or separately.
identifier (str, optional): Column name to identify rows.
path (str, default './validation_logs'): Directory path where logs will be stored.
file_type (str, default 'pkl'): The file format for storing validation results.

Decorators

1. range_check

Validates that the values in a specified column fall within given ranges.

@alert_manager_db.range_check(column='column_name', borders=[(lower_bound, upper_bound)], name='Validation Name')
def your_function():
  # Your data processing logic
  pass

Parameters:

column (str): The column in the table to be validated.
borders (list of tuple): A list of tuples, each containing two numeric values representing the lower and upper bounds.
name (str): A name for the validation, used in logging.

2. value_check

Validates that the values in a specified column belong to an allowed set, do not belong to a disallowed set, or both.

@alert_manager_db.value_check(column='column_name', allowed=['value1', 'value2'], not_allowed=['value3'], name='Validation Name')
def your_function():
  # Your data processing logic
  pass

Parameters:

column (str): The column in the table to be validated.
allowed (list, optional): A list of allowed values for the column.
not_allowed (list, optional): A list of not allowed values for the column.
name (str): A name for the validation, used in logging.

3. statistical

Applies statistical outlier detection on a database table column.

@alert_manager_db.statistical(column='column_name', name='Validation Name', sensitivity='medium', data_type='continuous')
def your_function():
  # Your data processing logic
  pass

Parameters:

column (str): The column in the table to be validated.
name (str): A name for the validation, used in logging.
sensitivity (str, default 'medium'): Adjusts the strictness of outlier detection.
data_type (str, optional): Specify 'continuous' or 'discrete'.

4. custom_check

Applies custom validation logic on a database table.

@alert_manager_db.custom_check(custom_logic='column_name > value', name='Validation Name')
def your_function():
  # Your data processing logic
  pass

Parameters:

custom_logic (str or callable): The custom logic for validation.
name (str): A name for the validation, used in logging.

Examples for Each Decorator

The examples below assume a database table named users with the same columns as the sample DataFrame above.

Example 1: range_check

Validate that the age column values are between 0 and 120.

@alert_manager_db.range_check(column='age', borders=[(0, 120)], name='Age Range Check')
def update_database():
  # Database update logic
  pass

update_database()

Explanation: Any records where age is less than 0 or greater than 120 will be flagged.
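
For intuition, the same range check can be expressed as a SQL filter. The sketch below uses the standard-library sqlite3 module and an in-memory table instead of SQLAlchemy, so it is only an approximation of what a database-side check might run; the query text and the table contents are assumptions based on the sample data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, age INTEGER)')

# Same ages as the sample DataFrame, with ids 1 through 10.
ages = [25, 38, 17, 120, 29, 41, -5, 30, 22, 300]
conn.executemany(
    'INSERT INTO users (id, age) VALUES (?, ?)',
    list(enumerate(ages, start=1)),
)

# Rows outside the inclusive range [0, 120] are the ones a range check would flag.
out_of_range = conn.execute(
    'SELECT id, age FROM users WHERE age < ? OR age > ? ORDER BY id',
    (0, 120),
).fetchall()
conn.close()
print(out_of_range)
```

Filtering in the database means only the offending rows need to leave it, which matters for large tables.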

Example 2: value_check

Ensure that status is either 'active' or 'inactive'.

@alert_manager_db.value_check(column='status', allowed=['active', 'inactive'], name='Status Value Check')
def update_database():
  # Database update logic
  pass

update_database()

Explanation: Records with a status of 'pending' or 'unknown' will be flagged.

Example 3: statistical

Detect outliers in the salary column.

@alert_manager_db.statistical(column='salary', name='Salary Outlier Check', sensitivity='medium', data_type='continuous')
def update_database():
  # Database update logic
  pass

update_database()

Explanation: Uses the z-score method to detect outliers based on the sensitivity level.

Example 4: custom_check

Using a query string:

@alert_manager_db.custom_check(custom_logic='age < 0 or age > 100', name='Custom Age Check')
def update_database():
  # Database update logic
  pass

update_database()

Explanation: Flags records where age is less than 0 or greater than 100.

Configuration Options

AlertManager allows you to customize how and where alerts are stored.

Initialization Parameters:

store (bool): Enable or disable the storing of validation results.
history (bool): Enable historical logging by creating subdirectories based on the date.
united (bool): Store all validations in a single file (True) or separate files (False).
identifier (str): Specify a column to identify rows (e.g., primary key).
path (str): Directory path where logs will be stored.
file_type (str): Format for storing validation results ('csv', 'xlsx', 'pkl', 'txt').

Example Initialization with Custom Configuration

LocalValidator

AlertManager = LocalValidator(
    store=True,
    history=True,
    united=False,
    identifier='id',
    path='./my_alert_logs',
    file_type='csv'
)

DatabaseValidator

alert_manager_db = DatabaseValidator(
    connection_string=connection_string,
    table_name='users',
    store=True,
    history=True,
    united=False,
    identifier='id',
    path='./my_alert_logs',
    file_type='csv'
)

Best Practices and Detailed Explanations

Decorator Usage

Function Positioning: Place the decorator directly above the function definition you wish to apply the validation to.
Multiple Decorators: You can stack multiple decorators on a single function to apply multiple validations.

@AlertManager.range_check(column='age', borders=[(0, 120)], name='Age Range Check')
@AlertManager.value_check(column='status', allowed=['active', 'inactive'], name='Status Value Check')
def process_data(df):
  # Data processing logic
  return df

Execution Order: Python applies stacked decorators from the bottom up, so value_check wraps process_data first and range_check wraps the result. In the example above, value_check will execute before range_check.
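
The effect of stacking can be checked with two generic decorators. The sketch below does not use AlertManager; check_after and check_before are hypothetical helpers that record when each wrapper does its work. It shows that the run-time order depends on whether a wrapper acts after or before calling the wrapped function, which the source does not specify for AlertManager.

```python
from functools import wraps

events = []


def check_after(label):
    """Decorator that records its label after the wrapped function returns."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            events.append(label)
            return result
        return wrapper
    return decorator


def check_before(label):
    """Decorator that records its label before calling the wrapped function."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            events.append(label)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@check_after('outer')
@check_after('inner')
def validated_after(data):
    return data


@check_before('outer')
@check_before('inner')
def validated_before(data):
    return data


validated_after([1, 2, 3])
after_order = list(events)
events.clear()
validated_before([1, 2, 3])
before_order = list(events)
print(after_order, before_order)
```

When wrappers validate the returned data, the bottom decorator finishes first; when they validate the input, the top decorator runs first. The order stated above matches the first case.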

Statistical Outlier Detection Sensitivity

Sensitivity Levels:

'sensitive': More strict, flags more data points as outliers.
'medium': Balanced approach.
'insensitive': Less strict, flags fewer data points.

Data Type Specification:

Specify data_type as 'continuous' or 'discrete' to ensure the correct outlier detection method is applied.
If data_type is None, AlertManager will attempt to infer the type based on the data.

Custom Validation Logic

Query Strings: Use Pandas query syntax for straightforward conditions.
Callable Functions: Define complex logic in a function that accepts a DataFrame and returns a DataFrame or Series of invalid rows.

Alert Management

Identifier Usage: Use the identifier parameter to store only essential information in your logs, making it easier to track and address issues.
Historical Logging: Enable history to maintain logs over time, which can be useful for monitoring data quality trends.
Unified vs. Separate Logs:
- Unified (united=True): All validation results are stored in a single file.
- Separate (united=False): Each validation result is stored in a separate file, named after the validation.
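
One way to picture how history, united, path, and file_type interact is the path-building sketch below. The directory layout, the file name 'validations', and the rule for turning a validation name into a file name are all assumptions; the source only states that history creates date-based subdirectories and that separate files are named after the validation.

```python
from datetime import date
from pathlib import PurePosixPath


def log_path_sketch(path, name, file_type, history, united, today):
    """Build a hypothetical log file path from the configuration options."""
    base = PurePosixPath(path)
    if history:
        # Assumption: one subdirectory per ISO date.
        base = base / today.isoformat()
    # Assumption: a fixed stem for unified logs, the validation name otherwise.
    stem = 'validations' if united else name.replace(' ', '_')
    return str(base / f'{stem}.{file_type}')


unified = log_path_sketch('./validation_logs', 'Age Range Check', 'csv',
                          history=False, united=True, today=date(2024, 10, 30))
separate = log_path_sketch('./validation_logs', 'Age Range Check', 'csv',
                           history=True, united=False, today=date(2024, 10, 30))
print(unified)
print(separate)
```

Check the files the library actually writes before relying on any particular layout.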

Error Handling

Missing Columns: If a specified column is not found in the DataFrame or database table, AlertManager will raise a ValueError.
Type Checking: AlertManager performs type checking on parameters to help prevent misconfiguration.
Database Connections: Ensure that your database connection string is correct and that the necessary database drivers are installed.
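
The missing-column case can be handled with an ordinary try/except block. The helper require_column below is a hypothetical stand-in for the check described above, written only to show the ValueError being raised and caught; the wording of the message is an assumption.

```python
def require_column(columns, column):
    """Raise ValueError when `column` is missing (hypothetical helper)."""
    if column not in columns:
        raise ValueError(
            f"Column '{column}' not found; available: {sorted(columns)}"
        )


try:
    require_column(['id', 'age', 'status'], 'salary')
    message = None
except ValueError as exc:
    # A pipeline could log the problem here and skip the validation step.
    message = str(exc)

print(message)
```

Catching the error at the call site lets a pipeline report a misconfigured check without stopping unrelated processing.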

Conclusion

AlertManager is a powerful tool for integrating data validation into your data processing workflows, whether you're working with local Pandas DataFrames or directly with database tables. By providing decorators for common validation tasks and flexible alert management options, it helps ensure data integrity and facilitates proactive handling of data anomalies.

By adopting AlertManager, you can:

Reduce the risk of data errors propagating through your system.
Maintain high data quality standards.
Efficiently manage and track data validation alerts.
Integrate validations seamlessly into existing codebases and database operations.

Contribution and Support

Contributions to AlertManager are welcome. If you encounter any issues or have suggestions for improvements, please submit an issue or a pull request on the GitHub repository.

AlertManager on GitHub

License

This project is licensed under the MIT License.


Release history

This version

1.0.2

Oct 30, 2024

1.0.1

Oct 30, 2024

1.0.0

Oct 30, 2024

0.0.3

Sep 29, 2024

0.0.2

Sep 29, 2024

0.0.1

Aug 18, 2024

Download files

Source Distribution

AlertManager-1.0.2.tar.gz (16.8 kB)

Uploaded Oct 30, 2024 Source

Built Distribution

AlertManager-1.0.2-py3-none-any.whl (12.8 kB)

Uploaded Oct 30, 2024 Python 3

File details

Details for the file AlertManager-1.0.2.tar.gz.

File metadata

Download URL: AlertManager-1.0.2.tar.gz
Upload date: Oct 30, 2024
Size: 16.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for AlertManager-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`0fd3039e603aee9491939efa46dd5a8ea5d639b898bf0e0afc94155eb38489e8`
MD5	`9cfae8b55938edb3b430a30977362ee5`
BLAKE2b-256	`74d229bca63e1a86c9ee06020e9d6cd3b315cd48f24c87e0491e0bf4b8ce73ee`

File details

Details for the file AlertManager-1.0.2-py3-none-any.whl.

File metadata

Download URL: AlertManager-1.0.2-py3-none-any.whl
Upload date: Oct 30, 2024
Size: 12.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for AlertManager-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3ef7e90875b7e06cf7a4846851f75b15b65415bb228e89e5806db79d380b35df`
MD5	`4b255469de9dc77203cb0d52be6019b1`
BLAKE2b-256	`df3b3602ca828a944f9642890dd01465bb8113278282227eb7621dee6e6738b3`
