A modular Python toolkit for analytics workflows, including data processing, visualization, and reusable utilities.
haashi_pkg
A professional Python toolkit for data analytics workflows — providing modular, well-documented utilities for data ingestion, validation, transformation, visualization, and common development tasks.
Version: 1.0.0
Author: Haashiraaa
Python: ≥ 3.10
Core Stack: pandas, numpy, matplotlib, seaborn, pyarrow
Overview
haashi_pkg provides production-ready, well-documented utilities that streamline common analytics tasks. Designed with clean architecture principles, the package separates concerns into distinct modules while maintaining cohesive workflows.
Perfect for:
- Data pipelines and ETL workflows
- Exploratory data analysis
- Data validation and quality assurance
- Automated reporting and visualization
- Prototype development
- Analytics scripts and notebooks
Key Principles:
- Comprehensive Documentation: Every function fully documented with examples
- Robust Error Handling: Custom exceptions with clear, actionable messages
- Backward Compatible: Deprecated features supported with migration warnings
- Type-Safe: Full type hints throughout
- Production-Ready: Professional code suitable for enterprise use
Package Structure
```
haashi_pkg/
├── data_engine/          # Data loading, analysis, and saving
│   ├── data_engine.py    # Main module (DataAnalyzer, DataLoader, DataSaver)
│   ├── dataengine.py     # Deprecated wrapper (backward compatibility)
│   ├── dataloader.py     # Deprecated wrapper (backward compatibility)
│   └── datasaver.py      # Deprecated wrapper (backward compatibility)
│
├── plot_engine/          # Visualization utilities
│   └── plotengine.py     # PlotEngine class for matplotlib/seaborn workflows
│
└── utility/              # Core utilities
    └── utils.py          # Logger, FileHandler, ScreenUtil, DateTimeUtil, etc.
```
Features by Module
Data Engine (haashi_pkg.data_engine)
DataAnalyzer (formerly DataEngine)
Core data analysis, validation, and transformation utilities.
Capabilities:
- Inspection: Non-mutating data exploration (head, dtypes, shape, missing counts, duplicates)
- Validation: Column existence, numeric ranges, datetime types, data quality checks
- Type Conversion: Flexible numeric/datetime parsing with error handling
- Normalization: Column names, text values (case/whitespace)
- Missing Data: Summary stats, forward/backward fill, row dropping
- Aggregation: Group-by operations with single or multiple functions
- Joins: Validated merges with relationship checks
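For intuition, the missing-data summary listed above boils down to counting nulls and expressing them as a share of all rows. A dependency-free sketch of that idea (not the package's actual implementation, which operates on pandas Series and also handles NaN):

```python
def missing_summary(values):
    """Return (total, missing, percent) for a column of values.

    Plain-Python sketch of the kind of summary DataAnalyzer provides.
    """
    total = len(values)
    missing = sum(1 for v in values if v is None)
    percent = (missing / total * 100) if total else 0.0
    return total, missing, percent

print(missing_summary([10, None, 30, None]))  # → (4, 2, 50.0)
```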
DataLoader
Lightweight I/O for loading tabular data.
Supported Formats:
- CSV (single, multiple, chunked streaming)
- Excel (.xlsx via openpyxl)
- Parquet (single, multiple via pyarrow)
Features:
- Automatic delimiter detection (CSV)
- Memory-efficient chunked reading
- Path validation and error handling
- Flexible header/skip row handling
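Automatic delimiter detection can be pictured with the standard library's `csv.Sniffer`, one common way to implement it (whether DataLoader uses Sniffer internally is an assumption):

```python
import csv

sample = "id;name;amount\n1;alice;10\n2;bob;20\n"

# Guess the delimiter from a sample of the file's text,
# restricted to a set of plausible candidates
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t")
print(dialect.delimiter)  # → ;
```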
DataSaver
Save DataFrames with validation and compression.
Formats:
- CSV (with index control)
- Parquet (standard and gzip-compressed)
Features:
- Path and extension validation
- Automatic directory creation
- Compression options for Parquet
- Save confirmation logging
Plot Engine (haashi_pkg.plot_engine)
PlotEngine
High-level plotting interface built on matplotlib and seaborn.
Workflow:
- Setup: Create figures and configure layouts
- Draw: Add data visualizations
- Decorate: Apply styling, labels, legends
- Finalize: Save and/or display
Features:
- 4 Color Palettes: Professional, vibrant, soft, deep
- Flexible Layouts: Simple grids or complex custom ratios
- Plot Types: Line, bar, scatter, pie
- Reference Lines: Horizontal/vertical targets and thresholds
- Value Labels: Automatic labeling on bar charts
- Stats Boxes: Create summary statistics displays
- Formatting: Currency, percentages, date axes
- Theming: Seaborn themes with custom backgrounds
Utilities (haashi_pkg.utility)
Modern Classes (Recommended)
Logger
- Console logging with multiple levels (debug, info, warning, error)
- JSON error persistence with automatic rotation
- Integration with ErrorLogger for long-term error tracking
FileHandler
- JSON and text file I/O with validation
- Automatic path creation and permission checks
- Readable/writable path validation
ScreenUtil
- Terminal screen clearing
- Line clearing with delays
- Loading animations
- Text wrapping utilities
DateTimeUtil
- Current time with UTC offset
- Flexible date formatting
ClipboardUtil (Termux/Android only)
- Copy/paste operations via termux-api
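In spirit, Logger wraps the standard `logging` module and DateTimeUtil wraps `datetime` formatting. A stdlib sketch of that behavior (the actual class APIs may differ):

```python
import logging
from datetime import datetime, timezone

# Console logger with a configurable level, as the Logger class provides
logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
log = logging.getLogger("haashi_sketch")
log.info("pipeline started")

# Current time with UTC offset, as DateTimeUtil reports
now = datetime.now(timezone.utc)
stamp = now.strftime("%Y-%m-%d %H:%M:%S %z")
print(stamp)  # e.g. 2025-01-01 12:00:00 +0000
```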
Legacy Class (Deprecated)
Utility
- Backward-compatible wrapper combining all utilities
- ⚠️ Deprecated: Will be removed in v2.0.0
- Use modern classes instead for new code
Installation
From Git Repository
```shell
# Clone the repository
git clone https://github.com/Haashiraaa/haashi-analytics-toolkit.git
cd haashi-analytics-toolkit/packages

# Install in editable mode (recommended for development)
pip install -e .

# Or install normally
pip install .
```
Dependencies
Core dependencies are automatically installed:
```
pandas >= 2.0.0
seaborn >= 0.12.0
matplotlib >= 3.7.0
numpy >= 1.24.0
pyarrow >= 12.0.0
openpyxl >= 3.1.0
```
Optional dev dependencies:
```shell
pip install -e ".[dev]"  # Includes pytest, black, mypy, ruff
```
Quick Start
Complete Data Pipeline Example
```python
from haashi_pkg.data_engine import DataAnalyzer, DataLoader, DataSaver
from haashi_pkg.plot_engine import PlotEngine
from haashi_pkg.utility import Logger, FileHandler
import logging

# Setup logging and utilities
logger = Logger(level=logging.INFO)
file_handler = FileHandler(logger=logger)

# Load data
loader = DataLoader("sales_data.csv", logger=logger, file_handler=file_handler)
df = loader.load_csv_single()

# Analyze and validate
analyzer = DataAnalyzer(logger=logger)

# Validate structure
analyzer.validate_columns_exist(df, ['customer_id', 'order_date', 'amount'])

# Inspect data quality
print(f"Missing values: {analyzer.count_missing(df, 'amount')}")
print(f"Duplicates: {analyzer.count_duplicates(df, 'customer_id')}")

# Clean and transform
df = analyzer.normalize_column_names(df)
df['order_date'] = analyzer.convert_datetime(df['order_date'])
df['amount'] = analyzer.convert_numeric(df['amount'])

# Validate data quality
analyzer.validate_dates(df, 'order_date')
analyzer.validate_numeric_non_negative(df, 'amount', allow_zero=False)

# Handle missing data
df = analyzer.drop_rows_with_missing(df, ['customer_id', 'order_date', 'amount'])

# Aggregate results (derive the month column used for grouping below)
df['month'] = df['order_date'].dt.to_period('M')
monthly_sales = analyzer.aggregate(
    df,
    value_col='amount',
    group_cols='month',
    op='sum'
)

# Create visualization
pe = PlotEngine()
fig, ax = pe.create_figure(figsize=(12, 8))
pe.draw(
    ax,
    x=monthly_sales.index,
    y=monthly_sales.values,
    plot_type='bar',
    color=pe.colors_01[0],
    label='Monthly Sales'
)
pe.decorate(
    ax,
    title='Monthly Sales Performance',
    xlabel='Month',
    ylabel='Revenue',
    ylim='zero'
)
pe.format_y_axis(ax, currency='$', decimals=0)
pe.add_reference_line(ax, y=50000, label='Target', color='red', linestyle='--')
pe.set_legend(ax)

# Save results
saver = DataSaver(logger=logger, file_handler=file_handler)
saver.save_parquet_compressed(df, "sales_cleaned.parquet")
pe.save_or_show(fig, save_path="sales_chart.png", dpi=300)

# Report
print(f"Rows dropped: {analyzer.dropped_row_count}")
print(f"Total missing values: {analyzer.cumulative_missing}")
```
Detailed Usage Examples
Data Loading
```python
import pandas as pd

from haashi_pkg.data_engine import DataLoader
from haashi_pkg.utility import Logger, FileHandler

logger = Logger()
file_handler = FileHandler(logger=logger)

# Single CSV file
loader = DataLoader("data.csv", logger=logger, file_handler=file_handler)
df = loader.load_csv_single()

# Multiple files
loader = DataLoader(
    "jan.csv", "feb.csv", "mar.csv",
    logger=logger,
    file_handler=file_handler
)
dfs = loader.load_csv_many()
combined = pd.concat(dfs, ignore_index=True)

# Large file in chunks (memory efficient)
loader = DataLoader("huge_file.csv", logger=logger, file_handler=file_handler)
for chunk in loader.load_csv_chunk(chunk_size=10000):
    process(chunk)

# Excel file
loader = DataLoader("report.xlsx", logger=logger, file_handler=file_handler)
df = loader.load_excel_single(sheet_name='Sales Data')

# Parquet files
loader = DataLoader("data.parquet", logger=logger, file_handler=file_handler)
df = loader.load_parquet_single()
```
Data Analysis & Validation
```python
from haashi_pkg.data_engine import DataAnalyzer
from haashi_pkg.utility import Logger

logger = Logger()
analyzer = DataAnalyzer(logger=logger)

# Inspect data
analyzer.inspect_dataframe(df, rows=10)

# Check data quality
missing = analyzer.count_missing(df, 'salary')
duplicates = analyzer.count_duplicates(df, 'email')

# Text quality inspection
report = analyzer.inspect_text_formatting(df, 'company_name')
print(report)  # JSON report of whitespace/case issues

# Validate requirements
analyzer.validate_columns_exist(df, ['id', 'name', 'email'])
analyzer.validate_numeric_non_negative(df, 'age', allow_zero=False)
analyzer.validate_dates(df, 'birth_date')

# Transform data
df = analyzer.normalize_column_names(df)
df['name'] = analyzer.normalize_text_values(df['name'], method='title')
df['price'] = analyzer.convert_numeric(df['price_string'])
df['date'] = analyzer.convert_datetime(df['date_string'])

# Handle missing data
total, missing, percent = analyzer.missing_summary(df, 'optional_field')
df = analyzer.drop_rows_with_missing(df, ['required1', 'required2'])
df['price'] = analyzer.fill_missing_forward(df['price'])

# Aggregate
revenue_by_region = analyzer.aggregate(
    df,
    value_col='revenue',
    group_cols='region',
    op='sum'
)
stats = analyzer.aggregate(
    df,
    value_col='revenue',
    group_cols=['year', 'quarter'],
    op=['sum', 'mean', 'count']
)
```
Visualization
```python
from haashi_pkg.plot_engine import PlotEngine

pe = PlotEngine()

# Simple plot
fig, ax = pe.create_figure(figsize=(10, 6))
pe.draw(ax, x=[1, 2, 3, 4], y=[10, 15, 13, 17], plot_type='line', color='blue')
pe.decorate(ax, title='Trend', xlabel='Time', ylabel='Value')
pe.save_or_show(fig, save_path='trend.png')

# Dashboard with custom grid
fig, gs = pe.create_custom_grid(
    rows=2, cols=3,
    height_ratios=[2, 1],
    width_ratios=[2, 1, 1],
    figsize=(20, 12)
)

# Main plot (top, spanning all columns)
ax_main = fig.add_subplot(gs[0, :])
pe.draw(ax_main, x=dates, y=revenue, plot_type='line', linewidth=2)
pe.decorate(ax_main, title='Revenue Trend', ylabel='Revenue ($)')
pe.format_y_axis(ax_main, currency='$')
pe.add_reference_line(ax_main, y=100000, label='Target', color='red')

# Stats box
ax_stats = fig.add_subplot(gs[1, 0])
pe.create_stats_text_box(
    ax_stats,
    stats={'Total': 1250000, 'Average': 104167, 'Growth': '23.5%'},
    title='Key Metrics'
)

# Bar chart
ax_bar = fig.add_subplot(gs[1, 1])
pe.draw(ax_bar, x=categories, y=counts, plot_type='bar')
pe.add_value_labels_on_bars(ax_bar, format_string='{:.0f}')

# Pie chart
ax_pie = fig.add_subplot(gs[1, 2])
pe.draw(ax_pie, x=None, y=distribution, plot_type='pie', labels=labels)

pe.save_or_show(fig, save_path='dashboard.png', dpi=300)
```
Custom Exceptions & Error Handling
```python
from haashi_pkg.data_engine import (
    DataAnalyzer,
    DataValidationError,
    DataTypeError,
    FileLoadError
)
from haashi_pkg.utility import Logger

logger = Logger()
analyzer = DataAnalyzer(logger=logger)

try:
    analyzer.validate_columns_exist(df, ['missing_column'])
except DataValidationError as e:
    logger.error(f"Validation failed: {e}")
    # Handle missing columns

try:
    analyzer.validate_numeric_non_negative(df, 'age', allow_zero=False)
except DataValidationError as e:
    logger.error(f"Data quality issue: {e}")
    # Clean invalid data

try:
    df['amount'] = analyzer.convert_numeric(df['amount_str'])
except DataTypeError as e:
    logger.error(f"Type conversion failed: {e}")
    # Handle conversion errors
```
Migration Guide (v0.x → v1.0)
What Changed
- Consolidated Modules: Three separate files merged into data_engine.py
- Class Renamed: DataEngine → DataAnalyzer (more descriptive)
- Modern Utilities: Now uses Logger and FileHandler instead of the deprecated Utility
- Custom Exceptions: Replaced generic errors with specific exception types
- Full Documentation: 2000+ lines of comprehensive docstrings added
Backward Compatibility
Deprecated modules remain with warnings:
```python
# OLD CODE - Still works but shows deprecation warning
from haashi_pkg.data_engine.dataengine import DataEngine

de = DataEngine()
de.validate_columns_exist(df, ['id', 'name'])
# ⚠️ DeprecationWarning: Use DataAnalyzer from data_engine instead
```
Recommended Updates
Option 1: Quick Update (Keep Old Class Name)
```python
# Use new import, keep old name
from haashi_pkg.data_engine import DataEngine  # Actually imports DataAnalyzer

de = DataEngine()  # No deprecation warning when importing from package
```
Option 2: Full Modern Update (Recommended)
```python
from haashi_pkg.data_engine import DataAnalyzer, DataLoader, DataSaver
from haashi_pkg.utility import Logger, FileHandler
import logging

logger = Logger(level=logging.INFO)
file_handler = FileHandler(logger=logger)

analyzer = DataAnalyzer(logger=logger)
loader = DataLoader("data.csv", logger=logger, file_handler=file_handler)
saver = DataSaver(logger=logger, file_handler=file_handler)
```
Deprecation Timeline
- v1.0 (Current): Deprecated features supported with warnings
- v2.0 (Future): Deprecated modules removed
  - dataengine.py wrapper removed
  - dataloader.py wrapper removed
  - datasaver.py wrapper removed
  - Utility class removed
Action Required: Update imports before v2.0 release.
Exception Hierarchy
```
# Data Engine Exceptions
DataEngineError              # Base exception
├── DataValidationError      # Validation failures
├── DataTypeError            # Type conversion errors
├── FileLoadError            # File loading failures
└── FileSaveError            # File saving failures

# Plot Engine Exceptions
PlotEngineError              # Base exception
├── InvalidPlotTypeError     # Invalid plot type
├── InvalidDataError         # Data validation failures
└── ConfigurationError       # Setup/config failures

# Utility Exceptions
UtilityError                 # Base exception
├── FileOperationError       # File operation failures
└── ClipboardError           # Clipboard failures (Termux)
```
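The hierarchy above maps directly onto Python class inheritance, which is what makes a single `except` on a base class catch every subtype. Declaring the data-engine branch looks like this (the package already defines these; this only shows the shape):

```python
class DataEngineError(Exception):
    """Base exception for the data engine."""

class DataValidationError(DataEngineError):
    """Raised on validation failures."""

class DataTypeError(DataEngineError):
    """Raised on type-conversion errors."""

class FileLoadError(DataEngineError):
    """Raised when a file fails to load."""

class FileSaveError(DataEngineError):
    """Raised when a file fails to save."""

# One except clause on the base class catches every subtype
try:
    raise DataValidationError("column 'id' not found")
except DataEngineError as e:
    print(type(e).__name__, e)  # DataValidationError column 'id' not found
```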
Contributing
Contributions welcome! Please ensure:
- Documentation: Add docstrings for all public functions
- Type Hints: Use full type annotations
- Error Handling: Raise appropriate custom exceptions
- Tests: Add unit tests for new features
- Non-Mutating: Don't modify inputs unless explicitly documented
- Examples: Include usage examples in docstrings
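A contribution that follows all six rules might look like this hypothetical helper (the function name and the stand-in exception are illustrative, not part of the package):

```python
class DataValidationError(Exception):
    """Illustrative stand-in for the package's validation exception."""

def clip_values(values: list[float], lower: float, upper: float) -> list[float]:
    """Clip each value into [lower, upper] without mutating the input.

    Example:
        >>> clip_values([1.0, 5.0, 9.0], lower=2.0, upper=8.0)
        [2.0, 5.0, 8.0]
    """
    if lower > upper:
        raise DataValidationError(f"lower ({lower}) exceeds upper ({upper})")
    # Non-mutating: build a new list instead of editing `values` in place
    return [min(max(v, lower), upper) for v in values]

print(clip_values([1.0, 5.0, 9.0], 2.0, 8.0))  # [2.0, 5.0, 8.0]
```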
Roadmap
Planned Features
- Additional file formats (e.g., Excel output in DataSaver)
- Enhanced validation rules and custom validators
- More plot types (heatmaps, box plots, violin plots)
- Data profiling utilities
- Integration with cloud storage (S3, GCS)
- Performance benchmarks and optimization
- Comprehensive test suite
- Example notebooks and tutorials
In Progress
- Custom exception hierarchy
- Comprehensive documentation
- Modern utility classes
- Backward compatibility layer
License
This project is licensed under the MIT License.
See the LICENSE file for details.
Acknowledgements
Built with the Python data science stack:
- pandas - Data manipulation and analysis
- NumPy - Numerical computing
- matplotlib - Visualization
- seaborn - Statistical visualization
- PyArrow - Columnar data format
Support
- Issues: GitHub Issues
- Documentation: README
- Repository: GitHub
Made with ❤️ by Haashiraaa
Download files
File details
Details for the file haashi_pkg-1.6.0.tar.gz.
File metadata
- Download URL: haashi_pkg-1.6.0.tar.gz
- Upload date:
- Size: 53.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b6d3138e88ec55660678a32d40bad21325a8739c217fc43b78528d9df9b8cc6 |
| MD5 | 6788c736db25b9f73c4b9a2243d8cf62 |
| BLAKE2b-256 | 1485e3635159eb768e3ac7cda215ecd6acf99ee0ee72ca9a9c5fe085b733c983 |
File details
Details for the file haashi_pkg-1.6.0-py3-none-any.whl.
File metadata
- Download URL: haashi_pkg-1.6.0-py3-none-any.whl
- Upload date:
- Size: 52.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e422fe76b54f0b2a1092fdb50bb0fa9a491cc12c9017c5c1519eca8f86e4121c |
| MD5 | b4bc868371173198e6de41c4fdd936d3 |
| BLAKE2b-256 | 62d8bc854358d659d42417e37231a41d6b65086fc54b3ad65185b28adcb316b4 |