A modular Python toolkit for analytics workflows, including data processing, visualization, and reusable utilities.

These details have not been verified by PyPI

haashi_pkg

A professional Python toolkit for data analytics workflows — providing modular, well-documented utilities for data ingestion, validation, transformation, visualization, and common development tasks.

Version: 1.0.0
Author: Haashiraaa
Python: ≥ 3.10
Core Stack: pandas, numpy, matplotlib, seaborn, pyarrow

Overview

haashi_pkg provides production-ready, well-documented utilities that streamline common analytics tasks. Designed with clean architecture principles, the package separates concerns into distinct modules while maintaining cohesive workflows.

Perfect for:

Data pipelines and ETL workflows
Exploratory data analysis
Data validation and quality assurance
Automated reporting and visualization
Prototype development
Analytics scripts and notebooks

Key Principles:

Comprehensive Documentation: Every function fully documented with examples
Robust Error Handling: Custom exceptions with clear, actionable messages
Backward Compatible: Deprecated features supported with migration warnings
Type-Safe: Full type hints throughout
Production-Ready: Professional code suitable for enterprise use

Package Structure

haashi_pkg/
├── data_engine/          # Data loading, analysis, and saving
│   ├── data_engine.py    # Main module (DataAnalyzer, DataLoader, DataSaver)
│   ├── dataengine.py     # Deprecated wrapper (backward compatibility)
│   ├── dataloader.py     # Deprecated wrapper (backward compatibility)
│   └── datasaver.py      # Deprecated wrapper (backward compatibility)
│
├── plot_engine/          # Visualization utilities
│   └── plotengine.py     # PlotEngine class for matplotlib/seaborn workflows
│
└── utility/              # Core utilities
    └── utils.py          # Logger, FileHandler, ScreenUtil, DateTimeUtil, etc.

Features by Module

Data Engine (`haashi_pkg.data_engine`)

DataAnalyzer (formerly DataEngine)

Core data analysis, validation, and transformation utilities.

Capabilities:

Inspection: Non-mutating data exploration (head, dtypes, shape, missing counts, duplicates)
Validation: Column existence, numeric ranges, datetime types, data quality checks
Type Conversion: Flexible numeric/datetime parsing with error handling
Normalization: Column names, text values (case/whitespace)
Missing Data: Summary stats, forward/backward fill, row dropping
Aggregation: Group-by operations with single or multiple functions
Joins: Validated merges with relationship checks
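
The relationship checks mentioned under Joins can be illustrated with plain pandas, whose `merge` accepts a `validate` argument. This is only a sketch of the underlying idea, not DataAnalyzer's own join method (its name and signature are not shown in this README); the `customers` and `orders` frames are made-up sample data.

```python
import pandas as pd

# Hypothetical sample data: one row per customer, many orders per customer.
customers = pd.DataFrame({"customer_id": [1, 2], "name": ["Ada", "Bo"]})
orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10.0, 5.0, 7.5]})

# validate="one_to_many" asserts that customer_id is unique on the left side.
merged = pd.merge(
    customers, orders, on="customer_id", how="inner", validate="one_to_many"
)

# A duplicated key on the left violates the declared relationship,
# so pandas raises MergeError instead of silently multiplying rows.
bad_customers = pd.concat([customers, customers.iloc[[0]]], ignore_index=True)
try:
    pd.merge(bad_customers, orders, on="customer_id", validate="one_to_many")
    relationship_ok = True
except pd.errors.MergeError:
    relationship_ok = False
```

Declaring the expected relationship up front turns an accidental row explosion into an immediate, explicit error.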

DataLoader

Lightweight I/O for loading tabular data.

Supported Formats:

CSV (single, multiple, chunked streaming)
Excel (.xlsx via openpyxl)
Parquet (single, multiple via pyarrow)

Features:

Automatic delimiter detection (CSV)
Memory-efficient chunked reading
Path validation and error handling
Flexible header/skip row handling
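
As an illustration of automatic delimiter detection, the sketch below uses the standard library's `csv.Sniffer`. Whether DataLoader uses this mechanism internally is an assumption; `detect_delimiter` and the sample text are hypothetical names introduced for the example.

```python
import csv
import io


def detect_delimiter(sample: str, candidates: str = ",;\t|") -> str:
    """Guess the delimiter of a CSV sample, restricted to the candidate characters."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Hypothetical semicolon-separated sample.
sample = "id;name;amount\n1;Ada;10.5\n2;Bo;7.0\n"
delimiter = detect_delimiter(sample)

# Parse the sample with the detected delimiter.
rows = list(csv.reader(io.StringIO(sample), delimiter=delimiter))
```

`csv.Sniffer` raises `csv.Error` when it cannot settle on a delimiter, so a real loader would likely want a fallback such as a comma.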

DataSaver

Save DataFrames with validation and compression.

Formats:

CSV (with index control)
Parquet (standard and gzip-compressed)

Features:

Path and extension validation
Automatic directory creation
Compression options for Parquet
Save confirmation logging
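
Path and extension validation with automatic directory creation might look roughly like the following. This is a minimal sketch using only `pathlib`, not DataSaver's actual implementation; `prepare_output_path` is a hypothetical helper, and the real class raises its own exceptions rather than `ValueError`.

```python
import tempfile
from pathlib import Path


def prepare_output_path(path: str, allowed_suffixes: tuple) -> Path:
    """Check the file extension, then create any missing parent directories."""
    target = Path(path)
    if target.suffix.lower() not in allowed_suffixes:
        raise ValueError(
            f"Unsupported extension {target.suffix!r}; expected one of {allowed_suffixes}"
        )
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # Nested directories that do not exist yet are created on demand.
    out = prepare_output_path(f"{tmp}/reports/2026/sales.parquet", (".parquet",))
    parent_created = out.parent.is_dir()

    # A wrong extension is rejected before anything is written.
    try:
        prepare_output_path(f"{tmp}/sales.txt", (".csv", ".parquet"))
        rejected = False
    except ValueError:
        rejected = True
```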

Plot Engine (`haashi_pkg.plot_engine`)

PlotEngine

High-level plotting interface built on matplotlib and seaborn.

Workflow:

Setup: Create figures and configure layouts
Draw: Add data visualizations
Decorate: Apply styling, labels, legends
Finalize: Save and/or display

Features:

4 Color Palettes: Professional, vibrant, soft, deep
Flexible Layouts: Simple grids or complex custom ratios
Plot Types: Line, bar, scatter, pie
Reference Lines: Horizontal/vertical targets and thresholds
Value Labels: Automatic labeling on bar charts
Stats Boxes: Create summary statistics displays
Formatting: Currency, percentages, date axes
Theming: Seaborn themes with custom backgrounds
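
The currency and percentage formatting listed above can be sketched as plain formatter functions. The `(value, pos)` signature matches what matplotlib's `FuncFormatter` expects, but how PlotEngine implements `format_y_axis` internally is an assumption; `currency_formatter` and `percent_formatter` are hypothetical names.

```python
from typing import Callable, Optional


def currency_formatter(
    symbol: str = "$", decimals: int = 0
) -> Callable[[float, Optional[int]], str]:
    """Build a tick formatter that renders values like $1,250,000."""

    def _format(value: float, pos: Optional[int] = None) -> str:
        return f"{symbol}{value:,.{decimals}f}"

    return _format


def percent_formatter(decimals: int = 1) -> Callable[[float, Optional[int]], str]:
    """Build a tick formatter that renders fractions (0.235) as percentages (23.5%)."""

    def _format(value: float, pos: Optional[int] = None) -> str:
        return f"{value * 100:.{decimals}f}%"

    return _format


fmt_money = currency_formatter("$", 0)
fmt_pct = percent_formatter(1)
tick_labels = [fmt_money(50000), fmt_money(1250000.4), fmt_pct(0.235)]
```

With matplotlib, such a function can be attached to an axis via `ax.yaxis.set_major_formatter(FuncFormatter(fmt_money))`.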

Utilities (`haashi_pkg.utility`)

Modern Classes (Recommended)

Logger

Console logging with multiple levels (debug, info, warning, error)
JSON error persistence with automatic rotation
Integration with ErrorLogger for long-term error tracking
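
JSON error persistence with rotation could be sketched as below. This is an assumption about the general technique, not Logger's actual file format or rotation policy; `persist_error` and `max_entries` are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def persist_error(log_path: Path, message: str, max_entries: int = 100) -> list:
    """Append an error record to a JSON file, keeping only the newest entries."""
    entries = json.loads(log_path.read_text()) if log_path.exists() else []
    entries.append(
        {"timestamp": datetime.now(timezone.utc).isoformat(), "message": message}
    )
    entries = entries[-max_entries:]  # rotation: drop the oldest records
    log_path.write_text(json.dumps(entries, indent=2))
    return entries


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "errors.json"
    for i in range(5):
        kept = persist_error(path, f"failure {i}", max_entries=3)
    stored = json.loads(path.read_text())
```

After five writes with `max_entries=3`, only the last three failures remain on disk.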

FileHandler

JSON and text file I/O with validation
Automatic path creation and permission checks
Readable/writable path validation

ScreenUtil

Terminal screen clearing
Line clearing with delays
Loading animations
Text wrapping utilities

DateTimeUtil

Current time with UTC offset
Flexible date formatting
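
For reference, the current local time with its UTC offset can be obtained from the standard library as follows. This is a sketch of the idea only; `now_with_offset` is a hypothetical function, not a DateTimeUtil method.

```python
from datetime import datetime


def now_with_offset(fmt: str = "%Y-%m-%d %H:%M:%S %z") -> str:
    """Return the current local time, including the UTC offset, as a string."""
    # astimezone() with no argument attaches the system's local timezone.
    return datetime.now().astimezone().strftime(fmt)


stamp = now_with_offset()
date_only = now_with_offset("%d %b %Y")
```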

ClipboardUtil (Termux/Android only)

Copy/paste operations via termux-api

Legacy Class (Deprecated)

Utility

Backward-compatible wrapper combining all utilities
⚠️ Deprecated: Will be removed in v2.0.0
Use modern classes instead for new code

Installation

From Git Repository

# Clone the repository
git clone https://github.com/Haashiraaa/haashi-analytics-toolkit.git
cd haashi-analytics-toolkit/packages

# Install in editable mode (recommended for development)
pip install -e .

# Or install normally
pip install .

Dependencies

Core dependencies are automatically installed:

pandas >= 2.0.0
seaborn >= 0.12.0
matplotlib >= 3.7.0
numpy >= 1.24.0
pyarrow >= 12.0.0
openpyxl >= 3.1.0

Optional dev dependencies:

pip install -e ".[dev]"  # Includes pytest, black, mypy, ruff

Quick Start

Complete Data Pipeline Example

from haashi_pkg.data_engine import DataAnalyzer, DataLoader, DataSaver
from haashi_pkg.plot_engine import PlotEngine
from haashi_pkg.utility import Logger, FileHandler
import logging

# Setup logging and utilities
logger = Logger(level=logging.INFO)
file_handler = FileHandler(logger=logger)

# Load data
loader = DataLoader("sales_data.csv", logger=logger, file_handler=file_handler)
df = loader.load_csv_single()

# Analyze and validate
analyzer = DataAnalyzer(logger=logger)

# Validate structure
analyzer.validate_columns_exist(df, ['customer_id', 'order_date', 'amount'])

# Inspect data quality
print(f"Missing values: {analyzer.count_missing(df, 'amount')}")
print(f"Duplicates: {analyzer.count_duplicates(df, 'customer_id')}")

# Clean and transform
df = analyzer.normalize_column_names(df)
df['order_date'] = analyzer.convert_datetime(df['order_date'])
df['amount'] = analyzer.convert_numeric(df['amount'])

# Validate data quality
analyzer.validate_dates(df, 'order_date')
analyzer.validate_numeric_non_negative(df, 'amount', allow_zero=False)

# Handle missing data
df = analyzer.drop_rows_with_missing(df, ['customer_id', 'order_date', 'amount'])

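# Derive the grouping key, then aggregate results
# (assumes convert_datetime returned a datetime64 Series)
df['month'] = df['order_date'].dt.to_period('M').astype(str)
monthly_sales = analyzer.aggregate(
    df,
    value_col='amount',
    group_cols='month',
    op='sum'
)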

# Create visualization
pe = PlotEngine()
fig, ax = pe.create_figure(figsize=(12, 8))

pe.draw(
    ax,
    x=monthly_sales.index,
    y=monthly_sales.values,
    plot_type='bar',
    color=pe.colors_01[0],
    label='Monthly Sales'
)

pe.decorate(
    ax,
    title='Monthly Sales Performance',
    xlabel='Month',
    ylabel='Revenue',
    ylim='zero'
)

pe.format_y_axis(ax, currency='$', decimals=0)
pe.add_reference_line(ax, y=50000, label='Target', color='red', linestyle='--')
pe.set_legend(ax)

# Save results
saver = DataSaver(logger=logger, file_handler=file_handler)
saver.save_parquet_compressed(df, "sales_cleaned.parquet")
pe.save_or_show(fig, save_path="sales_chart.png", dpi=300)

# Report
print(f"Rows dropped: {analyzer.dropped_row_count}")
print(f"Total missing values: {analyzer.cumulative_missing}")

Detailed Usage Examples

Data Loading

import pandas as pd

from haashi_pkg.data_engine import DataLoader
from haashi_pkg.utility import Logger, FileHandler

logger = Logger()
file_handler = FileHandler(logger=logger)

# Single CSV file
loader = DataLoader("data.csv", logger=logger, file_handler=file_handler)
df = loader.load_csv_single()

# Multiple files
loader = DataLoader(
    "jan.csv", "feb.csv", "mar.csv",
    logger=logger,
    file_handler=file_handler
)
dfs = loader.load_csv_many()
combined = pd.concat(dfs, ignore_index=True)

# Large file in chunks (memory efficient)
loader = DataLoader("huge_file.csv", logger=logger, file_handler=file_handler)
for chunk in loader.load_csv_chunk(chunk_size=10000):
    process(chunk)  # process() is a placeholder for your own per-chunk logic

# Excel file
loader = DataLoader("report.xlsx", logger=logger, file_handler=file_handler)
df = loader.load_excel_single(sheet_name='Sales Data')

# Parquet files
loader = DataLoader("data.parquet", logger=logger, file_handler=file_handler)
df = loader.load_parquet_single()
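
To make the chunked-loading pattern concrete, here is a minimal sketch in plain pandas, which supports streaming via `read_csv(chunksize=...)`. It assumes `load_csv_chunk` yields DataFrames in a similar way; the in-memory CSV is made-up data that stands in for a large file.

```python
import io

import pandas as pd

# Hypothetical data: a single "amount" column holding the values 1 through 10.
csv_text = "amount\n" + "\n".join(str(i) for i in range(1, 11))

total = 0
chunk_count = 0
# Each iteration yields a DataFrame with at most 4 rows,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += int(chunk["amount"].sum())
    chunk_count += 1
```

Accumulating a running result per chunk, rather than concatenating all chunks, is what keeps memory use bounded.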

Data Analysis & Validation

from haashi_pkg.data_engine import DataAnalyzer
from haashi_pkg.utility import Logger

logger = Logger()
analyzer = DataAnalyzer(logger=logger)

# Inspect data
analyzer.inspect_dataframe(df, rows=10)

# Check data quality
missing = analyzer.count_missing(df, 'salary')
duplicates = analyzer.count_duplicates(df, 'email')

# Text quality inspection
report = analyzer.inspect_text_formatting(df, 'company_name')
print(report)  # JSON report of whitespace/case issues

# Validate requirements
analyzer.validate_columns_exist(df, ['id', 'name', 'email'])
analyzer.validate_numeric_non_negative(df, 'age', allow_zero=False)
analyzer.validate_dates(df, 'birth_date')

# Transform data
df = analyzer.normalize_column_names(df)
df['name'] = analyzer.normalize_text_values(df['name'], method='title')
df['price'] = analyzer.convert_numeric(df['price_string'])
df['date'] = analyzer.convert_datetime(df['date_string'])

# Handle missing data
total, missing, percent = analyzer.missing_summary(df, 'optional_field')
df = analyzer.drop_rows_with_missing(df, ['required1', 'required2'])
df['price'] = analyzer.fill_missing_forward(df['price'])

# Aggregate
revenue_by_region = analyzer.aggregate(
    df,
    value_col='revenue',
    group_cols='region',
    op='sum'
)

stats = analyzer.aggregate(
    df,
    value_col='revenue',
    group_cols=['year', 'quarter'],
    op=['sum', 'mean', 'count']
)

Visualization

from haashi_pkg.plot_engine import PlotEngine

pe = PlotEngine()

# Simple plot
fig, ax = pe.create_figure(figsize=(10, 6))
pe.draw(ax, x=[1,2,3,4], y=[10,15,13,17], plot_type='line', color='blue')
pe.decorate(ax, title='Trend', xlabel='Time', ylabel='Value')
pe.save_or_show(fig, save_path='trend.png')

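# Dashboard with custom grid
# (dates, revenue, categories, counts, distribution, and labels are placeholders for your own data)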
fig, gs = pe.create_custom_grid(
    rows=2, cols=3,
    height_ratios=[2, 1],
    width_ratios=[2, 1, 1],
    figsize=(20, 12)
)

# Main plot (top, spanning all columns)
ax_main = fig.add_subplot(gs[0, :])
pe.draw(ax_main, x=dates, y=revenue, plot_type='line', linewidth=2)
pe.decorate(ax_main, title='Revenue Trend', ylabel='Revenue ($)')
pe.format_y_axis(ax_main, currency='$')
pe.add_reference_line(ax_main, y=100000, label='Target', color='red')

# Stats box
ax_stats = fig.add_subplot(gs[1, 0])
pe.create_stats_text_box(
    ax_stats,
    stats={'Total': 1250000, 'Average': 104167, 'Growth': '23.5%'},
    title='Key Metrics'
)

# Bar chart
ax_bar = fig.add_subplot(gs[1, 1])
pe.draw(ax_bar, x=categories, y=counts, plot_type='bar')
pe.add_value_labels_on_bars(ax_bar, format_string='{:.0f}')

# Pie chart
ax_pie = fig.add_subplot(gs[1, 2])
pe.draw(ax_pie, x=None, y=distribution, plot_type='pie', labels=labels)

pe.save_or_show(fig, save_path='dashboard.png', dpi=300)

Custom Exceptions & Error Handling

from haashi_pkg.data_engine import (
    DataAnalyzer,
    DataValidationError,
    DataTypeError,
    FileLoadError
)
from haashi_pkg.utility import Logger

logger = Logger()
analyzer = DataAnalyzer(logger=logger)

try:
    analyzer.validate_columns_exist(df, ['missing_column'])
except DataValidationError as e:
    logger.error(f"Validation failed: {e}")
    # Handle missing columns

try:
    analyzer.validate_numeric_non_negative(df, 'age', allow_zero=False)
except DataValidationError as e:
    logger.error(f"Data quality issue: {e}")
    # Clean invalid data

try:
    df['amount'] = analyzer.convert_numeric(df['amount_str'])
except DataTypeError as e:
    logger.error(f"Type conversion failed: {e}")
    # Handle conversion errors

Migration Guide (v0.x → v1.0)

What Changed

Consolidated Modules: Three separate files merged into data_engine.py
Class Renamed: DataEngine → DataAnalyzer (more descriptive)
Modern Utilities: Now uses Logger and FileHandler instead of deprecated Utility
Custom Exceptions: Replaced generic errors with specific exception types
Full Documentation: 2000+ lines of comprehensive docstrings added

Backward Compatibility

Deprecated modules remain with warnings:

# OLD CODE - Still works but shows deprecation warning
from haashi_pkg.data_engine.dataengine import DataEngine

de = DataEngine()
de.validate_columns_exist(df, ['id', 'name'])
# ⚠️ DeprecationWarning: Use DataAnalyzer from data_engine instead
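
One common way to build such a backward-compatible alias is to subclass the new class and emit a `DeprecationWarning` on construction. The sketch below shows that general pattern only; it is not the package's actual wrapper code, and the `DataAnalyzer` defined here is a simplified stand-in with an invented method body.

```python
import warnings


class DataAnalyzer:
    """Stand-in for the real class, present only to make the sketch runnable."""

    def validate_columns_exist(self, columns, required):
        missing = [name for name in required if name not in columns]
        if missing:
            raise ValueError(f"Missing columns: {missing}")


class DataEngine(DataAnalyzer):
    """Deprecated alias that warns whenever it is instantiated."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "DataEngine is deprecated; use DataAnalyzer instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        super().__init__(*args, **kwargs)


# Capture the warning to show that old code keeps working while being flagged.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = DataEngine()
    legacy.validate_columns_exist(["id", "name"], ["id", "name"])
```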

Recommended Updates

Option 1: Quick Update (Keep Old Class Name)

# Use new import, keep old name
from haashi_pkg.data_engine import DataEngine  # Actually imports DataAnalyzer

de = DataEngine()  # No deprecation warning when importing from package

Option 2: Full Modern Update (Recommended)

from haashi_pkg.data_engine import DataAnalyzer, DataLoader, DataSaver
from haashi_pkg.utility import Logger, FileHandler
import logging

logger = Logger(level=logging.INFO)
file_handler = FileHandler(logger=logger)

analyzer = DataAnalyzer(logger=logger)
loader = DataLoader("data.csv", logger=logger, file_handler=file_handler)
saver = DataSaver(logger=logger, file_handler=file_handler)

Deprecation Timeline

v1.0 (Current): Deprecated features supported with warnings
v2.0 (Future): Deprecated modules removed
- dataengine.py wrapper removed
- dataloader.py wrapper removed
- datasaver.py wrapper removed
- Utility class removed

Action Required: Update imports before v2.0 release.

Exception Hierarchy

# Data Engine Exceptions
DataEngineError                 # Base exception
├── DataValidationError         # Validation failures
├── DataTypeError              # Type conversion errors
├── FileLoadError              # File loading failures
└── FileSaveError              # File saving failures

# Plot Engine Exceptions
PlotEngineError                # Base exception
├── InvalidPlotTypeError       # Invalid plot type
├── InvalidDataError           # Data validation failures
└── ConfigurationError         # Setup/config failures

# Utility Exceptions
UtilityError                   # Base exception
├── FileOperationError         # File operation failures
└── ClipboardError            # Clipboard failures (Termux)
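
In Python terms, the Data Engine branch of this tree corresponds to a small class hierarchy like the one sketched below. The class names come from the tree above; the empty bodies and the `load` function are assumptions made for illustration.

```python
class DataEngineError(Exception):
    """Base exception for the data engine."""


class DataValidationError(DataEngineError):
    """Validation failures."""


class DataTypeError(DataEngineError):
    """Type conversion errors."""


class FileLoadError(DataEngineError):
    """File loading failures."""


class FileSaveError(DataEngineError):
    """File saving failures."""


def load(path: str) -> None:
    """Hypothetical loader that always fails, used to demonstrate handling."""
    raise FileLoadError(f"Could not read {path}")


# Catching the base class handles every data engine failure in one place.
try:
    load("missing.csv")
except DataEngineError as exc:
    caught_name = type(exc).__name__
```

Callers can catch a specific subclass when they can recover from it, or the base class when any failure should be handled the same way.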

Contributing

Contributions welcome! Please ensure:

Documentation: Add docstrings for all public functions
Type Hints: Use full type annotations
Error Handling: Raise appropriate custom exceptions
Tests: Add unit tests for new features
Non-Mutating: Don't modify inputs unless explicitly documented
Examples: Include usage examples in docstrings
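
A contribution that follows these guidelines might look like the sketch below: type hints, a docstring with an example, a custom exception, and no mutation of the input. `normalize_labels` is a hypothetical function, and the `DataValidationError` defined here is a local stand-in for the package's own class.

```python
import re


class DataValidationError(Exception):
    """Local stand-in for haashi_pkg's DataValidationError."""


def normalize_labels(labels: list) -> list:
    """Return snake_case copies of the labels without modifying the input.

    Example:
        >>> normalize_labels([" Order Date ", "Customer-ID"])
        ['order_date', 'customer_id']

    Raises:
        DataValidationError: If any label is not a string.
    """
    if not all(isinstance(label, str) for label in labels):
        raise DataValidationError("All labels must be strings")
    # Lowercase, collapse runs of other characters to "_", trim stray underscores.
    return [
        re.sub(r"[^0-9a-z]+", "_", label.strip().lower()).strip("_")
        for label in labels
    ]


original = [" Order Date ", "Customer-ID"]
cleaned = normalize_labels(original)
```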

Roadmap

Planned Features

Additional file formats (Excel support)
Enhanced validation rules and custom validators
More plot types (heatmaps, box plots, violin plots)
Data profiling utilities
Integration with cloud storage (S3, GCS)
Performance benchmarks and optimization
Comprehensive test suite
Example notebooks and tutorials

In Progress

Custom exception hierarchy
Comprehensive documentation
Modern utility classes
Backward compatibility layer

License

This project is licensed under the MIT License.
See the LICENSE file for details.

Acknowledgements

Built with the Python data science stack:

pandas - Data manipulation and analysis
NumPy - Numerical computing
matplotlib - Visualization
seaborn - Statistical visualization
PyArrow - Columnar data format

Support

Issues: GitHub Issues
Documentation: README
Repository: GitHub

Made with ❤️ by Haashiraaa

Release history

1.8.0 (Apr 28, 2026)
1.6.1 (Apr 24, 2026)
1.6.0 (Apr 4, 2026), this version

Download files

Source Distribution

haashi_pkg-1.6.0.tar.gz (53.9 kB), uploaded Apr 4, 2026 (Source)

Built Distribution

haashi_pkg-1.6.0-py3-none-any.whl (52.7 kB), uploaded Apr 4, 2026 (Python 3)

File details

Details for the file haashi_pkg-1.6.0.tar.gz.

File metadata

Download URL: haashi_pkg-1.6.0.tar.gz
Upload date: Apr 4, 2026
Size: 53.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for haashi_pkg-1.6.0.tar.gz
Algorithm	Hash digest
SHA256	`4b6d3138e88ec55660678a32d40bad21325a8739c217fc43b78528d9df9b8cc6`
MD5	`6788c736db25b9f73c4b9a2243d8cf62`
BLAKE2b-256	`1485e3635159eb768e3ac7cda215ecd6acf99ee0ee72ca9a9c5fe085b733c983`


File details

Details for the file haashi_pkg-1.6.0-py3-none-any.whl.

File metadata

Download URL: haashi_pkg-1.6.0-py3-none-any.whl
Upload date: Apr 4, 2026
Size: 52.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for haashi_pkg-1.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e422fe76b54f0b2a1092fdb50bb0fa9a491cc12c9017c5c1519eca8f86e4121c`
MD5	`b4bc868371173198e6de41c4fdd936d3`
BLAKE2b-256	`62d8bc854358d659d42417e37231a41d6b65086fc54b3ad65185b28adcb316b4`
