Financial Data Analyst utility toolkit for data cleaning, validation, profiling, and pipelines.

📊 FDA Toolkit

Financial Data Analysis Made Simple — A production-grade Python toolkit for loading, cleaning, validating, and analyzing financial data with one-line pipelines.

Why FDA Toolkit?

Financial data analysis is messy. Analysts often spend more time cleaning, validating, and transforming data than analyzing it. FDA Toolkit reduces that overhead by providing:

67 production-ready functions grouped into 8 modules
One-line pipelines for common workflows (e.g., ftk.quick_clean_finance())
Finance-aware validation: understands sign conventions, entity names, and currency formats
Audit trail: every operation is logged for compliance and debugging
Type-safe: full type hints and IDE autocomplete throughout
Memory efficient: optimizes dtypes and handles large files with chunking
Professional API: pandas-like, intuitive, and well documented

Module Overview

Module	Functions	Purpose
core	17	Column cleaning, types, duplicates, missing, outliers, text
features	7	Date & categorical feature engineering
finance	11	Currency parsing, entity standardization, financial validation
validation	9	Schema, ranges, integrity, reconciliation
reporting	10	Profiling, snapshots, delta reports, quick checks
io	5	Safe CSV/Excel reading, chunked processing, parquet export
pipelines	2	Pre-built `quick_clean()` and `quick_clean_finance()`
utils	6	Logging, security, memory optimization
TOTAL	67	Production-ready functions

Quick Start

Install

pip install fda-toolkit

Use in 3 Lines

import fda_toolkit as ftk

df = ftk.read_csv_safely("data/transactions.csv")
df_clean = ftk.quick_clean_finance(df, primary_key="transaction_id", 
                                   date_cols=["date"], currency_cols=["amount"])
ftk.quick_check(df_clean)  # Profile results

Discover All Functions

# See what's available
ftk.info()  # Browse by category

# Filter by domain
ftk.info(category="Finance")

📚 What's Inside?

Core Data Cleaning (17 functions)

Handle the fundamentals with confidence:

from fda_toolkit.core import columns, duplicates, missing, outliers, text, types

df = columns.clean_column_headers(df)           # 'Name ' → 'name'
df['amount'] = types.clean_numeric_column(df['amount'])   # '$1,234.56' → 1234.56
df = missing.fill_missing(df, strategy='mean')  # Handle NaN intelligently
df = duplicates.remove_duplicates(df, subset=['id'])
df = outliers.flag_outliers(df, 'amount')       # Flag statistical outliers
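
To make the header-cleaning step concrete, here is a minimal, dependency-free sketch of the kind of normalization the comment above describes. The helper name `normalize_header` and its exact rules are assumptions for illustration, not the toolkit's implementation.

```python
import re


def normalize_header(name: str) -> str:
    """Hypothetical helper: trim, lower-case, and snake_case one column header."""
    cleaned = name.strip().lower()
    # Replace every run of non-alphanumeric characters with a single underscore.
    cleaned = re.sub(r"[^a-z0-9]+", "_", cleaned)
    # Drop underscores left at either end, e.g. from a trailing ")".
    return cleaned.strip("_")


headers = ["Name ", "Age (years)", "Total-Amount $"]
normalized = [normalize_header(h) for h in headers]
print(normalized)  # ['name', 'age_years', 'total_amount']
```

With pandas, the same function could be applied through `df.rename(columns=normalize_header)`.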

Finance-Specific (11 functions)

Domain expertise built-in:

from fda_toolkit.finance import parsing, entities, rules

df['amount'] = parsing.parse_currency(df['amount'])        # Handle $, €, £
df['vendor'] = entities.strip_legal_suffixes(df['vendor']) # ACME Ltd → ACME
rules.validate_sign_conventions(df, rules_config)          # Verify debit/credit
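
The sketch below shows one plausible way currency strings and legal suffixes could be handled, using only the standard library. The names `parse_currency_text`, `strip_suffix`, and the suffix list are hypothetical. It assumes "." as the decimal separator and "," as the thousands separator, so European-style "1.234,56" is out of scope.

```python
import re
from decimal import Decimal

# Hypothetical suffix list; a real one would be longer and locale-aware.
_LEGAL_SUFFIXES = ("ltd", "inc", "llc", "plc", "gmbh", "corp")


def parse_currency_text(text: str) -> Decimal:
    """Turn '$1,234.56' or the accounting-style '(500.00)' into a Decimal."""
    stripped = text.strip()
    # Accounting notation wraps negative amounts in parentheses.
    negative = stripped.startswith("(") and stripped.endswith(")")
    # Keep digits, the decimal point, and a minus sign; drop symbols and commas.
    digits = re.sub(r"[^0-9.\-]", "", stripped)
    value = Decimal(digits)
    return -value if negative else value


def strip_suffix(name: str) -> str:
    """Remove one trailing legal suffix such as 'Ltd' or 'Inc.'."""
    words = name.strip().split()
    if len(words) > 1 and words[-1].rstrip(".").lower() in _LEGAL_SUFFIXES:
        words = words[:-1]
    return " ".join(words)


print(parse_currency_text("$1,234.56"))  # 1234.56
print(parse_currency_text("(500.00)"))   # -500.00
print(strip_suffix("ACME Ltd"))          # ACME
```

`Decimal` is used here instead of `float` so that amounts keep their exact decimal representation; whether the toolkit does the same is not stated in this README.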

Feature Engineering (7 functions)

Prepare data for ML in seconds:

from fda_toolkit.features import datetime, categorical

df = datetime.extract_date_features(df, 'date')  # Add year, month, quarter
df['category'] = categorical.limit_cardinality(df['category'], top_n=10)
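
For readers who want to see what these two steps amount to in plain pandas, here is a small sketch. The helpers `add_date_parts` and `cap_categories`, the generated column names, and the "Other" label are assumptions for illustration; the toolkit's own output may differ.

```python
import pandas as pd


def add_date_parts(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Hypothetical helper: add year, month, and quarter columns for `col`."""
    out = df.copy()
    dates = pd.to_datetime(out[col])
    out[f"{col}_year"] = dates.dt.year
    out[f"{col}_month"] = dates.dt.month
    out[f"{col}_quarter"] = dates.dt.quarter
    return out


def cap_categories(series: pd.Series, top_n: int, other: str = "Other") -> pd.Series:
    """Hypothetical helper: keep the top_n most frequent labels, pool the rest."""
    keep = series.value_counts().nlargest(top_n).index
    # where() keeps values where the condition holds and substitutes `other` elsewhere.
    return series.where(series.isin(keep), other)


df = pd.DataFrame({
    "date": ["2024-01-15", "2024-05-02", "2024-11-30", "2024-11-30"],
    "category": ["rent", "rent", "travel", "meals"],
})
df = add_date_parts(df, "date")
df["category"] = cap_categories(df["category"], top_n=1)
print(df[["date_quarter", "category"]])
```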

Validation Suite (9 functions)

Catch issues before they become problems:

from fda_toolkit.validation import schema, ranges, integrity

schema.validate_required_fields(df, ['id', 'date', 'amount'])
violations = ranges.validate_data_ranges(df, {'amount': (0, 1_000_000)})
integrity.reconciliation_check(original_df, clean_df, value_cols=['amount'])
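
The three checks above can be approximated in a few lines of pandas. This is a sketch under stated assumptions: the helper names `missing_columns`, `range_violations`, and `totals_match`, the tolerance, and the return shapes are all hypothetical and are not taken from the toolkit.

```python
import pandas as pd


def missing_columns(df: pd.DataFrame, required: list) -> list:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]


def range_violations(df: pd.DataFrame, limits: dict) -> pd.DataFrame:
    """Return rows where any listed column falls outside its (low, high) range."""
    bad = pd.Series(False, index=df.index)
    for col, (low, high) in limits.items():
        # between() is inclusive; NaN is never "between", so it is flagged too.
        bad |= ~df[col].between(low, high)
    return df[bad]


def totals_match(original: pd.DataFrame, cleaned: pd.DataFrame,
                 value_cols: list, tol: float = 0.01) -> dict:
    """Compare column totals before and after cleaning, within a tolerance."""
    return {col: bool(abs(original[col].sum() - cleaned[col].sum()) <= tol)
            for col in value_cols}


original = pd.DataFrame({"id": [1, 2, 2, 3], "amount": [100.0, 250.5, 250.5, -5.0]})
cleaned = original.drop_duplicates(subset=["id"])

missing = missing_columns(cleaned, ["id", "date", "amount"])
violations = range_violations(cleaned, {"amount": (0, 1_000_000)})
reconciled = totals_match(original, cleaned, ["amount"])
print(missing, violations["id"].tolist(), reconciled)
```

In this example the reconciliation fails on purpose: dropping the duplicate row changes the `amount` total, which is exactly the kind of silent change a reconciliation step is meant to surface.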

Smart Pipelines (2 functions)

Pre-built, battle-tested workflows:

# Generic pipeline
df_clean = ftk.quick_clean(df)

# Finance pipeline (smart defaults for financial data)
df_clean = ftk.quick_clean_finance(
    df,
    primary_key="invoice_id",
    date_cols=["invoice_date", "due_date"],
    currency_cols=["amount", "tax"]
)
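
The README does not list the steps inside `quick_clean_finance()`. As a rough mental model only, a pipeline with these three arguments might deduplicate on the key, parse the date columns, and convert currency text to numbers. The stand-in below, `sketch_clean_finance`, is hypothetical and makes those assumptions explicit.

```python
import pandas as pd


def sketch_clean_finance(df: pd.DataFrame, primary_key: str,
                         date_cols: list, currency_cols: list) -> pd.DataFrame:
    """Illustrative stand-in, NOT the toolkit's implementation."""
    # Assumed step 1: keep the first row for each primary key value.
    out = df.drop_duplicates(subset=[primary_key]).copy()
    # Assumed step 2: parse dates, turning unparseable values into NaT.
    for col in date_cols:
        out[col] = pd.to_datetime(out[col], errors="coerce")
    # Assumed step 3: strip symbols and thousands separators, then convert to numbers.
    for col in currency_cols:
        text = out[col].astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
        out[col] = pd.to_numeric(text, errors="coerce")
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "invoice_id": ["A1", "A1", "B2"],
    "invoice_date": ["2024-03-01", "2024-03-01", "not a date"],
    "amount": ["$1,200.00", "$1,200.00", "€75.50"],
})
clean = sketch_clean_finance(raw, "invoice_id", ["invoice_date"], ["amount"])
print(clean)
```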

Reporting & Profiling (10 functions)

Understand your data instantly:

# Quick diagnosis
ftk.quick_check(df)

# Detailed profile
profile = ftk.profile_report(df)  # Types, missingness, memory, outliers

# Track changes
snapshot_v1 = ftk.snapshot_dataset(df_before, name="before_clean")
snapshot_v2 = ftk.snapshot_dataset(df_after, name="after_clean")
delta = ftk.compare_snapshots(snapshot_v1, snapshot_v2)
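
Conceptually, a snapshot is a small summary of a DataFrame that can be compared later without keeping the data itself. The sketch below illustrates the idea. The `Snapshot` fields and the helpers `take_snapshot` and `diff_snapshots` are assumptions, since the README does not document what the toolkit's snapshots contain.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical summary of a DataFrame at one point in time."""
    name: str
    rows: int
    columns: tuple
    nulls: dict


def take_snapshot(df: pd.DataFrame, name: str) -> Snapshot:
    nulls = {col: int(df[col].isna().sum()) for col in df.columns}
    return Snapshot(name=name, rows=len(df), columns=tuple(df.columns), nulls=nulls)


def diff_snapshots(before: Snapshot, after: Snapshot) -> dict:
    """Report row, column, and null-count changes between two snapshots."""
    shared = [c for c in before.columns if c in after.columns]
    return {
        "rows_delta": after.rows - before.rows,
        "columns_added": [c for c in after.columns if c not in before.columns],
        "columns_removed": [c for c in before.columns if c not in after.columns],
        "nulls_delta": {c: after.nulls[c] - before.nulls[c] for c in shared},
    }


df_before = pd.DataFrame({"id": [1, 2, 2], "amount": [10.0, None, None]})
df_after = df_before.drop_duplicates(subset=["id"])

report = diff_snapshots(take_snapshot(df_before, "before_clean"),
                        take_snapshot(df_after, "after_clean"))
print(report)
```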

Secure I/O (5 functions)

Read and write without surprises:

# Safe reading with encoding detection
df = ftk.read_csv_safely("messy_file.csv")
df = ftk.read_excel_safely("workbook.xlsx", sheet_name="Data")

# Process huge files in chunks
for chunk in ftk.chunked_processing("huge_file.csv", chunksize=50_000):
    process(chunk)

# Export in optimized formats
ftk.export_parquet(df, "output.parquet")  # Fast, compressed
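
Chunked processing keeps memory use bounded by reading a fixed number of rows at a time. The sketch below shows the underlying pandas mechanism, `pd.read_csv(..., chunksize=...)`, on an in-memory CSV. Whether `ftk.chunked_processing` wraps this call is an assumption.

```python
import io

import pandas as pd

# A tiny in-memory CSV stands in for a large file on disk.
csv_text = "id,amount\n1,10.0\n2,20.5\n3,30.0\n4,40.5\n5,50.0\n"

total = 0.0
rows = 0
# With chunksize set, read_csv returns an iterator of DataFrames
# instead of loading the whole file at once.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk["amount"].sum()
    rows += len(chunk)

print(rows, total)
```

Aggregates that combine across chunks (sums, counts, min/max) work well with this pattern; operations that need the whole dataset at once, such as global deduplication, need extra bookkeeping.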

Architecture: Dynamic & Scalable

Every function self-registers via decorator — no manual __all__ lists:

import pandas as pd
from fda_toolkit.registry import register_function

@register_function(
    name="detect_fraud",
    category="Validation",
    module="custom.fraud"
)
def detect_fraud(df: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Your custom logic here: keep rows whose amount exceeds the threshold."""
    result = df[df['amount'] > threshold]
    # audit_log is the toolkit's logging helper; its import is not shown here.
    audit_log("detect_fraud", before=len(df), after=len(result))
    return result

# Automatically appears in ftk.info()!
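
The self-registration pattern itself is small. Here is a generic, standard-library sketch of a decorator-based registry. The names `register`, `list_functions`, and `_REGISTRY` are hypothetical and are not the toolkit's internals.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional

# Maps a category name to the names of the functions registered under it.
_REGISTRY: Dict[str, List[str]] = defaultdict(list)


def register(name: str, category: str) -> Callable:
    """Decorator factory: record the function under a category, return it unchanged."""
    def decorator(func: Callable) -> Callable:
        _REGISTRY[category].append(name)
        return func
    return decorator


@register(name="flag_large", category="Validation")
def flag_large(amounts: list, threshold: float) -> list:
    """Example registered function: keep amounts above the threshold."""
    return [a for a in amounts if a > threshold]


def list_functions(category: Optional[str] = None) -> dict:
    """Return registered names, optionally limited to one category."""
    if category is None:
        return {cat: sorted(names) for cat, names in _REGISTRY.items()}
    return {category: sorted(_REGISTRY.get(category, []))}


print(list_functions("Validation"))  # {'Validation': ['flag_large']}
```

Registration happens at import time, when the decorator runs, which is why a registry like this needs no manually maintained export list.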

Audit Trail (Compliance Ready)

Every operation is logged automatically:

from fda_toolkit.utils.logging import get_global_audit_log

log = get_global_audit_log()

for event in log.events:
    print(f"✓ {event.name} at {event.timestamp_utc}")

# Export for compliance teams
audit_json = log.to_dict()  # JSON-ready
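
To show what an audit log with these attributes could look like, here is a minimal standard-library sketch. The attribute names `events`, `name`, and `timestamp_utc` and the method `to_dict()` mirror the snippet above. The classes, the `record` method, and the row-count fields are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current time as an ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AuditEvent:
    name: str
    rows_before: int
    rows_after: int
    timestamp_utc: str = field(default_factory=_utc_now)


@dataclass
class AuditLog:
    events: list = field(default_factory=list)

    def record(self, name: str, rows_before: int, rows_after: int) -> None:
        """Append one event describing an operation and its effect on row count."""
        self.events.append(AuditEvent(name, rows_before, rows_after))

    def to_dict(self) -> dict:
        """Return a plain dict that json.dumps can serialize directly."""
        return {"events": [asdict(event) for event in self.events]}


log = AuditLog()
log.record("remove_duplicates", rows_before=1000, rows_after=990)
payload = json.dumps(log.to_dict(), indent=2)
print(payload)
```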

💡 Real-World Example

import fda_toolkit as ftk

# 1. Load and diagnose
df = ftk.read_csv_safely("sales_transactions_2024.csv")
ftk.quick_check(df)
# → Reports: types, missing %, duplicates, outliers, memory usage

# 2. Clean for analysis
df_clean = ftk.quick_clean_finance(
    df,
    primary_key="transaction_id",
    date_cols=["date", "due_date"],
    currency_cols=["amount", "tax"]
)

# 3. Validate
from fda_toolkit.validation import integrity
integrity.reconciliation_check(
    original=df, 
    cleaned=df_clean,
    value_cols=["amount"],
    group_cols=["vendor_id"]
)

# 4. Engineer features for ML
df_ml = ftk.extract_date_features(df_clean, "date")
df_ml["vendor"] = ftk.limit_cardinality(df_ml["vendor"], top_n=20)

# 5. Export and log
ftk.export_parquet(df_ml, "ready_for_ml.parquet")
print("✅ Pipeline complete with full audit trail!")

Testing

# Run all tests
pytest

# Run specific module
pytest tests/test_core/

# Verbose output
pytest -v

Example test:

import pandas as pd
from fda_toolkit.core.columns import clean_column_headers

def test_clean_headers():
    df = pd.DataFrame({'Name ': [1], 'Age (years)': [2]})
    result = clean_column_headers(df)
    assert result.columns.tolist() == ['name', 'age_years']

Installation & Development

From Source

# Clone or download
cd fda_toolkit_project

# Install in editable mode (dev)
pip install -e .

# With dev dependencies (if available)
pip install -e ".[dev]"

Requirements

Python 3.9+
pandas (data manipulation)
numpy (numerical operations)

Security & Compliance

Audit logging — Every operation tracked with timestamps
Data masking — mask_sensitive_fields() for PII protection
Type safety — Full type hints prevent common errors
Error handling — Clear, actionable error messages
Memory optimization — Control data footprint
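
As an illustration of the data-masking item above, here is one common approach: hide all but the last few characters of a sensitive value. The helpers `mask_value` and `mask_record` are hypothetical. The behavior and signature of `mask_sensitive_fields()` are not documented in this README.

```python
def mask_value(value: str, visible: int = 4, mask_char: str = "*") -> str:
    """Hide all but the last `visible` characters; fully mask short values."""
    if len(value) <= visible:
        return mask_char * len(value)
    return mask_char * (len(value) - visible) + value[-visible:]


def mask_record(record: dict, sensitive: set) -> dict:
    """Return a copy of the record with the sensitive fields masked."""
    return {key: mask_value(str(val)) if key in sensitive else val
            for key, val in record.items()}


row = {"vendor": "ACME", "account_number": "1234567890", "amount": 250.0}
masked = mask_record(row, sensitive={"account_number"})
print(masked)  # {'vendor': 'ACME', 'account_number': '******7890', 'amount': 250.0}
```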

📖 API Reference

Explore the full API:

ftk.info()                           # List all functions
ftk.info(category="Finance")         # Filter by domain
ftk.get_data_summary(df)            # Profile a dataset
ftk.profile_report(df)              # Detailed analysis

For detailed docs on each function:

from fda_toolkit.core.outliers import detect_outliers_iqr
help(detect_outliers_iqr)  # Full docstring with examples
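
The name `detect_outliers_iqr` suggests the interquartile-range rule. For reference, a standard-library sketch of that rule follows. The helpers `iqr_bounds` and `flag_iqr_outliers` and the default multiplier of 1.5 are assumptions and may not match the toolkit's function.

```python
import statistics


def iqr_bounds(values: list, k: float = 1.5) -> tuple:
    """Return (low, high) fences at k * IQR below Q1 and above Q3."""
    # method="inclusive" interpolates linearly between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_iqr_outliers(values: list, k: float = 1.5) -> list:
    """Return one boolean per value: True when it lies outside the fences."""
    low, high = iqr_bounds(values, k)
    return [v < low or v > high for v in values]


amounts = [100, 102, 98, 101, 99, 5000]
print(iqr_bounds(amounts))         # (95.5, 105.5)
print(flag_iqr_outliers(amounts))  # [False, False, False, False, False, True]
```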

See QUICK_REFERENCE.md for common patterns.

🎯 Use Cases

✅ Financial Reporting — Prepare data for compliance audits
✅ ML Pipelines — Clean & engineer features for models
✅ Data Migration — Validate and transform during transfers
✅ Anomaly Detection — Flag outliers in transactions
✅ Time Series Analysis — Extract date features automatically
✅ Data Quality Monitoring — Profile and compare snapshots

🚀 Next Steps

Explore functions: ftk.info()
Try examples: See examples/01_quick_check.py
Read docs: docs/function_reference.md
Run tests: pytest
Extend: Add your own functions using @register_function

📝 License

MIT License — see LICENSE for details.

🤝 Contributing

Found a bug? Have an idea? Open an issue or PR!

Built for financial analysts who value time, accuracy, and peace of mind. 📊✨

FDA Toolkit: Where data cleaning stops being painful and starts being productive.

Project details

Release history

Version	Released
0.2.8	Jan 22, 2026
0.2.7	Jan 21, 2026
0.2.6	Jan 21, 2026
0.2.5	Jan 21, 2026
0.2.4	Jan 21, 2026
0.2.3 (this version)	Jan 21, 2026
0.2.2	Jan 21, 2026
0.2.1	Jan 21, 2026
0.2.0	Jan 21, 2026
0.1.0	Jan 20, 2026

Download files

Source Distribution

fda_toolkit-0.2.3.tar.gz (41.4 kB)

Uploaded Jan 21, 2026 Source

Built Distribution

fda_toolkit-0.2.3-py3-none-any.whl (54.8 kB)

Uploaded Jan 21, 2026 Python 3

File details

Details for the file fda_toolkit-0.2.3.tar.gz.

File metadata

Download URL: fda_toolkit-0.2.3.tar.gz
Upload date: Jan 21, 2026
Size: 41.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for fda_toolkit-0.2.3.tar.gz
Algorithm	Hash digest
SHA256	`0e1777ebec6ef4a3fc226fd74a04631acfa4c580c1193c9f5bf85e2a2acc7910`
MD5	`74f6ceb1db233e7778d47f92c951ed44`
BLAKE2b-256	`1d9e1a941c3aab3e3e8c8056d2345d6e581b220c740f1d9431fbe4b24a7f1500`

File details

Details for the file fda_toolkit-0.2.3-py3-none-any.whl.

File metadata

Download URL: fda_toolkit-0.2.3-py3-none-any.whl
Upload date: Jan 21, 2026
Size: 54.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for fda_toolkit-0.2.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4dbfde459eae5496f1681f4d87f74afe17ba0b9a411777001708bc97ef6a4969`
MD5	`eebbb897f562cbc87d0c3ed81ba11801`
BLAKE2b-256	`c81c3da32bb4cab7fbcb50dcb9f0dfb89a72f4c3543eacfa56255f2654689c33`

fda-toolkit 0.2.3
