
A simple data analysis and visualization toolkit for Google Colaboratory and other notebook environments such as Jupyter. This Python package reduces repetitive notebook code and helps users quickly explore, validate, and present data.


dakitlab

Professional Tables, Statistical Summaries, Data Validation, and Data Health Reports for Python Dataframes.

Built for Data Analysts, Data Scientists, Researchers, Educators, and Students.


Overview

dakitlab is a Python package designed to reduce repetitive notebook code and help users quickly explore, validate, and present data.

The current release focuses on a powerful Table class that provides:

  • Professional Plotly-powered tables
  • Interactive dataframe viewing
  • Statistical summary reports
  • Data integrity validation
  • Data health assessment reports
  • Support for multiple dataframe libraries

Whether you are working in:

  • Google Colab
  • Jupyter Notebook
  • JupyterLab
  • Kaggle Notebooks
  • VS Code Notebooks

dakitlab helps you spend less time writing boilerplate code and more time understanding your data.


Installation

pip install dakitlab

Import

from dakitlab import Table

Supported Dataframe Libraries

The Table class accepts dataframes from the following libraries:

  • Pandas
  • Polars

Internally, data is converted when necessary so users can work with their preferred dataframe library.


Example Dataset

Examples throughout this README use an environmental monitoring dataset containing:

  • Latitude
  • Longitude
  • PM10
  • PM2.5
  • Carbon Monoxide
  • Nitrogen Dioxide
  • Ozone
  • Dust
  • UV Index
  • European AQI
  • Hazardous Event

import pandas as pd

df = pd.read_csv("environmental_data.csv")
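
If environmental_data.csv is not at hand, a small synthetic stand-in can be generated so the examples still run. The sketch below uses only the standard library. The values are random placeholders, not real measurements. The function name sample_rows is hypothetical, and the column list is an assumption limited to names that appear elsewhere in this README.

```python
import csv
import random

# Assumed column names: these are the ones used in later README examples.
# The real dataset has more columns whose exact names are not documented here.
COLUMNS = [
    "Latitude", "Longitude", "PM10_ug_m3", "PM2_5_ug_m3",
    "European_AQI", "Hazardous_Event",
]


def sample_rows(n_rows=20, seed=0):
    """Return n_rows of synthetic (not real) monitoring records."""
    rng = random.Random(seed)  # seeded so repeated runs give the same rows
    rows = []
    for _ in range(n_rows):
        pm10 = round(rng.uniform(5, 150), 1)
        rows.append([
            round(rng.uniform(-90, 90), 4),          # Latitude
            round(rng.uniform(-180, 180), 4),        # Longitude
            pm10,                                    # PM10
            round(pm10 * rng.uniform(0.3, 0.8), 1),  # PM2.5, kept below PM10
            rng.randint(10, 120),                    # European AQI
            rng.randint(0, 1),                       # Hazardous event flag
        ])
    return rows


if __name__ == "__main__":
    with open("environmental_data.csv", "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(sample_rows())
```

Running the script writes a 20-row file that pd.read_csv can load as shown above.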

Quick Start

from dakitlab import Table

table = Table(
    df,
    title="Environmental Monitoring Data"
)

table.show()

Creating a Table

Basic Table

table = Table(df)

Table With Title

table = Table(
    df,
    title="Environmental Monitoring Data"
)

Table With Custom Headers

table = Table(
    df,
    header_names=[
        "Latitude",
        "Longitude",
        "PM10",
        "PM2.5",
        "CO",
        "NO₂",
        "Ozone",
        "Dust",
        "UV",
        "AQI",
        "Hazard"
    ]
)

Display Methods

show()

Display the dataframe as a professional Plotly table.

table.show()

Custom caption:

table.show(
    caption="Air Quality Monitoring Results"
)

display()

Display the table with control over the output filename, the maximum number of rows, and index visibility.

table.display(
    filename="environmental_table",
    max_rows=500,
    show_index=False
)

interactive()

Display the dataframe as an interactive notebook table.

table.interactive()

Specify rows per page:

table.interactive(
    rows_per_page=50
)

Layout Customization

set_layout()

Control title alignment, dimensions, margins, and column widths.

table.set_layout(
    title="Environmental Monitoring Data",
    title_align="center",
    width=1200,
    height=700
)

Advanced example:

table.set_layout(
    width=1400,
    height=800,
    header_height=50,
    cell_height=35,
    margin={
        "l":20,
        "r":20,
        "t":80,
        "b":20
    },
    column_widths=[
        150,150,120,120,120,
        120,120,120,100,100,120
    ]
)

Header Styling

set_header_style()

table.set_header_style(
    fillcolor="#1f2937",
    textcolor="white",
    align="center",
    fontsize=14,
    bold=True
)

Supported fonts:

  • Arial
  • Calibri
  • Helvetica
  • Times New Roman
  • Courier New
  • Verdana

Cell Styling

set_cell_style()

table.set_cell_style(
    fillcolor=["#ffffff", "#f9fafb"],
    textcolor="#111827",
    align="left",
    fontsize=12
)

Alternating row colors:

table.set_cell_style(
    fillcolor=[
        "#ffffff",
        "#f3f4f6"
    ]
)

Global Styling

set_global_style()

table.set_global_style(
    paper_bgcolor="#f3f4f6"
)

Statistical Summary Reports

The stats() method generates descriptive statistics and EDA summaries.


Basic Statistics

table.stats()

Returns:

  • Count
  • Missing values
  • Missing %
  • Unique values
  • Mean
  • Standard deviation
  • Min
  • Max
  • Range
  • Coefficient of variation
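
To make the quantities above concrete, here is a minimal standard-library sketch of how they can be computed for one column. It is not dakitlab's implementation. The function name basic_stats is hypothetical, None stands in for a missing value, and the use of the sample standard deviation is an assumption.

```python
import math
import statistics


def basic_stats(values):
    """Summarize one column; None marks a missing value.

    Assumes at least two non-missing values are present.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    mean = statistics.fmean(present)
    std = statistics.stdev(present)  # sample standard deviation (assumption)
    return {
        "count": len(present),
        "missing": missing,
        "missing_pct": 100 * missing / len(values),
        "unique": len(set(present)),
        "mean": mean,
        "std": std,
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        # Coefficient of variation: spread relative to the mean.
        "cv": std / mean if mean else math.nan,
    }


summary = basic_stats([10.0, 12.0, None, 14.0, 12.0])
```

The coefficient of variation is the standard deviation divided by the mean, so it is undefined when the mean is zero; the sketch returns NaN in that case.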

Full Statistics

table.stats(mode="full")

Adds:

  • Median
  • Variance
  • Quartiles
  • IQR
  • Outlier counts
  • Outlier percentages
  • Skewness
  • Distribution shape
  • Status indicators
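
The additional measures can be sketched the same way. The example below is an illustration under stated assumptions, not the package's code: outliers are flagged with the common 1.5 × IQR rule, quartiles use linear interpolation, and skewness is the moment-based Fisher-Pearson coefficient. Which definitions dakitlab uses is not documented here, and the function name spread_stats is hypothetical.

```python
import statistics


def spread_stats(values):
    """Quartiles, IQR, outliers, and skewness for a list of numbers."""
    # "inclusive" interpolates linearly between the observed data points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # assumed outlier fences
    outliers = [v for v in values if v < low or v > high]

    # Moment-based skewness: third central moment over variance ** 1.5.
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 else 0.0

    return {
        "median": median,
        "variance": statistics.variance(values),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "outlier_count": len(outliers),
        "outlier_pct": 100 * len(outliers) / n,
        "skewness": skewness,
    }


result = spread_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

For this input the single extreme value 100 lies above the upper fence, so it is counted as one outlier and pulls the skewness positive.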

Select Specific Columns By Index

table.stats(
    columns=[2,3,4]
)

Select Specific Columns By Name

table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3",
        "European_AQI"
    ]
)
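
Since columns accepts either positions or names, it may help to see how the two forms relate. The sketch below assumes positions are zero-based, which this README does not state. The helper resolve_columns and the column name Carbon_Monoxide are hypothetical.

```python
def resolve_columns(all_columns, selection):
    """Translate a mix of positions and names into column names."""
    resolved = []
    for item in selection:
        # Integers are treated as zero-based positions (assumption).
        name = all_columns[item] if isinstance(item, int) else item
        if name not in all_columns:
            raise KeyError(f"Unknown column: {name!r}")
        resolved.append(name)
    return resolved


all_columns = [
    "Latitude", "Longitude", "PM10_ug_m3", "PM2_5_ug_m3",
    "Carbon_Monoxide",  # hypothetical name, for illustration only
]
by_index = resolve_columns(all_columns, [2, 3])
by_name = resolve_columns(all_columns, ["PM10_ug_m3", "PM2_5_ug_m3"])
```

Under that assumption both calls select the same two particulate-matter columns.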

Full Statistics For Selected Columns

table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3"
    ],
    mode="full"
)

Round Output

table.stats(
    round_digits=2
)

Data Integrity Validation

The integrity() method validates data using user-defined rules.


Basic Integrity Check

table.integrity()

When called without rules, only the built-in checks run.


Validate Selected Columns

Using column indexes:

table.integrity(
    columns=[0,1,2]
)

Using column names:

table.integrity(
    columns=[
        "Latitude",
        "Longitude"
    ]
)

Creating Rules

Rules are defined as a dictionary.

Example:

rules = {

    "Latitude": {
        "required": True,
        "dtype": "numeric",
        "min": -90,
        "max": 90
    },

    "Longitude": {
        "required": True,
        "dtype": "numeric",
        "min": -180,
        "max": 180
    }

}

Run:

table.integrity(
    rules=rules
)
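
To illustrate what a rule dictionary expresses, the following sketch applies the required, dtype, min, and max keys to individual values in plain Python. It is a simplified stand-in, not how integrity() is implemented. The function check_value and the shape of its result are assumptions.

```python
def check_value(value, rule):
    """Return the names of the rule keys that the value violates."""
    if value is None:
        # A missing value only fails when the column is required.
        return ["required"] if rule.get("required") else []

    # bool is a subclass of int, so exclude it from "numeric" explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    failed = []
    if rule.get("dtype") == "numeric" and not is_number:
        failed.append("dtype")
    if is_number:
        if "min" in rule and value < rule["min"]:
            failed.append("min")
        if "max" in rule and value > rule["max"]:
            failed.append("max")
    return failed


latitude_rule = {"required": True, "dtype": "numeric", "min": -90, "max": 90}
latitudes = [45.2, None, 123.4, "north"]
report = {row: check_value(v, latitude_rule) for row, v in enumerate(latitudes)}
```

Here row 0 passes, row 1 fails required, row 2 fails max, and row 3 fails dtype.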

Supported Rules

Required Values

{
    "required": True
}

Unique Values

{
    "unique": True
}

Data Type Validation

{
    "dtype": "numeric"
}

Supported dtype values:

numeric
text
boolean
date
datetime

Numeric Range Validation

{
    "min": 0,
    "max": 100
}

Positive Values

{
    "positive": True
}

Non-Negative Values

{
    "non_negative": True
}

Allowed Values

{
    "allowed": [0,1]
}

Minimum Length

{
    "min_length": 3
}

Maximum Length

{
    "max_length": 50
}

Alphabetic Only

{
    "isalpha": True
}

Example:

Andrew
Alice
Bob

Numeric Only

{
    "isnumeric": True
}

Example:

12345
67890

Alphanumeric Only

{
    "isalnum": True
}

Example:

ABC123
Student01
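
The three rule names above match Python's built-in string methods. Assuming the rules follow the same semantics (this README does not confirm it), the behaviour can be previewed directly on sample values:

```python
samples = ["Andrew", "12345", "ABC123", "Student 01", ""]

# For each sample: (isalpha, isnumeric, isalnum)
checks = {s: (s.isalpha(), s.isnumeric(), s.isalnum()) for s in samples}

for value, (alpha, numeric, alnum) in checks.items():
    print(f"{value!r:14} isalpha={alpha!s:5} isnumeric={numeric!s:5} isalnum={alnum}")
```

Note that a space makes all three methods return False, and so does an empty string, so values such as "Student 01" would need a regex or allowed_chars rule instead.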

Regular Expressions

Email validation:

{
    "regex": r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"
}

Phone validation:

{
    "regex": r"^[0-9\-\+\(\) ]+$"
}
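
Before attaching a pattern to a rule, it can be worth trying it against a few sample values with Python's re module. The sketch below tests the two patterns above. The helper name matches is hypothetical, and whether dakitlab applies patterns with search or a full match is an assumption.

```python
import re

EMAIL = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"
PHONE = r"^[0-9\-\+\(\) ]+$"


def matches(pattern, value):
    """True when the value satisfies the pattern.

    The ^ and $ anchors tie the match to the start and end of the value.
    """
    return re.search(pattern, value) is not None


email_results = {
    v: matches(EMAIL, v)
    for v in ["ana@example.org", "ana@example", "ana example.org"]
}
phone_results = {
    v: matches(PHONE, v)
    for v in ["+1 (555) 010-9999", "555-CALL"]
}
```

The email pattern rejects addresses without a dotted domain suffix, and the phone pattern rejects any letters. Neither pattern verifies that an address or number actually exists.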

Allowed Characters

{
    "allowed_chars": "A-Za-z0-9_"
}

Allows:

abc
ABC
123
student_01
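
One plausible reading of allowed_chars is that the string is used as a regular-expression character class. The sketch below shows that interpretation. It is an assumption about the semantics, and the helper uses_only is hypothetical.

```python
import re


def uses_only(value, allowed_chars):
    """True when every character of value is in the allowed set.

    allowed_chars is treated as the body of a regex character class,
    so ranges such as A-Z are expanded by the regex engine.
    """
    return re.fullmatch(f"[{allowed_chars}]*", value) is not None


allowed = "A-Za-z0-9_"
results = {v: uses_only(v, allowed) for v in ["abc", "student_01", "student-01"]}
```

Under this reading "student-01" is rejected because the hyphen is not in the set.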

Complete Integrity Example

rules = {

    "Latitude": {
        "required": True,
        "dtype": "numeric",
        "min": -90,
        "max": 90
    },

    "Longitude": {
        "required": True,
        "dtype": "numeric",
        "min": -180,
        "max": 180
    },

    "Hazardous_Event": {
        "allowed": [0,1]
    },

    "Station_Name": {
        "required": True,
        "min_length": 3,
        "max_length": 50
    }

}

table.integrity(
    rules=rules
)

Data Health Reports

The data_health() method provides a high-level overview of dataset quality.

table.data_health()

What Is Included?

  • Dataset health score
  • Missing value analysis
  • Missing rows report
  • Duplicate row detection
  • Rows requiring attention
  • Severity classification
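
Two of the items above, missing-value analysis and duplicate row detection, can be sketched with the standard library to show what is being counted. This is an illustration only: the function health_summary is hypothetical, rows are modelled as dictionaries with None for missing values, and the health score is left out because its formula is not documented here.

```python
from collections import Counter

# Synthetic rows for illustration, not real measurements.
rows = [
    {"Latitude": 6.5, "Longitude": 3.4, "PM10_ug_m3": 41.0},
    {"Latitude": 6.5, "Longitude": 3.4, "PM10_ug_m3": 41.0},   # exact duplicate
    {"Latitude": 9.1, "Longitude": None, "PM10_ug_m3": 55.2},  # missing value
]


def health_summary(rows):
    """Count missing values per column and exact duplicate rows."""
    columns = list(rows[0])
    missing_by_column = {c: sum(r[c] is None for r in rows) for c in columns}
    rows_with_missing = [
        i for i, r in enumerate(rows) if any(v is None for v in r.values())
    ]
    # A row seen n times contributes n - 1 duplicates.
    seen = Counter(tuple(r[c] for c in columns) for r in rows)
    duplicate_rows = sum(n - 1 for n in seen.values() if n > 1)
    return {
        "missing_by_column": missing_by_column,
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": duplicate_rows,
    }


health = health_summary(rows)
```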

Limit Problem Rows Displayed

table.data_health(
    max_problem_rows=25
)

Hide Problem Rows

table.data_health(
    show_problem_rows=False
)

Complete Workflow Example

from dakitlab import Table

table = Table(
    df,
    title="Environmental Monitoring Data"
)

table.show()

table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3",
        "European_AQI"
    ],
    mode="full"
)

rules = {
    "Latitude": {
        "required": True,
        "min": -90,
        "max": 90
    },

    "Longitude": {
        "required": True,
        "min": -180,
        "max": 180
    }
}

table.integrity(rules=rules)

table.data_health()

Current Public Methods

Method               Description
Table()              Create a table object
show()               Display styled table
display()            Advanced table display
interactive()        Interactive dataframe view
set_layout()         Layout customization
set_header_style()   Header customization
set_cell_style()     Cell customization
set_global_style()   Global styling
stats()              Statistical summary reports
integrity()          Rule-based validation
data_health()        Dataset health assessment

Roadmap

Planned future classes:

  • Summary
  • CompareFrames
  • Cleaner
  • SchemaValidator
  • QuickPlot
  • CorrelationMap
  • DistributionGrid
  • Report
  • Snapshot
  • Profiler

License

MIT License


Author

Andrew Benyeogor Osenwe

Built for practical data analysis, exploratory data analysis, and notebook productivity.
