A simple data analysis and visualization toolkit for Google Colaboratory and other notebooks such as Jupyter. This Python package reduces repetitive notebook code and helps users quickly explore, validate, and present data.

dakitlab

Professional Tables, Statistical Summaries, Data Validation, and Data Health Reports for Python Dataframes.

Built for Data Analysts, Data Scientists, Researchers, Educators, and Students.


Overview

dakitlab is a Python package designed to reduce repetitive notebook code and help users quickly explore, validate, and present data.

The current release focuses on a powerful Table class that provides:

  • Professional Plotly-powered tables
  • Interactive dataframe viewing
  • Statistical summary reports
  • Data integrity validation
  • Data health assessment reports
  • Support for multiple dataframe libraries

Whether you are working in:

  • Google Colab
  • Jupyter Notebook
  • JupyterLab
  • Kaggle Notebooks
  • VS Code Notebooks

dakitlab helps you spend less time writing boilerplate code and more time understanding your data.


Installation

pip install dakitlab

Import

from dakitlab import Table

Supported Dataframe Libraries

The Table class automatically accepts:

Library    Supported
Pandas     Yes
Polars     Yes

Internally, data is converted when necessary so users can work with their preferred dataframe library.
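
The README does not describe how that conversion works. As a rough sketch of the idea only, the hypothetical helper below (not part of dakitlab) passes objects through unchanged unless they offer a `to_pandas()` method, which is the method Polars dataframes expose for this purpose. A stand-in class is used so the sketch runs without Polars installed.

```python
def as_pandas_like(frame):
    """Return `frame` unchanged, or its pandas form if it offers `to_pandas()`.

    Hypothetical helper for illustration; dakitlab's real conversion may differ.
    """
    convert = getattr(frame, "to_pandas", None)
    if callable(convert):
        return convert()  # e.g. polars.DataFrame.to_pandas()
    return frame


class FakePolarsFrame:
    """Stand-in object so the sketch runs without Polars installed."""

    def to_pandas(self):
        return "converted"


print(as_pandas_like(FakePolarsFrame()))  # converted
print(as_pandas_like("already pandas"))   # already pandas
```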


Example Dataset

Examples throughout this README use an environmental monitoring dataset containing:

  • Latitude
  • Longitude
  • PM10
  • PM2.5
  • Carbon Monoxide
  • Nitrogen Dioxide
  • Ozone
  • Dust
  • UV Index
  • European AQI
  • Hazardous Event

import pandas as pd

df = pd.read_csv("environmental_data.csv")
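
The dataset file itself is not bundled with this README. To try the examples without it, a small stand-in file can be written first. The sketch below uses only column names that appear in later examples of this README; the helper name and every value are made up for illustration and are not real measurements.

```python
import csv

# Column names taken from later examples in this README.
COLUMNS = ["Latitude", "Longitude", "PM10_ug_m3", "PM2_5_ug_m3",
           "European_AQI", "Hazardous_Event"]

# Placeholder rows: invented values, not real measurements.
ROWS = [
    [6.52, 3.38, 41.0, 18.5, 52, 0],
    [6.45, 3.40, 120.3, 75.1, 96, 1],
    [6.60, 3.35, "", 22.4, 48, 0],  # empty string = missing PM10 reading
]


def write_sample_csv(path):
    """Write the placeholder rows to `path` and return the number of data rows."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(ROWS)
    return len(ROWS)
```

Calling write_sample_csv("environmental_data.csv") before the pd.read_csv line gives the remaining examples something to load.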

Quick Start

from dakitlab import Table

table = Table(
    df,
    title="Environmental Monitoring Data"
)

table.show()

Creating a Table

Basic Table

table = Table(df)

Table With Title

table = Table(
    df,
    title="Environmental Monitoring Data"
)

Table With Custom Headers

table = Table(
    df,
    header_names=[
        "Latitude",
        "Longitude",
        "PM10",
        "PM2.5",
        "CO",
        "NO₂",
        "Ozone",
        "Dust",
        "UV",
        "AQI",
        "Hazard"
    ]
)

Display Methods

show()

Display the dataframe as a professional Plotly table.

table.show()

Custom caption:

table.show(
    caption="Air Quality Monitoring Results"
)

display()

Full display control.

table.display(
    filename="environmental_table",
    max_rows=500,
    show_index=False
)

interactive()

Displays the dataframe using an interactive notebook table.

table.interactive()

Specify rows per page:

table.interactive(
    rows_per_page=50
)

Layout Customization

set_layout()

Control title alignment, dimensions, margins, and column widths.

table.set_layout(
    title="Environmental Monitoring Data",
    title_align="center",
    width=1200,
    height=700
)

Advanced example:

table.set_layout(
    width=1400,
    height=800,
    header_height=50,
    cell_height=35,
    margin={
        "l":20,
        "r":20,
        "t":80,
        "b":20
    },
    column_widths=[
        150,150,120,120,120,
        120,120,120,100,100,120
    ]
)

Header Styling

set_header_style()

table.set_header_style(
    fillcolor="#1f2937",
    textcolor="white",
    align="center",
    fontsize=14,
    bold=True
)

Supported fonts:

  • Arial
  • Calibri
  • Helvetica
  • Times New Roman
  • Courier New
  • Verdana

Cell Styling

set_cell_style()

table.set_cell_style(
    fillcolor=["#ffffff", "#f9fafb"],
    textcolor="#111827",
    align="left",
    fontsize=12
)

Alternating row colors:

table.set_cell_style(
    fillcolor=[
        "#ffffff",
        "#f3f4f6"
    ]
)
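
Passing two colors is described here as producing alternating rows. One way to picture that, as a sketch rather than dakitlab's actual implementation, is cycling through the color list once per row. The function name below is hypothetical.

```python
from itertools import cycle, islice


def stripe_rows(colors, n_rows):
    """Repeat `colors` in order until there is one color per row."""
    return list(islice(cycle(colors), n_rows))


print(stripe_rows(["#ffffff", "#f3f4f6"], 5))
# ['#ffffff', '#f3f4f6', '#ffffff', '#f3f4f6', '#ffffff']
```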

Global Styling

set_global_style()

table.set_global_style(
    paper_bgcolor="#f3f4f6"
)

Statistical Summary Reports

The stats() method generates descriptive statistics and EDA summaries.


Basic Statistics

table.stats()

Returns:

  • Count
  • Missing values
  • Missing %
  • Unique values
  • Mean
  • Standard deviation
  • Min
  • Max
  • Range
  • Coefficient of variation
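
To make the listed quantities concrete, here is a plain-Python sketch of how they could be computed for a single column. It is not dakitlab's code: the function name is hypothetical, missing values are represented as None, and the sample standard deviation (n - 1 denominator) is an assumption, since the README does not say which definition stats() uses.

```python
import math
import statistics


def basic_summary(values):
    """Summarize one column given as a list; None or NaN marks a missing value."""
    present = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    missing = len(values) - len(present)
    mean = statistics.fmean(present)
    std = statistics.stdev(present)  # sample standard deviation (assumption)
    return {
        "count": len(present),
        "missing": missing,
        "missing_pct": 100 * missing / len(values),
        "unique": len(set(present)),
        "mean": mean,
        "std": std,
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        # Coefficient of variation: spread relative to the mean.
        "cv": std / mean if mean else float("nan"),
    }


summary = basic_summary([10.0, 12.0, None, 14.0, 12.0])
print(summary)
```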

Full Statistics

table.stats(mode="full")

Adds:

  • Median
  • Variance
  • Quartiles
  • IQR
  • Outlier counts
  • Outlier percentages
  • Skewness
  • Distribution shape
  • Status indicators
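
The README does not state which outlier rule the full mode applies. A common choice is the 1.5 × IQR fence, and the sketch below assumes it. The function name, the quartile method, and the factor k are all assumptions for illustration, not dakitlab's documented behavior.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Quartiles, IQR, and values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < low or v > high]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "outliers": outliers,
        "outlier_pct": 100 * len(outliers) / len(values),
    }


report = iqr_outliers([10, 12, 12, 13, 14, 15, 60])
print(report["iqr"], report["outliers"])  # 2.5 [60]
```

Different quartile conventions give slightly different fences on small samples, so counts from this sketch may not match the package's output exactly.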

Select Specific Columns By Index

table.stats(
    columns=[2,3,4]
)

Select Specific Columns By Name

table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3",
        "European_AQI"
    ]
)
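
Both selection styles end up naming the same kind of thing: a subset of columns. The sketch below shows one way a mixed list of positions and names could be resolved. It assumes zero-based positions, which the README does not confirm, and both the helper name and the shortened column list are hypothetical.

```python
def resolve_columns(all_columns, selection):
    """Turn a list of positions and/or names into a list of column names."""
    resolved = []
    for item in selection:
        if isinstance(item, int):
            resolved.append(all_columns[item])  # zero-based position (assumption)
        elif item in all_columns:
            resolved.append(item)
        else:
            raise KeyError(f"Unknown column: {item!r}")
    return resolved


# Shortened, hypothetical column list for illustration.
columns = ["Latitude", "Longitude", "PM10_ug_m3", "PM2_5_ug_m3", "European_AQI"]

print(resolve_columns(columns, [2, 3, 4]))
# ['PM10_ug_m3', 'PM2_5_ug_m3', 'European_AQI']
print(resolve_columns(columns, ["PM10_ug_m3", "European_AQI"]))
# ['PM10_ug_m3', 'European_AQI']
```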

Full Statistics For Selected Columns

table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3"
    ],
    mode="full"
)

Round Output

table.stats(
    round_digits=2
)

Data Integrity Validation

The integrity() method validates data with built-in checks and, optionally, user-defined rules.


Basic Integrity Check

table.integrity()

Called without arguments, integrity() runs the built-in checks only.


Validate Selected Columns

Using column indexes:

table.integrity(
    columns=[0,1,2]
)

Using column names:

table.integrity(
    columns=[
        "Latitude",
        "Longitude"
    ]
)

Creating Rules

Rules are defined as a dictionary.

Example:

rules = {

    "Latitude": {
        "required": True,
        "dtype": "numeric",
        "min": -90,
        "max": 90
    },

    "Longitude": {
        "required": True,
        "dtype": "numeric",
        "min": -180,
        "max": 180
    }

}

Run:

table.integrity(
    rules=rules
)
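
To show what a rule dictionary like the one above expresses, here is a minimal plain-Python sketch that applies the required, dtype, min, and max keys to one column. It is an illustration under assumptions, not dakitlab's implementation: the function name, the use of None for missing values, and the wording of the messages are all hypothetical.

```python
def check_column(values, rule):
    """Return (row_index, problem) pairs for one column; None marks a missing value."""
    problems = []
    for i, value in enumerate(values):
        if value is None:
            if rule.get("required"):
                problems.append((i, "missing required value"))
            continue
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if rule.get("dtype") == "numeric" and not is_number:
            problems.append((i, "not numeric"))
            continue
        if is_number:
            # Range checks only make sense for numeric values.
            if "min" in rule and value < rule["min"]:
                problems.append((i, f"below min {rule['min']}"))
            if "max" in rule and value > rule["max"]:
                problems.append((i, f"above max {rule['max']}"))
    return problems


latitude_rule = {"required": True, "dtype": "numeric", "min": -90, "max": 90}
print(check_column([6.5, None, 95.0, "north"], latitude_rule))
# [(1, 'missing required value'), (2, 'above max 90'), (3, 'not numeric')]
```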

Supported Rules

Required Values

{
    "required": True
}

Unique Values

{
    "unique": True
}

Data Type Validation

{
    "dtype": "numeric"
}

Supported dtype values:

numeric
text
boolean
date
datetime

Numeric Range Validation

{
    "min": 0,
    "max": 100
}

Positive Values

{
    "positive": True
}

Non-Negative Values

{
    "non_negative": True
}

Allowed Values

{
    "allowed": [0,1]
}

Minimum Length

{
    "min_length": 3
}

Maximum Length

{
    "max_length": 50
}

Alphabetic Only

{
    "isalpha": True
}

Example:

Andrew
Alice
Bob

Numeric Only

{
    "isnumeric": True
}

Example:

12345
67890

Alphanumeric Only

{
    "isalnum": True
}

Example:

ABC123
Student01
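
The three rule names above match Python's built-in string methods of the same names. Assuming the rules behave like those methods (the README does not say so explicitly), the sketch below shows which sample strings pass each test. The helper name is hypothetical.

```python
def classify(text):
    """Report the result of Python's own str checks for one value."""
    return {
        "isalpha": text.isalpha(),      # letters only
        "isnumeric": text.isnumeric(),  # numeric characters only
        "isalnum": text.isalnum(),      # letters and numeric characters only
    }


for sample in ["Andrew", "12345", "ABC123", "Student01", "student_01", "Mary Ann"]:
    print(sample, classify(sample))
```

Note that spaces, underscores, and hyphens fail all three checks, so a value such as "Mary Ann" would not count as alphabetic under this interpretation.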

Regular Expressions

Email validation:

{
    "regex": r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"
}

Phone validation:

{
    "regex": r"^[0-9\-\+\(\) ]+$"
}
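
Before putting a pattern into a rule, it can help to try it against a few values with Python's re module. The sketch below does that for the two patterns above. How dakitlab applies the pattern internally is not documented here, so the helper is hypothetical; because both patterns are anchored with ^ and $, re.search is used as a reasonable stand-in.

```python
import re

EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
PHONE = re.compile(r"^[0-9\-\+\(\) ]+$")


def matches(pattern, text):
    """True if the anchored pattern accepts the whole string."""
    return pattern.search(text) is not None


print(matches(EMAIL, "user@example.com"))   # True
print(matches(EMAIL, "user@example"))       # False: no dot-separated suffix
print(matches(PHONE, "+1 (555) 123-4567"))  # True
print(matches(PHONE, "555-CALL"))           # False: letters are not allowed
print(matches(PHONE, "()--"))               # True: only characters are checked
```

The last line shows a limit of the phone pattern: it restricts which characters may appear but does not require any digits or a particular layout.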

Allowed Characters

{
    "allowed_chars": "A-Za-z0-9_"
}

Allows:

abc
ABC
123
student_01
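
The value "A-Za-z0-9_" reads like the inside of a regular-expression character class. Assuming that is how it is interpreted (an assumption, not something the README confirms), the rule can be sketched with a hypothetical helper as follows.

```python
import re


def only_allowed(text, allowed_chars):
    """True if every character of a non-empty `text` is in the class."""
    # The string is placed inside [...] as a regex character class (assumption).
    return re.fullmatch(f"[{allowed_chars}]+", text) is not None


print(only_allowed("student_01", "A-Za-z0-9_"))  # True
print(only_allowed("student-01", "A-Za-z0-9_"))  # False: hyphen is not in the class
```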

Complete Integrity Example

rules = {

    "Latitude": {
        "required": True,
        "dtype": "numeric",
        "min": -90,
        "max": 90
    },

    "Longitude": {
        "required": True,
        "dtype": "numeric",
        "min": -180,
        "max": 180
    },

    "Hazardous_Event": {
        "allowed": [0,1]
    },

    "Station_Name": {
        "required": True,
        "min_length": 3,
        "max_length": 50
    }

}

table.integrity(
    rules=rules
)

Data Health Reports

The data_health() method provides a high-level overview of dataset quality.

table.data_health()

What Is Included?

  • Dataset health score
  • Missing value analysis
  • Missing rows report
  • Duplicate row detection
  • Rows requiring attention
  • Severity classification
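
Two of the items above, rows with missing values and duplicate rows, are simple enough to illustrate in plain Python. The sketch below is not dakitlab's implementation: the function name is hypothetical, rows are plain tuples, and None stands for a missing value. The health score and severity classification are not sketched because the README does not define how they are calculated.

```python
def missing_and_duplicates(rows):
    """Return indexes of rows with missing values and of repeated rows."""
    missing_rows = [
        i for i, row in enumerate(rows) if any(v is None for v in row)
    ]
    seen = set()
    duplicate_rows = []
    for i, row in enumerate(rows):
        if row in seen:
            duplicate_rows.append(i)  # second and later copies only
        seen.add(row)
    return {"missing_rows": missing_rows, "duplicate_rows": duplicate_rows}


rows = [
    (6.5, 3.4, 41.0),
    (6.5, 3.4, 41.0),   # exact copy of the first row
    (6.6, None, 22.0),  # missing longitude
]
print(missing_and_duplicates(rows))
# {'missing_rows': [2], 'duplicate_rows': [1]}
```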

Limit Problem Rows Displayed

table.data_health(
    max_problem_rows=25
)

Hide Problem Rows

table.data_health(
    show_problem_rows=False
)

Complete Workflow Example

from dakitlab import Table

table = Table(
    df,
    title="Environmental Monitoring Data"
)

table.show()

table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3",
        "European_AQI"
    ],
    mode="full"
)

rules = {
    "Latitude": {
        "required": True,
        "min": -90,
        "max": 90
    },

    "Longitude": {
        "required": True,
        "min": -180,
        "max": 180
    }
}

table.integrity(rules=rules)

table.data_health()

Current Public Methods

Method               Description
Table()              Create a table object
show()               Display styled table
display()            Advanced table display
interactive()        Interactive dataframe view
set_layout()         Layout customization
set_header_style()   Header customization
set_cell_style()     Cell customization
set_global_style()   Global styling
stats()              Statistical summary reports
integrity()          Rule-based validation
data_health()        Dataset health assessment

Roadmap

Planned future classes:

  • Summary
  • CompareFrames
  • Cleaner
  • SchemaValidator
  • QuickPlot
  • CorrelationMap
  • DistributionGrid
  • Report
  • Snapshot
  • Profiler

License

MIT License


Author

Andrew Benyeogor Osenwe

Built for practical data analysis, exploratory data analysis, and notebook productivity.
