A simple data analysis and visualization toolkit for Google Colaboratory and other notebook environments such as Jupyter. This Python package reduces repetitive notebook code and helps users quickly explore, validate, and present data.
dakitlab
Professional Tables, Statistical Summaries, Data Validation, and Data Health Reports for Python Dataframes.
Built for Data Analysts, Data Scientists, Researchers, Educators, and Students.
Overview
dakitlab is a Python package designed to reduce repetitive notebook code and help users quickly explore, validate, and present data.
The current release focuses on a powerful Table class that provides:
- Professional Plotly-powered tables
- Interactive dataframe viewing
- Statistical summary reports
- Data integrity validation
- Data health assessment reports
- Support for multiple dataframe libraries
Whether you are working in:
- Google Colab
- Jupyter Notebook
- JupyterLab
- Kaggle Notebooks
- VS Code Notebooks
dakitlab helps you spend less time writing boilerplate code and more time understanding your data.
Installation
pip install dakitlab
Import
from dakitlab import Table
Supported Dataframe Libraries
The Table class automatically accepts:
| Library | Supported |
|---|---|
| Pandas | ✅ |
| Polars | ✅ |
Internally, data is converted when necessary so users can work with their preferred dataframe library.
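The README does not say how this conversion is done. As a minimal sketch of one common approach, the hypothetical helper below (not part of dakitlab) converts only when the input offers a `to_pandas()` method, which Polars dataframes provide, and passes everything else through unchanged.

```python
from typing import Any


def to_pandas_if_needed(frame: Any) -> Any:
    """Hypothetical helper: convert only when the input offers `to_pandas()`."""
    # Polars dataframes expose `to_pandas()`; pandas dataframes do not,
    # so they are returned unchanged.
    converter = getattr(frame, "to_pandas", None)
    if callable(converter):
        return converter()
    return frame
```

Duck typing like this avoids importing Polars when only pandas is installed; whether dakitlab takes the same route is an assumption.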
Example Dataset
Examples throughout this README use an environmental monitoring dataset containing:
- Latitude
- Longitude
- PM10
- PM2.5
- Carbon Monoxide
- Nitrogen Dioxide
- Ozone
- Dust
- UV Index
- European AQI
- Hazardous Event
import pandas as pd
df = pd.read_csv("environmental_data.csv")
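If the CSV file is not at hand, a small stand-in can be generated with the standard library. The sketch below is an assumption-heavy placeholder: the column names are the ones used in the code examples of this README, the full file layout is guessed, and the rows are made-up values, not real measurements.

```python
import csv
import os
import tempfile

# Column names taken from the code examples in this README (layout is assumed).
COLUMNS = [
    "Latitude",
    "Longitude",
    "PM10_ug_m3",
    "PM2_5_ug_m3",
    "European_AQI",
    "Hazardous_Event",
]

# Made-up placeholder rows, not real measurements.
SAMPLE_ROWS = [
    [52.52, 13.41, 18.2, 9.7, 32, 0],
    [48.85, 2.35, 41.0, 27.3, 68, 1],
    [40.42, -3.70, 25.5, 12.1, 45, 0],
]


def write_sample_csv(path: str) -> str:
    """Write the placeholder rows to `path` and return the path."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        writer.writerows(SAMPLE_ROWS)
    return path


sample_path = write_sample_csv(
    os.path.join(tempfile.mkdtemp(), "environmental_data.csv")
)
```

Passing `sample_path` to `pd.read_csv` then yields a three-row dataframe that is enough to try the methods described in the following sections.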
Quick Start
from dakitlab import Table
table = Table(
    df,
    title="Environmental Monitoring Data"
)
table.show()
Creating a Table
Basic Table
table = Table(df)
Table With Title
table = Table(
    df,
    title="Environmental Monitoring Data"
)
Table With Custom Headers
table = Table(
    df,
    header_names=[
        "Latitude",
        "Longitude",
        "PM10",
        "PM2.5",
        "CO",
        "NO₂",
        "Ozone",
        "Dust",
        "UV",
        "AQI",
        "Hazard"
    ]
)
Display Methods
show()
Display the dataframe as a professional Plotly table.
table.show()
Custom caption:
table.show(
    caption="Air Quality Monitoring Results"
)
display()
Full display control.
table.display(
    filename="environmental_table",
    max_rows=500,
    show_index=False
)
interactive()
Displays the dataframe using an interactive notebook table.
table.interactive()
Specify rows per page:
table.interactive(
    rows_per_page=50
)
Layout Customization
set_layout()
Control title alignment, dimensions, margins, and column widths.
table.set_layout(
    title="Environmental Monitoring Data",
    title_align="center",
    width=1200,
    height=700
)
Advanced example:
table.set_layout(
    width=1400,
    height=800,
    header_height=50,
    cell_height=35,
    margin={
        "l": 20,
        "r": 20,
        "t": 80,
        "b": 20
    },
    column_widths=[
        150, 150, 120, 120, 120,
        120, 120, 120, 100, 100, 120
    ]
)
Header Styling
set_header_style()
table.set_header_style(
    fillcolor="#1f2937",
    textcolor="white",
    align="center",
    fontsize=14,
    bold=True
)
Supported fonts:
- Arial
- Calibri
- Helvetica
- Times New Roman
- Courier New
- Verdana
Cell Styling
set_cell_style()
table.set_cell_style(
    fillcolor=["#ffffff", "#f9fafb"],
    textcolor="#111827",
    align="left",
    fontsize=12
)
Alternating row colors:
table.set_cell_style(
    fillcolor=[
        "#ffffff",
        "#f3f4f6"
    ]
)
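Plotly tables accept one fill color per row, so a two-color list has to be expanded to the number of rows at some point. The sketch below shows that expansion with a hypothetical helper; whether dakitlab cycles the colors in exactly this way is an assumption.

```python
def alternating_fill(row_count, colors=("#ffffff", "#f3f4f6")):
    """Hypothetical helper: cycle through `colors`, one entry per table row."""
    return [colors[index % len(colors)] for index in range(row_count)]


row_colors = alternating_fill(5)
```

Because the helper cycles with a modulo, passing three or more colors produces a repeating band pattern instead of simple striping.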
Global Styling
set_global_style()
table.set_global_style(
    paper_bgcolor="#f3f4f6"
)
Statistical Summary Reports
The stats() method generates descriptive statistics and EDA summaries.
Basic Statistics
table.stats()
Returns:
- Count
- Missing values
- Missing %
- Unique values
- Mean
- Standard deviation
- Min
- Max
- Range
- Coefficient of variation
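To make the listed measures concrete, here is a minimal standard-library sketch that computes them for a single column. It is not dakitlab's implementation: treating `None` as the missing marker, counting only non-missing values, and using the sample standard deviation are all assumptions.

```python
import statistics


def basic_stats(values):
    """Basic summary for one column; None marks a missing value (assumption)."""
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)
    mean = statistics.fmean(present)
    std = statistics.stdev(present)  # sample standard deviation (n - 1)
    return {
        "count": len(present),
        "missing": missing,
        "missing_pct": 100 * missing / len(values),
        "unique": len(set(present)),
        "mean": mean,
        "std": std,
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        # Coefficient of variation: spread relative to the mean.
        "cv": std / mean if mean else float("nan"),
    }


summary = basic_stats([10.0, 12.0, None, 14.0, 12.0])
```

The coefficient of variation is only meaningful for columns measured on a ratio scale with a non-zero mean, which is why the sketch returns NaN when the mean is zero.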
Full Statistics
table.stats(mode="full")
Adds:
- Median
- Variance
- Quartiles
- IQR
- Outlier counts
- Outlier percentages
- Skewness
- Distribution shape
- Status indicators
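The additional measures can be sketched the same way. The example below is illustrative only: the inclusive quartile method, the 1.5 × IQR outlier fences, and the population-moment skewness formula are common conventions chosen here as assumptions, and the README does not state which conventions dakitlab uses.

```python
import statistics


def full_stats(values):
    """Extended summary for a list of numbers without missing values."""
    data = sorted(values)
    # Quartile conventions differ between tools; "inclusive" is one of them.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # Tukey-style fences (assumed)
    outliers = [v for v in data if v < low or v > high]
    mean = statistics.fmean(data)
    spread = statistics.pstdev(data)
    # Population skewness: mean of the cubed standardized values.
    skewness = (
        sum(((v - mean) / spread) ** 3 for v in data) / len(data) if spread else 0.0
    )
    return {
        "median": median,
        "variance": statistics.variance(data),
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "outlier_count": len(outliers),
        "outlier_pct": 100 * len(outliers) / len(data),
        "skewness": skewness,
    }


report = full_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this sample the single value 100 lies above the upper fence of 13, so it is the only outlier and pulls the skewness well above zero.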
Select Specific Columns By Index
table.stats(
    columns=[2, 3, 4]
)
Select Specific Columns By Name
table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3",
        "European_AQI"
    ]
)
Full Statistics For Selected Columns
table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3"
    ],
    mode="full"
)
Round Output
table.stats(
    round_digits=2
)
Data Integrity Validation
The integrity() method validates data using built-in checks and, optionally, user-defined rules.
Basic Integrity Check
table.integrity()
Uses built-in checks only.
Validate Selected Columns
Using column indexes:
table.integrity(
    columns=[0, 1, 2]
)
Using column names:
table.integrity(
    columns=[
        "Latitude",
        "Longitude"
    ]
)
Creating Rules
Rules are defined as a dictionary.
Example:
rules = {
    "Latitude": {
        "required": True,
        "dtype": "numeric",
        "min": -90,
        "max": 90
    },
    "Longitude": {
        "required": True,
        "dtype": "numeric",
        "min": -180,
        "max": 180
    }
}
Run:
table.integrity(
    rules=rules
)
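To illustrate what a rule dictionary of this shape expresses, the sketch below evaluates the `required`, `dtype`, `min`, and `max` keys against plain Python rows. The functions `check_value` and `check_rows` are hypothetical and the sample rows are made up; dakitlab's own evaluation order and report format may differ.

```python
rules = {
    "Latitude": {"required": True, "dtype": "numeric", "min": -90, "max": 90},
    "Longitude": {"required": True, "dtype": "numeric", "min": -180, "max": 180},
}


def check_value(value, rule):
    """Return the names of the rules that `value` fails."""
    failures = []
    if value is None:
        # A missing value can only violate "required".
        if rule.get("required"):
            failures.append("required")
        return failures
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if rule.get("dtype") == "numeric" and not is_number:
        failures.append("dtype")
    if is_number:
        if "min" in rule and value < rule["min"]:
            failures.append("min")
        if "max" in rule and value > rule["max"]:
            failures.append("max")
    return failures


def check_rows(rows, rules):
    """Collect (row index, column, failed rule) for every violation."""
    problems = []
    for index, row in enumerate(rows):
        for column, rule in rules.items():
            for failed in check_value(row.get(column), rule):
                problems.append((index, column, failed))
    return problems


# Made-up rows: one valid, one out of range, one with a missing latitude.
rows = [
    {"Latitude": 52.52, "Longitude": 13.41},
    {"Latitude": 95.0, "Longitude": 13.41},
    {"Latitude": None, "Longitude": -200.0},
]
problems = check_rows(rows, rules)
```

Reporting violations as row, column, and rule triples keeps the output easy to filter, for example to list every row that failed a range check.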
Supported Rules
Required Values
{
    "required": True
}
Unique Values
{
    "unique": True
}
Data Type Validation
{
    "dtype": "numeric"
}
Supported:
- numeric
- text
- boolean
- date
- datetime
Numeric Range Validation
{
    "min": 0,
    "max": 100
}
Positive Values
{
    "positive": True
}
Non-Negative Values
{
    "non_negative": True
}
Allowed Values
{
    "allowed": [0, 1]
}
Minimum Length
{
    "min_length": 3
}
Maximum Length
{
    "max_length": 50
}
Alphabetic Only
{
    "isalpha": True
}
Example:
Andrew
Alice
Bob
Numeric Only
{
    "isnumeric": True
}
Example:
12345
67890
Alphanumeric Only
{
    "isalnum": True
}
Example:
ABC123
Student01
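The three rule names match Python's built-in string methods `str.isalpha()`, `str.isnumeric()`, and `str.isalnum()`. Assuming the rules follow the same semantics (the README does not confirm this), the sketch below shows how those methods classify a few sample strings.

```python
samples = ["Andrew", "12345", "ABC123", "Student 01", ""]

# Map each sample to (isalpha, isnumeric, isalnum).
results = {
    text: (text.isalpha(), text.isnumeric(), text.isalnum()) for text in samples
}

for text, flags in results.items():
    print(repr(text), flags)
```

Two details are worth knowing: a space makes all three checks fail, and an empty string fails them as well, so a rule like this also rejects blank entries.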
Regular Expressions
Email validation:
{
    "regex": r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"
}
Phone validation:
{
    "regex": r"^[0-9\-\+\(\) ]+$"
}
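Both patterns can be tried out with Python's `re` module before they go into a rule set. The helper `matches` below is hypothetical and the sample strings are made up; how dakitlab applies the pattern internally is not documented here.

```python
import re

EMAIL = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")
PHONE = re.compile(r"^[0-9\-\+\(\) ]+$")


def matches(pattern, text):
    """Return True when `text` matches `pattern` from its first character."""
    return pattern.match(text) is not None


email_ok = matches(EMAIL, "ana@example.org")
email_bad = matches(EMAIL, "ana@example")
phone_ok = matches(PHONE, "+1 (555) 010-9999")
phone_bad = matches(PHONE, "555-CALL")
```

The phone pattern only restricts which characters may appear, so a string such as `---` also passes; a stricter pattern is needed if digit counts matter.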
Allowed Characters
{
    "allowed_chars": "A-Za-z0-9_"
}
Allows:
abc
ABC
123
student_01
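The value `"A-Za-z0-9_"` reads like the inside of a regular-expression character class. Assuming that interpretation (it is not confirmed by the README), the hypothetical helper below wraps the specification in brackets and requires the whole string to consist of those characters.

```python
import re


def uses_only_allowed(text, spec):
    """Hypothetical check: every character of `text` is in the class `[spec]`."""
    return re.fullmatch(f"[{spec}]+", text) is not None


ok = uses_only_allowed("student_01", "A-Za-z0-9_")
rejected = uses_only_allowed("student-01", "A-Za-z0-9_")
```

Under this reading the hyphens in the specification define ranges, so a literal hyphen in the data is rejected unless the specification lists it explicitly, for example as `A-Za-z0-9_\-`.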
Complete Integrity Example
rules = {
    "Latitude": {
        "required": True,
        "dtype": "numeric",
        "min": -90,
        "max": 90
    },
    "Longitude": {
        "required": True,
        "dtype": "numeric",
        "min": -180,
        "max": 180
    },
    "Hazardous_Event": {
        "allowed": [0, 1]
    },
    "Station_Name": {
        "required": True,
        "min_length": 3,
        "max_length": 50
    }
}
table.integrity(
    rules=rules
)
Data Health Reports
The data_health() method provides a high-level overview of dataset quality.
table.data_health()
What Is Included?
- Dataset health score
- Missing value analysis
- Missing rows report
- Duplicate row detection
- Rows requiring attention
- Severity classification
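Several of these items reduce to simple counting. The sketch below computes missing values per column, rows with missing values, duplicate rows, and the combined set of rows needing attention for plain Python rows. It is an illustration with made-up data, not dakitlab's implementation; the health score and severity classification are left out because the README does not describe how they are calculated.

```python
def health_overview(rows):
    """Count missing values and duplicates in a list of row dictionaries."""
    columns = list(rows[0]) if rows else []
    missing_per_column = {}
    rows_with_missing = []
    duplicate_rows = []
    seen = set()
    for index, row in enumerate(rows):
        missing = [column for column in columns if row.get(column) is None]
        for column in missing:
            missing_per_column[column] = missing_per_column.get(column, 0) + 1
        if missing:
            rows_with_missing.append(index)
        # A row is a duplicate when an identical row appeared earlier.
        key = tuple(row.get(column) for column in columns)
        if key in seen:
            duplicate_rows.append(index)
        seen.add(key)
    return {
        "missing_per_column": missing_per_column,
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": duplicate_rows,
        "rows_requiring_attention": sorted(
            set(rows_with_missing) | set(duplicate_rows)
        ),
    }


# Made-up rows: row 2 repeats row 0, rows 1 and 3 contain missing values.
overview = health_overview(
    [
        {"Latitude": 52.52, "PM10_ug_m3": 18.2},
        {"Latitude": 48.85, "PM10_ug_m3": None},
        {"Latitude": 52.52, "PM10_ug_m3": 18.2},
        {"Latitude": None, "PM10_ug_m3": None},
    ]
)
```

Only the second and later copies of a row are flagged as duplicates here, which mirrors the default behavior of `DataFrame.duplicated()` in pandas.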
Limit Problem Rows Displayed
table.data_health(
    max_problem_rows=25
)
Hide Problem Rows
table.data_health(
    show_problem_rows=False
)
Complete Workflow Example
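import pandas as pd
from dakitlab import Table
# Load the example dataset
df = pd.read_csv("environmental_data.csv")
# Create and display the styled table
table = Table(
    df,
    title="Environmental Monitoring Data"
)
table.show()
# Full statistics for selected columns
table.stats(
    columns=[
        "PM10_ug_m3",
        "PM2_5_ug_m3",
        "European_AQI"
    ],
    mode="full"
)
# Rule-based validation of the coordinate columns
rules = {
    "Latitude": {
        "required": True,
        "min": -90,
        "max": 90
    },
    "Longitude": {
        "required": True,
        "min": -180,
        "max": 180
    }
}
table.integrity(rules=rules)
# Dataset health report
table.data_health()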
Current Public Methods
| Method | Description |
|---|---|
| Table() | Create a table object |
| show() | Display styled table |
| display() | Advanced table display |
| interactive() | Interactive dataframe view |
| set_layout() | Layout customization |
| set_header_style() | Header customization |
| set_cell_style() | Cell customization |
| set_global_style() | Global styling |
| stats() | Statistical summary reports |
| integrity() | Rule-based validation |
| data_health() | Dataset health assessment |
Roadmap
Planned future classes:
- Summary
- CompareFrames
- Cleaner
- SchemaValidator
- QuickPlot
- CorrelationMap
- DistributionGrid
- Report
- Snapshot
- Profiler
License
MIT License
Author
Andrew Benyeogor Osenwe
Built for practical data analysis, exploratory data analysis, and notebook productivity.