LintData

Python: 3.13+ | Licence: MIT

A "linter" for pandas DataFrames to automate data quality audits.

Installation

You can install LintData via pip:

pip install lintdata

Or with uv:

uv add lintdata

Or install from source:

git clone https://github.com/patelheet30/lintdata.git
cd lintdata
pip install -e .

Features

  • 20+ Data Quality Checks - Missing values, duplicates, outliers, type consistency, and more
  • Zero Configuration - Works out of the box with sensible defaults
  • Highly Configurable - Customize thresholds and select specific checks
  • Multiple Export Formats - Text, HTML, JSON, and CSV reports
  • Custom Checks API - Extend with your own validation logic
  • Pandas Native - Integrates via the .lint DataFrame accessor

Quick Start

import pandas as pd
import lintdata

# Load your DataFrame
df = pd.read_csv("your_data.csv")

# Run quality checks
report = df.lint.report()
print(report)

Example Output:

--- LintData Quality Report ---
Shape: (1000, 8)

Running Checks:
Found 5 issue(s):
  1. [Missing Values] Column 'age': 45 missing values (4.5%)
  2. [Duplicates] Found 12 duplicate rows (1.2% of data)
  3. [Outliers] Column 'salary': 8 potential outliers detected (IQR method)
  4. [Mixed Types] Column 'phone' contains both numeric and string values
  5. [High Cardinality] Column 'user_id' has 987 unique values (98.7%)

--- End of Report ---
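
The first three findings in the sample report correspond to simple pandas computations. The sketch below reproduces that kind of count with plain pandas on a tiny hypothetical DataFrame. It is not LintData's implementation, and the 1.5 * IQR fence is the conventional default assumed here, not a threshold confirmed by the project.

```python
import pandas as pd

# Hypothetical sample data: one missing age, one duplicated row, one extreme salary.
df = pd.DataFrame({
    "age": [25, 31, None, 31, 40, 38],
    "salary": [50_000, 52_000, 51_000, 52_000, 49_000, 500_000],
})

# Missing values in a column, as a count and as a percentage of rows.
missing = int(df["age"].isna().sum())
missing_pct = 100 * missing / len(df)

# Fully duplicated rows (every occurrence after the first).
duplicates = int(df.duplicated().sum())

# IQR rule: flag values more than 1.5 * IQR below Q1 or above Q3.
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["salary"] < q1 - 1.5 * iqr) | (df["salary"] > q3 + 1.5 * iqr)]

print(f"age: {missing} missing ({missing_pct:.1f}%)")
print(f"{duplicates} duplicate row(s)")
print(f"salary: {len(outliers)} potential outlier(s)")
```

Running the same counts by hand on a small slice of your data is a quick way to sanity-check what a report line is telling you.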

Available Checks

LintData includes 20+ built-in checks across multiple categories:

  • Missing Data: Missing values, missing patterns
  • Duplicates: Duplicate rows, duplicate columns
  • Data Types: Mixed types, type consistency
  • Statistical: Outliers, skewness, correlation warnings
  • Categorical: Cardinality, rare categories, case consistency
  • Numerical: Negative values, zero inflation
  • Strings: Whitespace, special characters, length outliers
  • Dates: Format consistency, future dates, date range anomalies
  • Multi-table: Referential integrity (foreign key validation)
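
Referential integrity is the least self-explanatory item in the list above, so here is a minimal sketch of what a foreign-key validation amounts to in plain pandas. The table names, column names, and the find_orphans helper are hypothetical and chosen for illustration; this does not show LintData's own multi-table API.

```python
import pandas as pd

# Hypothetical tables: orders.customer_id should reference customers.id.
customers = pd.DataFrame({"id": [1, 2, 3]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 3, 4, None],
})


def find_orphans(child, fk, parent, pk):
    """Return child rows whose non-null foreign key has no match in the parent key."""
    has_key = child[fk].notna()              # missing keys are a separate issue
    matched = child[fk].isin(parent[pk])     # key exists in the parent table
    return child[has_key & ~matched]


orphans = find_orphans(orders, "customer_id", customers, "id")
print(orphans["order_id"].tolist())
```

Rows with a missing key are excluded here on the assumption that they are better reported by a missing-values check than as broken references.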

Export Formats

Save reports in multiple formats:

# HTML report with visualizations
df.lint.report(report_format='html', output='report.html')

# JSON for programmatic access
df.lint.report(report_format='json', output='report.json')

# CSV for spreadsheet analysis
df.lint.report(report_format='csv', output='issues.csv')

Custom Checks

Extend LintData with your own validation logic:

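def check_email_format(df):
    """Flag values in email-like columns that do not contain '@'."""
    warnings = []
    # Only inspect string (object) columns whose name mentions "email"
    for col in df.select_dtypes(include='object').columns:
        if 'email' in col.lower():
            # na=False counts missing values as invalid; regex=False matches '@' literally
            invalid = df[~df[col].str.contains('@', na=False, regex=False)]
            if len(invalid) > 0:
                warnings.append(f"[Email] Column '{col}': {len(invalid)} invalid emails")
    return warnings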

# Register and use
df.lint.register_check(check_email_format)
df.lint.report()
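
Judging by the example above, a custom check is a plain function that takes a DataFrame and returns a list of warning strings, so it can be exercised directly before it is registered. The sketch below does that with a second, hypothetical check; the function name and message format are illustrative assumptions, and only pandas is needed to run it.

```python
import pandas as pd


def check_negative_ages(df):
    """Hypothetical custom check: flag negative values in an 'age' column."""
    warnings = []
    if "age" in df.columns:
        negative = int((df["age"] < 0).sum())
        if negative > 0:
            warnings.append(f"[Age] Column 'age': {negative} negative value(s)")
    return warnings


# Call the function directly on small frames to confirm both outcomes.
bad = pd.DataFrame({"age": [34, -2, 51, -7]})
good = pd.DataFrame({"age": [34, 51]})

bad_messages = check_negative_ages(bad)
good_messages = check_negative_ages(good)
print(bad_messages)
print(good_messages)
```

Returning an empty list for clean data keeps the function easy to test and mirrors the shape of the email check.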

Documentation

Full documentation is available at: LintData Documentation

Issues and Support

For general help or to report bugs, please open an issue on GitHub: LintData Issues.

If you have questions or need assistance, feel free to reach out via Discord: patelheet30
