A "linter" for pandas DataFrames to automate data quality audits.

LintData

Python 3.13+ | Licence: MIT

Installation

You can install LintData via pip:

pip install lintdata

Or with uv:

uv add lintdata

Or install from source:

git clone https://github.com/patelheet30/lintdata.git
cd lintdata
pip install -e .

Features

  • 20+ Data Quality Checks: Missing values, duplicates, outliers, type consistency, and more
  • Zero Configuration: Works out of the box with sensible defaults
  • Highly Configurable: Customize thresholds and select specific checks
  • Multiple Export Formats: Text, HTML, JSON, and CSV reports
  • Custom Checks API: Extend with your own validation logic
  • Pandas Native: Integrates seamlessly via the .lint accessor

Quick Start

import pandas as pd
import lintdata

# Load your DataFrame
df = pd.read_csv("your_data.csv")

# Run quality checks
report = df.lint.report()
print(report)

Example Output:

--- LintData Quality Report ---
Shape: (1000, 8)

Running Checks:
Found 5 issue(s):
  1. [Missing Values] Column 'age': 45 missing values (4.5%)
  2. [Duplicates] Found 12 duplicate rows (1.2% of data)
  3. [Outliers] Column 'salary': 8 potential outliers detected (IQR method)
  4. [Mixed Types] Column 'phone' contains both numeric and string values
  5. [High Cardinality] Column 'user_id' has 987 unique values (98.7%)

--- End of Report ---
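
To make the numbers in a report like this less opaque, here is a minimal sketch of how three of the reported figures (missing values, duplicate rows, IQR outliers) could be reproduced by hand with plain pandas. This is not LintData's implementation; the sample data, the column names, and the 1.5 × IQR fence are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names are made up for illustration.
df = pd.DataFrame({
    "age": [25, None, 31, 31, 40, 31],
    "salary": [50_000, 52_000, 51_000, 51_000, 500_000, 51_000],
})

# Missing values: count and percentage of nulls in one column.
missing = int(df["age"].isna().sum())
missing_pct = 100 * missing / len(df)

# Duplicates: rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

# Outliers (IQR method, assumed 1.5 * IQR fences around the quartiles).
q1, q3 = df["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["salary"] < q1 - 1.5 * iqr) | (df["salary"] > q3 + 1.5 * iqr)]

print(f"age: {missing} missing ({missing_pct:.1f}%)")
print(f"{duplicate_rows} duplicate rows")
print(f"salary: {len(outliers)} potential outliers")
```

Running the linter and a hand check like this side by side can help when deciding whether a default threshold suits a particular dataset.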

Available Checks

LintData includes 20+ built-in checks across multiple categories:

  • Missing Data: Missing values, missing patterns
  • Duplicates: Duplicate rows, duplicate columns
  • Data Types: Mixed types, type consistency
  • Statistical: Outliers, skewness, correlation warnings
  • Categorical: Cardinality, rare categories, case consistency
  • Numerical: Negative values, zero inflation
  • Strings: Whitespace, special characters, length outliers
  • Dates: Format consistency, future dates, date range anomalies
  • Multi-table: Referential integrity (foreign key validation)
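
Of the categories above, referential integrity is probably the least self-explanatory. The sketch below shows the underlying idea in plain pandas: find key values in a child table that have no match in the parent table. The function name `find_orphan_keys` and the sample tables are hypothetical, and this is not how LintData itself exposes or implements the check.

```python
import pandas as pd

def find_orphan_keys(child: pd.DataFrame, parent: pd.DataFrame,
                     child_key: str, parent_key: str) -> pd.Series:
    """Return distinct non-null child key values that are absent from the parent table."""
    keys = child[child_key].dropna()
    orphans = keys[~keys.isin(parent[parent_key])]
    return orphans.drop_duplicates().reset_index(drop=True)

# Hypothetical parent and child tables.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 4, 2, 4],
})

orphans = find_orphan_keys(orders, customers, "customer_id", "customer_id")
print(orphans.tolist())
```

Null keys are dropped before the comparison here, on the assumption that missing foreign keys are better reported by a missing-values check than as integrity violations.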

Export Formats

Save reports in multiple formats:

# HTML report with visualizations
df.lint.report(report_format='html', output='report.html')

# JSON for programmatic access
df.lint.report(report_format='json', output='report.json')

# CSV for spreadsheet analysis
df.lint.report(report_format='csv', output='issues.csv')
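
The schema of the JSON and CSV exports is not shown here, so the following sketch avoids guessing at it. Instead it parses the plain-text report layout from the Example Output section into a list of dictionaries, which may be handy when only the text report is available. The `parse_issues` helper and the regular expression are hypothetical and assume the `N. [Category] message` line format shown earlier.

```python
import re

# Matches lines such as "  1. [Missing Values] Column 'age': 45 missing values (4.5%)".
ISSUE_LINE = re.compile(r"^\s*(\d+)\.\s+\[([^\]]+)\]\s+(.*)$")

def parse_issues(report_text: str) -> list[dict]:
    """Extract numbered issue lines from a text report; other lines are ignored."""
    issues = []
    for line in report_text.splitlines():
        match = ISSUE_LINE.match(line)
        if match:
            issues.append({
                "index": int(match.group(1)),
                "category": match.group(2),
                "message": match.group(3),
            })
    return issues

# A shortened report in the layout shown under Example Output.
sample = """--- LintData Quality Report ---
Shape: (1000, 8)

Running Checks:
Found 2 issue(s):
  1. [Missing Values] Column 'age': 45 missing values (4.5%)
  2. [Duplicates] Found 12 duplicate rows (1.2% of data)

--- End of Report ---"""

issues = parse_issues(sample)
for issue in issues:
    print(issue["category"], "->", issue["message"])
```

Where the JSON export is available, reading it directly is likely the more robust route, since text layouts tend to change between versions.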

Custom Checks

Extend LintData with your own validation logic:

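def check_email_format(df):
    """Flag values in email-like columns that do not contain '@'."""
    warnings = []
    for col in df.select_dtypes(include='object').columns:
        if 'email' in str(col).lower():  # column labels are not always strings
            # Literal match on '@'; missing values count as invalid (na=False)
            has_at = df[col].str.contains('@', na=False, regex=False)
            invalid_count = int((~has_at).sum())
            if invalid_count > 0:
                warnings.append(f"[Email] Column '{col}': {invalid_count} invalid emails")
    return warnings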

# Register and use
df.lint.register_check(check_email_format)
df.lint.report()
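
Because a custom check is an ordinary function that takes a DataFrame and returns a list of warning strings, it can be tried out on a small sample before it is registered. The sketch below follows that same contract with a second, hypothetical check (`check_negative_prices`) and calls it directly, without going through the .lint accessor; the check name, the sample data, and the message format are assumptions for illustration.

```python
import pandas as pd

def check_negative_prices(df: pd.DataFrame) -> list[str]:
    """Hypothetical custom check: flag negative values in price-like numeric columns."""
    warnings = []
    for col in df.select_dtypes(include="number").columns:
        if "price" in str(col).lower():
            negative = int((df[col] < 0).sum())
            if negative > 0:
                warnings.append(f"[Price] Column '{col}': {negative} negative values")
    return warnings

# Call the check directly on made-up data to confirm it behaves as intended.
sample = pd.DataFrame({"unit_price": [9.99, -1.0, 4.50], "qty": [1, 2, -3]})
result = check_negative_prices(sample)
print(result)
```

Only `unit_price` is inspected here because `qty` does not match the name filter; returning an empty list signals that the check found nothing to report.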

Documentation

Full documentation is available at: LintData Documentation

Issues and Support

For general help or to report bugs, please open an issue on GitHub: LintData Issues.

If you have questions or need assistance, feel free to reach out via Discord: patelheet30

Download files

Source Distribution: lintdata-1.1.0.tar.gz (35.1 kB)

  • Uploaded via: uv 0.9.15 (Trusted Publishing: No)
  • SHA256: 29645d6af06c09851ad1c4423b1532bec6e3a4ae89b7525bb4fe8bd6e10255ef
  • MD5: a8a014cfd0a6ceca9fbbfe0ffba8d62f
  • BLAKE2b-256: b5018b9bf23d5bc6b783c6b32ed2a899edc2ee6d4e9b3fdad6038418d04b4d65

Built Distribution: lintdata-1.1.0-py3-none-any.whl (22.8 kB, Python 3)

  • Uploaded via: uv 0.9.15 (Trusted Publishing: No)
  • SHA256: 9b5f9eaa959b0a9ad175c8fe23655c10d1dae287aee28f6509e8d48f89c42391
  • MD5: 0c1e8550428ffafcf4b4db63ed330cb3
  • BLAKE2b-256: c8f1f598bf572c46029e82608c518e0b12213ea428287e4b9f212dc9444e30d2