A "linter" for pandas DataFrames to automate data quality audits.

LintData

Python 3.13+ | Licence: MIT

Installation

You can install LintData via pip:

pip install lintdata

Or with uv:

uv add lintdata

Or install from source:

git clone https://github.com/patelheet30/lintdata.git
cd lintdata
pip install -e .

Features

  • 20+ Data Quality Checks: Missing values, duplicates, outliers, type consistency, and more
  • Zero Configuration: Works out of the box with sensible defaults
  • Highly Configurable: Customize thresholds and select specific checks
  • Multiple Export Formats: Text, HTML, JSON, and CSV reports
  • Custom Checks API: Extend with your own validation logic
  • Pandas Native: Integrates via the .lint accessor

Quick Start

import pandas as pd
import lintdata

# Load your DataFrame
df = pd.read_csv("your_data.csv")

# Run quality checks
report = df.lint.report()
print(report)

Example Output:

--- LintData Quality Report ---
Shape: (1000, 8)

Running Checks:
Found 5 issue(s):
  1. [Missing Values] Column 'age': 45 missing values (4.5%)
  2. [Duplicates] Found 12 duplicate rows (1.2% of data)
  3. [Outliers] Column 'salary': 8 potential outliers detected (IQR method)
  4. [Mixed Types] Column 'phone' contains both numeric and string values
  5. [High Cardinality] Column 'user_id' has 987 unique values (98.7%)

--- End of Report ---
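
To make the figures in a report like this easier to interpret, here is a minimal sketch of how a missing-value percentage and an IQR outlier count can be computed with plain pandas. It is not LintData's implementation: the helper names `missing_summary` and `iqr_outlier_count`, the multiplier `k = 1.5`, and the sample data are all assumptions made for illustration.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> dict[str, tuple[int, float]]:
    """Return {column: (missing_count, missing_percent)} for columns with gaps."""
    summary = {}
    for col in df.columns:
        n_missing = int(df[col].isna().sum())
        if n_missing:
            summary[col] = (n_missing, round(100 * n_missing / len(df), 1))
    return summary


def iqr_outlier_count(series: pd.Series, k: float = 1.5) -> int:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    values = series.dropna()
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    outside = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return int(outside.sum())


# Illustrative data only: one missing age, one extreme salary
df = pd.DataFrame({
    "age": [25, None, 31, 40],
    "salary": [50_000, 52_000, 51_000, 500_000],
})

print(missing_summary(df))
print(iqr_outlier_count(df["salary"]))
```

The thresholds LintData actually applies may differ; check the project documentation before relying on specific cut-offs.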

Available Checks

LintData includes 22+ built-in checks across multiple categories:

  • Missing Data: Missing values, missing patterns
  • Duplicates: Duplicate rows, duplicate columns
  • Data Types: Mixed types, type consistency
  • Statistical: Outliers, skewness, correlation warnings
  • Categorical: Cardinality, rare categories, case consistency
  • Numerical: Negative values, zero inflation
  • Strings: Whitespace, special characters, length outliers
  • Dates: Format consistency, future dates, date range anomalies
  • Multi-table: Referential integrity (foreign key validation)
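
As an illustration of what a referential integrity (foreign key) check looks for, the sketch below finds child-table keys that have no match in a parent table, using only pandas. The function `orphaned_keys` and the two sample tables are hypothetical and are not part of LintData's API.

```python
import pandas as pd


def orphaned_keys(child: pd.DataFrame, parent: pd.DataFrame,
                  child_key: str, parent_key: str) -> list:
    """Return sorted child key values that do not appear in the parent table."""
    keys = child[child_key].dropna()
    missing = keys[~keys.isin(parent[parent_key])]
    return sorted(missing.unique().tolist())


# Illustrative tables: order 11 points at a customer that does not exist
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
orders = pd.DataFrame({"order_id": [10, 11, 12], "customer_id": [1, 4, 2]})

orphans = orphaned_keys(orders, customers, "customer_id", "customer_id")
print(orphans)
```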

Export Formats

Save reports in multiple formats:

# HTML report with visualizations
df.lint.report(report_format='html', output='report.html')

# JSON for programmatic access
df.lint.report(report_format='json', output='report.json')

# CSV for spreadsheet analysis
df.lint.report(report_format='csv', output='issues.csv')

Custom Checks

Extend LintData with your own validation logic:

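def check_email_format(df):
    """Flag email-like columns whose values lack an '@' character."""
    warnings = []
    # Only text (object) columns are inspected
    for col in df.select_dtypes(include='object').columns:
        # str() guards against non-string column labels
        if 'email' in str(col).lower():
            # na=False counts missing values as invalid; regex=False matches '@' literally
            invalid = df[~df[col].str.contains('@', na=False, regex=False)]
            if len(invalid) > 0:
                warnings.append(f"[Email] Column '{col}': {len(invalid)} invalid emails")
    return warnings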

# Register and use
df.lint.register_check(check_email_format)
df.lint.report()
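
The email check above suggests a simple contract: a custom check takes a DataFrame and returns a list of warning strings. Assuming that contract holds, a check can be written and exercised on its own before it is registered. The sketch below uses a hypothetical check, `check_constant_columns`, and calls it directly on sample data so it runs without LintData installed.

```python
import pandas as pd


def check_constant_columns(df):
    """Flag columns that hold a single distinct non-null value."""
    warnings = []
    for col in df.columns:
        if df[col].nunique(dropna=True) == 1:
            warnings.append(f"[Constant] Column '{col}' has only one distinct value")
    return warnings


# Call the check directly to confirm it behaves as intended
sample = pd.DataFrame({"country": ["UK", "UK", "UK"], "score": [1, 2, 3]})
messages = check_constant_columns(sample)
print(messages)
```

Once it behaves as expected, it would be registered the same way as the email check, with df.lint.register_check(check_constant_columns).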

Documentation

Full documentation is available at: LintData Documentation

Issues and Support

For general help or to report bugs, please open an issue on GitHub: LintData Issues.

If you have questions or need assistance, feel free to reach out via Discord: patelheet30

Download files

Source Distribution

lintdata-1.0.1.tar.gz (31.1 kB view details)

Uploaded Source

Built Distribution

lintdata-1.0.1-py3-none-any.whl (20.3 kB view details)

Uploaded Python 3

File details

Details for the file lintdata-1.0.1.tar.gz.

File metadata

  • Download URL: lintdata-1.0.1.tar.gz
  • Upload date:
  • Size: 31.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.13 (Ubuntu 24.04, CI)

File hashes

Hashes for lintdata-1.0.1.tar.gz
Algorithm    Hash digest
SHA256       10b20ac81f73b36fa04c99c323655deba61a68198d967930ae27cf0f73c74ddc
MD5          1cb6244e5cde7ea6080b29c20141cba6
BLAKE2b-256  60ca57c85a6a717445f37baca51bfb35de3536467323c6fa8857e97e701d0c2b

File details

Details for the file lintdata-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: lintdata-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 20.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.13 (Ubuntu 24.04, CI)

File hashes

Hashes for lintdata-1.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       96bdd84440bd88925cb3246790e04352acc385eba53160560c048e1a6f673ecb
MD5          9430e0cbb6764734f711aa629898c5c3
BLAKE2b-256  8b6f2be6722591d5a41f190bd2677dadd47942a2baa64a4d375e77c7cf468aa1
