A linting tool for xarray datasets

xinter

A comprehensive linting and data quality checking tool for xarray datasets.

Overview

xinter provides automated data quality checks for xarray datasets, helping you identify issues like missing values, outliers, incorrect units, and other data anomalies. It features an extensible architecture that allows you to easily add custom checkers for your specific data validation needs.

Features

  • 25 Built-in Checkers: Comprehensive checks for data quality including:

    • Missing values (NaNs)
    • Statistical properties (mean, std, skewness, kurtosis)
    • Outlier detection (IQR method)
    • Data type validation
    • Units verification and parsing
    • Coordinate uniformity checks
    • Shape and size validation
    • And many more...
  • Extensible Architecture: Easily add custom checkers using a simple decorator pattern

  • Rich CLI Output: Beautiful terminal output with tables showing results

  • DataFrame Export: Convert results to pandas DataFrames for further analysis

  • Coordinate Checking: Optionally check coordinate arrays in addition to data variables

  • Group Support: Handle datasets with hierarchical groups (e.g., Zarr, NetCDF4)

Installation

pip install xinter

Or install from source:

git clone https://github.com/samueljackson92/xinter.git
cd xinter
pip install -e .

Quick Start

Command Line Interface

Lint a single file:

xl mydata.zarr

Lint multiple files:

xl file1.nc file2.zarr file3.nc

Check coordinates in addition to data variables:

xl mydata.zarr --coords

Specify a group within the dataset:

xl mydata.zarr --group=/equilibrium

Python API

from xinter.cli import lint_dataset, reports_to_dataframe

# Lint a dataset
reports = lint_dataset("mydata.zarr", check_coords=True)

# Convert to DataFrame for analysis
df = reports_to_dataframe(reports)

# Filter for failed checks
failures = df[~df["success"]]
print(failures)

# Export to CSV
df.to_csv("lint_report.csv", index=False)

Built-in Checkers

Checker                   Description
NaNs                      Proportion of NaN values
Mean                      Mean value
Standard deviation        Standard deviation
IQR outliers              Proportion of values outside IQR range
Range                     Range of values (max - min)
Max                       Maximum value
Min                       Minimum value
Duplicate values          Proportion of duplicate values
Negative values           Proportion of negative values
Zero values               Proportion of zero values
Constant values           Whether all values are constant
Infinite values           Proportion of infinite values
Skewness                  Skewness of the distribution
Kurtosis                  Kurtosis of the distribution
Entropy                   Shannon entropy of the distribution
Data type                 Data type of the variable
Units                     Units attribute
Units parsable            Whether units can be parsed by pint
Diff                      Mean of first differences
Diff constant             Whether differences are constant (coordinates only)
Shape                     Shape of the variable
Size                      Total number of elements
Variable name             Name of the variable
Dimension names           Names of the dimensions
Constant along dimension  Whether values are constant along the first dimension
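
To make a few of these metrics concrete, here is a minimal standard-library sketch of what "IQR outliers", "Entropy", and "Diff constant" could compute. It is not xinter's implementation: the function names, the 1.5 x IQR fence, the inclusive quantile method, the bin count, and the tolerance are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from statistics import quantiles


def iqr_outlier_proportion(values, k=1.5):
    """Fraction of values outside [Q1 - k*IQR, Q3 + k*IQR] (k is an assumed default)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = sum(1 for v in values if v < low or v > high)
    return outliers / len(values)


def shannon_entropy(values, bins=10):
    """Shannon entropy in bits of an equal-width histogram (bin count is assumed)."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return 0.0  # a constant array carries no information
    width = (hi - lo) / bins
    # Clamp the maximum value into the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def diffs_are_constant(values, rel_tol=1e-9):
    """True if consecutive differences are all equal within a relative tolerance."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(math.isclose(d, diffs[0], rel_tol=rel_tol) for d in diffs)
```

For example, `diffs_are_constant([0.0, 0.5, 1.0, 1.5])` is true for an evenly spaced coordinate and false once one step differs, which is the kind of irregularity a coordinate uniformity check is meant to surface.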

Creating Custom Checkers

You can easily extend xinter with custom checkers:

from xinter.linters import DataArrayChecker, LinterRegistry, CheckerResult
import xarray as xr

@LinterRegistry.register()
class MyCustomChecker(DataArrayChecker):
    """Check if values are within expected range."""
    
    name = "Value range check"
    description = "Checks if values fall within [0, 100]"
    
    def check(self, var: xr.DataArray) -> CheckerResult:
        min_val = var.min().item()
        max_val = var.max().item()
        in_range = 0 <= min_val and max_val <= 100
        
        return CheckerResult(
            value=in_range,
            message=f"Range: [{min_val}, {max_val}]",
            success=in_range,
        )

Your custom checker is registered when the module that defines it is imported, and from then on it is included in all linting operations.
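
The decorator pattern used above can be illustrated with a small stand-alone sketch. `SketchRegistry`, `run_all`, and `NonNegative` are hypothetical names invented for this example; they show the general technique (a class decorator that records each checker in a shared mapping) and are not xinter's `LinterRegistry` internals.

```python
from typing import Callable, Dict, Type


class SketchRegistry:
    """Hypothetical stand-in for a checker registry (not xinter's LinterRegistry)."""

    _checkers: Dict[str, Type] = {}

    @classmethod
    def register(cls) -> Callable[[Type], Type]:
        """Return a class decorator that stores the checker under its `name`."""

        def decorator(checker_cls: Type) -> Type:
            cls._checkers[checker_cls.name] = checker_cls
            return checker_cls  # hand the class back unchanged

        return decorator

    @classmethod
    def run_all(cls, values):
        """Instantiate every registered checker and collect its result by name."""
        return {name: checker().check(values) for name, checker in cls._checkers.items()}


@SketchRegistry.register()
class NonNegative:
    """Example checker: passes when no value is below zero."""

    name = "Non-negative check"

    def check(self, values):
        return all(v >= 0 for v in values)
```

Because registration happens as a side effect of the class definition, a checker defined in a module that is never imported will not appear in the registry.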

Output Format

The reports_to_dataframe() function produces a DataFrame with the following columns:

  • file_path: Path to the dataset file
  • target_type: Either "data_vars" or "coords"
  • variable_name: Name of the variable
  • checker_name: Name of the checker
  • value: The check result value
  • message: Descriptive message about the result
  • success: Boolean indicating if the check passed
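
As a rough illustration of working with this layout outside pandas, the sketch below reads a CSV with these columns (such as one written by `df.to_csv(..., index=False)`) and counts failed checks per checker. The sample rows, messages, and the helper name `failures_by_checker` are made up for the example; which values xinter treats as failures is not specified here.

```python
import csv
import io
from collections import Counter

# Made-up rows in the documented column layout; not real xinter output.
SAMPLE_CSV = """file_path,target_type,variable_name,checker_name,value,message,success
mydata.zarr,data_vars,temperature,NaNs,0.0,No NaNs,True
mydata.zarr,data_vars,density,NaNs,0.25,25% NaNs,False
mydata.zarr,coords,time,Diff constant,False,Uneven spacing,False
mydata.zarr,data_vars,density,Units parsable,True,Units OK,True
"""


def failures_by_checker(csv_text):
    """Count rows whose `success` column is not the string 'True', per checker."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # pandas writes booleans to CSV as the text "True" / "False".
    return Counter(row["checker_name"] for row in reader if row["success"] != "True")


summary = failures_by_checker(SAMPLE_CSV)
```

With a DataFrame in memory, the same summary is a filter on `success` followed by a count of `checker_name` values.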

Web Dashboard (GUI)

xinter includes a modern, interactive web-based dashboard for visualizing linting results. The dashboard provides:

  • 📊 Interactive Charts: Explore data quality metrics with beautiful Plotly visualizations
  • 🔍 Real-time Filtering: Filter results by file and group
  • 📈 Comprehensive Analytics: NaN distribution, data types, statistical distributions, entropy analysis, and more

Installation

Install with GUI support (run from a source checkout):

pip install -e ".[gui]"

Usage

Launch the dashboard for any linting report:

xl-gui linting_report.parquet

Or with custom options:

xl-gui thomson_scattering.parquet --port 8080 --title "Thomson Scattering Analysis"

The dashboard will open in your browser at http://localhost:8050 (or your specified port).

See GUI_README.md for detailed documentation and examples.

Development

# Clone the repository
git clone https://github.com/samueljackson92/xinter.git
cd xinter

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
ruff format .

# Lint code
ruff check .

License

MIT License - see LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

xinter builds on the excellent work of the xarray, pandas, and pint communities.
