Python package for validating, processing and parsing directories.

katachi

Katachi is a Python package for validating, processing, and parsing directory structures against defined schemas.

Note: Katachi is currently under active development and should be considered a work in progress. APIs may change in future releases.

Features

  • 📐 Schema-based validation - Define expected directory structures using YAML
  • 🧩 Extensible architecture - Create custom validators and actions
  • 🔄 Relationship validation - Validate relationships between files (such as paired files)
  • 🚀 Command-line interface - Easy-to-use CLI with rich formatting
  • 📋 Detailed reports - Get comprehensive validation reports

Installation

Install from PyPI:

pip install katachi

For development:

git clone https://github.com/nmicovic/katachi.git
cd katachi
make install

Quick Start

Define a schema (schema.yaml)

semantical_name: data
type: directory
pattern_name: data
children:
  - semantical_name: image
    pattern_name: "img\\d+"
    type: file
    extension: .jpg
    description: "Image files with numeric identifiers"
  - semantical_name: metadata
    pattern_name: "img\\d+"
    type: file
    extension: .json
    description: "Metadata for image files"
  - semantical_name: file_pairs_check
    type: predicate
    predicate_type: pair_comparison
    description: "Check if images have matching metadata files"
    elements:
      - image
      - metadata
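
To make the intent of the pair_comparison predicate concrete, here is a minimal standard-library sketch of one way such a check could work. It does not use Katachi and is not its implementation; the function names and the rule that a file's stem must fully match the pattern are assumptions made for illustration.

```python
import re
from pathlib import Path


def stems_matching(directory: Path, pattern: str, extension: str) -> set[str]:
    """Collect stems of files whose stem fully matches `pattern` and whose suffix is `extension`."""
    regex = re.compile(pattern)
    return {
        p.stem
        for p in directory.iterdir()
        if p.is_file() and p.suffix == extension and regex.fullmatch(p.stem)
    }


def unpaired_stems(directory: Path, pattern: str, ext_a: str, ext_b: str) -> set[str]:
    """Return stems that exist with only one of the two extensions."""
    with_a = stems_matching(directory, pattern, ext_a)
    with_b = stems_matching(directory, pattern, ext_b)
    # Symmetric difference: present in exactly one of the two sets.
    return with_a ^ with_b
```

For a directory holding img1.jpg, img1.json, and img2.jpg, calling unpaired_stems(directory, r"img\d+", ".jpg", ".json") would report img2 as missing its partner.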

Validate a directory structure

katachi validate schema.yaml target_directory

Command-Line Examples

Validate a simple directory structure:

katachi validate "tests/schema_tests/test_sanity/schema.yaml" "tests/schema_tests/test_sanity/dataset"

Validate a nested directory structure:

katachi validate "tests/schema_tests/test_depth_1/schema.yaml" "tests/schema_tests/test_depth_1/dataset"

Validate paired files (e.g., ensure each .jpg has a matching .json file):

katachi validate "tests/schema_tests/test_paired_files/schema.yaml" "tests/schema_tests/test_paired_files/data"
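
If you want to run the same command from a script or a CI job, a thin wrapper around the CLI might look like the sketch below. The helper names are hypothetical, and the sketch assumes only that the katachi executable is on PATH; it makes no assumption about which exit code the CLI returns on a failed validation, so inspect the captured output yourself.

```python
import subprocess
from pathlib import Path


def build_validate_command(schema: Path, target: Path) -> list[str]:
    """Build the argument list for `katachi validate <schema> <target>`."""
    return ["katachi", "validate", str(schema), str(target)]


def run_validation(schema: Path, target: Path) -> subprocess.CompletedProcess:
    """Run the CLI and capture its output (assumes katachi is installed and on PATH)."""
    return subprocess.run(
        build_validate_command(schema, target),
        capture_output=True,
        text=True,
        check=False,  # exit-code semantics are not documented here, so do not raise
    )
```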

Python API

from pathlib import Path
from katachi.schema.importer import load_yaml
from katachi.schema.validate import validate_schema

# Load schema from YAML
schema = load_yaml(Path("schema.yaml"), Path("data_directory"))

# Validate directory against schema
report = validate_schema(schema, Path("data_directory"))

# Check if validation passed
if report.is_valid():
    print("Validation successful!")
else:
    print("Validation failed with the following issues:")
    for result in report.results:
        if not result.is_valid:
            print(f"- {result.path}: {result.message}")
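
For larger reports it can help to group failures by path. The sketch below relies only on the is_valid, path, and message attributes used above; failures_by_path is a hypothetical helper, and the demo objects are stand-ins rather than real Katachi results.

```python
from collections import defaultdict
from types import SimpleNamespace


def failures_by_path(results):
    """Group the messages of failed results by path.

    Expects objects exposing is_valid, path and message attributes.
    """
    grouped = defaultdict(list)
    for result in results:
        if not result.is_valid:
            grouped[str(result.path)].append(result.message)
    return dict(grouped)


# Stand-in objects for illustration; real results would come from report.results.
demo = [
    SimpleNamespace(is_valid=True, path="data/img1.jpg", message="ok"),
    SimpleNamespace(is_valid=False, path="data/img2.jpg", message="missing metadata"),
]
for path, messages in failures_by_path(demo).items():
    print(f"{path}: {'; '.join(messages)}")
```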

Extending Katachi

Custom validators

from pathlib import Path
from katachi.schema.schema_node import SchemaNode
from katachi.validation.core import ValidationResult, ValidatorRegistry

def my_custom_validator(node: SchemaNode, path: Path) -> ValidationResult:
    # Custom validation logic
    return ValidationResult(
        is_valid=True,
        message="Custom validation passed",
        path=path,
        validator_name="custom_validator"
    )

# Register the validator
ValidatorRegistry.register("custom_validator", my_custom_validator)
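
The validator above always passes. As a sketch of what real logic might look like, the standalone function below checks that a metadata file contains parseable JSON. It is deliberately independent of Katachi: check_json_parses is a hypothetical helper, and turning its (is_valid, message) result into a ValidationResult is left to the validator that calls it.

```python
import json
from pathlib import Path


def check_json_parses(path: Path) -> tuple[bool, str]:
    """Return (is_valid, message) depending on whether `path` holds valid JSON."""
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        # json.JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
        return False, f"Invalid JSON metadata: {exc}"
    return True, "Metadata file contains valid JSON"
```

Inside a validator, the two returned values could be passed as the is_valid and message arguments of ValidationResult.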

Custom file processing

from pathlib import Path
from typing import Any
from katachi.schema.actions import register_action, NodeContext

def process_image(node, path: Path, parent_contexts: list[NodeContext], context: dict[str, Any]) -> None:
    # Custom image processing logic
    print(f"Processing image: {path}")
    # Access parent context if needed
    for parent_node, parent_path in parent_contexts:
        if parent_node.semantical_name == "timestamp":
            print(f"Image from date: {parent_path.name}")
            break

# Register the action
register_action("image", process_image)
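
The parent lookup in process_image can be factored into a small helper. This sketch assumes only what the example above shows: parent_contexts is a sequence of (node, path) pairs and each node has a semantical_name attribute. find_parent_path is a hypothetical name, and the demo uses stand-in objects instead of real schema nodes.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Optional


def find_parent_path(parent_contexts, semantical_name: str) -> Optional[Path]:
    """Return the path of the first parent whose node has the given semantical name."""
    for parent_node, parent_path in parent_contexts:
        if parent_node.semantical_name == semantical_name:
            return parent_path
    return None


# Stand-in parent contexts for illustration only.
contexts = [
    (SimpleNamespace(semantical_name="data"), Path("data")),
    (SimpleNamespace(semantical_name="timestamp"), Path("data/2024-01-01")),
]
timestamp_dir = find_parent_path(contexts, "timestamp")
if timestamp_dir is not None:
    print(f"Image from date: {timestamp_dir.name}")
```

The helper returns the first match in list order, so whether that is the nearest or the outermost parent depends on how parent_contexts is ordered.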

Contributing

Contributions are welcome! See CONTRIBUTING.md for details.

License

This project is licensed under the terms of the MIT License.
