Python package for validating, processing and parsing directories.

katachi

Katachi is a Python package for validating, processing, and parsing directory structures against defined schemas.

Note: Katachi is currently under active development and should be considered a work in progress. APIs may change in future releases.

Features

  • 📐 Schema-based validation - Define expected directory structures using YAML
  • 🧩 Extensible architecture - Create custom validators and actions
  • 🔄 Relationship validation - Validate relationships between files (such as paired files)
  • 🚀 Command-line interface - Easy-to-use CLI with rich formatting
  • 📋 Detailed reports - Get comprehensive validation reports

Installation

Install from PyPI:

pip install katachi

For development:

git clone https://github.com/nmicovic/katachi.git
cd katachi
make install

Quick Start

Define a schema (schema.yaml)

semantical_name: data
type: directory
pattern_name: data
children:
  - semantical_name: image
    pattern_name: "img\\d+"
    type: file
    extension: .jpg
    description: "Image files with numeric identifiers"
  - semantical_name: metadata
    pattern_name: "img\\d+"
    type: file
    extension: .json
    description: "Metadata for image files"
  - semantical_name: file_pairs_check
    type: predicate
    predicate_type: pair_comparison
    description: "Check if images have matching metadata files"
    elements:
      - image
      - metadata
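The schema identifies files by a regular expression (`pattern_name`) plus an `extension`. Because the YAML strings are double-quoted, `"img\\d+"` reaches Python as the regex `img\d+`. The sketch below shows one plausible reading of how such a file node selects files. The helper `matches_file_node` is hypothetical, and the assumption that the pattern must fully match the file stem is ours, not documented katachi behavior.

```python
import re
from pathlib import Path


def matches_file_node(path: Path, pattern_name: str, extension: str) -> bool:
    """Hypothetical matcher: the stem must fully match the regex and the suffix must equal the extension."""
    return path.suffix == extension and re.fullmatch(pattern_name, path.stem) is not None


# The two file nodes from schema.yaml share a name pattern and differ only by extension.
image_node = {"pattern_name": r"img\d+", "extension": ".jpg"}
metadata_node = {"pattern_name": r"img\d+", "extension": ".json"}

for name in ["img1.jpg", "img1.json", "img2.png", "photo.jpg"]:
    candidate = Path(name)
    is_image = matches_file_node(candidate, **image_node)
    is_metadata = matches_file_node(candidate, **metadata_node)
    print(f"{name}: image={is_image}, metadata={is_metadata}")
```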

Validate a directory structure

katachi validate schema.yaml target_directory

Command-Line Examples

Validate a simple directory structure:

katachi validate "tests/schema_tests/test_sanity/schema.yaml" "tests/schema_tests/test_sanity/dataset"

Validate a nested directory structure:

katachi validate "tests/schema_tests/test_depth_1/schema.yaml" "tests/schema_tests/test_depth_1/dataset"

Validate paired files (e.g., ensure each .jpg has a matching .json file):

katachi validate "tests/schema_tests/test_paired_files/schema.yaml" "tests/schema_tests/test_paired_files/data"
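The `pair_comparison` predicate in the Quick Start schema links the `image` and `metadata` nodes. A rough, self-contained approximation of that check, assuming files are paired by identical stems (`img1.jpg` with `img1.json`), is sketched below. `find_unpaired` is a hypothetical helper, not part of katachi, and the real predicate may compare files differently.

```python
import tempfile
from pathlib import Path


def find_unpaired(directory: Path, first_ext: str = ".jpg", second_ext: str = ".json") -> tuple[set[str], set[str]]:
    """Return stems that only have a first_ext file, and stems that only have a second_ext file."""
    files = [p for p in directory.iterdir() if p.is_file()]
    first = {p.stem for p in files if p.suffix == first_ext}
    second = {p.stem for p in files if p.suffix == second_ext}
    return first - second, second - first


# Build a throwaway directory with one complete pair and one image lacking metadata.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ["img1.jpg", "img1.json", "img2.jpg"]:
        (root / name).touch()
    missing_metadata, orphan_metadata = find_unpaired(root)
    print(missing_metadata, orphan_metadata)  # {'img2'} set()
```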

Python API

from pathlib import Path
from katachi.schema.importer import load_yaml
from katachi.schema.validate import validate_schema

# Load schema from YAML
schema = load_yaml(Path("schema.yaml"), Path("data_directory"))

# Validate directory against schema
report = validate_schema(schema, Path("data_directory"))

# Check if validation passed
if report.is_valid():
    print("Validation successful!")
else:
    print("Validation failed with the following issues:")
    for result in report.results:
        if not result.is_valid:
            print(f"- {result.path}: {result.message}")
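To experiment with report handling without running a validation, the loop above can be exercised against minimal stand-ins. The classes below are hypothetical: they only mirror the attribute names used in this README (`is_valid`, `message`, `path`, `validator_name`, `results`), the all-results-must-pass rule in `is_valid()` is an assumption, and the `failures()` helper is invented for illustration. katachi's real classes may carry more fields and behave differently.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ValidationResult:  # hypothetical stand-in, not katachi's class
    is_valid: bool
    message: str
    path: Path
    validator_name: str


@dataclass
class ValidationReport:  # hypothetical stand-in, not katachi's class
    results: list[ValidationResult] = field(default_factory=list)

    def is_valid(self) -> bool:
        # Assumed rule: the report passes only when every individual result passes.
        return all(r.is_valid for r in self.results)

    def failures(self) -> list[ValidationResult]:
        # Hypothetical convenience helper that filters out passing results.
        return [r for r in self.results if not r.is_valid]


report = ValidationReport([
    ValidationResult(True, "Pattern matched", Path("data/img1.jpg"), "pattern"),
    ValidationResult(False, "Missing metadata file", Path("data/img2.jpg"), "pair_comparison"),
])
print(report.is_valid())  # False
for result in report.failures():
    print(f"- {result.path}: {result.message}")
```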

Extending Katachi

Custom validators

from pathlib import Path
from katachi.schema.schema_node import SchemaNode
from katachi.validation.core import ValidationResult, ValidatorRegistry

def my_custom_validator(node: SchemaNode, path: Path) -> ValidationResult:
    # Custom validation logic
    return ValidationResult(
        is_valid=True,
        message="Custom validation passed",
        path=path,
        validator_name="custom_validator"
    )

# Register the validator
ValidatorRegistry.register("custom_validator", my_custom_validator)
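The body of a custom validator is ordinary Python. As one hypothetical example, a check that each metadata file contains a JSON object could look like the sketch below. `check_json_object` is an illustrative name, and its `(is_valid, message)` tuple would still need to be wrapped in a `ValidationResult` as in the validator above.

```python
import json
import tempfile
from pathlib import Path


def check_json_object(path: Path) -> tuple[bool, str]:
    """Hypothetical check: the file must parse as JSON and its top level must be an object."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
        return False, f"Could not read JSON: {exc}"
    if not isinstance(data, dict):
        return False, f"Expected a JSON object, got {type(data).__name__}"
    return True, "Metadata is a JSON object"


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "img1.json"
    good.write_text('{"width": 640}', encoding="utf-8")
    bad = Path(tmp) / "img2.json"
    bad.write_text("[1, 2]", encoding="utf-8")
    print(check_json_object(good))  # (True, 'Metadata is a JSON object')
    print(check_json_object(bad))   # (False, 'Expected a JSON object, got list')
```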

Custom file processing

from pathlib import Path
from typing import Any
from katachi.schema.actions import register_action, NodeContext
from katachi.schema.schema_node import SchemaNode

def process_image(node: SchemaNode, path: Path, parent_contexts: list[NodeContext], context: dict[str, Any]) -> None:
    # Custom image processing logic
    print(f"Processing image: {path}")
    # Each parent context unpacks to (node, path); look for an ancestor
    # whose schema node is named "timestamp", if the schema defines one
    for parent_node, parent_path in parent_contexts:
        if parent_node.semantical_name == "timestamp":
            print(f"Image from date: {parent_path.name}")
            break

# Register the action
register_action("image", process_image)
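The `context` argument in the action signature above looks like a place to keep state between calls. Assuming the same dictionary is passed to every action during a run (this README does not confirm it), an action could group image paths by their `timestamp` ancestor. The sketch calls a hypothetical `collect_image` directly, uses `types.SimpleNamespace` objects as stand-ins for katachi's `SchemaNode`, and treats each parent context as a `(node, path)` pair, as the loop above does.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Any


def collect_image(node: Any, path: Path, parent_contexts: list[tuple[Any, Path]], context: dict[str, Any]) -> None:
    """Hypothetical action: group image paths by the name of their "timestamp" ancestor."""
    date = "unknown"
    for parent_node, parent_path in parent_contexts:
        if parent_node.semantical_name == "timestamp":
            date = parent_path.name
            break
    # Create the nested containers on first use, then record this image.
    context.setdefault("images_by_date", {}).setdefault(date, []).append(path)


# Simulate two calls that share one context dictionary.
image_node = SimpleNamespace(semantical_name="image")
timestamp_node = SimpleNamespace(semantical_name="timestamp")
shared: dict[str, Any] = {}
parents = [(timestamp_node, Path("data/2024-01-01"))]
collect_image(image_node, Path("data/2024-01-01/img1.jpg"), parents, shared)
collect_image(image_node, Path("data/2024-01-01/img2.jpg"), parents, shared)
print({d: [p.name for p in ps] for d, ps in shared["images_by_date"].items()})
# {'2024-01-01': ['img1.jpg', 'img2.jpg']}
```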

Contributing

Contributions are welcome! See CONTRIBUTING.md for details.

License

This project is licensed under the terms of the MIT License.

Download files

Source Distribution

katachi-0.0.5a0.tar.gz (3.4 MB)

Uploaded Source

Built Distribution

katachi-0.0.5a0-py3-none-any.whl (18.1 kB)

Uploaded Python 3

File details

Details for the file katachi-0.0.5a0.tar.gz.

File metadata

  • Download URL: katachi-0.0.5a0.tar.gz
  • Upload date:
  • Size: 3.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for katachi-0.0.5a0.tar.gz
Algorithm Hash digest
SHA256 260bc71072129840e2281811a6b8f4ee30b11492ed23459634d0cb714be83adc
MD5 aabff4a6f88a8c3c2e80b3d14745f884
BLAKE2b-256 4d59920327530032a9513101eea48f9fb32b8fbb7d2126a43b96da9413fe7942
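A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib`. The sketch below hashes the file in chunks so large archives need not fit in memory; the helper name `sha256_of` and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


expected = "260bc71072129840e2281811a6b8f4ee30b11492ed23459634d0cb714be83adc"
archive = Path("katachi-0.0.5a0.tar.gz")  # assumed download location
if archive.exists():
    print("OK" if sha256_of(archive) == expected else "MISMATCH")
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).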

File details

Details for the file katachi-0.0.5a0-py3-none-any.whl.

File metadata

  • Download URL: katachi-0.0.5a0-py3-none-any.whl
  • Upload date:
  • Size: 18.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.14

File hashes

Hashes for katachi-0.0.5a0-py3-none-any.whl
Algorithm Hash digest
SHA256 6450e3be158a47c13aa121880658cb2d8bf3b18cc5934b9b092be9ce0741e0e7
MD5 2f57d8c309abd646b43bda9b9eb36368
BLAKE2b-256 4ca15f7bfa7c6331fc0e3cbb5e52cd49276eac446d0d4fdb401b86e97df0edbd
