
AST-based linter for Python DTO discipline and facade-ban enforcement — framework-agnostic.


dto-strict

AST-based linter for Python DTO discipline and facade-ban enforcement — pluggable, framework-agnostic.

Why dto-strict?

Data Transfer Objects (DTOs) form a typed boundary between services and keep each business object defined in one place instead of scattered across the codebase. When function signatures leak Dict[str, Any], or services build dict literals inline instead of using DTOs, code becomes:

  • Loosely typed: Shape mismatches only surface at runtime.
  • Duplicated: The same business object gets redefined wherever it's used.
  • Hard to evolve: Changing a field means finding and updating every place the dict is built or read.

Facade functions (module-level helpers that wrap framework machinery) tend to proliferate and obscure intent when they are not marked. An exception tag such as # facade — celery schedule states that intent explicitly.

Why in healthcare? Healthcare systems (for example, HIPAA-regulated compliance platforms that handle PHI) benefit from strong DTO boundaries because they force explicit decisions about which data is structured, typed, and auditable. When handling patient records, medical documents, and compliance reports, untyped dicts create liability: a field can be added silently or change shape unpredictably, and no type checker will notice that a new field carrying PHI is not being handled.

dto-strict enforces DTO and facade discipline via static AST analysis, with 6 focused rules:

  1. R001 (HIGH): Detect Dict[str, Any] or bare dict/list/tuple in service-layer function signatures (strict mode optional).
  2. R002 (MEDIUM): Flag inline dict literals with 3+ string keys; exception tags can require justification.
  3. R003 (MEDIUM): Enforce the canonical dataclass form in DTO files: @dataclass(frozen=True, slots=True) without repr=False (v0.2 default; a legacy v0.1 mode that requires repr=False is available).
  4. R004 (HIGH): Require exception tags on module-level functions (e.g., # facade — celery schedule).
  5. R005 (LOW): Encourage validators to use DTO.from_dict() pattern.
  6. R006 (HIGH): Detect typing.Any in function signatures (parameters and return types).

All rules are configurable: rules can be disabled, severities overridden, and paths scoped.

Install

pip install dto-strict

Quick Start

Basic CLI Usage

# Lint a single file
dto-strict apps/compliance/services.py

# Lint a directory
dto-strict apps/

# Output as GitHub Actions annotations
dto-strict apps/ --format github

# Output as JSON
dto-strict apps/ --format json

Configuration (pyproject.toml)

[tool.dto-strict]
service_paths = [
    "apps/*/services/*.py",
    "**/services/*.py",
]
dto_paths = [
    "**/dtos.py",
    "**/dtos/*.py",
]
exception_tags = [
    "facade — celery schedule",
    "FRAMEWORK",
]
disabled_rules = ["R005"]  # Disable low-priority rules if desired
severity_overrides = { "R002" = "low" }  # Downgrade specific rules

Strict Mode (v0.2)

v0.2 adds a canonical R003 mode aligned with current DTO practice, strict collection detection, and several new options:

[tool.dto-strict]

# R001: Catch bare dict/list/tuple without type parameters
strict_collections = true  # Default: false. When true, bare collections trigger violations.

# R002: Require justification on exception tags + configurable dict key threshold
exception_tag_requires_justification = true  # Default: false.
# Tags must now use format: "tag: explanation" (e.g., "facade — celery schedule: transient event payload")

min_dict_keys = 3  # NEW in v0.2: Threshold for R002 dict literal flagging (default: 3)

# Limit reuse of exception tags in a single file
max_exception_tags_per_file = 3  # Default: null (no limit)

# R003: canonical mode (v0.2 default) flags repr=False as anti-canonical; r003_strict_repr (below) switches between strict and relaxed checking
r003_mode = "canonical"  # Default: "canonical" (v0.2). Use "legacy" for v0.1 behavior.
# In canonical mode: @dataclass(frozen=True, slots=True) is correct; repr=False is flagged.
# In legacy mode: @dataclass must include frozen=True, slots=True, AND repr=False (v0.1 requirement).

r003_strict_repr = true  # NEW in v0.2: In canonical mode, flag repr=False (default: true)
# Set to false for relaxed mode: only checks frozen+slots, ignores repr=False

# R004: NEW auto-detect class-method-wrapping pattern
# Module-level functions that delegate to class methods are now auto-detected
# (no exception tag needed; reduces false positives)

# R006: Scope typing.Any detection to specific paths
r006_paths = [
    "apps/*/services/*.py",
    "**/services/*.py",
]

Baseline Ratchet Mode (v0.2)

Accept current violations as "baseline" debt and track only new violations:

# Generate baseline from current state
dto-strict apps/ --generate-baseline > .dto-strict-baseline.json

# Subsequent runs accept baseline violations; new ones trigger failure
dto-strict apps/ --baseline .dto-strict-baseline.json

Baseline entries are keyed by file, line, and rule ID. When a baselined violation is fixed, the run still exits 0 and prints a notice that the baseline can be regenerated.

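Why canonical mode (R003)? Per the 2026-05-09 dto-strict pivot, the canonical DTO pattern is:

  • @dataclass(frozen=True, slots=True): immutability plus memory efficiency (slots=True requires Python 3.10+)
  • NO repr=False: keep the generated repr; only DTOs with sensitive fields need a custom __repr__ (see PHI / Sensitive Data Handling)
  • Store raw values in the DTO and mask sensitive fields at output time (an explicit __repr__ or external redaction tools), not with a blanket repr=False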

GitHub Actions

Create .github/workflows/dto-strict.yml:

name: dto-strict
on:
  pull_request:
    paths: ['apps/**.py']

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install dto-strict
      - run: dto-strict apps/ --format github

Pre-commit Hook

Add to .pre-commit-config.yaml:

- repo: local
  hooks:
    - id: dto-strict
      name: dto-strict
      entry: dto-strict
      language: python
      types: [python]
      additional_dependencies: ['dto-strict']
      stages: [pre-commit]

Rules

R001: Dict[str, Any] and Bare Collections in Service Signatures (HIGH)

Service-layer functions should not accept or return Dict[str, Any]. With strict_collections=true, bare dict, list, and tuple without type parameters are also flagged.

Fail (always):

from typing import Any, Dict

def process_user(config: Dict[str, Any]) -> Dict[str, Any]:
    return {"status": "ok"}

Fail (with strict_collections=true):

def fetch_users() -> list:  # Bare list
    return []

def merge_configs(base: dict, overrides: dict) -> dict:  # Bare dicts
    return {**base, **overrides}

Pass:

from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True, slots=True)
class UserConfigDTO:
    timeout: int
    retries: int

@dataclass(frozen=True, slots=True)
class UserDTO:
    user_id: int

def process_user(config: UserConfigDTO) -> Dict[str, str]:
    return {"status": "ok"}

def fetch_users() -> list[UserDTO]:  # Typed list
    return []

def merge_configs(base: dict[str, str], overrides: dict[str, str]) -> dict[str, str]:  # Typed dicts (no Any, so R006 passes too)
    return {**base, **overrides}

Rationale: Typed parameters enable IDE completion and catch shape mismatches early. Bare collections hide shape from static checkers and readers.


R002: Inline Dict Literals (MEDIUM)

Service files with inline dict literals containing at least min_dict_keys string keys (default 3) should define a DTO instead. Exception tags allow one-off inline dicts; with exception_tag_requires_justification=true, tags must include a colon-delimited explanation.

Fail (no tag):

def build_response(user_id: int) -> dict:
    return {
        "user_id": user_id,
        "status": "active",
        "timestamp": "2025-01-01",
    }

Fail (tag without justification, if required):

def build_response(user_id: int) -> dict:  # facade — celery schedule
    return {
        "user_id": user_id,
        "status": "active",
        "timestamp": "2025-01-01",
    }

Pass (with justified tag):

def build_response(user_id: int) -> dict:  # facade — celery schedule: SNS event envelope (transient)
    return {
        "user_id": user_id,
        "status": "active",
        "timestamp": "2025-01-01",
    }

Pass (define DTO instead):

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class ResponseDTO:
    user_id: int
    status: str
    timestamp: str

def build_response(user_id: int) -> ResponseDTO:
    return ResponseDTO(user_id=user_id, status="active", timestamp="2025-01-01")

Rationale: Shared shapes should live in DTOs. Inline dicts make duplication invisible. Exception tags are for rare transient payloads; they should explain why.


R003: Dataclass Canonical Form (MEDIUM)

Canonical mode (v0.2 default): Dataclasses must use frozen=True, slots=True WITHOUT repr=False.

Legacy mode (v0.1): Requires frozen=True, slots=True, repr=False.

Canonical Mode (v0.2)

Fail (canonical mode):

@dataclass(frozen=True, slots=True, repr=False)  # Anti-canonical: has repr=False
class UserDTO:
    user_id: int

Pass (canonical mode):

@dataclass(frozen=True, slots=True)
class UserDTO:
    user_id: int

@dataclass(frozen=True, slots=True)  # Both params present
class ConfigDTO:
    timeout: int

Rationale (canonical):

  • frozen=True: Immutability enforces single-source-of-truth.
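  • slots=True (Python 3.10+): Lower memory use and no per-instance __dict__, so misspelled attribute names cannot be added.
  • NO repr=False: The default repr is fine for most DTOs; if a field is sensitive, write an explicit __repr__ (see PHI / Sensitive Data Handling) or redact at the logging layer.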

Legacy Mode (v0.1)

Use r003_mode = "legacy" in pyproject.toml if your codebase still requires repr=False:

@dataclass(frozen=True, slots=True, repr=False)
class UserDTO:
    user_id: int

R004: Module-Level Functions (HIGH)

Bare module-level functions (facades, framework hooks) must carry an exception tag in a comment or docstring.

Fail:

def process_user(user_id: int):
    pass

def send_notification(message: str):
    pass

Pass:

def process_user(user_id: int):  # facade — celery schedule
    pass

def send_notification(message: str):  # FRAMEWORK
    """Send via SNS."""
    pass

class UserService:
    def process(self, user_id: int):
        # Methods defined inside a class don't need tags
        pass

Exception Tags: Configurable via pyproject.toml exception_tags list.

Rationale: Facades blur intent. Tags make intent explicit and signal "this is framework-specific, not business logic."


R005: Validator Pattern (LOW)

validate_*() functions should use DTO.from_dict() or raise ValidationError to enforce payload shape.

Fail:

def validate_user_payload(payload: dict) -> bool:
    return "user_id" in payload and "email" in payload

Pass:

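from dataclasses import dataclass

class ValidationError(ValueError):
    """Stand-in; use your framework's ValidationError."""

@dataclass(frozen=True, slots=True)
class UserDTO:
    user_id: int
    email: str

def validate_user_payload(payload: dict) -> UserDTO:
    try:
        return UserDTO(user_id=payload["user_id"], email=payload["email"])
    except (KeyError, TypeError) as e:
        # KeyError: missing field; TypeError: payload is not subscriptable
        raise ValidationError(f"Invalid shape: {e}") from e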

Rationale: Validators should enforce structure, not just presence.


R006: typing.Any in Signatures (HIGH)

Function signatures in service files should not use typing.Any. Build a proper DTO or use narrow type protocols instead.

Fail:

from typing import Any, Optional

def process(data: Any) -> Any:  # Bad: loses all type info
    pass

def fetch_config() -> Optional[Any]:  # Bad: Any defeats Optional
    return None

Pass:

from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass(frozen=True, slots=True)
class ConfigDTO:
    timeout: int

class Readable(Protocol):
    def read(self) -> bytes:
        ...

def process(data: dict[str, str]) -> dict[str, int]:  # Properly typed
    return {key: len(value) for key, value in data.items()}

def fetch_config() -> Optional[ConfigDTO]:  # Specific type
    return None

def read_file(f: Readable) -> bytes:  # Protocol for file-like objects
    return f.read()

Rationale: Any defeats static type checking and IDE completion. It hides shape assumptions and makes refactoring dangerous. Use protocols for file-like or callback types; use DTOs for business shapes.


PHI / Sensitive Data Handling (Pattern 1)

Why R003 removed blanket repr=False: The v0.2 canonical pivot intentionally moves away from blanket repr=False as a PHI masking mechanism. Instead, use explicit __repr__ overrides on DTOs containing sensitive fields.

Pattern 1: Explicit __repr__ on Sensitive DTOs

from dataclasses import dataclass

@dataclass(frozen=True, slots=True)
class Patient:
    """Patient DTO with sensitive fields."""
    patient_id: str
    name: str  # Sensitive
    ssn: str  # Sensitive
    date_of_birth: str  # Sensitive

    def __repr__(self) -> str:
        """Mask PHI fields in repr."""
        return f"Patient(patient_id={self.patient_id!r}, name=<redacted>, ssn=<redacted>, date_of_birth=<redacted>)"

When a Patient DTO is logged or printed, only non-sensitive fields appear:

>>> p = Patient(patient_id="P123", name="Alice", ssn="123-45-6789", date_of_birth="1990-01-01")
>>> print(p)
Patient(patient_id='P123', name=<redacted>, ssn=<redacted>, date_of_birth=<redacted>)

Why explicit over blanket?

  • Auditable: Developers explicitly decide which fields are sensitive and how to mask them.
  • Flexible: Different DTOs can have different masking strategies (redact, hash, truncate, etc.).
  • Future-proof: External tools (e.g., AWS Comprehend Medical) can be layered on top for dynamic PHI detection.
  • Healthcare / HIPAA: Explicit DTOs combined with selective __repr__ overrides are a common privacy-by-design pattern in regulated systems.

Suppressing Violations

Violations can be suppressed using # noqa comments. The linter recognizes:

  • # noqa — Suppress all rules on this line
  • # noqa: dto-strict — Suppress all dto-strict rules on this line
  • # noqa: dto-strict-R001 — Suppress rule R001 only
  • # noqa: dto-strict-R001, dto-strict-R002 — Suppress multiple rules

Examples:

# Suppress R001 (and R006, since Dict[str, Any] also contains Any) on a specific function
from typing import Any, Dict

def legacy_callback(config: Dict[str, Any]) -> None:  # noqa: dto-strict-R001, dto-strict-R006
    """Old API we can't change."""
    pass

# Suppress all rules on a line
def process() -> dict:  # noqa
    return {}

# Suppress just R002 (inline dict literal) violation
error_response = {  # noqa: dto-strict-R002
    "status": "error",
    "code": 500,
    "message": "Internal server error",
}
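The matching rules listed above can be sketched as a small parser. is_suppressed and NOQA_RE are hypothetical names; this sketch assumes case-insensitive codes and comma-separated lists, and dto-strict's real parser may differ.

```python
import re

# "# noqa", optionally followed by ": code, code, ..."
NOQA_RE = re.compile(r"#\s*noqa(?::\s*(?P<codes>[\w-]+(?:\s*,\s*[\w-]+)*))?", re.IGNORECASE)


def is_suppressed(line: str, rule_id: str) -> bool:
    """Decide whether `rule_id` (e.g. "R001") is suppressed by a noqa comment on `line`."""
    match = NOQA_RE.search(line)
    if match is None:
        return False
    codes = match.group("codes")
    if codes is None:
        return True  # bare "# noqa" suppresses every rule
    wanted = {code.strip().lower() for code in codes.split(",")}
    return "dto-strict" in wanted or f"dto-strict-{rule_id}".lower() in wanted
```

Under this sketch, codes from other tools (for example # noqa: E501) leave dto-strict rules active, which is usually what you want when several linters share one line.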

Output Formats

Text (default)

app.py:10: R001 Dict[str, Any] in signature: process_user
service.py:20: R002 Inline dict literal with 4 keys

GitHub Actions

::error file=app.py,line=10,col=5::R001 Dict[str, Any] in signature: process_user
::warning file=service.py,line=20,col=0::R002 Inline dict literal with 4 keys

JSON

[
  {
    "rule_id": "R001",
    "severity": "HIGH",
    "file": "app.py",
    "line": 10,
    "col": 5,
    "message": "Dict[str, Any] in signature: process_user"
  }
]
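The JSON format is convenient for custom gating or dashboards. A sketch that counts violations per severity, assuming the report is written to stdout (for example dto-strict apps/ --format json > report.json) and has the fields shown above; summarize is a hypothetical helper.

```python
import json
from collections import Counter

# Same shape as the JSON example above; real output may contain more entries.
SAMPLE = """
[
  {"rule_id": "R001", "severity": "HIGH", "file": "app.py", "line": 10, "col": 5,
   "message": "Dict[str, Any] in signature: process_user"},
  {"rule_id": "R002", "severity": "MEDIUM", "file": "service.py", "line": 20, "col": 0,
   "message": "Inline dict literal with 4 keys"}
]
"""


def summarize(report: str) -> Counter[str]:
    """Count violations per severity in a --format json report."""
    return Counter(v["severity"] for v in json.loads(report))


counts = summarize(SAMPLE)
print(counts["HIGH"], counts["MEDIUM"], counts["LOW"])  # 1 1 0
```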

Exit Codes

Code  Meaning
0     No violations
1     At least one HIGH severity violation
2     MEDIUM is the highest severity present
3     LOW is the only severity present
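Because MEDIUM and LOW runs also exit non-zero, a CI step that runs dto-strict directly fails on any violation. If you only want to block on HIGH, a wrapper can map the exit code back to a severity. A sketch, assuming the exit code reflects the highest severity present as in the table; should_fail and the two lookup tables are hypothetical.

```python
# Exit code -> highest severity present, following the table above.
SEVERITY_BY_EXIT_CODE = {0: None, 1: "HIGH", 2: "MEDIUM", 3: "LOW"}
RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def should_fail(exit_code: int, fail_on: str = "HIGH") -> bool:
    """True if the run's worst severity is at or above `fail_on`.

    Unknown exit codes (e.g. a crash or a missing executable) count as failures.
    """
    if exit_code not in SEVERITY_BY_EXIT_CODE:
        return True
    worst = SEVERITY_BY_EXIT_CODE[exit_code]
    return worst is not None and RANK[worst] >= RANK[fail_on]
```

In CI you could pass the returncode of subprocess.run(["dto-strict", "apps/"]) to should_fail and exit accordingly; unknown codes are treated as failures so that a crash never passes silently.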

Configuration Reference

[tool.dto-strict]

# Paths to check for service-layer violations (R001, R002, R004, R006)
# Default: ["apps/*/services/*.py", "**/services/*.py"]
service_paths = [
    "apps/*/services/*.py",
    "**/services/*.py",
]

# Paths to check for DTO definitions (R003)
# Default: ["**/dtos.py", "**/dtos/*.py"]
dto_paths = [
    "**/dtos.py",
    "**/dtos/*.py",
]

# Paths for R006 (typing.Any detection)
# Default: ["apps/*/services/*.py", "**/services/*.py"]
r006_paths = [
    "apps/*/services/*.py",
    "**/services/*.py",
]

# Allowed exception tags for R004 (module-level facades) and R002 (one-off inline dicts)
# Default: ["facade — celery schedule", "FRAMEWORK"]
exception_tags = [
    "facade — celery schedule",
    "FRAMEWORK",
    "CUSTOM_TAG",
]

# (v0.2) Bare dict/list/tuple without type parameters flagged as violations
# Default: false
strict_collections = true

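# (v0.2) Exception tags must include colon-delimited justification
# Default: false
exception_tag_requires_justification = true

# (v0.2) Minimum number of string keys for R002 to flag a dict literal
# Default: 3
min_dict_keys = 3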

# (v0.2) Maximum exception tags per file (null = unlimited)
# Default: null
max_exception_tags_per_file = 3

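# (v0.2) R003 mode: "canonical" (v0.2 default) or "legacy" (v0.1)
# In canonical: repr=False is anti-canonical and flagged
# In legacy: frozen=True, slots=True, repr=False all required
# Default: "canonical"
r003_mode = "canonical"

# (v0.2) In canonical mode, flag repr=False (true) or only check frozen+slots (false)
# Default: true
r003_strict_repr = true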

# Disable specific rules entirely
# Default: []
disabled_rules = ["R005"]

# Override severity for specific rules
# Valid values: "HIGH", "MEDIUM", "LOW"
# Default: {}
severity_overrides = { "R002" = "low" }

Design Philosophy

Pluggable, not opinionated. Every rule is:

  • Configurable: Path patterns, exception tags, severity levels.
  • Can be disabled: Set disabled_rules = ["R001"] to skip a rule entirely.
  • Framework-agnostic: No Django/FastAPI/Flask assumptions; adapters for each framework are opt-in extras.

Defaults bundled, not imposed. Out-of-the-box rules target Django + DRF + Celery patterns, but you can customize for your stack.

Development

git clone https://github.com/jekhator/dto-strict.git
cd dto-strict
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linter on itself
dto-strict src/ --format github

License

Apache License 2.0. See LICENSE.

Contributing

Issues and PRs welcome. Please include fixtures (good + bad examples) for new rules.

See Also



Download files


Source Distribution

dto_strict-0.2.2.tar.gz (33.1 kB view details)

Uploaded Source

Built Distribution


dto_strict-0.2.2-py3-none-any.whl (22.0 kB view details)

Uploaded Python 3

File details

Details for the file dto_strict-0.2.2.tar.gz.

File metadata

  • Download URL: dto_strict-0.2.2.tar.gz
  • Upload date:
  • Size: 33.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dto_strict-0.2.2.tar.gz
Algorithm Hash digest
SHA256 a9995bdbf21920348cce2e72122c18e7b56ecaa257d1902b16513448b00e0a32
MD5 cabfdb78161c663750ea88273f4f373e
BLAKE2b-256 e76a4d0f7534e3c6da0eb8bde80dc412d1a6acbf1c30a86dcd32edf6a54ee9b4


Provenance

The following attestation bundles were made for dto_strict-0.2.2.tar.gz:

Publisher: publish.yml on jekhator/dto-strict

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dto_strict-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: dto_strict-0.2.2-py3-none-any.whl
  • Upload date:
  • Size: 22.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dto_strict-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 fd346900cc6a6477e22be4cd04c984235f3133132dd43e646fc875573fa579bc
MD5 881cd87d4fe2146ab1fa827530575c18
BLAKE2b-256 fbec5818a5f1882721113fe178279e1edd5cdf01e779cb089fe90530e45ea46e


Provenance

The following attestation bundles were made for dto_strict-0.2.2-py3-none-any.whl:

Publisher: publish.yml on jekhator/dto-strict

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
