A simple collection type for homogeneous, immutable and ordered sequences.

PureSet

PureSet is an immutable, ordered, and hashable collection type for Python. It requires every element to share the same type, which makes it a stricter alternative to both sets and sequences in production applications where homogeneous data must stay predictable.


Core Features

  • Immutability: Elements cannot be changed after creation, which protects data integrity.
  • Ordering: Retains insertion sequence, making it predictable for iteration, exporting, or display use cases.
  • Hashability: Collections of hashable objects are themselves hashable; can be used as dictionary keys.
  • Uniqueness: Removes duplicates according to standard Python object equality.
  • Type and Schema Homogeneity: All elements must share the same type and the same shape. For dicts and custom objects, shape means matching attribute or property names and types.
  • Performance: Optimized for high efficiency in membership, intersection, union, and set-like operations.
  • Signature Inspection: Provides a .signature property representing the canonical type/structure of the set’s contents, critical for debugging, API contracts, and documentation.
  • Universal Container: Works seamlessly with primitives, tuples, dicts, custom classes, and even mixed nested containers.
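The ordering, uniqueness, and hashability guarantees listed above can be illustrated with the standard library alone. The sketch below is not PureSet's implementation; `ordered_unique` is a hypothetical helper, and a plain tuple stands in for the collection so the example runs without the package installed.

```python
def ordered_unique(*items):
    """Return the items as a tuple with duplicates dropped; first occurrence wins."""
    # dict preserves insertion order and compares keys by hash and equality,
    # so this stand-in only works for hashable items.
    return tuple(dict.fromkeys(items))


states = ordered_unique("Pending", "Shipped", "Pending", "Delivered")
print(states)  # ('Pending', 'Shipped', 'Delivered')

# A tuple of hashable items is itself hashable, so it can key a dictionary.
labels = {states: "order lifecycle"}
print(labels[ordered_unique("Pending", "Shipped", "Delivered")])  # order lifecycle

# Tuples reject item assignment, which is the immutability guarantee in miniature.
try:
    states[0] = "Cancelled"
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```

What this stand-in lacks is the type and schema check, which is the part PureSet adds on top of these properties.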

Installation & Requirements

To install the PureSet package, simply use pip:

pip install pureset

  • Python Versions: Compatible with Python 3.9 and above.
  • Dependencies: Pure Python, with no external dependencies.

API Overview

This section presents expanded, realistic examples of PureSet in production-grade scenarios, demonstrating its capabilities beyond simple collections.

Real-World Usability

  • Contracts in APIs: Require or emit only valid structures to callers; enforce contract at runtime.
  • Data Pipelines (ETL): Guarantee all records are clean, normalized, and of valid shape before aggregation or transformation.
  • State Machines: Prevent illegal state transitions by checking membership in a PureSet of allowed values.
  • Unique Entity Sets: Model deduplicated entities (users, objects, configurations) with order preserved and structure enforced.
  • Distributed Computing: Share, serialize, or hash-combine validated and immutable data blocks across processes or systems.

PureSet’s .signature is especially useful for audits, logging, and debugging mismatches, and it can be serialized for external schema verification.
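To make the idea of a serializable schema concrete, here is a minimal sketch that turns a value's type and shape into JSON. It does not use PureSet or reproduce the format of its .signature property; `describe` is a hypothetical helper built only on the standard library.

```python
import json


def describe(value):
    """Return a JSON-friendly description of a value's type and shape."""
    if isinstance(value, dict):
        # Sort keys so two dicts with the same fields describe identically.
        return {"dict": {key: describe(value[key]) for key in sorted(value)}}
    if isinstance(value, (list, tuple)):
        return {type(value).__name__: [describe(item) for item in value]}
    return type(value).__name__


record = {"id": 1, "name": "Alice Smith", "tags": ("admin", "staff")}
schema_json = json.dumps(describe(record), sort_keys=True)
print(schema_json)
# {"dict": {"id": "int", "name": "str", "tags": {"tuple": ["str", "str"]}}}
```

A string like this can be logged next to a batch of records or compared against a schema stored by another service.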

from pureset import PureSet

1. Robust Enum Replacement and State Management

PureSet provides a type-safe, ordered, and immutable alternative for defining a finite set of states or options, offering clear advantages over traditional string literals or basic tuples. It's particularly useful for defining state machine transitions or valid configuration options.

# Define a set of valid order states for an e-commerce system
# The order guarantees a predictable sequence for UI display or reporting.
ORDER_STATES = PureSet("Pending", "Processing", "Shipped", "Delivered", "Cancelled")

def process_order_status_update(order_id: str, new_status: str) -> None:
    if new_status not in ORDER_STATES:
        raise ValueError(
            f"Invalid order status '{new_status}' for order {order_id}.\n"
            f"Allowed states are: [{ORDER_STATES.join(' | ')}]"
        )

    # In a real system, this would interact with a database or external service
    print(f"Order {order_id}: Status updated to '{new_status}'.")


# Simulate a valid status update
process_order_status_update("ORD12345", "Shipped")

# Simulate an invalid status update
try:
    process_order_status_update("ORD12346", "Returned")
except ValueError as e:
    print(e)
    # Invalid order status 'Returned' for order ORD12346. 
    # Allowed states are: [Pending | Processing | Shipped | Delivered | Cancelled]
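The example above checks only that a status is known. Guarding the transitions themselves might look like the sketch below. The transition table and the `next_status` helper are hypothetical and not part of PureSet; plain tuples stand in for the allowed-value collections so the code runs without the package.

```python
# Hypothetical transition table: each state maps to the states it may move to.
# Plain tuples stand in for PureSet here so the sketch has no dependencies.
ALLOWED_TRANSITIONS = {
    "Pending": ("Processing", "Cancelled"),
    "Processing": ("Shipped", "Cancelled"),
    "Shipped": ("Delivered",),
    "Delivered": (),
    "Cancelled": (),
}


def next_status(current: str, requested: str) -> str:
    """Return the requested status if the move is allowed, else raise ValueError."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"Unknown current status '{current}'.")
    if requested not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from '{current}' to '{requested}'.")
    return requested


print(next_status("Pending", "Processing"))  # Processing

try:
    next_status("Delivered", "Pending")
except ValueError as exc:
    print(exc)  # Cannot move from 'Delivered' to 'Pending'.
```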

2. Validating Homogeneity and Schema Consistency for Complex Data Structures

When dealing with collections of dictionaries or custom objects in data processing pipelines or API interactions, every element usually needs to conform to one schema. PureSet enforces both type homogeneity and structural consistency, and raises an error on a schema mismatch.

NOTE: PureSet uses the first element as the reference against which every later element is validated. You can inspect that reference schema at any time through the .signature property.

# Define a PureSet of user profiles, each represented by a dictionary.
# PureSet ensures all dictionaries have the same keys and value types.
user_profiles = PureSet(
    {"id": 1, "name": "Alice Smith", "age": 28, "email": "alice@example.com"},
    {"id": 2, "name": "Bob Johnson", "age": 35, "email": "bob@example.com"},
)

# Attempt to build a PureSet whose second profile has a mismatched schema
# (no 'email' key, and 'years_old' instead of 'age')
try:
    mismatched_profiles = PureSet(
        {"id": 3, "name": "Charlie Brown", "age": 42, "email": "charlie@example.com"},
        {"id": 4, "name": "Diana Prince", "years_old": 30},  # Schema mismatch
    )
except TypeError as e:
    print(e)
    # Incompatible element type or shape at position 2:
    # Exp: (<class 'dict'>, {'age': <class 'int'>, 'email': <class 'str'>, 'id': <class 'int'>, 'name': <class 'str'>});
    # Got: (<class 'dict'>, {'id': <class 'int'>, 'name': <class 'str'>, 'years_old': <class 'int'>})


# Example with tuples: every tuple must have the same length and the same element types, position by position.
data_points = PureSet((1, "x_coord", 10.5), (2, "y_coord", 20.3))

# Attempt to create a PureSet with inconsistent tuple element types
try:
    invalid_data_points = PureSet(
        (1, "x_coord", 10.5),
        (2, "y_coord", "invalid_value"),  # Type mismatch within tuple
    )
except TypeError as e:
    print(e)
    # Incompatible element type or shape at position 2:
    # Exp: (<class 'tuple'>, (<class 'int'>, <class 'str'>, <class 'float'>));
    # Got: (<class 'tuple'>, (<class 'int'>, <class 'str'>, <class 'str'>))
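The first-element rule described in this section can be sketched in a few lines of standard-library Python. This is an illustration of the idea under stated assumptions, not PureSet's actual code: `shape_of` and `check_homogeneous` are hypothetical names, and the error message only loosely mirrors the one shown above.

```python
def shape_of(value):
    """Return a hashable (type, structure) pair describing a value."""
    if isinstance(value, dict):
        # Sort by key so field order does not affect the comparison.
        fields = tuple(sorted((key, shape_of(item)) for key, item in value.items()))
        return (dict, fields)
    if isinstance(value, (list, tuple)):
        return (type(value), tuple(shape_of(item) for item in value))
    return (type(value), None)


def check_homogeneous(*items):
    """Raise TypeError if any item's shape differs from the first item's shape."""
    if not items:
        return items
    expected = shape_of(items[0])
    # Positions are reported 1-based, so the second item is position 2.
    for position, item in enumerate(items[1:], start=2):
        found = shape_of(item)
        if found != expected:
            raise TypeError(
                f"Incompatible shape at position {position}: "
                f"expected {expected}, got {found}"
            )
    return items


check_homogeneous((1, "x_coord", 10.5), (2, "y_coord", 20.3))  # passes silently

try:
    check_homogeneous({"id": 3, "age": 42}, {"id": 4, "years_old": 30})
except TypeError as exc:
    print(exc)
```

Because only the first element defines the expected shape, the order of the arguments determines which element is reported as the offender.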

3. Layer Validation in ML/DL Model Pipelines or Validation of Nested Containers

PureSet can also validate nested containers, such as a batch in which each element pairs a feature vector with a class label:

batch = PureSet(
    ([1.4, 2.8, 3.1], 'class_a'),
    ([0.9, 2.2, 3.5], 'class_b'),
)
print(batch.signature)
# Output: (tuple, ([float, float, float], str))

License

This project is released under the Apache License 2.0. Please review the LICENSE file for further details.


PureSet aims to give Python data code the safety and transparency needed in production, across API, analytics, ML, and systems development.
