A simple collection type for homogeneous, immutable and ordered sequences.
PureSet
For questions about this package, or about Python development in general, contact me at
gabrielmaia.silva00@gmail.com
PureSet is an immutable, ordered, and hashable collection type for Python.
It ensures type homogeneity across elements, making it a robust replacement for both sets and sequences in production applications.
PureSet offers accuracy, predictability, and clarity in managing homogeneous data structures.
v1.1 NOTE: Now with NumPy and pandas data type support (see below). See the changelog (TBA) for details.
Core Features
- Immutability: Elements cannot be changed after creation, which protects data integrity and reproducibility.
- Ordering: Retains insertion sequence—predictable for iteration, exporting, or display use cases.
- Hashability: Collections of hashable (and even nested) objects are themselves hashable; can be dictionary keys.
- Uniqueness: Removes duplicates according to standard Python object equality.
- Deep Type & Schema Homogeneity: Strict enforcement that all elements are of the same type and "shape" (for nested dicts, arrays, pandas or custom classes: attributes/properties and value types are all enforced).
- Performance: Optimized for high efficiency in membership, intersection, union, and set-like operations—even with very large sets.
- Signature Inspection: The .signature property represents the canonical type/structure of the set's contents, for debugging, documentation, and dynamic runtime/schema checks.
- Universal Container: Works seamlessly with primitives, tuples, dicts, custom classes, numpy arrays, pandas DataFrames/Series, UserString/UserList/etc., and even many mixed nested containers.
- Extensible: Transparent support for new types via the "freeze/restore" protocol.
- Serialization Ready: Supports pickling, as well as custom freeze/restore for efficient export/import (including cross-version/cross-platform).
- Advanced API: Full set operations (|, &, -, ^), mapping/filtering, slices, composition, custom schema validation patterns, and more.
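How immutability, ordering, uniqueness, hashability, and type homogeneity fit together can be illustrated with a few lines of standard-library Python. The class below is a minimal sketch under stated assumptions: `MiniPureSet` is a hypothetical name, it checks only the top-level element type (not nested shape), and it is not PureSet's actual implementation.

```python
from typing import Hashable, Iterator


class MiniPureSet:
    """Hypothetical, simplified stand-in for illustration; not PureSet's code."""

    __slots__ = ("_items", "_index")

    def __init__(self, *items: Hashable) -> None:
        # Homogeneity: every element must have exactly the type of the first.
        for pos, item in enumerate(items, start=1):
            if type(item) is not type(items[0]):
                raise TypeError(
                    f"element {pos} is {type(item).__name__}, "
                    f"expected {type(items[0]).__name__}"
                )
        # dict.fromkeys drops duplicates (by ==/hash) and keeps first-seen order.
        unique = tuple(dict.fromkeys(items))
        # Bypass our own __setattr__ once, during construction only.
        object.__setattr__(self, "_items", unique)
        object.__setattr__(self, "_index", frozenset(unique))

    def __setattr__(self, name: str, value: object) -> None:
        raise AttributeError("MiniPureSet is immutable")

    def __delattr__(self, name: str) -> None:
        raise AttributeError("MiniPureSet is immutable")

    def __contains__(self, item: object) -> bool:
        return item in self._index  # hash lookup instead of a linear scan

    def __iter__(self) -> Iterator[Hashable]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)

    def __getitem__(self, index):
        return self._items[index]

    def __eq__(self, other: object) -> bool:
        return isinstance(other, MiniPureSet) and self._items == other._items

    def __hash__(self) -> int:
        return hash(self._items)
```

Storing the elements twice (a tuple for order, a frozenset for lookup) trades memory for constant-time membership tests. Whether PureSet makes the same trade-off internally is not stated here.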
Installation & Requirements
To install the latest PureSet package, use pip:
pip install -U pureset
- Python Versions: Compatible with Python 3.9 and above.
- Dependencies: Pure Python, no required dependencies. NumPy and pandas are optional for enhanced functionality.
Usage & API Overview
This section presents realistic, production-focused examples that go well beyond simple unique containers.
Basic Example Usage
>>> from pureset import PureSet
>>> PureSet(1, 2, 3)
PureSet(1, 2, 3)
>>> PureSet(1, 2, 2, 3)
PureSet(1, 2, 3)
>>> PureSet("a", "b", "b")
PureSet('a', 'b')
>>> len(PureSet(8, 8, 9))
2
Robust Enum Replacement | State Management
Type-safe, ordered, and immutable replacement for sets of valid states/options.
>>> ORDER_STATES = PureSet("Pending", "Processing", "Shipped", "Delivered", "Cancelled")
>>> "Processing" in ORDER_STATES
True
>>> "Returned" in ORDER_STATES
False
>>> print(ORDER_STATES)
PureSet('Pending', 'Processing', 'Shipped', 'Delivered', 'Cancelled')
Contracts & API Schema Checking
PureSet as a runtime type-and-shape schema enforcer.
>>> user_profiles = PureSet(
... {"id": 1, "name": "Alice Smith", "age": 28, "email": "alice@example.com"},
... {"id": 2, "name": "Bob Johnson", "age": 35, "email": "bob@example.com"},
... )
>>> user_profiles.signature
(<class 'dict'>, {'age': <class 'int'>, 'email': <class 'str'>, 'id': <class 'int'>, 'name': <class 'str'>})
>>> # Mismatched schema!
>>> PureSet(
... {"id": 1, "name": "Alice", "age": 28, "email": "alice@a.com"},
... {"id": 2, "name": "Bob", "years_old": 35} # will fail!
... )
Traceback (most recent call last):
...
TypeError: Incompatible element type or shape at position 2:
Exp: (<class 'dict'>, {'age': <class 'int'>, 'email': <class 'str'>, 'id': <class 'int'>, 'name': <class 'str'>});
Got: (<class 'dict'>, {'id': <class 'int'>, 'name': <class 'str'>, 'years_old': <class 'int'>})
Validated Nested Data for ML|DL Pipelines
Reliable, transparent structure-checking for data with deep/complex layout.
>>> batch = PureSet(
... ([1.4, 2.8, 3.1], 'class_a'),
... ([0.9, 2.2, 3.5], 'class_b'),
... )
>>> batch.signature
(<class 'tuple'>, ((<class 'list'>, (<class 'float'>, 3)), <class 'str'>))
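The signatures shown in the two examples above can be reproduced with a small recursive function. This is a sketch only: `shape_of` and `check_homogeneous` are hypothetical helpers, the handling of empty or mixed lists is an assumption, and PureSet's real signature logic covers many more types.

```python
from typing import Any, Sequence


def shape_of(value: Any) -> Any:
    """Return a nested type description of `value` (hypothetical helper)."""
    if isinstance(value, dict):
        # Keys are sorted for a stable display; assumes mutually comparable keys.
        return (dict, {key: shape_of(value[key]) for key in sorted(value)})
    if isinstance(value, tuple):
        # Tuples are positional: describe each slot separately.
        return (tuple, tuple(shape_of(item) for item in value))
    if isinstance(value, list):
        shapes = [shape_of(item) for item in value]
        if shapes and all(shape == shapes[0] for shape in shapes):
            # Uniform list: one element shape plus the length.
            return (list, (shapes[0], len(shapes)))
        # Assumption: empty or mixed lists fall back to per-item shapes.
        return (list, tuple(shapes))
    return type(value)


def check_homogeneous(items: Sequence[Any]) -> None:
    """Raise TypeError if any item's shape differs from the first item's."""
    if not items:
        return
    expected = shape_of(items[0])
    for position, item in enumerate(items[1:], start=2):
        got = shape_of(item)
        if got != expected:
            raise TypeError(
                f"Incompatible element type or shape at position {position}:\n"
                f"Exp: {expected};\nGot: {got}"
            )
```

Calling `shape_of(([1.4, 2.8, 3.1], 'class_a'))` yields the same nested tuple as `batch.signature` above, and `check_homogeneous` rejects the `years_old` record from the schema example.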
Deduplication and Set Algebra
Entries are always unique, preserving original order.
>>> a = PureSet(1, 2, 3)
>>> b = PureSet(3, 4, 2)
>>> (a | b).to_list()
[1, 2, 3, 4]
>>> (a & b).to_list()
[2, 3]
>>> (a - b).to_list()
[1]
>>> (a ^ b).to_list()
[1, 4]
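For readers wondering how set algebra can preserve insertion order, the results above can be reproduced on plain lists. The four functions below are hypothetical helpers written for illustration, assuming hashable elements; they are not how PureSet implements its operators.

```python
from typing import Hashable, Iterable, List


def ordered_union(a: Iterable[Hashable], b: Iterable[Hashable]) -> List[Hashable]:
    # dict.fromkeys keeps the first occurrence of each element, in order.
    return list(dict.fromkeys([*a, *b]))


def ordered_intersection(a: Iterable[Hashable], b: Iterable[Hashable]) -> List[Hashable]:
    in_b = set(b)  # hash-based lookup; order comes from `a`
    return [x for x in dict.fromkeys(a) if x in in_b]


def ordered_difference(a: Iterable[Hashable], b: Iterable[Hashable]) -> List[Hashable]:
    in_b = set(b)
    return [x for x in dict.fromkeys(a) if x not in in_b]


def ordered_symmetric_difference(a: Iterable[Hashable], b: Iterable[Hashable]) -> List[Hashable]:
    a, b = list(a), list(b)
    # Elements only in `a` first, then elements only in `b`.
    return ordered_difference(a, b) + ordered_difference(b, a)


a, b = [1, 2, 3], [3, 4, 2]
print(ordered_union(a, b))                 # [1, 2, 3, 4]
print(ordered_intersection(a, b))          # [2, 3]
print(ordered_difference(a, b))            # [1]
print(ordered_symmetric_difference(a, b))  # [1, 4]
```

In each case the left operand determines the order of the result, which matches the outputs shown for `a` and `b` above.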
Using PureSet with Numpy and Pandas
>>> import numpy as np, pandas as pd
>>> arr = np.array([1, 2, 3])
>>> ps = PureSet(arr)
>>> ps[0].shape
(3,)
>>> df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
>>> PureSet(df)[0].equals(df)
True
>>> idx = pd.Index([5, 7, 9])
>>> PureSet(idx)[0].equals(idx)
True
Mixing non-matching types (ndarray with list/tuple, DataFrame with list) will raise a TypeError.
Complex Custom Objects, NamedTuples, UserList, UserDict
>>> from collections import namedtuple, UserList, UserDict
>>> Pt = namedtuple("Pt", "x y")
>>> PureSet(Pt(2,3), Pt(3,4))[1]
Pt(x=3, y=4)
>>> ul = UserList([1,2,3])
>>> PureSet(ul)[0]
[1, 2, 3]
>>> ud = UserDict({'foo': 99})
>>> PureSet(ud)[0]
{'foo': 99}
Freeze/Restore: Reliable, Deep Immutability and Serialization
>>> x = [{'a': [1, 2]}, {'a': [3, 4]}]
>>> frozen = PureSet.freeze(x)
>>> PureSet.restore(frozen)
[{'a': [1, 2]}, {'a': [3, 4]}]
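The idea behind a freeze/restore round trip can be sketched with tagged tuples: each mutable container is replaced by a hashable tuple that records its original kind. The `freeze` and `restore` functions below are hypothetical stand-ins that handle only dicts, lists, and tuples; the frozen format PureSet actually uses is not documented here and may differ.

```python
from typing import Any


def freeze(obj: Any) -> Any:
    """Convert nested dicts/lists/tuples into hashable tagged tuples (sketch)."""
    if isinstance(obj, dict):
        # Keys are already hashable; only the values need freezing.
        return ("dict", tuple((key, freeze(val)) for key, val in obj.items()))
    if isinstance(obj, list):
        return ("list", tuple(freeze(item) for item in obj))
    if isinstance(obj, tuple):
        return ("tuple", tuple(freeze(item) for item in obj))
    return obj  # assumed to be an immutable, hashable leaf


def restore(frozen: Any) -> Any:
    """Invert freeze(): rebuild the original containers from tagged tuples."""
    if not isinstance(frozen, tuple):
        return frozen  # leaf value
    # freeze() tags every tuple it emits, so any tuple here is (tag, payload).
    tag, payload = frozen
    if tag == "dict":
        return {key: restore(val) for key, val in payload}
    if tag == "list":
        return [restore(item) for item in payload]
    if tag == "tuple":
        return tuple(restore(item) for item in payload)
    raise ValueError(f"unknown tag: {tag!r}")


x = [{"a": [1, 2]}, {"a": [3, 4]}]
frozen = freeze(x)
hash(frozen)  # works: the frozen form contains only tuples and leaves
assert restore(frozen) == x
```

Tagging every container, including tuples, keeps `restore` unambiguous: a tuple in the frozen form is always a `(tag, payload)` pair and never user data.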
Advanced Features and Extensibility
- Rich Set Algebra: |, &, -, ^ operators (union, intersection, difference, symmetric difference).
- Slicing and Indexing: Supports all Pythonic sequence semantics, including negative and slice indexing.
- Compatibility Checking: The .compatible(other) method ensures two sets are structurally equivalent before combining/operating.
- Signature Inspection: .signature provides a Python-type-based schema, invaluable for API contracts, docs, and dynamic validation.
- Freeze/Restore API: PureSet can be losslessly frozen to a hashable representation and restored, even across Python versions.
- Protocol for New Types: Pluggable mechanism for custom freeze/restore for advanced user classes, numpy, pandas, and beyond.
- Mixes with UserString, Counter, ChainMap, deque, array.array, memoryview, and more (see full list in docs/tests).
Performance and Scalability
- Highly optimized for large scale: construction, lookup, and set algebra achieve competitive performance even for sets of tens of millions of elements.
- The performance gap relative to the built-in set is logarithmically bounded (see docs for latest benchmarks).
- PureSet’s internal caching and O(1) hash-based fast paths guarantee speed for all practical workloads.
Testing
v1.1.250706.0: 56 tests; 0 Failures; 0 Errors
The current testing suite is only available through the GitHub repository.
The full test suite includes:
- Edge cases for numpy, pandas, UserDict/UserList/UserString, Counter, deque, ChainMap
- Deeply nested and empty structures, custom and standard containers
- Type and schema enforcement for real-world mixed and homogeneous datasets
- Serialization and "restoration" safety
License
This project is released under the Apache License 2.0. Please review the LICENSE file for further details.
PureSet is engineered to give your Python code safety, consistency, integrity and high power for production-scale scenarios across APIs, analytics, ML, and beyond!