A dictionary that de-duplicates values.
DeDuplicationDict
A dictionary-like class that deduplicates values: each distinct value is stored once in a separate dictionary, and every key maps to that value's hash. This is particularly useful for large dictionaries with many repeated entries, because a recurring value is held in memory only once and each further occurrence costs only a short hash.
This class supports nested structures by automatically converting nested dictionaries into DeDuplicationDict instances. It also provides conversion methods for moving between regular dictionaries and DeDuplicationDict instances.
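The idea of keeping one mapping from keys to hashes and another from hashes to values can be sketched in a few lines. This is a simplified illustration, not the library's implementation: the class name `TinyDedupDict`, the helper `_value_hash`, and the hashing scheme (SHA-256 over the JSON encoding, truncated to 8 hex characters) are all assumptions made for the example, and nested dictionaries are not handled.

```python
import hashlib
import json
from collections.abc import MutableMapping


def _value_hash(value, length=8):
    """Hash a JSON-serializable value.

    Assumption: SHA-256 of the JSON text, truncated. The real library may
    hash differently; a short hash also carries a small collision risk.
    """
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:length]


class TinyDedupDict(MutableMapping):
    """Hypothetical, flat-only stand-in for DeDuplicationDict."""

    def __init__(self, **kwargs):
        self.key_dict = {}    # key -> hash of its value
        self.value_dict = {}  # hash -> the single stored copy of the value
        self.update(kwargs)

    def __setitem__(self, key, value):
        if key in self.key_dict:
            del self[key]  # release the old value before overwriting
        digest = _value_hash(value)
        self.value_dict.setdefault(digest, value)  # store only the first copy
        self.key_dict[key] = digest

    def __getitem__(self, key):
        return self.value_dict[self.key_dict[key]]

    def __delitem__(self, key):
        digest = self.key_dict.pop(key)
        # Drop the stored value once no remaining key refers to it.
        if digest not in self.key_dict.values():
            del self.value_dict[digest]

    def __iter__(self):
        return iter(self.key_dict)

    def __len__(self):
        return len(self.key_dict)


d = TinyDedupDict(a=[5, 6, 7], b=2, c=[5, 6, 7])
assert d["a"] == d["c"] == [5, 6, 7]
assert len(d) == 3 and len(d.value_dict) == 2  # three keys, two stored values
```

The linear scan in `__delitem__` keeps the sketch short; a reference count per hash would avoid it at the cost of extra bookkeeping.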
Installation
pip install deduplicationdict
Usage
from deduplicationdict import DeDuplicationDict
# Create a new DeDuplicationDict instance
dedup_dict = DeDuplicationDict.from_dict({'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7]})
# or
dedup_dict = DeDuplicationDict(**{'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7]})
# Add two new keys that share the same value
dedup_dict['d'] = [1, 2, 3]
dedup_dict['e'] = [1, 2, 3]
# Print the dictionary
print(f"dedup_dict.to_dict(): {dedup_dict.to_dict()}")
# output: {'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7], 'd': [1, 2, 3], 'e': [1, 2, 3]}
# Print the internal key-to-hash and hash-to-value mappings
print(f"dedup_dict.key_dict: {dedup_dict.key_dict}")
# output: {'a': '7511debb', 'b': '7c7ad8f0', 'c': '7511debb', 'd': 'f9343d7d', 'e': 'f9343d7d'}
print(f"dedup_dict.value_dict: {dedup_dict.value_dict}")
# output: {'7511debb': [5, 6, 7], '7c7ad8f0': 2, 'f9343d7d': [1, 2, 3]}
# Print the JSON-serializable form of the deduplicated dictionary
print(f"to_json_save_dict: {dedup_dict.to_json_save_dict()}")
# output: {'key_dict': {'a': '7511debb', 'b': '7c7ad8f0', 'c': '7511debb', 'd': 'f9343d7d', 'e': 'f9343d7d'}, 'value_dict': {'7511debb': [5, 6, 7], '7c7ad8f0': 2, 'f9343d7d': [1, 2, 3]}}
assert dedup_dict["a"] == [5, 6, 7]
assert dedup_dict["b"] == 2
assert dedup_dict["c"] == [5, 6, 7]
assert dedup_dict["d"] == [1, 2, 3]
assert dedup_dict["e"] == [1, 2, 3]
assert DeDuplicationDict.from_json_save_dict(dedup_dict.to_json_save_dict()).to_dict() == dedup_dict.to_dict()
Usage with SqliteDict: SqliteDeDuplicationDict.py
Results from Testing
Method | JSON Memory (MB) | In-Memory (MB) |
---|---|---|
dict | 14.089 MB | 27.542 MB |
DeDuplicationDict | 1.7906 MB | 3.806 MB |
Memory Saving | 7.868x | 7.235x |
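The dataset behind these figures is not shown here, so they cannot be reproduced directly. As a rough illustration of how a JSON size ratio like the one in the table might be computed, the sketch below uses only the standard library and synthetic data. The helpers `split_layout` and `json_megabytes`, the hashing scheme, and the sample data are assumptions for the example; the numbers it prints will differ from the table.

```python
import hashlib
import json


def split_layout(data):
    """Build a {'key_dict': ..., 'value_dict': ...} layout by hand.

    Hypothetical helper: it mimics the shape of to_json_save_dict() for a
    flat dictionary but does not use the library or its hash function.
    """
    key_dict, value_dict = {}, {}
    for key, value in data.items():
        payload = json.dumps(value, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()[:8]
        key_dict[key] = digest
        value_dict.setdefault(digest, value)  # keep one copy per distinct value
    return {"key_dict": key_dict, "value_dict": value_dict}


def json_megabytes(obj):
    """Size of the JSON encoding in MB (taking 1 MB as 10**6 bytes)."""
    return len(json.dumps(obj).encode("utf-8")) / 1e6


# Synthetic data: 10,000 keys that share only 10 distinct list values.
plain = {f"key_{i}": list(range(i % 10, i % 10 + 50)) for i in range(10_000)}
deduped = split_layout(plain)

plain_mb = json_megabytes(plain)
deduped_mb = json_megabytes(deduped)
print(f"dict: {plain_mb:.3f} MB, deduplicated: {deduped_mb:.3f} MB, "
      f"saving: {plain_mb / deduped_mb:.2f}x")
```

The saving depends entirely on how repetitive the values are: with mostly unique values the split layout can be larger than the plain dictionary, since every key then carries a hash in addition to its value.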
Documentation
The documentation for this project is hosted on Read the Docs.
License
This project is licensed under the terms of the Mozilla Public License 2.0.