A dictionary that de-duplicates values.

DeDuplicationDict

DeDuplicationDict is a dictionary-like class that deduplicates values: each distinct value is stored once in a separate dictionary, and every key maps to that value's hash. This is particularly useful for large dictionaries with many repeated values, because each value is held only once no matter how many keys refer to it.

Nested structures are supported: nested dictionaries are automatically converted into DeDuplicationDict instances. The class also provides methods for converting between regular dictionaries and DeDuplicationDict instances.
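The idea of a key-to-hash mapping plus a shared hash-to-value store can be illustrated with a small standalone sketch. This is not the library's implementation: the helpers `short_hash`, `deduplicate`, and `rebuild` are hypothetical names, and hashing the JSON encoding with SHA-256 truncated to 8 hex characters is an assumption made only for illustration.

```python
import hashlib
import json


def short_hash(value, length=8):
    """Return a short hex digest of a JSON-serializable value (hypothetical helper)."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:length]


def deduplicate(data, value_dict=None):
    """Split `data` into (key_dict, value_dict).

    key_dict maps each key to a hash, or to a nested key_dict for nested dicts.
    value_dict maps each hash to the single stored copy of the value.
    """
    if value_dict is None:
        value_dict = {}
    key_dict = {}
    for key, value in data.items():
        if isinstance(value, dict):
            # Nested dictionaries reuse the same value store.
            key_dict[key], _ = deduplicate(value, value_dict)
        else:
            digest = short_hash(value)
            value_dict.setdefault(digest, value)  # store each value only once
            key_dict[key] = digest
    return key_dict, value_dict


def rebuild(key_dict, value_dict):
    """Inverse of deduplicate: expand hashes back into values."""
    return {
        key: rebuild(ref, value_dict) if isinstance(ref, dict) else value_dict[ref]
        for key, ref in key_dict.items()
    }


data = {"a": [5, 6, 7], "b": 2, "c": [5, 6, 7], "n": {"x": [5, 6, 7]}}
key_dict, value_dict = deduplicate(data)

# 'a', 'c', and the nested 'x' all point at the same stored list.
print(key_dict["a"] == key_dict["c"] == key_dict["n"]["x"])
print(len(value_dict))
print(rebuild(key_dict, value_dict) == data)
```

A truncated hash keeps the key dictionary small but makes collisions possible in principle; a real implementation would need to account for that, which this sketch does not.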

Installation

pip install deduplicationdict

Usage

from deduplicationdict import DeDuplicationDict

# Create a new DeDuplicationDict instance
dedup_dict = DeDuplicationDict.from_dict({'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7]})
# or
dedup_dict = DeDuplicationDict(**{'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7]})

# Add two new keys that share the same value
dedup_dict['d'] = [1, 2, 3]
dedup_dict['e'] = [1, 2, 3]

# Print the dictionary
print(f"dedup_dict.to_dict(): {dedup_dict.to_dict()}")
# output: {'a': [5, 6, 7], 'b': 2, 'c': [5, 6, 7], 'd': [1, 2, 3], 'e': [1, 2, 3]}

# Print the internal key and value dictionaries
print(f"dedup_dict.key_dict: {dedup_dict.key_dict}")
# output: {'a': '7511debb', 'b': '7c7ad8f0', 'c': '7511debb', 'd': 'f9343d7d', 'e': 'f9343d7d'}
print(f"dedup_dict.value_dict: {dedup_dict.value_dict}")
# output: {'7511debb': [5, 6, 7], '7c7ad8f0': 2, 'f9343d7d': [1, 2, 3]}

# Print the JSON-serializable deduplicated form
print(f"to_json_save_dict: {dedup_dict.to_json_save_dict()}")
# output: {'key_dict': {'a': '7511debb', 'b': '7c7ad8f0', 'c': '7511debb', 'd': 'f9343d7d', 'e': 'f9343d7d'}, 'value_dict': {'7511debb': [5, 6, 7], '7c7ad8f0': 2, 'f9343d7d': [1, 2, 3]}}

assert dedup_dict["a"] == [5, 6, 7]
assert dedup_dict["b"] == 2
assert dedup_dict["c"] == [5, 6, 7]
assert dedup_dict["d"] == [1, 2, 3]
assert dedup_dict["e"] == [1, 2, 3]
assert DeDuplicationDict.from_json_save_dict(dedup_dict.to_json_save_dict()).to_dict() == dedup_dict.to_dict()

Usage with SqliteDict: SqliteDeDuplicationDict.py

Results from Testing

Method              JSON Memory (MB)   In-Memory (MB)
dict                14.089             27.542
DeDuplicationDict   1.7906             3.806
Memory Saving       7.868x             7.235x
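A JSON-size comparison of this kind can be approximated with a short standalone sketch. It does not reproduce the figures above: the data is synthetic, `to_save_dict` is a hypothetical flat stand-in for a key_dict/value_dict split, and only serialized size is measured, not in-memory size.

```python
import hashlib
import json


def to_save_dict(data):
    """Hypothetical flat stand-in for a key_dict/value_dict split."""
    key_dict, value_dict = {}, {}
    for key, value in data.items():
        encoded = json.dumps(value, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(encoded).hexdigest()[:8]
        key_dict[key] = digest
        value_dict.setdefault(digest, value)  # keep one copy per distinct value
    return {"key_dict": key_dict, "value_dict": value_dict}


# Synthetic, highly repetitive data: 10,000 keys sharing 10 distinct values.
distinct = [list(range(i, i + 50)) for i in range(10)]
data = {f"key_{i}": distinct[i % 10] for i in range(10_000)}

plain_bytes = len(json.dumps(data).encode("utf-8"))
dedup_bytes = len(json.dumps(to_save_dict(data)).encode("utf-8"))
ratio = plain_bytes / dedup_bytes

print(f"plain: {plain_bytes} B, deduplicated: {dedup_bytes} B, ratio: {ratio:.2f}x")
```

The ratio depends entirely on how repetitive the data is and how large each value is relative to its hash; data with few repeated values would show little or no saving.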

[Figure: dict vs DeDuplicationDict]

Documentation

The documentation for this project is hosted on Read the Docs.

License

This project is licensed under the terms of the Mozilla Public License 2.0.
