An image deduplication toolkit.
ImageDeduplication
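Image deduplication
Project repository: https://github.com/firstelfin/ImageDeduplication
PyPI page: https://pypi.org/project/imgDedup/
Custom deduplication data type: HashCode
HashCode is a data class with the attributes success, value (the hex string of the phash), error, and img_path. It implements the __sub__, __repr__, __bool__, and to_dict methods, and it is the basic data structure used for deduplication.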
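To make the shape of this data structure concrete, here is a minimal sketch of a class with the same attributes and methods. It is not the package's actual code: the name HashCodeSketch, the default values, and the semantics of __sub__ (Hamming distance between the two hex values) and __bool__ (true only when hashing succeeded) are assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class HashCodeSketch:
    """Illustrative stand-in for HashCode; not the package's actual code."""

    success: bool = False
    value: str = ""                  # hex string of the perceptual hash
    error: Optional[str] = None      # error message when hashing failed
    img_path: Optional[str] = None

    def __sub__(self, other: "HashCodeSketch") -> int:
        # Assumed semantics: Hamming distance between the two hashes.
        if not (self and other):
            raise ValueError("cannot compare a failed hash")
        if len(self.value) != len(other.value):
            raise ValueError("hash sizes differ")
        return bin(int(self.value, 16) ^ int(other.value, 16)).count("1")

    def __bool__(self) -> bool:
        # Assumed semantics: truthy only when hashing succeeded.
        return self.success

    def to_dict(self) -> dict:
        return asdict(self)


a = HashCodeSketch(True, "ff00", None, "a.jpg")
b = HashCodeSketch(True, "ff01", None, "b.jpg")
print(a - b)  # 1: the two hashes differ in a single bit
```

The dataclass decorator generates __repr__ automatically, so only the other three methods are written by hand.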
Core deduplication algorithm: phash
Code path:
imgDedup/tools/imageFingerprint.py
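imageFingerprint:get_phash implements perceptual hashing on top of imagehash.phash and computes an image's fingerprint. The function accepts either an image path or an image ndarray as input.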
Deduplication pipeline 1: deduplicating a single dataset
Code path:
imgDedup/utils/deduplication.py
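The deduplication:SelfDeduplication class deduplicates a single dataset. It loads the HashCode of every image concurrently, initializes an empty list of kept items, and then iterates over the HashCodes: a HashCode is appended to the list only if it is not similar to any HashCode already in the list. The list of kept HashCodes is returned at the end.
Usage example: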
>>> sd = SelfDeduplication(
... src_dir=Path(f"xxxx"),
... dst_dir=Path(f"xxxxx"),
... use_link=True,
... threshold=5,
... hash_size=16
... )
>>> sd(save_json_path=Path(f"xxx/dedup测试/status/deduplication_record.json"))
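The filtering logic described for SelfDeduplication can be sketched on plain hex strings as follows. The helper names hamming and greedy_dedup are hypothetical, and the sketch assumes that "similar" means a Hamming distance less than or equal to threshold; the package may define similarity differently.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def greedy_dedup(hashes: List[str], threshold: int = 5) -> List[str]:
    """Keep a hash only if it is not similar to any hash kept so far."""
    kept: List[str] = []
    for h in hashes:
        # Assumption: "similar" means Hamming distance <= threshold.
        if all(hamming(h, k) > threshold for k in kept):
            kept.append(h)
    return kept


print(greedy_dedup(["ff00", "ff01", "00ff"], threshold=1))
# ['ff00', '00ff']: 'ff01' is within 1 bit of 'ff00' and is dropped
```

Each candidate is compared against every kept hash, so the number of comparisons grows quadratically with the dataset size in the worst case. The result also depends on iteration order, since the first image of each similar group is the one that is kept.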
Deduplication pipeline 2: deduplicating multiple datasets
Code path:
imgDedup/utils/deduplication.py
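The deduplication:CrossDatasetDeduplication class deduplicates across multiple datasets. It loads the HashCodes of the images in each dataset concurrently, initializes an empty list of kept items, and then iterates over the HashCodes: a HashCode is appended to the list only if it is not similar to any HashCode already in the list. The list of kept HashCodes is returned at the end.
Usage example: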
>>> mdl = MultiDeduplication(
... src_dir=Path("xxx/xxxx_deduplication_record.json"),
... dst_dir=Path("xxxxx/images"),
... targets=[
... Path("xxxxxx/images"),
... Path("aaaaaa/images"),
... Path("ssssss/images"),
... Path("wwwwww/images"),
... Path("ffffff/xxxxxx_deduplication_record.json"),
... Path("ccccccc/eeeeee_deduplication_record.json"),
... ],
... threshold=26,
... use_link=False,
... hash_size=16
... )
>>> mdl(save_json_path=Path("xxxss--202508_deduplication_record.json"))
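As a rough illustration of the cross-dataset logic, the sketch below applies the same greedy filter to the union of several datasets while remembering which dataset each kept hash came from. The function cross_dataset_dedup, the dict-of-lists input format, and the Hamming-distance similarity test are assumptions for illustration; they do not describe how src_dir, dst_dir, and targets are actually handled by the package.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing bits between two equal-length hex hashes."""
    return bin(int(a, 16) ^ int(b, 16)).count("1")


def cross_dataset_dedup(
    datasets: Dict[str, List[str]], threshold: int
) -> List[Tuple[str, str]]:
    """Greedy filter over several datasets; returns (dataset name, hash) pairs."""
    kept: List[Tuple[str, str]] = []
    for name, hashes in datasets.items():   # dicts preserve insertion order
        for h in hashes:
            # Assumption: "similar" means Hamming distance <= threshold.
            if all(hamming(h, k) > threshold for _, k in kept):
                kept.append((name, h))
    return kept


datasets = {"a": ["ff00", "ff01"], "b": ["ff00", "00ff"]}
print(cross_dataset_dedup(datasets, threshold=1))
# [('a', 'ff00'), ('b', '00ff')]
```

Because datasets are processed in order, an image that appears in several datasets is attributed to the first dataset that contains it.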
Install
Install from source:
>>> git clone https://github.com/firstelfin/ImageDeduplication.git
>>> cd ImageDeduplication && pip install .
Install from PyPI:
>>> pip install imgDedup