Introduction
A deduplication filter library that provides common dedup schemes: convenient to develop with and highly performant.
Dedup Schemes
Type | Scheme | Description | Strength | Weakness | Eviction |
---|---|---|---|---|---|
Memory | MemoryFilter | Backed by an in-memory set | High accuracy | No persistence | Random deletion |
File | FileFiler | Backed by a file plus an in-memory set | High accuracy | Heavy local memory and storage use | Deletion by file-pointer range |
Redis | RedisBloomFilter / AsyncRedisBloomFilter | Redis Bitmap plus the Bloom-filter algorithm | Very small memory footprint | False positives are possible and elements are hard to delete | Random deletion |
Redis | RedisStringFilter / AsyncRedisStringFilter | Redis String data structure | No false positives; expiry times enable query dedup with a confirmation mechanism | Heavy resource use; compress values and set expiry times where possible | Expiry time |
Redis | RedisSetFilter / AsyncRedisSetFilter | Redis Set data structure | High accuracy | Fairly heavy resource use | Random deletion |
Redis | RedisSortedSetFilter / AsyncRedisSortedSetFilter | Redis SortedSet data structure | High accuracy | Fairly heavy resource use | Deletion by score |
SQL | SQLFilter | Backed by the primary key of a SQL relational-database table | High accuracy | Poor performance for large-scale dedup | Deletion by time |
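As a rough illustration of the Bloom-filter trade-off noted in the table (this is not the library's RedisBloomFilter, which stores its bits in a Redis Bitmap; class and parameter names here are hypothetical), a minimal in-memory Bloom filter might look like:

```python
import hashlib


class TinyBloomFilter:
    """Minimal Bloom filter: k hashed bit positions over an m-bit array.
    Lookups can yield false positives, but never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m = m                      # number of bits
        self.k = k                      # number of hash functions
        self.bits = bytearray(m // 8)   # bit array, all zeros

    def _positions(self, value):
        # Derive k bit positions from salted SHA-1 digests
        for i in range(self.k):
            digest = hashlib.sha1(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, value):
        # True only if every position is set; an inserted value is
        # always reported, an absent one may (rarely) collide
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))


bf = TinyBloomFilter()
bf.insert("1")
print(bf.exists("1"))  # an inserted value is always found
```

This shows why the table lists "false positives are possible and elements are hard to delete": bits are shared between values, so clearing one value's bits could erase others.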
Features
- Multiple schemes to cover different scenarios.
- Batch operations via Lua scripts, which keeps them fast.
- Async support, for quick integration into async code and async frameworks.
Dedup Examples
RedisBloomFilter
```python
import redis

from dupfilter import RedisBloomFilter

server = redis.Redis(host="127.0.0.1", port=6379)
rbf = RedisBloomFilter(server=server, key="bf", block_num=2)
print(rbf.exists_many(["1", "2", "3"]))
rbf.insert_many(["1", "2", "3"])
print(rbf.exists_many(["1", "2", "3"]))
```
AsyncRedisBloomFilter
```python
import asyncio

import aioredis

from dupfilter import AsyncRedisBloomFilter


async def test():
    server = aioredis.from_url('redis://127.0.0.1:6379/0')
    arbf = AsyncRedisBloomFilter(server, key='bf')
    stats = await arbf.exists_many(["1", "2", "3"])
    print(stats)
    await arbf.insert_many(["1", "2", "3"])
    stats = await arbf.exists_many(["1", "2", "3"])
    print(stats)


asyncio.run(test())
```
DefaultFilter
In a project, an outer parameter often decides whether the dedup logic runs at all. To keep the calling code uniform in both cases, a default filter class is provided.
```python
from dupfilter import MemoryFilter
from dupfilter import DefaultFilter

is_dup = True  # global switch: whether to deduplicate
if is_dup:
    flr = MemoryFilter()
else:
    flr = DefaultFilter(default_stat=False)
print(flr.exists("1"))
```
FilterCounter
Aggregates dedup results for summary checks.
```python
from dupfilter import MemoryFilter
from dupfilter import FilterCounter

flt = MemoryFilter()
flt_counter = FilterCounter()
values = ['1', '2', '3']
for value in values:
    flt_counter.insert_stat(flt.exists(value))
# Summarize: any seen, all seen, how many seen
print(flt_counter.any(), flt_counter.all(), flt_counter.count())
```
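The any/all/count semantics above can be sketched with a plain Python counter (an illustration of the idea, not the library's FilterCounter source; the class name is hypothetical):

```python
class StatCounter:
    """Collects boolean dedup results and summarizes them."""

    def __init__(self):
        self.stats = []

    def insert_stat(self, stat):
        self.stats.append(bool(stat))

    def any(self):
        # True if at least one value was already seen
        return any(self.stats)

    def all(self):
        # True only if every value was already seen
        return all(self.stats)

    def count(self):
        # Number of values that were already seen
        return sum(self.stats)


counter = StatCounter()
for stat in [True, False, True]:
    counter.insert_stat(stat)
print(counter.any(), counter.all(), counter.count())  # True False 2
```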
Others
The remaining filters are used in the same way as the examples above.
Related Libraries
- redis:redis/aioredis
- mysql:pymysql/aiomysql
- sqlite:sqlite3
- oracle:cx_Oracle/cx_Oracle_async
Planned Improvements
- Complete the reset logic for some dedup schemes.
About the Author
- Email: 1194542196@qq.com
- WeChat: hu1194542196