Project description

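Introduction

dupfilter is a deduplication filter library. It provides several common deduplication schemes, is easy to integrate, and is built for high throughput.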

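Deduplication schemes

| Category | Filter | Description | Strengths | Drawbacks |
| --- | --- | --- | --- | --- |
| Memory | MemoryFilter | Built on an in-memory set | High accuracy | Cannot be persisted |
| File | FileFilter | Built on a file plus a set | High accuracy | Large local memory and storage footprint |
| Redis | RedisBloomFilter, AsyncRedisBloomFilter | Built on a Redis Bitmap and the Bloom filter algorithm | Very small memory footprint | False positives are possible and elements are hard to delete; if deletion is required, elements can be removed at random |
| Redis | RedisStringFilter, AsyncRedisStringFilter | Built on the Redis String data structure | No false positives; expiry times make query deduplication and a confirmation mechanism possible | Heavy resource usage; compress values as much as possible and set expiry times |
| Redis | RedisSetFilter, AsyncRedisSetFilter | Built on the Redis Set data structure | No false positives; relatively low memory usage | Elements are hard to delete; if deletion is required, elements can be removed at random |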
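
To make the Bloom filter trade-off concrete, here is a minimal in-memory sketch. It is not dupfilter's implementation: the class name `TinyBloomFilter`, the bit count, the number of hashes and the SHA-256 double-hashing scheme are all assumptions chosen for illustration, and a `bytearray` stands in for the Redis bitmap.

```python
import hashlib


class TinyBloomFilter:
    """Illustrative Bloom filter (hypothetical, not dupfilter's code)."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One bit per slot; this bytearray stands in for a Redis bitmap.
        self.bits = bytearray((num_bits + 7) // 8)

    def _offsets(self, value):
        # Derive k bit offsets from a single SHA-256 digest (double hashing).
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        # Odd step, so the k offsets are distinct when num_bits is a power of two.
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, value):
        for off in self._offsets(value):
            self.bits[off // 8] |= 1 << (off % 8)

    def exists(self, value):
        # True only if every one of the k bits is set.
        return all(
            self.bits[off // 8] & (1 << (off % 8)) for off in self._offsets(value)
        )


bf = TinyBloomFilter()
before = [bf.exists(v) for v in ["1", "2", "3"]]
for v in ["1", "2", "3"]:
    bf.insert(v)
after = [bf.exists(v) for v in ["1", "2", "3"]]
print(before, after)  # [False, False, False] [True, True, True]
```

Only bits are stored, never the values themselves, which is why the footprint is small. Different values can share bits, so an unseen value may find all of its bits already set (a false positive), and clearing one value's bits could silently remove others, which is why deletion is hard.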

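Features

  1. Multiple schemes cover the needs of different scenarios.
  2. Batch operations are implemented with Lua scripts, which keeps them fast.
  3. Async support makes it quick to integrate into asynchronous code and frameworks.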
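
The README does not show the library's Lua scripts, so the following is only a sketch of the general idea behind item 2: a whole batch goes to Redis in one round trip and is processed atomically by a script. The helper `insert_many`, the script text and the key name are hypothetical. The only client call relied on is redis-py's `eval(script, numkeys, *keys_and_args)`.

```python
# Lua run inside Redis: add every ARGV value to the set stored at KEYS[1]
# and return one flag per value (1 = newly added, 0 = already present).
INSERT_MANY_LUA = """
local flags = {}
for i, value in ipairs(ARGV) do
    flags[i] = redis.call('SADD', KEYS[1], value)
end
return flags
"""


def insert_many(client, key, values):
    """Hypothetical helper: insert a batch with a single EVAL call.

    `client` is any object exposing redis-py's
    eval(script, numkeys, *keys_and_args) method.
    Returns a list of booleans, True where the value was newly added.
    """
    values = list(values)
    if not values:
        return []  # nothing to send, so skip the round trip
    flags = client.eval(INSERT_MANY_LUA, 1, key, *values)
    return [bool(flag) for flag in flags]
```

With a real server this would be called as `insert_many(redis.Redis(host="127.0.0.1", port=6379), "dedup", ["1", "2", "3"])`. Compared with one `SADD` per value, the script avoids a network round trip per element, which is the usual reason batch operations are pushed into Lua.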

Code examples

RedisBloomFilter

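import redis
from dupfilter import RedisBloomFilter

# Connect to a local Redis server.
server = redis.Redis(host="127.0.0.1", port=6379)
# Create a Bloom filter backed by that server.
rbf = RedisBloomFilter(server=server, key="bf", block_num=2)
# Check the batch before inserting it.
print(rbf.exists_many(["1", "2", "3"]))
# Insert the batch, then check it again.
rbf.insert_many(["1", "2", "3"])
print(rbf.exists_many(["1", "2", "3"]))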
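
A common follow-up is to keep only the items that have not been seen yet. The sketch below assumes that `exists_many` returns one truthy or falsy flag per input item, in input order; the README does not state the return type, so treat that as an assumption. The helper name `select_unseen` is hypothetical and not part of dupfilter.

```python
def select_unseen(items, flags):
    """Return the items whose flag is falsy (not yet recorded).

    Assumes `flags` is aligned one-to-one with `items`, as exists_many
    is assumed (not confirmed) to return.
    """
    items = list(items)
    flags = list(flags)
    if len(items) != len(flags):
        raise ValueError("items and flags must have the same length")
    return [item for item, seen in zip(items, flags) if not seen]


# Stand-in flags; in real use they would come from rbf.exists_many(batch).
batch = ["1", "2", "3"]
print(select_unseen(batch, [True, False, False]))  # ['2', '3']
```

Under that assumption, a deduplicating step would call `exists_many` on the batch, pass the result to `select_unseen`, process the remaining items, and then record them with `insert_many`.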

AsyncRedisBloomFilter

import asyncio
import aioredis
from dupfilter import AsyncRedisBloomFilter


async def test():
    # Create an async Redis client for database 0 on the local server.
    server = aioredis.from_url('redis://127.0.0.1:6379/0')
    arbf = AsyncRedisBloomFilter(server, key='bf')
    # Check the batch before inserting it.
    stats = await arbf.exists_many(["1", "2", "3"])
    print(stats)
    # Insert the batch, then check it again.
    await arbf.insert_many(["1", "2", "3"])
    stats = await arbf.exists_many(["1", "2", "3"])
    print(stats)


# asyncio.run creates the event loop and closes it when test() finishes.
asyncio.run(test())

Others

The other filters are used in the same way as the examples above.

About the author

  1. Email: 1194542196@qq.com
  2. WeChat: hu1194542196
