Inverted Index using efficient Redis set

Redis-index: Inverted Index using efficient Redis set

Redis-index helps delegate part of the work from the database to the cache. It is useful for high-load projects with complex search logic under the hood.

Introduction

Suppose you have to implement a service that will fetch data for a given set of filters.

GET /api/companies?region=US&currency=USD&search_ids=233,816,266,...

Filters like these can be expensive for the database: each of them may involve joining multiple tables. A solution written in raw SQL risks running into database performance limits.

Such "heavy" queries can be precalculated and stored in Redis SETs. We can then intersect the resulting SETs with each other, which greatly simplifies the SQL.

search_ids = {233, 816, 266, ...}
us_companies_ids = {266, 112, 643, ...}
usd_companies_ids = {816, 54, 8395, ...}

filtered_ids = search_ids & us_companies_ids & usd_companies_ids  # intersection
...
"SELECT * FROM companies WHERE id IN {filtered_ids}"
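The intersect-then-query step can be sketched with the standard library alone. This is only an illustration: it assumes an in-memory SQLite table named companies with made-up rows, and the helper fetch_companies is hypothetical. It binds the filtered ids as query parameters instead of formatting them into the SQL string.

```python
import sqlite3

# Hypothetical precalculated id sets (illustrative values only).
search_ids = {233, 816, 266}
us_companies_ids = {266, 112, 643, 816}
usd_companies_ids = {816, 54, 8395, 266}

# Only ids present in every set survive the intersection.
filtered_ids = sorted(search_ids & us_companies_ids & usd_companies_ids)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO companies (id, name) VALUES (?, ?)",
    [(233, "A"), (266, "B"), (816, "C"), (112, "D")],
)


def fetch_companies(conn, ids):
    """Fetch rows by id, binding each id as a separate parameter."""
    if not ids:
        return []  # "IN ()" is not valid SQL, so short-circuit
    placeholders = ", ".join("?" for _ in ids)
    query = (
        "SELECT id, name FROM companies "
        f"WHERE id IN ({placeholders}) ORDER BY id"
    )
    return conn.execute(query, list(ids)).fetchall()


rows = fetch_companies(conn, filtered_ids)
```

With the sample data, only the ids shared by all three sets reach the final query, so the SQL itself needs no joins.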

But transferring these precalculated SETs from Redis into Python memory can become another bottleneck: the sets can be very large, and we don't want to move that much data between servers.

The solution is to intersect these SETs directly in Redis. This is exactly what the redis-index library does.
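Server-side intersection can be sketched with the Redis SINTER command, which returns only the members common to all given sets. The snippet below is not redis-index's implementation: the key names and the intersect_ids helper are hypothetical, and FakeRedis is an in-memory stand-in that mimics the sadd and sinter methods of a redis-py client so the example runs without a server.

```python
class FakeRedis:
    """In-memory stand-in for the two redis-py methods used below."""

    def __init__(self):
        self._sets = {}

    def sadd(self, name, *values):
        # Like Redis, store members as bytes and return how many were new.
        target = self._sets.setdefault(name, set())
        before = len(target)
        target.update(str(v).encode() for v in values)
        return len(target) - before

    def sinter(self, keys):
        # A missing key behaves like an empty set, as in Redis.
        sets = [self._sets.get(key, set()) for key in keys]
        return set.intersection(*sets) if sets else set()


def intersect_ids(client, keys):
    """Intersect the sets on the server; only the result crosses the wire."""
    return {int(member) for member in client.sinter(keys)}


client = FakeRedis()
client.sadd("search:request-1", 233, 816, 266)   # hypothetical key names
client.sadd("region:US", 266, 112, 643, 816)
client.sadd("currency:USD", 816, 54, 8395)

filtered = intersect_ids(
    client, ["search:request-1", "region:US", "currency:USD"]
)
```

With a real redis-py client in place of FakeRedis, the same intersect_ids call would transfer only the intersection, not the full filter sets.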

Installation

Use pip to install redis-index.

pip install redis-index

Usage

  1. Declare your filters. They must inherit from the BaseFilter class.
from typing import List

import psycopg2
from redis_index import BaseFilter

class RegionFilter(BaseFilter):

    def get_ids(self, region, **kwargs) -> List[int]:
        """
        get_ids should return a precalculated list of ints.
        """
        with psycopg2.connect(...) as conn:
            with conn.cursor() as cursor:
                cursor.execute('SELECT id FROM companies WHERE region = %s', (region, ))
                # fetchall() returns one-element row tuples; unpack them into ints
                return [row[0] for row in cursor.fetchall()]

class CurrencyFilter(BaseFilter):

    def get_ids(self, currency, **kwargs) -> List[int]:
        with psycopg2.connect(...) as conn:
            with conn.cursor() as cursor:
                cursor.execute('SELECT id FROM companies WHERE currency = %s', (currency, ))
                return [row[0] for row in cursor.fetchall()]
  2. Initialize the RedisFiltering object.
from redis_index import RedisFiltering
from hot_redis import HotClient

redis_client = HotClient(host="localhost", port=6379)
filtering = RedisFiltering(redis_client)
  3. Now you can use filtering as a singleton in your project. Simply call the filter() method with your search_ids and the filters to apply.
search_ids = [int(i) for i in request.GET["search_ids"].split(",")]  # input list
result = filtering.filter(search_ids, [RegionFilter("US"), CurrencyFilter("USD")])

The result is a list containing only the ids that satisfy both RegionFilter and CurrencyFilter.
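Because get_ids is expected to return a list of ints, it helps to remember that DB-API cursors return rows as tuples. The following sketch shows the flattening step using the standard-library sqlite3 module in place of psycopg2; the table contents and the function name get_ids_for_region are assumptions made for illustration.

```python
import sqlite3
from typing import List


def get_ids_for_region(conn: sqlite3.Connection, region: str) -> List[int]:
    """Return plain integer ids for one region."""
    rows = conn.execute(
        "SELECT id FROM companies WHERE region = ? ORDER BY id", (region,)
    ).fetchall()
    # Each row is a one-element tuple such as (266,); keep only the id.
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO companies (id, region) VALUES (?, ?)",
    [(266, "US"), (112, "US"), (54, "DE")],
)

us_ids = get_ids_for_region(conn, "US")
```

A psycopg2 cursor behaves the same way for this purpose, except that its parameter placeholder is %s rather than ?.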

How to warm the cache?

You can warm up the cache in various ways, for example with a cron job:

*/5  *   *   *   *   python warm_filters

Inside such a command, you can use the warm_filters method:

# assumed call signature; check the library source before relying on it
filtering.warm_filters([RegionFilter("US"), CurrencyFilter("USD")])

Or use the RedisIndex class directly:

for _filter in [RegionFilter("US"), CurrencyFilter("USD")]:
    filter_index = RedisIndex(_filter, redis_client)
    filter_index.warm()
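A cron-driven warm-up command usually should not stop at the first failing filter, since the remaining indexes would then go stale too. Below is a minimal sketch of such a loop. The warm_all helper, the filter names, and the stub callables are all hypothetical; in a real command each callable could be the warm method of a RedisIndex built as shown above.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("warm_filters")


def warm_all(warmers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every warm-up callable and return the names that failed."""
    failed = []
    for name, warm in warmers.items():
        try:
            warm()
        except Exception:
            # Log the traceback and continue with the remaining filters.
            logger.exception("warming %s failed", name)
            failed.append(name)
    return failed


warmed = []  # records which stub warmers ran successfully


def warm_region():
    warmed.append("region:US")


def warm_currency():
    raise RuntimeError("database unavailable")  # simulated failure


failed = warm_all({"region:US": warm_region, "currency:USD": warm_currency})
```

Returning the failed names lets the command exit with a non-zero status so that cron or a monitoring system can report the problem.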

Statsd integration

Redis-index optionally supports statsd integration.

[Figure: Redis-Index performance]

[Figure: Redis-Index by filters]

Code of Conduct

Everyone interacting in the project's codebases, issue trackers, chat rooms, and mailing lists is expected to follow the PyPA Code of Conduct.

History

[0.1.11] - 2019-11-08

Added

  • Added code for initial release
