Powerful analytics library using Redis bitmaps
NEW! Try out our new standalone bitmapist-server, which improves memory efficiency 443 times and makes your setup cheaper and more scalable. It's fully compatible with bitmapist running on Redis.
bitmapist: a powerful analytics library for Redis
This Python library makes it possible to implement real-time, highly scalable analytics that can answer the following questions:
- Has user 123 been online today? This week? This month?
- Has user 123 performed action "X"?
- How many users have been active this month? This hour?
- How many unique users have performed action "X" this week?
- What percentage of users that were active last week are still active?
- What percentage of users that were active last month are still active this month?
- Which users performed action "X"?
This library is very easy to use and lets you build your own reports.
Using Redis bitmaps, you can store events for millions of users in very little memory (megabytes). Be careful with huge ids, though: a bitmap takes as much memory as its highest id requires, so large ids can make it much bigger. Ids must be in the range [0, 2^32).
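A back-of-the-envelope sketch of why id size matters: Redis grows a bitmap's string to fit its highest set bit, so one bitmap costs roughly max_id / 8 bytes however few users are marked, and bitmapist keeps a separate bitmap per event and period. The helper below is our own illustration, not part of bitmapist4.

```python
def bitmap_size_bytes(max_id: int) -> int:
    """Approximate bytes used by one Redis bitmap whose highest set bit is max_id."""
    # Redis SETBIT only accepts offsets in [0, 2**32).
    if not 0 <= max_id < 2**32:
        raise ValueError("Redis bit offsets must be in [0, 2**32)")
    # The string must be long enough to contain bit number max_id.
    return max_id // 8 + 1

print(bitmap_size_bytes(10_000_000))  # 1250001 bytes, about 1.2 MB
print(bitmap_size_bytes(2**32 - 1))   # 536870912 bytes, i.e. 512 MiB
```

Marking a single user with id 2^32 - 1 therefore costs as much as marking four billion users, for every bitmap that user touches.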
Additionally, bitmapist can generate cohort graphs that show the following:
- User retention cohorts
- What percentage of users that were active last [days, weeks, months] are still active?
- What percentage of users that performed action X also performed action Y (and how this changes over time)?
- And a lot of other things!
To read more about bitmaps, see:
- http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/
- http://redis.io/commands/setbit
- http://en.wikipedia.org/wiki/Bit_array
- http://www.slideshare.net/crashlytics/crashlytics-on-redis-analytics
Installation
Install it with pip:
$ pip install bitmapist4
Ports
Examples
Setting things up:
import bitmapist4
b = bitmapist4.Bitmapist()
Mark user 123 as active and as having played a song:
b.mark_event('active', 123)
b.mark_event('song:played', 123)
Check whether user 123 has been active this month, and whether they have played a song:
assert 123 in b.MonthEvents('active')
assert 123 in b.MonthEvents('song:played')
Count how many users have been active this week:
len(b.WeekEvents('active'))
Iterate over all users active this week:
for uid in b.WeekEvents('active'):
    print(uid)
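Conceptually, each of these event objects is a bitmap in which bit N is set if user N triggered the event: counting users means counting set bits, and iterating yields the positions of those bits. The sketch below models that with a plain Python int; mark, count and iter_ids are hypothetical helpers, not bitmapist4 internals.

```python
def mark(bitmap: int, uid: int) -> int:
    """Return a copy of bitmap with bit uid set (like Redis SETBIT key uid 1)."""
    return bitmap | (1 << uid)

def count(bitmap: int) -> int:
    """Number of set bits, i.e. distinct users (like Redis BITCOUNT)."""
    return bin(bitmap).count("1")

def iter_ids(bitmap: int):
    """Yield the positions of set bits in ascending order."""
    uid = 0
    while bitmap:
        if bitmap & 1:
            yield uid
        bitmap >>= 1
        uid += 1

week = 0
for uid in (123, 5, 123, 42):  # marking the same user twice changes nothing
    week = mark(week, uid)

print(count(week))           # 3
print(list(iter_ids(week)))  # [5, 42, 123]
```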
To explore any specific day, week, month or year instead of the current one, you can create an event from any datetime object with the from_date static method.
import datetime
specific_date = datetime.datetime(2018, 1, 1)
ev = b.MonthEvents('active').from_date(specific_date)
print(len(ev))
The prev and next methods return "sibling" events, letting you walk through events in time without any sophisticated iterators. The delta method jumps forward or backward by more than one step. Because the API is uniform, the same code works for all base event types, from hour to year.
current_month = b.MonthEvents('active')
prev_month = current_month.prev()
next_month = current_month.next()
year_ago = current_month.delta(-12)
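For monthly events, prev(), next() and delta() boil down to calendar arithmetic like the sketch below. shift_month is a hypothetical helper for illustration; bitmapist4's own implementation may differ, and weeks, days and hours need their own rules.

```python
from typing import Tuple

def shift_month(year: int, month: int, delta: int) -> Tuple[int, int]:
    """Return (year, month) moved by delta months; a negative delta goes back."""
    index = year * 12 + (month - 1) + delta  # zero-based month count since year 0
    return index // 12, index % 12 + 1

print(shift_month(2018, 1, -1))   # (2017, 12), like prev()
print(shift_month(2018, 12, 1))   # (2019, 1), like next()
print(shift_month(2018, 3, -12))  # (2017, 3), like delta(-12)
```

Counting months on a single integer axis avoids special-casing year boundaries.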
Every event object has period_start and period_end methods that return the time span the event covers. This is useful for caching values when you don't want to cache events for periods that haven't finished yet:
ev = b.MonthEvents('active', dt)
if ev.period_end() < datetime.datetime.utcnow():
    # The month is over, so its count can no longer change (cache is your own backend)
    cache.set('active_users_<...>', len(ev))
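A rough model of the month span and the "is this period over?" check, using only the standard library. month_span and is_finished are hypothetical helpers; note that this sketch treats the end as exclusive (the first instant of the next month), which may not match how bitmapist4 defines period_end.

```python
import calendar
import datetime

def month_span(year, month):
    """Return (start, end) of a calendar month; end is exclusive in this sketch."""
    start = datetime.datetime(year, month, 1)
    days_in_month = calendar.monthrange(year, month)[1]
    return start, start + datetime.timedelta(days=days_in_month)

def is_finished(year, month, now):
    """True once the month is over, i.e. its event count can no longer change."""
    _, end = month_span(year, month)
    return end <= now

now = datetime.datetime(2018, 12, 15)
print(is_finished(2018, 11, now))  # True: November 2018 is over
print(is_finished(2018, 12, now))  # False: December 2018 is still running
```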
Hourly tracking is disabled by default to save memory. You can enable it with a constructor argument:
b = bitmapist4.Bitmapist(track_hourly=True)
You can also override that default for a single call by passing an extra argument to mark_event:
b.mark_event('active', 123, track_hourly=False)
Unique events
Sometimes the date of an event makes little or no sense, and you are more interested in whether a user has performed that event at least once in their lifetime.
There is a UniqueEvents model for this purpose. The model creates only one Redis key and doesn't depend on the date.
You can combine unique events with other types of events.
A/B testing example:
active = b.DayEvents('active')
# Use new variable names so the Bitmapist instance b is not overwritten
classic = b.UniqueEvents('signup_form:classic')
new_form = b.UniqueEvents('signup_form:new')
print("Active users, signed up with classic form", len(active & classic))
print("Active users, signed up with new form", len(active & new_form))
You can mark these users with b.mark_unique, or you can automatically populate the extra unique cohort for all marked keys:
b = bitmapist4.Bitmapist(track_unique=True)
b.mark_event('premium', 1)
assert 1 in b.UniqueEvents('premium')
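The difference between dated and unique keys can be modelled with sets standing in for bitmaps. ToyTracker is a hypothetical, in-memory sketch of what track_unique does, not bitmapist4 code, and it only tracks daily keys.

```python
from collections import defaultdict

class ToyTracker:
    """Hypothetical in-memory model of event marking; sets stand in for bitmaps."""

    def __init__(self, track_unique=False):
        self.track_unique = track_unique
        self.keys = defaultdict(set)  # key -> set of user ids

    def mark_unique(self, event, uid):
        # One key per event name, independent of the date.
        self.keys[(event, "unique")].add(uid)

    def mark_event(self, event, uid, day):
        # One key per event and day; the real mark_event also updates week, month, etc.
        self.keys[(event, day)].add(uid)
        if self.track_unique:
            self.mark_unique(event, uid)

t = ToyTracker(track_unique=True)
t.mark_event("premium", 1, "2018-01-01")
t.mark_event("premium", 2, "2018-06-30")
print(sorted(t.keys[("premium", "unique")]))  # [1, 2]
print(len(t.keys))                            # 3: two dated keys plus one unique key
```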
Perform bit operations
How many users that have been active last month are still active this month?
ev = b.MonthEvents('active')
active_2months = ev & ev.prev()
print(len(active_2months))
# Is 123 active for 2 months?
assert 123 in active_2months
The operators &, |, ^ and ~ are supported.
This works with nested bit operations (imagine what you can do with this ;-))!
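To see what these operators compute, you can model bitmaps as Python ints. This is an illustration under our own assumptions: we assume ~ maps to Redis BITOP NOT, which inverts every bit of the stored string, so its result also contains ids nobody has marked, up to the end of the bitmap's last byte.

```python
# Python ints as bitmaps: bit N set means user N was active in that period.
last_month = (1 << 1) | (1 << 2) | (1 << 123)  # users 1, 2, 123
this_month = (1 << 2) | (1 << 7) | (1 << 123)  # users 2, 7, 123

def ids(bitmap):
    """Positions of the set bits, i.e. the user ids in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]

print(ids(last_month & this_month))  # [2, 123]: active in both months
print(ids(last_month | this_month))  # [1, 2, 7, 123]: active in either month
print(ids(last_month ^ this_month))  # [1, 7]: active in exactly one month

# Python's ~ is unbounded, so emulate NOT over the bitmap's whole bytes,
# the way Redis BITOP NOT flips every bit of the stored string.
nbytes = (this_month.bit_length() - 1) // 8 + 1  # 16 bytes
not_this = ~this_month & ((1 << 8 * nbytes) - 1)
print(len(ids(not_this)))          # 125: every id in 0..127 except 2, 7, 123
print(ids(last_month & not_this))  # [1]: nested op, active last month only
```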
Delete events
To permanently remove marked events for any time period, use the delete() method:
ev = b.MonthEvents.from_date('active', last_month)
ev.delete()
To remove all bitmapist events, use:
b.delete_all_events()
Results of bit operations are cached by default: for 60 seconds if the operation involves a period that hasn't finished yet, and for 24 hours otherwise.
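The caching rule above amounts to a simple decision. The function below is a hypothetical sketch of that rule only; it says nothing about how bitmapist4 actually names or stores its temporary keys.

```python
import datetime

def bitop_cache_ttl(period_ends, now):
    """Return how many seconds to cache a bit operation's result (hypothetical).

    period_ends: the end of every period that takes part in the operation.
    """
    if any(end > now for end in period_ends):
        return 60            # a period is still running, so the data may change
    return 24 * 60 * 60      # every period is over, so the result is stable

now = datetime.datetime(2018, 12, 5)
nov = datetime.datetime(2018, 12, 1)  # end of November 2018
dec = datetime.datetime(2019, 1, 1)   # end of December 2018
print(bitop_cache_ttl([nov, dec], now))  # 60
print(bitop_cache_ttl([nov], now))       # 86400
```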
You may want to reset the cache explicitly:
ev = b.MonthEvents('active')
active_2months = ev & ev.prev()
# Delete the temporary AND operation
active_2months.delete()
# delete all bit operations (slow if you have many millions of keys in Redis)
b.delete_temporary_bitop_keys()
Bulk updates with transactions
If you often perform multiple updates at once, you can benefit from Redis pipelines, which bitmapist wraps as transactions.
with b.transaction():
    # mark_event needs the user id, as in the examples above
    b.mark_event('active', 123)
    b.mark_event('song:played', 123)
Migration from previous version
The API of a "bitmapist4.Bitmapist" instance is mostly compatible with the module-level API of the previous version of bitmapist. Notable changes are outlined below.
- Removed the "system" attribute for choosing the server. You are supposed to use different Bitmapist class instances instead. If you used "system" to work with pipelines, you should switch to transactions instead.
- The module-level constants bitmapist.TRACK_HOURLY and bitmapist.TRACK_UNIQUE have moved to bitmapist4.Bitmapist attributes and can be set with the class constructor.
- At the database level, bitmapist4 uses the "bitmapist_" prefix for Redis keys, while the old bitmapist uses "trackist_" for historical reasons. If you want to keep using the old database, or want to use bitmapist and bitmapist4 against the same database, you need to explicitly set the key prefix to "trackist_".
- If you use bitmapist-server, make sure you use version 1.2 or newer. That version adds support for the EXPIRE command, which is used to expire temporary bitop keys.
Replace old code that looks like this:
import bitmapist
bitmapist.setup_redis('default', 'localhost', 6380)
...
bitmapist.mark_event('active', user_id)
with something like this:
from bitmapist4 import Bitmapist
bitmapist = Bitmapist('redis://localhost:6380', key_prefix='trackist_')
...
bitmapist.mark_event('active', user_id)
Bitmapist cohort
A cohort is a group of subjects who share a defining characteristic (typically subjects who experienced a common event in a selected time period, such as birth or graduation).
You can get the cohort table using the bitmapist4.cohort.get_cohort_table() function.
Each row of this table answers the question "what part of the cohort performed the activity over time?", and the Nth cell of that row represents the number of users (absolute or in percent) who still perform the activity N days (or weeks, or months) later.
Each row unfolds the behavior of a different, similar cohort over time. The latest row displays the behavior of the cohort provided as an argument; the one above displays the behavior of a similar cohort shifted 1 day (or week, or month) back, and so on.
For example, consider the following cohort statistics:
from bitmapist4.cohort import get_cohort_table
table = get_cohort_table(b.WeekEvents('registered'), b.WeekEvents('active'))
This table shows the rate of registered users who are still active in the week of registration, then one week after, then two weeks after, and so on.
By default the table displays 20 rows.
The first row represents the statistics for the cohort of users registered 19 weeks ago. The second row represents the same statistics for users registered 18 weeks ago, and so on, until the last row shows users registered this week. Naturally, the last row contains only one cell: the number of users who registered this week AND were active this week as well.
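To make the row and column layout concrete, here is a small sketch over plain Python sets. The cohort_table function and the sample data are our own illustration; the real get_cohort_table works on bitmapist events and returns 20 rows by default.

```python
def cohort_table(cohorts, activity, use_percent=False):
    """Hypothetical cohort-table builder over plain sets of user ids.

    cohorts[i]: users who joined in period i (oldest period first).
    activity[j]: users active in period j (same indexing as cohorts).
    Returns one (cohort_size, cells) pair per row; row i has a cell for every
    period from i up to the latest one, so the table is triangular.
    """
    table = []
    for i, cohort in enumerate(cohorts):
        size = len(cohort)
        cells = []
        for j in range(i, len(activity)):
            retained = len(cohort & activity[j])  # still active j - i periods later
            if use_percent:
                cells.append(100.0 * retained / size if size else None)
            else:
                cells.append(retained)
        table.append((size, cells))
    return table

registered = [{1, 2, 3, 4}, {5, 6}, {7}]
active = [{1, 2, 3, 5}, {1, 5, 6}, {2, 5, 7}]
for size, cells in cohort_table(registered, active):
    print(size, cells)
# 4 [3, 1, 1]
# 2 [2, 1]
# 1 [1]
```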
You can then render it to HTML yourself, or export it to a pandas DataFrame with the df() method.
A sample based on user activity from http://www.gharchive.org/:
In [1]: from bitmapist4 import Bitmapist, cohort
In [2]: b = Bitmapist()
In [3]: cohort.get_cohort_table(b.WeekEvents('active'), b.WeekEvents('active'), rows=5, use_percent=False).df()
Out[3]:
             cohort       0        1        2        3        4
05 Nov 2018  137420  137420  25480.0  18358.0  21575.0  18430.0
12 Nov 2018  150975  150975  22195.0  25833.0  21165.0      NaN
19 Nov 2018  121417  121417  22477.0  15796.0      NaN      NaN
26 Nov 2018  152027  152027  25606.0      NaN      NaN      NaN
03 Dec 2018  130470  130470      NaN      NaN      NaN      NaN
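To read the absolute numbers as retention rates, divide each cell by the cohort size. Assuming that is what use_percent=True does (the description above says cells can be absolute or in percent), the first row of the sample works out as follows; this is our own arithmetic, not library output.

```python
# First row of the sample (05 Nov 2018): cohort size, then weeks 0..4.
size = 137420
cells = [137420, 25480, 18358, 21575, 18430]
percent = [round(100 * c / size, 1) for c in cells]
print(percent)  # [100.0, 18.5, 13.4, 15.7, 13.4]
```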
The dataframe can be further colorized for display in Jupyter notebooks with stylize().
Copyright: 2012-2018 by Doist Ltd.
License: BSD