Code from the book Fighting Churn With Data

fight-churn

This is code for the book "Fighting Churn With Data: Science and strategy for keeping your customers"; the book serves as a detailed guide to the code. You can get more information at:

This page contains the most up-to-date setup instructions, as well as information about some extra code that is mentioned in the book.

WARNING TO PYPI USERS: None of the links or images in this document work on the pypi.org website! To use the links in this README view it on github.

Quick Start Setup Instructions

Section 1 covers installing the pre-requisite software.

0 Getting Started
1 Prerequisites
1.1 Python 3
1.2 Postgres
1.2.1 Database Setup

Section 2 provides the fastest way to get started with the code, from a command line.

2 Quickstart With Python Package
2.1 Create a virtual environment
2.2 Install the fightchurn package
2.3 Create a directory for output
2.4 Start the Python virtual environment
2.5 Import the run_churn_listing module
2.6 Set the churn environment variables
2.7 Run the data simulation
2.8 Run code listings

Advanced Setup Instructions

The following are the "advanced" setup instructions for people who are already accustomed to running Python in either a Jupyter Notebook or an Integrated Development Environment (IDE).

  • Jupyter Notebook Setup
  • Developers IDE Setup. Author's recommendation: If you are really going to use this code for anything, it is recommended that you take the time to do the IDE setup - this allows you to step through the code with a debugger, which is a great way to learn about it. Not to mention, if you are going to modify the code this is really required.

There are also some extra code parts, which are partially documented on this page.


0 Getting Started

Note from the author: These are basic startup/setup instructions that I think should work for most people using either shell Python, Jupyter Notebook, or an IDE, on either Mac or Windows. I also want to apologize in advance because I am neither an expert in PostgreSQL nor an expert in Python, but I am about to give a lot of advice on how to set up and use these technologies - if you find I am not doing things the best way, or just not how you would have done it, please be patient. Same goes for the rudimentary state of some of the code - I'm doing the best I can with the time I've got. If you want to make things better please help out! :)

Before you can load data or run the code you have to do some setup on your system. If you have never done this before it may seem like a lot of work, and it kind of is, but this amount of setup is routine when you begin to work with a new technology. If you already know how to do this sort of thing feel free to ignore my instructions, which are primarily written for beginners using GUI tools to get up and running.



1 Prerequisites

Required:

  • 12 GB free disk space for the simulation data
  • Python 3 and the required packages (requirements.txt)
  • PostgreSQL

Recommended:

  • PgAdmin - for PostgreSQL database setup
  • PyCharm (Community Edition) OR Jupyter Notebooks - for running Python programs

1.1 Python 3

If you need help installing Python 3, you can refer to this page for Mac:

Another good alternative for Mac is using Homebrew:

For Windows there are resources here:

(If you are on Linux I'm going to assume you know how to install your own Python...)

Note about Python and Package versions

Nearly all of the code for Fighting Churn With Data should run with any Python 3.x version and all common package versions.

The only packages used that have version dependencies are the xgboost and shap packages introduced in the later listings of chapter 9. These packages contain recent updates and may only be compatible with versions of Python later than 3.9, at the time of this writing. Note xgboost has other installation issues on Windows and Mac platforms, as described below in the section "Installing Virtual Environment and Requirements".

Please create an issue in the repository if you find any other instances of package or version incompatibilities.
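
If you want a quick check of the version note above before you install anything, something like the following sketch may help. It only uses the Python standard library. The function name check_setup is made up for this example, and treating 3.9 as the minimum version is an assumption based on the note above, not a tested requirement.

```python
import importlib.util
import sys


def check_setup(min_version=(3, 9), optional_packages=("xgboost", "shap")):
    """Report whether the interpreter and the optional packages look usable.

    min_version is an assumption taken from the version note in this README.
    """
    report = {"python_ok": sys.version_info[:2] >= tuple(min_version)}
    for name in optional_packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_setup().items():
        print(f"{item}: {'ok' if ok else 'missing or too old'}")
```

A "missing" result for xgboost or shap only matters for the later listings of chapter 9; everything else should still run.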


1.2 PostgreSQL

To install PostgreSQL for Mac, follow these instructions:

To install PostgreSQL for Windows, use:

(That page has a different Mac installer if you don't like Postgresapp.)

For both Mac and Windows, I also recommend installing pgAdmin to make it easier to import and export data, and run ad hoc queries. Follow the instructions here:

For Mac you should make sure Postgres is running - here's what it looks like if you installed with PostgresApp on a Mac:

Postgres Running on Mac

For Windows, I have not yet figured out how to make sure Postgres is running, but I also have not yet had a problem with it not running (please notify me if you have something to contribute on either subject.)
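
One platform-independent way to get a hint about whether Postgres is running is to test whether anything is listening on the PostgreSQL port. The sketch below assumes the server is on your own machine and uses the default PostgreSQL port, 5432; if you chose a different port during installation, pass that instead. The function name postgres_port_open is made up for this example, and an open port does not prove that the listener is actually Postgres.

```python
import socket


def postgres_port_open(host="127.0.0.1", port=5432, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises an OSError subclass when the connection
        # is refused or times out
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("PostgreSQL port reachable:", postgres_port_open())
```

If this prints False, the server is probably not running (or it is listening on a different port).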



1.2.1 Creating a Database

The first thing you might need to do is connect to your local server (when I do this on Mac this is necessary; on Windows, the connection to the localhost server was already present by default.) If you don't already see localhost under the Servers tree in PgAdmin, control (right) click on the root of the Servers tree and select Create:

Connect to Server in PgAdmin

A dialog will open. Assuming you are working on a PostgreSQL database installed on your own computer, in the first tab (General) name your connection localhost, and on the second tab (Connection) enter the address 127.0.0.1 (which is the IP address to connect to a database locally.) You should also enter your user name and password. Your dialog should look like the one below - then hit Save.

Connect to Server in PgAdmin

Next you need to create a new database to hold all of the churn data schemas you create. You will probably create multiple schemas as you work on the examples in the book and/or your own data so this will help keep these organized. An easy way to create a database is in PgAdmin - right click on the Databases node under localhost in the tree:

Create Database in PgAdmin

And enter the name of the new database (I used churn, but you can use whatever you want - just use the same name when you set the churn environment variables in section 2.6 below):

Name the database



2 Quickstart With Python Package


2.1 Create a virtual environment

First, you should make a new virtual environment in which to install the Fight Churn code...

python3 -m venv <python_environment_name>

Use whatever you like for <python_environment_name> ; if you don't normally make python environments, call it "py_env". That command doesn't have any output.

Next, you need to activate your environment (this example assumes you called it "py_env"):

~ user$ source py_env/bin/activate
(py_env) ~ user$ 
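
If you are ever unsure whether the environment is really active, you can ask Python itself. The following is a minimal sketch using only the standard library; the function name in_virtual_environment is made up for this example. It relies on the fact that inside an environment created with venv, sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtual_environment():
    """Return True when the interpreter is running inside a venv."""
    # Inside a venv, sys.prefix points at the environment folder while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should end with the name you chose for your environment.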

2.2 Install the fightchurn package

pip install fightchurn

This will lead to a lot of outputs, starting with something like this:

(py_env) MacBook-Yo:~ carl$ pip install fightchurn
Collecting fightchurn
  Using cached fightchurn-0.3.5-py3-none-any.whl (99 kB)
Collecting docutils==0.17.1
  Using cached docutils-0.17.1-py2.py3-none-any.whl (575 kB)
...

Your output probably will not say "Using cached..." unless you have already installed this before, so don't worry if yours looks a bit different from what's shown above. Regardless, after several minutes (depending on your system and internet connection) you should see this:

Successfully installed Pillow-8.1.2 Pygments-2.9.0 SQLAlchemy-1.4.3 bleach-3.3.0 build-0.4.0 certifi-2020.12.5 chardet-4.0.0 cloudpickle-1.6.0 colorama-0.4.4 cycler-0.10.0 docutils-0.17.1 fightchurn-0.3.5 greenlet-1.0.0 idna-2.10 importlib-metadata-4.5.0 joblib-1.0.1 keyring-23.0.1 kiwisolver-1.3.1 llvmlite-0.36.0 matplotlib-3.4.0 numba-0.53.1 numpy-1.20.2 packaging-20.9 pandas-1.2.3 patsy-0.5.1 pep517-0.10.0 pkginfo-1.7.0 postgres-3.0.0 psycopg2-binary-2.8.6 psycopg2-pool-1.1 pyparsing-2.4.7 python-dateutil-2.8.1 pytz-2021.1 readme-renderer-29.0 requests-2.25.1 requests-toolbelt-0.9.1 rfc3986-1.5.0 scikit-learn-0.24.1 scipy-1.6.2 shap-0.39.0 six-1.15.0 slicer-0.0.7 statsmodels-0.12.2 threadpoolctl-2.1.0 toml-0.10.2 tqdm-4.59.0 twine-3.4.1 urllib3-1.26.4 webencodings-0.5.1 xgboost-1.3.3 zipp-3.4.1

Windows XG-Boost Warning: At the time of this writing there have been problems reported installing the xgboost package on Windows: https://discuss.xgboost.ai/t/pip-install-xgboost-isnt-working-on-windows-x64/57. If you are unable to install xgboost with pip, then you can try to install using the instructions outlined in that link. Alternatively, you can remove that requirement - note that you can still run all the code in the book except for the 2nd half of chapter 9 without xgboost.

Mac XG-Boost Warning: At the time of this writing there have been problems reported installing the xgboost package on recent versions of Mac OS: https://stackoverflow.com/questions/61971851/getting-this-simple-problem-while-importing-xgboost-on-jupyter-notebook If you are unable to install xgboost with pip, try performing the Lib OMP installation described above first. Alternatively, you can remove the xgboost requirement - note that you can still run all the code in the book except for the 2nd half of chapter 9 without xgboost.


2.3 Create a directory for output

You should make a local folder to store your output. On linux that would look as follows:

mkdir my_churn_output_folder

Naturally you can make your folder using the GUI if you are on Mac or Windows.
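
If you would rather create the folder the same way on every platform, you can also do it from Python. This is just a sketch with the standard library; the function name make_output_folder is made up for this example, and the folder name is the one used above.

```python
from pathlib import Path


def make_output_folder(name="my_churn_output_folder", parent="."):
    """Create the output folder (if needed) and return its absolute path."""
    folder = Path(parent) / name
    # exist_ok=True makes this safe to run more than once
    folder.mkdir(parents=True, exist_ok=True)
    return folder.resolve()


# Usage: creates the folder under the current directory and shows where it is
# print(make_output_folder())
```

The absolute path that this returns is what you will need later as the output folder setting.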

2.4 Start the Python virtual environment

Next you should start your Python environment, and enter a python shell:

source py_env/bin/activate
python

You should see something like the following...

(py_env) :~ user$ python
Python 3.9.6 (default, Jun 29 2021, 05:25:02) 
[Clang 12.0.5 (clang-1205.0.22.9)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 

2.5 Import the run_churn_listing module

Next, you will import the run_churn_listing module from the fightchurn package, which you will use to run everything:

from fightchurn import run_churn_listing

That has no output, but it might take a few moments.
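
If the import fails, or you want to know which version of the package you have, you can ask the standard library rather than fightchurn itself. This is a sketch that assumes Python 3.8 or later (for importlib.metadata); the function name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package="fightchurn"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("fightchurn version:", installed_version())
```

A result of None usually means the package was installed into a different environment than the one you are running.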

2.6 Set the churn environment variables

Now you need to set a few environment variables. These are:

  1. The database name : 'churn' in the example below
  2. The username for the database
  3. The password for the database
  4. A local folder where outputs can be written

run_churn_listing.set_churn_environment('churn','user','password','/path/to/my_churn_output_folder')

This will print out a confirmation line as follows:

Setting Environment Variables user=carl for db=churn
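
If you would rather not type your database password in plain text, you could collect the four settings with a small helper first. This is only a sketch: churn_environment_args is a made-up helper, not part of the fightchurn package. It checks that the output folder exists, prompts for the password with the standard getpass module, and returns the values in the same order as the call shown above.

```python
import getpass
from pathlib import Path


def churn_environment_args(db, user, output_dir, password=None):
    """Collect the four settings in the order shown in section 2.6.

    Hypothetical helper for illustration; not part of fightchurn.
    """
    folder = Path(output_dir).expanduser()
    if not folder.is_dir():
        raise FileNotFoundError(f"output folder does not exist: {folder}")
    if password is None:
        # Prompt without echoing the password to the terminal
        password = getpass.getpass(f"Password for database user {user}: ")
    return (db, user, password, str(folder))


# Usage in the Python shell, after the import from section 2.5:
# args = churn_environment_args('churn', 'user', '/path/to/my_churn_output_folder')
# run_churn_listing.set_churn_environment(*args)
```

The early folder check is there because a mistyped path is otherwise easy to miss until a listing tries to write output.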

2.7 Run the data simulation

Next, you need to write some data to the database for the code to run against - no data is provided with the code distribution. Use the following command:

run_churn_listing.run_standard_simulation(init_customers=10000)

The example is for a standard simulation of 10,000 customers. If you want to speed things up you can run it for 1000 customers and things will still work okay - the results will just be a bit more noisy and random.

You will see output as follows...

Creating schema socialnet7 (if not exists)...
Creating table event (if not exists)
Creating table subscription (if not exists)
Creating table event_type (if not exists)
Creating table metric (if not exists)
Creating table metric_name (if not exists)
Creating table active_period (if not exists)
Creating table observation (if not exists)
Creating table active_week (if not exists)
Creating table account (if not exists)

Creating 2000 initial customers for month of 2020-01-01
Simulated customer 0/2000: 2 subscriptions & 100 events
Simulated customer 100/2000: 448 subscriptions & 154,047 events
Simulated customer 200/2000: 872 subscriptions & 282,882 events
Simulated customer 300/2000: 1,324 subscriptions & 426,866 events
Simulated customer 400/2000: 1,767 subscriptions & 557,543 events
...

This will continue for a while - maybe 15-30 minutes if you ran the full 10,000 customer simulation.
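
Once the simulation finishes, you may want to confirm that data actually landed in the database. The sketch below counts rows in one table through any Python DB-API connection. count_rows is a made-up helper, and the connection values in the usage comment (database churn, the placeholder user and password, host 127.0.0.1) are assumptions taken from the earlier examples, so substitute your own.

```python
def count_rows(conn, schema, table):
    """Count the rows in schema.table using any DB-API 2.0 connection."""
    for name in (schema, table):
        # Identifiers cannot be passed as query parameters, so reject
        # anything that is not a plain name before building the SQL.
        if not name.isidentifier():
            raise ValueError(f"unexpected identifier: {name!r}")
    cur = conn.cursor()
    try:
        cur.execute(f'select count(*) from "{schema}"."{table}"')
        return cur.fetchone()[0]
    finally:
        cur.close()


# Usage sketch with psycopg2 (installed along with fightchurn, per the
# package list in section 2.2); the connection values are placeholders:
# import psycopg2
# conn = psycopg2.connect(dbname='churn', user='user',
#                         password='password', host='127.0.0.1')
# for table in ('account', 'subscription', 'event'):
#     print(table, count_rows(conn, 'socialnet7', table))
```

Non-zero counts for the account, subscription and event tables in the socialnet7 schema suggest the simulation wrote its data.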

2.8 Run code listings

Now you are ready to run some code from the book! To do that you use the run_listing function of the run_churn_listing module that you previously imported. For example, the following is chapter 2, listing 2:

run_churn_listing.run_listing(2,2)

You should see output like this:

Running chapter 2 listing 2 churn_rate on schema socialnet7
SQL:
----------
set search_path = 'socialnet7'; with 
date_range as ( 
	select  '2020-03-01'::date as start_date, '2020-04-01'::date as end_date
), 
start_accounts as  
(
	select distinct account_id        
	from subscription s inner join date_range d on
		s.start_date <= d.start_date   
		and (s.end_date > d.start_date or s.end_date is null)
),
...

----------
RESULT:
Record(churn_rate=0.0570875665215288, retention_rate=0.942912433478471, n_start=2067, n_churn=118)
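
As a quick sanity check on a result like this, the four numbers in the RESULT line are consistent with each other: the churn rate equals n_churn divided by n_start, and the retention rate is one minus the churn rate. The sketch below only reproduces that arithmetic from the values printed above; it is not the SQL from the listing, which is truncated here, and the function name churn_and_retention is made up for this example.

```python
def churn_and_retention(n_start, n_churn):
    """Return (churn_rate, retention_rate) from the two account counts."""
    if n_start <= 0:
        raise ValueError("n_start must be positive")
    churn_rate = n_churn / n_start
    return churn_rate, 1.0 - churn_rate


# Values taken from the RESULT line above
churn, retention = churn_and_retention(n_start=2067, n_churn=118)
print(f"churn_rate={churn:.4f} retention_rate={retention:.4f}")
# prints: churn_rate=0.0571 retention_rate=0.9429
```

Your own numbers will differ because the simulation is random, but the same relationship should hold.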

Explaining what you are seeing there is beyond the scope of this README - that's what the book is about! But if you have gotten this far, then you have completed all the setup and you are ready to follow along with the book (or videos, however you are learning the code...)

Advanced Setup Instructions

The following are the "advanced" setup instructions for people who are already accustomed to running Python in either a Jupyter Notebook or an Integrated Development Environment (IDE).

Authors

License

This project is licensed under the MIT License - see the LICENSE.md file for details
