richhuwtaylor/fighting-churn

A project demonstrating how to predict customer churn based on product usage behaviours.

Fighting Churn with Data

This project demonstrates how the usage and subscription data of a product or service can be combined to identify user behaviours that predict customer churn. The analysis can then be used to suggest tactics to fight churn.

The SQL scripts and Python notebooks in this repo are intended for use with the simulated social network dataset 'SocialNet7', which can be generated by running the setup from the fight-churn repository for the book Fighting Churn with Data by Carl Gold.

The SQL scripts and Python notebooks of this project follow the natural order of an effort to combat churn and should be executed in sequence. Intermediate outputs are held in the output folder.

Part 1

Focuses on setting up the metrics used for churn analysis.

  • churn-calculations includes SQL scripts for calculating:
    • activity event based churn
    • MRR churn
    • net retention
    • standard account-based churn
  • insert-metrics includes a SQL script for inserting aggregated metrics for each kind of analytics event.
  • event-quality-assurance contains a notebook and SQL scripts for plotting events over time.
  • metric-quality-assurance contains a notebook and SQL scripts for spotting anomalous metric values which might indicate problems with event collection.
  • account-tenure contains scripts for calculating account tenure (the length of time for which there is a continuous subscription for a single account) and inserting this into the data warehouse as its own metric.
  • identify-active-periods contains SQL scripts for calculating the active periods (allowing for a maximum 7 day gap between subscriptions) and inserting these into an active_period table. These are used to determine whether or not a metric observation ended in churn.
  • create-churn-dataset is where the fun begins! Here, we create a dataset of 'per-month' event metric observations which forms the basis of our churn analysis.
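The standard account-based churn calculation listed above boils down to counting the accounts active at the start of a period that are gone by the end. A minimal sketch, assuming a simple subscription record shape (the `account_id`/`start_date`/`end_date` column names are illustrative, not the repo's actual schema):

```python
from datetime import date

def account_churn_rate(subscriptions, period_start, period_end):
    """Fraction of accounts active at period_start with no active
    subscription remaining at period_end."""
    def active_on(day):
        return {
            s["account_id"] for s in subscriptions
            if s["start_date"] <= day
            and (s["end_date"] is None or s["end_date"] > day)
        }
    active_at_start = active_on(period_start)
    churned = active_at_start - active_on(period_end)
    return len(churned) / len(active_at_start) if active_at_start else 0.0

subs = [
    {"account_id": 1, "start_date": date(2023, 1, 1), "end_date": None},
    {"account_id": 2, "start_date": date(2023, 1, 1), "end_date": date(2023, 1, 20)},
    {"account_id": 3, "start_date": date(2023, 1, 5), "end_date": None},
    {"account_id": 4, "start_date": date(2023, 2, 10), "end_date": None},
]
rate = account_churn_rate(subs, date(2023, 1, 10), date(2023, 2, 1))
print(rate)  # 1 of 3 accounts active on 10 Jan churned -> 0.333...
```

In practice the repo performs this in SQL against the data warehouse; the Python version just makes the set logic explicit.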
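The active-period logic (a maximum 7-day gap between subscriptions) is an interval-merging step. A hedged Python sketch of the idea behind the identify-active-periods SQL, using illustrative dates:

```python
from datetime import date, timedelta

MAX_GAP = timedelta(days=7)  # gaps up to 7 days count as continuous

def active_periods(subscriptions):
    """Merge (start, end) date pairs for one account into continuous
    active periods, allowing gaps of at most MAX_GAP."""
    spans = sorted(subscriptions)
    merged = [list(spans[0])]
    for start, end in spans[1:]:
        if start - merged[-1][1] <= MAX_GAP:
            merged[-1][1] = max(merged[-1][1], end)  # extend current period
        else:
            merged.append([start, end])              # gap too long: new period
    return [tuple(p) for p in merged]

subs = [
    (date(2023, 1, 1), date(2023, 2, 1)),
    (date(2023, 2, 5), date(2023, 3, 1)),   # 4-day gap: same active period
    (date(2023, 4, 1), date(2023, 5, 1)),   # 31-day gap: new active period
]
print(active_periods(subs))
```

An observation whose account has no active period covering the following month is the one that "ended in churn".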

Part 2

Focuses on cohort analysis and clustering metrics into groups of behaviours. These metrics and groups could then be used by the business to target interventions to stop people from churning from the product.

  • metric-summary-stats contains a notebook for checking summary statistics for all metrics (so that we can check the percentage of zero-values).
  • metric-scores contains a notebook for producing normalised ("scored") versions of each event metric.
  • metric-cohorts contains notebooks for performing cohort analysis on individual and grouped versions of our metrics.
  • metric-correlations contains a notebook for calculating and visualising the matrix of Pearson correlation coefficients between metrics.
  • group-behavioural-metrics contains notebooks for:
    • grouping metrics together using hierarchical clustering (using SciPy's linkage and fcluster) and generating a loading matrix for averaging together the scores of those groups
    • applying the loading matrix to create grouped scores.
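The "scored" versions of metrics are typically a skew-taming transform followed by standardisation. A minimal sketch, assuming a log1p transform (the repo's exact transform may differ):

```python
import numpy as np

def score_metric(values):
    """Normalise a heavily right-skewed usage metric: log-transform,
    then standardise to zero mean and unit variance."""
    logged = np.log1p(values)  # log(1 + x) keeps zero counts defined
    return (logged - logged.mean()) / logged.std()

# Illustrative per-month event counts with a long right tail.
posts_per_month = np.array([0, 1, 2, 3, 50, 120], dtype=float)
scores = score_metric(posts_per_month)
print(scores.mean(), scores.std())  # ~0.0, ~1.0
```

Scoring puts metrics with very different ranges (likes vs. unfriends, say) on a common scale before they are correlated and grouped.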
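The grouping step above can be sketched with SciPy's linkage and fcluster, which the notebooks use. The synthetic scores, the 1 − correlation distance, and the cut threshold below are illustrative assumptions, not the repo's actual values:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n_obs = 200
# Synthetic scored metrics: three latent behaviours, correlated in pairs.
base = rng.normal(size=(n_obs, 3))
scores = np.column_stack([
    base[:, 0] + 0.1 * rng.normal(size=n_obs),  # metric 0
    base[:, 0] + 0.1 * rng.normal(size=n_obs),  # metric 1 (pairs with 0)
    base[:, 1] + 0.1 * rng.normal(size=n_obs),  # metric 2
    base[:, 1] + 0.1 * rng.normal(size=n_obs),  # metric 3 (pairs with 2)
    base[:, 2] + 0.1 * rng.normal(size=n_obs),  # metric 4 (alone)
])

# Cluster metrics using 1 - correlation as the pairwise distance.
corr = np.corrcoef(scores, rowvar=False)
dissim = 1.0 - corr
condensed = dissim[np.triu_indices(5, k=1)]  # condensed distance vector
Z = labels = None
Z = linkage(condensed, method="average")
labels = fcluster(Z, t=0.5, criterion="distance")  # cut the dendrogram

# Loading matrix: each group's score is the mean of its members' scores.
n_groups = labels.max()
loading = np.zeros((5, n_groups))
for m, g in enumerate(labels):
    loading[m, g - 1] = 1.0
loading /= loading.sum(axis=0)
grouped_scores = scores @ loading  # shape: (n_obs, n_groups)
print(labels)
```

Metrics built from the same latent behaviour end up in the same cluster, and the loading matrix averages their scores into one grouped score per behaviour.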

Part 3

Focuses on forecasting churn probability with logistic regression.

The subscription data, analytics data and the churn metrics produced from them are stored locally in a PostgreSQL database.
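The forecasting step fits a logistic regression of the churn outcome on the grouped behaviour scores. A self-contained sketch using plain NumPy gradient descent on synthetic data (the feature names, coefficients, and fitting hyperparameters are all illustrative assumptions):

```python
import numpy as np

def fit_logistic(X, y, lr=1.0, n_iter=5000):
    """Fit logistic regression (with intercept) by gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # mean log-loss gradient
    return w

def predict_proba(X, w):
    """Predicted churn probability for each observation."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

rng = np.random.default_rng(42)
n = 1000
# Two hypothetical grouped scores, e.g. engagement and social activity.
X = rng.normal(size=(n, 2))
# Assume higher engagement lowers churn odds (made-up coefficients).
true_logits = -1.0 - 2.0 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)

w = fit_logistic(X, y)
proba = predict_proba(X, w)
print(w)  # fitted intercept and coefficients, near the true values
```

The sign and size of each fitted coefficient is what makes the model actionable: a strongly negative coefficient on a behaviour score points at an intervention target.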

Future work

Some ways in which this project could be expanded are:

  • ratios of metrics - perform cohort analysis for ratios of metrics such as
    • replies per message
    • likes per post
    • posts per message
    • unfriends per new friend
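A ratio metric needs a guard for accounts with a zero denominator in the period. A minimal sketch of one such ratio (the example values and the choice to default undefined ratios to 0 are assumptions):

```python
import numpy as np

# Per-account counts for one observation period (illustrative values).
replies = np.array([10.0, 0.0, 4.0, 7.0])
messages = np.array([5.0, 0.0, 2.0, 0.0])

# Elementwise ratio; accounts with no messages get 0 instead of NaN/inf.
replies_per_message = np.divide(
    replies, messages,
    out=np.zeros_like(replies),  # value used where the condition is False
    where=messages > 0,
)
print(replies_per_message)  # [2. 0. 2. 0.]
```

Whether 0 (or a sentinel, or exclusion) is the right default for an undefined ratio depends on how the cohort analysis treats it.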
