richhuwtaylor/fighting-churn

A project demonstrating how to predict customer churn based on product usage behaviours.

Fighting Churn with Data

This project demonstrates how the usage and subscription data of a product or service can be combined to identify user behaviours that predict customer churn. The analysis can then be used to suggest tactics to fight churn.

The SQL scripts and Python notebooks in this repo are intended for use with the simulated social network dataset 'SocialNet7', which can be generated by running the setup from the fight-churn repository for the book Fighting Churn with Data by Carl Gold.

The SQL scripts and Python notebooks of this project follow the natural order of an effort to combat churn and should be executed in sequence. Intermediate outputs are held in the output folder.

Part 1

Focuses on setting up the metrics used for churn analysis.

  • churn-calculations includes SQL scripts for calculating:
    • activity event based churn
    • MRR churn
    • net retention
    • standard account-based churn
  • insert-metrics includes a SQL script for inserting aggregated metrics for each kind of analytics event.
  • event-quality-assurance contains a notebook and SQL scripts for plotting events over time.
  • metric-quality-assurance contains a notebook and SQL scripts for spotting anomalous metric values which might indicate problems with event collection.
  • account-tenure contains scripts for calculating account tenure (the length of time for which there is a continuous subscription for a single account) and inserting this into the data warehouse as its own metric.
  • identify-active-periods contains SQL scripts for calculating the active periods (allowing for a maximum 7 day gap between subscriptions) and inserting these into an active_period table. These are used to determine whether or not a metric observation ended in churn.
  • create-churn-dataset is where the fun begins! Here, we create a dataset of 'per-month' event metric observations which forms the basis of our churn analysis.
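The standard account-based churn calculation listed above boils down to counting the accounts active at the start of a period that are gone by the end. A minimal sketch, assuming a simple subscription record shape (the `account_id`/`start_date`/`end_date` column names are illustrative, not the repo's actual schema):

```python
from datetime import date

def account_churn_rate(subscriptions, period_start, period_end):
    """Fraction of accounts active at period_start with no active
    subscription remaining at period_end."""
    def active_on(day):
        return {
            s["account_id"] for s in subscriptions
            if s["start_date"] <= day
            and (s["end_date"] is None or s["end_date"] > day)
        }
    active_at_start = active_on(period_start)
    churned = active_at_start - active_on(period_end)
    return len(churned) / len(active_at_start) if active_at_start else 0.0

subs = [
    {"account_id": 1, "start_date": date(2023, 1, 1), "end_date": None},
    {"account_id": 2, "start_date": date(2023, 1, 1), "end_date": date(2023, 1, 20)},
    {"account_id": 3, "start_date": date(2023, 1, 5), "end_date": None},
    {"account_id": 4, "start_date": date(2023, 2, 10), "end_date": None},
]
rate = account_churn_rate(subs, date(2023, 1, 10), date(2023, 2, 1))
print(rate)  # 1 of 3 accounts active on 10 Jan churned -> 0.333...
```

In practice the repo performs this in SQL against the data warehouse; the Python version just makes the set logic explicit.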
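The active-period logic (a maximum 7-day gap between subscriptions) is an interval-merging step. A hedged Python sketch of the idea behind the identify-active-periods SQL, using illustrative dates:

```python
from datetime import date, timedelta

MAX_GAP = timedelta(days=7)  # gaps up to 7 days count as continuous

def active_periods(subscriptions):
    """Merge (start, end) date pairs for one account into continuous
    active periods, allowing gaps of at most MAX_GAP."""
    spans = sorted(subscriptions)
    merged = [list(spans[0])]
    for start, end in spans[1:]:
        if start - merged[-1][1] <= MAX_GAP:
            merged[-1][1] = max(merged[-1][1], end)  # extend current period
        else:
            merged.append([start, end])              # gap too long: new period
    return [tuple(p) for p in merged]

subs = [
    (date(2023, 1, 1), date(2023, 2, 1)),
    (date(2023, 2, 5), date(2023, 3, 1)),   # 4-day gap: same active period
    (date(2023, 4, 1), date(2023, 5, 1)),   # 31-day gap: new active period
]
print(active_periods(subs))
```

An observation whose account has no active period covering the following month is the one that "ended in churn".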

Part 2

Focuses on cohort analysis and clustering metrics into groups of behaviours. These metrics and groups could then be used by the business to target interventions to stop people from churning from the product.

  • metric-summary-stats contains a notebook for checking summary statistics for all metrics (so that we can check the percentage of zero-values).
  • metric-scores contains a notebook for producing normalised ("scored") versions of each event metric.
  • metric-cohorts contains notebooks for performing cohort analysis on individual and grouped versions of our metrics.
  • metric-correlations contains a notebook for calculating and visualising the matrix of Pearson correlation coefficients between metrics.
  • group-behavioural-metrics contains notebooks for:
    • grouping metrics together using hierarchical clustering (using SciPy's linkage and fcluster) and generating a loading matrix for averaging together the scores of those groups
    • applying the loading matrix to create grouped scores.
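The "scored" versions of metrics are typically a skew-taming transform followed by standardisation. A minimal sketch, assuming a log1p transform (the repo's exact transform may differ):

```python
import numpy as np

def score_metric(values):
    """Normalise a heavily right-skewed usage metric: log-transform,
    then standardise to zero mean and unit variance."""
    logged = np.log1p(values)  # log(1 + x) keeps zero counts defined
    return (logged - logged.mean()) / logged.std()

# Illustrative per-month event counts with a long right tail.
posts_per_month = np.array([0, 1, 2, 3, 50, 120], dtype=float)
scores = score_metric(posts_per_month)
print(scores.mean(), scores.std())  # ~0.0, ~1.0
```

Scoring puts metrics with very different ranges (likes vs. unfriends, say) on a common scale before they are correlated and grouped.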
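The grouping step above can be sketched with SciPy's linkage and fcluster, which the notebooks use. The synthetic scores, the 1 − correlation distance, and the cut threshold below are illustrative assumptions, not the repo's actual values:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
n_obs = 200
# Synthetic scored metrics: three latent behaviours, correlated in pairs.
base = rng.normal(size=(n_obs, 3))
scores = np.column_stack([
    base[:, 0] + 0.1 * rng.normal(size=n_obs),  # metric 0
    base[:, 0] + 0.1 * rng.normal(size=n_obs),  # metric 1 (pairs with 0)
    base[:, 1] + 0.1 * rng.normal(size=n_obs),  # metric 2
    base[:, 1] + 0.1 * rng.normal(size=n_obs),  # metric 3 (pairs with 2)
    base[:, 2] + 0.1 * rng.normal(size=n_obs),  # metric 4 (alone)
])

# Cluster metrics using 1 - correlation as the pairwise distance.
corr = np.corrcoef(scores, rowvar=False)
dissim = 1.0 - corr
condensed = dissim[np.triu_indices(5, k=1)]  # condensed distance vector
Z = labels = None
Z = linkage(condensed, method="average")
labels = fcluster(Z, t=0.5, criterion="distance")  # cut the dendrogram

# Loading matrix: each group's score is the mean of its members' scores.
n_groups = labels.max()
loading = np.zeros((5, n_groups))
for m, g in enumerate(labels):
    loading[m, g - 1] = 1.0
loading /= loading.sum(axis=0)
grouped_scores = scores @ loading  # shape: (n_obs, n_groups)
print(labels)
```

Metrics built from the same latent behaviour end up in the same cluster, and the loading matrix averages their scores into one grouped score per behaviour.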

Part 3

Focuses on forecasting churn probability with logistic regression.

The subscription data, analytics data and the churn metrics produced from them are stored locally in a PostgreSQL database.
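The forecasting step fits a logistic regression of the churn outcome on the grouped behaviour scores. A self-contained sketch using plain NumPy gradient descent on synthetic data (the feature names, coefficients, and fitting hyperparameters are all illustrative assumptions):

```python
import numpy as np

def fit_logistic(X, y, lr=1.0, n_iter=5000):
    """Fit logistic regression (with intercept) by gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)  # mean log-loss gradient
    return w

def predict_proba(X, w):
    """Predicted churn probability for each observation."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

rng = np.random.default_rng(42)
n = 1000
# Two hypothetical grouped scores, e.g. engagement and social activity.
X = rng.normal(size=(n, 2))
# Assume higher engagement lowers churn odds (made-up coefficients).
true_logits = -1.0 - 2.0 * X[:, 0] - 0.5 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)

w = fit_logistic(X, y)
proba = predict_proba(X, w)
print(w)  # fitted intercept and coefficients, near the true values
```

The sign and size of each fitted coefficient is what makes the model actionable: a strongly negative coefficient on a behaviour score points at an intervention target.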

Future work

Some ways in which this project could be expanded are:

  • ratios of metrics - perform cohort analysis for ratios of metrics such as
    • replies per message
    • likes per post
    • posts per message
    • unfriends per new friend
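A ratio metric needs a guard for accounts with a zero denominator in the period. A minimal sketch of one such ratio (the example values and the choice to default undefined ratios to 0 are assumptions):

```python
import numpy as np

# Per-account counts for one observation period (illustrative values).
replies = np.array([10.0, 0.0, 4.0, 7.0])
messages = np.array([5.0, 0.0, 2.0, 0.0])

# Elementwise ratio; accounts with no messages get 0 instead of NaN/inf.
replies_per_message = np.divide(
    replies, messages,
    out=np.zeros_like(replies),  # value used where the condition is False
    where=messages > 0,
)
print(replies_per_message)  # [2. 0. 2. 0.]
```

Whether 0 (or a sentinel, or exclusion) is the right default for an undefined ratio depends on how the cohort analysis treats it.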
