Using SQL to Estimate Customer Lifetime Value (LTV) without Machine Learning
The Statsbot team estimated LTV 592 times for different clients and business models.
Customer lifetime value, or LTV, is the amount of money that a customer will spend with your business in their “lifetime,” or at least, in the portion of it that they spend in a relationship with you. It’s an important indicator of how much you can spend on acquiring new customers. For example, your customer acquisition cost (CAC) is $150, and LTV is $600. You would be able to increase the budget to get more people and grow your business. The balance between CAC and LTV allows you to check any business for market survival.
Estimating LTV is a predictive metric which depends on future purchases, based on past patterns, and allows you to see how much risk you are exposed to as a business, and how much you can afford to spend to acquire new clients. At an individual level, it also enables you to figure out who your highest-value customers are likely to be, not just now, but also in the future.
In order to understand how to estimate LTV, it is useful to first think about evaluating a customer’s lifetime value at the end of their relationship with us. So, say a customer stays with us for 12 months, and spends $50 per month.
The revenue that they generated for our business over their lifetime is then $50*12 = $600. Simple! We can consider the basic definition of LTV as a sum of payments from a specific user.
This same principle applies to the group. If we want to see the average LTV for the group we can look at total spend divided by the number of customers. When we’re talking about estimating LTV for the group, or predictive LTV, we need to take into account how long customers stay with us. To get that, we actually look at it “backwards”: we look at how many customers we lose over time, or the churn rate.
How do you calculate LTV for SaaS?
At a group level, the basic formula for estimating LTV is this:
Where ARPU is average monthly recurring revenue per user and the churn rate is the rate at which we are losing customers (so the inverse of retention).
This basic formula can be obtained from assumption:
Next Month Revenue = (Current Month Revenue) * (1 – Churn Rate)
Note: When we’re estimating customer lifetime value for SaaS we can neglect Gross Margin, because costs are minor and don’t affect the accuracy of a result. But when we calculate predictive LTV for ecommerce later in this article, we’ll include COGS in our formula.
The main limitation of the LTV formula above is that it assumes that churn is linear over time, as in: we are as likely to lose a customer between the first month of membership of our service and the second, as we are to lose them much later on. Going deeper into the nature of predictive LTV, we can say that it’s a sum of a geometric series, and linear churn doesn’t look like a straight line (as is shown in many articles about LTV).
In fact, we know that linear churn is usually not the case.
In a flexible subscription model, we lose many people at the very beginning, when they are “testing out” a service, but once they have been with us for a long time, they are less likely to leave.
Ultimately, it depends on the type of contract that exists between customers and the business: for example, annual renewals, where churn is more linear, will result in LTV that is very close to the formula above.
Services which do not have any contracts may lose a high percentage of their new customers, but then churn may slow down.
We can think of this concept graphically:
If the LTV of the group is the area under the line, we can very clearly see that the rate at which we lose customers will impact our LTV estimates very significantly. So we will need to take this into account when we are making our calculations. For a first estimate of LTV, however, it makes sense to go with the simplest formula. After that, we will add levels of complexity.
Extracting ARPU and churn using SQL
In order to make the most basic estimate of LTV, we need to look at our transaction history. Then, we can establish the average revenue per customer as well as the churn rate over the period that we are looking at. For simplicity, I’m going to look at the last year.
You can calculate ARPU in 2 steps:
month_ARPU AS(SELECT visit_month, Avg(revenue) AS ARPU FROM
Datediff(MONTH, ‘2010-01-01’, transaction_date) AS visit_month,
Sum(transaction_size) AS revenue
WHERE transaction_date > Dateadd(‘year’, -1, CURRENT_DATE)
GROUP BY 1)
The results will look like this:
In the case above, that would give us an average monthly spend of $987.33.
Calculating churn rate is a bit more complicated, as we need the percentage of people not returning from one month to the next, taking each group of customers according to the month of their first visit, and then checking if they came back or not in the following month.
The problem is always that, in a transactional database, we have customers’ visits on separate lines, rather than all on the same line.
The way to fix that problem is to join the transactional database to itself, so that we can see a customer’s behavior on one single line.
In order to isolate those who churned, we take the visits from month 1, and left join the visits from month 2 on the cust_id. The lines where visits from month 2 have a cust_id that is null are the ones where the customer has not returned.
WITH monthly_visits AS (SELECT DISTINCT Datediff(month, ‘2010-01-01’, transaction_date) AS visit_month, cust_id
transaction_date > dateadd(‘year’, -1, current_date)),
WHEN cust_type=’churn’ THEN 1
END)/count(cust_id) AS churn_rate
past_month.visit_month + interval ‘1 month’ AS current_month,
WHEN this_month.cust_id IS NULL THEN ‘churn’
END AS cust_type
LEFT JOIN monthly_visits this_month ON
AND this_month.visit_month=past_month.visit_month + interval ‘1 month’
GROUP BY 1)
Say this gives us a result of 0.1, just for simplicity.
It is a simple calculation, then, to estimate LTV: we have monthly ARPU and monthly churn, so we just divide one by the other!
$987.33/0.1 = $9873.3
As stated earlier, there are limits to this formula, mostly because it makes a series of assumptions that may not hold in the real world. The main one is that retention and churn rates are stable both across cohorts and across time.
Stability across cohorts implies that early adopters of your service act in similar ways to late adopters, while stability across time implies that customers’ likelihood of churning out is the same at the beginning of their relationship with you as it is, for example, 2 years in. Depending on how close to the truth these assumptions are, you may need to revise your LTV estimate downwards.
Link: Using SQL to Estimate Customer Lifetime Value (LTV) without Machine Learning