How to understand Starbucks customers and individualize promotional offers

Christoph Emmert
Aug 24, 2021
Photo by TR on Unsplash

The capstone project of the Udacity Data Science Nanodegree involves an in-depth analysis of a data set provided by Starbucks. The data describes how users behave in response to promotional offers. The goal of the project is to analyse this data in order to generate insights for Starbucks and to derive relevant actions. The analysis steps are described in detail below.

Project Overview & Business Understanding

The data set contains simulated data that mimics customer behavior on the Starbucks rewards mobile app. Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offer during certain weeks. Not all users receive the same offer, and that is the challenge to solve with this data set. The task is to combine transaction, demographic and offer data to determine which demographic groups respond best to which offer type. This data set is a simplified version of the real Starbucks app because the underlying simulator only has one product, whereas Starbucks actually sells dozens of products. Every offer has a validity period before it expires. As an example, a BOGO offer might be valid for only 5 days. Informational offers also have a validity period even though these ads merely provide information about a product; for example, if an informational offer has 7 days of validity, we can assume the customer feels the influence of the offer for 7 days after receiving the advertisement.

  • Do user characteristics influence the effectiveness of promotional offers?
  • Are there distribution channels that are more effective than others or that can be related to a certain group of users?
  • Does the offer type play a crucial role in how users react to a promotion?

The following analysis investigates which offer strategies are best received by which customer group and whether there is a significant difference between the groups. In this way, Starbucks could gain insights into how best to promote and offer coupons or other promotional offers, and ultimately into how to increase the effectiveness of those offers. In the descriptive part I answer the questions of whether user characteristics influence the effectiveness of promotional offers and whether the offer type plays a crucial role in how users react to a promotion. On top of that I build a recommendation machine that states which offer should be sent to a particular customer to increase that customer's value, and which demographic groups respond best to which offer type.

Data Set

The data contains three files:

  • portfolio.json: offer ids and metadata about each offer (duration, type, etc.)
  • profile.json: demographic data for each customer
  • transcript.json: records for “transactions”, “offers received”, “offers viewed”, and “offers completed”

The fields of each file are briefly explained below:

portfolio.json

  • id (string) — offer id
  • offer_type (string) — the type of offer (BOGO, discount, informational)
  • difficulty (int) — minimum amount required to spend to complete an offer
  • reward (int) — reward for completing an offer
  • duration (int) — time for the offer to be open in days
  • channels (list of strings) — via which channel user received the offer (Email, Web, Mobile, Social)

profile.json

  • age (int) — age of the customer
  • became_member_on (int) — the date when customer created an account
  • gender (str) — gender of the customer
  • id (str) — customer-id
  • income (float) — customer’s income

transcript.json

  • event (str) — record description (transaction, offer received, offer viewed, offer completed)
  • person (str) — customer-id
  • time (int) — time in hours since the start of the test
  • value (dict of strings) — offer id or transaction amount

Problem Statement

The problem is to predict which offer is most likely to achieve a higher level of response from a customer, based on the demographic attributes of the customer and the attributes of the company’s offers. The response is measured through the tracked user actions ‘offer received’, ‘offer viewed’, ‘transaction’ and ‘offer completed’.

Metrics

I use a FunkSVD algorithm to build a recommendation machine. The model is evaluated with the mean squared error (MSE), and the error is also tracked over the training iterations. The error is computed for every known user-offer pair and the squared errors are summed up. The number of latent features is then the parameter used for tuning.
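
As a rough illustration of the metric, the sketch below computes the MSE over only the known user-offer pairs. The function name `mse_on_known` and the toy matrices are assumptions made for this example; unknown pairs are assumed to be stored as NaN.

```python
import numpy as np

def mse_on_known(actual, predicted):
    """Mean squared error over the entries of `actual` that are not NaN."""
    known = ~np.isnan(actual)                      # mask of observed user-offer pairs
    squared_errors = (actual[known] - predicted[known]) ** 2
    return float(squared_errors.sum() / known.sum())

# Toy data: rows are users, columns are offers, NaN marks an unknown pair.
actual = np.array([[1.0, np.nan],
                   [0.0, 1.0]])
predicted = np.array([[0.5, 0.3],
                      [0.0, 1.0]])

mse = mse_on_known(actual, predicted)
```

Only three of the four entries are known here, so the single squared error of 0.25 is divided by three.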

Data Exploration & Cleaning

Clean portfolio data set

  • create a copy of the original dataframe for further implementation
  • convert the column ‘channels’ into 4 separate columns, one per channel type
  • rename the column ‘id’ to ‘offer_id’
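
The portfolio steps above could look roughly like this in pandas. This is a sketch rather than the code from the project repository: the two offers in the toy frame are invented, and the function name `clean_portfolio` is hypothetical.

```python
import pandas as pd

# Toy stand-in for portfolio.json (illustrative values, not the real offers).
portfolio = pd.DataFrame({
    "id": ["offer_a", "offer_b"],
    "offer_type": ["bogo", "informational"],
    "difficulty": [10, 0],
    "reward": [10, 0],
    "duration": [7, 4],
    "channels": [["email", "mobile", "social"], ["web", "email"]],
})

def clean_portfolio(df):
    """Return a cleaned copy with one 0/1 column per channel and 'id' renamed."""
    cleaned = df.copy()  # leave the original frame untouched
    for channel in ["email", "web", "mobile", "social"]:
        cleaned[channel] = cleaned["channels"].apply(
            lambda chans, c=channel: int(c in chans)
        )
    return cleaned.drop(columns="channels").rename(columns={"id": "offer_id"})

portfolio_clean = clean_portfolio(portfolio)
```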

Clean profile data set

  • convert the ‘became_member_on’ column to a date type so that the dates are in a proper format
  • rename the column ‘id’ to ‘customer_id’
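
A possible version of the profile steps is sketched below. It assumes that ‘became_member_on’ stores the date as a YYYYMMDD integer; the data description only says the column is an int, so the format string is an assumption, and the two customers are made up.

```python
import pandas as pd

# Toy stand-in for profile.json.
profile = pd.DataFrame({
    "id": ["c1", "c2"],
    "became_member_on": [20170715, 20180102],  # assumed YYYYMMDD integers
})

profile_clean = profile.copy()
# Parse the integer into a proper datetime column.
profile_clean["became_member_on"] = pd.to_datetime(
    profile_clean["became_member_on"].astype(str), format="%Y%m%d"
)
profile_clean = profile_clean.rename(columns={"id": "customer_id"})
```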

Clean transcript data set

  • rename the column ‘person’ to ‘customer_id’
  • convert the column ‘event’ into 4 separate columns, one per event type
  • split the column ‘value’ into 2 separate columns (offer id and transaction amount)
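
The transcript steps might be sketched as follows. The dictionary keys `offer_id` and `amount` are assumptions for this example, since the data description only states that ‘value’ holds an offer id or a transaction amount; the four rows are toy data.

```python
import pandas as pd

# Toy stand-in for transcript.json with one row per event type.
transcript = pd.DataFrame({
    "person": ["c1", "c1", "c1", "c1"],
    "event": ["offer received", "offer viewed", "transaction", "offer completed"],
    "time": [0, 6, 12, 12],
    "value": [{"offer_id": "offer_a"}, {"offer_id": "offer_a"},
              {"amount": 12.5}, {"offer_id": "offer_a"}],
})

cleaned = transcript.rename(columns={"person": "customer_id"})

# One 0/1 indicator column per event type, with underscores in the names.
event_flags = pd.get_dummies(cleaned["event"]).astype(int)
event_flags.columns = [name.replace(" ", "_") for name in event_flags.columns]

# Split the 'value' dict into an offer id column and an amount column.
cleaned["offer_id"] = cleaned["value"].apply(lambda v: v.get("offer_id"))
cleaned["amount"] = cleaned["value"].apply(lambda v: v.get("amount"))

cleaned = pd.concat([cleaned.drop(columns=["event", "value"]), event_flags], axis=1)
```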

There are 2175 unique customers with an age above 110, accounting for 33772 records. Compared to the users younger than 110, the ratio of records to unique customers is different, which suggests that either customers older than 110 are very active and receive a lot of offers or there is an error in the data. It is difficult to decide from the data alone, but such ages are unrealistic, so for a better analysis I replace all age values over 100 with NaN.
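
Replacing the implausible ages can be done in one line with pandas. The three customers below are toy data; the threshold of 100 is the one stated above.

```python
import pandas as pd

profile = pd.DataFrame({
    "customer_id": ["c1", "c2", "c3"],
    "age": [34, 118, 67],  # toy values; 118 stands in for an implausible age
})

# Keep ages up to 100 and turn everything above into NaN.
profile["age"] = profile["age"].where(profile["age"] <= 100)
```

Aggregations such as `profile["age"].mean()` skip NaN by default, so the replaced values no longer distort the age distribution.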

Age distribution
Income distribution

User characteristics & the effectiveness of promotional offers

A logical approach to analysing promotional strategies is to evaluate whether certain characteristics of users are associated with different behaviour. This can help determine whether strategies should be adjusted depending on gender, age or other demographic characteristics. The challenge in the present data set was to consolidate individual event tracking parameters of user behaviour and prepare them for analysis. The core of the analysis was the assumption of a user journey in which a user receives an offer, views it and can complete it. By completing the individual steps, the effectiveness of the offers can be determined and interesting parameters analyzed.

First, looking at gender provides some useful information. As male users have the biggest share in the data, it can be expected that they also receive and view relatively more offers than female users and users of other genders. It is all the more surprising that at the offer completion step the relative proportions of men and women are much closer to equal. This indicates that women are more likely to complete an offer.
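
One way to compute such relative shares per journey stage is a normalized cross-tabulation. The column names `gender` and `stage` and all counts below are invented purely for illustration and are not the figures from the data set.

```python
import pandas as pd

# Toy event log: one row per (user, journey stage) observation.
journey = pd.DataFrame({
    "gender": ["M", "M", "M", "F", "M", "M", "F", "M", "F"],
    "stage": ["received"] * 4 + ["viewed"] * 3 + ["completed"] * 2,
})

# Share of each gender within every stage (each row sums to 1).
share = pd.crosstab(journey["stage"], journey["gender"], normalize="index")
share = share.reindex(["received", "viewed", "completed"])
```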

Relative action on user journey stage

Following on from this, another statistic shows that women have a higher average transaction value than men. This suggests that women spend more money per transaction than men and are therefore a potentially interesting user group when measured by average transaction value.

Average transaction value

The role of the offer type

Another relevant factor in the analysis is the type of offer. This can be a discount, a buy one, get one free (BOGO) offer, or a purely informational offer. A look at the absolute (graph) and relative figures shows that discount offers lead to more completions. Although BOGO offers are viewed more often, discounts are more effective in leading users to a purchase decision. As a result, discounts are the most appropriate type of offer to influence purchase decisions.

Data Modeling

As described earlier, it is important to detect the user journey of receiving, viewing and completing an offer so that the data analysis makes sense with regard to the promotional offers Starbucks sends out. By looping through offers and users I can compare the event tracking data of a particular user with the offers sent to that user. If the criteria are met, the result is saved in the user-item matrix. The final matrix contains information that I can use for further analysis, and also some null values where the criteria are not met. This step is computationally expensive and therefore takes some time.
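
The sketch below shows one way such a user-item matrix could be built with pandas instead of explicit loops. It is a simplification under stated assumptions: a cell is 1 if the user received, viewed and completed the offer, 0 if the offer was received but the journey was not finished, and NaN if the offer was never received. It ignores the order of events and the validity period, and the toy events are invented.

```python
import numpy as np
import pandas as pd

events = pd.DataFrame({
    "customer_id": ["c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3"],
    "offer_id":    ["o1", "o1", "o1", "o1", "o2", "o2", "o2", "o2", "o2"],
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received",
              "offer received", "offer viewed",
              "offer received", "offer viewed", "offer completed"],
})

stages = ["offer received", "offer viewed", "offer completed"]

# Count how often each stage occurs per (customer, offer) pair.
counts = (events.groupby(["customer_id", "offer_id", "event"]).size()
          .unstack("event", fill_value=0)
          .reindex(columns=stages, fill_value=0))

# Only pairs with a received offer enter the matrix.
received = counts[counts["offer received"] > 0]
success = ((received["offer viewed"] > 0)
           & (received["offer completed"] > 0)).astype(float)

# Rows are customers, columns are offers, NaN marks offers never received.
user_item = success.unstack("offer_id")
```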

Algorithm

I chose FunkSVD because the algorithm has to handle missing values inside the matrix. With it I can split the matrix into a user matrix, a set of latent features and an offer matrix. Afterwards the data set was split into training and test data. There are no restrictions on how to do this, but I decided to train on the older data and test on the latest data. The customer response to a particular offer can then be predicted and the prediction can be tested.
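
For readers unfamiliar with FunkSVD, here is a minimal NumPy sketch of the idea, not the code from the project repository. It assumes the user-offer matrix contains 1, 0 and NaN (unknown); the learning rate, iteration count, seed and toy matrix are illustrative choices.

```python
import numpy as np

def funk_svd(ratings, latent_features=5, learning_rate=0.01, iters=200, seed=0):
    """Factorize `ratings` (NaN = unknown) into user and offer matrices via SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_offers = ratings.shape
    user_mat = rng.random((n_users, latent_features))
    offer_mat = rng.random((latent_features, n_offers))
    # Only observed cells contribute to the error and the updates.
    observed = [(i, j) for i in range(n_users) for j in range(n_offers)
                if not np.isnan(ratings[i, j])]
    mse_history = []
    for _ in range(iters):
        sse = 0.0
        for i, j in observed:
            error = ratings[i, j] - user_mat[i, :] @ offer_mat[:, j]
            sse += error ** 2
            # Gradient step on both factors, using the pre-update user row.
            user_row = user_mat[i, :].copy()
            user_mat[i, :] += learning_rate * 2 * error * offer_mat[:, j]
            offer_mat[:, j] += learning_rate * 2 * error * user_row
        mse_history.append(sse / len(observed))
    return user_mat, offer_mat, mse_history

# Toy user-offer matrix: rows are users, columns are offers.
user_item = np.array([
    [1.0, 0.0, np.nan],
    [np.nan, 1.0, 0.0],
    [1.0, np.nan, 0.0],
    [0.0, 1.0, np.nan],
])

user_mat, offer_mat, mse_history = funk_svd(user_item, latent_features=2, iters=500)
predicted = user_mat @ offer_mat  # dense matrix, also filled where input was NaN
```

The product of the two factor matrices yields a score for every user-offer pair, including the pairs that were unknown in the input, which is what makes the approach usable for recommendations.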

Metrics

The metrics used to evaluate the model are:

  • Mean Squared Error
  • Iterations
  • Number of latent features for parameter tuning

For parameter tuning, I compared 5, 10 and 15 latent features. For 15 latent features the MSE is 0.0097, for 10 latent features it is 0.0172, and for 5 latent features it is 0.0298. The model with 15 latent features achieves the lowest MSE.

Evaluation

After splitting the user-offer matrix into the user matrix and the offer matrix, the model is evaluated on the test data. Those tests revealed an interesting insight. The model built with 5 latent features achieves the worst performance with an MSE of 0.0298. The model built with 15 latent features achieves the best prediction result with an MSE of 0.0097, while the model built with 10 latent features has an MSE of 0.0172. This could be caused by overfitting, or by a split of the data set that does not correctly account for offers being sent out over several days. Some errant data is produced due to this time gap: in some cases an offer is received in the training data and completed in the test data, and when this happens the offer is not counted as completed. All of this implies that, based on the MSE alone, it is difficult to decide which number of latent features to use and how best to predict the values of the matrix and make recommendations.

Recommendation Machine

It is still necessary to create a recommendation system to see what the data indicates and whether it matches the descriptive analysis made earlier. The recommendation system was completed with a prediction function that loops through the offers of a particular user and determines which offer has the highest score. This enables us to predict the response of a user when receiving offers. For example, if we predict that user xy will respond more positively to a 10% discount than to a certain BOGO offer, Starbucks should send the discount to this customer.
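
The prediction step can be sketched as a lookup of the highest score in a user's row of the predicted matrix. The user ids, offer ids, scores and the function name `recommend_offer` below are hypothetical.

```python
import numpy as np

user_ids = ["u1", "u2"]                        # hypothetical users
offer_ids = ["bogo_1", "discount_1", "info_1"]  # hypothetical offers

# Assumed output of the factorization: one predicted score per user-offer pair.
predicted = np.array([
    [0.2, 0.7, 0.1],
    [0.6, 0.4, 0.3],
])

def recommend_offer(user_id):
    """Return the offer with the highest predicted score for a known user."""
    row = predicted[user_ids.index(user_id)]
    best = int(np.argmax(row))
    return offer_ids[best], float(row[best])

best_offer, best_score = recommend_offer("u1")
```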

Findings

The objective of the recommendation system was to be able to predict how a certain user will react to a particular offer. With the system I can generate an output for every user to see which offer has the highest prediction value.

One problem that occurs is how to deal with new users, for whom there is no transactional data to rely on. I solved this by simply sending such a user the offer that performs best in general. This can lead to some ineffective offers, but it is a first step towards overcoming the cold start problem (see below).
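
A simple fallback of this kind could look as follows. The sketch assumes that "performs best in general" means the highest average completion rate across all users who received the offer; the matrix and offer ids are toy values.

```python
import numpy as np

offer_ids = ["offer_a", "offer_b", "offer_c"]  # hypothetical offers

# Toy user-offer matrix: 1 = completed, 0 = not completed, NaN = never received.
user_item = np.array([
    [1.0, 0.0, np.nan],
    [np.nan, 1.0, 0.0],
    [1.0, np.nan, 0.0],
])

# Average completion per offer, ignoring users who never received it.
completion_rate = np.nanmean(user_item, axis=0)
fallback_offer = offer_ids[int(np.argmax(completion_rate))]

def offer_for(user_id, known_users):
    """New users get the overall best offer; known users need the trained model."""
    if user_id not in known_users:
        return fallback_offer
    return None  # placeholder: a personalised prediction would go here

new_user_offer = offer_for("brand_new_user", known_users={"u1", "u2"})
```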

Also of primary interest is how customers grouped by gender react to different offer types. It can be seen that men respond better to discount offers, whereas other genders respond more or less similarly to the different offer types, although discounts tend to be more successful. Compared to the descriptive findings, where discounts performed better in general, this gives some more insight into how to target the offer types based on gender.

For Starbucks it is of great interest to evaluate how each offer performed and what gain was made with a certain offer. The analysis shows that offer 7 (spend 10 dollars, get 2 dollars off within 10 days) has the highest gain. With this knowledge it should be possible to make better-informed decisions on which promotional strategy to choose.

Improvement

With this kind of recommendation system we face a problem when creating predictions for new users, as we have no data or information about them that we could use (cold start problem). A possible solution is a rank-based recommendation where a new user simply gets the best-performing offer based on historic data from all users. A further potential improvement is to split the training and test sets more precisely. For this it is necessary to keep track of each offer sent and check whether the user completed it or not. In this way, we can get a more accurate user-offer matrix and also a better-trained model. At the same time, splitting the data set this way requires a longer and more complex algorithm.

Conclusions

The implication from the descriptive analysis that women have a higher completion rate held true, which makes them a potentially interesting target group for promotional offers. A small difference can also be observed in the offer type insights: men in particular can be reached more successfully by offering a discount. The descriptive analysis showed that discounts have a higher completion rate overall, whereas the recommendation system indicates that BOGO and discount offers have more or less the same chance of success among women and other genders. Only within the male user group do discounts perform clearly better. Looking at the distribution channels, the data leads to the assumption that social performs best. Among the ten different offers, the one that performs best is “buy 10 dollars get 2 dollars off within 10 days”.

After creating a user-item matrix, implementing an SVD algorithm, and training and testing the model, some more confidence in the data is gained. Combined with the descriptive analysis, some useful insights could be detected. With this approach Starbucks gains a better understanding of how to use its promotional offers and is also able to calculate the success rate per user.

For further information or detailed technical explanations, visit the GitHub repo. To engage in further discussions about data science, contact me on LinkedIn.
