Assigning Respondents to Clusters/Segments in New Data Files in Displayr
Once you have created segments or clusters, it is often useful to assign people in other data sets to the segments (this is also known as segment tagging and scoring). For example, you may want to tag a customer database with predicted segment memberships. Or, you may want to assign respondents in a tracker to segments. When doing this, there are two basic approaches:
- You can assign people to segments in the new data file using the same variables as used when forming the segments, or,
- You can predict segment membership based on a different set of variables.
Before proceeding with either approach, it is a good idea to take a copy of your project and make your changes in the copy.
The basic principle is the same in every case: you create a model in one data set and then import a revised data set, making sure that the model does not update to reflect the new data. You then use the existing model to make predictions in the new data set, with the variables from the new data as inputs.
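The predict calls later in this article refer to variables by name, so the new data file needs variables with the same names as the model inputs. A minimal sketch of checking this before updating is shown here. It assumes the new file has been read into an ordinary R data frame; the helper missing_inputs and the data are made up for illustration, and only the variable names come from the k-means example in this article.

```r
# Hypothetical helper: report model inputs that are absent from a new data file.
missing_inputs <- function(required, new_data) {
  setdiff(required, names(new_data))
}

# Variable names taken from the k-means example in this article.
required <- c("understand", "shop", "key", "value", "interested")

# Made-up stand-in for a new data file that lacks the 'value' variable.
new_data <- data.frame(understand = c(1, 4), shop = c(2, 5),
                       key = c(3, 3), interested = c(5, 1))

absent <- missing_inputs(required, new_data)
if (length(absent) > 0) {
  message("Missing from the new file: ", paste(absent, collapse = ", "))
}
```

If anything is reported as missing, it is worth fixing the new file before pressing Update, rather than editing the model.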
Assigning people to segments in the new data file using the same variables
The best way to do this depends on whether the segments were formed using latent class analysis (Insert > Groups/Segments (Analysis)) or k-means cluster analysis (Insert > More (Analysis) > Segment > K-Means Cluster Analysis).
Segments formed using latent class analysis
A three-segment latent class solution is shown below. It is based on a sample of 400 respondents. To allocate people in a new data file using these segments:
- Click on the data set in the Data Tree.
- Press Update in the Object Inspector and select the new data file. You will see some warnings. Ignore them (i.e., do not follow the suggestion about modifying the segments, as this will re-run the segments on a new data file).
- The Groups/Segments ... variable, which is in the Data Tree, has now automatically been updated, allocating people in the new data file to the segments.
Segments formed using k-means
A three-cluster k-means solution is shown above. To allocate people in a new data file using these segments:
- Click on the k-means solution and make sure that Automatic is not checked (this option is in Inputs > R Code in the Object Inspector).
- Take a copy of line 2 of the code. In my example, it looks like this:
kmeans = KMeans(data.frame(understand, shop, key, value, interested),
- Click on the data set in the Data Tree.
- Press Update in the Object Inspector and select the new data file.
- From the Ribbon, select Insert > R (Variables) > Numeric Variable.
- In the R Code box in the Object Inspector, paste in the copied code and modify it so that it looks like the line below. The parts to retain from your pasted code are the model name (kmeans, or whatever it has been renamed to) and the variable names:
predict(kmeans, newdata = data.frame(understand, shop, key, value, interested))
- Give the variable an appropriate Name and Label.
- Change the Structure of the variable to Mutually exclusive categories (Nominal) (this setting is found in the Object Inspector under Properties > Inputs).
- Press Labels (below DATA VALUES) and enter any labels you desire and press OK.
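To see what this allocation amounts to, here is a minimal sketch in base R. It is not Displayr's implementation: it swaps in stats::kmeans, which has no predict method, and assumes that allocation means assigning each new case to the nearest fitted cluster center by Euclidean distance. Displayr's KMeans may treat missing values, weights, or scaling differently. The helper assign_to_clusters and all of the data are made up for illustration.

```r
set.seed(123)  # make the made-up data and the k-means start reproducible

# Made-up stand-in for the original file: two well-separated groups.
old_data <- data.frame(
  understand = c(rnorm(20, mean = 1, sd = 0.3), rnorm(20, mean = 5, sd = 0.3)),
  shop       = c(rnorm(20, mean = 1, sd = 0.3), rnorm(20, mean = 5, sd = 0.3))
)

# Fit once on the original data; the centers are then treated as fixed.
fit <- kmeans(old_data, centers = 2, nstart = 10)

# Hypothetical helper: index of the nearest fitted center for each new row.
assign_to_clusters <- function(centers, new_data) {
  # Use the same variables, in the same order, as the fitted centers.
  x <- as.matrix(new_data[, colnames(centers), drop = FALSE])
  # Squared Euclidean distance from every row to every center.
  d2 <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(x, 2, centers[k, ])^2)
  })
  # Column with the smallest distance, one per row.
  max.col(-matrix(d2, nrow = nrow(x)), ties.method = "first")
}

# Made-up stand-in for the new file.
new_data <- data.frame(understand = c(0.9, 5.2, 1.1), shop = c(1.2, 4.8, 0.7))
new_cluster <- assign_to_clusters(fit$centers, new_data)
```

The point of the sketch is that the centers come only from the original data; the new cases are scored against them and never change them.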
Predict segment membership using a different set of variables
In this scenario, segments have been formed and then a predictive model is used to predict segment membership on either:
- A completely different set of variables (e.g., demographics, or some other data available in a customer database).
- A subset of the variables used to create the segments. (Tip: if you are building a predictive model based on exactly the same variables as were used to create the segments, you are making a mistake and should instead use the approach described in the previous section.)
The output above is from a multinomial logit (MNL) model (Insert > More (Analysis) > Regression > Multinomial Logit), predicting segment membership based on firmographics. The goal is now to predict segment membership in a new data file that contains the same predictor variables.
- Click on the model output and make sure that Automatic is not checked (this option is in Inputs > R Code in the Object Inspector).
- Take a copy of the line of code that looks similar to this (with different variable names):
glm = Regression(segmentsGXVYS ~ q1 + q2 + q3 + q4 + q5,
- Click on the data set in the Data Tree.
- Press Update in the Object Inspector and select the new data file.
- From the Ribbon, select Insert > R (Variables) > Numeric Variable.
- In the R Code box in the Object Inspector, paste in the copied code and modify it so that it looks like the line below. The parts to retain from your pasted code are the model name (glm, or whatever it has been renamed to) and the variable names:
predict(glm, newdata = data.frame(q1, q2, q3, q4, q5))
- Give the variable an appropriate Name and Label.
- Change the Structure of the variable to Mutually exclusive categories (Nominal) (this setting is found in the Object Inspector under Properties > Inputs).
- Press Labels (below DATA VALUES) and enter any labels you desire and press OK.
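The same fit-once, predict-on-new-data pattern can be tried outside Displayr. The sketch below is an illustration under stated assumptions, not the code Displayr runs: it swaps in multinom from the nnet package in place of Displayr's Regression function, and the data, segment labels, and predictors are all made up.

```r
library(nnet)  # recommended package shipped with standard R installations

set.seed(42)   # make the made-up data reproducible

# Made-up stand-in for the original file: segment depends mainly on q1.
n <- 300
old_data <- data.frame(q1 = rnorm(n), q2 = rnorm(n))
old_data$segment <- factor(
  ifelse(old_data$q1 + rnorm(n, sd = 0.5) > 0.5, "A",
         ifelse(old_data$q1 + rnorm(n, sd = 0.5) < -0.5, "C", "B"))
)

# Fit the multinomial logit once, on the original data only.
fit <- multinom(segment ~ q1 + q2, data = old_data, trace = FALSE)

# Made-up stand-in for the new file, with the same predictor names.
new_data <- data.frame(q1 = c(-2, 0, 2), q2 = c(0, 0, 0))

# Most likely segment for each new case, and the probabilities behind it.
predicted_segment <- predict(fit, newdata = new_data, type = "class")
segment_probs     <- predict(fit, newdata = new_data, type = "probs")
```

The type = "probs" option is specific to nnet; it is included because the predicted probabilities show how confident each allocation is, which can be useful when the predictors are only weakly related to the segments.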