Comparing MaxDiff Models and Creating Ensembles in Displayr
There are a variety of different models available in Displayr for performing MaxDiff analysis. In this post we first describe how to compare the models, and then demonstrate how to create an ensemble that combines the models and potentially improves prediction accuracy.
Types of MaxDiff model
There are two main categories of MaxDiff model: hierarchical Bayes and latent class. Within these categories, models are further specified by other parameters such as the number of classes. We frequently want to experiment with a variety of different models in order to find the most accurate.
To illustrate the comparison, we are going to use 1 and 3 class hierarchical Bayes models as well as a 1 class latent class model. This post describes how to set up a MaxDiff model in Displayr. I'll also be using the technology data described in that post. For each model we leave out 2 questions during the fitting process. The prediction accuracy on these 2 holdout questions provides an unbiased estimate of accuracy (unlike the accuracy calculated from the questions used for fitting). The output for the 1 class hierarchical Bayes model is below.
Comparing models
To create a table comparing several models, navigate to Insert > More > Marketing > MaxDiff > Ensemble. Then drag models into the Input models box, or select them from the drop-down list.
If you don't tick the Ensemble box, Displayr will create a table that just compares the models. When this is the case, it is not necessary that the models use the same underlying data. If you do check the Ensemble box then Displayr creates an additional model. This requires that the underlying models all use the same data.
The table for my 3 models is as follows. The best values for each measure are shaded in dark blue and the worst are shaded in light blue.
We can see that the 1 class hierarchical Bayes model performs the best in terms of accuracy on the holdout questions. It also has superior BIC and log-likelihood metrics (which are measures of goodness of fit).
How models are combined in an ensemble
To create an ensemble, we use the respondent utilities (also known as coefficients or parameters). I provide a brief overview here but this post describes more about MaxDiff.
- Utilities are a measure of how much each respondent prefers each alternative.
- The model fits (i.e. estimates) these utilities from the responses to the questions.
- The preference of a respondent for an alternative is calculated as e raised to the power of the utility.
- The probability that the respondent will choose a specific alternative is the ratio of the preference for that alternative to the sum of the preferences of all possible alternatives, as sketched in the code below.
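As a rough sketch of that calculation in R, using made-up utility values (the alternative names and numbers here are purely illustrative, not output from the models):

```r
# Hypothetical utilities for one respondent (illustrative values only)
utilities <- c(Apple = 2.1, Google = 1.8, Samsung = 1.5, Nokia = -0.4, Yahoo = -1.2)

# Preference = e raised to the power of the utility
preferences <- exp(utilities)

# Probability of choosing each alternative = its preference divided by
# the sum of the preferences of all alternatives shown in the question
probabilities <- preferences / sum(preferences)
round(probabilities, 3)
```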
The table below shows the utilities for the first 10 respondents. Apple, Google and Samsung tend to have high utilities, so are the preferred alternatives.
The ensemble is created by averaging utility tables across the models.
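In code, that averaging is just an element-wise mean of the respondent-by-alternative utility matrices. A minimal sketch, assuming each model's utilities have been extracted as a matrix with one row per respondent and one column per alternative (the object names are hypothetical):

```r
# Hypothetical utility matrices from the three underlying models,
# each with the same respondents (rows) and alternatives (columns)
utility.tables <- list(hb.1.class, hb.3.class, latent.class.1)

# The ensemble utilities are the element-wise average across the models
ensemble.utilities <- Reduce(`+`, utility.tables) / length(utility.tables)
```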
Why ensembles can improve accuracy
We can see from the earlier table that the ensemble has superior out-of-sample prediction accuracy to each of the 3 underlying models. Since the ensemble is created by averaging, it may be surprising that its accuracy isn't just the average of the models' accuracies.
To understand this effect, imagine you know nothing about tennis (maybe you don't need to imagine!) and ask one person "Who is the best male tennis player in the world?". They reply "Roger Federer". Depending on how much you think that person knows, you will trust their answer to a certain degree. Now you ask the same question of another 99 people. If their answers generally agree, you can be more confident that Roger really is the best. If you get a mixture of responses including Rafael Nadal and Novak Djokovic, then you would not be so sure who will win the next grand slam tournament.
Ensembles work in a similar manner. Each model makes predictions, and some models will be better than others at predicting in a specific situation. By averaging the utilities we reduce the noise from the individual models (in this context the noise is technically known as variance).
It's also important to consider how correlated the models are. If the models are very similar then the benefit from averaging will be small. In the extreme case of identical models, each additional model brings nothing new and there is no increase in accuracy. If the models are diverse and each is a good predictor in different situations, then the increase in accuracy is large. In practice the models tend to be similar, so the benefit is modest, but it is tangible enough that the winners of prediction competitions almost always use ensembles.
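A toy simulation makes the point about correlation concrete (the numbers below are synthetic noise, not MaxDiff output): averaging three models with independent errors cuts the error variance to about a third, whereas averaging three highly correlated models barely reduces it.

```r
set.seed(123)
n.models <- 3
n.obs <- 10000

# Independent errors: each model's prediction error is unrelated to the others
independent.errors <- matrix(rnorm(n.obs * n.models), ncol = n.models)
var(rowMeans(independent.errors))  # about 0.33, a third of the single-model variance of 1

# Highly correlated errors: the models largely share the same error
shared <- rnorm(n.obs)
correlated.errors <- sapply(1:n.models,
                            function(i) 0.9 * shared + sqrt(1 - 0.9^2) * rnorm(n.obs))
var(rowMeans(correlated.errors))   # about 0.87, so averaging gains little
```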
Ensemble parameter histograms
By setting Output to Ensemble we can visualize the respondent utility distributions in the same manner as for the underlying models. I've shown this below.
We can also use Insert > More > Marketing > MaxDiff > Save Variable(s) to add the coefficients, preference shares or proportion of correct predictions to the data set.
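As an aside, preference shares can be thought of as each respondent's choice probabilities (the calculation shown earlier) averaged across respondents. A hedged sketch of that idea with a made-up utilities matrix is below; Displayr's saved variables may be computed slightly differently.

```r
# Hypothetical respondent-by-alternative utility matrix (3 respondents, 3 alternatives)
utilities <- rbind(c(2.0, 1.0, -0.5),
                   c(1.5, 1.8, -1.0),
                   c(0.5, 2.2,  0.0))
colnames(utilities) <- c("Apple", "Google", "Yahoo")

# Each respondent's choice probabilities: exponentiate and scale each row to sum to 1
probabilities <- prop.table(exp(utilities), 1)

# Preference share for each alternative: average probability across respondents
colMeans(probabilities)
```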
Read more about market research, or try this analysis yourself! The flipMaxDiff R package, which uses the rstan package, creates the hierarchical Bayes models and ensemble.