Recap
In our first workshop we created an experimental design with 10 versions and 10 questions per respondent.
We've now interviewed 1,056 respondents.
In the workshop, we worked out that the work location attribute, shown in the bottom right corner, is problematic, as some people have jobs where it's impossible to work from home.
After the workshop, I decided to solve this problem by using a split cell design.
Split cell design
We asked people if their job could be done from home. If not, we didn't show them the last attribute.
Take a moment to read through the choice questions so you get a feeling for the wording we used.
This webinar
In this webinar we will review the goal of the statistical analysis we need to do, and then go through the key steps in getting it done.
Utilities
The statistical analysis of conjoint is all about utilities.
Here are the utilities from our study.
As we have talked about in the previous webinar,
- Utility is just another way of saying how appealing something is, and
- we typically set the first level of an attribute to 0 and estimate everything relative to it.
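To make that concrete, here's a minimal sketch in R, using made-up raw utilities for the salary attribute; setting the first level to 0 just means subtracting its utility from every level:
# Made-up raw utilities, for illustration only
raw.salary.utilities = c("Same salary" = 1.2, "Salary 5% higher" = 1.1,
                         "Salary 10% higher" = 2.0, "Salary 20% higher" = 3.3)
# Set the first level to 0 by subtracting it from everything
rebased.utilities = raw.salary.utilities - raw.salary.utilities[1]
rebased.utilities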
If these utilities are correct, they tell us that:
- People prefer more money to less. Der.
- A 5% pay rise is pretty marginal. It needs to be 10%.
- People value a carbon neutral employer. It's worth almost 10% of salary. The key point is for an employer to be committed to being neutral in 10 years.
- People really care about the tools they use. You have to pay people a lot to use bad software.
- People prefer fully remote. But, it's less important than the other things.
From this information alone it's possible to predict market share and draw lots of conclusions.
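As a rough sketch of how such a prediction works, suppose two hypothetical job offers end up with total utilities of 2.5 and 1.5; a simple logit rule turns those into predicted preference shares (the numbers here are invented for illustration):
# Hypothetical total utilities for two job offers
total.utility = c("Job A" = 2.5, "Job B" = 1.5)
# Logit rule: share is proportional to the exponential of the utility
share = exp(total.utility) / sum(exp(total.utility))
round(share, 2)  # Job A gets roughly 73%, Job B roughly 27%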
But, if you think about it, there's a bit of a problem with these conclusions. In the real world, there are differences between people. It says that people prefer fully remote to the office every day. But surely that can't be true for everybody!
These utilities…
About 20 years ago choice-based conjoint experienced a major jump forward.
It became the norm to compute a utility for every respondent for every attribute level
Looking here, the utility of a 20% salary increase is 2.1. This is an average.
We can see that the first person's utility is 0.3, the next person's utility is 0.8, and so on.
But, if you pause for a moment and think about it, you will realize that there must be a lot of noise in utilities that are estimated for each person. If we ask them 10 questions, as in this experiment, where they just choose from a set of alternatives, how can we compute so precisely that their utility is, say, 0.3 rather than 0.4?
We can't.
Each respondent's utility is itself…
The way that the modern techniques for calculating utilities work is that they identify the distribution of possible values for each respondent for each attribute level.
These possible utilities are called draws.
So, while we have concluded that respondent 7 has an average utility of 2.3, it's possible their utility is 1.8, 3, 2.3, 2, or any of the 100 numbers in this table
The average of these 100 numbers is our best single guess.
But, the smart thing is to recognize we have uncertainty and to use this when drawing final conclusions.
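Here's a minimal sketch of the idea, using a made-up matrix of draws (10 people, 100 draws each) rather than the real model output:
# Simulate a made-up matrix of draws: rows are respondents, columns are draws
set.seed(123)
draws.example = matrix(rnorm(10 * 100, mean = 2, sd = 1), nrow = 10)
# The best single guess for each person is the average of their draws
point.estimates = rowMeans(draws.example)
# The uncertainty: how often each person's draws are above 0
prob.positive = rowMeans(draws.example > 0)
cbind(point.estimates, prob.positive)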
The goal with modern choice based conjoint is to estimate these draws. That is, for each attribute level we want to estimate the distribution of possible utilities that each person can have.
Our goal today is to work out how to calculate these draws.
These numbers are then used to derive all the key outputs that we will discuss in the following webinars.
Getting the data into Displayr
This is the easy bit. How do we get the data into Displayr?
First, we need to get the design.
Now, let's get the data in. I will load up the 100 respondents from my soft send.
When doing conjoint studies, you always want to stop the interviewing after a soft send and check everything works, as it's so easy to make a mistake.
Data sets + > My computer > ... > Conjoint > Climate > Low Down Soft Send...
+ New page > Title and Content
Title: Initial model
Search: HB
+ Anything > Advanced .. > Choice modeling > HB
There are lots of ways I can import a design. For example, it may come from Sawtooth or some other program. I will hook it up to the design I just imported.
Now, I have to hook it up to the data.
First, I select the choices people made when answering the questions.
This will take a few minutes to calculate.
While we wait, let's do a pop quiz.
All else being equal, would you prefer a job with...
- A 5% pay rise
- A company that was carbon neutral
OK, let's look at the results.
The alternative attribute shows the utility people had for choosing the 1st, 2nd, 3rd, and 4th columns in the questionnaire. We can see here that alternatives 2 and 3 have the highest utility, telling us that people preferred the middle options. Perhaps this is a mobile phone thing. This is a response bias. By treating it as an attribute, we remove it from the data.
Looking at salary, note that
- A 5% salary increase is marginally negative.
- The bigger salary increases are much more appealing on average. That is, I am looking at the mean column.
- As we would expect, there's a lot of variation.
- Looking at carbon neutral, it is appealing on average. But, surprisingly, Neutral is less appealing than 5 years. Remember, though, this is just the 100 respondents from the soft send.
- The software choice makes sense.
- The key thing with Work location isn't the averages, it's the degree of variation. Some people prefer fully remote. Others don't want fully remote.
Hygiene test
Having calculated some results, we now need to check them.
Many things that we calculate, such as percentages and means, have simple exact formulas.
But, more complicated analyses have a lot of trial and error hidden under the hood.
An analysis is said to have converged when we are confident that we have done enough trial and error and have arrived at a good result.
This is a test of hygiene in much the same way as using hand sanitizer. There's no guarantee you'll die if you don't use it. But, you're more likely to.
Look here in the Pages tree. There's a little warning symbol.
When we click on the model, we get the warning.
As you can see, we've got some weird technical warnings. So, we've failed the hygiene test.
What do we do? We follow Displayr's advice.
It's telling us to run the model with more iterations.
Inputs > MODEL > Iterations
I'm also going to do something else that I will explain a bit later: I will save the draws for each person.
SIMULATION > Iterations saved per individual: 1000
Remember, we talked about the draws for each person, which take uncertainty into account. By default they aren't saved, as it can make projects really big, which slows things down. So, as I will need them later, I am saving them.
Hierarchical Bayes - 2000
Here's the result with 2,000 iterations for the full sample of 1,056 people. We've passed this hygiene test.
The good hygiene dividend
The output on the left is from the default settings with 100 iterations, which failed the hygiene test.
The output on the right is with all the hygiene issues resolved.
The results are almost identical.
Our experience is that this is usually the case. But, take note of the topic. Hygiene. You can get into trouble if you ignore it.
Smell test
The smell test checks to see if the results are intuitively plausible.
So, look at the averages. Do they seem sensible?
Are the distributions plausible?
I'll give you 10 seconds to draw a conclusion.
Does this model pass the smell test?
- Yes
- No
OK, for those of you who said No, tell us why. Please type it into the question field.
Remove random choosers
We've shown each person 10 choice questions. Our data is only meaningful if people read and think through each of the questions.
Inevitably some people will answer near randomly. We need to remove them from the data.
There is a metric for understanding the quality of a model called the root likelihood, or RLH for short.
We can calculate the quality of a model by looking at this measure for each person.
That is, it's a measure of how well we can predict their choices.
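If you want the intuition behind the number: for one respondent, RLH is the geometric mean of the probabilities the model gave to the choices they actually made. A minimal sketch, with made-up probabilities for one person's 10 questions:
# Made-up predicted probabilities of the alternative one respondent chose in each of their 10 questions
prob.of.chosen = c(0.6, 0.4, 0.7, 0.5, 0.3, 0.6, 0.5, 0.4, 0.8, 0.5)
# RLH is the geometric mean of these probabilities
rlh.example = prod(prob.of.chosen) ^ (1 / length(prob.of.chosen))
rlh.example  # Someone choosing at random among 4 alternatives would be around 0.25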
We can add a variable to the data file that contains this data.
Let's do it for our soft send data.
SAVE VARIABLES > RLH
Let's have a look at it.
Title: Distribution of RLH
Visualization > Distributions > RLH
Let's put the various percentiles on it.
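If you prefer, you can compute the percentiles directly with a quick sketch like this (assuming, as with the simulated model below, that the model stores each respondent's RLH in choice.model$rlh):
Calculation:
# Key percentiles of RLH across the soft-send respondents
quantile(choice.model$rlh, probs = c(0, 0.05, 0.25, 0.5, 0.75, 0.95, 1))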
So, we can see that the RLH ranges from 0.226 to 0.742. But, what's a good result?
The easiest way to answer this question is to simulate some random data.
Duplicate page Initial Model
RESPONDENT DATA > Data Source > Simulated choice
Simulated sample size: 100
I'm going to use a bit of code to create the histogram.
Calculation:
# Histogram of RLH across the simulated random respondents
hist(choice.model.2$rlh)
As you can see, it ranges from around 0.22 to 0.33.
I'm going to use the 95th percentile as the cutoff.
Calculation:
# The 95th percentile of RLH among the simulated random choosers
random.rlh.cutoff = quantile(choice.model.2$rlh, .95)
OK, so we should chuck out any respondent with an RLH of less than 0.297.
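As a quick sketch of how many soft-send respondents that would remove (again assuming the real model's RLH values are in choice.model$rlh):
Calculation:
# TRUE means the respondent's RLH is below the random-data cutoff
table(choice.model$rlh < random.rlh.cutoff)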
Remove irrational choosers
There's another class of people we might want to get rid of. That's people who are making choices that don't make sense at all. It's not always a smart thing to do, as it's easy to get it wrong when trying to judge irrationality.
As an example of irrationality, let's look at attitude to a higher salary. All else being equal, no rational person is going to prefer a lower salary.
When we look at the distribution, we can see that most people have positive utilities, also known as coefficients in this context. But, there are a small number who have negative utilities.
Should we just delete them?
No, we need to take uncertainty into account.
Title: Respondent draws for 20% salary increase
# Extract the full array of saved draws from the model
draws = choice.model$beta.draws
# Pull out the draws for the 20% salary increase; after transposing, rows are respondents and columns are draws
respondent.draws.salary20 = t(draws[,, "Salary (compared to current): Salary 20% higher"])
In this table, each row represents a person and each column represents one of their draws.
For example, if you look at row 10, you can see that most of their draws are positive and none are near 0. This tells us that this person definitely wants a high salary.
But, if we look at row 1, we can see that most are positive, but not all. So there is some uncertainty.
Which of these people is irrational? I want to do a statistical test and give them the benefit of the doubt. I will only say they are irrational if more than 95% of their draws are less than 0. So, I'll start by calculating, for each person, the percentage of their draws that are less than 0.
# Flag a respondent as irrational if more than 95% of their draws are negative
irrational = AverageEachRow(respondent.draws.salary20 < 0) > 0.95
Let's sort to have a look.
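Roughly, the sort is just this (a sketch; prop.negative is a name I've made up):
Calculation:
# Proportion of negative draws per respondent, largest first
prop.negative = AverageEachRow(respondent.draws.salary20 < 0)
sort(prop.negative, decreasing = TRUE)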
OK
So, we can't say any of these people are definitely irrational.
But, we are only looking at the data from the soft send. This may change with the full data set.
OK, now we're going to filter our model to leave out the random responders and anybody who is irrational.
I will create a new variable to use as a filter
Insert > Custom code > R Numeric
`RLH from choice.model` > random.rlh.cutoff & irrational == FALSE
Name: Valid
Label: valid
Usable as a filter
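Before using the filter, here's a quick sketch of checking how many respondents it keeps, reusing the same objects as in the variable code above:
Calculation:
# TRUE means the respondent passes both the RLH and the rationality checks
valid = `RLH from choice.model` > random.rlh.cutoff & !irrational
table(valid)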
Note that the predictive accuracy of the old model is 75.2%.
But, our new model is now at 82%.
Reviewing the random responding and irrational choosing
I've done the same thing for the full data set
Check external validity
External validity is a term of art in research, which deals with whether the results will hold up in the real world.
Different levels of external validity
The simplest check, is whether we can predict above random in a cross-validated sample.
The next level up is whether the utilities we've estimated correlate with other data.
- For example, in our study, we'd expect that how people trade off salary versus having a carbon neutral employer will relate to political preferences.
The ideal check is the extent to which the model can accurately predict historic and future events, such as changes in market share. This is a nice thing to try and do, but in practice it's rarely done validly as there are too many variables to do the analysis meaningfully.
OK, so we'll check the cross validation now.
This is built into Displayr and Q.
MODEL > Questions for cross validation: 1
Title: Cross validation
We have asked people 10 questions. By choosing this option, for each respondent, we randomly select nine of the questions and use them to build the model. Then, we see how accurately it predicts the 10th.
In this study each choice question has four alternatives. So, by chance alone, we would expect around 25% accuracy. Our accuracy is 49.4%, which is much better than chance.
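If you want reassurance that 49.4% really is above chance, here's a quick sketch: with one holdout question for each of the roughly 100 soft-send respondents, a one-sided binomial test against 25% settles it (the count of 100 holdouts is an assumption):
Calculation:
# About 49 correct holdout predictions out of roughly 100, tested against 25% chance
binom.test(x = round(0.494 * 100), n = 100, p = 0.25, alternative = "greater")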
The next test is the extent to which the data is correlated with other data.
I will return to our initial model.
And, I will save out each respondent's utilities.
Utilities (Min 0, Max Range 100)
As you can see, it's added the utilities to the data file
Title: Correlating utilities with political party
As you can see, we're finding that the Republicans have a much higher utility for salary, which makes sense. When we do this with the full sample, rather than the soft send, we see a much clearer pattern.
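A minimal sketch of that check, using made-up variable names (salary.utility for the saved 20% salary utility, party for political preference):
Calculation:
# Average saved salary utility within each party
tapply(salary.utility, party, mean)
# A quick test of whether Republicans differ from everyone else
t.test(salary.utility[party == "Republican"], salary.utility[party != "Republican"])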
Choose the best model
And now for the last stage. Choosing the best model.
Options
Most of the time, once people have done the steps above they're finished.
But, occasionally it can make sense to do something more complicated.
In our study, we need to explore segmentation. That is, creating different models in different segments.
Remember, we used the split cell design. We should see if we need different segments for the different cells.
Let's look at our cross-validated result
Let's filter this to just include the data of people who can't work from home.
Search: Regardless
Can be done
Label at top: Can work from home
Title: Can work from home
Now, segmented analyses on our small soft send data won't work well, so having shown you how to do it, let's look at the data for our complete sample.
We can automatically update everything we've done here.
I've already done this, so let's look at the complete results.
Go to page: Hierarchical Bayes with 1 question used for cross validation
Our predictive accuracy here is 61.6%.
Go to page Hierarchical Bayes: 100 iterations - work cannot be done from home with cross validation
In this segment, our predictive accuracy is 67.9%, so that's much better.
And, the predictive accuracy in the other segment is about the same as in the aggregate analysis, so the combined effect is clearly that we should fit different models in the different segments.
With the original model estimated for the total sample, our accuracy was 49.4%.