Watch this webinar and learn the advanced tracking (longitudinal) analysis and reporting techniques that make your data more accurate and your workflow faster.
The document featured in this webinar can be viewed here.
The articles mentioned in the webinar can also be read here:
This webinar starts where the previous webinar finished. So, if you haven't seen the beginner's guide, I suggest you watch it first.
Agenda
In this webinar, we will be digging into two broad themes.
How can we efficiently work out what's changed?
How do we explain why measures have changed?
We will kick off with some great small multiples visualizations that are designed to help us to quickly see what's going on in a market.
Column with Trend
So, here's a table. It's showing frequency of buying different burger brands over time. What's going on?
I'll give you 5 seconds to see how good you are at figuring out the story in this data.
If you are experienced with analyzing trackers, you will have spotted the key result, which is that only one brand changed in the last quarter: Arnold's went up by 0.3. Remember this: 0.3.
What's a better way of seeing all of this?
Visualization > Time Series > Column with Trend Test
For you Q users, these are the instructions.
OK, so now we can see that Arnold's clearly has, by far, the highest average consumption.
At the moment the columns are based on the total sample. I like to tweak them to just show the last time period.
Chart > Columns > Period type: Duration
The blue arrow is telling us that it jumped last quarter. And, it's the only brand that went up.
The trend lines at the bottom are showing us trend over the past three years.
So, while Arnold's had a good last quarter, it's actually in long-term decline.
We can see that some of these smaller brands are on the rise.
Small multiples with test for trend
But, that previous table was an easy one.
Here's a big grid. It's got 21 columns, but you can't see them all on the page as there are too many to show.
In this data set I've got 27 months of data.
Let's look at how this big grid looks when crossed by month.
It's now got 567 columns. How can we visualize that?
Visualization > Time Series > Small Multiples with Test for Trend.
Now, remember before, we had seen that Arnold's was in decline over the long run. Here we can start to understand why. It's struggling with
a. Value
b. Drinks
c. Cleanliness
d. Decor
e. Design
So, that's the long-term trend. But, before, we had seen it had improved in the last quarter. Let's look at the story over the past 6 months.
Filter > Month
6 months
So, Arnold's has improved in Drinks, which was one of its long-term weaknesses. And its gain was perhaps driven by opening hours, fast drive-through, and easy drive-through.
Testing linear trend
Most statistical tests just compare two columns. How do we test trend? It's simple.
This is the data we were looking at in the first visualization, showing the number of visits.
If I drag across Month, we'll just have a crosstab looking at everything by month.
Instead, I will duplicate this and make it numeric.
Now, Displayr is computing the correlation. And, the correlation is a test of trend. So, that's how the math works.
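The math behind this can be sketched outside Displayr too. Below is an illustrative Python snippet (not Displayr's code; the visit figures are made up) that tests a linear trend by correlating a measure with a numeric time variable:

```python
# Illustrative only: testing linear trend as a correlation with time.
# The visit figures below are made up for the example.
import numpy as np
from scipy import stats

months = np.arange(1, 13)  # the Month variable, made numeric
visits = np.array([3.1, 3.0, 2.9, 2.9, 2.8, 2.7,
                   2.7, 2.6, 2.5, 2.5, 2.4, 2.3])

r, p = stats.pearsonr(months, visits)
print(f"correlation = {r:.2f}, p-value = {p:.4f}")
```

A significant negative correlation is evidence of a declining linear trend; a correlation near zero means no linear trend was detected.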
Now, when we use a correlation we are implicitly testing a linear trend. How do we check for something nonlinear?
Spline with simultaneous confidence interval
We need to use something very exotic called a spline with a simultaneous confidence interval.
As always, if you can't find something in Displayr or Q, we use the Search box.
Search: Spline
Advanced > Test > Spline
We can only do one variable at a time for this analysis, so I need to create a separate duplicate of the Arnold's variable from the variable set.
We need to tell the model to treat the outcome as linear.
Regression type: Linear
Warning
OK, so how do we read this thing?
The black line is our best guess at trend.
The pink is our confidence interval.
So, the trend analysis tells us that the long-term trend is clearly a decline. But the evidence for an increase in the last quarter is pretty marginal. That is, it could be a bit of a fluke.
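For the curious, here is a rough Python sketch of one common way to build a simultaneous band: fit a smoother, bootstrap it, then widen the pointwise band until 95% of the bootstrap curves fit inside it everywhere at once. This illustrates the general idea only; it is not Displayr's actual algorithm, and the monthly series is simulated.

```python
# A rough sketch of the idea (not Displayr's actual algorithm): fit a
# smoothing spline, bootstrap it, and widen the pointwise band until 95%
# of bootstrap curves fit inside it everywhere at once. Data is simulated.
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)

x = np.arange(36, dtype=float)                      # 36 months
y = 3.0 - 0.02 * x + rng.normal(0, 0.15, size=36)   # declining trend + noise

grid = np.linspace(0, 35, 200)

def fit(xv, yv):
    # s controls smoothness; n * sigma^2 is a common starting point
    return UnivariateSpline(xv, yv, k=3, s=len(xv) * 0.15 ** 2)(grid)

trend = fit(x, y)

# Bootstrap the curve by resampling (x, y) pairs
boots = []
for _ in range(500):
    idx = np.sort(rng.choice(len(x), len(x), replace=True))
    # tiny increasing jitter keeps duplicated x values strictly increasing
    xb = x[idx] + rng.uniform(0, 1e-6, len(idx)).cumsum()
    boots.append(fit(xb, y[idx]))
boots = np.array(boots)

sd = boots.std(axis=0)
dev = np.abs(boots - trend) / sd        # standardized deviation per curve
c = np.quantile(dev.max(axis=1), 0.95)  # widen until 95% fit everywhere
lower, upper = trend - c * sd, trend + c * sd
print(f"simultaneous band multiplier: {c:.2f}")
```

Because the multiplier `c` is chosen from the maximum deviation over the whole curve, the band is wider than a pointwise interval, which is exactly what makes it honest about the trend as a whole.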
Ok, so we've got one statistical test that shows that we went up in the last quarter, and a second one that says maybe not. That's the reality of tracking. Sometimes the evidence is mixed. How do we choose? You look for other evidence. E.g., sales data that's not from the survey.
Removing seasonality - Seasonal decomposition
We had a request at the last webinar to discuss how to remove seasonality from data.
This is one of those techniques that's taught in business school, but virtually never used with commercial survey data. The reason is that it only works well when you use 4 or more years of data, but it's rarely that relevant to look at monthly or quarterly data from a survey over such a long time period. We would instead usually aggregate and look at the annual data.
But, let's do it anyway.
So, I've got some data here on Australian beer sales from 1956 through to 2008.
If you look at it, you can see clear evidence of seasonality. OK, so how do we get rid of it?
There's a pretty standard way, called a seasonal decomposition.
We will have to type a few words of code to get it
Click on visualization: Edit
plot(decompose(australian.beer.consumption, type = "multiplicative"))
And there you have it. We can now see the estimated trend, without the seasonality in it.
If you want more detail just reach out to me or support.
Weighting by key measures
OK, so the table at the top is our data on Arnold's again, showing the uplift in consumption in the past quarter.
A question of interest is: is the growth due to growth in the proportion of people eating Arnold's, or to the frequency of consumption among the consumers?
We can explore this by weighting.
Note that in most quarters, Arnold's has been consumed by 94% of people in the last month.
Let's now create a weight so it's exactly 94%.
+ Weight
Adjustment variable > Arnold's last month
Now, Displayr and Q allow you to automatically do the weighting within each time period.
Recompute weights for: Quarter
+ New weight
When new quarters are added, this will automatically update
Now, I will select the two tables and apply the weight
Select two tables
Apply Weight: Arnold's Last Month
First, we will check that the weight has worked. The table at the bottom is showing 94% for each period. So, the weight has worked.
Now, look at the table above. We no longer have a significant increase in the last period. So, this tells us that penetration may have grown, but the evidence that the average frequency grew is not statistically significant.
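The arithmetic Displayr is doing within each period can be illustrated by hand. For a yes/no measure, consumers in a period get weight target/p and non-consumers get (1 − target)/(1 − p), where p is the observed penetration in that period. A sketch with made-up data:

```python
# Hand-rolled illustration of within-period weighting for a yes/no
# measure (Displayr/Q do this automatically). Data is made up:
# penetration is 80% in Q1 and 60% in Q2; we weight both to 94%.
import numpy as np
import pandas as pd

target = 0.94

df = pd.DataFrame({
    "quarter":  ["Q1"] * 5 + ["Q2"] * 5,
    "consumed": [1, 1, 1, 1, 0,   1, 1, 1, 0, 0],
})

# observed penetration within each quarter
p = df.groupby("quarter")["consumed"].transform("mean")

# consumers get target/p, non-consumers (1 - target)/(1 - p)
df["weight"] = np.where(df["consumed"] == 1,
                        target / p, (1 - target) / (1 - p))

# check: weighted penetration is exactly 94% in every quarter
num = (df["consumed"] * df["weight"]).groupby(df["quarter"]).sum()
den = df["weight"].groupby(df["quarter"]).sum()
print(num / den)  # 0.94 in both quarters
```

Holding penetration fixed like this is what lets the remaining movement in the table be read as changes in frequency among consumers.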
Automatic text coding
So, here's a Displayr document that's been set up with some automation in it.
I've manually coded four weeks of data on why people dislike their phone company.
You can see the summary table in the top right.
For example, we can see that 16% of people disliked the price of their phone company.
We want to use machine learning to automate all the future coding.
Anything > Advanced Analysis > Text Analysis > Automatic Categorization > Unstructured Text
As inputs, we select the text variable and the manual coding that I've already done.
This involves some heavy lifting and will take a few minutes, so I've run it before the webinar.
Automatic text coding - pre-done
Now, the key thing we want to look at is the table at the bottom.
What Displayr's done is it's used cross-validation to work out how accurately it can categorize the text.
In this example, you can see that when it builds a model on 200 people, it gets 96% accuracy, which is pretty good. Our cutoff is a Kappa of 0.8 or more, and we have that.
So, we now need to save the automatic categorization out as new variables.
SAVE VARIABLES > Categories
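The general approach can be sketched with scikit-learn, though Displayr's actual model is more sophisticated: learn categories from the manually coded text, then use cross-validated predictions and Cohen's Kappa to judge whether the automatic coding is trustworthy. The verbatims below are invented:

```python
# Sketch of the general approach with scikit-learn, not Displayr's actual
# (more sophisticated) model. The verbatims and codes below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline

texts = ["too expensive", "bills keep going up", "terrible reception",
         "dropped calls", "rude support staff", "pricey plans",
         "no signal at home", "customer service never answers"] * 25
codes = ["Price", "Price", "Coverage", "Coverage", "Service",
         "Price", "Coverage", "Service"] * 25

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
predicted = cross_val_predict(model, texts, codes, cv=5)

kappa = cohen_kappa_score(codes, predicted)
print(f"Kappa = {kappa:.2f}")  # rule of thumb from the webinar: want >= 0.8
```

Cross-validation matters here because accuracy measured on the same verbatims the model was trained on would be misleadingly high.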
Automatically coded text by week
OK, let's create a crosstab
So, when we update the tracker, what we want to happen is that it adds more weeks of data
Automated quality control check: Age
There's a little snippet of code here. I talked about this in the webinar on data cleaning
It is checking that there are no significant differences in age over time.
If when we update the tracker a significant difference is detected, we will get a warning.
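The Displayr snippet itself isn't reproduced here, but a generic version of such a check is easy to sketch: run a chi-square test of age band by wave and emit a warning when the distributions differ significantly. The counts are hypothetical:

```python
# Generic illustration of an automated quality-control check (the
# document's snippet is Displayr-specific). Counts are hypothetical:
# age bands (rows) by tracking wave (columns).
import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([[50, 48, 52, 49],    # 18-34
                   [60, 62, 58, 61],    # 35-54
                   [40, 41, 39, 42]])   # 55+

chi2, p, dof, expected = chi2_contingency(counts)
if p < 0.05:
    print(f"WARNING: age distribution shifted over time (p = {p:.3f})")
else:
    print(f"Age distribution stable over time (p = {p:.3f})")
```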
Automated ... Race
And this is the same thing, looking at the distribution of race.
Brand
And, I've got some pretty pages. Now I am going to update this tracker. You may have seen this bit before.
Note that AT&T has a score of 31
we can see 4 weeks of data in the sparkline.
Let's update it now!
First Displayr checks that the files match. It's detected a change in one of the labels. That's fine.
As you can see, the AT&T NPS has gone up, from 31 to 34.
We need to click on each of the other pages to get them to update.
Automated quality control check: Age stable over time
No problem here.
Automated quality control check: Race
Ah, we have an error. For some reason there was a spike in the Hispanic sample. We should consider weighting.
Automatic text coding - pre-done
This model needs to be re-run with the new data. This is going to take a while, so we will come back to it.
I've almost shown everything here. I will come back to the text coding. If you want to automate email alerts, please check out the help page on it.
Agenda
So, we've now reviewed everything on the left hand side.
Now let's look at some ways we can work out why measures have changed.
Metric change decomposition
We looked before at the metric where Arnold's consumption had risen in the most recent quarter.
We want to dig into that to see what we can learn.
There's a useful little bit of math for doing that.
Whatever change we see in a metric can be broken down by sub-group into two components.
For example, if we break it down by Gender, the change we observe can be looked at in terms of (a) changes in the average within each gender, and (b) changes in the size of each gender group.
OK, I'll do it manually first, and then we'll automate it.
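The math is a simple identity. If p_g are the group proportions and m_g the group means, the overall mean is the sum of p_g × m_g, and the change between two periods splits exactly into a part due to changing sub-group means and a part due to changing group ssizes. A sketch with made-up gender figures:

```python
# Made-up gender figures illustrating the identity:
#   change = sum p_new_g * (m_new_g - m_old_g)   (sub-group means)
#          + sum (p_new_g - p_old_g) * m_old_g   (group sizes)
import numpy as np

p_old = np.array([0.50, 0.50])   # Male, Female proportions, previous quarter
p_new = np.array([0.48, 0.52])   # proportions, last quarter
m_old = np.array([2.00, 1.40])   # mean consumption by gender, previous
m_new = np.array([2.10, 1.80])   # mean consumption by gender, last

change = p_new @ m_new - p_old @ m_old
within = (p_new * (m_new - m_old)).sum()   # change in sub-group means
mix    = ((p_new - p_old) * m_old).sum()   # change in group sizes

print(f"total change = {change:.3f}")      # 0.244
print(f"  sub-group means: {within:.3f}")  # 0.256
print(f"  group sizes:     {mix:.3f}")     # -0.012
```

The two components always sum exactly to the total change, which is what makes the decomposition useful for attribution.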
Change in group size
OK, so let's first explore if there's a change in the gender breakdown over quarter.
So, nothing is significant in the last period.
Change in average consumption within gender
OK, so now we want to see if consumption differed between the genders.
This is a bit trickier, as we need to look at consumption within gender within time period.
First, we select the consumption variable
+ Filter > Filter One Variable Set by Another
So, we want to filter the variable by Gender
So, some new variables have been created
Now, we create the crosstab
Columns: Quarter
Note that there is some movement. E.g., females went up from 1.4 to 1.8.
But, not significant.
Now, this process would get a bit tiring if we had to do it manually for every different possible profiling variable.
Change Explorer Prototype
Now, what I'm about to show you is a prototype. We will make it better in the future.
Anything > Visualization > Exotic > Change Explorer Prototype
Outcome: Arnold's
Date: Quarter
Profiling variables:
a. Gender
b. Age
c. Location
d. C2 Work Status (at bottom)
PERIODS
From: 2017-03-01
To: 2017-06-30
From: 2017-07-01
To: 2017-09-30
OK, so this bubble chart now shows the size of the change.
The ones in blue and red are significant.
The size of the circle tells us the size of the effect.
Remember that Arnold's grew by 0.3.
Among the students, who make up 20% of the market, it actually grew by 0.86. And, if we multiply these two numbers together, we can attribute 0.17, that is, more than half of the growth, just to the change in the mean among the students!
There was also a small drop among the unemployed, but as the group is so small, it doesn't have a big effect.
Contribution: Change in group size
As I mentioned before, we can decompose effects into
a. Change in means in sub groups
b. Change in group size.
There's also a significant effect due to the change in the unemployed group size. But look carefully at the legend here: the change is significant, but it's trivially small, so we can ignore it.
Entertaining the audience
As I mentioned in the beginner's guide, a key bit to tracking presentation is coming up with new material to keep the audience entertained.
I'm going to share some of the things I've found work.
Brand health and conversion
Just about everybody loves a good brand health table and comparison of conversion ratios.
Deep dive
You can learn a lot by doing scatterplots of brand health metrics.
Here I have "eaten last month" by "# times eaten".
As you can see, there is a strong correlation between the two.
This happens so often it has a name. It's a law called Double Jeopardy.
If a brand is poor on one metric, it will likely be poor on other metrics.
Note that this completely changes how we look at Burger Chef. It's actually doing rather well in terms of consumption when we take the law into account.
Variations
By looking at the shape of the patterns you can get additional insights.
An exponential shape, for example, indicates a winner-take-all market. Social media markets often work like this.
Basic brand vulnerability matrix
We can learn a lot by contrasting brand attitude with behaviour.
The interesting bits are the people who consume something they don't like, and vice versa.
... Example
Profiling the various discordant groups gives insight. E.g., we can see here that for this brand, it's the drive-through that allows it to have customers who don't like it.
Market maps
Everybody loves a market map
Performance importance
And quad maps.
Quad map for Coca-Cola
Return to tracking automation
And, while I've been talking, Displayr's been automatically coding. As you can see, it's completed the coding of all 12 weeks of the tracker. Fully automated!
Read more