Webinar

Text Analytics: Quickly turn text into strategy

The world is full of text, from open-ended responses through to social media and product names.

In this webinar we will show you how to

  • Automatically identify themes in your data.
  • Automatically classify text data into themes (including existing themes).
  • Modify AI prompts to fine-tune your text analysis.
  • Efficiently use categorized text data in other analysis (crosstabs, trend analysis, etc.)

Transcript

AI has transformed, and continues to transform, how we analyze text data. It’s the biggest productivity gain in our industry since online interviewing.

Today I’m going to take you through how AI can speed up and improve the quality of text analytics. I’m going to focus on market research applications, but the same tools are just as useful with social media and other forms of text.

As usual, I will demonstrate this in Displayr. You can do much of this in Q, but I would stress that Q’s AI is much older technology than Displayr’s. Modern AI needs to be on the cloud, and Q can’t do it as it’s desktop software.

Overview

There’s a widespread misunderstanding about how to perform useful text analytics. We’re going to start off looking at that.

The right approach is to quantify, and then use standard quant tools.

We’ll then look at using AI to:

- Find themes in data
- Classify text into themes

We’ll explore prompt engineering.

And how to calculate dimensions.

And, we’ll finish off looking at back coding and brand list data.

The wrong way to think about text analytics

The wrong way to think about text analysis is as a data visualization problem.

Case study 1

If you’ve been to one of my webinars before, you’ll be familiar with this fun data set.

It seems odd today, but a few years ago in my homeland of Australia, Tom Cruise was public enemy number 1, and we asked people why they didn’t like him.

The text

The laziest reporting option is to give all the text responses to the end-user of the research.

I’ll give you a moment to scan some of the responses.

Word Cloud

About 20 years ago, the first of the cool visualizations appeared: word clouds.

While our word cloud tool is pretty cool, word clouds are pretty superficial too.

Word maps

There are lots of ways of improving word clouds. For example, in this word map:

- Circles better communicate size than font size.
- Words that usually appear together, bigrams, are shown in a single circle. E.g., Don’t know.
- Circles that are associated are closer together on the map.

Is this better than a word cloud? Sure.

Are they useful as a source of insight?

No.

In the 1960s and 1970s the market research industry was very focused on seeing if it was possible to create data visualizations to summarize all data. This is known as multidimensional scaling.

What we learned back then was that other than for brand positioning, such maps are usually unhelpful. It’s still true today.

One of the problems with word maps is that they try to compress lots of information into two dimensions, and this is an oversimplification.

Networks of feature co-occurrences

An alternative is to draw lines showing the relationships.

These look cool.

I’ll give you a moment to see what insights you can get from this.

In the 20 years since I first saw one of these, I have never, ever, seen one that provides any insight.

Look at this one. It tells us that Tom and Cruise are linked and some people like him. Wow!

More worrying, though, with things like this are the false positives.

For example, consider “seems” and “arrogant”. You’d be tempted to say “there’s a weak but clear relationship showing that Tom Cruise seems arrogant”. But, it’s just 3 people that said it.
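
To make the false-positive point concrete, here is a minimal Python sketch of the kind of count that sits behind a line in such a network. The responses are invented for illustration and are not the webinar data, and the function name is hypothetical. It simply shows that a drawn link can rest on a handful of respondents.

```python
from collections import Counter
from itertools import combinations

# Hypothetical responses, invented purely for illustration.
responses = [
    "he seems arrogant",
    "seems arrogant and weird",
    "arrogant",
    "he seems a bit arrogant",
    "don't know",
]

def cooccurrence_counts(texts):
    """Count, for each pair of words, how many responses contain both."""
    counts = Counter()
    for text in texts:
        # Use a sorted set so each pair is counted once per response,
        # always in the same (alphabetical) order.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts

pair_counts = cooccurrence_counts(responses)
n_both = pair_counts[("arrogant", "seems")]
print(f"'seems' and 'arrogant' appear together in {n_both} of {len(responses)} responses")
```

Checking the raw count behind a link like this is a quick way to see whether it is worth reading anything into.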

So, better than a word cloud yes, but very useful? No.

Quantify, then use standard tools

The successful way to do text analytics is to first quantify the text, and then use completely standard tools.

Text analysis is the middle step

Text analysis is the middle step. The objective should be to summarize the text as one or more variables, either categorical or numeric.

That is, in the market research jargon, the goal is to *code* the data.
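
As a rough illustration of what “coded” data looks like once the text has been quantified, here is a minimal Python sketch. The theme names, respondent IDs and the `to_indicators` helper are all hypothetical. The idea is just that each theme becomes a 0/1 variable that standard quant tools can tabulate.

```python
# Hypothetical themes and coded responses, invented for illustration.
themes = ["Arrogant", "Personality", "Don't know"]
coded = {
    "resp_1": ["Arrogant"],
    "resp_2": ["Arrogant", "Personality"],
    "resp_3": [],
}

def to_indicators(coded_responses, theme_list):
    """Turn theme assignments into one 0/1 variable per theme."""
    return {
        rid: {theme: int(theme in assigned) for theme in theme_list}
        for rid, assigned in coded_responses.items()
    }

indicators = to_indicators(coded, themes)

# Percentage of respondents mentioning each theme, as in a summary table.
theme_pct = {
    theme: 100 * sum(row[theme] for row in indicators.values()) / len(indicators)
    for theme in themes
}
print(theme_pct)
```

Once the data is in this shape, crosstabs and trend analysis work exactly as they do for any other multiple-response question.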

We’re going to go into this in detail now.

Identify themes with AI

Until recently, the best way to identify the themes in text data was to read them. But that’s all changed now.

Now, I get that a lot of you out there are waiting for AI to do all the work. But, we’re not there yet. We’re still in the era of AI-human collaboration.

Approaches to theme identification

With the state of the AI technology today, this is my preferred approach to finding themes. I will show you a live example in a second. But, here’s the basic flow:

Ask the AI to create 10 themes and classify all the responses into the themes.
Then, tweak the themes. I’ll show you this in a live example in a moment.
Here are the instructions in Q and Displayr.

There are other approaches. But, I do want to call one thing out.
Today, AI can’t accurately work out the number of themes. You’re much better off just guessing the number 5 than leaving the job to AI.

This will change in the future, but it hasn’t yet!

OK, let’s return to Tom Cruise and do it!

Data set > What is it you dislike? > Text Categorization > Start

For all our users out there, here’s our new user interface, just launched today! This user interface will appear in Q soon as well, but it won’t be using the modern AI algorithm, sadly, as it needs to be on the cloud and Q isn’t.

If we know what themes we want, we can manually type them in. But, we’ll get the AI to do the hard work.

Displayr defaults to 10 themes.

And, we’ll tell the AI to classify our responses into the themes.

Step 1 is to spot check the results, theme by theme.
Let’s profile this.

Let’s view this as a chart.

Object inspector > Visualization > Bar > Bar with skews

I’ve got a templated version of this chart that I prefer.

So, in no time at all, we’ve come up with a pretty good text categorization of this data set, and identified how sentiment varies by various factors.

Let’s look at some longitudinal data, from an NPS tracker.

Here I’ve got data from the first two months of a tracker, showing dislikes about phone companies. Now, I want to draw your attention to the caption at the bottom of the table. Currently, there is only missing data for 6 respondents.

Let’s bring in some more data. This is a cumulative file, that includes the data from the first two months but adds more data.

It’s automatically added a couple of columns. But, note that we’ve now got lots of missing data.

What’s happened? Anybody who has said exactly the same thing as somebody before has been automatically classified. But, where people have said something new, we need to run the AI again to classify them.
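
The exact-match behaviour described here can be sketched in a few lines of Python. This is only an illustration of the idea, not Displayr’s implementation. The function name and sample data are hypothetical, and whether Displayr normalizes case or whitespace before matching is not something the webinar states, so this sketch matches the text exactly as typed.

```python
def reuse_existing_codes(new_responses, coded_lookup):
    """Split responses into those whose exact text was coded before and the rest.

    new_responses: dict of respondent id -> response text
    coded_lookup:  dict of previously seen response text -> theme
    """
    classified, uncoded = {}, []
    for rid, text in new_responses.items():
        if text in coded_lookup:
            # Same text as an earlier respondent: reuse that code.
            classified[rid] = coded_lookup[text]
        else:
            # New text: needs the AI (or a person) to classify it.
            uncoded.append(rid)
    return classified, uncoded

# Hypothetical data, invented for illustration.
previous = {"Too expensive": "Price", "Poor coverage": "Network"}
incoming = {
    "r101": "Too expensive",
    "r102": "Dropped calls all the time",
    "r103": "Poor coverage",
}
classified, uncoded = reuse_existing_codes(incoming, previous)
print(classified)  # reused codes
print(uncoded)     # still to be classified
```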

We’ve got 604 people that are not categorized. Let’s classify these people.

If I had more spare time, I’d manually classify the last 1%.

And, as often happens in trackers, none of the differences are significant! But, the good news is we’ve coded the extra 1,700 respondents in a few seconds to learn this. So, if it had cost us $2 per response, we just saved about $3,400!

Prompt engineering

Everything I’ve shown you so far is about getting text analytics done faster. Let’s do something smarter.

Here are some grocery SKUs. So, we’re not looking at open-ended data at all now. Let’s get AI to automatically group it.

+ > AI > Text Categorization

As you can see, it’s worked out the product categories.

Let’s do something a bit more interesting. Rather than finding things that are similar, we will let the AI draw conclusions.

+ AI > Custom Text

Assign all the SKUs into the following categories:
Contains sugar
Does not contain sugar
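
In Displayr the prompt is simply typed into the Custom Text dialog. If you wanted to assemble a similar prompt yourself for some other AI tool, a minimal Python sketch might look like the one below. The function name, the layout of the prompt and the sample SKUs are all assumptions made for illustration. How Displayr combines the prompt with the data internally is not covered in the webinar.

```python
def build_classification_prompt(categories, items):
    """Assemble a plain-text prompt listing the categories and the SKUs."""
    lines = ["Assign all the SKUs into the following categories:"]
    lines.extend(categories)
    lines.append("")  # blank line between the instructions and the data
    lines.append("SKUs:")
    lines.extend(f"{i}. {item}" for i, item in enumerate(items, start=1))
    return "\n".join(lines)

# Hypothetical SKUs, invented for illustration.
example_prompt = build_classification_prompt(
    ["Contains sugar", "Does not contain sugar"],
    ["Cola 330ml", "Sparkling water 1L"],
)
print(example_prompt)
```

The useful part is that the categories are spelled out explicitly, so the AI is drawing a conclusion about each SKU rather than just grouping similar names.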

Calculating dimensions with AI

Everything we’ve done so far has been about categorizing text. That is, classifying a text response as being in one or more themes.

However, rather than saying text is in a theme, we can calculate a score, reflecting some numeric dimension.

The best-known example of this is sentiment. I’ve got some Chinese hotel reviews here.

Let’s translate these.

AI > Translate > Source language: Chinese

And we will calculate sentiment.

AI > Sentiment

Now, just as we did with the categorization before, I can engineer the prompt. I can illustrate this with a very simple example.

Prompt: 'Please give a score based on the number of different aspects of the hotel that are discussed, giving a score of 1 if they only discussed one thing, 2 if they discussed 2 things, etc.'

Back coding

Back coding, which is also called upcoding, is a nice little time saver.

In this example, we asked people how their household was structured, and one person chose the other specify option.

Let’s use text categorization for them.

Data > Colas.sav > Living arrangements - other > Text Categorization

Now, we need to hook this up to the coded data.

And, as we can see, we’ve moved the one person from other to the correct category.
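
The logic of back coding can be sketched in Python as follows. This is an illustration under stated assumptions, not what Displayr does internally. The category labels, respondent IDs and the `back_code` helper are hypothetical.

```python
def back_code(closed_ended, other_specify_codes, other_label="Other"):
    """Move respondents out of 'Other' when their typed answer fits a real category.

    closed_ended:        dict of respondent id -> category chosen in the survey
    other_specify_codes: dict of respondent id -> category assigned to their typed text
    """
    result = dict(closed_ended)  # copy so the original data is left untouched
    for rid, code in other_specify_codes.items():
        # Only overwrite people who actually chose the 'Other' option.
        if result.get(rid) == other_label and code != other_label:
            result[rid] = code
    return result

# Hypothetical data, invented for illustration.
living = {"r1": "Live alone", "r2": "Other", "r3": "Couple with children"}
typed_text_codes = {"r2": "Couple with children"}

recoded = back_code(living, typed_text_codes)
print(recoded)
```

Anyone whose typed answer genuinely does not fit an existing category simply stays in other.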

Spontaneous and top-of-mind awareness

In market research we often ask people to type lists into text boxes. For example, which cell phone providers can you think of? This is known as spontaneous awareness data.

It can be categorized using the tools that I’ve shown you so far, but we’ve got specialist tools just for this.

Case study 3

We’ll use some data we have on spontaneous awareness for American cell phone brands.

This specialist feature is a bit hard to find.

… > Text analysis > Automatic categorization > List of items

As you can see, it’s automatically identified a list of phone carrier brands.

Look at all the different variants of Verizon that it’s found in the first line.

While Displayr’s been pretty clever, we’ve still got a bit of work to do. Note that:
1. AT&T also appears as Att
2. We’ve got Tmobile and T-Mobile

We can merge these categories.
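
Merging is done through the interface in Displayr. As a rough picture of what a merge amounts to, here is a minimal Python sketch that maps spelling variants onto one canonical brand name. The alias table and the `merge_variants` helper are hypothetical and cover only the two cases mentioned above.

```python
# Hypothetical alias table: lower-cased variant -> canonical brand name.
ALIASES = {
    "att": "AT&T",
    "at&t": "AT&T",
    "tmobile": "T-Mobile",
    "t-mobile": "T-Mobile",
}

def merge_variants(mentions, aliases):
    """Replace known variants with the canonical name, keeping first-mention order."""
    merged = []
    for mention in mentions:
        cleaned = mention.strip()
        canonical = aliases.get(cleaned.lower(), cleaned)
        if canonical not in merged:  # avoid listing the same brand twice
            merged.append(canonical)
    return merged

print(merge_variants(["Att", "Verizon", "AT&T", "Tmobile"], ALIASES))
```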

We can give Displayr more detailed instructions.

There are many more options. It’s always a trade-off about how much time you want to spend optimizing.

Now let’s just save the first category selected, which is what’s known as top-of-mind awareness in the trade.

SAVE VARIABLES > First category
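
For anyone who wants to see the definition spelled out, top-of-mind awareness is just the first brand each person listed. A minimal Python sketch, using invented brand lists and a hypothetical `top_of_mind` helper:

```python
from collections import Counter

def top_of_mind(brand_lists):
    """Return the first brand each respondent mentioned (None if they listed nothing)."""
    return [brands[0] if brands else None for brands in brand_lists]

# Hypothetical categorized lists, one per respondent, in the order typed.
lists = [
    ["Verizon", "AT&T"],
    ["T-Mobile"],
    [],
    ["Verizon", "T-Mobile", "AT&T"],
]

first = top_of_mind(lists)
tom_counts = Counter(brand for brand in first if brand is not None)
print(first)
print(tom_counts)
```

This only works if the categories are saved in the order people typed them, which is why the variants need to be merged before the first category is saved.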

Overview

So, this is what we covered!
