Juan Venegas, Senior Data Trainer, Growth Tribe
During the second month of my PhD in biophysics, I went out with a girl who asked me the most terrifying question.
“What’s your PhD about?”
After a deep breath and a long sip of beer, I threw myself into an explanation about genetic switches and master equations, infusing my gestures with a passion that almost knocked my drink off the table. But the moment I got to gene expression and time derivatives, I saw the blood draining from her face.
She, like many stakeholders, was pretty clever, but lacked the necessary background knowledge.
The moment your non-data colleagues hear terms like “data cleaning” or “ROC-AUC”, their attention moves from your presentation to their dinner plans.
This is when analogies, which soften the impact of technical terms and show the audience more familiar concepts, come in handy.
Most analogies come to us unannounced, halfway through a shower or while we debug our code with bleary eyes.
But sometimes they don’t come to us, so we must chase them. This article will give you a roadmap for that hunt.
A bit of terminology
Let’s first introduce the usual terminology for analogies in cognitive psychology:
- The domain in which our data project lives is called the target; in other words, this is the real world.
- The domain in which our analogy lives is called the source; in other words, this is the analogy world.
The idea behind an analogy is to move the data project from the target to the source, explain the analogy version within that domain, and then return to the target, where we can put the insight to good use.
The process to generate analogies for data storytelling
Step 1 — Understand your problem in the target domain
We often struggle to generate analogies because we don’t understand our data project well enough. So, before searching for a source, make sure you know your target, that is, your data project, inside out.
Outline your project or summarise it out loud — those are good ways to find gaps in your knowledge.
I’ll use an example during this process. Let's imagine we’re creating a prediction model to determine which users in our application will churn during the next quarter. To do so, we need to collect behavioural data from the users for a certain period. After a lot of trial and error, we find out that 2 weeks of behavioural data is enough to predict churn within the next 3 months.
That the observational window is much shorter than the prediction period might make sense to us after crunching the numbers, but it isn’t obvious to anyone who hasn’t seen the data. They’ll probably wonder, “Is 2 weeks long enough?”
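To make the two periods concrete, here is a minimal sketch of how an observation window and a prediction window could be separated. Everything in it is an assumption made for illustration: the event log is invented, the names `split_windows`, `OBSERVATION` and `PREDICTION` are hypothetical, 3 months is approximated as 90 days, and "churn" is taken to mean no activity at all during the prediction window.

```python
from datetime import date, timedelta

# Hypothetical event log: (user_id, day of activity). Invented data.
events = [
    ("ana", date(2024, 1, 1)),
    ("ana", date(2024, 1, 3)),
    ("ana", date(2024, 1, 9)),
    ("ana", date(2024, 2, 20)),
    ("ben", date(2024, 1, 2)),
]

OBSERVATION = timedelta(weeks=2)  # the window we learn from
PREDICTION = timedelta(days=90)   # the window we predict over (assumed ~3 months)


def split_windows(events, start):
    """Return {user: (active_days_in_observation, churned)}.

    Assumption: a user has churned if they show no activity
    during the prediction window.
    """
    obs_end = start + OBSERVATION
    pred_end = obs_end + PREDICTION
    active_days = {}
    seen_later = set()
    for user, day in events:
        if start <= day < obs_end:
            # Behaviour we are allowed to look at.
            active_days.setdefault(user, set()).add(day)
        elif obs_end <= day < pred_end:
            # Behaviour we are trying to predict.
            seen_later.add(user)
    return {
        user: (len(days), user not in seen_later)
        for user, days in active_days.items()
    }


rows = split_windows(events, date(2024, 1, 1))
```

The point of the sketch is only that the two windows are consecutive and never overlap: the features come from the first one, the label from the second.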
We might not have time during our presentation (or in this article) to go through the technical details, so we need to come up with an illustration to justify this.
It’s healthy to remember that, as data professionals, we must be familiar with all the technical details in our project. But as data storytellers, we also get to determine which details the audience needs.
In this case, we’re going to tell them that 2 weeks of behavioural data is enough to predict churn over the next 3 months. They can do without the statistical details, because these would distract them from the main point.
Step 2 — Brainstorm possible sources
This is the core step of the process, and the most difficult one. It’s even more difficult if you try to come up with the right analogy in one go. It’s much simpler, as many storytellers and creative people will tell you, to first come up with many options (step 2 in this article) and then select the right one (step 3).
So, for this second step, let’s brainstorm different options to find our source.
It’s OK to go over the top, to be melodramatic or plain stupid. It’s also OK to be completely wrong. We’ll let our critical minds do the sifting in the next step.
For our example of the prediction periods, we may brainstorm the following sources/analogies:
- A friend says they’ll borrow a book for “just 2 weeks”, but they end up returning it after 3 months. I know you see things wrong with this analogy. That’s OK. This isn’t the moment to say no, but to keep brainstorming.
- One colleague of mine needs 3 months to finish a project, while another finishes theirs in 2 weeks.
- I start dating someone. After 2 weeks, I can already see we’re not going to make it through the summer.
Step 3 — Assess and adjust
I’ve written down 3 analogies to illustrate the brainstorming process, but you’ll usually have dozens of them. Discarding the useless ones should be an easy task. Then, once you have a few left, you can evaluate them more carefully. Let’s have a look at the 3 options from the last section.
- Source 1: a friend borrowing a book. In this case there’s no separation between the periods of observation and prediction, between the 2 weeks and the 3 months. Could we say that if our friend hasn’t returned the book in 2 weeks, then it’s likely they won’t return it in 3 months? We could, but they might return it in 2 weeks, and then it wouldn’t make sense to think about the next 3 months, because we already got the book back. This is when the analogy breaks.
- Source 2: colleagues finishing projects. In this case, the two time periods are independent. They’re not consecutive or connected in any way. This one is easy to discard.
- Source 3: dating. You might have noticed I talk about dating a lot. There’s a reason behind that: some topics, like survival, money and love, are primal, which makes them universal and easy to understand. That’s why there are so many stories about them. Moreover, in this case, we’re examining 2 weeks of behaviour in a relationship to predict whether it will survive the next 3 months, which mirrors the target problem.
We might take this even further. As the key idea to convey is that a short period of time suffices to predict churn during a longer timespan, we don’t need to fixate on the 2 weeks and the 3 months.
We could say the girl looks at us for a second when we meet, and the expression on her face tells us that something will go wrong at some point during the night. If her expression is agreeable, the night might go well. This way, the analogy works in both the positive and the negative scenario.
Sometimes, we won’t find a satisfactory analogy. In that case, we should repeat steps 2 and 3 until we hit the nail on the head, or until we hit the wall with ours.
Step 4 — Come back to the target
This part of the process is key to making the most of any analogy, because it’ll allow your audience to become more familiar with the target problem, the one that matters in the end. To do so:
- Make your analogies as watertight as possible, that is, make the steps in the source mirror the ones in the target.
- When you present your analogy, spend some time back on the target problem. Ask your audience to make the translation from the source back into the target.
If I were presenting our example, I’d repeat a couple of times that we’re going to use 2 weeks of behavioural data to predict churn over the following 3 months, and that we have plenty of indicators within those 2 weeks to make that prediction successfully. I’d also emphasise that this applies to this particular model and isn’t a general rule.
Many stakeholders, especially C-levels, reason by analogy. In this case, they might use the dating example to understand other problems out there. They’ll do this successfully if they understand one needs to determine the appropriate observational period to make a prediction; they’ll do it poorly if they assume that 2 weeks is the period we should always use, or if they think that dating is a valid analogy for our product or our data.
Short vs. long analogies
The analogy we’ve covered is a short one. Sometimes, however, we may need to guide our audience through a long data process, where it’s easy to get lost. In that case, you may consider a global analogy as a skeleton for the whole presentation.
For example, let’s say we have a dataset made up of web sessions from different users of our platform, and we’d like to cluster these customers. We may follow this data process:
- Clean the data.
- Group the sessions, so we condense all of them in one row, which corresponds to one user.
- Cluster the users.
- Name the clusters.
- Assign cluster labels to existing users.
- Create a prediction model to assign cluster labels to new users.
If you’re familiar with clustering techniques, you’ll find this straightforward; if you aren’t, you might have got lost around the grouping stage. The important part is that we have mapped the target problem clearly.
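For readers who prefer code to bullet points, the six steps can be sketched as below. This is an illustration under loud assumptions, not the pipeline of any real project: the sessions are invented, each user is reduced to a single number (average session length), the clustering is a toy one-dimensional 2-means, the cluster names are made up, and the "prediction model" for new users is replaced by a simple nearest-centre lookup.

```python
from collections import defaultdict

# Hypothetical web sessions: (user_id, minutes_on_site). None marks a broken record.
sessions = [
    ("u1", 2.0), ("u1", 4.0), ("u2", 3.0), ("u2", None),
    ("u3", 40.0), ("u3", 44.0), ("u4", 38.0),
]

# 1. Clean the data: drop sessions with a missing duration.
clean = [(user, minutes) for user, minutes in sessions if minutes is not None]

# 2. Group the sessions: one row per user (here, the average session length).
per_user = defaultdict(list)
for user, minutes in clean:
    per_user[user].append(minutes)
profiles = {user: sum(vals) / len(vals) for user, vals in per_user.items()}


def nearest(value, centres):
    """Index of the centre closest to value."""
    return min(range(len(centres)), key=lambda i: abs(value - centres[i]))


def two_means(values, rounds=10):
    """Toy 1-D k-means with k=2, seeded with the smallest and largest value."""
    centres = [min(values), max(values)]
    for _ in range(rounds):
        groups = [[], []]
        for v in values:
            groups[nearest(v, centres)].append(v)
        # Keep the old centre if a group ends up empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres


# 3. Cluster the users.
centres = two_means(list(profiles.values()))

# 4. Name the clusters (names invented for this example).
names = ["quick visitors", "long browsers"]

# 5. Assign cluster labels to existing users.
labels = {user: names[nearest(avg, centres)] for user, avg in profiles.items()}


# 6. Label new users. A stand-in for a real prediction model:
#    we simply pick the nearest cluster centre.
def label_new_user(avg_minutes):
    return names[nearest(avg_minutes, centres)]
```

Calling `label_new_user(35.0)` places a new user with the long browsers. In practice, steps 3 and 6 would rely on a proper clustering library and a trained classifier.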
After applying the procedure to generate analogies, I decided to use a furniture shop as a source. Imagine you acquire a furniture shop and all the pieces are dusty and piled up in a warehouse. To set up the store, you do the following:
- Clean the pieces of furniture.
- Group different pieces to be sold together, as a set, if necessary.
- Classify the sets into categories.
- Come up with category names that you’ll display in the aisles.
- Carry the pieces of furniture to the corresponding aisle.
- When new items arrive, take them to the assigned aisle.
This analogy allows the audience to have a global view of the problem.
Note that I could also have chosen a supermarket for the analogy, but grouping different pieces into a set mirrors the grouping of web sessions better.
Remember that you’re always free to adjust the source so that it mirrors the target.
Step 5 — The sugar on top
You might have noticed that while the dating analogy had a bit of a story in it, the furniture shop was lacking one. Broadly speaking, if you don’t have conflict or character, you don’t have a story. The question is, can we create a story within our analogy? You bet.
This section is called “the sugar on top”, because this last step is neither easy nor necessary. For example, in the furniture shop analogy, you can make yourself or someone in your audience the new owner and emphasise how dusty the furniture is and the struggle to determine the names of the categories. That could turn the analogy into a story, but I don’t think it enhances it much.
Sometimes you’ll see the story clearly; often, you won’t see the need to introduce one. That’s completely OK. If you have the time to make all the pieces work together, go for it. That will give you the clarity of an analogy and the entertainment of a story. You might end up with one of those riveting and clear illustrations, like the bar scene from “A Beautiful Mind”.
As you can see, this is another dating analogy. They rarely fail.
Analogies are one of the best tools for a data storyteller. Even though we’ve established they’re not necessarily stories, they’ll keep your audience entertained and receptive.
Remember to apply the following steps:
- Understand your problem in the target domain
- Brainstorm possible sources
- Assess and adjust
- Come back to the target
- (Optional) Storify your analogy
Give it a go before your next presentation or your next chat at a bar. Your colleagues and dates will thank you for it.