During the second month of my PhD in biophysics, I went out with a girl who asked me the most terrifying question.
“What’s your PhD about?”
After a deep breath and a long sip of beer, I threw myself into an explanation about genetic switches and master equations, infusing my gestures with a passion that almost knocked my drink off the table. But the moment I got to gene expression and time derivatives, I saw the blood draining from her face.
She, like many stakeholders, was pretty clever, but lacked the necessary background knowledge.
The moment your non-data colleagues hear terms like “data cleaning” or “ROC-AUC”, their attention moves from your presentation to their dinner plans.
This is when analogies come in handy. They soften the impact of technical terms by connecting them to concepts the audience already knows.
Most analogies come to us unannounced, halfway through a shower or while we debug our code with bleary eyes.
But sometimes they don’t come to us, so we must chase them. This article will give you a roadmap for that hunt.
Let’s first introduce the usual terminology for analogies in cognitive psychology:
The idea behind an analogy is to move the data project from the target to the source, explain its analogous version within that domain, and then return to the target, where we can put the insight to good use.
We often struggle to generate analogies because we don’t understand our data project well enough. So, before searching for a source, make sure you know your target, that is, your data project, inside out.
Outlining your project or summarising it out loud is a good way to find gaps in your knowledge.
I’ll use an example throughout this process. Let’s imagine we’re creating a prediction model to determine which users of our application will churn during the next quarter. To do so, we need to collect behavioural data from the users for a certain period. After a lot of trial and error, we find that 2 weeks of behavioural data is enough to predict churn over the next 3 months.
That the observational window is much shorter than the prediction period might make sense to us after crunching the numbers, but it isn’t obvious to anyone who hasn’t seen the data. They will probably wonder, “Is 2 weeks long enough?”
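For readers who want to see what the two windows mean in practice, here is a minimal sketch of how one user’s history might be split. It rests on several assumptions that the example above does not spell out. Events are hypothetical `(date, action)` tuples. The 3 months are approximated as 90 days. A user counts as churned if they show no activity in the prediction period. `split_windows` is an illustrative helper, not part of any library.

```python
from datetime import date, timedelta

# Assumed parameters: a 2-week observation window and a ~3-month
# prediction horizon (approximated here as 90 days).
OBSERVATION_DAYS = 14
HORIZON_DAYS = 90


def split_windows(cutoff, events):
    """Split a user's (date, action) events around a cutoff date.

    Events in the 14 days before the cutoff are the model's inputs.
    Assumed churn definition: no events in the 90 days after the cutoff.
    """
    obs_start = cutoff - timedelta(days=OBSERVATION_DAYS)
    horizon_end = cutoff + timedelta(days=HORIZON_DAYS)
    observed = [e for e in events if obs_start <= e[0] < cutoff]
    future = [e for e in events if cutoff <= e[0] < horizon_end]
    churned = len(future) == 0
    return observed, churned


# Usage: two events fall in the observation window and one falls in the horizon.
history = [
    (date(2024, 1, 5), "login"),
    (date(2024, 1, 10), "purchase"),
    (date(2024, 2, 20), "login"),
]
observed, churned = split_windows(date(2024, 1, 15), history)
print(len(observed), churned)
```

The model only ever sees the short `observed` slice, while the label depends on the much longer window after the cutoff. That asymmetry is exactly what the audience will find counterintuitive.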
We might not have time during our presentation (or in this article) to go through the technical details, so we need to come up with an illustration to justify this.
It’s healthy to remember that, as data professionals, we must be familiar with all the technical details in our project. But as data storytellers, we also get to determine which details the audience needs.
In this case, we’re going to tell them that 2 weeks of behavioural data is enough to predict 3 months of churn in advance. They can do without the statistical details, because these will distract them from the main point.
This is the core step of the process, and the most difficult one. It’s even more difficult if you try to come up with the right analogy in one go. It’s much simpler, as many storytellers and creative people will tell you, to first come up with many options (step 2 in this article) and then select the right one (step 3).
So, for this second step, let’s brainstorm different options to find our source.
It’s OK to go over the top, to be melodramatic or plain stupid. It’s also OK to be completely wrong. We’ll let our critical minds do the sifting in the next step.
For our example of the prediction periods, we may brainstorm the following sources/analogies:
I’ve written down 3 analogies to illustrate the brainstorming process, but you’ll usually have dozens of them. Discarding the useless ones should be an easy task. Then, once you have a few left, you can evaluate them more carefully. Let’s have a look at the 3 options from the last section.
We might take this even further. As the key idea to convey is that a short period of time suffices to predict churn during a longer timespan, we don’t need to fixate on the 2 weeks and the 3 months.
We could say the girl looks at us for a second when we meet, and the expression on her face tells us that something will go wrong at some point during the night. If her expression is agreeable, the night might go well. This ensures that the analogy works in both the positive and the negative scenarios.
Sometimes, we won’t find a satisfactory analogy. In that case, we should repeat steps 2 and 3 until we hit the nail on the head, or until we hit the wall with ours.
This part of the process is key to making the most of any analogy, because it’ll allow your audience to become more familiar with the target problem, the one that matters in the end. To do so:
If I were presenting our example, I’d repeat a couple of times that we’re going to use 2 weeks of behavioural data to predict 3 months of churn, and that we have plenty of indicators within those 2 weeks to make that prediction successfully. I’d also emphasise that this applies to this particular model and isn’t a general rule.
Many stakeholders, especially C-levels, reason by analogy. In this case, they might use the dating example to understand other problems out there. They’ll do this successfully if they understand one needs to determine the appropriate observational period to make a prediction; they’ll do it poorly if they assume that 2 weeks is the period we should always use, or if they think that dating is a valid analogy for our product or our data.
The analogy we’ve covered is a short one. Sometimes, however, we may need to guide our audience through a long data process, where it’s easy to get lost. In that case, you may consider a global analogy as a skeleton for the whole presentation.
For example, let’s say we have a dataset made up of web sessions from different users of our platform, and we’d like to cluster these customers. We may follow this data process:
If you’re familiar with clustering techniques, you’ll find this straightforward; if you aren’t, you might have got lost around the grouping stage. The important part is that we have mapped the target problem clearly.
After applying the procedure to generate analogies, I decided to use a furniture shop as a source. Imagine you acquire a furniture shop and all the pieces are dusty and piled up in a warehouse. To set up the store, you do the following:
This analogy allows the audience to have a global view of the problem.
Notice that I could also have chosen a supermarket for the analogy, but grouping different pieces into a set mirrored the grouping of web sessions better.
Remember that you always have the freedom to adjust the source so that it fits the target.
You might have noticed that while the dating analogy had a bit of a story in it, the furniture shop was lacking one. Broadly speaking, if you don’t have conflict or character, you don’t have a story. The question is, can we create a story within our analogy? You bet.
This section is called “the sugar on top”, because this last step is neither easy nor necessary. For example, in the furniture shop analogy, you can make yourself or someone in your audience the new owner and emphasise how dusty the furniture is and the struggle to determine the names of the categories. That could turn the analogy into a story, but I don’t think it enhances it much.
Sometimes you’ll see the story clearly; often, you won’t see the need to introduce one. That’s completely OK. If you have the time to make all the pieces work together, go for it. That will give you the clarity of an analogy and the entertainment of a story. You might end up with one of those riveting and clear illustrations, like the bar scene from “A Beautiful Mind”.
As you can see, this is another dating analogy. They rarely fail.
Analogies are one of the best tools for a data storyteller. Even though we’ve established they’re not necessarily stories, they’ll keep your audience entertained and receptive.
Remember to apply the following steps:
Give it a go before your next presentation or your next chat at a bar. Your colleagues and dates will thank you for it.