Data-driven steering continues to call for a story to go with the numbers

Image: Mabel Amber, via Pixabay

The national government, municipalities, provinces, water boards and police: all are exploring and introducing new possibilities around data-driven steering. And there’s a lot to be said for that. Assumptions and opinions can be substantiated or invalidated on the basis of concrete figures. For directors and managers, figures are a welcome guide for the choices to be made and the policies to be implemented. After all, you can’t disagree about concrete figures, can you?

Yet there are plenty of reasons not to get too comfortable justifying choices on the basis of figures derived from data analysis. Getting the quality of the information right takes considerable effort. And with increasingly complex analyses using, for example, machine-learning algorithms, that quality is essential.

First of all, the more advanced machine-learning algorithms become, the harder they are to fathom. This applies not only to the outcome of an analysis, but also to the way in which that outcome was obtained. Systems like these are known as black-box algorithms. The smarter our tools get, the less well we understand them, especially when they are self-learning.

Machine-learning algorithms, by the way, actually do nothing but count and categorize. That is, of course, an oversimplification, but contemporary computers are still ultimately best at what their name implies: computing. And in this case they do so by searching large data sets, sorted or unsorted, for possible connections. For example, between figures on vandalism and poverty, or between figures on health and unemployment.
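
To make that "counting and categorizing" concrete, here is a minimal sketch in Python. All figures and labels are invented for illustration: a one-nearest-neighbour rule labels a new neighborhood purely by measuring distances to already-labelled examples — no understanding, just arithmetic.

```python
# Invented example: neighborhoods described by two indicator figures,
# (vandalism reports per 1000 residents, % low-income households),
# each already labelled by a policy category.
labelled = [
    ((12.0, 18.0), "high attention"),
    ((11.0, 15.0), "high attention"),
    ((3.0, 6.0), "low attention"),
    ((4.0, 8.0), "low attention"),
]

def classify(point):
    """Assign the label of the closest labelled example (squared distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labelled, key=lambda ex: dist(ex[0], point))[1]

print(classify((10.5, 16.0)))  # -> high attention
```

Nothing here "understands" vandalism or poverty; the label simply follows from which existing figures the new figures most resemble.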

What is the right data for the issue?

But for many social issues it is not so easy to make a good analysis. For example, what exactly is the data you need to understand a theme such as “loneliness among the elderly”, if that is a political priority in your municipality?

Your instincts lead you to a combination of indicators such as ‘single’, ‘over sixty-five’ and ‘disabled’. But are those really the right ones? And are they complete? What, for example, is the effect of having a dog on loneliness? And with the shifts in the state pension age, even ‘sixty-five-plus’, formerly the pre-eminent indicator of old age, is no longer a given. On average we are getting older and remaining in the workforce to an ever more advanced age.

Data quality is the basis

Apart from determining the indicators you need to grasp the essence of an issue, there is also the challenge of data availability and quality. Do you have these data sets at all, or have they never been captured within your organization? And are they stored in a usable format, so that they can be analyzed and compared?

Then follows the question of whether the available data sets are current, complete and correct. It would not be the first time that, due to a faulty link, certain data in the chain turn out not to have been synchronized properly for a long time, with all the resulting problems in the primary processes. It is not rocket science to understand that the better the data sets, the better the quality of the outcome of an analysis. And furthermore: the more complex the chain, the more important it is that data design and data management are in order.
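
As an illustration, a minimal sketch of such basic checks on completeness, correctness and currency. The records and field names below are entirely invented; a real register would need far richer rules, but the principle is the same:

```python
from datetime import date

# Hypothetical records from a municipal register (all values invented).
records = [
    {"id": 1, "birth_date": "1950-03-14", "updated": date(2019, 6, 1)},
    {"id": 2, "birth_date": None,         "updated": date(2015, 1, 9)},   # incomplete
    {"id": 3, "birth_date": "3021-01-01", "updated": date(2019, 8, 20)},  # impossible
]

def quality_report(rows, today=date(2019, 12, 31), max_age_days=365):
    """Count rows failing three basic checks: completeness, correctness, currency."""
    missing = sum(1 for r in rows if r["birth_date"] is None)
    invalid = sum(1 for r in rows
                  if r["birth_date"] and int(r["birth_date"][:4]) > today.year)
    stale = sum(1 for r in rows if (today - r["updated"]).days > max_age_days)
    return {"missing": missing, "invalid": invalid, "stale": stale}

print(quality_report(records))  # -> {'missing': 1, 'invalid': 1, 'stale': 1}
```

Even checks this simple can surface synchronization problems in a chain long before an analysis is built on top of the data.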

But then we’re not there yet. Even more elusive is the bias that may have crept into a data set while it was being built up. Take, for example, vandalism in a particular neighborhood: where there are signs of vandalism, more patrols are conducted by enforcement officers, who then observe more expressions of vandalism (after all, they are more present in that neighborhood) and register them in the data set. The data set then shows increasing vandalism in the neighborhood in question, which in turn leads to even more enforcement presence as an intervention. You can already feel that the neighborhood quickly comes off badly in the figures, and subsequently in the policy based on them!
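
This feedback loop can be made visible in a few lines. The numbers below are entirely invented: two neighborhoods with identical actual vandalism, where a fixed pool of patrols is reallocated each period according to last period’s recorded incidents.

```python
# Toy simulation (all numbers invented) of the enforcement feedback loop.
TRUE_INCIDENTS = 5           # actual incidents per period, identical in both areas
DETECTION_PER_PATROL = 0.25  # share of incidents a single patrol observes

def recorded(n_patrols):
    """Incidents that end up in the data set, capped at the true number."""
    return TRUE_INCIDENTS * min(1.0, DETECTION_PER_PATROL * n_patrols)

patrols = {"north": 3, "south": 1}  # north happens to start with more patrols
for period in range(5):
    counts = {hood: recorded(p) for hood, p in patrols.items()}
    # next period: redistribute the same 4 patrols by recorded share
    total = sum(counts.values())
    patrols = {hood: round(4 * c / total) for hood, c in counts.items()}

print(counts)  # -> {'north': 3.75, 'south': 1.25}
```

Despite identical reality, the recorded figures stay three times higher in the north, purely because that is where the observers are — and the patrol allocation keeps confirming itself.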

Even reliable data is not enough

But even if your data is current, complete, accurate and free of bias, an analysis can still surface connections that have no real meaning: spurious correlations. If you’re lucky, the irrelevance of the connection is crystal clear. Take, for example, the numerically demonstrable connection between per capita cheese consumption and the number of people who died by becoming tangled in their bedsheets in the same year. Based on open data from the period 2000–2009 in the United States, a correlation of roughly 94 percent between the two can be observed, as Tyler Vigen illustrates on his website with several hilarious examples.

Source: Tyler Vigen, https://tylervigen.com/old-version.html
Fortunately, in this example we can quickly see that it is sheer nonsense to connect these data in terms of meaning. But what about data sets where this is not so easily identified as a ‘false positive’, for example because they appear to have something in common? It is important to realize that a numerical correlation does not mean that there is a causal relationship.
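
Such a correlation is easy to reproduce. The two series below are invented (not Vigen’s actual data) and have nothing to do with each other beyond sharing an upward trend over 2000–2009, yet they correlate strongly:

```python
# Invented, causally unrelated series that merely share an upward trend.
cheese_kg = [13.3, 13.5, 13.8, 14.0, 14.3, 14.5, 14.8, 15.0, 15.2, 15.4]
bed_deaths = [327, 456, 509, 497, 596, 573, 661, 741, 809, 717]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient, no libraries needed."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(round(pearson(cheese_kg, bed_deaths), 2))  # a strong correlation, no causation
```

Any two quantities that both grow (or both shrink) over the same years will score high on this measure — which is exactly why a high correlation alone justifies nothing.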

Rules of thumb as a helping hand

In short, in exploring and applying data-driven governance in government, it is essential to be aware of the pitfalls and the lessons to be learned. We are really only at the beginning of this development; there is still an incredible amount to discover together. A few rules of thumb can help:

  • Be open and transparent in what you do with your data and how it is generated
  • Look for other forms of confirmation of the image that emerges from the data analysis, especially if it is complex in nature
  • Engage with residents, colleagues and experts to learn from and understand each other; including social context, personal stories and technical aspects of the technology
  • If you get a stomach ache about a choice that should be crystal clear based on the numbers, or if your intuition tells you something else, be sure to give that feeling a place in your considerations

Figures are ultimately figures. Just like stories are ultimately stories. With all the technological possibilities, the temptation is great to embrace data analysis as the new basis for policy and decision making, but the question is whether this will allow us to do full justice to reality.

By continuing to explore, share and empathize with the story behind the numbers, the two can strengthen each other on the way to data-driven governance. At the same time, by developing the right expertise in the field of data science, data analysis, machine learning and other technology, we can work on a solid foundation for fulfilling the role and tasks of government in the information society.


Want to know more about data-driven governance in practice? Starting in 2020, Futura Nova is offering the “New Dimensions in Data” course. Using case studies, we will discuss the opportunities and challenges that call for strategic choices and an appropriate action perspective for the role, positioning and functioning of government, both official and administrative.

Are you curious about the possibilities? Please get in touch to receive information about this training.