How to Understand Data and Information
These days we have to deal with data and metrics not only in our jobs but in public life as well. Everywhere you look there are charts, graphs and numbers supporting one narrative or another. As you well know, sometimes the numbers support the story and sometimes they do not. So how can we start to understand the way data is shared, so that it actually helps us understand a situation? The first place to start is with how data is presented for you to consume. The best data is clearly presented and unambiguous. However, we are seeing all kinds of different numbers, and during a pandemic not having the same baseline is a bad thing.
I was once dragged into a boardroom argument at a major car manufacturer because they could not reach consensus on something incredibly straightforward. The company presented a yearly award to the dealership that sold the most cars, yet conflicting numbers were being shared, and they brought me in to validate what was going on. I could see there was a disparity in the numbers, but it was not immediately clear why. As I looked further, I realized that the numbers had been pulled from two different reporting systems, and each system named a different dealership as the top seller. Then I looked at the metadata. The problem was that one system reported only new cars sold while the other reported new and used cars together. Both were correct by their own definitions, but the lack of data governance made this a bigger deal than it needed to be.
Now, when a shared data set is being interpreted by multiple news outlets, what can we do to make sure we understand what is being conveyed? Most certainly, a lot of the people presenting that data are not doing it correctly. Let us look at ways data can be managed to tell a story, and not in a good way.
Often you will hear that cases have doubled in a particular area, and this will elicit concern. But if you look closely at New Zealand, where there are exceptionally low rates of transmission, you could say the same thing. If the numbers are low, say 3 or 4 cases, then getting to 8 cases does not seem to be of much concern, yet you can still say that the numbers have doubled. One sells the story but does not define the reality. When looking at data, the context is important, as is the particular outcome you are looking for.
Which leads us to confirmation bias: the normal tendency to look for confirmation of something you already believe. For instance, if you think all stores are closed on Sunday and you see multiple stores closed, you will take that as reinforcing your theory. Looking for evidence to support a theory may turn up a mountain of it, but it should take only one open store to end that theory. For the most part we do this unconsciously and look at data to confirm what we think is happening. So if you are seeing charts that confirm or refute your beliefs, you need to step it up a bit: understand how data is normalized, and make sure that was done before the facts were presented to you.
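The doubling example from earlier in this section can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `describe_change` and the case counts are hypothetical, chosen to show that the same relative change can hide very different absolute changes.

```python
def describe_change(before: int, after: int) -> dict:
    """Return the absolute and relative change between two case counts."""
    absolute = after - before
    relative_pct = (after - before) / before * 100
    return {"absolute": absolute, "relative_pct": relative_pct}


# Hypothetical figures: a small outbreak and a large one that both "doubled".
small = describe_change(4, 8)
large = describe_change(4000, 8000)

print(small)  # {'absolute': 4, 'relative_pct': 100.0}
print(large)  # {'absolute': 4000, 'relative_pct': 100.0}
```

Both headlines could truthfully say that cases doubled, which is why it helps to ask for the absolute numbers alongside the percentage.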
Normalizing data is really about making sure we are comparing apples to apples. The USA has approximately 330M people, versus the UK, which has about 66M. In relative terms we have 5 times as many people. With that in mind, 33M people is 10% of the US population but 50% of the UK population. So when we talk numbers we need to take this into account, because if that many people were sick in both countries it would be a much bigger deal in the UK than in the US. We are at a point where, depending on the news source, you can take your pick on who is doing worse. So take a look at the data, normalize it, and then make your appraisal.
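As a rough sketch of the normalization described above, the snippet below converts a raw count into a share of the population and into a per-100,000 rate. The population figures are the approximate ones quoted in the text; the function names and the 10,000-case figure are hypothetical and exist only for illustration.

```python
# Approximate populations quoted in the text.
POPULATION = {"US": 330_000_000, "UK": 66_000_000}


def share_of_population(count: int, country: str) -> float:
    """Return count as a percentage of the country's population."""
    return count / POPULATION[country] * 100


def per_100k(count: int, country: str) -> float:
    """Return count expressed per 100,000 residents."""
    return count / POPULATION[country] * 100_000


# The 33M example from the text: same raw number, very different share.
for country in POPULATION:
    print(country, round(share_of_population(33_000_000, country), 1), "%")

# Hypothetical: 10,000 cases reported in each country.
for country in POPULATION:
    print(country, round(per_100k(10_000, country), 1), "per 100k")
```

Per-capita rates like these are one common way to put two countries on the same baseline; other adjustments, such as age structure or testing volume, are outside what this sketch covers.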
As numbers are shown, the next thing to know is whether you are looking at a leading or a lagging indicator. Confusing the two is, to me, one of the biggest mistakes in business and in media. Let us start with the business side. Lagging indicators are often used on dashboards to measure performance. An example of a lagging indicator is your utility bill. You go through the whole month not knowing what it will be, and then it shows up: you overused your air conditioner and now you must pay for it. You never knew what the bill would be until it arrived; there was no leading indicator of what was going to happen, except a certain gut feeling that you might be in trouble. This is how a lot of businesses look at revenue. The problem is that comparing revenue with last year's might tell you that you are doing badly, but it does not tell you what to do about it. If you were looking at something like daily bookings, you could probably gauge that better.
So make sure you know what type of indicator you are looking at. When you look at how many people have died today, you are really looking at how many people died about two weeks ago, because it takes a while for hospitals and coroners to report these things. The number means that deaths were high two weeks ago, but it does not tell us whether the situation is better or worse today; there is a lag, yet the number is reported as if it described today. If you know there was a loosening of restrictions a month ago and people were able to go out more, you can extrapolate as to what caused the deaths, but you still do not know the situation today, because the indicator does not describe the present. While lagging indicators can be useful, they can also lead you to make decisions about the current state without an accurate picture of what is happening.
There are better indicators for that, so let us start thinking about what they might look like. Testing, for instance, has been a huge deal in every country. The number of positive tests depends on the population being tested.
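The last point, that a count of positive tests depends on how many people were tested, can be sketched with a simple positivity rate (positives divided by tests). The weekly figures below are hypothetical and the function name `positivity_rate` is an assumption made for this illustration.

```python
def positivity_rate(positives: int, tests: int) -> float:
    """Return the percentage of tests that came back positive."""
    if tests <= 0:
        raise ValueError("tests must be greater than zero")
    return positives / tests * 100


# Hypothetical figures: testing expands fivefold between two weeks.
week_1 = {"tests": 10_000, "positives": 1_000}
week_2 = {"tests": 50_000, "positives": 2_500}

rate_1 = positivity_rate(week_1["positives"], week_1["tests"])
rate_2 = positivity_rate(week_2["positives"], week_2["tests"])

print(round(rate_1, 1))  # 10.0
print(round(rate_2, 1))  # 5.0
```

In this made-up case the raw number of positives went up by 2.5 times while the rate was cut in half, so the headline count alone would point in the wrong direction.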
If you look back at the UK example, a million tests in the UK and a million tests in the US mean different things. If you test more people you are going to get a higher number of positive tests, so the count is not helpful unless you have more information. This is where you need to start looking at rates and hospitalizations. If you look at initial hospitalizations, how many people are in the ICU versus regular wards, the number of patients on ventilators, and deaths, you start getting a better feel. How you look at an indicator can also change whether it is lagging or leading (context is key!): initial hospitalizations are a lagging indicator for new infections, but become a leading indicator if you are looking at hospital capacity, ICU capacity and deaths. Nuances like this are often lost in the plethora of numbers, leading to decisions based on a lack of understanding. Making decisions on data without understanding what the data is telling you is one of the key reasons we see customers build dashboards that do not reflect their strategy.
So, looking at the numbers during a pandemic? Look at how many patients are being hospitalized and what the current capacity is for your state. That will tell you not just how things are going but how they are being managed. As you know, you cannot manage what you cannot measure. This allows you to make rational decisions, because in the end that is the context you need. So, to sum up, you need to understand the basics of data:
1. Context is key
2. Confirmation Bias
3. Normalized Data
4. Lagging Indicators
5. Leading Indicators
6. When a lagging indicator becomes a leading indicator
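To illustrate the lagging-indicator point from the list above, here is a minimal sketch that realigns reported deaths to the date they probably occurred. It assumes a fixed 14-day reporting lag, taken from the "two weeks" mentioned in the text; real lags vary by place and over time. The dates, counts and the name `estimated_occurrence` are hypothetical.

```python
from datetime import date, timedelta

# Assumption: a fixed two-week delay between a death and its report.
REPORTING_LAG_DAYS = 14


def estimated_occurrence(report_date: date, lag_days: int = REPORTING_LAG_DAYS) -> date:
    """Estimate when a reported event actually happened."""
    return report_date - timedelta(days=lag_days)


# Hypothetical daily reports: report date -> deaths reported that day.
reported = {
    date(2020, 7, 15): 120,
    date(2020, 7, 16): 135,
}

# Re-key each count by its estimated occurrence date.
realigned = {estimated_occurrence(day): count for day, count in reported.items()}

for day, count in sorted(realigned.items()):
    print(day.isoformat(), count)
# 2020-07-01 120
# 2020-07-02 135
```

Reading the numbers this way makes it explicit that a figure reported in mid-July describes early July, and says nothing yet about the days in between.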