A world awash with data

Despite no shortage of data, and powerful AI tools to analyse them, many organisations failed to predict the consequences of Covid-19.

On 20 February 2020, the price of crude oil was $60 a barrel and the Dow Jones Industrial Average stood above 29,000. Nearly a month before these lofty valuations, the Chinese authorities had placed the city of Wuhan under strict lockdown. A few weeks after the stock price highs, the Dow had crashed 30% and oil was trading below $20 a barrel.

Despite an ocean of data, the world was still awash with oil and most asset classes were hammered.

How can it be that, with the incredible progress of artificial intelligence and more data than we know what to do with, our prediction engines were so faulty? What is going on here?

“Data is the new oil” is a terrible analogy. For sure, the world is drowning in both oil and data. Each brings riches to a select group of companies. But their similarities pale beside their differences.

Oil is in finite supply, unlike data, the creation of which is accelerating (an estimated 2.5 quintillion bytes per day). Oil, whilst currently very cheap, has a measurable value and can be traded as a commodity. It can also be stored without losing value. The US government stores 700m barrels of oil in 60 subterranean salt caverns as part of its Strategic Petroleum Reserve. At today’s prices, this oil store has a known value of roughly $14 billion.

How much are data worth?

The European Union in 2019 estimated the value of personal data at €1 trillion, approximately 8% of the EU’s GDP. I don’t understand the basis for this estimate and, besides, it is entirely meaningless. There is no credible or trusted “Data Economy”. There is no market where data can be easily bought and sold. Such trades do happen, but with a high degree of risk to the buyer.

IBM learned that lesson. That great public relations event, Watson’s appearance on the Jeopardy game show in 2011, emboldened the then CEO, Ginni Rometty, to invest nearly $15 billion in Watson, its artificial intelligence platform. This included $2.6 billion to acquire Truven Health and a reported $2 billion for The Weather Company. IBM was acquiring data to help build commercial offerings.

How did IBM put a value on that data? Could IBM have foreseen a lawsuit in January 2019 by the City of Los Angeles against The Weather Company, on grounds of data privacy? On a return-on-investment basis, IBM likely paid far too much. Unlike oil, with its futures market, data has no reliable exchange.

One man’s rubbish is another man’s treasure.

Oil takes millennia to form and, unless refined, does not change much. Data, by comparison, are easily created and very fungible. We give most of our personal data away for free. To us, they seemingly have no inherent value. For the FAANGs (Facebook, Alphabet etc), data are a force multiplier, which is why we get free or discounted services in exchange.

Artificial intelligence platforms were supposed to be the new oil refineries. Oil refineries turn crude oil into products we can all use, like detergents, shampoo and heart valves! What does Big Tech’s AI turn our data into? More profit for itself, through better advertising, personalisation and visual recognition.

One could argue, then, that the proceeds from oil are more widely dispersed than those from data. Whatever trickle-down economic benefit the general population receives from oil is unlikely to be repeated in the age of artificial intelligence. The data Big Tech holds have little value to anyone but its immediate competitors. Unlike oil, data are not openly traded or exchanged. Big Tech’s monopolies will likely get even bigger and more powerful.

For most ordinary businesses, the utility value of each additional terabyte of data could be outweighed by the cost of managing it. Government regulations such as the EU’s GDPR or the California Consumer Privacy Act are easily manageable taxes for companies like Amazon and Alphabet. For the rest of us, storing data is increasingly expensive, despite the fall in hardware prices.

A wicked problem

Being in possession of lots of data is no insurance policy against loss. As I said earlier, data do not prevent bad forecasts and unreliable probability calculations, especially when tail risks materialise, as with the Covid-19 pandemic.

Data are important to individuals and businesses, large and small. However, the value of data is highly user-dependent. If business leaders are to maximise this store of wealth, they must first identify which data are most important, then decide what they want to do with them. Only then can they quantify how much to invest in managing and protecting those data.

It is estimated there will be 180 zettabytes of data by 2025. That’s 180 followed by 21 zeros. Data on their own do not eliminate risk or give us certainty in our decision-making. If we are not to drown in data, we must first learn to ask the right questions.