“. . . In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it.[…]”
“On Exactitude in Science”
Jorge Luis Borges
In the one-paragraph short story “On Exactitude in Science”, the Argentine writer Jorge Luis Borges imagines an empire where the science of cartography becomes so exact that only a map at the same scale as the empire itself will suffice; the fictional empire has immersed itself completely in its excessive cartographic ambitions. Today, we nurture somewhat similar ambitions when we strive to represent our world through the data we collect. Big data is profoundly changing our world and how we make sense of it, and its reach is expanding rapidly and on a massive scale: we are striving to use big data to transform whole industries, from marketing and sales to weather forecasting, from medical diagnosis to fraud detection, and from business analysis to cyber security. Indeed, very much like Borges’ fictional empire, we have come to believe that the more data we collect and analyse, the more knowledge we gain of the world and the people living in it.
Big is beautiful
The conviction now prevails that big data delivers actionable insights into nearly every aspect of life. Philip Evans and Patrick Forth, both senior partners at Boston Consulting Group and experts in digital transformation, contend in an article from BCG Perspectives that “information is comprehended and applied through fundamentally new methods of artificial intelligence that seek insights through algorithms using massive, noisy data sets. Since larger data sets yield better insights, big is beautiful”. Along these lines, our hunger for data is growing steadily, and our digital ecosystem is feeding it: sensors, connected devices, social media and a growing number of clouds continuously produce data for us to collect and analyse. According to a study by the International Data Corporation (IDC), the digital universe roughly doubles in size every two years; from 2005 to 2020, the volume of data will grow by a factor of 300, to 40 zettabytes. For perspective, a zettabyte is 10^21 bytes, a one followed by 21 zeros. In this world of exponential data growth, the ambition to accumulate data goes unchecked. As in Borges’ fictional empire, the outer limit is the scale of 1:1, a complete digital representation of our world.
Today, companies like IBM and LinkedIn are already pushing towards that limit. IBM is training its cognitive computing system Watson to answer virtually any question, and to that end it is collecting unprecedented amounts of data to form the biggest corpus of knowledge to date. At the beginning of this year, the company acquired Truven Health Analytics for USD 2.6 billion, bringing to its health unit a major repository of health data from thousands of hospitals, employers and state governments across the US. It was the fourth major acquisition of a health data company in IBM Watson’s ten-month life span, showing just how important a digital representation of patients, diagnoses, treatments and hospitals is to the computer giant’s artificial intelligence system. Watson’s intelligence can only live up to its full potential with a vast amount of data to feed its algorithms. LinkedIn’s vision is equally ambitious: the enterprise is creating an Economic Graph, which is nothing less than a digital mapping of the global economy. It aims to include a profile for every one of the 3 billion members of the global workforce. It intends to digitally represent every company, their products and services, the economic opportunities they offer and the skills required to obtain those opportunities. LinkedIn also plans to include a digital presence for every higher education organisation in the world. The Economic Graph is thus an extensive repository of all members of the global economy and the interrelations among them. While it is unlikely that every person will ever be represented on LinkedIn, the company is nevertheless taking fast strides towards that goal: the network currently has over 400 million members, with two new professionals signing up every second.
Earlier this year, LinkedIn was acquired by Microsoft for over USD 26 billion, a price that hints at how valuable the mass of data LinkedIn is collecting is, and will be, in the future.
However, the endeavours of these two companies are but the tip of the iceberg. Their pursuit of a complete digital representation of their respective fields is emblematic of a more general aspiration today: creating a complete model of our world. The visions of digital companies like LinkedIn, IBM, Facebook, Microsoft and Google are converging, recreating the cartographic ambitions of the empire in Borges’ story. Representation and reality are already starting to overlap.
At an inflection point
Importantly, however, Borges’ story continues, namely with the demise of the ambitious 1:1 map, which had no inherent purpose or value. In Borges’ fictional world, the next generations, not gripped by the same ambition as their ancestors, recognise that a map at the scale of 1:1 is useless and dispose of their forefathers’ work. They leave it to decompose, and all that remains are its “tattered ruins”.
“[…] The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.”
Today, too, uncertainty about the exact value and practicality of big data is rising. In a recent interview with Business Insider, Professor Patrick Wolfe, Executive Director of University College London’s Big Data Institute, warned that “the rate at which we are generating data is rapidly outpacing our ability to analyse it”. Only about 0.5% of all data is currently analysed, according to the International Data Corporation, and Wolfe says that percentage is shrinking as more data is collected. Furthermore, adoption of big data technologies and applications is proceeding rather slowly, which means the scenario Wolfe describes is unlikely to change quickly. This is because big data is essentially enterprise technology, or, as New York-based venture capitalist Matt Turck has pointed out, big data is fundamentally like plumbing: something that powers solutions in the background but that no one really gets to see. And the deployment of new technologies in that world rarely happens overnight. Apart from a few digital-native companies like Google, Yahoo, Facebook or LinkedIn, most companies are waiting on the sidelines to see whether it is worth investing in the integration of big data technologies. In the early days of the big data hype, roughly between 2011 and 2014, these digital-native companies were the driving forces behind big data because they were both heavy users and producers of data. More importantly, they had no legacy systems, which made the early development and adoption of big data technologies comparatively easy: no old infrastructure was in the way, and the companies were free to design the new systems around the data they were collecting. The early phase of big data technology was thus characterised by these companies recruiting engineers to build the tools they needed to exploit the unprecedented volumes of data they held.
The broader set of companies that have been around for decades, from medium-sized enterprises to very large multinational corporations, are much more cautious about introducing big data technologies. These companies already have an existing infrastructure, one that is not designed to process huge volumes of data systematically but that has served them well for a long time. Deploying big data technologies would mean a costly and risky investment in a new line of technologies, people and processes, especially because many big data technologies are built by startups that established players are reluctant to entrust with an essential part of their IT infrastructure. Hence, the phase of broader deployment is slow and only just beginning today.
We are at a crucial moment. Going forward, we will see whether the rapid growth of collected data continues to outpace our capacity to put that data to use, or whether we learn to draw actionable intelligence from the masses of data effectively. This juncture will decide whether our exponentially accumulating collections of data suffer the same fate as Borges’ map. Along these lines, the capability to gain business-relevant knowledge from big data is becoming the ultimate competitive advantage for many businesses.
Uncovering the hidden patterns in data
Over the past year, opinions about how to extract the hidden value from data have centred mainly on one approach. While that trend seems promising, it is problematic that the merits of alternative tools and technologies remain largely unexplored.
According to Matt Turck, the biggest recent trend in big data analytics has been an increasing focus on artificial intelligence to help analyse massive amounts of data and derive predictive insights. Indeed, applied mathematics and artificial intelligence (including related fields such as machine learning and deep learning) now carry so much weight that they are crowding out every other tool that might be brought to bear. The underlying idea is that with enough data, the numbers speak for themselves. To reiterate what Evans and Forth said, “big is beautiful”. This idea informs the culture of Silicon Valley and, by extension, that of many enterprises around the world.
Importantly, the recent resurgence of artificial intelligence is very much an offspring of big data. The algorithms that power deep learning and machine learning (the two areas of AI currently receiving the most attention) were largely invented decades ago; yet it was not until there were massive volumes of data to feed them that they lived up to expectations. Now, artificial intelligence is in turn helping to extract the hidden value of big data. Turck thus voices the commonly held opinion that AI is the logical continuation of big data when he writes in a recent post on his blog: “The increasing focus on AI/machine learning in analytics corresponds to the logical next step of the evolution of big data: now that I have all this data, what insights am I going to extract from it? AI/machine learning is now precipitating a trend towards the emergence of the application layer of big data. The combination of big data and AI will drive incredible innovation across pretty much every industry.” Big data and AI are a perfect match, as each enhances the power of the other; together they reinforce the idea that the bigger the data volume, the better.
Yet that symbiosis conceals other approaches that might be applied to gain insights from big data. Alternative methodologies like ontologies, taxonomies and semantics are currently all but disregarded. Where predictive analytics and machine learning stand for size, ontologies, taxonomies and semantics stand for meaning and understanding. The idea of an ontology, for example, springs from philosophy, where it is the study of what kinds of things exist: what entities there are in the universe and how they relate. In computer science, an ontology is a practical application of philosophical ontology: a model of the entities and relations in a particular domain of knowledge. It represents that domain as a hierarchy of concepts, using a shared vocabulary to denote the types, properties and interrelationships of those concepts. An ontology for pharmaceuticals, for example, would encode classifications of diseases, disorders, body systems, diagnoses and treatments, and the relationships among them.
Ontologies enable the representation of knowledge. Humans usually grasp the correct meaning of a term thanks to their background knowledge and the context in which the term is used; a machine naturally lacks this ability. It can, however, “learn” the semantic meaning of a term through the concepts and relations in an ontology. Powerful ontologies already exist in many domains, such as linguistics, medicine, geography and occupations (a non-exhaustive but well-compiled list can be found on Wikipedia).
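To make this concrete, here is a deliberately minimal sketch of the idea in Python. The medical concepts and relation names below are illustrative inventions, not taken from any real ontology; production systems typically use standards such as RDF and OWL and far richer models. The point is only to show how explicit concepts and relations let a machine infer meaning it was never directly told.

```python
# A toy ontology: each triple states (subject, relation, object).
# The concepts and relations here are hypothetical examples.
triples = [
    ("Diabetes", "is_a", "MetabolicDisorder"),
    ("MetabolicDisorder", "is_a", "Disease"),
    ("Insulin", "treats", "Diabetes"),
    ("Diabetes", "affects", "Pancreas"),
    ("Pancreas", "part_of", "EndocrineSystem"),
]

def is_a(concept, ancestor):
    """Infer class membership by following 'is_a' links transitively
    up the concept hierarchy."""
    if concept == ancestor:
        return True
    return any(
        is_a(obj, ancestor)
        for subj, rel, obj in triples
        if subj == concept and rel == "is_a"
    )

def related(concept, relation):
    """Return all concepts linked to `concept` by `relation`."""
    return [obj for subj, rel, obj in triples
            if subj == concept and rel == relation]

# No triple states that diabetes is a disease, yet the machine
# can now infer it from the hierarchy:
print(is_a("Diabetes", "Disease"))   # True
print(related("Insulin", "treats"))  # ['Diabetes']
```

The machine has no understanding of medicine, but because the background knowledge is encoded explicitly, it can answer questions that the raw data never states directly; this is the kind of structured "understanding" that statistical methods alone do not provide.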
While tools such as ontologies might seem modest next to artificial intelligence, they will play no lesser part in determining the competitive “data fitness” of companies. After the exponential growth of the digital universe in recent years, we have reached a degree of complexity that requires a deep understanding of the matters at hand. While artificial intelligences are trained to learn about new things on their own through deep learning algorithms, their level of “understanding” is limited when meaning is implicit, ambiguous, contextual or highly complex. Ontologies, in contrast, can represent even the most complex matters in a structured way that a machine can process. Hence, technologies such as ontologies could enhance what AI and big data can do on their own. Yet their potential is currently neglected, because they do not comply with the dominant paradigm. Furthermore, they take longer to develop and are therefore more expensive, since they require far more input from human experts (who teach an ontology a particular domain of knowledge). Unfortunately, this means that alternative big data technologies such as ontologies are currently drawing the short straw. For that to change, it is primarily the mindset of the business community that will have to shift, allowing for a wider set of technologies with different approaches. Departing, to some extent, from the idea that “big is beautiful” could be the first step towards leveraging the full power of big data.
Image by Elsa Leydier