2017 : WHAT SCIENTIFIC TERM OR CONCEPT OUGHT TO BE MORE WIDELY KNOWN?

Maximilian Schich
Associate Professor in Arts and Technology, The University of Texas at Dallas
Confusion

Commonly, confusion denotes bewildering uncertainty, often associated with delirium or even dementia. From the confusion of languages in the Book of Genesis to Genesis the band, broader audiences mostly encounter negative aspects of confusion. This short text aims to shed a different light on the concept: confusion that can be positive or negative, sometimes both at the same time; confusion as a subject of scientific interest; confusion as a phenomenon that can't be ignored, that requires scientific understanding, and that needs to be designed and moderated.

A convenient tool to measure confusion in a system is the so-called confusion matrix. It is used in linguistics and computer science, in particular machine learning. In principle, the confusion matrix is a table in which every category along the rows is compared with every category along the columns. A simple example is to compare all letters of the alphabet as spoken by a native English speaker with the letters actually perceived by a German speaker. An English e will often be confused with the German i, resulting in a higher value in the matrix where the e row crosses the i column. Ideally, of course, letters are only confused with themselves, resulting in high values exclusively along the matrix diagonal. Actual confusion, in other words, is characterized by patterns of higher values off the matrix diagonal.
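The letter example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular study: the toy data and the helper names `confusion_matrix` and `off_diagonal_share` are invented here. Rows hold what was spoken, columns hold what was perceived, and the share of counts off the diagonal gives one crude summary of how much confusion there is.

```python
from collections import Counter

def confusion_matrix(actual, perceived, labels):
    """Build a table of counts: rows are actual labels, columns are perceived labels."""
    counts = Counter(zip(actual, perceived))
    return [[counts[(a, p)] for p in labels] for a in labels]

def off_diagonal_share(matrix):
    """Return the fraction of all observations that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    on_diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - on_diagonal) / total if total else 0.0

# Made-up toy data (an assumption for illustration only):
# letters spoken by one person and the letters another person reports hearing.
labels = ["a", "e", "i"]
spoken = ["a", "e", "e", "e", "i", "i"]
heard = ["a", "i", "i", "e", "i", "i"]

matrix = confusion_matrix(spoken, heard, labels)
share = off_diagonal_share(matrix)

for label, row in zip(labels, matrix):
    print(label, row)
print("off-diagonal share:", round(share, 3))
```

In this toy data the only off-diagonal entry sits where the e row crosses the i column, which is the pattern described above.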

Unfortunately, one may say, the use of the confusion matrix is still mostly governed by what Richard Dawkins calls "the tyranny of the discontinuous mind." Processing the confusion matrix, scholars mostly derive secondary measures to quantify type-I and type-II errors, i.e. false positives and false negatives, as well as a number of similarly aggregate measures. In short, the confusion matrix is used to make classification by humans and artificial intelligence less confusing. A typical, and of course very useful, example is to compare a machine classification of images with the known ground truth. No doubt, quantifying the confusion of ducks and alligators, just like that of pedestrians and street signs, is a crucial application that can save lives. Similarly, it is often useful to optimize classification systems in order to minimize the confusion of human curators. A good example would be the effort of the semantic web community to simplify global classification systems, such as the UMBEL ontology or the category system in Wikipedia, to allow for easy data collection and classification with minimal ambiguity. Nevertheless, the almost exclusive focus on optimization by minimizing confusion is unfortunate, as perfect discreteness of categories is not desirable in many real systems, from the function of genes and proteins to individual roles in society. With too little confusion between categories or groups, the system is in essence dead. With too much confusion, the system is overwhelmed by chaos. In a social network, a total lack of confusion annihilates any basis for communication between groups, while complete confusion would be equivalent to a meaningless cacophony of everything meaning everything.
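To make the "secondary measures" concrete, here is a hedged sketch of how false positives and false negatives are commonly read off a confusion matrix, treating each class in turn against all the others. The function name `per_class_errors` and the duck/alligator counts are assumptions made up for illustration. Note how each class collapses to a handful of aggregate numbers, while the question of which class was confused with which is discarded.

```python
def per_class_errors(matrix, labels):
    """Derive one-vs-rest error counts per class.

    Assumes rows are the true classes and columns are the predicted classes.
    """
    size = len(labels)
    summary = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        # Row k off the diagonal: items of class k that were classified as something else.
        false_neg = sum(matrix[k]) - true_pos
        # Column k off the diagonal: items of other classes that were classified as k.
        false_pos = sum(matrix[r][k] for r in range(size)) - true_pos
        predicted = true_pos + false_pos
        actual = true_pos + false_neg
        summary[label] = {
            "false_positives": false_pos,
            "false_negatives": false_neg,
            "precision": true_pos / predicted if predicted else 0.0,
            "recall": true_pos / actual if actual else 0.0,
        }
    return summary

# Made-up counts (an assumption): 10 duck images and 10 alligator images.
animal_labels = ["duck", "alligator"]
animal_matrix = [
    [8, 2],  # true ducks: 8 classified as duck, 2 as alligator
    [1, 9],  # true alligators: 1 classified as duck, 9 as alligator
]

errors = per_class_errors(animal_matrix, animal_labels)
for name, stats in errors.items():
    print(name, stats)
```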

Network science is increasingly interested in this situation, dealing with confusion using the concept of overlap in community finding. Multi-functional molecules, genes and proteins, for example, act as drugs and drug targets, where confusion needs to be moderated in order to hit the target while minimizing unwanted side effects. Similar situations arise in social life. Only recently has it become possible in network science to deal with such phenomena in an efficient way. Network science initially focused mostly on identifying discrete communities, as finding them is computationally much simpler. In such a perfect world, where all communities are discrete, there is no confusion, or, one should better say, confusion is ignored. In such a perfect world the confusion or co-occurrence matrix can be sorted so that all communities form squares or rectangles along the matrix diagonal. In a more complicated case, neighboring communities overlap, forming sub-communities in between two almost discrete communities, say all people belonging to the same company while also belonging to the same family. It is easy to imagine more complicated cases. At the other end of the spectrum we find all-out complex overlap, which is hard to imagine or visualize in terms of sorting the matrix. It may well be true, however, that complex overlap is crucial for the survival of the system in question.
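The sorting described above can be sketched as follows, assuming that each member has already been assigned to exactly one community by some method not shown here. The names, the toy co-occurrence values, and the helpers `sort_by_community` and `cross_community_weight` are all hypothetical. With discrete communities the reordered matrix shows square blocks along the diagonal and nothing elsewhere; any weight outside the blocks is the overlap that such a discrete assignment ignores.

```python
def sort_by_community(matrix, names, community):
    """Reorder rows and columns so that members of one community sit next to each other."""
    order = sorted(range(len(names)), key=lambda i: (community[names[i]], names[i]))
    sorted_names = [names[i] for i in order]
    sorted_matrix = [[matrix[r][c] for c in order] for r in order]
    return sorted_names, sorted_matrix

def cross_community_weight(matrix, names, community):
    """Sum all entries that connect members of different communities."""
    return sum(
        matrix[r][c]
        for r in range(len(names))
        for c in range(len(names))
        if community[names[r]] != community[names[c]]
    )

# Hypothetical toy data: four people, two discrete communities.
names = ["ann", "bob", "cy", "dee"]
community = {"ann": 0, "bob": 1, "cy": 0, "dee": 1}
cooccurrence = [
    [1, 0, 1, 0],  # ann co-occurs with cy
    [0, 1, 0, 1],  # bob co-occurs with dee
    [1, 0, 1, 0],  # cy co-occurs with ann
    [0, 1, 0, 1],  # dee co-occurs with bob
]

sorted_names, sorted_matrix = sort_by_community(cooccurrence, names, community)
discrete_weight = cross_community_weight(cooccurrence, names, community)

# Add one symmetric link between ann and bob to create overlap between the two blocks.
overlapping = [row[:] for row in cooccurrence]
overlapping[0][1] = overlapping[1][0] = 1
overlap_weight = cross_community_weight(overlapping, names, community)

for name, row in zip(sorted_names, sorted_matrix):
    print(name, row)
print("weight outside the blocks:", discrete_weight, "then", overlap_weight)
```

A single number like this only flags that overlap exists. It says nothing about its structure, which is where the harder cases mentioned above begin.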

There is a known case where confusion by design is desirable. A highly cited concept in materials science, which was introduced 23 years ago in a news item, Greer's so-called principle of confusion applies to the formation of metallic glass. In short, the principle states that using a greater variety of metal atoms to form a glass is more convenient, because the resulting impurities give the material less chance to crystallize. This allows for larger objects of glass with interesting material properties, such as being stronger than steel. The convenience of greater confusion is counter-intuitive, as the material properties of a glass become harder to determine the larger the variety of metals involved. It would not be surprising to see something like Greer's principle of confusion applied to other systems as well.

While such questions await solution, as a take-home message, we should expect critical amounts of confusion in many real-life systems, with the optimum lying somewhere between perfect discreteness and perfect homogeneity, and identical with neither. Further identifying, understanding, and successfully moderating patterns of confusion in real systems is an ongoing challenge. Solving this challenge is likely essential in a great variety of fields, from materials and medicine to social justice and the ethics of artificial intelligence. Science will help us to clarify, if possible to embrace, and if necessary to avoid confusion. Of course, we should use caution, as the moderation of confusion can be used for peace and war, much like the rods in a nuclear reactor, with the difference that switching off confusion in a social system may be just as deadly.