WHY GORDIAN SOFTWARE HAS CONVINCED ME TO BELIEVE IN THE REALITY OF CATS AND APPLES

Jaron Lanier [11.18.03]

I've had a suspicion for a while that despite the astonishing success of the first generation of computer scientists like Shannon, Turing, von Neumann, and Wiener, somehow they didn't get a few important starting points quite right, and some things in the foundations of computer science are fundamentally askew. 

Introduction

In September 2000, Jaron Lanier, a pioneer in virtual reality, musician, and the lead scientist for the National Tele-Immersion Initiative, weighed forth on Edge against "cybernetic totalism." "For the last twenty years," he wrote in his "Half a Manifesto" (Edge #74), "I have found myself on the inside of a revolution, but on the outside of its resplendent dogma. Now that the revolution has not only hit the mainstream, but bludgeoned it into submission by taking over the economy, it's probably time for me to cry out my dissent more loudly than I have before." In his manifesto, he took on those "who seem to not have been educated in the tradition of scientific skepticism. I understand why they are intoxicated. There is a compelling simple logic behind their thinking and elegance in thought is infectious."

"There is a real chance," he continued, "that evolutionary psychology, artificial intelligence, Moore's Law fetishizing, and the rest of the package, will catch on in a big way, as big as Freud or Marx did in their times. Or bigger, since these ideas might end up essentially built into the software that runs our society and our lives. If that happens, the ideology of cybernetic totalist intellectuals will be amplified from novelty into a force that could cause suffering for millions of people." "Half a Manifesto" caused a stir, was one of Edge's most popular features, and has been widely reprinted.

Lately, Lanier has been looking at trends in software, and he doesn't like what he sees, namely "a macabre parody of Moore's Law". In this feature, which began as a discussion at a downtown New York restaurant last year, he continues his challenge to the ideas of philosopher Daniel C. Dennett, and raises the ante by taking issue with the seminal work in information theory and computer science of Claude Shannon, Alan Turing, John von Neumann, and Norbert Wiener.

—JB

JARON LANIER, a computer scientist and musician, is a pioneer of virtual reality, and founder and former CEO of VPL. He is currently the lead scientist for the National Tele-Immersion Initiative, and visiting scientist, SGI.



WHY GORDIAN SOFTWARE HAS CONVINCED ME TO BELIEVE IN THE REALITY OF CATS AND APPLES

(JARON LANIER): There was a breathtaking moment at the birth of computer science and information theory in the mid-20th century when the whole field was small enough that it could be kept in one's head all at once. There also just happened to be an extraordinary generation of brilliant people who, in part because of the legacy of their importance to the military in World War II, were given a lot of latitude to play with these ideas. People like Shannon, Turing, von Neumann, Wiener, and a few others had an astonishing combination of breadth and depth that's humbling to us today, practically to the point of disorientation. It's almost inconceivable that people like Wiener and von Neumann could have written the books of philosophy that they did while at the same time achieving their technical heights. This is something that we can aspire to but will probably never achieve again.

What's even more humbling, and in a way terrifying, is that despite this stellar beginning and the amazing virtuosity of these people, something hasn't gone right. We clearly have proven that we know how to make faster and faster computers (as described by Moore's Law), but that isn't the whole story, alas. Software remains disappointing as we try to make it grow to match the capability of hardware.

If you look at trends in software, you see a macabre parody of Moore's Law. The expense of giant software projects, the rate at which they fall behind schedule as they expand, the rate at which large projects fail and must be abandoned, and the monetary losses due to unpredicted software problems are all increasing precipitously. Of all the things you can spend a lot of money on, the only things you expect to fail frequently are software and medicine. That's not a coincidence, since they are the two most complex technologies we try to make as a society. Still, the case of software seems somehow less forgivable, because intuitively it seems that as complicated as it's gotten lately, it still exists at a much lower order of tangledness than biology. Since we make it ourselves, we ought to be able to know how to engineer it so it doesn't get quite so confusing.

I've had a suspicion for a while that despite the astonishing success of the first generation of computer scientists like Shannon, Turing, von Neumann, and Wiener, somehow they didn't get a few important starting points quite right, and some things in the foundations of computer science are fundamentally askew. In a way I have no right to say this and it would be more appropriate to say it once I've actually got something to take its place, so let me just emphasize that this is speculative. But where might things have gone wrong?

The leaders of the first generation were influenced by the metaphor of the electrical communications devices that were in use in their lifetimes, all of which centered on the sending of signals down wires. This started, oddly enough, with predecessors of the fax machine, continuing in a much bigger way with the telegraph, which turned into the telephone, and then proceeded with devices that carry digital signals that are only machine readable. Similarly, radio and television signals were designed to be relayed over a single wire even if part of their passage was wireless. All of us are guided by our metaphors, and our metaphors are created by the world around us, so it's understandable that signals on wires would become the central metaphor of their day.

If you model information theory on signals going down a wire, you simplify your task in that you only have one point being measured or modified at a time at each end. It's easier to talk about a single point in some ways, and in particular it's easier to come up with mathematical techniques to perform analytic tricks. At the same time, though, you pay by adding complexity at another level, since the only way to give meaning to a single point value in space is time. You end up with information structures spread out over time, which leads to a particular set of ideas about coding schemes in which the sender and receiver have agreed on a temporal syntactical layer in advance.

If you go back to the original information theorists, everything was about wire communication. We see this, for example, in Shannon's work. The astonishing bridge that he created between information and thermodynamics was framed in terms of information on a wire between a sender and a receiver.

This might not have been the best starting point. It's certainly not a wrong starting point, since there's technically nothing incorrect about it, but it might not have been the most convenient or cognitively appropriate starting point for human beings who wished to go on to build things. The world as our nervous systems know it is not based on single point measurements, but on surfaces. Put another way, our environment has not necessarily agreed with our bodies in advance on temporal syntax. Our body is a surface that contacts the world on a surface. For instance, our retina sees multiple points of light at once.

We're so used to thinking about computers in the same light as was available at the inception of computer science that it's hard to imagine an alternative, but an alternative is available to us all the time in our own bodies. Indeed, the branches of computer science that incorporated interactions with the physical world, such as robotics, probably wasted decades trying to pretend that reality could be treated as if it were housed in a syntax that could be conveniently encoded on a wire. Traditional robots converted the data from their sensors into a temporal stream of bits. Then the robot builders would attempt to find the algorithms that matched the inherent protocol of these bits. Progress was very, very slow. The latest and best robots tend to come from people like Ron Fearing and his physiologist cohort Bob Full at Berkeley, who describe their work as "biomimetic". They are building champion robots that in some cases could have been built decades ago were it not for the obsession with protocol-centric computer science. A biomimetic robot and its world meet on surfaces instead of at the end of a wire. Biomimetic robots even treat the pliability of their own building materials as an aspect of computation. That is, they are made internally of even more surfaces.

With temporal protocols, only one point of information can be measured in a system at a time. You have to set up a temporal hierarchy in which the bit you measure at a particular time is meaningful based on "when" in a hierarchy of contexts you happen to occupy when you read the bit. You stretch information out in time and have past bits give context to future bits in order to create a coding scheme. This is the preferred style of classical information theory from the mid-twentieth century.

Note that this form of connection occurs not only between computers on the internet, but in a multitude of internal connections between parts of a program. When someone says a piece of software is "object-oriented", that means that the bits traveling on the many, many virtual wires inside the program are interpreted in a particular way. Roughly speaking, they are verb-like messages being sent to noun-like destinations, while the older idea was to send noun-like messages to verb-like destinations. But fundamentally the new and old ideas are similar in that they are simulations of vast tangles of telegraph wires.

The alternative, in which you have a lot of measurements available at one time on a surface, is called pattern classification. In pattern classification a bit is given meaning at least in part by other bits measured at the same time. Natural neural systems seem to be mostly pattern-recognition oriented, and computers as we know them are mostly temporal-protocol-adherence oriented. The distinction between protocols and patterns is not absolute; one can in theory convert between them. But it's an important distinction in practice, because the conversion is often beyond us, either because we don't yet know the right math to use to accomplish it, or because it would take humongous hypothetical computers to do the job.
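
To make the contrast concrete, here is a minimal sketch of the two styles (the function names, the toy eight-bit length header, and the template-matching scheme are all my own illustrative inventions, not anything from the classical literature):

```python
import numpy as np

# Protocol style: one bit at a time, meaningful only because sender and receiver
# agreed in advance on a temporal syntax (here: an 8-bit length header, then payload).
def decode_protocol(bits):
    length = int("".join(str(b) for b in bits[:8]), 2)
    return bits[8:8 + length]           # one misplaced bit and the whole parse goes wrong

# Pattern style: many measurements arrive at once on a "surface"; meaning comes from
# which stored pattern the whole array most resembles.
def classify_pattern(surface, templates):
    surface = np.asarray(surface, dtype=float).ravel()
    scores = {name: -np.linalg.norm(surface - np.asarray(t, dtype=float).ravel())
              for name, t in templates.items()}
    return max(scores, key=scores.get)  # small errors merely lower a score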

In order to keep track of a protocol you have to devote huge memory and computational resources to representing the protocol rather than the stuff of ultimate interest. This kind of memory use is populated by software artifacts called data structures, such as stacks, caches, hash tables, links, and so on. They are the first objects in history to be purely syntactical.

As soon as you shift to less temporally-dependent patterns on surfaces, you enter into a different world that has its own tradeoffs and expenses. You're trying to be an ever better guesser instead of a perfect decoder. You probably start to try to guess ahead, to predict what you are about to see, in order to get more confident about your guesses. You might even start to apply the guessing method between parts of your own guessing process. You rely on feedback to improve your guesses, and in that there's a process that displays at least the rudiments of evolutionary self-improvement. Since the first generation of computer scientists liked to anthropomorphize computers (something I dislike), they used the word "memory" to describe their stacks and pointers, but neurological memory is probably more like the type of internal state I have just described for pattern-sensitive machines. Computational neuroscientists sometimes argue about how to decide when to call such internal state a "model" of the world, but whether it's a model or not, it's different than the characteristic uses of memory for protocol-driven software. Pattern-guessing memory use tends to generate different kinds of errors, which is what's most important to notice.

When you de-emphasize protocols and pay attention to patterns on surfaces, you enter into a world of approximation rather than perfection. With protocols you tend to be drawn into all-or-nothing high wire acts of perfect adherence in at least some aspects of your design. Pattern recognition, in contrast, assumes the constant minor presence of errors and doesn't mind them. My hypothesis is that this trade-off is what primarily leads to the quality I always like to call brittleness in existing computer software, which means that it breaks before it bends.

Of course we try to build some error-tolerance into computer systems. For instance, the "TCP" part of TCP/IP is the part that re-sends data if there's evidence a packet might not have made it over the net correctly. That's a way of trying to protect one small aspect of a digital design from the thermal reality it's trying to resist. But that's only the easiest case, where the code is assumed to be perfect, so that it's easy to tell if a transmission was faulty. If you're worried that the code itself might also be faulty (and in large programs it always is), then error correction can lead to infinite regresses, which are the least welcome sort of error when it comes to developing information systems.
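
As a rough illustration of this easiest case (a toy stop-and-wait scheme, not the actual TCP state machine; the channel model and names are invented for the sketch):

```python
import random

def send_reliably(chunks, loss_rate=0.3, max_tries=10):
    """Toy stop-and-wait retransmission: resend each chunk until it gets through.
    This only works because the receiver can tell a delivered chunk from a lost one;
    it says nothing about whether the delivered bits encode a correct program."""
    delivered = []
    for chunk in chunks:
        for attempt in range(max_tries):
            if random.random() >= loss_rate:   # the channel didn't drop this send
                delivered.append(chunk)
                break
        else:
            raise RuntimeError("gave up after repeated losses")
    return delivered
```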

In the domain of multi-point surface sampling you have only a statistical predictability rather than an at least hypothetically perfect planability. I say "hypothetically", because for some reason computer scientists often seem unable to think about real computers as we observe them, rather than the ideal computers we wish we could observe. Evolution has shown us that approximate systems (living things, particularly those with nervous systems) can be coupled to feedback loops that improve their accuracy and reliability. They can become very good indeed. Wouldn't it be nicer to have a computer that's almost completely reliable almost all the time, as opposed to one that can be hypothetically perfectly accurate, in some hypothetical ideal world other than our own, but in reality is prone to sudden, unpredictable, and often catastrophic failure in actual use?

The reason we're stuck on temporal protocols is probably that information systems do meet our expectations when they are small. They only start to degrade as they grow. So everyone's learning experience is with protocol-centric information systems that function properly and meet their design ideals. This was especially true of the second generation of computer scientists, who for the first time could start to write more pithy programs, even though those programs were still small enough not to cause trouble. Ivan Sutherland, the father of computer graphics, wrote a program in the mid 1960s called "Sketchpad" all by himself as a student. In it he demonstrated the first graphics, continuous interactivity, visual programming, and on and on. Most computer scientists regard Sketchpad as the most influential program ever written. Every sensitive younger computer scientist mourns the passing of the days when such a thing was possible. By the 1970s, Seymour Papert had even small children creating little programs with graphical outputs in his computer language "LOGO". The operative word is "little." The moment programs grow beyond smallness, their brittleness becomes the most prominent feature, and software engineering becomes Sisyphean.

Computer scientists hate, hate thinking about the loss of idealness that comes with scale. But there it is. We've been able to tolerate the techniques developed at tiny scales to an extraordinary degree, given the costs, but at some future scale we'll be forced to re-think things. It's amazing how static the basic ideas of software have been since the period from the late 1960s to the mid-1970s. We refuse to grow up, as it were. I must take a moment to rant about one thing. Rebellious young programmers today often devote their energies to recreating essentially old code (Unix components or Xerox PARC-style programs) in the context of the free software movement, and I don't dismiss that kind of idealism at all. But it isn't enough. An even more important kind of idealism is to question the nature of that very software, and in that regard the younger generations of computer scientists seem to me to be strangely complacent.

Given how brittle our real-world computer systems get when they get big, there's an immediate motivation to explore any alternative that might make them more reliable. I've suggested that we call the alternative approach to software that I've outlined above "Phenotropic." Pheno- refers to outward manifestations, as in phenotype. -Tropic originally meant "Turning," but has come to mean "Interaction." So Phenotropic means "The interaction of surfaces." It's not necessarily biomimetic, but who's to say, since we don't understand the brain yet. My colleague Christoph von der Malsburg, a neuroscientist of vision, has founded a movement called "Biological Computing," which exists mostly in Europe and is more explicitly biomimetic, but is essentially similar to what some of us are calling "Phenotropics" here in the States.

There are two sides to Phenotropic investigation, one concerned with engineering and the other with scientific and philosophical explorations.

I suppose that the software engineering side of Phenotropics might seem less lofty or interesting, but software engineering is the empirical foundation of computer science. You should always resist the illusory temptations of a purely theoretical science, of course. Computer science is more vulnerable to these illusions than other kinds of science, since it has been constrained by layers of brittle legacy code that preserve old ideas at the expense of new ones.

My engineering concern is to try to think about how to build large systems out of modules that don't suffer as terribly from protocol breakdown as existing designs do. The goal is to have all of the components in the system connect to each other by recognizing and interpreting each other as patterns rather than as followers of a protocol that is vulnerable to catastrophic failures. One day I'd like to build large computers using pattern classification as the most fundamental binding principle, where the different modules of the computer are essentially looking at each other and recognizing states in each other, rather than adhering to codes in order to perfectly match up with each other. My fond hope, which remains to be tested, is that by building a system like this I can build bigger and more reliable programs than we know how to build otherwise. That's the picture from an engineering point of view.

In the last few years I've been looking for specific problems that might yield to a phenotropic approach. I've always been interested in surgical simulations. Two decades ago I collaborated with Dr. Joe Rosen, then of Stanford, now of Dartmouth, and Scott Fisher, then of NASA, now at USC, on the first surgical Virtual Reality simulation. It's been delightful to see surgical simulation improve over the years. It's gotten to the point where it can demonstrably improve outcomes. But the usual problems of large software plague it, as one might expect. With current methods we simply can't write programs big enough for the surgical simulations we'll need in the future.

One example of pattern recognition that I've found to be particularly inspiring came about via my colleague Christoph von der Malsburg and some of his former students, especially Hartmut Neven. We all started to work together back when I was working with Tele-immersion and Internet2. I was interested in how to transfer the full three-dimensional facial features of someone from one city to another with low bandwidth in order to create the illusion (using fancy 3D displays) that the remote person was present in the same room. We used some visual pattern recognition techniques to derive points on a face, and tied these to a 3D avatar of the person on the other side. (An avatar is what a person looks like to others in Virtual Reality.) As luck would have it, a long-time collaborator of mine named Young Harvil had been building fine quality avatar heads, so we could put this together fairly easily. It was super! You'd see this head that looked like a real person that also moved properly and conveyed expressions remarkably well. If you've seen the movie "Simone" you've seen a portrayal of a similar system.

Anyway, the face tracking software works really well. But how does it work?

You start with an image from a camera. Such an image is derived from the surface of a light-sensitive chip which makes a bunch of simultaneous adjacent measurements, just like a surface in a phenotropic system. The most common way to analyze this kind of surface information is to look at its spectrum. To do this, you make a virtual prism in software, using a mathematical technique first described two centuries ago by the great mathematician Fourier, and break the pattern into a virtual rainbow of spread-out subsignals of different colors or frequencies. But alas, that isn't enough to distinguish images. Even though a lot of images would break up into distinguishable rainbows because of the different distribution of colors present in them, you could easily be unlucky and have two different pictures that produced identical rainbows through a prism. So what to do?
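
A quick way to see the limitation (a sketch in numpy; the random image is just a stand-in for a camera frame): two genuinely different pictures, one a shifted copy of the other, produce exactly the same "rainbow," because the spectrum's magnitudes throw away the phase information that says where things are.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((64, 64))                          # stand-in for a camera image
other = np.roll(img, shift=(10, 20), axis=(0, 1))   # a different picture: same content, moved

spectrum_a = np.abs(np.fft.fft2(img))               # the "rainbow" of the first image
spectrum_b = np.abs(np.fft.fft2(other))             # the "rainbow" of the second

print(np.allclose(spectrum_a, spectrum_b))          # True: identical rainbows, different pictures
```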

You have to do something more to get at the layout of an image in space, and the techniques that seem to work best are based on "Wavelets," which evolved out of Dennis Gabor's work when he invented Holograms in the 1940s. Imagine that instead of one big prism breaking an image into a rainbow, you looked at the image through a wall of glass bricks, each of which was like a little blip of a prism. Well, there would be a lot of different sizes of glass bricks, even though they'd all have the same shape. What would happen is some of the individual features of the image, like the corner of your left eye, would line up with particular glass bricks of particular sizes. You make a list of these coincidences. You've now broken the image apart into pieces that capture some information about the spatial structure. It turns out that the human visual system does something a little like this, starting in the retina and most probably continuing in the brain.
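
Here is one way the "glass bricks" can be sketched in code (a simplified Gabor filter bank; the particular sizes, angles, and helper names are illustrative, and real systems such as elastic bunch graph matching are considerably more careful):

```python
import numpy as np

def glass_brick(wavelength, theta):
    """One 'brick': a Gabor wavelet of a given size (wavelength) and orientation."""
    half = 2 * wavelength
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    along = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * wavelength**2))
    carrier = np.exp(2j * np.pi * along / wavelength)
    return envelope * carrier

def brick_list(image, row, col, wavelengths=(4, 8, 16), orientations=4):
    """The list of coincidences between bricks of several sizes and angles and the
    patch around one feature point (assumed to lie far enough from the image border)."""
    coeffs = []
    for wl in wavelengths:
        for k in range(orientations):
            brick = glass_brick(wl, k * np.pi / orientations)
            half = brick.shape[0] // 2
            patch = image[row - half:row + half + 1, col - half:col + half + 1]
            coeffs.append(abs(np.sum(patch * np.conj(brick))))
    return np.array(coeffs)
```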

But we're not done. How do you tell whether this list of glass bricks corresponds to a face? Well, of course what you do is build a collection of lists of bricks that you already know represent faces, or even faces of specific individuals, including how the features matching the bricks should be positioned relative to each other in space (so that you can rule out, for instance, the possibility that the corner of your left eye occurs at the end of your nose). Once you have that collection, you can compare known glass brick breakdowns against new ones coming in from the camera and tell when you're looking at a face, or even a specific person's face.
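
The comparison step might then look roughly like this (again a sketch: it leaves out the geometric check on relative feature positions just mentioned, and the gallery format is invented for illustration):

```python
import numpy as np

def match(bricks_a, bricks_b):
    """Normalized similarity between two glass-brick lists (1.0 means the same shape)."""
    return float(np.dot(bricks_a, bricks_b) /
                 (np.linalg.norm(bricks_a) * np.linalg.norm(bricks_b) + 1e-12))

def identify(new_lists, gallery):
    """gallery maps a person's name to their stored brick lists, one per facial feature.
    The face in front of the camera is whoever's stored lists match best on average."""
    scores = {name: np.mean([match(a, b) for a, b in zip(new_lists, stored)])
              for name, stored in gallery.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```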

This turns out to work pretty well. Remember when I mentioned that once you start to think Phenotropically, you might want to try to predict what the pattern you think you've recognized is about to look like, to test your hypothesis? That's another reason I wanted to apply this technique to controlling avatar heads. If you find facial features using the above technique and use the results to re-animate a face using an avatar head, you ought to get back something that looks like what the camera originally saw. Beyond that, you ought to be able to use the motion of the head and features to predict what's about to happen (not perfectly, but reasonably well) because each element of the body has a momentum just like a car. And like a car, what happens next is constrained not only by the momentum, but also by things you can know about mechanical properties of the objects involved. So a realistic enough avatar can serve as a tool for making predictions, and you can use the errors you discover in your predictions to tune details in your software. As long as you set things up efficiently, so that you can choose only the most important details to tune in this way, you might get a tool that improves itself automatically. This idea is one we're still testing; we should know more about it within a couple of years. If I wanted to treat computers anthropomorphically, like so many of my colleagues, I'd call this "artificial imagination."
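
A bare-bones version of that predict-and-correct loop might look like the following (a toy one-dimensional tracker for illustration only; the real system tracks full 3D features and tunes many more parameters than this single gain):

```python
def track(measurements, gain=0.3):
    """Predict each next position from position plus momentum, compare the prediction
    with what the camera actually reports, and use the error to correct both the
    position estimate and the momentum estimate."""
    pos, vel = measurements[0], 0.0
    total_error = 0.0
    for seen in measurements[1:]:
        predicted = pos + vel               # "imagine" what should come next
        error = seen - predicted            # how wrong the imagination was
        pos = predicted + gain * error      # nudge the estimate toward reality
        vel = vel + gain * error            # and adjust the momentum estimate
        total_error += abs(error)
    return pos, vel, total_error / max(len(measurements) - 1, 1)
```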

Just as in the case of robotics, which I mentioned earlier, it's conceivable that workable techniques in machine vision could have appeared much earlier, but computer science was seduced by its protocol-centric culture into trying the wrong ideas again and again. It was hoped that a protocol existed out there in nature, and all you had to do was write the parser (an interpreter of typical hierarchical protocols) for it. There are famous stories of computer science graduate students in the 1960s being assigned projects of finding these magic parsers for things like natural language or vision. It was hoped that these would be quick single-person jobs, just like Sketchpad. Of course, the interpretation of reality turned out to require a completely different approach from the construction of small programs. The open question is what approach will work for large programs.

A fully phenotropic giant software architecture might consist of modules with user interfaces that can be operated either by other modules or by people. The modules would be small and simple enough that they could be reliably made using traditional techniques. A user interface for a module would remain invisible unless a person wanted to see it. When one module connects to another, it would use the same techniques a biomimetic robot would use to get around in the messy, unpredictable physical world. Yes, a lot of computer power would go into such internal interfaces, but whether that should be thought of as wasteful or not will depend on whether the improvement I hope to see really does appear when phenotropic software gets gigantic. This experiment will take some years to conduct.

Let's turn to some philosophical implications of these ideas. Just as computer science has been infatuated with the properties of tiny programs, so has philosophy been infatuated by the properties of early computer science.

Back in the 1980s I used to get quite concerned with mind-body debates. One of the things that really bothered me at that time was that it seemed to me that there was an observer problem in computer science. Who's to say that a computer is present? To a Martian, wouldn't a Macintosh look like a lava lamp? It's a thing that puts out heat and makes funny patterns, but without some cultural context, how do you even know it's a computer? If you say that a brain and a computer are in the same ontological category, who is recognizing either of them? Some people argue that computers display certain kinds of order and predictability (because of their protocol-centricity) and could therefore be detected. But the techniques for doing this wouldn't work on a human brain, because it doesn't operate by relying on protocols. So how could they work on an arbitrary or alien computer?

I pushed that question further and further. Some people might remember the "raindrops" argument. Sometimes it was a hailstorm, actually. The notion was to start with one of Daniel C. Dennett's thought experiments, where you replace all of your neurons one by one with software components until there are no neurons left to convert. At the end you have a computer program that has your whole brain recorded, and that's supposed to be the equivalent of you. Then, I proposed, why don't we just measure the trajectories of all of the raindrops in a rainstorm, using some wonderful laser technology, and fill up a database until we have as much data as it took to represent your brain. Then, conjure a gargantuan electronics shopping mall that has on hand every possible microprocessor up to some large number of gates. You start searching through them until you find all the chips that happen to accept the raindrop data as a legal running program of one sort or another. Then you go through all the chips which match up with the raindrop data as a program and look at the programs they run until you find one that just happens to be equivalent to the program that was derived from your brain. Have I made the raindrops conscious? That was my counter thought experiment. Both thought experiments relied on absurd excesses of scale. The chip store would be too large to fit in the universe and the brain would have taken a cosmologically long time to break down. The point I was trying to get across was that there's an epistemological problem.

Another way I approached the same question was to say, if consciousness were missing from the universe, how would things be different? A range of answers is possible. The first is that nothing would be different, because consciousness wasn't there in the first place. This would be Dan Dennett's response (at least at that time), since he would get rid of ontology entirely. The second answer is that the whole universe would disappear because it needed consciousness. That idea was characteristic of followers of some of John Archibald Wheeler's earlier work, who seemed to believe that consciousness plays a role in keeping things afloat by taking the role of the observer in certain quantum-scale interactions. Another answer would be that the consciousness-free universe would be similar but not identical, because people would get a little duller. That would be the approach of certain cognitive scientists, suggesting that consciousness plays a specific, but limited practical function in the brain.

And then there's another answer, which initially might sound like Dennett's: that if consciousness were not present, the trajectories of all particles would remain identical. Every measurement you could make in the universe would come out identically. However, there would be no "gross," or everyday, objects. There would be neither apples nor houses, nor brains to perceive them. Neither would there be words or thoughts, though the electrons and chemical bonds that would otherwise comprise them would remain just the same as before. There would only be the particles that make up everyday things, in exactly the same positions they would otherwise occupy. In other words, consciousness is an ontology that is overlaid on top of these particles. If there were no consciousness the universe would be perfectly described as being nothing but particles.

Here's an even clearer example of this point of view: There's no reason for the present moment to exist except for consciousness. Why bother with it? Why can we talk about a present moment? What does it mean? It's just a marker of this subjectivity, this overlaid ontology. Even though we can't specify the present moment very well, because of the spatial distribution of the brain, general relativity, and so on, the fact that we can refer to it even approximately is rather weird. It must mean the universe, or at least some part of it, like a person, is "doing something" in order to distinguish the present moment from other moments, by being conscious or embracing non-determinism in some fundamental way.

I went in that direction and became mystical about everyday objects. From this point of view, the extremes of scale are relatively pedestrian. Quantum mechanics is just a bunch of rules and values, while relativity and cosmology are just a big metric you live on, but the in-between zone is where things get weird. An apple is bizarre because there's no structure to make the apple be there; only the particles that comprise it should be present. Same for your brain. Where does the in-between, everyday scale come from? Why should it be possible to refer to it at all?

As pattern recognition has started to work, this comfortable mysticism has been challenged, though perhaps not fatally. An algorithm can now recognize an apple. One part of the universe (and it's not even a brain) can now respond to another part in terms of everyday gross objects like apples. Or is it only mystical me who can interpret the interaction in that light? Is it still possible to say that fundamental particles simply move in their courses and there wasn't necessarily an apple or a computer or a recognition event?

Of course, this question isn't easy to answer! Here's one way to think about it. Let's suppose we want to think of nature as an information system. The first question you'd ask is how it's wired together.

One answer is that all parts are consistently wired to each other, or uniformly influential to all others. I've noticed a lot of my friends and colleagues have a bias to want to think this way. For instance, Stephen Wolfram's little worlds have consistent bandwidths between their parts. A very different example comes from Seth Lloyd and his "ultimate laptop," in which he thought of various pieces of physicality (including even a black hole) as if they were fundamentally doing computation and asked how powerful these purported computers might be.

But let's go back to the example of the camera and the apple. Suppose poor old Schrödinger's cat has survived all the quantum observation experiments but still has a taste for more brushes with death. We could oblige it by attaching the cat-killing box to our camera. So long as the camera can recognize an apple in front of it, the cat lives.

What's interesting is that what's keeping this cat alive is a small amount of bandwidth. It's not the total number of photons hitting the camera that might have bounced off the apple, or only the photons making it through the lens, or the number that hit the light sensor, or even the number of bits of the resulting digitized image. Referring to the metaphor I used before, it's the number of glass bricks in the list that represents how an apple is recognized. We could be talking about a few hundred numbers, maybe less, depending on how well we represent the apple. So there's a dramatic reduction in bandwidth between the apple and the cat.

I always liked Bateson's definition of information: "A difference that makes a difference." It's because of that notion of information that we can talk about the number of bits in a computer in the way we usually do instead of the stupendously larger number of hypothetical measurements you could make of the material comprising the computer. It's also why we can talk about the small number of bits keeping the cat alive. Of course if you're a mystic when it comes to everyday-scale objects, you're still not convinced there ever was a cat or a computer.

But it might be harder for a mystic to dismiss the evolution of the cat. One of the problems with, say, Wolfram's little worlds is that all the pieces stay uniformly connected. In evolution as we have been able to understand it, the situation is different. You have multiple agents that remain somewhat distinct from one another long enough to adapt and compete with one another.

So if we want to think of nature as being made of computation, we ought to be able to think about how it could be divided into pieces that are somewhat causally isolated from one another. Since evolution has happened, it would seem our universe supports that sort of insulation.

How often is the "causal bandwidth" between things limited, and by how much? This is starting to sound a little like a phenotropic question!

One possibility is that when computer science matures, it's also going to be the physics of everyday-sized objects that influence each other via limited information flows. Of course, good old Newton might seem to have everyday-sized objects covered already, but not in the sense I'm proposing here. Every object in a Newtonian model enjoys consistent total bandwidth with every other object, to the dismay of people working on n-body problems. This is the famous kind of problem in which you try to predict the motions of a bunch of objects that are tugging on one another via gravity. It's a notoriously devilish problem, but from an information flow point of view all n of the bodies are part of one object, albeit a generally inscrutable one. They only become distinct (and more often predictable) when the bandwidth of causally relevant information flow between them is limited.
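
For contrast, here is what that fully connected wiring looks like in the most naive code (a toy integrator, nothing like a production n-body solver): every body's next state depends on a term from every other body, so no piece of the system can be carved off and predicted on its own.

```python
import numpy as np

def gravity_step(pos, vel, mass, dt=0.01, G=1.0, soften=1e-3):
    """One step of a naive n-body integration: n*(n-1) pairwise influences, i.e.
    every object has full causal bandwidth to every other object."""
    acc = np.zeros_like(pos)
    for i in range(len(mass)):
        for j in range(len(mass)):
            if i == j:
                continue
            d = pos[j] - pos[i]
            acc[i] += G * mass[j] * d / (np.linalg.norm(d) ** 3 + soften)
    vel = vel + dt * acc
    return pos + dt * vel, vel
```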

N-body problems usually concern gravity, in which everything is equally connected to everything, while the atoms in an everyday object are for the most part held together by chemistry. The causal connections between such objects are often limited. They meet at surfaces, rather than as wholes, and they have interior portions that are somewhat immune to influence.

There are a few basic ideas in physics that say something about how the universe is wired, and one of them is the Pauli exclusion principle, which demands that each fermion occupy a unique quantum niche. Fermions are the particles like electrons and protons that make up ordinary objects, and the Pauli rule forces them into structures.

Whenever you mention the Pauli principle to a good physicist, you'll see that person get a misty, introspective look and then say something like, "Yes, this is the truly fundamental, under-appreciated idea in physics." If you put a fermion somewhere, another fermion might be automatically whisked out of the way. THAT one might even push another one out of its way. Fermions live in a chess-like world, in which each change causes new structures to appear. Out of these structures we get the solidity of things. And limitations on causal connection between those things.

A chemist reading my account of doubting whether everyday objects are anything other than the underlying particles might say, "The boundary of an everyday object is determined by the frontier of the region with the strong chemical bonds." I don't think that addresses the epistemological issue, but it does say something about information flow.

Software is frustratingly non-Fermionic, by the way. When you put some information in memory, whatever might have been there before doesn't automatically scoot out of the way. This sad state of affairs is what software engineers spend most of their time on. There is a hidden tedium going on inside your computer right now in which subroutines are carefully shuttling bit patterns around to simulate something like a Pauli principle so that the information retains its structure.

Pattern classification doesn't avoid this problem, but it does have a way to sneak partially around it. In classical protocol-based memory, you place syntax-governed bits into structures and then you have to search the structures to use the bits. If you're clever, you pre-search the structures like Google does to make things faster.

The memory structures created by biomimetic pattern classification, like the glass brick list that represents the apple, work a little differently. You keep on fine tuning this list with use, so that it has been influenced by its past but doesn't exhaustively record everything that's happened to it. So it just sits there and improves and doesn't require as much bit shuttling.
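
One simple way to realize that kind of memory (a sketch; the blending rate is arbitrary) is to fold each new observation into the stored list rather than appending to a record:

```python
import numpy as np

def refine(stored_bricks, observed_bricks, rate=0.05):
    """The stored glass-brick list is shaped by its past without keeping an
    exhaustive record of it: each use nudges it slightly toward what was just seen."""
    stored = np.asarray(stored_bricks, dtype=float)
    observed = np.asarray(observed_bricks, dtype=float)
    return (1.0 - rate) * stored + rate * observed
```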

The Pauli principle has been joined quite recently by a haunting new idea about the fundamental bandwidth between things called "Holography," but this time the discovery came from studying cosmology and black holes instead of fundamental particles. Holography is an awkward name, since it is only metaphorically related to Gabor's holograms. The idea is that the two-dimensional surface area surrounding a portion of a universe limits the amount of causal information, or information that can possibly matter, that can be associated with the volume inside the surface. When an idea is about a limitation of a value, mathematicians call it a "bound," and "holography" is the name of the bound that would cover the ultimate quantum-gravity version of the information surface bound we already know about for sure, which is called the Bekenstein Bound. In the last year an interesting variant has appeared called the Bousso Bound, which seems to be even more general and spooky, but of course investigations of these bounds are limited by the state of quantum gravity theories (or maybe vice versa), so we have to wait to see how this will all play out.

Even though these new ideas are still young and in flux, when you bring them up with a smart quantum cosmologist these days, you'll see the same glassy-eyed reverence that used to be reserved for the Pauli principle. As with the Pauli principle, holography tells you what the information flow rules are for hooking up pieces of reality, and as with Pauli exclusion, holography places limits on what can happen that end up making what does happen more interesting.

These new bounds are initially quite disturbing. You'd think a volume would tell you how much information it could hold, and it's strange to get the answer instead from the area of the surface that surrounds it. (The amount of information is 1/4 the area in Planck units, by the way, which should sound familiar to people who have been following work on how to count entropy on the surfaces of black holes.) Everyone is spooked by what Holography means. It seems that a profoundly fundamental description of the cosmos might be in the terms of bandwidth-limiting surfaces.
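
Written out, that "1/4 the area in Planck units" is the familiar Bekenstein-Hawking form (with k_B Boltzmann's constant and l_P the Planck length):

```latex
S = \frac{k_B A}{4\,\ell_P^{2}} = \frac{k_B c^{3} A}{4 G \hbar}
```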

It's delightful to see cosmology taking on a vaguely phenotropic quality, though there isn't any indication as yet that holography will be relevant to information science on non-cosmological scales.

What can we say, then, about the bandwidth between everyday objects? As in the case of the apple-recognizing camera that keeps the cat alive, there might be only a small number of bits of information flow that really matter, even though there might be an incalculably huge number of measurements that could be made of the objects that are involved in the interaction. A small variation in the temperature of a small portion of the surface of the apple will not matter, nor will a tiny speck of dirt on the lens of the camera, even though these would both be as important as any other measure of state in a fully-connected information system.

Stuart Kauffman had an interesting idea that I find moving. He suggests that we think of a minimal life form as being a combination of a Carnot cycle and self-replication. I don't know if I necessarily agree with it, but it's wonderful. The Carnot cycle originally concerned the sequence in which temperature and pressure were managed in a steam engine to cause repeated motion. One portion of the engine is devoted to the task of getting the process to repeat, and this might be called the regulatory element. If you like, you can discern the presence of analogs to the parts of a Carnot cycle in all kinds of structures, not just in steam engines. They can be found in cells, for instance. The Carnot cycle is the basic building block of useful mechanisms in our thermal universe, including in living organisms.

But here's what struck me. In my search to understand how to think about the bandwidths connecting everyday objects it occurred to me that if you thought of dividing the universe into Carnot cycles, you'd find the most causally important bandwidths in the couplings between some very specific places: the various regulatory elements. Even if two observers might dispute how to break things down into Carnot cycles, it would be harder to disagree about where these regulatory elements were.

Why would that matter? Say you want to build a model of a cell. Many people have built beautiful, big, complicated models of cells in computers. But which functional elements do you care about? Where do you draw the line between elements? What's your ontology? There's never been any real principle. It's always just done according to taste. And indeed, if you have different people look at the same problem and make models, they'll generally come up with somewhat divergent ontologies based on their varying application needs, their biases, the type of software they're working with, and what comes most easily to them. The notions I've been exploring here might provide at least one potential opening for thinking objectively about ontology in a physical system. Such an approach might someday yield a generalized way to summarize causal systems, and this would fit in nicely with a phenotropic engineering strategy for creating simulations.

It's this hope that has finally convinced me that I should perhaps start believing in everyday objects like cats and apples again.