Re: WHY GORDIAN SOFTWARE HAS CONVINCED ME TO BELIEVE IN THE REALITY OF CATS AND APPLES: A Talk with Jaron Lanier


Responses by Dylan Evans, Daniel C. Dennett, Steve Grand, Nicholas Humphrey, Clifford Pickover, Marvin Minsky, Lanier replies, George Dyson, Steven R. Quartz, Lee Smolin, Charles Simonyi, John Smart, Daniel C. Dennett, Dylan Evans


Dylan Evans

I was saddened to see Edge publish the confused ramblings of Jaron Lanier (Edge #128). I offer the following comments with some hesitation, as they may serve to endow Lanier's nonsense with an importance they do not deserve:

1. Lanier's main objection to the work of Turing, von Neumann, and the other members of `the first generation of computer scientists' seems to boil down to the fact that they all focused on serial machines (sequential processors), while Lanier thinks that surfaces (parallel processors) would have been a better starting point. This completely misses one of the most important insights of Turing and von Neumann - namely, that the distinction between serial and parallel processors is trivial, because any parallel machine can be simulated on a serial machine with only a negligible loss of efficiency. In other words, the findings of Turing and von Neumann apply to both serial and parallel machines, so it makes no difference which type you focus on. At one point, Lanier seems to admit this, when he states that `the distinction between protocols and patterns is not absolute - one can in theory convert between them', but in the very next sentence he goes on to say that `it's an important distinction in practice, because the conversion is often beyond us'. The latter sentence is false - it is incredibly easy to simulate parallel devices on serial machines. Indeed, virtually every parallel device ever `built' has been built in software that runs on a serial machine.
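
For readers who want to see how routine this is in practice, here is a minimal Python sketch (my own illustration, not from Evans's text): a synchronous "parallel" update of many cells is reproduced on a serial machine simply by computing every next state from a frozen copy of the current state, one serial pass per parallel step.

    def parallel_step(state, update_rule):
        # Freeze the current state so every unit "sees" the same instant,
        # exactly as if all units had updated simultaneously.
        frozen = list(state)
        return [update_rule(i, frozen) for i in range(len(frozen))]

    # Example: an elementary cellular automaton in which each cell
    # becomes the XOR of its two neighbours (wrap-around boundary).
    def rule(i, cells):
        left = cells[(i - 1) % len(cells)]
        right = cells[(i + 1) % len(cells)]
        return left ^ right

    state = [0, 0, 0, 1, 0, 0, 0]
    for _ in range(3):
        state = parallel_step(state, rule)
        print(state)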

2. Lanier claims that parallel machines are somehow more biological or `biomimetic' than serial machines, because 'the world as our nervous systems know it is not based on single point measurements, but on surfaces'. Unfortunately for Lanier, the body is an ambiguous metaphor. True, it has surfaces - sensors that are massively parallel - such as retinas (to use Lanier's example). But it also has wires - sensory systems that are serial - the clearest example of which is hearing. Indeed, the fundamental technology that enabled human civilisation - language - first arose as an acoustic phenomenon because the serial nature of language was most easily accommodated by a serial sensory system. The birth of writing represented the first means of transforming an originally parallel modality (vision) into a serial device. In fact, progress almost always consists in moving from parallel devices to serial ones, not vice versa. Even the `biomimetic robots' that Lanier admires are serial machines at heart.

3. Lanier waxes lyrical about his alternative approach to software, which he dubs 'phenotropic'. But he fails to say whether this software will run on serial machines or not. If it will, then it won't represent the fundamental breakthrough that Lanier seems to think it will. If it won't run on serial processors, then where is the parallel machine that it will run on? Until Lanier can produce such a parallel machine, and show it to be exponentially faster than the serial machines we currently have, his claims will have to be regarded as the kind of pie-in-the-sky that he accuses most computer scientists of indulging in. Real computer scientists, of course, do not really indulge in pie-in-the-sky. The reason that some of them talk about 'ideal computers' rather than 'real computers as we observe them' has nothing to do with a tendency to fantasise, as Lanier implies. Rather, it is because they are interested in discovering the laws governing all computers, not just the ones we currently build.

Best wishes,

Dylan

DYLAN EVANS is Research Officer in Evolutionary Robotics, Centre for Biomimetics and Natural Technology, Department of Mechanical Engineering,  University of Bath. His book, Introducing Evolutionary Psychology, was required reading for the main actors in "The Matrix".


Daniel C. Dennett

I read Dylan's response to Jaron's piece, and Dylan has it right. I'm not tempted to write a reply, even though Jaron has some curious ideas about what my view is (or might be—you can tell he's not really comfortable attributing these views to me, the way he qualifies it). And what amazes me is that he can't see that he's doing exactly the thing he chastises the early AI community for doing: getting starry-eyed about a toy model that might—might—scale up and might not. There are a few interesting ideas in his ramblings, but it's his job to clean them up and present them in some sort of proper marching order, not ours. Until he does this, there's nothing to reply to.

Dan

DANIEL C. DENNETT is University Professor, Professor of Philosophy, and Director of the Center for Cognitive Studies at Tufts University. He is the author of Consciousness Explained; Darwin's Dangerous Idea; and Freedom Evolves.


Steve Grand

I admit I didn't understand the latter half of Jaron's paper, so I can't yet comment on it, but I'd like to respond to a few of Dylan's comments with a plea not to be quite so dismissive.

[Dylan writes] "...because any parallel machine can be simulated on a serial machine with only a negligible loss of efficiency. In other words, the findings of Turing and von Neumann apply to both serial and parallel machines, so it makes no difference which type you focus on."

It's true that in principle any parallel discrete time machine can be implemented on a serial machine, but I think Dylan's "negligible loss of efficiency" comment was waving rather an airy hand over something quite important. Serializing a parallel process requires a proportional increase in computation time, and sometimes such quantitative changes have qualitative consequences—after all, the essential difference between a movie and a slide show is merely quantitative, but because a neural threshold is crossed at around 24 frames/second there's also a fairly profound qualitative difference to us as observers. More importantly, this is why continuous time processes can't always be serialized, since they can lead to a Zeno's Paradox of infinite computation over infinitesimal time slices.

Speaking from a purely practical point of view, time matters. In my work I routinely model parallel systems consisting of a few hundred thousand neurons. I can model these in serial form, luckily, but it's only barely feasible to do so in real time, and I can't slow down gravity for the benefit of my robot. Moore's Law isn't going to help me much either. I'd far rather have access to a million tiny processors than one big one, and the compromises I have to make at the moment (specifically the artifacts that serialization introduces) can really cloud my perception of the kinds of spatial computation I'm trying, with such grotesque inefficiency, to simulate.

Which brings me to the question of whether it "makes no difference which type you focus on".

Turing's famous machine undoubtedly made us focus very heavily on "definite methods"—i.e. algorithms—and algorithms are not the only ways to solve problems. Turing himself realized this, which is perhaps why he did a little work on "unorganized machines" (something akin to neural networks). Many systems involving simultaneous interactions can be satisfactorily approximated in a serial computer, but it doesn't follow that this is the best way of thinking about them, or that solutions of this type might even occur to us while we're wearing serial, discrete time blinkers.

I agree with Jaron that the digital computer has so deeply ingrained itself in our consciousness that we find it hard to see that there are other ways to compute. I'd happily lay a Long Bet that Moore's Law becomes utterly irrelevant in the not-too-distant future, when we suddenly discover new ways to compute things that don't require a stepwise architecture, and I'd agree with Jaron that this new way is likely to be based on spatial patterns (although not pattern recognition).

Sound, incidentally, isn't entirely processed as a temporal stream. Brains can't avoid the fact that sound waves arrive serially, but since speech recognition requires so much contextual and out-of-sequence processing, I bet the brain does its utmost to convert this temporal stream into a spatial form, so that its elements can be overlapped, compared and integrated.

The very first thing that the cochlea does is convert sound frequency into a spatial representation, and this type of coding is retained in the auditory cortex. In fact everything in the cortex seems to be coded spatially. Some parts use very concrete coordinate frames, such as retinotopic or somatotopic coordinates, or shoulder-centred motion vectors, while other parts (such as the Temporal lobes) seem to employ more abstract coordinate spaces, such as the space of all fruit and vegetables.

My AI research leads me to suspect that some of the most crucial components of cortical computation rely on the mathematics of shapes and surfaces inside such coordinate frames—a kind of geometric computation, as opposed to a numerical, sequential one. Luckily for me, you can implement at least most of these spatial transformations using a serial computer, but I find I have to think very distinctly in two modes: as a programmer when creating the palette of neurons and neuromodulators, and then as a... what? a biologist? an artist? a geometer? ...when thinking about the neural computations. The former mindset doesn't work at all well in the latter environment. Connectionism gave us a very distorted view of the brain, as if it were a neat, discrete wiring diagram, when in reality it's more accurate to describe brain tissue as a kind of structured gel.

As Jaron points out, Gabor wavelets and Fourier transforms are (probably) commonplace in the brain. The orientation detectors of primary visual cortex are perhaps best described as Gabor filters, sensitive to both orientation and spatial frequency, even though conventional wisdom sees them as rather more discrete and tidy "edge detectors". The point spread function of nervous tissue is absolutely huge, so signals tend to smear out really quickly in the brain, and yet we manage to perceive objects smaller than the theoretical visual acuity of the retina, so some very distributed, very fuzzy, yet rather lossless computation seems to be going on.
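
For readers who want to see what such a filter looks like, here is a small Python/NumPy sketch of a standard 2-D Gabor kernel (an illustration of the textbook formula, not code from Grand's work); convolving an image with a bank of these at different orientations and wavelengths gives responses tuned to both orientation and spatial frequency.

    import numpy as np

    def gabor_kernel(size=21, wavelength=6.0, theta=0.0, sigma=3.0, gamma=0.5, psi=0.0):
        """Real part of a 2-D Gabor filter: a Gaussian envelope times a cosine grating."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
        x_t = x * np.cos(theta) + y * np.sin(theta)    # rotate into the filter's frame
        y_t = -x * np.sin(theta) + y * np.cos(theta)
        envelope = np.exp(-(x_t ** 2 + (gamma * y_t) ** 2) / (2 * sigma ** 2))
        carrier = np.cos(2 * np.pi * x_t / wavelength + psi)
        return envelope * carrier

    # A bank of kernels at several orientations, as a V1-style filter set.
    bank = [gabor_kernel(theta=t) for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)]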

We've only relatively recently "rediscovered" the power of such spatial and convolved forms of computation—ironically in digital signal processors. These are conventional von Neumann-style serial processors, but the kind of computation going on inside them is very much more overlapping and fuzzy, albeit usually one-dimensional. Incidentally, optical holograms can perform convolution, deconvolution and Fourier transforms, among other things, at the speed of light, acting on massively parallel data sets. It's true that we can do the same thing (somewhat more slowly) on a digital computer, but I have a strong feeling that these more distributed and spatial processes are best thought about in their own terms, and only later, if ever, translated into serial form. Such "holographic" processes may well be where the next paradigm shift in computation comes from.
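
The "compute it in the frequency domain" trick Grand alludes to is easy to show: by the convolution theorem, convolution becomes pointwise multiplication of spectra. Here is a minimal NumPy sketch with made-up example data (it computes a circular convolution):

    import numpy as np

    def convolve_via_fft(signal, kernel):
        """Circular convolution of two 1-D arrays via the frequency domain."""
        n = len(signal)
        # Convolution theorem: multiplying spectra pointwise is equivalent
        # to convolving in the original (time or space) domain.
        spectrum = np.fft.fft(signal) * np.fft.fft(kernel, n)
        return np.real(np.fft.ifft(spectrum))

    rng = np.random.default_rng(0)
    signal = rng.random(1024)
    kernel = np.exp(-np.linspace(-3, 3, 31) ** 2)   # a Gaussian smoothing kernel
    smoothed = convolve_via_fft(signal, kernel)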

Sometimes what you can see depends on how you look at it, and we shouldn't underestimate the power of a mere shift in viewpoint when it comes to making breakthroughs. Try recognizing an apple from the serial trace of an oscilloscope attached to a video camera that is pointed at an apple, and this fact becomes obvious.

I have to say I couldn't really find anything new in what Jaron says—if anything it seems to be harking back to pre-digital ideas, which is no bad thing—but I definitely don't think such concepts should be dismissed out of hand.

STEVE GRAND is an artificial life researcher and creator of Lucy, a robot baby orangutan. He is the founder of Cyberlife Research and the author of Creation: Life and How to Make It.


Nicholas Humphrey

Human consciousness as an ontology overlaid on the world? No gross, or everyday, objects without it ... neither apples nor houses? "I went in that direction," Lanier says, "and became mystical about everyday objects."

The poet, Rilke, went the same way (Ninth Elegy, Duino Elegies, Leishman translation, 1922):

... all this
that's here, so fleeting, seems to require us and strangely
concerns us... Are we, perhaps, here just for saying: House,
Bridge, Fountain, Gate, Jug, Fruit tree, Window, —
possibly: Pillar, Tower?... but for saying, remember,
oh, for such saying as never the things themselves
hoped so intensely to be.

But, then, as another poet, W. H. Auden, said of poets: "The reason why it is so difficult for a poet not to tell lies is that in poetry all facts and all beliefs cease to be true or false and become interesting possibilities."

Best,

Nick

NICHOLAS HUMPHREY, School Professor at the London School of Economics, is a theoretical psychologist and author of A History of the Mind, Leaps of Faith, and The Mind Made Flesh.


Clifford Pickover

Jaron Lanier certainly covers the gamut, from consciousness, to brains, to computers of the future. I would like to counter by asking the group a question that has been on my mind lately: Would you pay $2000 for a "Turbing"? Let me explain what I mean....

In 1950, Alan Turing proposed that if a computer could successfully mimic a human during an informal exchange of text messages, then, for most practical purposes, the computer might be considered intelligent. This soon became known as the "Turing test," and it has since led to endless academic debate.

Opponents of Turing's behavioral criterion of intelligence argue that behavior is not what matters. This camp holds that what counts is whether the computer has genuine cognitive states, regardless of how it behaves. They say that computers can never have real thoughts or mental states of their own; computers can merely simulate thought and intelligence. If such a machine passes the Turing Test, this only proves that it is good at simulating a thinking entity.

Holders of this position also sometimes suggest that only organic things can be conscious. If you believe that only flesh and blood can support consciousness, then it would be very difficult to create conscious machines. But to my way of thinking, there's no reason to exclude the possibility of non-organic sentient beings. If you could make a copy of your brain with the same structure but using different materials, the copy would think it was you.

I call these "humanlike" entities Turing-beings or "Turbings." If our thoughts and consciousness do not depend on the actual substances in our brains but rather on the structures, patterns, and relationships between parts, then Turbings could think. But even if they do not really think but rather act as if they are thinking, would you pay $2000 for a Turbing—a Rubik's-cube sized device that would converse with you in a way that was indistinguishable from a human? Why?

CLIFFORD PICKOVER is a research staff member at IBM's T. J. Watson Research Center, in Yorktown Heights, New York. His books include Time : A Traveler's Guide; Surfing Through Hyperspace; and Black Holes: A Traveler's Guide.


Marvin Minsky

I agree with both critics (Dylan Evans and Dan Dennett).

Papert and I once proved that, in general, parallel processes end up using more computational steps than do serial processes that perform the same computations. In fact, when some processes have to wait until certain other ones complete their jobs, the amount of computation will tend to be larger by a factor proportional to the amount of parallelism.

Of course, in cases in which almost all the subcomputations are largely independent, the total time consumed can be much less (again in proportion to the amount of parallelism)—but the resources and energy consumed will still be larger. Of course, for most animals, speed is what counts; otherwise Dylan Evans is right, and Lanier's analysis seems in serious need of a better idea.

Here is the presumably out-of-print reference: Marvin Minsky and Seymour Papert, "On Some Associative, Parallel and Analog Computations," in E. L. Jacks, ed., Associative Information Techniques, American Elsevier Publishing, Inc., 1971, pp. 27-47.

MARVIN MINSKY, mathematician and computer scientist at MIT, is a leader of the second generation of computer scientists and one of the fathers of AI. He is the author of The Society of Mind.


Jaron Lanier

It's a great thing to face a tough technical crowd. So long as you don't let it get to you, it's the most efficient way to refine your ideas, find new collaborators, and gain the motivation to prove critics wrong.

In this instance, though, I think the critical response misfired.

To understand what I mean, readers can perform a simple exercise. Use a text search tool and apply it to my comments on "Gordian software." See if you can find an instance of the word "Parallel." You will find that the word does not appear.

That's odd, isn't it? You've just read some scathing criticisms about claims I'm said to have made about parallel computer architectures, and it might seem difficult to make those claims without using the word.

It's possible to imagine a non-technical reader confusing what I was calling "surfaces" with something else they might have read about, which is called parallel computation. Both have more than one dimension. But that's only a metaphorical similarity. Any technically educated reader would be hard-pressed to make that mistake.

For non-technical readers who want to know why they're different: "Surfaces" are about approximation. They simulate the sampling process by which digital systems interact with the physical world and apply that form of connection to the internal world of computer architecture. They are an alternative to what I called the "high wire act of perfect protocol adherence" that is used to make internal connections these days. Parallel architectures, at least as we know them, require the highest of high wire acts. In parallel designs, whole new classes of tiny errors with catastrophic consequences must be foreseen in order to be avoided. Surfaces use the technique of approximation in order to reduce the negative effects of small errors. Parallel architectures are not implied by the fuzzy approach to architecture my piece explored.
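
To make the contrast concrete, here is a deliberately toy Python sketch (my own illustration, not Lanier's design; string similarity is only a crude stand-in for the richer pattern recognition he has in mind): a strict, protocol-style binding breaks on any mismatch, while an approximate binding accepts the closest match when it is close enough.

    from difflib import SequenceMatcher

    # Hypothetical services exposed by a module (names invented for illustration).
    services = {
        "get_temperature_celsius": lambda: 21.5,
        "get_humidity_percent": lambda: 40.0,
    }

    def strict_call(name):
        # Protocol adherence: one wrong character and the connection breaks.
        return services[name]()          # raises KeyError on any mismatch

    def approximate_call(name, threshold=0.6):
        # "Surface"-style binding: measure similarity and accept the best
        # match if it is close enough, degrading gracefully on small errors.
        best = max(services, key=lambda k: SequenceMatcher(None, name, k).ratio())
        score = SequenceMatcher(None, name, best).ratio()
        return services[best]() if score >= threshold else None

    print(approximate_call("get_temprature_celcius"))   # misspelled, still resolves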

It didn't occur to me that anyone would confuse these two very different things, so I made no mention of parallel architectures at all.

The first respondent, Dylan Evans, reacted as if I'd made claims about parallel architectures. It is possible that Evans is making the case that I'm inevitably or inadvertently talking about something that I don't think I'm talking about, but the most likely explanation is that a misunderstanding took place. Perhaps I was not clear enough, or perhaps he made assumptions about what I would say and that colored his reading. Dan Dennett then endorsed his remarks. There's probably a grain of legitimate criticism, at least in Dennett's mind, and perhaps someday I'll hear it.

Steve Grand then addressed some of the ideas about parallelism brought up by other respondents, but also pointed out that many of the ideas in my piece were not new, which is correct, and something that I made clear. What was new was not the techniques themselves but the notion of applying techniques that have recently worked well in robotics to binding in modular software architectures. I also hoped to write what I think is the first non-technical explanation of some of these techniques, like the wavelet transform.

At this point, it seemed the discussion was getting back on track. But then Marvin Minsky posted an endorsement of Dennett's endorsement of Evans. Marvin was an essential mentor to me when I was younger and I simply had to ask him what was going on. I would like to quote his response:

"Oops. In fact, I failed to read the paper and only read the critics, etc. Just what I tell students never to do: first read the source to see whether or not the critics have (probably) missed the point."

There is a certain competitive, sometimes quite macho dynamic in technical discussions, especially when someone is saying something unfamiliar. I expect that and wouldn't participate in this forum if I was too delicate to take the heat. Once in a while, though, that dynamic gets the better of us and we're drawn off topic.

What I'd like to do at this point is add some background to my argument and refer to some other researchers addressing similar concerns in different ways, because I think this will help to frame what I'm doing and might help readers who are having trouble placing my thoughts in the context of other ideas.

Computer science is like rocket science in this sense: You don't know if the rocket works until you launch it. No matter how lovely and elegant it might be on the ground, you really want to test how it performs once launched. The analog to launching a rocket in computer science is letting a software idea you've seen work on a small scale grow to a large scale. As I pointed out, it's been relatively easy in the history of computer science to make impressive little programs, but hard to make useful large programs. Anyone with eyes to see will acknowledge that most of our lovely rockets are misfiring.

An essential historical document is the book The Mythical Man-Month by Fred Brooks. Brooks managed IBM's OS/360 project and wrote this book when the first intimations of the software scaling problem became clear.

A good introduction to the current mainstream response to what has unquestionably become a crisis is the Nov. 2003 issue of MIT's Technology Review magazine, which is themed on this topic. There you can read up on some of the most visible recent ideas on how to address the problem. It's natural to expect a range of proposals on how to respond to a crisis. The proposals reported in TR seem too conservative to me. They are for the most part saying something like, "This time we'll do what we did before but with more discipline and really, really paying attention to how we could screw up." My guess is that we've already seen how disciplined groups of people are capable of being when they make giant software and should accept that as a given rather than hoping it will change.

One doesn't have to hope that one idea will fix everything to search for radical new ideas that might help to some degree. A one-liner that captures the approach described in the "Gordian" piece is that I want to recast some techniques that are working for robots and apply them to the innards of software architectures. I'm not the only radical looking at the problem of scalability. A completely different approach, for instance, is taken by Cordell Green and others who are trying to scale up the idea of logic-based specification as a way to make error-free programs. Yet another batch of ideas can be found in the June issue of Scientific American; see the cover story, which actually does describe a way to apply parallel computation to this problem.

Whether radical or not, a wide range of approaches is called for because the problem is both long-standing and important.

This is implicit in Nicholas Humphrey's response to the second portion of the essay, which was about philosophy rather than software architecture. Just as it's natural for computer scientists to wonder what makes a mind, it's also natural to wonder what makes an object, in the ordinary sense of the word. This is our rediscovery of old questions in our new light.


George Dyson

The latest manifesto from Jaron Lanier raises important points. However, it is unfair to attribute to Alan Turing, Norbert Wiener, or John von Neumann (& perhaps Claude Shannon) the limitations of unforgiving protocols and Gordian codes. These pioneers were deeply interested in probabilistic architectures and the development of techniques similar to what Lanier calls phenotropic codes. The fact that one particular computational subspecies became so successful is our problem (if it's a problem) not theirs.

People designing or building computers (serial or parallel; flexible or inflexible; phenotropic or not) are going to keep talking about wires, whether in metaphor or in metal, for a long time to come. As Danny Hillis has explained: "memory locations are simply wires turned sideways in time." If there's a metaphor problem, it's a more subtle one, that we still tend to think that we're sending a coded message to another location, whereas what we're actually doing is replicating the code on the remote host.

In the 1950s it was difficult to imagine hardware ever becoming reliable enough to allow running megabyte strings of code. Von Neumann's "Reliable Organization of Unreliable Elements" (1951) assumed reliable code and unreliable switches, not, as it turned out, the other way around. But the result is really the same (and also applies to coding reliable organisms using unreliable nucleic acids, conveying reliable meaning using unreliable language, and the seemingly intractable problem of assigning large software projects to thousands of people at once).

Von Neumann fleshed out these ideas in a series of six lectures titled "Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components" given at Cal Tech on January 4-15, 1952. This formed a comprehensive manifesto for a program similar to Lanier's, though the assumption was that the need for flexible, probabilistic logic would be introduced by the presence of sloppy hardware, not sloppy code. "The structures that I describe are reminiscent of some familiar patterns in the nervous system," he wrote to Warren Weaver on 29 January 1952.

The pioneers of digital computing did not see everything as digitally as some of their followers do today. "Besides," argued von Neumann in a long letter to Norbert Wiener, 29 November 1946 (discussing the human nervous system and a proposed program to attempt to emulate such a system one cell at a time), "the system is not even purely digital (i.e. neural): It is intimately connected to a very complex analogy (i.e. humoral or hormonal) system, and almost every feedback loop goes through both sectors, if not through the 'outside' world (i.e. the world outside the epidermis or within the digestive system) as well." Von Neumann believed in the reality of cats and apples too.

Turing's universal machine, to prove a mathematical point, took an extreme, linear view of the computational universe, but this does not mean that higher-dimensional surfaces were ignored. Von Neumann, while orchestrating the physical realization of Turing's machine, thought more in terms of matrices (and cellular inhabitants thereof) than tapes. Remember that the original IAS computer (the archetype "von Neumann machine") consisted of a 32 x 32 x 40 matrix, with processing performed in parallel on the 40-bit side. In an outline for the manuscript of a general theory of automata left unfinished at von Neumann's death, Chapter 1 is labeled "Turing!" Chapter 2 is labeled "Not Turing!" Template-based addressing was a key element in von Neumann's overall plan.

In the computational universe of Turing, von Neumann, and Lanier (which we are all agreed corresponds to, but does not replace, the real world) there are two kinds of bits: bits that represent differences in space, and bits that represent differences in time. Computers (reduced to their essence by Turing and later Minsky) translate between these two kinds of bits, moving between structure (memory) and sequence (code) so freely that the distinction is nearly obscured. What troubles Jaron Lanier is that we have suddenly become very good at storing large structures in memory, but remain very poor at writing long sequences of code. This will change.

I'm not immersed in the world of modern software to the same extent as Jaron Lanier, so it may just be innocence that leads me to take a more optimistic view. If multi-megabyte codes always worked reliably, then I'd be worried that software evolution might stagnate and grind to a halt. Because they so often don't work (and fail, for practical purposes, unpredictably, and in the absence of hardware faults) I'm encouraged in my conviction that real evolution (not just within individual codes, but much more importantly, at the surfaces and interfaces between them) will continue to move ahead. The shift toward template-based addressing, with its built-in tolerance for ambiguity, is the start of the revolution we've been waiting for, I think. It all looks quite biomimetic to me.

GEORGE DYSON, who lives in Bellingham, Washington, is the author of Baidarka; Darwin Among the Machines; and Project Orion.


Steven R. Quartz

I have considerable sympathy for Lanier's complaints, although I disagree with how he's analyzed the situation. I do think he's right that there's something deeply — probably fundamentally — wrong with the current best model of software and computation. But the problems aren't simply with the von Neumann architectures Lanier criticizes.

Most approaches to parallel computation are equally bad and would need to be solved by Lanier's alternative model. My own attempts to parallelize — note the not coincidental alliteration to "paralyze" — code for one of Cray's parallel supercomputers, the T3D, made it all too clear to me that parallel computation suffered from critical problems that have never been solved (does anyone remember C*?).

Nor does there seem to be much prospect in the near term that they will be solved. Roughly, the problem is that as the number of processors increases, it becomes harder to allocate facets of the problem to the processors in an efficient manner. In practice, most processors in a massively parallel computer end up sitting idle waiting for others to finish their task. Beyond this load balancing problem, forget about trying to debug parallel code.

So, what's wrong?

First, I'd respond to Lanier's comments with a historical note. I think the idea that von Neumann and others were misled by technological metaphors gets things the wrong way around. It is clear from von Neumann's speculations in the First Draft on EDVAC that he was utilizing the then state-of-the-art computational neurobiology — McCulloch and Pitts' (1943) results on Turing equivalence for computation in the brain — as grounds for the digital design of the electronic computer. In other words, it was theoretical work in neural computation that influenced the technology, not the other way around. While much has been made of the differences between synchronous serial computation and asynchronous neural computation, the really essential point of similarity is the nonlinearity of both neural processing and the switching elements Shannon explored, which laid the foundation for McCulloch and Pitts' application of computational theory to the brain.
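
For readers unfamiliar with the McCulloch-Pitts abstraction Quartz mentions, here is a minimal sketch (my own illustration, not from his text): a neuron modeled as a nonlinear threshold unit, the same kind of all-or-none switching behavior Shannon analyzed in relay circuits; such units compose into arbitrary Boolean logic.

    def mcculloch_pitts(inputs, weights, threshold):
        """Fire (1) if the weighted sum of inputs reaches the threshold, else 0."""
        total = sum(w * x for w, x in zip(weights, inputs))
        return 1 if total >= threshold else 0

    # With suitable weights and thresholds these units implement the basic
    # logic gates, and hence any Boolean function when composed.
    AND = lambda a, b: mcculloch_pitts([a, b], [1, 1], 2)
    OR  = lambda a, b: mcculloch_pitts([a, b], [1, 1], 1)
    NOT = lambda a:    mcculloch_pitts([a],    [-1],   0)
    print(AND(1, 1), OR(0, 1), NOT(1))   # prints: 1 1 0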

In fact, I'd suggest that the real limitation of contemporary computation is the incomplete understanding of nonlinear processing in the brain. We still lack the fundamentals of nonlinear processing in brains: we don't know how information is encoded, why neurotransmitter release is so low a probability event, how dendrites compute, whether local volumes of neural tissue compute via diffuse molecules such as nitric oxide, and a host of other fundamental issues. Taking a hint from von Neumann's own reliance on the theoretical neurobiology of the day, these are the fundamental issues that ought to inform an alternative computational theory.

I have my doubts that a better understanding of processing in the brain will lead to Lanier's surface-based model, as temporal codes are fundamental properties of neural computation. In addition, although Lanier dismisses "signals on wires" computation, the brain is mostly wires (axons), whose minimization is likely a key to how the brain processes information.

Finally, I missed where exactly consciousness comes into Lanier's discussion. Personally, I think consciousness is vastly overrated (not my own, of course, but its role in a science of cognition) — no one has really come up with any argument for what difference it makes, and the overwhelming majority of information processing in the brain is subconscious.

There's a lot of work to be done getting a foothold into subconscious information processing before consciousness becomes an issue, and it only will when someone comes up with a solid argument for why it makes a difference. So far, no one has made that argument, which lends support to the possibility that consciousness is epiphenomenal and will never play a role in theorizing about cognition and behavior.

STEVEN R. QUARTZ is Director of the Social Cognitive Neuroscience Laboratory at Caltech and co-author (with Terrence Sejnowski) of Liars, Lovers, and Heroes: What the New Brain Science Reveals About How We Become Who We Are.


Lee Smolin

Reading the critics of Jaron Lanier's essay, in which he speculates about a new form of a computer, based on different principles than those that underlie the standard programmable digital computer, I wonder how people might have reacted, shortly after the invention of the wheel, if some ancestor of Jaron had proposed to invent a new form of transportation that was not a wheel. "Not a wheel!" one can hear them snorting. "Why everyone knows that any device to convey goods must depend on some arrangement of wheels. Not only that, the great thinker van N proved that any arrangement of wheels, whether in parallel or in serial, is equivalent to a single larger wheel, in terms of its ability to move goods."

"No," said the clearly frustrated proto-Jaron, "What I have in mind does involve lashing some logs together, but instead of rolling them, my idea is to put them into the river and simply put the goods on top and float them down to the next camp. So no wheels, and no need to abide by the great van N's theorem on wheel capacity."

The answer then must have been, "Well, we've never heard of such a thing, but try it and see if it works." It seems to me that that's what Jaron's critics might be saying to him, instead of arguing that a boat, as a form of transportation, must roll on wheels.

So it seems to me the question being debated can be framed like this: Is a computer something like a wheel? Is there really only one kind of computer, just like there is really only one kind of wheel? One can arrange them in many ways, in series and in parallel, but in the end, once the wheel or the computer has been invented, they will all work the same way. Even millennia later, wheels are wheels, period. Or is the computer something more general, like a mode of transportation or a musical instrument? There are many different kinds of musical instruments, which produce sound by means of many different principles. Is it possible that there are actually many different kinds of computers, which will accomplish informational tasks for us by as many different principles as musical instruments produce sounds? In that case, is the problem that the critics are beating their drums, while Jaron is trying to blow the first horn?

LEE SMOLIN, a theoretical physicist, is a founding member and research physicist at the Perimeter Institute in Waterloo, Canada. He is the author of The Life of the Cosmos and Three Roads to Quantum Gravity.


Charles Simonyi

I am very happy to see a lot of interesting comments in response to Jaron Lanier's paper. My complaint is with the vast range of Jaron's concerns, from the practical software engineering of Fred Brooks to the issues of consciousness. Maybe his point is that by looking far enough ahead one can also solve the more immediate practical problems.

My focus is closer to Fred Brooks' than to Daniel Dennett's, and from that perspective I could comment on the MIT Technology Review issue on "Extreme Programming" which featured, among others, the technology that my company, Intentional Software Corporation, has been promoting. In his reply to the comments, Jaron referred to the ideas presented in the magazine as "mainstream" and "conservative". I wish that were the case—at least for intentional software. But let me illustrate just how radical the intentional idea is by describing how it applies to the Gordian Software problem.

I am amazed how many software discussions center on essentially implementation questions, while no one seems to care much about what the Problem to be solved really is. The implicit assumption is that the Problem will first be described only in some mathematical language—assembly, Cobol, Java, graphical programming, design patterns, or even logic-based specifications. This is as if the Problem had not existed before a software implementation. What did people do before, one might ask?

The obvious fact is that before computerization, people used to use their consciousness and intelligence to represent (and maybe even solve, after a fashion) the Problem. For example, instead of using computer software, architects or accountants used to make drawings or balance the ledgers "by hand", that is, by using their intelligence. So the two demonstrated representations for problems are: human intelligence, or an effectively machine-executable software implementation.

Gordian software is a child of this false dichotomy where there is no machine-accessible representation of the problem other than the implementation. For the implementation is manifestly not the Problem; it is a complex interweaving of the Problem with information technology: the scale, the platforms, the languages, the standards, the algorithms, the security and privacy concerns, and so on. This interweaving creates a horrible explosion of the size of the description because it includes not just all of the problem and all of the technological principles at play, but any and all instances where the two may interact. So the size of the description is proportional to the size of a product space, not the sum of the two spaces. This is manifestly expensive but also very destructive to any desired human or mechanical processing of the description—to put it bluntly, programmers act as steganographers, in effect encrypting or making inaccessible the useful information by embedding it in massive amounts of implementation detail.

The radical idea of Intentional Software is to focus attention on the Problem owners—let's call them Subject Matter Experts—and on the interface between them and the programmers who are the implementation experts. We will assist the SME's to express their problem in their notation, in their terms. The result will be "intentional" in that it will represent what they intend to accomplish, even though it will "lack", or rather be free of, the semantic details that are key to any implementation. We will then ask the programmers to write a generator/transformer from the intentional description to a traditional implementation with all the desired properties—speed, compatibility, standards, and so on. So the Problem will be represented as one factor, and it can be made effective by the application of the generator, the second factor, that represents the implementation aspects of the solution.
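
A deliberately tiny, hypothetical sketch may make the factoring concrete (this is my own illustration in Python, not Intentional Software's actual technology, and every name in it is invented): the SME's contribution is a declarative description free of implementation detail, and the programmer's contribution is a generator that turns it into running code.

    # Factor 1: the "intentional" description, in the SME's own terms.
    invoice_intent = {
        "entity": "Invoice",
        "fields": [("customer", "text"), ("amount", "number"), ("paid", "yes/no")],
    }

    # Factor 2: the generator, owned by the programmers, holding every
    # implementation choice (here it trivially emits a Python dataclass;
    # it could as easily emit SQL, Java, or a GUI form).
    TYPE_MAP = {"text": "str", "number": "float", "yes/no": "bool"}

    def generate(intent):
        lines = ["from dataclasses import dataclass", "", "@dataclass",
                 f"class {intent['entity']}:"]
        lines += [f"    {name}: {TYPE_MAP[kind]}" for name, kind in intent["fields"]]
        return "\n".join(lines)

    # Re-running the generator after any SME edit yields a fresh implementation
    # at machine speed, without a programmer in the loop.
    print(generate(invoice_intent))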

The amount of new technology that is required is modest: basically we need a special editor—a sort of super Power Point—that assists the SME's to record and maintain their intentions and also the meta-data—the schemas—about their notations and terms.

The difference in the approach from the programmer's point of view is almost superficial. In the absurd—but not unprecedented—limiting case, where the SME's contribute just the product name, the programmers simply have to embed their contribution—prepared as before—into a simple "generator" framework parameterized by the product name string intention. Nothing is gained by that, and it is a historical curiosity that some problems were solved just by programmers. But we can see how additional useful contributions from the SME's could then successively introduce more variability into the output of the generator, and create a more effective balance between the contributions from the SME's and from the programmers while maintaining the key invariants:

1. The intentions remain free of implementation semantics—that means SME's do not have to learn programming. Furthermore the intentional description is "compact"—it is as large and complex as the Problem itself, and not combinatorially larger. The compactness in turn promotes the SMEs’ ability to interact with it, to perfect it.

2. Changes made by an SME to the intentional description can result in a new artifact at machine speeds and at essentially machine precision—by the application of the generator and without the participation of a programmer.

3. Changes to the generator by the programmer can change aspects of the implementation at a cost that is measured in implementation space, not in problem space or the combinatorial product space of the two, as is the case with the current technique.

It is not difficult to see how other key issues of software engineering would also become more tractable if such factoring could be employed—maintenance, bugs, aspects, reuse, programmer training, or "user programming" could all be re-interpreted in their simpler and purer environments.

It is harder to see how this factoring can be enabled and facilitated by tools, services, or training, and what new problems unique to intentional programming might emerge. The good news is that there is more and more attention paid both to the software engineering problem and also to the intentional and other generative schemes as possible solutions. It is also encouraging that in specific areas these ideas have been flourishing for quite a while. Most game programs, just to mention one area, are created using multiple levels of domain-specific encodings and mechanical program generation.

As an aside I note for the Edge audience that DNA is an intentional program—it lacks implementation detail and it is given implementation detail only by the well-known generators, which range from the ribosome through the phenotype to the whole ecosystem. So the DNA does not concern itself with how the organism works; it rather describes how the organism should be built or, really, what the "problem" really is. Because DNA is intentional, its length is short relative to its result—indeed the length of the human genome belies its cosmic importance by being shorter than the source codes of many human software artifacts of more modest accomplishments.

Another key feature of the encoding is that it is "easy" to change, that is, an important fraction of possible changes are also meaningful changes; this made evolution possible—or rather this is a feature of evolved things. Had the code included implementation detail—that is, if it had been more like a "blueprint" as in the popular metaphor or if it had been more like a software program—then it could not have evolved naturally, and people hoping for some sign of an intelligent designer would have had their smoking amino acid.

CHARLES SIMONYI, cofounder of Intentional Software Corporation, formerly worked as Director of Application Development and Chief Architect at Microsoft Corporation where in 1981 he started the development of microcomputer application programs and hired and managed teams who developed Microsoft Excel, Multiplan, Word, and other applications.


John Smart

To Dylan Evans:

You made the following statement in your response:

the distinction between serial and parallel processors is trivial, because any parallel machine can be simulated on a serial machine with only a negligible loss of efficiency.

I found that statement fascinating. I've heard it vaguely before, and it exposes a hole in my understanding and intuition, if true. I was wondering if you could point me to a reference that discusses this further. My training is in biological and systems sciences, with only a few semesters of undergraduate computer science, so I'd appreciate any general overview you might recommend.

I also have two specific questions, which I am hopeful you can address with a sentence or two:

1. I would expect connectionist architectures such as neural networks and their variants to be simulable on serial machines for small numbers of nodes with only a negligible loss of efficiency. But how could that scale up to millions or billions of nodes without requiring inordinate time to run the simulations? Isn't there a combinatorial explosion and processing bottleneck here?

I just can't believe, unless you understand some interpretation of Turing and von Neumann et al. that I've never learned, that there wouldn't be a scale-up problem using serial systems to simulate all the possible nuances of the "digital controlled analog" synaptic circuitry in a mammalian brain, with all its microarchitectural uniqueness.

A related and equally important problem, to my mind, involves the timing differences between differentiated circuits operating in parallel. Neurons have learned to encode information in the varying timing and frequency of their pulses. Various models (e.g. Edelman's "reentrant" thalamocortical loops) suggest to me that nets of differentiated neurons would be very likely to have learned to encode a lot of useful information in the differential rates of their computation. Therefore, even if there is only a "negligible" slowdown in the serial simulation of a particular set of neurons, if it were real, it would seem to me to throw away a lot of what may be the most important information that massively parallel systems like the brain have harnessed: how to utilize the stably emergent, embodied, subtly different rates of convergence of pattern recognition among different specialized neural systems.

2. Given that our brain apparently uses trillions of synaptic connections, each of which has been randomly tuned to slightly different representations of reality bits (as in visual processing), in order to discover, in a process of neural convergence, a number of emergent gestalt perceptions, aren't we going to need massive self-constructing connectionist capability in order to emulate this in the hardware space?

Teuvo Kohonen (one of the pioneers of Self-Organizing Maps in neural networks) once said something similar to this to me, and he expects his field to take off once we are doing most of our neural net implementation in hardware, not software.

For what it's worth, I am currently entertaining the model, borrowed from developmental biology (including the developing brain) that about 90-95% of complexity in any interesting system is driven by bottom up, chaotically deterministic processes (which must fully explore their phase space and then selectively converge), and about 5-10% involves a critical set of top down controls. These top down controls are tuned into the parameters and boundary conditions of the developing system (as with the special set of developmental genes that guide metazoan elaboration of form). Serial processing in human brains seems to me to be a top down process, one that emerged from a bottom up process of evolutionary exploration, one that is very important but only the tip of the iceberg, so to speak. The limited degree of serial and symbolic processing that our brains can do, versus their massive unconscious "competitions" of protothoughts, seems to me to be a balance we can see in all complex adaptive systems. (Calvin's Cerebral Code provides some early speculations on that, as does Edelman's Neural Darwinism).

I see today's serial programming efforts essentially as elegant prosthetic extensions of top down human symbolic manipulation (the way a hammer is an extension of the hand), but some time after 2020, when we've reached a local limit in shrinking the chips, there will for the first time be a market for multichip architectures (e.g., evolvable hardware can be commercially viable at that point), and it is at that point that I expect to see commercially successful biologically inspired bottom up driven architectures. It is at that point that I expect technology to transition from being primarily an extension and amplifier of human drives to becoming a self-organizing, and increasingly autonomous, computational substrate in its own right.

Neural nets controlled by a hardware description language that had the capacity to tune up the way it harnessed randomness in network construction, and to pass on those finely tuned parameters, in the same way that DNA does, would seem to me to be a minimum criterion for applying the phrase "biologically inspired." But this seems to be something we are still decades away from implementing (beyond toy research models). I would see such systems, once they have millions of nodes and have matured a bit, as potential candidates for developing higher level intelligence, even though the humans tending them at that time may still have only a limited appreciation of the architectures needed for such intelligence.

This may be more than you want to address, but any responses you (or any of the other thinkers on this thread) might share would be much appreciated, as I'm in a bit of cognitive dissonance now given your interesting statement and Dan Dennett's implicit support of it (below). Thanks again for any help you may offer in clarifying your statements.

JOHN SMART is a developmental systems theorist who studies science and technological culture with an emphasis on accelerating change, computational autonomy and a topic known in futurist circles as the technological singularity.


Daniel C. Dennett

Dear Mr. Smart,

I think you are right about the relation of speed and efficiency to parallel processing (see, e.g., my somewhat dated essay "Fast Thinking" in The Intentional Stance, 1987) but I took Jaron to be taking himself to be proposing something much more radical. Your idea that timing differences by themselves could play a large informational role is certainly plausible, for the reasons you state and others. And if a serial simulation of such a parallel system did throw away all that information, it would be crippled. I take it that your idea is that the timing differences would start out as just being intrinsic to the specific hardware that happened to be in place, and hence not informative at the outset, but that with opportunistic tuning, of the sort that an evolutionary algorithm could achieve, such a parallel system could exploit these features of its own hardware.

So I guess I agree that Dylan overstated the case, though not as much as Jaron did. If Jaron had put it the way you do, and left off the portentous badmouthing of our heroes, he would have had a better reception, from me at least.


Dylan Evans

To John Smart:

Thanks for your comments and questions. A general overview of computational complexity theory, including the question of serial vs. parallel computing, can be found in Algorithmics: The Spirit of Computing, by David Harel (Addison-Wesley, 3rd edition 2003).

Let me take your questions sequentially:

1. You are right to think that there would be a scale-up problem when using serial systems to simulate a mammalian brain in all its "microarchitectural uniqueness", but this does not contradict my point about simulating any parallel machine on a serial machine "with only negligible loss of efficiency", for two reasons:

(a) By "negligible", I meant only a polynomial time difference. This is "negligible' in terms of computational complexity theory but not always negligible in the context of a particular technological application at a particular time. An engineer wanting to simulate a mammalian brain today might use a massively parallel machine such as a Beowulf cluster, because for him or her the difference between a year and two days is very significant. But in ten year's time, advances in computing speed might reduce this difference to, say, that between 3 days and 20 minutes.

(b) More importantly, I think your question is premised on a fundamental misunderstanding of classical AI. In classical AI, computers are used not to model the brain in all its molecular glory, but rather to model the mind—to understand the software, in other words, rather than the hardware. By software, I mean algorithms. And it is here that the research into sequential and parallel processing really becomes relevant. For while a parallel machine might work very differently to a serial machine as far as the hardware is concerned (and will therefore employ algorithms specially tailored for parallel architectures), there are no problems that a parallel machine can solve which a serial machine cannot. So we can run equivalent algorithms (equivalent in the sense that they solve the same algorithmic problem) on brains and serial computers. Brains are parallel machines made of very slow components (neurons), while serial computers are sequential machines made of very fast components (silicon circuits). So now the time differences you mention are not so clear. Some people (eg. Nicolelis) have already used serial computers to compute the algorithms running on the brain faster than the brain itself does.

2. Neural networks are, as far as I'm concerned, a huge red herring. They may make good models of the brain, but they tell us absolutely nothing about the mind. In other words, they are a useful tool for neuroscientists, but not for cognitive scientists or those in AI, who wish to discover what algorithms the brain is running, not the architecture on which it runs them. Besides, all neural networks at the moment are simulations that are written in software of an essentially serial nature which runs on serial processors. Every neural network can, in principle, be reduced to either (a) an algebraic equation or (b) a set of coupled differential equations. From the point of view of someone who wants to understand how the mind works, it is much more important to understand what these equations are, and this may be done more easily and transparently by coding these equations directly than by dressing them up in a Gordian neural network.
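
Evans's reduction is easy to illustrate: a small feed-forward network is literally the equation y = f(W2 · f(W1 · x + b1) + b2). A minimal NumPy sketch (my own illustration, with made-up sizes and random weights):

    import numpy as np

    def f(z):                      # the nonlinearity
        return np.tanh(z)

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)   # hidden-layer parameters
    W2, b2 = rng.standard_normal((2, 4)), np.zeros(2)   # output-layer parameters

    def network(x):
        # The whole "network" is one algebraic expression, no net machinery needed.
        return f(W2 @ f(W1 @ x + b1) + b2)

    print(network(np.array([0.1, -0.5, 0.3])))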

You are right that today's serial processing efforts are essentially "elegant prosthetic extensions of top-down human symbolic manipulation", but this doesn't mean that they are not the best way to understand the rest of the mind. In fact, it is precisely because they are extensions of our powers for symbolic manipulation that they constitute such a good way to understand the rest of the mind, rather than merely to simulate it. This is an important point: if you built a neural network that was a perfect model of the brain, in all its detail, that would not tell you very much about the mind. On the other hand, given a representation in a language like C++ of the algorithms running in the brain, you would have a complete understanding of the mind, and you could trace every subroutine down to the last loop. It would be perfectly transparent, in the sense that a good mathematical proof is transparent.

So, I can see why neural networks would be of great relevance to your research in developmental biology, but I hope you can also see why they don't actually help very much if one's aim is to discover the algorithms that constitute human intelligence.

To finish, I enjoyed your speculations about forthcoming developments in computer technology. I hope you are right! But creating intelligent artefacts will not necessarily tell us much about the human mind, especially if the artefacts are allowed to evolve in such a way as to become as opaque as all the other examples of evolution we see around us!


Back to WHY GORDIAN SOFTWARE HAS CONVINCED ME TO BELIEVE IN THE REALITY OF CATS AND APPLES: A Talk with Jaron Lanier
