249 — July 7, 2008
THE REALITY CLUB
HARVARD BUSINESS REVIEW
WALL STREET JOURNAL
ON CHRIS ANDERSON'S THE END OF THEORY [6.30.08]
George Dyson, Kevin Kelly, Stewart Brand, W. Daniel Hillis, Sean Carroll, Jaron Lanier, Joseph Traub, John Horgan, Bruce Sterling, Douglas Rushkoff, Oliver Morton, Daniel Everett, Gloria Origgi, Lee Smolin, Joel Garreau
GEORGE DYSON [6.29.08]
For a long time we have been stuck on the idea that the brain somehow contains a "model" of reality, and that Artificial Intelligence will be achieved when we figure out how to model that model within a machine. What's a model? We presume two requirements: a) Something that works; and b) Something we understand. You can have (a) without (b). Our large, distributed, petabyte-scale creations are starting to grasp reality in ways that work just fine but that we don't necessarily understand.
Just as we may eventually take the brain apart, neuron by neuron, and never find the model, we may discover that true AI came into existence without anyone ever developing a coherent model of reality or an unambiguous theory of intelligence. Reality, with all its ambiguities, does the job just fine. It may be that our true destiny as a species is to build an intelligence that proves highly successful, whether we understand how it works or not.
The massively-distributed collective associative memory that constitutes the "Overmind" (or Kevin's OneComputer) is already forming associations, recognizing patterns, and making predictions—though this does not mean thinking the way we do, or on any scale that we can comprehend.
The sudden flood of large data sets and the opening of entirely new scientific territory promises a return to the excitement at the birth of (modern) Science in the 17th century, when, as Newton, Boyle, Hooke, Petty, and the rest of them saw it, it was "the Business of Natural Philosophy" to find things out. What Chris Anderson is hinting at is that Science will increasingly belong to a new generation of Natural Philosophers who are not only reading Nature directly, but are beginning to read the Overmind.
Will this make the scientific method obsolete? No. We are still too close to the beginning of the scientific method to say anything about its end. As Sir Robert Southwell wrote to William Petty, on 28 September 1687, shortly before being elected president of the Royal Society, "Intuition of truth may not Relish soe much as Truth that is hunted downe. "
KEVIN KELLY [6.29.08]
There's a dawning sense that extremely large databases of information, starting in the petabyte level, could change how we learn things. The traditional way of doing science entails constructing a hypothesis to match observed data or to solicit new data. Here's a bunch of observations; what theory explains the data sufficiently so that we can predict the next observation?
It may turn out that tremendously large volumes of data are sufficient to skip the theory part in order to make a predicted observation. Google was one of the first to notice this. For instance, take Google's spell checker. When you misspell a word when googling, Google suggests the proper spelling. How does it know this? How does it predict the correctly spelled word? It is not because it has a theory of good spelling, or has mastered spelling rules. In fact Google knows nothing about spelling rules at all.
Instead Google operates a very large dataset of observations which show that for any given spelling of a word, x number of people say "yes" when asked if they meant to spell word "y. " Google's spelling engine consists entirely of these datapoints, rather than any notion of what correct English spelling is. That is why the same system can correct spelling in any language.
In fact, Google uses the same philosophy of learning via massive data for their translation programs. They can translate from English to French, or German to Chinese by matching up huge datasets of humanly translated material. For instance, Google trained their French/English translation engine by feeding it Canadian documents which are often released in both English and French versions. The Googlers have no theory of language, especially of French, no AI translator. Instead they have zillions of datapoints which in aggregate link "this to that" from one language to another.
Once you have such a translation system tweaked, it can translate from any language to another. And the translation is pretty good. Not expert level, but enough to give you the gist. You can take a Chinese web page and at least get a sense of what it means in English. Yet, as Peter Norvig, head of research at Google, once boasted to me, "Not one person who worked on the Chinese translator spoke Chinese. " There was no theory of Chinese, no understanding. Just data. (If anyone ever wanted a disproof of Searle's riddle of the Chinese Room, here it is. )
If you can learn how to spell without knowing anything about the rules or grammar of spelling, and if you can learn how to translate languages without having any theory or concepts about grammar of the languages you are translating, then what else can you learn without having a theory?
Chris Anderson is exploring the idea that perhaps you could do science without having theories.
There may be something to this observation. Many sciences such as astronomy, physics, genomics, linguistics, and geology are generating extremely huge datasets and constant streams of data in the petabyte level today. They'll be in the exabyte level in a decade. Using old fashioned "machine learning, " computers can extract patterns in this ocean of data that no human could ever possibly detect. These patterns are correlations. They may or may not be causative, but we can learn new things. Therefore they accomplish what science does, although not in the traditional manner.
What Anderson is suggesting is that sometimes enough correlations are sufficient. There is a good parallel in health. A lot of doctoring works on the correlative approach. The doctor may not ever find the actual cause of an ailment, or understand it if he/she did, but he/she can correctly predict the course and treat the symptom. But is this really science? You can get things done, but if you don't have a model, is it something others can build on?
We don't know yet. The technical term for this approach in science is Data Intensive Scalable Computation (DISC). Other terms are "Grid Datafarm Architecture" or "Petascale Data Intensive Computing. " The emphasis in these techniques is the data-intensive nature of computation, rather than on the computing cluster itself. The online industry calls this approach of investigation a type of "analytics. " Cloud computing companies like Google, IBM, and Yahoo, and some universities have been holding workshops on the topic. In essence these pioneers are trying to exploit cloud computing, or the OneMachine, for large-scale science. The current tools include massively parallel software platforms like MapReduce and Hadoop (See: "A Cloudbook For The Cloud"), cheap storage, and gigantic clusters of data centers. So far, very few scientists outside of genomics are employing these new tools. The intent of the NSF's Cluster Exploratory program is to match scientists owning large databased-driven observations with computer scientists who have access and expertise with cluster/cloud computing.
My guess is that this emerging method will be one additional tool in the evolution of the scientific method. It will not replace any current methods (sorry, no end of science!) but will compliment established theory-driven science. Let's call this data intensive approach to problem solving Correlative Analytics. I think Chris Anderson squanders a unique opportunity by titling his thesis "The End of Theory" because this is a negation, the absence of something. Rather it is the beginning of something, and this is when you have a chance to accelerate that birth by giving it a positive name. A non-negative name will also help clarify the thesis. I am suggesting Correlative Analytics rather than No Theory because I am not entirely sure that these correlative systems are model-free. I think there is an emergent, unconscious, implicit model embedded in the system that generates answers. If none of the English speakers working on Google's Chinese Room have a theory of Chinese, we can still think of the Room as having a theory. The model may be beyond the perception and understanding of the creators of the system, and since it works it is not worth trying to uncover it. But it may still be there. It just operates at a level we don't have access to.
But the models' invisibility doesn't matter because they work. It is
not the end of theories, but the end of theories we understand. George Dyson
says this much better in his reponse to Chris Anderson's article (see above).
So far Correlative Analytics, or the Google Way of Science, has primarily been deployed in sociological realms, like language translation, or marketing. That's where the zillionic data has been. All those zillions of data points generated by our collective life online. But as more of our observations and measurements of nature are captured 24/7, in real time, in increasing variety of sensors and probes, science too will enter the field of zillionics and be easily processed by the new tools of Correlative Analytics. In this part of science, we may get answers that work, but which we don't understand. Is this partial understanding? Or a different kind of understanding?
Perhaps understanding and answers are overrated. "The problem with computers, " Pablo Picasso is rumored to have said, "is that they only give you answers. " These huge data-driven correlative systems will give us lots of answers—good answers—but that is all they will give us. That's what the OneComputer does—gives us good answers. In the coming world of cloud computing perfectly good answers will become a commodity. The real value of the rest of science then becomes asking good questions.
STEWART BRAND [6.29.08]
Digital humanity apparently crossed from one watershed to another over the last few years. Now we are noticing. Noticing usually helps. We'll converge on one or two names for the new watershed and watch what induction tells us about how it works and what it's good for.
W. DANIEL HILLIS [6.30.08]
I am a big fan of Google, and I love looking for mathematical patterns in data, but Chris Anderson's essay "The End of Theory: Will the Data Deluge Makes the Scientific Method Obsolete?" sets up a false distinction. He claims that using a large collection of data to "view data mathematically first and establish a context for it later" is somehow different from "the way science has worked for hundreds of years. " I disagree.
Science always begins by looking for patterns in the data, and the first simple models are always just extrapolations of what we have seen before. Astronomers were able to accurately predict the motions of planets long before Newton's theories. They did this by gathering lots of data and looking for mathematical patterns.
The "new" method that Chris Anderson describes has always been the starting point: gather lots of data, and assume it is representative of other situations. This works well as long as we do not try to extrapolate too far from what has been observed. It is a very simple kind of model, a model that says, "what we will see next will be very much what we have seen so far". This is usually a good guess.
Existing data always gives us our first hypothesis. Humans and other animals are probably hard-wired for that kind of extrapolation. Mathematical tools like differential equations and statistics were developed to help us do a better job of it. These tools of science have been used for centuries and computers have let us apply them to larger data sets. They have also allowed us to collect more data to extrapolate. The data-based methods that we apply to petabytes are the methods that we have always tried first.
The experimental method (hypothesize, model, test) is what allows science to get beyond what can be extrapolated from existing data. Hypotheses are most interesting when they predict something that is different from what we have seen so far. For instance, Newton's model could predict the paths of undiscovered planets, whereas the old-fashioned data-based models could not. Einstein's model, in turn, predicted measurements that would have surprised Newton. Models are interesting precisely because they can take us beyond the data.
Chris Anderson says that "this approach to science—hypothesize, model, test—is becoming obsolete". No doubt the statement is intended to be provocative, but I do not see even a little bit of truth in it. I share his enthusiasm for the possibilities created by petabyte datasets and parallel computing, but I do not see why large amounts of data will undermine the scientific method. We will begin, as always, by looking for simple patterns in what we have observed and use that to hypothesize what is true elsewhere. Where our extrapolations work, we will believe in them, and when they do not, we will make new models and test their consequences. We will extrapolate from the data first and then establish a context later. This is the way science has worked for hundreds of years.
Chris Anderson is correct in his intuition that something is different about these new large databases, but he has misidentified what it is. What is interesting is that for the first time we have significant quantitative data about the variation of individuals: their behavior, their interaction, even their genes. These huge new databases give us a measure of the richness of the human condition. We can now look at ourselves with the tools we developed to study the stars.
SEAN CARROLL [6.30.08]
What Good is a Theory?
Early in the 17th century, Johannes Kepler proposed his Three Laws of Planetary Motion: planets move in ellipses, they sweep out equal areas in equal times, and their periods are proportional to the three-halves power of the semi-major axis of the ellipse. This was a major advance in the astronomical state of the art, uncovering a set of simple relations in the voluminous data on planetary motions that had been collected by his mentor Tycho Brahe.
Later in that same century, Sir Isaac Newton proposed his theory of mechanics, including both his Laws of Motion and the Law of Universal Gravitation (the force due to gravity falls as the inverse square of the distance). Within Newton's system, one could derive Kepler's laws—rather than simply positing them—and much more besides. This was generally considered to be a significant step forward. Not only did we have rules of much wider-ranging applicability than Kepler's original relations, but we could sensibly claim to understand what was going on. Understanding is a good thing, and in some sense is the primary goal of science.
Chris Anderson seems to want to undo that. He starts with a truly important and exciting development—giant new petascale datasets that resist ordinary modes of analysis, but which we can use to uncover heretofore unexpected patterns lurking within torrents of information—and draws a dramatically unsupported conclusion—that the age of theory is over. He imagines a world in which scientists sift through giant piles of numbers, looking for cool things, and don't bother trying to understand what it all means in terms of simple underlying principles.
Well, we can do that. But, as Richard Nixon liked to say, it would be wrong. Sometimes it will be hard, or impossible, to discover simple models explaining huge collections of messy data taken from noisy, nonlinear phenomenon. But it doesn't mean we shouldn't try. Hypotheses aren't simply useful tools in some potentially-outmoded vision of science; they are the whole point. Theory is understanding, and understanding our world is what science is all about.
JARON LANIER [6.30.08]
The point of a scientific theory is not that an angel will appreciate it. It's purpose is human comprehension. Science without a quest for theories means science without humans.
Scientists are universally thrilled with the new, big connected computing resources. I am aware of no dissent on that point.
The only idea in Chris Anderson's piece that falls outside that happy zone of consensus is that we shouldn't want to understand our own work when we use the new resources.
He finds it exciting that we could do something that works without understanding why. This is precisely what wouldn't be exciting. Some folk remedies work and we don't know why. Science is about understanding. Insight is more exciting than folk remedies.
Anderson pretends it's useless to be a human. Machines should now do the thinking, and be the heroes of discovery.
I say "pretends" because I don't believe he is being sincere. I think it's a ploy to get a certain kind of attention. Hearing anti-human rhetoric has the same sting as a movie plot about a serial killer. Some deep, moral part of each of us is so offended that we can't turn off our attention.
JOSEPH TRAUB [6.30.08]
I agree with Danny Hillis that large amounts of data will not undermine the scientific method. Indeed, scientific laws encode a hugh amount of data. Think Maxwell's equations or Kepler's laws for example. Why does Chris Anderson think that with still more data, laws (what he calls theory) will become less important?
JOHN HORGAN [7.2.08]
My first reaction to Chris Anderson's essay was, not another End-of-Something-Big prophecy. Anderson also recycles rhetoric from chaos, complexity, AI. Ever more powerful computers are going to find hidden patterns in bigger and bigger data sets and are going to revolutionize science! You don't need a computer to plot the boom-bust cycle of these claims. But the notion that computers will obviate theory and even understanding provokes a few thoughts:
Lots of groups already engage in problem-solving without understanding. Economists employ purely numerical methods for predicting markets, and mathematicians construct "computer proofs" based on massive calculations rather than comprehensible logic. This is less science than engineering. Engineers aren't looking for Truth. They're looking for a solution to a problem. Whatever works, works.
You could argue that since the advent of quantum mechanics, modern physics has also delivered prediction without understanding. Quantum theory is phenomenally successful, almost too much so for its own good, at predicting the results of accelerator experiments. But as Niels Bohr used to say, anyone who says he understands quantum theory doesn't know the first thing about it.
But I doubt that number-crunching computers will ever entirely replace human experts, as Anderson implies. The physicists at the Large Hadron Collider have to write programs to help their computers sift through the deluge of data for potentially significant events. IBM's massive number-crunching allowed Deep Blue to beat Gary Kasparov. But human chess experts also incorporated their knowledge into Deep Blue's software to make it more efficient at finding optimal moves. I bet Google's language translation programs incorporate a lot of human expertise.
Chris Anderson seems to think computers will reduce science to pure induction, predicting the future based on the past. This method of course can't predict black swans, anomalous, truly novel events. Theory-laden human experts can't foresee black swans either, but for the foreseeable future, human experts will know how to handle black swans more adeptly when they appear.
BRUCE STERLING [7.2.08]
Science Fiction Swiftly Outmoded by "Petabyte Fiction"
I'm as impressed by the prefixes "peta" and "exa" as the next guy. I'm also inclined to think that search engines are a bigger, better deal that Artificial Intelligence (even if Artificial Intelligence had ever managed to exist outside science fiction). I also love the idea of large, cloudy, yet deep relationships between seemingly unrelated phenomena—in literature, we call those gizmos "metaphors. " They're great!
Yet I do have to wonder why—after Google promptly demolished advertising—Chris Anderson wants Google to pick on scientific theory. Advertising is nothing like scientific theory. Advertising has always been complete witch-doctor hokum. After blowing down the house of straw, Google might want to work its way up to the bricks (that's a metaphor).
Surely there are other low-hanging fruit that petabytes could fruitfully harvest before aspiring to the remote, frail, towering limbs of science. (Another metaphor—I'm rolling here. )
For instance: political ideology. Everyone knows that ideology is closely akin to advertising. So why don't we have zillionics establish our political beliefs, based on some large-scale, statistically verifiable associations with other phenomena, like, say, our skin color or the place of our birth?
The practice of law. Why argue cases logically, attempting to determine the facts, guilt or innocence? Just drop the entire legal load of all known casework into the petabyte hopper, and let algorithms sift out the results of the trial. Then we can "hang all the lawyers, " as Shakespeare said. (Not a metaphor. )
Love and marriage. I can't understand why people still insist on marrying childhood playmates when a swift petabyte search of billions of potential mates worldwide is demonstrably cheaper and more effective.
Investment. Quanting the stock market has got to be job one for petabyte tech. No human being knows how the market moves—it's all "triple witching hour, " it's mere, low, dirty superstition. Yet surely petabyte owners can mechanically out-guess the (only apparent) chaos of the markets, becoming ultra-super-moguls. Then they simply buy all of science and do whatever they like with it. The skeptics won't be laughing then.
Graphic design. This one's dead easy. You just compare the entire set of pixels on a projected page of Wired to the entire set of all pixels of all paper pages ever scanned by Google. Set the creatometer on stun and generate the ultimate graphic image. Oh, and the same goes for all that music digitized in your iPod, only much more so. Why mix songs on "random" when you can tear songs down to raw wavelengths in an awesome petabyte mash-up? Then you can patent that instead of just copyrighting it.
Finally—and I'm getting a bit meta here—the ultimate issue of Edge. Rather than painfully restricting Edge responses to accredited scientists and their culturati hangers-on, the Third Culture will conquer the Earth when all Internet commentaries of any kind ever are analyzed for potentially Edgy responses, much as Google can translate Estonian to Klingon in one single step!
The result is the ultimate scientific-critical cultural thesis! It isn't a "Grand Unified Theory"—(theory is so over, because you can never print Google's databanks on a T-shirt). Still—bear with me here—metaphorically, I visualize this Petabyte Edge as a kind of infinite, Cantorian, post-human intellectual debate, a self-generating cyberculture that delicately bites its own dragon tail like a Chinese Ouroboros, chewing the nature of the distant real with a poetic crystalline clarity, while rotating and precessing on its own scaly axis, inside an Internet cloud the size of California.
DOUGLAS RUSHKOFF [7.2.08]
First off, I don't think Google has been proven "right. " Just effective for the moment. Once advertising itself is revealed to be a temporary business model, then Google's ability to correctly exploit the trajectory of a declining industry will itself be called into question. Without greater context, Google's success is really just a tactic. It's not an extension of human agency (or even corporate agency) but strategic stab based on the logic of a moment. It is not a guided effort, but a passive response. Does it work? For the moment. Does it lead? Not at all.
Likewise, to determine human choice or make policy or derive science from the cloud denies all of these fields the presumption of meaning.
I watched during the 2004 election as market research firms crunched data in this way for the Kerry and Bush campaigns. They would use information unrelated to politics to identify households more likely to contain "swing" voters. The predictive modeling would employ data points such as whether the voters owned a dog or cat, a two-door or four-door car, how far they traveled to work, how much they owed on their mortgage, to determine what kind of voters were inside. These techniques had no logic to them. Logic was seen as a distraction. All that mattered was the correlations, as determined by computers poring over data.
If it turned out that cat-owners with two door cars were more likely to vote a certain way or favor a certain issue, then pollsters could instruct their canvassers which telephone call to make to whom. Kids with dvd players containing ads customized for certain households would show up on the doorsteps of homes, play the computer-assembled piece, leave a flyer, and head to the next one.
Something about that process made me cynical about the whole emerging field of bottom-up, anti-taxonomy.
I'm all for a good "folksonomy, " such as when kids tag their favorite videos or blog posts. It's how we know which YouTube clip to watch; we do a search and then look for the hit with the most views. But the numbers most certainly do not speak for themselves. By forgetting taxonomy, ontology, and psychology, we forget why we're there in the first place. Maybe the video consumer can forget those disciplines, but what about the video maker?
When I read Anderson's extremely astute arguments about the direction of science, I find myself concerned that science could very well take the same course as politics or business. The techniques of mindless petabyte churn favor industry over consideration, consumption over creation, and—dare I say it—mindless fascism over thoughtful self-government. They are compatible with the ethics-agnostic agendas of corporations much more than they are the more intentional applied science of a community or civilization.
For while agnostic themselves, these techniques are not without bias. While their bias may be less obvious than that of human scientists trained at elite institutions, their bias is nonetheless implicit in the apparently but falsely post-mechanistic and absolutely open approach to data and its implications. It is no more truly open than open markets, and ultimately biased in their favor. Just because we remove the limits and biases of human narrativity from science, does not mean other biases don't rush in to fill the vacuum.
OLIVER MORTON [7.2.08]
Chris Anderson's provocations spur much thought—I'll limit it to two specifics and two generalities. The first specific is that Anderson mischaracterises particle physics. The problem with particle physics is not data poverty—it is theoretical affluence. The Tevatron, and LEP before it, have produced amounts of data vast for their times—data is in rich supply. The problem is that the standard model explains it all. The drive beyond the standard model is not a reflection of data poverty, but of theory feeding on theory because the data are so well served.
That's not to say there's not a Googlesque angle to take here—there's a team looking at Fermilab data in what I understand to be an effectively "theory agnostic" way (see "Particle physicists hunt for the unexpected" by my Nature colleague Sarah Tomlin)—but it's not a mainstream thing. (And a minor add: a theory such as Newton's, which allows practitioners to predict with accuracy the positions of small fast-flying lumps of rock decades into the future in a solar system 10 larger than the rocks in question may be incomplete, but "crude" it's not. )
The second mischaracterisation is of biology. To suggest that seeing the phenotype as an interplay of genome and environment is in some way new knowledge or theoretically troubling just isn't so. But that's really all that talk of epigenetics and gene protein interactions amounts to. It's really unclear to me in what serious sense biology is "further" form a model today than it was 50 years ago. There are now models of biology that explain more of it than was explicable then, and no model for it all.
As to the general points, I don't think Feyerabend's far-from-normative account of the scientific method—"anything goes"—is the last word on the subject. But it is closer to the truth than saying that science always moves forward by models, or by any other single strategy. Science as a process of interested discovery is more than the tools it uses at any given time or disciplinary place.
And I guess my other point is "petabytes—phwaah". Sure, a petabyte is a big thing—but the number of ways one can ask questions far bigger. I'm no mathematician, and will happily take correction on this, but as I see it one way of understanding a kilobit is as a resource that can be exhausted—or maybe a space that can be collapsed—with 10 yes or no questions: that's what 2  is. For a kilobyte raise the number to 13. For a petabyte raise it to 53. Now in many cases 53 is a lot of questions. But in networks of thousands of genes, really not so much.
For understanding biology, you need to think much bigger. It's possible I described the beginnings of the way to go in "A Machine With a Mind of Its Own", an article I wrote for Wired on the robot scientist at the university of Aberystwyth, and I was pleased recently to hear that that program has started making genuine non-trivial discoveries. But maybe to do real justice to such things you will need millions or billions of experiments chosen by such algorithms—data generating data, rather than data generating knowledge; the sort of future portrayed in Vernor Vinge's Rainbows End, with its unspeakably large automated subterranean labs in San Diego.
PS Anyone who doesn't appreciate the irony in John Horgan's "not another End-of-Something-Big prophecy" should.
DANIEL EVERETT [7.3.08]
Chris Anderson's essay makes me wonder about linguistics in the age of Petabytes. In the early days of linguistic theory in the United States, linguists were, as all scientists, concerned with the discovery of regularities. Anthropologist Ruth Benedict first called regularities in human ways of giving meaning to the world "patterns in culture". Later, Edward Sapir, Kenneth Pike, and others looked for patterns in language, especially in the American Indian languages that became the focus of American linguistics and so differentiated it from inchoate linguistic studies by European scholars. Having just finished a field research guide, my own pedagogical emphasis for new field researchers is largely the same as the earliest studies of the indigenous languages of the Americas—enter a community that speaks an unstudied language and follow standard inductive procedures for finding regularities and patterns. Once the patterns have been discovered, write them up as rules, note the exceptions, and there you have it—a grammar.
But there are two ways in which linguists are becoming dissatisfied with this methodology as a result of issues that connect with Chris Anderson's thesis. First, linguists have begun to question the relevance of distinguishing rules from lists. Second, they have begun to ask whether in fact the child proceeds like a little linguist in learning their language with induction and deduction procedures built in to them genetically, or whether in fact child language learning takes place very differently than the way linguists study new languages in the field.
The difference between rules and lists, intentional vs. extensional sets, is the confrontation of the rule of law against lawlessness. As humans we are motivated by our evolution to categorize. We are deeply dissatisfied with accounts of data that look more like lists and "mere statistics" than generalizations based upon the recognition of law-like behavior. And yet as many have begun to point out, some of the most interesting facts about languages, especially the crucial facts that distinguish one language from another, often are lists, rather than rules (or schemas). People have to learn lists in any language. Since they have to do this, is there any reason to propose a second type of learning or acquisition in the form of rules, whether these are proposed to be genetically motivated or not?
More intriguingly, do children acquire language based on a genetically limited set of hypotheses or do they treat language like the internet and function as statistical calculators, little "Googlers"? Connectionist psychologists at Carnegie Mellon, Stanford, and other universities have urged related hypotheses on us for years, though linguists have been slow to embrace them.
Linguistics has a great deal of work over the coming years to resituate itself in the age of Petabytes. Statistical generalizations over large amounts of data may be more useful in some ways, at least as we use such as parallel tools, than the armchair reflection on small amounts of data that characterizes earlier models in the human sciences. It very well may be, in fact to many of us it is looking more likely, that previous models based mainly on induction or genes are unable to explain what it is that we most want to explain—how children learn languages and how languages come to differ in interesting ways while sharing deep similarities.
GLORIA ORIGGI [7.3.08]
I agree with Daniel Hillis that Chris Anderson's point, although provocative and timely, is not exactly breakthrough news.
Science has always taken advantage of correlations in order to gain predictive power. Social science more than other sciences: we have few robust causal mechanisms that explain why people behave in such or such a way, or why wars break out, but a lot of robust correlations—for which we lack a rationale—that it is better to take into account if we want to gain insight on a phenomenon.
If the increase of child mortality rates happened to be correlated with the fall of the Soviet Empire (as it has been shown) this is indeed relevant information, even if we lack a causal explanation for it. Then, we look for a possible causal mechanism that sustains this correlation. Good social science finds causal mechanisms that are not completely ad hoc and sustain generalizations in other cases. Bad social science sticks to interpretations that often just confirm the ideological biases of the scientist.
Science depicts, predicts and explains the world: correlations may help prediction, they may also depict the world in a new way, as an entangled array of petabytes, but they do not explain anything if they aren't sustained by a causal mechanism. The explanatory function of science, that is, answering the "Why" questions, may be just a small ingredient of the whole enterprise: and indeed, I totally agree with Anderson that the techniques and methods of data gathering may be completely transformed by the density of information available and the existence of statistical algorithms that filter this information with a tremendous computing capacity.
So, no nostalgia for the good old methods if the new techniques of data gathering are more efficient to predict events. And no nostalgia for the "bad" models if the new techniques are good enough to give us insight (take AI vs. search engines, for example). So, let's think about the Petabyte era as an era in which the "context of discovery", to use the old refrain of philosophy of science, is hugely mechanized by the algorithmic treatment of enormous amounts of data, whereas the "context of justification" still pertains to the human ambition of making sense of the world around us.
This leaves room for the "Why"-questions, that is, why are some of the statistical correlations extracted by the algorithms so damn good? We know that they are good because we have the intuition that they work, they give us the correct answer, but this "reflective equilibrium" between Google ranked answers to our queries and our intuition that the ranking is satisfying is still in need of explanation. In the case of PageRank, it seems to me that the algorithm incorporates a model of the Web as a structured social network in which each link from a node to another one is interpreted as a "vote" from that node to the other. This sounds to me as "theory", as a method of extraction of information that, even if it is realized by machines, is realized on the basis of a conceptualization of reality that aims at getting it right.
A new science may emerge in the Petabyte era, that is, a science that tries to answer the question of how the processes of collective intelligence made possible by the new, enormous amount of data that can be easily combined by powerful algorithms are sound. It may be a totally new, "softer" science, uninhibited at last by the burden of the rigor of "quantitative methods", that make scientific papers so boring to read, that leaves to algorithms this burden and lets the minds free to move around the data in the most creative way. Science may become a cheaper game from the point of view of the investment for discovering new facts: but, as a philosopher, I do not think that cheap intellectual games are less challenging or less worth playing.
LEE SMOLIN [7.3.08]
To see what to think about Anderson's hypothesis that computers storing and processing vast amounts of data will replace the need to formulate hypotheses and theories one can look at whether it has any relevance to how supercomputers are actually being used in contemporary physics.
One example that comes to mind is gravitational wave astronomy, where a large signal to noise ratio makes it impossible to simply observe gravitational waves from the outputs of the detectors. Instead, the vast data streams created by the LIGO, VIRGO and other gravitational wave antennas are scanned by computers against templates for waveforms created by theorists modeling possible sources. These sources, such as inspiraling and merging of pairs of black holes and neutron stars, require themselves computationally intensive simulation on supercomputers to produce the needed templates.
What has been the experience after several decades of this work? While no gravitational waves have so far been identified, the detectors are up and running, as are the programs that generate the templates for the waveforms from the supercomputer simulations of sources. To reach this stage has required large amounts of computation, but that has at every stage been guided by theoretical knowledge and analytic approximations. The key issues that arose were resolved by theorists who succeeded in understanding what was going wrong and right in their simulations because they were able to formulate hypotheses and test them against analytical calculations.
While I don't work in this field, it has been clear to me over the decades I have been watching its development that progress was made by good physicists doing what good physicists always do, which is build up intuitive stories and pictures in their minds, which lead them to testable hypotheses. The fact that the hypotheses were about what was happening in their computer simulations, rather than about data coming from observations, did not change the fact that the same kinds of creativity and intuitive thinking were at work as is traditional in non-computational science.
There is a similar story in cosmology, where computer simulations of structure formation are part of an arsenal of tools, some computational, some analytic, some intuitive, which are always being challenged by and checked against each other. And there is a similar story in numerical studies of hadronic physics, where there is an interplay of results and ideas between supercomputer simulations and analytic approximations. There also, the key obstacles which arose had to do with issues of physical principle, such as how certain symmetries in the theory are broken in the numerical models. It has taken a great deal of creative, intuitive physical thinking over 30 years to overcome these obstacles, leading recently to better agreement between theory and experiment.
As a result of watching the development of these and other numerically intensive fields, it is clear to me that while numerical simulation and computation are welcome tools, they are helpful only when they are used by good scientists to enhance their powers of creative reasoning. One rarely succeeds by "throwing a problem onto a computer", instead it takes years and even decades of careful development and tuning of a simulation to get it to the point where it yields useful output, and in every case where it has done so it was because of sustained, creative theoretical work of the kind that has been traditionally at the heart of scientific progress.
JOEL GARREAU [7.6.08]
Maybe things are different in physics and biology. But in my experience studying culture, values and society, data lags reality by definition—they are a snapshot of the past. And when human reality does not conveniently line up with established ways of thinking, the data can lag by years, if not decades.
Data are an artifact of selection, which means they reflect an underlying hypothesis or they wouldn't have been collected. For example, in my work I discovered a frightening lack of timely data to "prove" either my hypothesis that North America was behaving as if it were nine separate civilizations or economies that rarely were bounded by political jurisdictions of nation, state, or county. It was equally problematic coming up with the data to prove that places like Silicon Valley were becoming the modern iteration of "city" even when the millions of square feet of big buildings were right before your eyes. It wasn't until those "nine nations" or "edge city" models began to be seen as useful by others that people started to go through the very great deal of trouble to verify them by collecting data in a fashion that ignored all previous boundaries. Life is not obligated to follow data and it frequently does not.
Now come thinkers producing hypotheses that one can map social and cultural change onto Moore's Law. It will be interesting to see when or if the data shows up to support their predictions. Ray Kurzweil and the Singularitans see an exponential curve that eventually leads to a perfection of the human condition analogous to the Christian version of "heaven." Pessimists like Bill Joy, Francis Fukuyama, Susan Greenfield and Martin Rees see a mirror image curve leading quickly to something like "hell." These are both credible scenarios. But the data lag. It is hard to find "proof" that we are arriving at one or the other, even though they are each based on nice smooth technodeterministic curves, the likes of which rarely if ever have been a prominent artifact of human history. Lord knows how you would demonstrate through data the arrival of the "prevail" scenario described by Jaron Lanier and others. That scenario is based on the notion that a prominent aspect of future history is that the increase in our challenges is being matched quickly enough by imaginative, ornery, cussed, collective, bottom-up human responses, setting events off in unpredictable directions. If graphed, the outcome—like so much of the raw material of history—would probably appear about as organized as a plate of spaghetti.
I would love to think that data lagging behind hypotheses—much less reality—is about to change. (At last! A crystal ball!) But I eagerly await a demonstration.
Return to THE END OF THEORY By Chris Anderson
MIND & BRAIN
The Mirror Neuron Revolution: Explaining What Makes Humans Social
Marco Iacoboni, a neuroscientist at the University of California at Los Angeles, is best known for his work on mirror neurons, a small circuit of cells in the premotor cortex and inferior parietal cortex. What makes these cells so interesting is that they are activated both when we perform a certain action—such as smiling or reaching for a cup—and when we observe someone else performing that same action. In other words, they collapse the distinction between seeing and doing. In recent years, Iacoboni has shown that mirror neurons may be an important element of social cognition and that defects in the mirror neuron system may underlie a variety of mental disorders, such as autism. His new book, Mirroring People: The Science of How We Connect to Others, explores these possibilities at length. Mind Matters editor Jonah Lehrer chats with Iacoboni about his research. ...
LEHRER: What first got you interested in mirror neurons? Did you immediately grasp their explanatory potential?
IACOBONI: I actually became interested in mirror neurons gradually. [Neuroscientist] Giacomo Rizzolatti and his group [at the University of Parma in Italy] approached us at the UCLA Brain Mapping Center because they wanted to expand the research on mirror neurons using brain imaging in humans. I thought that mirror neurons were interesting, but I have to confess I was also a bit incredulous. We were at the beginnings of the science on mirror neurons. The properties of these neurons are so amazing that I seriously considered the possibility that they were experimental artifacts. In 1998 I visited Rizzolatti’s lab in Parma, I observed their experiments and findings, talked to the anatomists that were studying the anatomy of the system and I realized that the empirical findings were really solid. At that point I had the intuition that the discovery of mirror neurons was going to revolutionize the way we think about the brain and ourselves. However, it took me some years of experimentation to fully grasp the explanatory potential of mirror neurons in imitation, empathy, language, and so on—in other words in our social life.
What the Internet is doing to our brains
...I think I know what's going on. For more than a decade now, I've been spending a lot of time online, searching and surfing and sometimes adding to the great databases of the Internet. The Web has been a godsend to me as a writer. Research that once required days in the stacks or periodical rooms of libraries can now be done in minutes. A few Google searches, some quick clicks on hyperlinks, and I've got the telltale fact or pithy quote I was after. Even when I'm not working, I'm as likely as not to be foraging in the Web's info-thickets—reading and writing e-mails, scanning headlines and blog posts, watching videos and listening to podcasts, or just tripping from link to link to link. (Unlike footnotes, to which they're sometimes likened, hyperlinks don't merely point to related works; they propel you toward them.)
For me, as for others, the Net is becoming a universal medium, the conduit for most of the information that flows through my eyes and ears and into my mind. The advantages of having immediate access to such an incredibly rich store of information are many, and they've been widely described and duly applauded. "The perfect recall of silicon memory," Wired's Clive Thompson has written, "can be an enormous boon to thinking." But that boon comes at a price. As the media theorist Marshall McLuhan pointed out in the 1960s, media are not just passive channels of information. They supply the stuff of thought, but they also shape the process of thought. And what the Net seems to be doing is chipping away my capacity for concentration and contemplation. My mind now expects to take in information the way the Net distributes it: in a swiftly moving stream of particles. Once I was a scuba diver in the sea of words. Now I zip along the surface like a guy on a Jet Ski.
I'm not the only one. When I mention my troubles with reading to friends and acquaintances—literary types, most of them—many say they're having similar experiences. The more they use the Web, the more they have to fight to stay focused on long pieces of writing. ...
Science Saturday: Summer Doldrums Edition
In his studies of entropy and the irreversibility of time, Caltech physicist Sean Carroll is exploring the idea that our universe is part of a larger structure.
Caltech physicist Sean M. Carroll has been wrestling with the mystery of time. Most physical laws work equally well going backward or forward, yet time flows only in one direction. Writing in this month’s Scientific American, Carroll suggests that entropy, the tendency of physical systems to become more disordered over time, plays a crucial role. Carroll sat down recently at Caltech to explain his theory. ...
Are you saying that our universe came from some other universe?
Right. It came from a bigger space-time that we don't observe. Our universe came from a tiny little bit of a larger high-entropy space.
I'm not saying this is true; I'm saying this is an idea worth thinking about.
You're saying that in some universes there could be a person like you drinking coffee, but out of a blue cup rather than a red one.
If our local, observable universe is embedded in a larger structure, a multiverse, then there's other places in this larger structure that have denizens in them that call their local environs the universe. And conditions in those other places could be very different. Or they could be pretty similar to what we have here.
How many of them are there? The number could well be infinity. So it is possible that somewhere else in this larger structure that we call the multiverse there are people like us, writing for newspapers like the L.A. Times and thinking about similar questions.
So how does the arrow of time fit into this?
Our experience of time depends upon the growth of entropy. You can't imagine a person looking around and saying, "Time is flowing in the wrong direction," because your sense of time is due to entropy increasing. . . . This feeling that we're moving through time has to do with the fact that as we live, we feed on entropy. . . . Time exists without entropy, but entropy is what gives time its special character.
THE SEED SALON
The father of cognitive neuroscience and the original New Journalist discuss status, free will, the human condition, and The Interpreter.
TW: ... Many of today's leading theorists, such as E. O. Wilson, Richard Dawkins, and Dan Dennett, probably know about as much on the human brain as a second-year graduate student in neuropsychology. That isn't their field. Wilson is a great zoologist and a brilliant writer. Dawkins, I'm afraid, is now just a PR man for evolution. He's kind of like John the Baptist — he goes around announcing the imminent arrival. Dennett, of course, is a philosopher and doesn't pretend to know anything about the brain. I think it has distorted the whole discussion.
MG: Well, let me roll the cameras back to the '80s and '90s, when neuroscience was taking off. There were new techniques available to understand the chemical, physiological, and anatomical processes in the brain. Imaging was starting up and the inrush of data was enormous and exciting. So there was a hunger for the big picture: What does it mean? How do we put it together into a story? Ultimately, everything's got to have a narrative in science, as in life. And there was a need for people who didn't spend their time looking down a microscope to tell a story of what this could mean. I would say that some of the people who've made attempts at that did a very good job. But I will hold out for the fact that if you haven't slaved away looking at the nervous system with the tools of neuroscience — if you're only talking about it — you don't quite have the same respect for it. Because it is an extraordinarily complex machine. ...
by Anita Elberse
It was a compelling idea: In the digitized world, there's more money to be made in niche offerings than in blockbusters. The data tell a different story.
...Much has changed in commerce, however, in the decades since the blockbuster strategy first took hold. Today we live in a world of ubiquitous information and communication technology, where retailers have virtually infinite shelf space and consumers can search through innumerable options. When books, movies, and music are digitized and therefore cheap to replicate, the question arises: Is a blockbuster strategy still effective?
One school of thought says yes. Well represented by the economists Robert Frank and Philip Cook, in their 1995 book The Winner-Take-All Society, that school argues that broad, fast communication and easy replication create dynamics whereby popular products become disproportionately profitable for suppliers, and customers become even likelier to converge in their tastes and buying habits. The authors offer three reasons for their view: First and foremost, lesser talent is a poor substitute for greater talent. Why, for example, would people listen to the world's second-best recording of Carmen when the best is readily available? Thus even a tiny advantage over competitors can be rewarded by an avalanche of market share. Second, people are inherently social, and therefore find value in listening to the same music and watching the same movies that others do. Third, when the marginal cost of reproducing and distributing products is low—as it certainly is with goods that can be digitized—the cost advantage of a brisk seller is huge. Frank and Cook were elaborating on the economist Sherwin Rosen's earlier work describing the "superstars" effect, in which a field's few top performers pull ever further away from the pack. According to this line of thought, hits will keep coming—to the increasing detriment of also-rans.
Although that thesis continues to hold sway, another idea has emerged in recent years—presented just as persuasively, and proposing the opposite. The "long tail" theory took shape in an article by Chris Anderson, editor of Wired magazine, which grew into the 2006 book The Long Tail: Why the Future of Business Is Selling Less of More. The book's subtitle puts the strategic implications in a nutshell. Now that consumers can find and afford products more closely tailored to their individual tastes, Anderson believes, they will migrate away from homogenized hits. The wise company, therefore, will stop relying on blockbusters and focus on the profits to be made from the long tail—niche offerings that cannot profitably be provided through brick-and-mortar channels. (See the sidebar "The Long-Tail Theory in Short.") ...
• BLOG: Read a response from Chris Anderson, author of The Long Tail, and participate in this online discussion.
• BLOG: Read Anita Elberse's response to Chris Anderson and participate in this online discussion.
A book from 2006, "The Long Tail," was one of those that appear periodically and demand that we rethink everything we presume to know about how society works. In this case, the Web and its nearly unlimited choices were said to be remaking the economy and culture. Now, a new Harvard Business Review article pushes back, and says any change occurring may be of an entirely different sort.
The Long Tail theory, as explained by its creator, Wired magazine editor Chris Anderson, holds that society is "increasingly shifting away from a focus on a relatively small number of 'hits' (mainstream products and markets) at the head of the demand curve and toward a huge number of niches in the tail."....
...Since appearing two years ago, the book has been something of a sacred text in Silicon Valley. Business plans that foresaw only modest commercial prospects for their products cited the Long Tail to justify themselves, as it had apparently proved that the Web allows a market for items besides super-hits. If you demurred, you were met with a look of pity and contempt, as though you had just admitted to still using a Kaypro.
That might now start to change, thanks to the article (online at tinyurl.com/3rg5gp), by Anita Elberse, a marketing professor at Harvard's business school who takes the same statistically rigorous approach to entertainment and cultural industries that sabermetricians do to baseball. ...
No one would accuse Craig Venter of harboring humble ambitions. In 2000 he decoded the human genome faster than anyone else—and he did it more cheaply than a well-funded government team. More recently he's set a new goal for himself: to replace the petrochemical industry. In a Maryland lab, he's manipulating chromosomes in the hopes of creating an energy bug—a bacterium that will ingest CO2, sunlight and water, and spew out liquid fuel that can be pumped into American SUVs. NEWSWEEK's Fareed Zakaria spoke to Venter about the brave new world of biologically based fuels. Excerpts: ...
Are You Optimistic About?:
"The optimistic visions seem not just wonderful but plausible." Wall Street Journal "Persuasively upbeat." O, The Oprah Magazine "Our greatest minds provide nutshell insights on how science will help forge a better world ahead." Seed "Uplifting...an enthralling book." The Mail on Sunday
Is Your Dangerous Idea?:
"Danger – brilliant minds at work...A brilliant bok: exhilarating, hilarious, and chilling." The Evening Standard (London) "A selection of the most explosive ideas of our age." Sunday Herald "Provocative" The Independent "Challenging notions put forward by some of the world’s sharpest minds" Sunday Times "A titillating compilation" The Guardian "Reads like an intriguing dinner party conversation among great minds in science" Discover
We Believe but Cannot Prove:
"An unprecedented roster of brilliant minds, the sum of which is nothing short of an oracle — a book ro be dog-eared and debated." Seed "Scientific pipedreams at their very best." The Guardian "Makes for some astounding reading." Boston Globe Fantastically stimulating...It's like the crack cocaine of the thinking world.... Once you start, you can't stop thinking about that question." BBC Radio 4 "Intellectual and creative magnificence" The Skeptical Inquirer