What those breadcrumbs tell is the story of your life. It tells what you've chosen to do. That's very different than what you put on Facebook. What you put on Facebook is what you would like to tell people, edited according to the standards of the day. Who you actually are is determined by where you spend time, and which things you buy. Big Data is increasingly about real behavior, and by analyzing this sort of data, scientists can tell an enormous amount about you. They can tell whether you are the sort of person who will pay back loans. They can tell if you're likely to get diabetes.
They can do this because the sort of person you are is largely determined by your social context, so if I can see some of your behaviors, I can infer the rest, just by comparing you to the people in your crowd. You can tell all sorts of things about a person, even though it's not explicitly in the data, because people are so enmeshed in the surrounding social fabric that it determines the sorts of things that they think are normal, and what behaviors they will learn from each other.
As a consequence, analysis of Big Data is increasingly about finding connections: connections with the people around you, and connections between people's behavior and outcomes. You can see this in all sorts of places. For instance, one type of Big Data and connection analysis concerns financial data. Not just the flash crash or the Great Recession, but also all the other sorts of bubbles that occur. These are systems of people, communications, and decisions that go badly awry. Big Data shows us the connections that cause these events. Big Data gives us the possibility of understanding how these systems of people and machines work, and whether they're stable.
The notion that it is connections between people that are really important is key, because researchers have mostly been trying to understand things like financial bubbles using what is called Complexity Science or Web Science. But these older ways of thinking about Big Data leave the humans out of the equation. What actually matters is how the people are connected together by the machines and how, as a whole, they create a financial market, a government, a company, and other social structures.
Because it is so important to understand these connections, Asu Ozdaglar and I have recently created the MIT Center for Connection Science and Engineering, which spans all of the different MIT departments and schools. It's one of the very first MIT-wide Centers, because people from all sorts of specialties are coming to understand that it is the connections between people that are actually the core problem in making transportation systems work well, in making energy grids work efficiently, and in making financial systems stable. Markets are not just about rules or algorithms; they're about people and algorithms together.
Understanding these human-machine systems is what's going to make our future social systems stable and safe. We are getting beyond complexity, data science and web science, because we are including people as a key part of these systems. That's the promise of Big Data, to really understand the systems that make our technological society. As you begin to understand them, then you can build systems that are better. The promise is for financial systems that don't melt down, governments that don't get mired in inaction, health systems that actually work, and so on, and so forth.
The barriers to better societal systems are not about the size or speed of data. They're not about most of the things that people are focusing on when they talk about Big Data. Instead, the challenge is to figure out how to analyze the connections in this deluge of data and come to a new way of building systems based on understanding these connections.
Changing The Way We Design Systems
With Big Data, traditional methods of system building are of limited use. The data is so big that any question you ask about it will usually have a statistically significant answer. This means, strangely, that the scientific method as we normally use it no longer works, because almost everything is significant! As a consequence, the normal laboratory-based question-and-answer process, the method that we have used to build systems for centuries, begins to fall apart.
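A rough sketch of why "almost everything is significant" at this scale, assuming a simple comparison of the means of two equal-sized groups with a standard z-test. The effect size of 0.01 standard deviations and the sample sizes are made-up illustrations, not figures from any real data set. The practical effect stays negligible throughout; only the amount of data changes.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal."""
    return math.erfc(abs(z) / math.sqrt(2))


def expected_z(effect_in_sd, n_per_group):
    """Expected z statistic when comparing the means of two equal-sized groups.

    effect_in_sd is the true difference in means, measured in standard
    deviations. The standard error of that difference is sqrt(2 / n_per_group).
    """
    return effect_in_sd * math.sqrt(n_per_group / 2)


TINY_EFFECT = 0.01  # hypothetical: one hundredth of a standard deviation

for n in (1_000, 100_000, 10_000_000):
    z = expected_z(TINY_EFFECT, n)
    p = two_sided_p_value(z)
    print(f"n per group = {n:>10,}  z = {z:6.2f}  p = {p:.3g}")
```

With a thousand people per group the difference is indistinguishable from noise; with ten million it is overwhelmingly "significant", even though it is exactly as unimportant as before.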
Big Data and the notion of Connection Science are outside of our normal way of managing things. We live in an era that builds on centuries of science, and our methods of building systems, governments, organizations, and so on are pretty well defined. There are not a lot of things that are really novel. But with the coming of Big Data, we are going to be operating very much out of our old, familiar ballpark.
With Big Data you can easily get false correlations, for instance, "On Mondays, people who drive to work are more likely to get the flu." If you look at the data using traditional methods, that may actually be true, but the problem is: why is it true? Is it causal? Is it just an accident? You don't know. Normal analysis methods won't suffice to answer those questions. What we have to come up with is new ways to test the causality of connections in the real world, far more than we have ever had to do before. We can no longer rely on laboratory experiments; we need to actually do the experiments in the real world.
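The "just an accident" case can be sketched in a few lines, under loudly stated assumptions: the outcome and every candidate feature below are pure random noise, and the names (count_chance_findings, n_people, n_features) are hypothetical. Nothing here is real behavioral data. The point is only that when you test enough candidate connections, some will pass a conventional significance cutoff by chance.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_findings(n_people=100, n_features=1000, seed=0):
    """Count random features that look 'significantly' correlated with a random outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    # Approximate two-sided 5% cutoff for |r| when there is no true relationship.
    threshold = 1.96 / math.sqrt(n_people)
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_people)]
        if abs(pearson_r(feature, outcome)) > threshold:
            hits += 1
    return hits


print(count_chance_findings(), "of 1000 unrelated features pass the 5% cutoff")
```

Roughly one in twenty of the unrelated features should clear the bar. Telling those apart from real causal connections is what requires an actual experiment.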
The other problem with Big Data is human understanding. When you find a connection that works, you'd like to be able to use it to build new systems, and that requires having human understanding of the connection. The managers and the owners have to understand what this new connection means. There needs to be a dialogue between our human intuition and the Big Data statistics, and that's not something that's built into most of our management systems today. Our managers have little concept of how to use big data analytics, what they mean, and what to believe.
In fact, the data scientists themselves don't have much intuition either... and that is a problem. I saw an estimate recently that said 70 to 80 percent of the results that are found in the machine learning literature, which is a key Big Data scientific field, are probably wrong because the researchers didn't understand that they were overfitting the data. They didn't have that dialogue between intuition and the causal processes that generated the data. They just fit the model and got a good number and published it, and the reviewers didn't catch it either. That's pretty bad because if we start building our world on results like that, we're going to end up with trains that crash into walls and other bad things. Management using Big Data is actually a radically new thing.
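To make "fit the model and got a good number" concrete, here is a minimal sketch of overfitting. The setup is an illustrative assumption, not taken from any study: the features and labels are random noise, so there is nothing to learn, and the model is a one-nearest-neighbor classifier chosen only because it memorizes its training data.

```python
import random


def make_noise_dataset(n_rows, n_features, rng):
    """Random features with random 0/1 labels: there is no real signal."""
    rows = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    labels = [rng.randint(0, 1) for _ in range(n_rows)]
    return rows, labels


def predict_1nn(train_rows, train_labels, row):
    """Return the label of the training row closest to `row` (squared distance)."""
    nearest = min(
        range(len(train_rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_rows[i], row)),
    )
    return train_labels[nearest]


def accuracy(train_rows, train_labels, rows, labels):
    """Fraction of `rows` whose predicted label matches `labels`."""
    correct = sum(
        predict_1nn(train_rows, train_labels, r) == y for r, y in zip(rows, labels)
    )
    return correct / len(rows)


def overfitting_demo(seed=0):
    """Score the same memorizing model on its training data and on fresh data."""
    rng = random.Random(seed)
    train_rows, train_labels = make_noise_dataset(200, 5, rng)
    test_rows, test_labels = make_noise_dataset(200, 5, rng)
    train_acc = accuracy(train_rows, train_labels, train_rows, train_labels)
    test_acc = accuracy(train_rows, train_labels, test_rows, test_labels)
    return train_acc, test_acc


train_acc, test_acc = overfitting_demo()
print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on fresh data:    {test_acc:.2f}")
```

The model scores perfectly on the data it was fit to and no better than a coin flip on new data. Reporting only the first number is the mistake described above.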
This last year at Davos I ran several sessions around Big Data with the CEOs of leading companies in this area, and it was very clear that there's a whole new way of doing things that's just now developing. Some of them, like Palantir and TIBCO, are making progress at this, but to most of the people in the room this was brand new, and they had not gotten up to speed about it at all.
Another important issue with Big Data is that since this data is mostly about people, there are enormous issues about privacy, data ownership, and data control. You can imagine using Big Data to make a world that is incredibly invasive, incredibly 'Big Brother'… George Orwell was not nearly creative enough when he wrote 1984.
For the last several years I've been helping to run sessions at the World Economic Forum around sourcing personal data and ownership of the data, and that's ended pretty successfully with what I call the New Deal on Data. The Chairman of the Federal Trade Commission, who's been part of the group, put forward the U.S. "Consumer Data Bill of Rights," and in the EU, the Justice Commissioner declared a version of this New Deal to be a basic human right.
Both of these regulatory declarations put the individual much more in charge of data that's about them. This is a major step to making Big Data safer and more transparent, as well as more liquid and available, because people can now choose to share data. It is a vast improvement over having the data being locked away in industry silos where nobody even knows it's there.
Adam Smith And Karl Marx Were Wrong
These Big Data issues are important, but there are bigger things afoot. As you move into a society driven by Big Data, most of the ways we think about the world change in a rather dramatic way. For instance, Adam Smith and Karl Marx were wrong, or at least had only half the answers. Why? Because they talked about markets and classes, but those are aggregates. They're averages.
While it may be useful to reason about the averages, social phenomena are really made up of millions of small transactions between individuals. There are patterns in those individual transactions that are not just averages; they're the things that are responsible for the flash crash and the Arab Spring. You need to get down into these new patterns, these micro-patterns, because they don't just average out to the classical way of understanding society. We're entering a new era of social physics, where it's the details of all the particles (the you and me) that actually determine the outcome.
Reasoning about markets and classes may get you half of the way there, but it's this new capability of looking at the details, which is only possible through Big Data, that will give us the other 50 percent of the story. We can potentially design companies, organizations, and societies that are more fair, stable and efficient as we get to really understand human physics at this fine-grain scale. This new computational social science offers incredible possibilities.
This is the first time in human history that we have the ability to see enough about ourselves that we can hope to actually build social systems that work qualitatively better than the systems we've always had. That's a remarkable change. It's like the phase transition that happened when writing was developed or when education became ubiquitous, or perhaps when people began being tied together via the Internet.
The fact that we can now begin to actually look at the dynamics of social interactions and how they play out, and are not just limited to reasoning about averages like market indices, is for me simply astonishing. To be able to see the details of variations in the market and the beginnings of political revolutions, to predict them, and even control them, is definitely a case of Promethean fire. Big Data can be used for good or bad, but either way it brings us to interesting times. We're going to reinvent what it means to have a human society.
Creating A Data-Driven Society
One of the great questions is: who is this new data-driven world going to be for, and what is it going to look like? People ask if this is just for the Davos attendees or for everybody. That's a question of values and ethics, and that's why people have to be debating this now, and why I'm talking about this: to start the conversation. But I will say, however, that all the conversations I've been at in Davos have had an extremely strong egalitarian element. Most people are advocates for the poor. Many are people from developing countries, an enormous number, not just a token scattering. There's a real focus on building a sustainable future, which means one in which there aren't large chunks of the population left out in the cold. Obviously not everybody is 100 percent devoted to that agenda, but most are.
A key insight is that your data is worth more if you share it, because it enables systems like public health. Data about the way you behave and where you go can be used to stop the spread of infectious disease. If you have children, you don't want to see them die of an H1N1 pandemic. How are you going to stop that? Well, it turns out that if you can actually watch people's behavior in real time... something that is quite possible today... you can tell when each individual person is getting sick. This means you can actually see the spread of influenza from person to person on an individual level. And if you can see it, you can stop it. You can begin to build a world where infectious pandemics cease to be as much of a threat.
Similarly, if you're worried about global warming, we now know how patterns of mobility relate to productivity (and I just showed some examples of those; we are doing a lot of really amazing science around this). This means you can design cities that are far more efficient, far more human, and burn an awful lot less energy. But you need to be able to see the people moving around in order to be able to get these results. That's another instance where sharing your data is invaluable to you personally. It's everybody contributing his or her data that's going to make a greener world, and that is worth far more than the simple cash value of the data.
However, today the data is siloed off and unavailable, and that was one of the core reasons I proposed the New Deal on Data to the World Economic Forum. Since then the idea has run through various discussions and turned into the Consumer Data Bill of Rights in the United States, and the declaration on Data Rights in the EU. The core idea is that when data is in silos you can't make use of it either for evil or for the public good, and we need the public good. We need to stop pandemics. We need to make a greener world. We need to make a fairer world.
Who Owns The Data In A Data-Driven Society?
How do you get the data out of those silos? The first step is you have to figure out who owns that data. Does the telephone company own it, just because it happened to be collected while you were walking around with your phone? Maybe they have some right to use it. But where the discussions among all the participants, including the telephone companies, have come out is that you're the only one that has final disposal of it. They would have the ability to keep copies to offer services that you've requested, but you, the individual, have to have the final say.
Some situations are, of course, more complex. What about if the data is a transaction with a merchant? Well, they have a right to the data too. But by assigning rights of ownership to people (which is not exactly the same as legal ownership) what you do is you make it possible to break data out of the silos. You've turned it into a personal asset that can then be shared for value in return. You can make it a liquid asset that can be used to build government systems, social systems, or for-profit systems. That's the world we're moving towards.
Is there opposition to this? Surprisingly little. The incumbents in the Internet are probably the major opposition because (and I don't mean to pick on them) Facebook and Google grew up in a completely unregulated environment. It is natural for them to think that they have control over the data, but now they're slowly, slowly coming around to the idea that they're going to have to compromise on that.
However, the people who have the most valuable data are the banks, the telephone companies, the medical companies, and they're very highly regulated industries. As a consequence, they can't really leverage that data the way they'd like to unless they get buy-in from both the consumer and the regulators. The deal that they've been willing to cut is that they will give consumers control over their data in return for being able to make them offers about using their data.
That gets these companies out of the regulator's pocket. It gives them a white hat, because they explicitly asked you if you wanted to opt in, and it lets them make money, which is what they desperately want. And it appears that if you treat people's data in this sort of responsible manner, people will willingly share their data. It is a win-win-win solution to the privacy problem, and it's the companies that grew up in an unregulated environment, or the companies that are in gray markets that are likely to dry up, that are most strongly opposed.
What we are beginning to see is services that leverage personal data in this sort of respectful manner. Services such as really personal recommendations, identity certification without passwords, and personal public services for transportation, health, and so forth. All these areas are undergoing tectonic changes, and the more that we can use specific data about specific people, the better we can make the system work.
These dramatic improvements in societies' systems go back to what I was saying earlier. Today societies' systems are built on big averages and indices, e.g., this class of people do this and this market's moving that way. But really, it's all made up of millions and millions of small interactions, and with Big Data we can get down and design things that really work for us on a personal level, rather than just being treated as another type A4 consumer.
Organizations With Hard Information Boundaries Will Tend To Dissolve
I got to these issues through a long and varied history. I started off doing a lot of signal processing and machine vision. I have a background in psychology as well, and am concerned with how data and people come together in social systems. For instance, we developed some of the first wearable computing devices. The Google Glass project comes out of my group... the guys that are building it are my former students. But as a result of these sorts of projects it became obvious to me that the most important thing was not the user interface or the device, it was the data about people. Later, as cell phones became more ubiquitous, it was clear that they were going to be the biggest source of data in the world.
If you could see everybody in the world all the time, where they were, what they were doing, who they spent time with, then you could create an entirely different world. You could engineer transportation, energy, and health systems that would be dramatically better. It's this history of thinking about signals and people together, and how people work via these computer systems, and what data about human behavior can do, that led me to the realization that we're at a phase transition. We are moving from the reasoning of the Enlightenment about classes and about markets to fine-grain understanding of individual interactions and systems built on fine-grain data sharing.
This new world could make George Orwell look like an unimaginative third stringer. It became really clear you had to think hard about the privacy and data ownership issues. What George Orwell didn't realize was that if you can watch the patterns of people interacting, then you can figure out things like who they're going to vote for and how they're going to react to various situations like changes of regulation, and so forth. You could build something that, to a first approximation, would be the real evil empire. And, of course, some people are going to try and do that.
At the same time, there are some elements of this new data driven world that are really promising. For instance, the most efficient and robust architectures tend to be ones that have no central points. It means that there's no single place for a dictator to grab control. They have to actually go to every house to really control the data. In addition, I see government policies going in the right directions, to minimize these sorts of dangers.
Also, there is inherent in a society built on data sharing a certain level of transparency and choice for individuals that I believe will tend to militate against central control. It tends to dissolve the power of the state and big organizations because you can build things that are far more efficient and robust if they're distributed and without the hard information boundaries that you see today.
That means that the service-oriented government, as it were, or the service-oriented organization will tend to have better offerings for a lower price, as opposed to the ones that try to own the customer or control the citizen. As a consequence I expect to see that organizations with hard information boundaries will tend to dissolve, because there will be competition from things that are better that don't have the hard boundaries and don't try to own your data.