It has changed partly because we started to be aware of it, and partly because there were a lot of technological advances that forced us to think about connectedness. We had the World Wide Web, which was all about the links connecting information. We had the Internet, which was all about connecting devices. We had wireless technologies coming our way. Eventually, we had Google, we had Facebook. Slowly, the term 'network connectedness' really became part of our life, so much so that now the word 'networks' is used much more often than 'evolution' or 'quantum mechanics.' It has really run over them, and now that's the buzzword.
The question is, what does it mean to be part of the network, or what does it mean to think in terms of the network? What does it mean to take advantage of this connectedness and to understand that? In the last decade, what I kept thinking about is how do you describe mathematically the connectedness? How do you get data to describe that? What does this really mean for us?
This had several stages, obviously. The first stage for us was to think networks, only networks down the line. That was about a decade ago, when we witnessed the birth of network science. I could say a couple of geniuses came along and did it, but really it was the data that made it possible. Suddenly we started to discover that lots of the data that's out there, that we're collecting thanks to the Internet and other technological advances, allowed us to look at connectedness and to measure it and to map it out.
Once you had data, you could build theories. Once you had theories, you had predictive power, you could test that, and then the whole thing fed itself. It suddenly and very actively emerged as a field that we now call network science. Going beyond networks, going beyond connectedness, we realized we started to know not only whom you connect to and whom you see and where your links are (the economic, personal, social or whatever they are), but we started to see also the timing of your activities. What do you do with those links? When do you interact?
That was the second wave; we called it 'human dynamics.' It describes what we do in real time, because if you think about it, social sciences have been trying for a very, very long time to describe human behavior. They did a really good job delivering a set of tools for how you measure a person's activity, but much of that was really based on observation, based on small samples and based on interviews and questionnaires. What has happened in the last decade or so is that thanks to the many activities we have, and thanks to the many digital devices that we carry around, much of our activity became completely recorded. We got to the point that there's so much data recording happening around us that, for pretty much anybody who lives in a big city in Western Europe or in the United States, much of their life, almost in minute resolution, can be reconstructed from the many data streams that we leave around us.
What all of this did is that it really changed science, and not only science, but it really created a whole new way of thinking about human behavior and about data because we got to the point that we don't have to rely on interviews. We actually have lots of objective data on what people do, whom they communicate with, when they communicate, how often they communicate, where are they when they communicate. Often what are their motivations when they do so. The data became so rich that you can actually start piecing away, start forming hypotheses and start answering them. That is really a game changer. If you think about science, much of science really develops through new tools. When telescopes came around, then we started to discover that many of the planets around had moons. Then microscopes came around and biology came along with that. You can have the whole list of tools that were discovered, and they gave us new discoveries.
What is happening right now is that now a new tool is becoming available, but as a result of the technological advances, there's so much data being created about us that science becomes a byproduct of all this data. The question really becomes not as much how you collect the data, but how do you make sense of it? And this comes under many different names. You can call it network science, you can call it human dynamics, you can call it computational social science, you can call it big data. Whatever you call it, down the line, what we're talking about is that there's a huge amount of information collected about us and we need to make sense of it.
Making sense of it has many different facets. One facet would be let's just be able to store it and be able to recall it, which is fundamentally a computational problem, like a computer science problem. Then there is the aspect of how do you extract really meaningful information from these huge amounts of data? Then there are the companies that want to make predictions regarding your habits so that they can sell you stuff.
Then there is the science part (which is how I come to this), which is to say I'm not trying to sell anything. I'm not trying to tax anything. What I'd like to do is to say could we use data to do new science? To really understand what we're doing, because the opportunity is there. This is the first time that we can know what people are doing in an objective manner, without biases, without lying, without kidding ourselves, of trying to present a different image than what we are.
In the last years, this has been our focus. This has many facets, because of course, the picture is pretty clear and the goal is very clear. We want to understand what humans do; we want to understand what complex systems do. We want to describe this very interconnected world that we describe. But then the way you do that is that you break it down to specific projects and ask specific questions. One question that fascinated me in the last two years is, can we ever use data to control systems? Could we go as far as, not only describe and quantify and mathematically formulate and perhaps predict the behavior of a system, but could you use this knowledge to be able to control a complex system, to control a social system, to control an economic system?
Controlling economic systems is now a very hot issue, because it became so uncontrollable. Can you describe mathematically systems in sufficient detail that you'd be able to do that? The answer is, yes, you can do that, in a sense. But the mathematics is emerging. You can ask very specific questions that pertain to control. From my perspective, science has a number of goals. The first is to be able to collect enough information so that you can describe, so that you can quantify. Eventually, if you quantify properly, then you can mathematically formulate. If you mathematically formulate it, then you gain predictive power. If you gain predictive power, eventually you get to the point to be able to control it.
The question is, where are we on this long path, on the different aspects of human behavior, of social systems, of economic systems and so on? In an increasing number of systems, we have started to see the whole spectrum emerging, or at least the possibility for the whole spectrum to emerge.
When network science started more than a decade ago, all of us knew that this was going to be really important and that this was going to be a game changer—the way we look at complex systems. Down the line, it will also have lots of economic benefits. But being scientists, we never knew where to turn. Recently, I read a claim; the claim was that for every technology, the first ten years is the development, and the second ten years is when the market follows. I feel that that's what happened in network science.
In the first ten years we knew this was important, but being scientists, we didn't know where to start commercializing, or how to think about what we were going to produce out of it. Of course, while we were doing the science, there were companies making billions using the tools of network science and so on. But it was happening in parallel, not as a consequence of network science. That has really changed in the last ten years, and it was a dramatic change. There are many signatures of that. Now there are companies all around the world who really live off network science. They map out organizations, they map out communities, and they make predictions about the health of organizations and so on. They deeply use network science. Facebook ended up completely revamping its recommendation system based on network science. They hired a couple of people who had a degree, and they changed the way it works. It works so much better now. Their recommendation system is very accurate, by building in many of the experiences we had.
There are a number of companies out there in the biological sphere that are entirely built on network science. For a long time, biologists were resistant to network thinking. But medicine is very pragmatic. Now, medicine is really exploding in thinking about network science. Harvard is in the process of starting a new division of network medicine. Just last week, we had a symposium where 550 people signed up, and 150 were on the waiting list to attend a three-hour symposium about network medicine within the Harvard Medical School. It really has started to penetrate areas that are very pragmatic, very results-oriented, very "let me see what you're going to do for me". These areas are coming along and saying, "we need to be able to use it." We need to rethink what disease looks like, how we're going to cure disease, and that's a fundamentally interconnected problem. We need to be able to make better recommendations in social systems, and that's fundamentally a network problem.
We live in a moment when this whole network thinking is really entering into lots of different aspects of the business world, not to speak about financial systems, or fundamental aids. It's a deeply interconnected problem. I wish I could say network science could be useful there as well. Yes, it could be useful, if the data were available, so we could actually do that. There the limitation is not that we don't have the scientific tools; the limitation is that we don't have the data that we could actually feed off of.
My lab has been engaged in a number of problems recently. I would say that we're split in thirds in terms of what we do. One-third of the lab works on fundamental issues pertaining to network science. This is the part of the lab that produced the paper last year on how we control a network, how we apply control theory to identify the nodes from which you can actually control a network. This is the part of the lab that thinks about fundamental network issues: how you describe a generic network without any particular application in mind, and what the fundamentals of network science are.
A third of the lab, however, focuses on biological systems, and there the focus is increasingly disease. That is, how do we think about diseases? It's pretty clear now that we're coming to the end of the genome paradigm. Not that the genome is not important; everybody knows how important it is, but it's just not enough. The way I think about it is that if you want to have your car fixed, the mechanic will fix it; no matter how broken your car is, they can fix it. Why can they fix it, and why can't the doctor fix us, no matter how broken we are? The answer is, the mechanic has the parts list. So does a doctor, in a way, because thanks to the Genome Project, we have all the genes and all the proteins in the cell. What the mechanic has that a doctor doesn't have is the blueprint, the wiring diagram.
That's what is really missing when it comes to thinking about diseases. The doctors don't think in terms of the wiring diagram. They think of things in terms of symptoms, they think in terms of drugs, but they don't have a mechanistic thinking of what really happens within the cell, what happened so that you got that particular disease. That has to change. The road to change is really going to have to go through networks. We're working with a number of groups. Some of them work with us to map out the networks, to understand how the genes and the proteins connect to each other within the cell such that we can start thinking about diseases.
Other parts of the lab work on the source, and other collaborators of ours work on specific diseases, like asthma and COPD, trying to say, "could we identify within the network that region within the cell, that region of the network that breaks down when you get COPD, when you get cancer, when you get heart disease? Could you localize where that disease is sitting within your cell?" If you could localize the disease module, find that neighborhood that is broken within your cellular network, then you know what to do next. How would you design drugs to actually fix it then?
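As a rough illustration of what "localizing the disease module" could mean computationally, here is a minimal sketch. It assumes (this is not stated in the talk) that the cellular network is given as a list of undirected interactions and that a set of genes already associated with a disease is known; it then looks for the largest group of those genes that are directly wired to each other. The gene names `G1` to `G7` and the function `largest_connected_module` are hypothetical, and this is only one simple way to frame the idea, not necessarily the method used in the lab.

```python
from collections import deque


def largest_connected_module(edges, disease_genes):
    """Return the largest connected group of disease genes.

    edges: iterable of (gene_a, gene_b) undirected interactions (assumed input).
    disease_genes: iterable of genes linked to the disease (assumed input).
    Only interactions between two disease genes are considered.
    """
    disease_genes = set(disease_genes)
    neighbors = {gene: set() for gene in disease_genes}
    for u, v in edges:
        if u in disease_genes and v in disease_genes:
            neighbors[u].add(v)
            neighbors[v].add(u)

    seen = set()
    best = set()
    for start in sorted(disease_genes):
        if start in seen:
            continue
        # Breadth-first search to collect one connected component.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in neighbors[node]:
                if nb not in seen:
                    seen.add(nb)
                    component.add(nb)
                    queue.append(nb)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical toy network: G1-G2-G3 form a connected neighborhood,
# while G6 and G7 are disease genes with no disease-gene neighbors.
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G5", "G6"), ("G4", "G7")]
toy_disease_genes = ["G1", "G2", "G3", "G6", "G7"]
module = largest_connected_module(toy_edges, toy_disease_genes)
print(sorted(module))
```

In this toy case the candidate "broken neighborhood" is the group G1, G2, G3; real cellular networks are incomplete and noisy, so a real analysis would need to account for missing links.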
People in my lab work very closely with medical doctors to understand, with the data that we have about patients, with what we know about the genomics, with the maps we have for the cell that you care about: let's put the pieces together, let's identify what is really broken. Where is the wiring diagram broken, essentially, within your network?
That really will be with us for the next ten, twenty years, because it's an ongoing challenge. The data is becoming more and more accurate; our predictions are becoming more and more accurate. The experimental tools are becoming more accurate. The way I think about it is that we have cardiologists and we have neurologists; I think we're going to continue having that. But the future doctor will have to become a networkologist as well. They will really need to understand that language, in the same way the special mechanic, whenever your lights don't turn on, he is right away thinking about the wiring diagram: where it's broken, what is the piece that's not working, let me find it, let me replace it. That's how doctors will have to think.
The remaining third of the lab focuses entirely on social systems and human mobility. That's what comes under computational social science, human dynamics, and so on. There we rely a lot on data sets that we get from mobile phone carriers, that we get from email carriers, and try to actually understand where people are, how they communicate, how they move around. Can we quantify all those processes? Of course, these different parts of the lab are not independent of each other, because the basic network science tools that one-third of the lab develops get tested both on the biological as well as on the social end.
Sometimes the questions are initiated in the social science direction, but then it turns out to be a relevant question for the biological problem as well, and we test it there, and turns out to be a fundamental question that ends up being passed to the part of the lab that thinks about fundamental issues within networks. Right now, it's a very healthy setup that we have these three approaches that allows us to have flexibility over many different systems.
It is almost common sense now that we live in the age of networks. What most people haven't really internalized is that these networks are not random. They have internal rules. Once you start seeing them, then you start looking at the very different way of how these networks function. The number of highly connected or less-connected nodes is never random in the network. The way they break down, the way they evolve is never random in these networks. The way that hubs link to their neighborhood, the way the community is formed, the way the communities look, their number, their size, they all follow very precise laws and very quantifiable patterns.
These patterns are often amazingly simple. Once you break it down, it creates a new perspective of how the system works. Think about Google. Google's great success early on was really the PageRank algorithm. It's not that they searched more than anybody else at the early stages, and it's not that they searched any better, but they were much better at identifying the pages that you want to see when you are searching for a certain topic. Why were they successful? Why was PageRank behind that success? It turns out it's deeply connected to the network structure. If the World Wide Web had not had the ratio of hubs that it does, PageRank would never have worked. If the World Wide Web had been a truly random network, without the ratios of hubs and less-connected nodes that it actually has (and there's actually mathematical evidence for this), PageRank would have returned garbage.
Facebook, of course, became the book on who connects to whom. It's becoming the best repository of the social network that we have. There's nothing better right now, nothing more accurate right now. There's this pool of algorithms behind it, and the question is why they work and what are the best algorithms? What are the fundamentals? Each algorithm has to have some hypothesis behind it. Down the line, it turns out that many of the tools they use for how they recommend a friend and how they choose marketing for you are exploiting the underlying structure of the network. They're exploiting the fundamental rules that we know exist about the network.
When I teach networks, whether it's for CEOs, for a very lay audience or a very professional audience, whether I do it in Dallas as I did recently in January, or I do it in a high school, I go over a set of about ten fundamental laws of what networks look like: from the small-world behavior to the scale-free behavior, the divergence of the hubs and the reason why they are there, the evolving nature of the network, the robustness problem of how robust they are to errors and to attacks, the emergence of the communities and the role of the communities, and the assortative property of why in social systems hubs tend to connect to other hubs, while in biological systems hubs tend to avoid each other.
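To make the link between PageRank and hubs a little more concrete, here is a minimal sketch of the textbook power-iteration form of PageRank on a tiny, made-up web. The page names, the `toy_web` graph and the damping value of 0.85 are assumptions chosen for illustration; this is not Google's production algorithm, and a five-page example cannot demonstrate the claim about random networks. It only shows how rank concentrates on a page that many others link to.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank on a dict {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: four pages link to "hub"; only two pages link to "a".
toy_web = {
    "hub": ["a"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub"],
    "d": ["hub", "a"],
}
ranks = pagerank(toy_web)
top_page = max(ranks, key=ranks.get)
print(top_page, round(ranks[top_page], 3))
```

The heavily linked page ends up with the largest score, which is the sense in which the algorithm leans on the uneven, hub-dominated structure of the network.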
Once you go through and understand step by step what these mean, you will have a completely different way of how you think about networks. My experience is that depending on your background, you take it in very different directions, that knowledge. That's what I can never predict. It's really interesting. When I give some of these talks about the basic rules of networks and what they look like, what I find is that there are ten people in the audience, and they understood it in five different ways. Not that anybody got it wrong, but they took five different messages away of how that applies to what they care about. If it's a medical doctor, or a businessperson, they have a completely different message.
Often they're so creative. I could never have come up with some of the answers they are actually coming up with as they are absorbing the material. Through the last few years of lecturing about networks, I'm amazed at the power of this knowledge. Once you internalize it, what a different perspective you get. The way I think about it, once you go through this mini course that I give, people just can't stop thinking in networks anywhere they go. They start seeing the elements. They start seeing what holds them together, they start seeing it within their life.
These companies are, down the line, the feeding ground for new science. They are so partly by posing questions about what they care about, and partly by collecting the data. We as the scientific community will never be able to collect the rich, layered data these companies do. Mobile phone companies right now have the most detailed information about human behavior. No matter how much money NIH or NSF would give us, we would not be able to collect those layers of information that they do: what Facebook collects, what Google collects.
Down the line they became the gold mine for research. At the same time, access is problematic. Companies can really feed researchers, and many of them are innovative enough to do so, through data and also through questions that they care about, but without turning this into applied research. I think in many cases, this became a symbiotic relationship. There are researchers who work very closely with Facebook and they publish together, and they ask questions about joint interests. Google has quite a number of researchers within its team and there are people from outside who work with them. I had students, for example, spending a summer with them. I think that if the value of this symbiosis is understood, both science and the companies themselves benefit. The reality is that the 1940s, '50s, '60s, all the way to the '80s, were the decades of materials. Whatever we have now came from transistors, from silicon, from plastics, from oil.
We now live in an economy that is the economy of information, of interconnectedness. In the same way as we had material science, we now have technology. We have engineering that is focusing on materials, but now we need to build up the new science that is responding to the challenges of what we have today: data challenges, making sense of data challenges. These are not pertaining only to software companies. They pertain to medicine. One of the biggest issues in medicine is how to manage this huge amount of data, how you extrapolate that. How do you know what works and what doesn't work? Often what works and what doesn't work is already there. It's already in the insurance company's files; it's already in the hospital files. You just have to put it together.
For example, I work very closely with the Hungarian Ministry of Health, where we have access to the data of where patients go: how they go from one doctor to another, where doctors send the patient. What is the flow, what is the efficiency of the hospitals? Where is the success story, where are we successful? At many levels, we've got the data that companies and technologies have produced. What do we do with that? In the same way as we built a scientific industry behind materials, we need to build an equally powerful and equally potent one in the data area, in the kind of soft technological areas that we have today.
There is a tendency always when something really new comes along to consider it as a fad, as not real, as passing. To a certain degree, when materials, like silicon, and quantum mechanics came along, there were lots of detractors. When the Genome Project started, lots of people said, "why would you sequence all this junk, down the line?" Because at the end, they thought much of the sequence is really not coding and it's irrelevant. I think what we may be seeing now is that this research hasn't penetrated yet. It hasn't been institutionalized yet. It doesn't have its institutions yet. There is a tendency for the guy who is being replaced to say this is a fad, this is not relevant.
I personally think it's fundamentally very transformative. Anybody who thinks that this is not going to be with us 20 years from now is kidding themselves, in the same way they may have thought 50 years ago that materials are not relevant, it's a fad and it's going to pass over. No. I think it's going to be deeply ingrained into the society. Behind it, there is very serious, very deep science. The difference is that it's not transforming according to the codes of the old science.
I can see it in my own life, in my own research. I'm a physicist. The physics department typically has a few theorists and a much larger number of experimentalists, and it's a very carefully thought out balance of how many theorists you need for experimentalists to keep up the innovation. But I do my own experiments and my own theory now. We collect our own data, we copy the data from wherever we can, we scrape the data from wherever we can, we analyze it, we write the theory, we write the computer code, we write the mathematical laws, we test it on the real data. So it's a very different kind of science.
Where would you put me? Am I a theorist or an experimentalist? I'm neither and I'm both. And that, of course, is creating a stress. Am I a physicist? Well, many of my colleagues in the physics department think I'm not a physicist. I deeply think I'm a physicist because I really think that fundamentally physics has to change. If it wants to survive, and doesn't want to become like mathematics (that got really marginalized, and became art(?) (Inaudible) in many respects), physics will have to tackle the problems that we face, for which it has the tools. Physics happens to have the tools to talk about complex systems or random systems and large data, except it's not part of the canon. If we want physics to continue to be relevant in society, we will have to internalize it. We have to make it part of the canon.
The paper that we wrote in 2001 with my student, which was her thesis, this year became the most cited paper in the history of the most prestigious journal in physics. It displaced Chandrasekhar's paper from 1942, which has been for the last 30 years the most cited paper. This is a ten-year-old paper that reached, and will probably stay at, the top. So we have the top physics journal, and now the most cited paper in the history of that journal is a paper about networks, which many physicists don't even consider physics.
Is this not physics? Well, Chandrasekhar had to wait about 40 years for his Nobel Prize, because they considered astronomy as not physics. That's astronomy. The point I'm trying to make here is that science has to change. Physics has to change, mathematics has to change, and biology has to change. Biology is rapidly adopting genomics, and in the same way we physicists have to realize, and take pride in the fact, that we helped to make a revolution in science. There are lots of physicists, let's say, in the vanguard of the people who think about complex systems and think about data and think about networks. We need to be proud of that, because really that helped another revolution to come along. A revolution that doesn't fit under the normal boundaries of what a physics department is supposed to do.
I personally think that physics is extremely relevant. But it is done despite the establishment. Many of the driving forces that are exciting the students, the things the students are coming for, are really not part of a traditional account of physics. If you go to a typical physics department, students come because they want to work in biophysics. Not the old type of biophysics where you electrocute the muscle of a frog and then it jumps. But they want to think about the cell, they want to think about genomics. They want to use the tools of physics to really rethink how complex systems work and what the cell, as a complex system, looks like. They want to do networks. They want to do nanotechnology. These are all things that were not part of the traditional physics curriculum. They want to do astrophysics, which only recently we've internalized within the physics community. So it's a shift that is driven by the young people, it's a shift that is driven by the society.
One of the shocking things that I discovered came through my son. From a very young age, I kept asking him, "do you want to be an astronomer? Do you want to go to the moon?" He always said, "no, I don't want to go." But he would like to go to work for Google, he would like to go to work for Facebook. We have a generation that is growing up for whom the traditional goals of going to the moon, of flying to faraway stars, don't exist anymore. That's not what excites them. What excites them is data, networks, social systems and all of these things that were really not part of the thinking. We don't have a goal. We don't have a computational social science department, at any university.
Much of the research that goes into computational social science, into network science, into big data, is done by piggybacking on existing disciplines. Computer scientists, physicists, mathematicians, sociologists, they do it despite their own community. It's funded despite their own community. I think it will change. I think that the establishment will emerge, and departments and programs will be formed. But right now, much of my support is piggybacking on traditional disciplines. I cannot get a network science grant. I have to piggyback on lots of other things that we do, and sell it as physics, sell it as biology, sell it as many other things so that I can fit in the traditional funding system, in the traditional department system, in the university system.
Where would network theory be able to help genomics? Well, in many ways, genomics is coming off a fantastic decade. When the human genome was mapped out, getting the data really created a burst of scientific discoveries. What it didn't do is deliver on its original promise: drugs. If you look at how many drugs were approved in 2001, the year the Genome Project came out, the number the FDA approved was around 110 for the year. In the last four years, that number has gone from 110 drugs to 20 per year. The expectation of the Genome Project was that the number of new drugs would explode. Instead, it went down to a fifth or a sixth of the original number. So what happened?
A number of things have happened. Lots of resources that were going towards the traditional drug discovery process have gone towards genomics. We have to develop those things. We became much better at diagnostics and at discovering early on that a drug doesn't work, so we don't throw it on the market, and that put on the brakes. All the traditional tools have really had their run, and they're not effective anymore in how you would discover drugs. But most important, the value that we put into genomics has really not yet resulted in drugs. We're still far away. It just showed us how much more complicated the system is rather than giving us an answer.
I personally think that the reason why we're unable to turn this into knowledge is that we ignore the network part. The typical car has about 5,000 components. Suppose I laid them out in front of you (who I assume have never actually built a car before) and asked, could you assemble them into a car? You would be hopeless. Most of the pieces you wouldn't even know what to do with. That's where we are. We created the parts list thanks to the Genome Project, but we have no way of assembling the parts together. We don't know how they work together.
We need a massive project on the scale of the Genome Project to map out the network. It's a finite problem. At the end, either a certain molecule is going to interact with another, or not. There are technologies that haven't been properly scaled up and haven't been done in large enough scale, that allow us to map it out. But it has to be real, and there has to be an understanding that that's the next step or we're going to stay where we are right now, which is knowing more and more about the pieces and understanding less and less about the connections.
What happened with the Genome Project is that it became the ultimate fulfillment of reductionism. We talk about GWAS, we talk about mutations, we talk about little errors in our genome, and we talk about, at the same time, how little predictive power they have. Many of the genes that come out now as disease genes, and the mutations, they'll say carry a one percent or a two percent increase over the baseline in the chance of getting the disease. Which means you had a one in 100,000 chance to get the disease, and now you're going to have one percent more. For all practical purposes, it's completely irrelevant. From a fundamental scientific perspective, it's very exciting, but down the line, when it comes to our health, it's completely irrelevant.
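The arithmetic behind that last point can be spelled out in a few lines. This is only a sketch using the round numbers mentioned in the talk, and it assumes that "one percent more" means a one percent relative increase over the baseline risk; the variable names are made up for illustration.

```python
# Numbers taken from the example in the talk; the relative-increase
# interpretation of "one percent more" is an assumption.
baseline_risk = 1 / 100_000      # chance of getting the disease without the mutation
relative_increase = 0.01         # mutation raises that chance by one percent

risk_with_mutation = baseline_risk * (1 + relative_increase)
absolute_increase = risk_with_mutation - baseline_risk

# Express the absolute change as extra cases in a large population of carriers.
extra_cases_per_10_million = absolute_increase * 10_000_000

print(f"risk without mutation: {baseline_risk:.7f}")
print(f"risk with mutation:    {risk_with_mutation:.7f}")
print(f"extra cases per 10 million carriers: {extra_cases_per_10_million:.2f}")
```

Under these assumptions the risk moves from 1 in 100,000 to 1.01 in 100,000, which is roughly one additional case per ten million carriers.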
I don't think there is a way of turning the promise of genomics into reality, and bypassing the networks at the same time. It's not going to happen. We're going to have to have the blueprint. I think that many people in the genomics community are realizing that, and they increasingly start to understand that if you want to interpret the genomic data that is coming out (and we have now really an avalanche of that; we know a lot at the genomic level, from deep sequencing and so on), we're going to have to invest in the next one.
We have an institutional problem: which part of NIH will actually fund the network part? We have a Genomics Institute, but we don't have a Networks Institute within NIH. We don't have a home for that. It's not lung, it's not blood, it's not cancer, and it's not neurological diseases. Each of those has its home within NIH. It's all of the above. It's relevant for everybody. It's not genomics, either, because it's beyond genomics. Until there is a strong enough lobby to realize that, we're not going to get it. Every day that we're not getting that will delay the promise of the Genome Project turning into a reality.
I don't think that people have internalized that this is really the major problem. Let's not forget that the Genome Project started in the 1980s, and really didn't ratchet up until the 1990s; it took about two decades to get to the factory level. Only at the tail end did people start to talk about genomics, when the machines were running and the sequencing was happening. In the profession, people understand that. The limitation is that knowing more and more about the pieces is not going to be enough. But it's a train that is running very well, and it's beautiful what they're doing in genomics, there's no question about that. But those things need to be understood. It's not that the rationale of genomics is nonsense, or that those measurements shouldn't be made, or that those technologies shouldn't be developed. Those are very important. What I'm saying is it's not enough, and that "not enough" hasn't reached the awareness of the community.
If Obama would listen, I would say disclose financial data. I would not talk about genomics. I would say actually for several years before the financial crisis took place, several of my colleagues were going around and said you guys have no idea how much dependency there is between the banks and it's all secret. If something would happen here, everything will go down like a domino. I was listening to them, but not taking it too seriously, not knowing much about it. And then, of course, we saw it unfolding. All these secret bills, all these secret dependencies, all these loans and things that we don't know anything about. If we ever want to turn the financial system into a predictable, monitorable system, there has to be massive disclosure. And that is not happening, even now after the financial crisis. We're poised for another one and another one and another one, because there's nobody who can come along and say this is a problem because we don't know what the problem is, because we don't have the data. We can't model it, we can't understand it, we can't describe it.
If we say, "The Genome Project was great, but we're not going to start with the network within the genes, because it will hurt somebody's interest. It's secret, those interactions," then we're never going to have a cancer drug, if that's what we're going to be around to do. That's where we are in the financial system.
If we really want to talk about the genomics and biology, then I think we need to think about the fact that we have to aggressively start looking at interactions between the cells. We need to step back and say why is it that we have only 20 drugs and not 110 or 500 on the market, as we thought we'd have as a consequence of the Genome Project? What is it we can really do to accelerate this process? Where are the scientific limitations and where are the technological limitations? If you start asking the questions those ways, you will sooner or later realize that the next step is we need to do maps.
Whether it's economic problems, whether it's a cellular problem, the problem is the same; we don't have the maps. We don't know who is connected to each other. We can't make predictions, we can't use the knowledge that we have about the components to really predict things and turn them into drugs or financial instruments.
The first time I became familiar with the term 'computational social science,' was through David Lazer, who was at that time at the Kennedy School of Government. He still has a position there, but he's now at Northeastern as well. He's a political scientist who had been thinking about networks and data for a long time (what we call today big data). He was the one who started shepherding many of us in the community to start using the term 'computational social science' and to start taking that as a new field being born. There were lots of people who were playing with data before. We'd been analyzing mobile phone data for years, both on the social network aspects, as well as the timing of the event aspect, as well as mobility aspects.
Other people like Nicholas Christakis were looking at the social networks and diseases, how they emerge. David Lazer himself was looking at political systems and the connectedness of political systems, what he called network governance, how the network decision-making is really affecting our work on that. Ricardo Hausmann was looking at import/export data together with Cesar Hidalgo, and perceiving it as a network and interconnected system. There were lots of people: Dirk Brockmann looked at the dollar bills, how they fly between the countries, and tried to infer information about human mobility.
It was David Lazer who said we're part of the same community, we're thinking apparently about unrelated problems, but it's all computational social science. Computational social science is three words, and there are different emphases, depending on who comes at it. Obviously, as a physicist, I look at the computational and the data aspect of the problem. But David, for example, looks at the political science consequences of the problem. There are sociologists like Brian Uzzi, for example, who look at the social aspect of the problem and come from the social sciences, and think through that.
Depending on whom you talk to, they all have a different take, but the goals are very similar. There is lots of data collected about humans, pertaining to different aspects of human behavior, whether it's a political aspect, whether it's a mobility aspect, whether it's a social communication aspect. But it's all about how you describe that. For that, you're not going to have one discipline look at all of that, and you're not going to have one set of tools that will be successful. That's the value of computational social science; we try to find room for many different perspectives.
I believe that for a field to be successful, you have to have an ecosystem there, an ecosystem of very different perspectives because if you have a one-line type of thinking, any problem in the field could destroy it. But if you have lots of different people who look at different aspects of the problem, with different expertise and have a different set of goals that they're trying to get out from the project, then the field is much healthier, much more innovative. It progresses in a better way.
That's where we are with the computational social science. We don't have any department. We don't have a journal. We have lots of people who think alone. Even the name varies. Much of the research done under the buzzword 'big data' that emerged in the last few months is down the line computational social science. Many of the things that Google does to try to find out what ads you want to look at is computational social science. Increasingly, transportation engineers, when they try to think about the modern cities, and they use mobile phone data, they really do computational social science. This diversity is the key. There is no textbook; there is no canon of knowledge. What are emerging right now are lots of pieces of knowledge that are really novel. They're somewhat universal; they pertain to many aspects of human behavior. They are clumped even, that's very important. Many aspects are very predictive as well.
It's amazing, the phone, the Internet, whatever device you use, is just a reflection of your needs, your communication needs and your behavior. Just because we have Internet, just because I have a mobile phone, it doesn't fundamentally change my patterns. I still go to the cafeterias, I still have to sleep, I still have to wake up, I still have to go to work, and so on. The basic patterns of human behavior haven't fundamentally changed thanks to these digital devices. Yes, we don't have to be sitting in the office all the time; we can do some of the work in different places as well. But the fundamental patterns of what we do are the same 100 years ago and today. It's still a 24-hour cycle. The basic needs still need to be satisfied, of food and rest. You still need to get from your home to your workplace.
What happens right now is that both temporally and spatially, our behavior is very constrained. You can't decide to go to the bank at two o'clock in the night because there won't be bank tellers there. You're not even calling your friend at two o'clock in the night because you're considerate. You want to be nice, and you'll wait until the morning. Lots of things haven't really changed. What happened is that we got lots of devices that track what we do. Before we couldn't collect this data; now we leave a trail of information around.
The scientific challenge is that what I say is not being recorded. So how can I extract from the data that is being recorded what I want to say? What do I do? How do I characterize the human from all this data? By the way, what I say is really changing as well. In a way, I think what I say and the information about it is really becoming public as well. Much of the technologies that are being developed now, if you look at the sequence of technologies that we use, are going towards more and more honesty. I think if you want to project into the future, we should assume that there will be nothing private in the future.
Think about it, what happened in the online world? There was MySpace, where you were completely anonymous, maybe you disclose to your friends who you are, but you live in an anonymous world. That was really washed away by Facebook. What did Facebook do that MySpace didn't do? It made it non-anonymous. You had to accept your name. You were not hiding behind some name. It forced you to say I'm Laszlo Barabasi, I'm John Brockman, whoever you are, and you had to acknowledge your identity. But you could keep your thoughts and your information private, sharing it only with your friends. Then came Twitter, and said the problem with Facebook is that you share it only with your friends. So why don't you share it with the world? Now we got Twitter, where all your thoughts that you're willing to share are broadcast all over the world. Anybody can listen in, who wants to listen.
If you think about it, the more disclosure and the less anonymity there is, the more successful it is, and that's where the trend is. So yes, right now we don't know what you say on the phone. I don't know what you send in your email. All I know is that you talk on the phone, all I know is that you check your Facebook page, all I know is where you are. But imagine a future where you will choose that none of that will be private. If you look at the sequence of events in the last ten years, that is where we're headed.
Every data collection is skewed. It's skewed to whom you can actually ask to fill out the questionnaire, to who is next to you working, who you can reach? It's skewed to who is using the technology, who has a cell phone? But it's also becoming universal. Look at Africa, look at Eastern Europe. They don't put down phone lines anymore. Everybody has a mobile phone; it's cheaper so it's more accessible. Everybody's using Internet. Of course, there is still a population who hasn't reached it, but it's going to reach them. It's relatively affordable. It will get there. Until then, we need to be aware of that. We actually have to deal with that in a funny way. For example, we look at the mobile phone data on many continents. If you look in Africa, the information you get about one particular mobile phone subscriber is not reliable at all, because if you are in a big city, you are like everywhere else in Western Europe, but if you're in a small village and you have a mobile phone, you are like the phone booth, essentially, for the village. I'm not getting any more individual information; I'm getting the whole village's information. So yes, you have to be aware of that, you have to be familiar with that. You need to understand the benefit and the technologies.
Putting all this aside, we had been looking at mobile data now from Africa, from Asia, from Europe, from the United States. What is the most remarkable thing is that there are hardly any differences when it comes to the basic patterns of people, how they use the phone, and how they behave, and how they move around. I thought there would be huge differences.
Several people in my lab actually started a research project; they started a comparative study between the different continents. They thought we were going to find differences. They ended up writing a paper, because there weren't any differences to note. It was all the same across the world when it came to mobility patterns: how people talk, how often they talk, how the network applies. They were the same around the world, whether you had money or not, whether you're rich or poor, whether you are in Africa or the United States, or in a big city or a small city. There are a huge number of similarities and it's very hard to tease apart the differences. It's becoming the challenge to find the differences between the people. You just see the similarities.
Not everybody has access to technology and that will always be the case. There will be new technology that not everybody will have access to. We're always going to live in the society where there is a leading pack and there are people behind it, and that's been always the case, and that's not often a question of money, it's a question of attitude, it's a question of need or a question of opportunity. But down the line, we need to be able to learn to handle that.
Data is the gold mine for science these days. With that, the attitude about how we handle data is changing as well. We live in this unstable situation where data access has not been properly worked out, because there are legal limitations on what you can get. Most companies don't care about sharing what they have, or about using their data for scientific purposes. That will have to end in one way or the other. It's not clear to me how that will happen. It's not clear to me that the U.S. government will ever force Google or Facebook to actually share data with scientists, even though they've kind of done that in the case of Medicare and other places where there has been similarly sensitive data. I think we, in the scientific community, have to resolve that, to find a way to share. This is something that I struggle with from day to day. We have very rich mobile data that we technically cannot send anywhere. We can't send it to you because if you were in the data set, you wouldn't want your data sent anywhere.
My interest would be to share it very widely. My interest would be that I want you and everybody to work on that data set, because I want all the help I can get to understand that data better, because it will help me in my research. Legally we're not allowed to, so we figure out all these ways of sharing: we let people into the lab and we let them work in our lab under the same conditions as we do. But down the line, we have to figure out a way to change that, because in the end this data doesn't only have a huge amount of financial value, which is why Google and Facebook protect their data. It also has a huge amount of societal value.
Do you want to stop the flu? Do you want to stop different transmitted diseases? Do you want to design better cities? Do you want to stop traffic jams? The data to do so is there in private hands, and we need to identify some social consensus by which the data can be shared with the different stakeholders who can take advantage of that.