Computational Social Science
How can we avoid another cascading failure of the stock market? Why has the United States become so politically polarized? Where is Ebola likely to strike next? Why do some terrorist organizations use suicide bombers, while others do not?
Answers may well be found through the new field of computational social science.
Growing use of the Internet and social media in the past decade has led to an explosion in the amount of social and behavioral data available to researchers. This in turn has created huge opportunities for social scientists to study human behavior and social interaction in unprecedented detail. Leveraging these opportunities requires collaborative, interdisciplinary efforts involving computer and information scientists, physicists, and mathematicians who know how to build the "telescope" and economists, political scientists, and sociologists who know where to aim it.
Computational social science exists at the intersection of these varied disciplines, offering a wide range of tools and research methodologies that were previously not available to social and behavioral scientists. "The kind of analysis we're doing in computational social science used to require punch cards and giant machines which couldn't do a fraction of what we can do now," notes Peter Enns, associate professor of government.
Robert Braun, a doctoral student in the field of government, has harnessed these new technologies to investigate suicide bombers and resistance to genocide, among other topics.
"For the past 50 years the social sciences have relied heavily on surveys for data about human behavior, surveys typically done with at most a few thousand participants, none of whom know each other," says Michael Macy, Goldwin Smith Professor of Arts and Sciences in Sociology and Information Science, Director of the Social Dynamics Laboratory. "These independent observations are not very useful for studying social influences on behavior. Now we have global data from social media like Twitter and other social media platforms, that allow us for the first time to study social influences on human behavior and interpersonal interaction at population scale. And that's pretty exciting."
Cornell at forefront of computational social science
Because of its interdisciplinary character, Cornell University is uniquely suited to grasp the opportunities afforded by these new challenges in the social sciences, and it has been at the forefront of developing the new field of computational social science.
"The interdisciplinary cooperation necessary to successfully conduct research in this field is a hallmark of Cornell and is a major reason for Cornell's leadership in the field," said Gretchen Ritter, the Harold Tanner Dean of Arts & Sciences.
Cornell's collaborative nature gives information and computer scientists and applied mathematicians a theoretical landscape in which to discover interesting and important questions to which they can apply their tools.
While a great deal of collaboration is ongoing with economics and sociology, computer scientists are looking forward to more opportunities to interact with researchers in the fields of anthropology, psychology, law and business, says Jon Kleinberg '93, chair and professor of information science and professor of computer science.
David Mimno, assistant professor of information science, embodies the interdisciplinary nature of the field. He works at the intersection of computing and the humanities, using technology to advance scholarship. He teaches a course called Text Mining for History and Literature, and he's exploring ways to take advantage of large non-numerical data sets like haiku, novels, and runaway slave ads using algorithmic approaches to data.
Mimno offers a novel definition of digital humanities scholars: "We are basically ethnographers," he says, "studying humanists and computer scientists and seeing how they can relate and use each other's tools."
David Mimno, assistant professor of information science, focuses on machine learning, text mining and digital humanities.
Pioneers across campus
Cornell's Information Science (IS) Department, one of the first of its kind, celebrated its tenth anniversary last year. IS was founded as an interdisciplinary effort between computer science and social science and unites faculty from the College of Arts & Sciences, the Faculty of Computing and Information Science and the College of Agriculture and Life Sciences to examine information systems in their social, cultural, economic, historical, legal, and political contexts. This department forms the heart of collaborative efforts between social and computational scientists at Cornell.
"An exciting thing that happened at Cornell is that we managed to find each other across a range of disciplines and start working together, which unlocked the latent potential energy in this area," says Kleinberg.
Kleinberg was one of Cornell's early computational science pioneers. While a student at Cornell he studied with leaders in the field, Daniel Huttenlocher, founding Dean and Vice Provost of Cornell Tech, and Éva Tardos, Jacob Gould Schurman Professor of Computer Science; he now works with them on collaborative projects. Other early pioneers at Cornell include Duncan Watts, Ph.D. '97, and Steven Strogatz, Jacob Gould Schurman Professor of Applied Mathematics, who researched networks together; David Easley, Henry Scarborough Professor of Social Science and professor of information science, and Lawrence Blume, Goldwin Smith Professor of Economics, who used computational ideas in their economics research; Macy, an early adopter of computational techniques in social sciences; and Geri Gay, Chair and Kenneth J. Bissett Professor of Communication, who uses insights from social science research as the basis for software interface design.
Many of the early participants in the Information Science Department were involved in a three-year project sponsored by Cornell's Institute for Social Sciences (ISS) in 2005, "Getting Connected: Social Science in the Age of Networks." The project sought to tap the expertise, tools and skills of network analysts across the university, from computer scientists archiving the Web to social psychologists studying adolescent behavior. The project's aim, to build a culture for collaborative research across disciplines and colleges, succeeded beyond what anyone might have expected, says Kleinberg, and has had a lasting impact on social science at Cornell.
"ISS gave us this space to spend time thinking how to develop and define an academic discipline at the increasingly complex boundary where human audiences intersect with computing technology," says Kleinberg. "The ISS project let us stretch and think about things you couldn't create if not for this special time."
Many of the professors involved in that network project have continued as leaders in the field of computational social sciences. Among them is Macy, who was the project leader for the ISS network project and is currently director of the Social Dynamics Laboratory. His work uses big data such as the telephone logs of nearly every call in the UK and data from hundreds of millions of Twitter users. "We are really only at the beginning of exploring what these big data can do for researchers," says Macy.
Teaching Computational Social Sciences
The ISS project inspired Kleinberg and Easley to teach undergraduates the ideas they'd been exploring. They designed a class that would address the science behind networks, and how the old concept of networks is applied in the modern world. Kleinberg and Easley solicited suggestions from Macy and David Strang, professor of sociology, about how a sociology perspective could also be included.
When the course was first offered in spring 2007, 200 students from six of the seven colleges registered instead of the expected 50 or so. The class, now at 700, has a waiting list. Kleinberg and Easley turned the material they developed for the course into a now widely used textbook, "Networks, Crowds, and Markets: Reasoning About a Highly Connected World" (Cambridge University Press, 2010), as well as a MOOC that they co-teach with Tardos. The book can be downloaded for free.
The strong student interest in the networking class is not surprising, given the powerful impact that computational social science has had and will have on real-world problems. For example, researchers at Cornell's Center for the Interface of Networks, Computation and Economics are investigating networks and cascading failures, research that applies to financial contagion and political polarization, as well as to epidemic disease.
"Because of the Internet, we have data that we never had before," says Kleinberg. "It's the new telescope/microscope -- a revolution in measurement. And on the flip side, deep ideas from social sciences are informing the way we're designing websites, on-line platforms and interfaces, like Facebook, YouTube, and Twitter. Social systems have to address questions like how to build trust? How to ensure that the Internet is robust against people operating in their own self-interest? These questions demonstrate the interface between computational ideas and economic ideas, the intersection of theory of computing and theory of markets."
Enns says he has noted a growing interest among undergraduate and graduate students in learning the skills to work with big data. "They recognize that the job market values and rewards those with these skills," he says.
The Roper Center
The Roper Center, the world's largest public opinion archive, recently moved to Cornell, another example of the university's ongoing commitment to big data and computational social sciences. The center has 22,000 public opinion polls and is constantly getting new data. "Bringing the Roper Center to Cornell opens up opportunities to analyze this big data in new ways," says Enns, the first executive director of the Roper Center at Cornell.
Research Collaborations Continue
In September 2015, Cornell hosted a conference in Clark Hall showcasing cutting-edge research in the field of computational social science, featuring alumni and other noted scholars. The conference coincided with the opening of a joint search for a position in computational social science by the College of Arts & Sciences and the Faculty of Computing and Information Science.
Greg Morrisett, dean of the Faculty of Computing and Information Science, said computational social science has far-reaching applications in the fields of security, privacy, sociology and government.
Watts, now a principal researcher and founding member of Microsoft Research Lab in Manhattan and a Cornell A.D. White Professor-at-Large, noted that social phenomena, which arise when individuals interact to produce collective behavior, are hard to study empirically.
"It's hard to do science when you can't measure what you care about and you can't do experiments," said Watts. "The web is lifting these historic barriers. There's a dramatic increase in scale, scope and granularity of data. We can examine data at the individual level with emails, e-commerce, social media, etc." Web platforms, he said, are another revolution, increasing the speed and scale of experiments.
"This conference is really a celebration of a transformative moment in social science, not just because of technology but by the vision of what that technology can do," said Watts. "CSS research is motivated by old questions that bring new capabilities to bear."
Sandy Pentland, professor at the Massachusetts Institute of Technology and recently named by Forbes as one of the seven most important computer scientists in the world, told conference attendees that enormous data sets and computerized monitoring allow a level of accuracy unattainable by traditional survey methods.
Pentland also pointed out that social data can be used to track U.N. Millennium Development Goals, and that phenomena such as fixed migration, genocide and the propagation of infectious diseases can all be measured. "Change happens when things are measured – you can mobilize resources more quickly and rigorously."
Other speakers included alumni Sendhil Mullainathan '93, professor of economics at Harvard University, and Lars Backstrom '04, MS '08, Ph.D. '09, head of News Feed for Facebook, and industry representatives Lada Adamic, head of Facebook's Product Science Group, and David Pennock, Microsoft Research's principal researcher and assistant managing director.
As conference participants highlighted, the tools offered by computational social science are making important contributions to critical social, economic, and political issues today. Cornell’s researchers actively engage with real-world problems, reaching across the disciplines in multi-college collaborations.
Current Research Across the Social Sciences
Paying It Forward
Sociologist Michael Macy studies dynamics of human behavior through on-line data.
Digging into Data
Linguist Mats Rooth harvests web data to study spoken language.
Following the Money
Political scientist Peter Enns uses big data to find hidden connections in Congress.
Connecting with Networks
Economist David Easley collaborates across campus to uncover network complexities.
Paying It Forward
Michael Macy, Goldwin Smith Professor of Arts and Sciences in the Departments of Sociology and Information Science, is director of the Social Dynamics Laboratory (SDL), which studies the interplay between network topology (the arrangement of social ties) and the dynamics of social interaction, using computational models, data from on-line networks and experiments with human participants conducted in online "virtual labs."
The SDL includes a large group of students from many different fields and colleges, including sociology, economics, applied math, information science, computer science and city and regional planning. Their projects are interdisciplinary and team-based, sharing in common the use of data from on-line networks and the use of computer simulation.
"The graduate students in the lab are extremely sophisticated technically and highly skilled, and it is hard to tell which students are from sociology and which are from information science," says Macy.
SDL researchers recently published a paper, "Why Liberals Drink Lattes," that examines how our political views and lifestyle preferences are affected by the people with whom we interact, people with whom we are thrown together and who tend to be like us. Thus, says Macy, it is critical to study human behavior and social interaction from a relational perspective, which takes into account social influence.
"You ignore social influences and contagion at your peril," says Macy, who notes that in analyzing 40 years of classical general social surveys they found that lifestyle preferences almost all correlated with one or more demographic variables. "But what if these correlations are largely accidents of self-reinforcing dynamics of self-selection and social influence?" he asks.
Other SDL projects address the dynamics of "pay it forward" in which people help a stranger, and whether it is possible to predict whether something on Twitter will go viral.
Another project, timely for this election season, is one that examines followers of the U.S. Congress to study political and cultural polarization. "Five million people follow members of Congress and they’re co-following other things," explains Macy. "So if you find correlations between lifestyle preferences (such as American football vs. soccer, or bands people follow), we can look to see if political preferences and other cultural preferences also divide politically."
Digging into Data
Mats Rooth, professor of linguistics and information science and director of the Computational Linguistics Lab, does research in computational linguistics and natural language semantics. In 2010, he was one of the winners of an international competition, Digging into Data, that challenged scholars to devise innovative humanities and social science research projects using large-scale data analysis. His project, Harvesting Speech Datasets for Linguistic Research on the Web, looked at distinctions of prosody (rhythm, stress and intonation) in spoken language. He used software to search for word patterns in text transcriptions of audio and video files.
Using the Internet to harvest hundreds or thousands of examples of spontaneous, rather than lab-created, use of word patterns enables researchers to evaluate theories about the form and meaning of prosody on an unprecedented scale; such research has had a transformative effect on the understanding of prosody, says Rooth.
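To make the idea of searching transcriptions for word patterns concrete, here is a minimal Python sketch. It is not the project's actual software: the transcript segments, the timestamps and the "than + pronoun + auxiliary" pattern are all hypothetical, chosen only to show how a text search can point a researcher to the moments in a recording worth listening to.

```python
import re

# Hypothetical transcript segments: (start time in seconds, transcribed text).
SEGMENTS = [
    (12.0, "I like it better than he does"),
    (47.5, "she runs faster than I did last year"),
    (63.2, "we moved here in the spring"),
]

# Illustrative word pattern (an assumption, not taken from the project):
# "than" followed by a pronoun and an auxiliary verb.
PATTERN = re.compile(
    r"\bthan (i|you|he|she|we|they) (do|does|did)\b", re.IGNORECASE
)


def find_pattern(segments, pattern):
    """Return (start_time, matched_text) for each segment containing the pattern."""
    hits = []
    for start, text in segments:
        match = pattern.search(text)
        if match:
            hits.append((start, match.group(0)))
    return hits


hits = find_pattern(SEGMENTS, PATTERN)
for start, phrase in hits:
    print(f"{start:7.1f}s  {phrase}")
```

Each hit pairs a matched phrase with the time at which it occurs, so the corresponding stretch of audio could then be retrieved and examined for rhythm, stress and intonation.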
Following the Money
Peter Enns, associate professor of government, focuses on public opinion, representation and quantitative research methods. "I identify as a political scientist who works with data, which increasingly means bigger and bigger data sets," he says. "Computer skills broaden our understanding and bring in connections that wouldn’t have been available before," he adds.
He’s working on a joint project with scholars at the University of Tennessee and the University of South Carolina to analyze the Congressional Record. More recent Congressional Record data is already digitized, but earlier records are in the form of PDFs, comprising almost a terabyte of data.
"In order to analyze what members of Congress are saying, we have to write code that extracts the text out of these PDFs to generate a new data set," explains Enns. With help from graduate students in the field of computer science, the researchers will be able to link this new data with existing data, such as the demographic characteristics of members of Congress and their campaign donations. Then, researchers can study topics such as connections between campaign donations and what politicians choose to speak about.
Eventually, the new data set will be put in the public domain for other researchers to access.
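The linking step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' code: it assumes the text has already been extracted from the PDFs, and the member identifiers, speeches, donation figures and the `count_mentions` and `link` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of already-extracted text: (member_id, speech text).
speeches = [
    ("M001", "We must reform banking oversight."),
    ("M002", "Farm subsidies matter to my district."),
    ("M001", "Banking rules protect consumers."),
]

# Hypothetical existing data set keyed by the same member_id.
members = {
    "M001": {"name": "Member A", "finance_donations": 50000},
    "M002": {"name": "Member B", "finance_donations": 2000},
}


def count_mentions(speeches, keyword):
    """Count case-insensitive occurrences of keyword per member."""
    counts = defaultdict(int)
    for member_id, text in speeches:
        counts[member_id] += text.lower().count(keyword.lower())
    return counts


def link(speeches, members, keyword):
    """Join per-member keyword counts onto the existing member records."""
    mentions = count_mentions(speeches, keyword)
    return [
        {"member_id": member_id, **info, "mentions": mentions.get(member_id, 0)}
        for member_id, info in members.items()
    ]


linked = link(speeches, members, "banking")
for row in linked:
    print(row)
```

Once speech content and donation records sit in the same rows, questions about how the two relate become ordinary data analysis. Real records would also need careful matching of member names and dates, which this sketch leaves out.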
Connecting with Networks
David Easley, Henry Scarborough Professor of Social Science in the Department of Economics and professor of information science, collaborates with faculty across the university on a wide range of topics.
He and Lawrence Blume, Goldwin Smith Professor of Economics and professor of information science, work together on learning and wealth dynamics. In finance, his work with Maureen O'Hara, Robert W. Purcell Professor of Management and professor of finance in Johnson, focuses on market microstructure and asset pricing. His work with colleagues in the computer science department, such as Jon Kleinberg, chair of the Department of Information Science and a professor in both computer science and information science, focuses on trading networks and network formation.
Easley and Arpita Ghosh, associate professor of information science, recently investigated the optimal design of rewards for online labor markets and online crowdsourcing platforms. One of their findings is that contests with a low probability of winning a large prize dominate other payment mechanisms with the same expected cost when individuals have common behavioral biases. Their research provides insights into why such contests are so widely used and into how platforms can best design rewards to incentivize user-generated content.
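One way to see how a long-shot contest could beat a fixed payment of equal expected cost is a small numerical sketch. The Python below is not Easley and Ghosh's model. It assumes one particular behavioral bias, the overweighting of small probabilities, represented by the Tversky-Kahneman probability weighting function with an illustrative parameter of 0.61, and it assumes participants value money linearly. The prize size and win probability are hypothetical.

```python
import math


def weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting function.

    With gamma < 1, small probabilities are overweighted. The value 0.61
    is an illustrative assumption.
    """
    numerator = p ** gamma
    denominator = (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)
    return numerator / denominator


# Hypothetical contest: a 1% chance of winning a prize of 1000.
win_probability = 0.01
prize = 1000.0

# A fixed payment with the same expected cost to the platform.
fixed_value = win_probability * prize

# Perceived value of the contest to a participant who overweights the
# small chance of winning (linear valuation of money assumed).
contest_value = weight(win_probability) * prize

print(f"fixed payment:           {fixed_value:.2f}")
print(f"perceived contest value: {contest_value:.2f}")
```

Under these assumptions the contest is perceived as worth more than the fixed payment even though both cost the platform the same on average, which is the flavor of the finding described above. A different bias or a different parameter would change the size of the gap.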