Big data will not treat different social groups favorably.

Advocates of big data would have people believe that behind the lines of code and vast databases lie objective, universal insights into human behavior: consumer spending patterns, criminal or terrorist activity, health habits, employee productivity. But many big data evangelists are unwilling to face up to its shortcomings. Numbers cannot speak for themselves, and data sets, no matter their size, remain the product of human design.

Big data tools, such as the Apache Hadoop software framework, cannot free people from misunderstanding, bias, and false stereotypes. These factors matter especially when big data tries to reflect the social world people live in, yet people often naively assume that such results are more objective than human opinion. Bias and blind spots exist in big data just as they exist in individual feelings and experiences. Yet a questionable belief persists that bigger is always better, and that correlation is the same as causation.

For example, social media is a common source for big data analysis, and there is undoubtedly a great deal of information to mine there. People have been told that Twitter data show users are happier the farther they are from home, and most depressed on Thursday nights. But there are many reasons to question what these data actually mean.

First, the Pew Research Center reports that only 16% of online adults in the United States use Twitter, so Twitter users are by no means a representative sample: compared with the overall population, they skew toward young and middle-aged people and urban residents.
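The sampling problem can be sketched with hypothetical numbers: if one group is over-represented on a platform and also differs on the trait being measured, the platform average drifts away from the population average. The group sizes and rates below are invented for illustration.

```python
import random

random.seed(1)

# Hypothetical numbers: 30% of the population is "young", and the measured
# trait (approval of some policy) differs sharply by group.
approval = {"young": 0.8, "older": 0.4}
population = ["young"] * 3000 + ["older"] * 7000

def mean_approval(groups):
    return sum(approval[g] for g in groups) / len(groups)

# On the platform, young people are far more likely to show up in the data.
inclusion = {"young": 0.9, "older": 0.1}
sample = [g for g in population if random.random() < inclusion[g]]

print(f"population approval:      {mean_approval(population):.2f}")
print(f"platform-sample approval: {mean_approval(sample):.2f}")
# The platform sample overstates approval because it over-represents
# the young, not because anyone's opinion changed.
```

No amount of additional platform data corrects this: collecting ten times more tweets just reproduces the same skewed inclusion rates at larger scale.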

In addition, many Twitter accounts are known to be automated programs known as "bots," fake accounts, or "cyborg" systems (human-controlled accounts assisted by bots). Recent estimates suggest there may be as many as 20 million fake accounts. So before anyone wades into the methodological minefield of assessing Twitter users' sentiment, it is worth asking whether those sentiments come from real people or from automated algorithmic systems.

“Big data will make our cities smarter and more efficient.” Yes, to a certain extent.

Big data can provide valuable insights that help improve cities, but it can only help people so much. Because data are not all generated or collected equally, big data sets suffer from "signal problems": certain people and communities are ignored or underrepresented, creating what are known as data dark zones or shadow areas. The use of big data in urban planning therefore relies heavily on municipal officials' understanding of both the data and its limitations.

For example, Boston's StreetBump app is a clever, low-cost way to collect information: it gathers data from the smartphones of drivers who pass over potholes, and more applications like it are emerging. But if cities start relying only on information from smartphone users, those citizens are merely a self-selected sample, and data will inevitably be missing from neighborhoods with fewer smartphone owners, which typically include older and less wealthy residents.
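The self-selection effect can be sketched in a few lines, with invented numbers: two areas with identical road conditions produce very different report counts once smartphone ownership differs.

```python
import random

random.seed(2)

# Hypothetical scenario: two neighborhoods have identical road conditions
# (100 potholes each), but smartphone ownership differs sharply, so the
# app "sees" a pothole only when a smartphone-carrying driver hits it.
POTHOLES = 100
smartphone_rate = {"wealthy": 0.9, "less_wealthy": 0.3}

reports = {
    area: sum(1 for _ in range(POTHOLES) if random.random() < rate)
    for area, rate in smartphone_rate.items()
}

print(reports)
# The report counts suggest the wealthy area has roughly three times as many
# potholes, even though the streets are equally damaged in both areas.
```

A city that allocated repair crews in proportion to these raw counts would systematically shortchange the neighborhood that reports least, which is the signal problem in miniature.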

Although Boston's Office of New Urban Mechanics has made several efforts to remedy these potential gaps in the data, less conscientious public officials may overlook such remedies and end up with skewed data that further entrenches existing social injustice. One need only look back at Google Flu Trends in 2012, which overestimated annual influenza incidence, to see the impact that relying on flawed big data can have on public services and public policy.

The same situation applies to "open government" projects that publish government data online, such as the Data.gov website and the White House Open Government Initiative. More data will not necessarily improve any function of government, including transparency and accountability, unless mechanisms exist to keep the public and public institutions engaged, to say nothing of improving the government's capacity to interpret the data and respond with adequate resources. None of this is easy. The truth is, there are not many highly skilled data scientists around; universities are still scrambling to define the profession, develop curricula, and meet market demand.

“Big data does not discriminate between social groups.” This is hardly the case.

Another expectation attached to big data's purported objectivity is that it will reduce discrimination against minority groups, since raw data is supposedly free of social bias and analysis can be performed at the aggregate level, avoiding group-based discrimination. In reality, because big data can infer how groups behave differently, it is often used for precisely the opposite purpose: to sort individuals into different groups. For example, a recent paper alleged that scientists allowed their own racial biases to influence big data research on the genome.

Big data can also be used for price discrimination, raising serious civil rights concerns. Historically, this practice has been known as "redlining."

Recently, a University of Cambridge big data study of 58,000 Facebook "likes" was used to predict users' extremely sensitive personal information, such as sexual orientation, race, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parents' marital status, age, and gender.

Journalist Tom Foremski said of the study: "This type of easily accessible, highly sensitive information could be used by employers, landlords, government agencies, educational institutions, and private organizations to discriminate against and punish individuals. And people have no recourse to fight back."

Finally, consider the implications for law enforcement. Police from Washington to New Castle County, Delaware, are turning to big data for "predictive policing" models, hoping to provide clues in solving cold cases and even help prevent future crimes.

However, focusing police resources on specific "hot spots" identified by big data risks reinforcing police suspicion of already stigmatized social groups and institutionalizing disparate enforcement. As one police chief wrote, although predictive policing algorithms do not take factors such as race and gender into account, if disparate impact is ignored, the practical results of using such systems may "lead to a deterioration of police-community relations, create a public perception that due process is lacking, trigger accusations of racial discrimination, and put the legitimacy of the police at risk."

"Big data is anonymous, so it does not invade our privacy." Very wrong.

Although many big data providers strive to strip individual identities from human-based data sets, the risk of re-identification remains high. Cell phone data may seem fairly anonymous, but a recent study of a data set covering 1.5 million European mobile phone users showed that just four points of reference were enough to uniquely identify 95% of them. The researchers noted that the uniqueness of the paths people take through cities makes personal privacy a "growing concern," given how much can be inferred from large public data sets.
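The intuition behind that result can be sketched with a toy simulation. The user counts, trace lengths, and antenna numbers below are invented, not the study's actual data; the point is only that when traces are highly individual, a handful of known points is usually enough to single out one user.

```python
import random

random.seed(3)

# Toy model: each user's trace is a set of (antenna, hour) observations.
N_USERS, TRACE_LEN, N_ANTENNAS = 10_000, 50, 200

traces = [
    frozenset((random.randrange(N_ANTENNAS), random.randrange(24))
              for _ in range(TRACE_LEN))
    for _ in range(N_USERS)
]

def singled_out(user, k):
    """Does knowing k points of this user's trace match exactly one user?"""
    known = random.sample(sorted(traces[user]), k)
    matches = sum(1 for t in traces if all(p in t for p in known))
    return matches == 1

trials = [singled_out(random.randrange(N_USERS), 4) for _ in range(200)]
frac = sum(trials) / len(trials)
print(f"uniquely identified from 4 known points: {frac:.0%}")
```

Because each trace covers only a tiny fraction of the possible (antenna, hour) cells, the chance that two users share the same four points is vanishingly small, so "anonymized" traces remain individually distinctive.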

But big data's privacy problems go far beyond conventional re-identification risks. Medical data now being sold to analytics companies could potentially be used to trace individuals' identities. There is much talk of personalized medicine, the hope that drugs and other treatments can one day be tailored to individuals as precisely as if they were made from the patient's own DNA.

This is a wonderful prospect for improving the efficacy of medicine, but it inherently relies on identifying individuals at the molecular and genetic level, which carries serious risks if that information is misused or leaked. And despite the rapid growth of personal health data applications like RunKeeper and Nike+, using big data to improve medical services in practice remains more aspiration than reality.

Highly personal big data sets will become a major target for hackers or leakers. WikiLeaks has been at the center of some of the worst big data breaches in recent years. As seen with the massive data breach in the UK offshore financial industry, the world's richest 1% are just as vulnerable to having their personal information exposed as everyone else.

"Big data is the future of science." Partly true, but it still needs some growth.

Big data opens new avenues for science. One need look no further than the discovery of the Higgs boson, the product of the largest grid computing project in history, in which CERN used the Hadoop distributed file system to manage its data. But unless people recognize and begin to address some of big data's inherent shortcomings in reflecting human life, major public policy and business decisions may be made on the basis of mistaken stereotypes.

To address this, data scientists are beginning to collaborate with social scientists. Over time, this will mean finding new ways to combine big data strategies with small data studies. This goes far beyond the practices used in advertising and marketing, such as focus groups or A/B testing (showing users two versions of a design to determine which performs better). Rather, the new hybrid approaches will ask people why they do things, instead of merely counting how often something happens. That means drawing on sociological analysis and ethnographic insight in addition to information retrieval and machine learning.
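For contrast, here is what a bare-bones A/B comparison looks like, a minimal sketch with invented conversion rates. It can say which version wins, but nothing about why users prefer it.

```python
import math
import random

random.seed(4)

# Invented conversion rates: version A converts at 10%, version B at 15%.
def conversions(n, rate):
    return sum(random.random() < rate for _ in range(n))

n = 5000
conv_a, conv_b = conversions(n, 0.10), conversions(n, 0.15)
p_a, p_b = conv_a / n, conv_b / n

# Two-proportion z-test: is the observed difference bigger than chance?
p_pool = (conv_a + conv_b) / (2 * n)
se = math.sqrt(p_pool * (1 - p_pool) * (2 / n))
z = (p_b - p_a) / se

print(f"A: {p_a:.3f}  B: {p_b:.3f}  z = {z:.2f}")
# |z| > 1.96 means the difference is significant at the 5% level. The test
# establishes *that* B wins; learning *why* requires asking actual people.
```

This is exactly the gap the hybrid approach aims to fill: the counting half is easy to automate, while the "why" half needs sociological and ethnographic methods.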

Technology companies realized long ago that social scientists can help them understand more deeply how and why people relate to their products. For example, Xerox's research center hired the pioneering anthropologist Lucy Suchman. The next phase will be to further enrich the collaboration among computer scientists, statisticians, and social scientists of many kinds, not only to test one another's findings but also to ask different kinds of questions more rigorously.

Given how much information is collected about people every day, including Facebook clicks, Global Positioning System (GPS) data, medical prescriptions, and Netflix queues, sooner or later people will have to decide to whom that information is entrusted, and for what purposes it may be used. There is no getting around the fact that data is by no means neutral, and that it is difficult to remain anonymous.

But people can draw on expertise across diverse fields to better identify biases, flaws, and stereotypes, and to confront new challenges to privacy and justice.