Think you’re discreet online? Think again

Sat, 27 Apr, 2019 - 01:00

Zeynep Tufekci

People concerned about privacy often try to be “careful” online. They stay off social media, or if they’re on it, they post cautiously.

They don’t share information about their religious beliefs, personal life, health status, or political views. By doing so, they think they are protecting their privacy.

But they are wrong. Because of technological advances and the sheer amount of data now available about billions of other people, discretion no longer suffices to protect your privacy. Computer algorithms and network analyses can now infer, with a sufficiently high degree of accuracy, a wide range of things about you that you may have never disclosed, including your moods, your political beliefs, your sexual orientation, and your health.

There is no longer such a thing as individually “opting out” of our privacy-compromised world.

The basic idea of data inference is not new. Magazine subscriber lists have long been purchased by retailers, charities, and politicians because they provide useful hints about people’s views. A subscriber to The Wall Street Journal is more likely to be a Republican voter than is a subscriber to The Nation, and so on.

But today’s technology works at a far higher level. Consider an example involving Facebook. In 2017, the newspaper The Australian published an article, based on a leaked document from Facebook, revealing that the company had told advertisers it could predict when younger users, including teenagers, were feeling “insecure”, “worthless”, or otherwise in need of a “confidence boost”.

Facebook was apparently able to draw these inferences by monitoring photos, posts, and other social media data.

Facebook denied letting advertisers target people based on those characteristics, but it’s almost certainly true that it has that capacity. Indeed, academic researchers demonstrated last year that they were able to predict depression in Facebook users by analysing their social media data — and they had access to far less data than Facebook does.

Even if Facebook does not now market its ability to glean your present or future mental health from your social media activity, the fact that it (and any number of other, less visible actors) can do this should worry you.

It is worth stressing that today’s computational inference does not merely check to see if Facebook users posted phrases like “I’m depressed” or “I feel terrible”.

The technology is more sophisticated than that: Machine-learning algorithms are fed huge amounts of data, and the computer program itself categorises who is more likely to become depressed.

Consider another example. In 2017, academic researchers, armed with data from more than 40,000 Instagram photos, used machine-learning tools to accurately identify signs of depression in a group of 166 Instagram users. Their computer models turned out to be better predictors of depression than humans who were asked to rate whether photos were happy or sad and so forth.

Used for honourable purposes, computational inference can be a wonderful thing. Predicting depression before the onset of clinical symptoms would be a boon for public health, which is why academics are researching these tools; they dream of early screening and prevention.

But these tools are worrisome, too. Few people posting photos on Instagram are aware that they may be revealing their mental health status to anyone with the right computational power.

Computational inference can also be a tool of social control. The Chinese government, having gathered biometric data on its citizens, is trying to use big data and artificial intelligence to single out “threats” to Communist rule, including the country’s Uighurs, a mostly Muslim ethnic group.

Such tools are being marketed for hiring employees, detecting shoppers’ moods, and predicting criminal behaviour.

Unless these tools are properly regulated, in the near future we could be hired or fired, granted or denied insurance, accepted to or rejected from college, offered or refused housing, and extended or denied credit based on facts that are inferred about us.

This is worrisome enough when it involves correct inferences. But because computational inference is a statistical technique, it also often gets things wrong. It is hard, and perhaps impossible, to pinpoint the source of the error, because these algorithms offer little to no insight into how they operate. What happens when someone is denied a job on the basis of an inference that we aren’t even sure is correct?
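A rough calculation shows how a statistical inference can be wrong more often than it is right, even when the model sounds accurate. The sketch below is illustrative only: the function name and every number in it are hypothetical, and none of them comes from the studies or companies mentioned here.

```python
def flagged_breakdown(population, base_rate, sensitivity, specificity):
    """Split the people a screening model flags into correct and incorrect flags.

    base_rate:   share of the population that really has the trait
    sensitivity: share of people with the trait that the model flags
    specificity: share of people without the trait that the model leaves alone
    """
    affected = population * base_rate
    unaffected = population - affected
    true_positives = affected * sensitivity
    false_positives = unaffected * (1 - specificity)
    # Precision: of everyone flagged, what share really has the trait?
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Hypothetical scenario: 100,000 people, 5% have the trait,
# and the model is "90% accurate" in both directions.
tp, fp, precision = flagged_breakdown(100_000, 0.05, 0.90, 0.90)
print(f"correctly flagged: {round(tp)}")
print(f"wrongly flagged:   {round(fp)}")
print(f"share of flags that are correct: {precision:.0%}")
```

Under these assumed numbers, roughly two out of every three people flagged do not have the trait at all, simply because the people without it vastly outnumber those with it.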

Another troubling example of inference involves your phone number. It is increasingly an identifier that works like a social security number — it is unique to you. Even if you have stayed off Facebook and other social media, your phone number is almost certainly in many other people’s contact lists on their phones. If they use Facebook (or Instagram or WhatsApp), they have been prompted to upload their contacts to help find their “friends”, which many people do.

Once your number surfaces in a few uploads, Facebook can place you in a social network, which helps it infer things about you since we tend to resemble the people in our social set.
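How a few uploaded contact lists can stand in for a profile you never created can be sketched in a few lines. This is a toy illustration of the general idea that we resemble our social set, not a description of Facebook's actual systems. The names, phone numbers, traits, and the `infer_trait` helper are all hypothetical.

```python
from collections import Counter


def infer_trait(number, contact_uploads, known_traits):
    """Guess a trait for a phone number from the people whose uploads contain it.

    Returns (most common trait among those people, share of them who have it),
    or (None, 0.0) if nobody with a known trait has uploaded the number.
    """
    # Everyone whose uploaded address book includes this number.
    neighbours = [who for who, contacts in contact_uploads.items() if number in contacts]
    votes = Counter(known_traits[who] for who in neighbours if who in known_traits)
    if not votes:
        return None, 0.0
    trait, count = votes.most_common(1)[0]
    return trait, count / sum(votes.values())


# Hypothetical uploads: each user maps to the numbers in their address book.
contact_uploads = {
    "ana": {"555-0100", "555-0101"},
    "ben": {"555-0100"},
    "cho": {"555-0100", "555-0102"},
    "dev": {"555-0103"},
}
# Hypothetical traits the platform already knows about its own users.
known_traits = {"ana": "party A", "ben": "party A", "cho": "party B", "dev": "party B"}

# The owner of 555-0100 never signed up, yet three users uploaded the number.
guess, share = infer_trait("555-0100", contact_uploads, known_traits)
print(guess, round(share, 2))
```

The person behind the number disclosed nothing. The guess rests entirely on what other people uploaded and on what is already known about them.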

(Facebook even keeps “shadow” profiles of nonusers and deploys “tracking pixels” situated all over the web — not just on Facebook — that transmit information about your behaviour to the company.)

Last year, an investigation led by US senator Ron Wyden, a Democrat from Oregon, revealed that mobile phone networks Verizon, T-Mobile, Sprint, and AT&T were selling people’s real-time location data.

An investigative report last year by The New York Times also showed that weather apps including the Weather Channel, AccuWeather, and WeatherBug were selling their users’ location data. This kind of data isn’t useful just for tracking you but also for inferring things about you. What were you doing at a cancer clinic? Why were you leaving, at 5am, the house of a woman who is not your wife?

Journalist Kashmir Hill has reported on cases in which Facebook suggested to a psychiatrist’s patients that they were potential “Facebook friends”, suggested that people “friend” the person with whom their spouse was having an affair, and outed prostitutes’ real identities to their clients. We don’t want corporations (or governments) to make such connections, let alone exploit them to “grow” their platforms.

What is to be done? Designing phones and other devices to be more privacy-protected would be a start, and government regulation of the collection and flow of data would help slow things down. But this is not the complete solution. We also need to start passing laws that directly regulate the use of computational inference: What will we allow to be inferred, and under what conditions, and subject to what kinds of accountability, disclosure, controls and penalties for misuse?

Until we have good answers to these questions, you can expect others to continue to know more and more about you — no matter how discreet you may have been.