Spooky predictions from data analysis

A series of articles about privacy and trust in the era of tech overlords.

•  Part 1: Data about you is being gathered by the big tech companies (as well as many other companies) in startling ways. Their ability to analyze that data and predict your behavior is more like magic than technology – and it has the potential to change the world, for better or worse.

•  Part 2: It is impossible for us to detach from the big tech companies or prevent that data from being collected.

•  TODAY: Although the scale of data-gathering is unprecedented, there is nothing new about big companies observing our behavior and it is not necessarily an invasion of privacy.

•  Some acts by the tech companies absolutely do invade our privacy. Facebook has abused our trust so often that it is a special case.

•  Our individual decisions about the big tech companies should be driven by trust and transparency. Well-considered regulation can help protect our privacy.


Some thoughts about big questions: What do they know? How do they know it? What can they do with it? Is it an invasion of privacy?


What do they know?

In 2012 – seven long years ago – the New York Times described what Target was tracking about its shoppers.

“For decades, Target has collected vast amounts of data on every person who regularly walks into one of its stores. Whenever possible, Target assigns each shopper a unique code — known internally as the Guest ID number — that keeps tabs on everything they buy. ‘If you use a credit card or a coupon, or fill out a survey, or mail in a refund, or call the customer help line, or open an e-mail we’ve sent you or visit our Web site, we’ll record it and link it to your Guest ID,’ (Target statistician Andrew Pole) said. ‘We want to know everything we can.’

“Also linked to your Guest ID is demographic information like your age, whether you are married and have kids, which part of town you live in, how long it takes you to drive to the store, your estimated salary, whether you’ve moved recently, what credit cards you carry in your wallet and what Web sites you visit. Target can buy data about your ethnicity, job history, the magazines you read, if you’ve ever declared bankruptcy or got divorced, the year you bought (or lost) your house, where you went to college, what kinds of topics you talk about online, whether you prefer certain brands of coffee, paper towels, cereal or applesauce, your political leanings, reading habits, charitable giving and the number of cars you own.”

In 2019, that kind of data is the easiest and most basic information to obtain; it’s the tip of the iceberg, the low-hanging fruit. Every company has access to that kind of information about you.

The policy debates today are driven by a change in the scale of data collection. The big tech companies scaled up because they discovered that they could tease out useful information about you from data that they had been throwing away. They found additional predictive ability when they started to add information about every online click, every pause to watch a video, the items that you add to the cart but do not purchase, the items you look at but don’t add to the cart (and how long you look at them, and what time of day it is, and where you are), your geographic location, your sleep patterns, your daily health, all the information that can be gleaned from analysis of photos stored or posted online, and literally every single other thing that you do when you interact with a technology device, which is virtually every move you make all day, every day. It’s your “digital exhaust,” the unseen data that doesn’t seem valuable even when you find out that it’s being collected, but which becomes part of the information store that lets the big tech companies predict your behavior with disturbing accuracy.


How do they know it?

Advertisers and politicians and big companies have long used whatever data they could get their hands on to determine what advertisements to show us, what mailers to send to our houses, what products to introduce, what political positions to promote in a campaign. This isn’t new and it hasn’t been very controversial until recently.

Back in 1971, Harvard Business Review published an article about the techniques available to forecast consumer demands and trends, which at the time included expert opinions, historical sales data, surveys, and a healthy dose of “human judgment.” Market research with surveys and questionnaires required expensive studies and lots of estimates. The focus was on larger trends – seasonal sales, say – instead of individual predictions.

As soon as computers made it possible, companies began to consolidate everything known about each customer into a single manageable view. Retailers installed computers that could track sales in granular detail and crunch the data more reliably, allowing them to target advertising down to the individual level. We volunteered to let our data be collected; for example, every time we used our grocery store card for a few extra discounts, we were agreeing that the store could track our purchases and analyze them, then use the data to customize the mailers sent to the house and the coupons printed at checkout.
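To make the "single manageable view" idea concrete, here is a minimal sketch in Python. Everything in it is a made-up illustration, not any retailer's actual system: the `card_id` field, the three record lists, and the `build_customer_views` helper are all assumptions. The point is only that once every record carries the same loyalty-card ID, folding separate data sources into one profile per customer takes a few lines.

```python
from collections import defaultdict

# Hypothetical records from three separate systems, all keyed by a loyalty-card ID.
purchases = [
    {"card_id": "C100", "item": "coffee", "price": 8.99},
    {"card_id": "C100", "item": "paper towels", "price": 5.49},
    {"card_id": "C200", "item": "cereal", "price": 4.25},
]
survey_answers = [{"card_id": "C100", "household_size": 4}]
address_changes = [{"card_id": "C200", "moved_recently": True}]


def build_customer_views(purchases, survey_answers, address_changes):
    """Fold every record that shares a card ID into one profile per customer."""
    views = defaultdict(lambda: {"items": [], "total_spent": 0.0, "attributes": {}})
    # Purchase history: keep the list of items and a running total.
    for row in purchases:
        view = views[row["card_id"]]
        view["items"].append(row["item"])
        view["total_spent"] += row["price"]
    # Other sources: copy every field except the ID into the profile.
    for source in (survey_answers, address_changes):
        for row in source:
            extras = {key: value for key, value in row.items() if key != "card_id"}
            views[row["card_id"]]["attributes"].update(extras)
    return dict(views)


customer_views = build_customer_views(purchases, survey_answers, address_changes)
```

Here `customer_views["C100"]` holds that shopper's purchases, spending total, and survey answer in one place. Real systems have to handle far messier data, but the joining key is what makes the rest possible.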

Today there is a hidden data ecosystem that collects and sells our data on vast exchanges. Data brokers, advertisers, marketers, and others are actively profiting from the collection of data about you, selling packages of data to other advertisers, social media sites, credit agencies, insurers, law enforcement, and more. The data brokers are huge, valuable companies but you haven’t heard of them unless they’ve gotten into the news – Acxiom, Experian, Epsilon, Cambridge Analytica, DataLogix, and many more. Wikipedia summarizes the sources of information: “An information broker or data broker collects information about individuals from public records and private sources including census and change of address records, motor vehicle and driving records, user-contributed material to social networking sites, media and court reports, voter registration lists, consumer purchase histories, most-wanted lists and terrorist watch lists, bank card transaction records, health care authorities, and web browsing histories.”

In an interview, Shoshana Zuboff, author of The Age of Surveillance Capitalism, put it this way:

“By now it’s no longer restricted to individual companies or even to the internet sector. It has spread across a wide range of products, services, and economic sectors, including insurance, retail, healthcare, finance, entertainment, education, transportation, and more, birthing whole new ecosystems of suppliers, producers, customers, market-makers, and market players. Nearly every product or service that begins with the word “smart” or “personalised”, every internet-enabled device, every “digital assistant”, is simply a supply-chain interface for the unobstructed flow of behavioural data on its way to predicting our futures in a surveillance economy.”


What can they do with it?

They can predict what we’ll do next. As they refine their methods, we won’t even find it creepy: they will know us better than we know ourselves, but we won’t see the manipulation. And with that knowledge they are able to influence our behavior, to make us buy a product or vote for a candidate and never realize that our “choice” was not entirely our own.

We are creatures of habit.

By one widely cited estimate, nearly half of the decisions we make every day are shaped by habit rather than conscious decision-making. The science of habit formation is a major field of research at medical centers and universities. Companies have devoted huge resources to discovering how our habits can be influenced and changed by small signals at the right time and place.

We are stereotypes.

You are a special snowflake and there is no one like you. I want you to believe that. It’s important for your self-esteem. But, just between you and me, it’s not really true. When you analyze enough data about enough people with smart AI algorithms, similarities turn up that can be manipulated and influenced.

Back in 2012, it was startling to read the New York Times story describing how far retailers had come in analyzing our individual data:

“Almost every major retailer, from grocery chains to investment banks to the U.S. Postal Service, has a ‘predictive analytics’ department devoted to understanding not just consumers’ shopping habits but also their personal habits, so as to more efficiently market to them. ‘We’re living through a golden age of behavioral research,’ says Eric Siegel, a consultant and the chairman of a conference called Predictive Analytics World. ‘It’s amazing how much we can figure out about how people think now.’”

Today Google, Facebook, and Amazon are collecting data about literally billions of people. The companies are using AI and exhaustive feedback to discover the cues that might change your behavior or cause you to change a habit.

Amazon can run tests on hundreds of thousands of people just like you, experimenting with different product pictures or slightly different prices, comparing the results and adjusting the algorithms to tweak sales. It has adjusted search results and used its knowledge of your shopping habits to move sales to its own private-label products. (Did you know Amazon has more than a hundred private-label brands for everything from kids’ clothes to dog food? They are wildly successful, not because of lower prices but because Amazon has the data to know when to promote them with one-day sales, a “best seller” logo, or many other cues.)
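For readers curious what "comparing the results" of such an experiment can look like, here is a small sketch in Python. It is not Amazon's method, which is not public. It is a standard two-proportion z-test, and the shopper counts are invented for illustration. The function name `compare_variants` and its parameters are my own.

```python
import math


def compare_variants(purchases_a, shoppers_a, purchases_b, shoppers_b):
    """Two-proportion z-test: is variant B's purchase rate really different from A's?"""
    rate_a = purchases_a / shoppers_a
    rate_b = purchases_b / shoppers_b
    # Pooled rate: the purchase rate if the two variants were truly identical.
    pooled = (purchases_a + purchases_b) / (shoppers_a + shoppers_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / shoppers_a + 1 / shoppers_b))
    z_score = (rate_b - rate_a) / std_error
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2))
    return rate_a, rate_b, z_score, p_value


# Invented numbers: 100,000 shoppers see each product picture.
rate_a, rate_b, z_score, p_value = compare_variants(2000, 100_000, 2200, 100_000)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z_score:.2f}  p = {p_value:.4f}")
```

With groups this large, a difference of two-tenths of a percentage point is enough to stand out clearly from random noise. That is the advantage of scale: tiny nudges that no small shop could ever measure become reliably detectable.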

Google has been called the “database of intentions,” because search queries provide insights into people’s interests, intentions, and future actions. Google’s chief economist dryly adds, “Needless to say, such insights can be very useful to businesses.”

Facebook has outlined a new AI-powered advertising service that offers advertisers the ability to target consumers based on predictions of how they will behave, what they will buy, and what they will think in the future, instead of just demographics and past consumer preferences. “Facebook explains how it can comb through its entire user base of over 2 billion individuals and produce millions of people who are ‘at risk’ of jumping ship from one brand to a competitor. These individuals could then be targeted aggressively with advertising that could pre-empt and change their decision entirely — something Facebook calls ‘improved marketing efficiency.’”

Forbes put it this way: “Instead of relying on generalized observations meant to apply to as many people as possible, marketers can build their own deep-learning platforms or rely on tools that will allow them to personalize marketing approaches in order to best appeal to all the individuals, no matter how diverse, that will buy their products. Instead of choosing a specific demographic (males, 18-49, with income over $50,000), they can just choose ‘People who will buy my product.’”

In the last twenty years, the scale of data collection has changed in a way that is literally incomprehensible, and AI/machine learning techniques now permit us to be modeled in ways that cause reactions ranging from “Magical!” to “This is the death of civilization,” with stops along the way at “creepy” and “scary.” One of the underlying questions in the current debates about privacy is whether the change in scale requires a completely new evaluation of our individual rights or if it’s just a supersized version of behavior that has existed for decades.

Big tech companies know far more about us than any company has known about individuals in history. Is this simply a natural result of evolving technology, a bit messy and uncomfortable but required as part of setting up a global communications network – the Internet – and getting personalized services? Or is it an existential threat that displaces traditional capitalism and is so likely to lead to evil manipulation that we have to rein it in right now?


Is it an invasion of privacy?

Imagine that each company compiles its data independently. Google gets data by running Google Search, Google Maps, and Google Photos. Facebook gets data when we use the Facebook app or visit the website to post photos or follow friends. Further, imagine that each company keeps that information for itself, and uses it to improve its services.

I don’t think that’s an invasion of privacy. Sure, we’re surprised by how much smart machines can learn about us from that data – we find it spooky and creepy – but it’s not necessarily an invasion of privacy if Google knows a lot about us and it doesn’t tell anybody else. I have made that argument before.

But that self-contained scenario is no longer a very big part of the information economy. Your personal data is big business. Data brokers and the large companies are buying and selling your data freely, operating in the shadows with no oversight. The big tech companies are getting better at analyzing your digital exhaust to predict your behavior. The collection of excess data was done without any notice to us, and we still do not fully understand the consequences.

And unfortunately, much of what is done invades your privacy by any definition. We’ll talk about that, and what policies might be called for, in the next article.



Some of the data being exchanged without our knowledge crosses the line into abusive invasions of privacy. It might be a stretch to expect anything from our lawmakers in the US, but at least we can talk about some of the regulations that might be appropriate.
