Big Data’s Dark Side: Our Personal Information is Not Personal
The news about data breaches this holiday season at, among others, Target and Neiman Marcus sent chills up and down our collective spines. As many as 110 million people — more than a third of the US population — may have had their credit card numbers, debit card numbers, and PINs, along with names and perhaps other (as yet unspecified) personal data, swept into a criminal vortex through Target alone. Here’s a summary from PC World on how it was done.
It is not simply the theft of credit card numbers, but the theft of other personal information that is a fresh concern. The sad fact is that many more types of data could be stolen. Transactional data, for example — everything you buy — is kept by most companies and collected by giant data firms, each with hundreds if not thousands of major clients. Lifestyle and survey data are also collected: what do you do, with whom, and what are your views? There are also mountains of (free) demographic data out there — population, employment, racial and ethnic data, and more. And let’s not forget all the social media data we generate, and all the calls and texts. The more data collected, the more of our personal information ends up out in the world.
It is truly astounding.
Let’s step back for a moment and look at “small data” and how we got to big data. Until the late 1990s, collected data could mean simple lists of names and addresses, or orders and dollars. What made data BIG is the speed, quantity, and variety of what is collected: millions of call center calls, millions of online orders, millions of store transactions. Daily flows were collected, stored, matched, and studied. In recent years, I have been an Ecommerce and digital marketing executive, but earlier in my career I worked at a pioneering consumer database company. Prefer Network (later CMS Direct, now part of Merkle) was one of the first (if not the first) to identify, collect, and model item-level transactional data, starting in 2000. In short, we collected all the data available at the time on what consumers bought: product category and color, season, dollars spent, and purchase channel (that is, whether catalog, Web, or store). Those were the early years of big data, about 14 years ago.
After just a few years, Prefer Network had perhaps 500 clients sending in data feeds from all channels. We would model the data to understand whether Suzy Shopper spent more, more often, through a catalog, online, or in store. What did she buy in each channel? What was she more likely to buy together: shirts and sweaters, or shirts and pants? We created targeted lists for mailing and emailing. We also anonymized data in order to create industry trend reports, addressing questions such as: across 300 women’s fashion stores, what were the most popular items? The least popular?
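The kind of per-shopper, per-channel analysis described above can be sketched in a few lines. This is a minimal illustration, not Prefer Network’s actual schema or methods; the transaction records and field names are invented for the example:

```python
from collections import defaultdict

# Illustrative transactions: (customer, channel, category, dollars)
transactions = [
    ("suzy", "catalog", "shirts", 40.00),
    ("suzy", "web", "sweaters", 65.00),
    ("suzy", "web", "shirts", 25.00),
    ("suzy", "store", "pants", 55.00),
]

# Tally total spend and order count per channel for one shopper
spend = defaultdict(float)
orders = defaultdict(int)
for customer, channel, category, dollars in transactions:
    if customer == "suzy":
        spend[channel] += dollars
        orders[channel] += 1

# Through which channel does she spend the most?
top_channel = max(spend, key=spend.get)
```

At production scale this kind of tally ran over millions of daily records and fed predictive models, but the underlying question — who buys what, where, and how often — is the same.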
Data mining is fun stuff, valuable to businesses and largely appreciated by consumers.
Even then, in the early 2000s, we knew we had to protect the data, and thereby protect consumers. We were navigating the transition from mainframe computers to servers, and all of a sudden we were connecting this data to the Internet. Security became a bigger concern: instead of protecting a device or a disk (locking an office door or a desk drawer), we had to guard against hackers. In addition, we recognized the need for a consumer advocate. People had to be protected, not just data. As the head of consumer data privacy, I became the living, breathing human being who answered consumers directly when they wondered how they got on a mailing list – or when they just had questions about what we were trying to do. A retailer would take a call from a consumer, who would read the code from the back of their catalog or describe the email they had received, and the customer service rep would see where the list came from. If it came from my database, I would email or call the consumer to answer questions, correct a spelling, remove them from a list, and so on.
Sometimes, things go terribly wrong.
Last week there was a story about Office Max that stopped many in their tracks. As the Wall Street Journal reports, a family received a mailer from Office Max with “Daughter Killed in Car Crash” printed below the name and above the address.
Tragically, the Seay family had indeed lost their daughter, Ashley, in a car accident. An investigation revealed that Office Max had rented a mailing list from Things Remembered, where the Seay family had ordered digital frames for pictures of their daughter. Still, what data was collected, and why? Someone at Things Remembered apparently entered this highly personal information into their system, and it was passed on. Officially, Office Max says it does not know what happened. List scrubbing (or data cleansing, as it’s officially called) occurs at multiple steps: obscenities are deleted, numbers are removed from name fields, ZIP codes are verified, duplicate names are dropped, and so on.
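The scrubbing steps listed above — stripping numbers from name fields, screening flagged words, verifying ZIP codes, dropping duplicates — can be sketched as simple rules. This is an illustrative toy, assuming a trivial record format and a placeholder word list, not any real list-processing pipeline:

```python
import re

FLAGGED_WORDS = {"damn"}  # placeholder screening list, purely illustrative

def scrub_record(record):
    """Apply simple list-scrubbing rules to one mailing record."""
    # Strip digits out of the name field
    name = re.sub(r"\d", "", record["name"]).strip()
    # Reject records whose name contains a flagged word
    if any(word in name.lower() for word in FLAGGED_WORDS):
        return None
    # Verify the ZIP looks like a 5-digit US ZIP code
    if not re.fullmatch(r"\d{5}", record["zip"]):
        return None
    return {"name": name, "zip": record["zip"]}

def dedupe(records):
    """Drop duplicate name/ZIP combinations, keeping the first seen."""
    seen, out = set(), []
    for r in records:
        key = (r["name"].lower(), r["zip"])
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out
```

Notice what these rules cannot do: a phrase like “Daughter Killed in Car Crash” passes every mechanical check, which is exactly why automated scrubbing alone is not enough.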
Big data helps personalize mailings and emailings, makes our online experience more targeted and enjoyable, and ensures that you get more of what you may actually want and less of the rest. But big data also means that a data breach at Target can quickly complicate the lives of 100 million people, or that Office Max can inadvertently remind a family of their personal tragedy.
We are not going to stop collecting data, so what can be done?
Something much more “human” needs to happen to prevent more situations like these. More consumer data advocates need to be involved, at each step in the process and at every company involved. A role I first occupied over a decade ago is even more vital today. Humans have got to keep an eye on the personal information collected, on how it’s handled, and how it’s protected.