Would you recognise yourself from your data?

29 May 2019

Carl Miller

Research director and author, Demos

Carl's vacuum's view of his house (image: BBC)

The circular, grey robot vacuum gently bumps against my feet.

It stops, rotates and follows a skirting board along the kitchen, emitting a loud and unbroken drone.

Throwing out beams of invisible light, it plots obstacles and walls, the narrowness of corridors and the expanses of open areas that it can then - gratefully - accelerate into.

On a phone connected to it, a floor plan of the house is gradually drawn, inch by inch, bump by bump.

As it quietly docks itself in its charger, the floor plan it has built leaves the vacuum and ends up on a cloud server in China.

If you squint to read the device's Terms and Conditions, you'll learn that this plan might be shared with (although not sold to) a variety of the manufacturer's partners.

Carl Miller set about trying to build a portrait of himself based on the data firms are gathering about him

"We use network analytics to understand where this traffic is sent, and what kind of traffic it is," says Anna Maria Mandalari, part of the Systems and Algorithms Laboratory at Imperial College London that measures how data from your home ends up all over the world.

The lab has also found "smart" bulbs and plugs that tell advertising companies whenever they're switched on or off, and a webcam that sends data, of some sort, to 52 organisations other than its manufacturer.

"I was surprised to see the data going to so many third parties," Anna said, of the devices they'd looked at.

"Of course, people should be worried. If the data isn't being used just for making the device work, then who knows why it's being collected and how it's being used?"

Data doppelganger

This is a story of two business models.

There is, of course, the obvious one that we know and pay for.

But sometimes there is another business model buried within the services and technologies that we use that is much less visible to us: our personal data is collected and forms products and services that we are at the heart of, but often know very little about.

By law, we all now have a data-right to make a "subject access request" to a company and ask them for a copy of all the data they hold about us.

For my investigation with BBC Click, I decided to try to reconstruct my own data doppelganger - to come face-to-face with myself as I exist in the data, and so to understand a little more about the ecology of exchanges and brokers, suppliers and analytics firms that have built a version of… well, myself.

I spent more than a month issuing data access requests to as many different companies as I could, around 80 in total.

It was a month spent arguing on the phone, struggling with broken portals and trawling the small-print privacy policies of websites.

Eventually, around 20 companies sent back my personal data, and if printed, it would stretch to about 7,000 pages in length.

I found that there were three different parts to my data-self.

A small amount was composed of data that I'd volunteered myself: my address, name, contact details, and so on.

The second, much larger part was data that I'd generated as I'd used a company's services or products.

But the most interesting was data in a third category: data that had been created from other data that had been collected about me - from models and segmentations, based on probabilities and likelihoods.

Go-getter or disengaged worker?

About 1,500 of those pages were this kind of educated guesswork, all of it from companies I had never heard of before.

It's easy to find data on this scale a little alarming, but most of it I found more silly than sinister:

The age of my boiler had been predicted
My likelihood to be interested in gardening was 23.3%
My interest in prize draws and competitions was 11%
My "animal/nature awareness level" was low
My consumer technology audience segmentation was described as (among other things) "young and struggling"
My household was found to have no "regular interest in book reading" (I have written a book)

At one moment I was a go-getter, an idea-seeker.

Then I was a love aspirer, a disengaged worker, part of a group called budgeted stability or, simply, downhearted.

Something I did triggered a "Netmums - women trying to conceive" event.

If this was a reflection of myself, I didn't recognise it.

A small number of categories were perhaps those that we'd rather people didn't try to guess.

There was the probability that I'd use the internet for gambling or betting, and an estimate of how much I was likely to spend on alcohol.

Data denied

My greatest impression was how unwelcome I felt in this second world.

We have rights to reclaim our data, but it is far from practical to do so. Some companies did not even respond when I issued a (legally enforceable) request to get my data back.

Several directed me to broken web-portals or online forms for something completely different.

I was made to write "please give me my data back" on a letter, print it, sign it, scan it and send it in.

One company first asked that I use normal post to make the request before I pointed out I didn't need to.

What may happen to things like your floor plan is buried within small-print privacy policies that manage to be both dense and vague at the same time.

When the data did come back, it was often in formats that would be extremely difficult for any normal person to understand.

Whether through design or neglect, learning about this second world is a time-consuming, frequently frustrating process.

And this, really, is exactly the point.

In one business model we are feted as consumers, while in the other we are only the products.

In this sense, these worlds are poles apart even when they both often exist within the self-same devices, technologies and services we all use.

Carl Miller is research director at the Demos Centre for the Analysis of Social Media, and the author of The Death of the Gods: The New Global Power Grab
