Consumer DNA testing — and the mountain of data it has generated — has become pervasive enough that it’s possible to identify about six of every 10 people in the U.S. who are of European descent, even if they’ve never given a sample.
According to a study published on Thursday in the journal Science, Americans of European descent are more likely than not to have close genetic ties with someone who has taken a consumer DNA test through a company like 23andMe Inc. or Ancestry.com Inc.
“We are getting very soon to the point that everyone will be potentially identifiable using this technique,” said study author Yaniv Erlich, an assistant professor at Columbia University and the chief science officer at the consumer-DNA-testing firm MyHeritage.
Erlich said that a database needs to cover only about 2 percent of a population for virtually everyone in that population to be represented in the data through at least one relative.
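A rough back-of-the-envelope sketch helps show why such a small fraction can be enough. It assumes, purely for illustration and not as the study's method, that each of a person's relatives ends up in the database independently, with probability equal to the database's coverage. The relative counts used below are hypothetical.

```python
def match_probability(coverage: float, relatives: int) -> float:
    """Chance that at least one of a person's relatives appears in a database.

    Simplifying assumption (not from the study): each relative is in the
    database independently, with probability equal to `coverage`.
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    if relatives < 0:
        raise ValueError("relatives must be non-negative")
    # The chance that none of the relatives has tested is (1 - coverage) ** relatives;
    # the chance of at least one match is the complement of that.
    return 1.0 - (1.0 - coverage) ** relatives


if __name__ == "__main__":
    coverage = 0.02  # the 2 percent figure Erlich cites
    # Hypothetical relative counts, chosen only for illustration.
    for relatives in (50, 100, 200):
        p = match_probability(coverage, relatives)
        print(f"{relatives:>3} relatives -> {p:.0%} chance of at least one match")
```

Under these assumptions, a person with 200 relatives in the searchable population would have better than a 98 percent chance of at least one match. Real family trees are not independent draws, so this only illustrates how quickly the odds compound.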
More than 15 million people have taken consumer DNA tests, and more than a million have uploaded raw DNA data files to GEDmatch, a free third-party website set up to let users of different genetic services hunt for relatives across platforms.
Combining genetic information with other material people have shared online, as well as government and other databases, could be a powerful tool for finding even those who don’t wish to be found. With enough overlapping data, conceivably anyone could be identified using connections unearthed in genetic databases.
“It’s a combination of genetic data with social media, with public records,” said Debbie Kennett, a British genealogist and author. “The conversation should not just be about genetic data but about what other information people are revealing to the public.”
Privacy concerns around consumer genetic testing have increased amid several high-profile instances of law enforcement using DNA information to generate leads. In April, police arrested a suspect in the case of the Golden State Killer, who had terrorized California in the 1970s and 1980s, after uploading crime-scene DNA to GEDmatch and locating relatives of the suspect. The tactic has led to more than a dozen arrests in other investigations.
In the study, researchers looked at the genomic data of 1.28 million people who had tested with MyHeritage, about three-quarters of whom were of European descent. They searched for second, third or fourth cousins who had also taken the company's test, the same kind of familial matches recently used by police. For people of European descent, about 60 percent of searches turned up a match.
Erlich has long been interested in the privacy threats posed by DNA. In 2013, his lab at the Whitehead Institute showed that it was possible to discover the identities of people who participate in genetic research studies by cross-referencing their data with other publicly available information.
Research participants, the latest study found, could be identified with this newer technique, too. Using publicly available data, within a day researchers were able to find the identity of a Utah woman whose DNA data was available publicly as part of the 1000 Genomes project.
Erlich said genetic information should be considered identifiable and, particularly when it comes to research, protected. He proposes that direct-to-consumer testing companies implement cryptographic signatures for DNA data files to ensure the data is authentic. Such a measure might even allow users to specify when and how they want their data to be used.
“The last thing I want is for people to think from our study that it’s dangerous to give data for genetic research,” he said. “We need people participating in research studies.”