When biomedical researchers go to sleep at night, they dream of genomes. Yours, and mine, and all six degrees of Kevin Bacon between us. And who can blame them? Think of all the information packed into the six billion letters of genetic code that makes you uniquely you and most definitely not me. Blockbuster drugs and other disease-smashing discoveries could be hiding in that DNA, if only scientists could collect enough of it.
So far, about 26 million people worldwide have had at least part of their genome decoded—mostly by companies like 23andMe and Ancestry. But only a tiny fraction of them have gone all the way. In 2009, a full genome would run you $100,000. Today, it’s more like $1,000. One company thinks it can crack $100 by 2021. So where are all the genomes? At least one startup is arguing that would-be sequencers have been scared off by a once distant specter: personal data privacy.
According to Kevin Quinn, the chief technology officer at Nebula Genomics, the great privacy awakening started shortly after the Facebook/Cambridge Analytica scandal broke in 2018. “People started seeing services they use every day not working the way they were intended,” he says. “And it’s had a strong whiplash in the genomics space.” 23andMe’s CEO Anne Wojcicki has also suggested privacy concerns as the reason for slumping sales of DNA tests. Nebula is one of several startups trying to solve those issues by putting people’s DNA on a blockchain.
The startup was cofounded by Harvard genomics pioneer George Church, who last month apologized for his associations with Jeffrey Epstein. When it launched early last year, it offered low-quality genome sequences for $99 with data access controls written into a public ledger. This summer they added a “sponsored sequencing” model, which offers customers a free clinical-grade genome if they let Nebula share their de-identified DNA and other data with pharmaceutical partners. And on Thursday, the company introduced the field’s first “anonymous sequencing,” a process that aims to entirely remove the person from their most personal information.
When you order a spit kit from a company like 23andMe or Ancestry, you have to pay with a credit card and enter an address. And you need an email to set up an account to see your results. All of this you’re doing on an internet browser. And all that data gets attached to the DNA swirling inside your tube of spit, soon to become a data file filled with short strings of As, Cs, Ts, and Gs. Before companies can share that genetic data with researchers or pharma companies eager to mine it, they have to strip away all those personal identifiers (and then some).
Nebula already does this. But, Quinn says, customers have to trust that everything gets properly scrubbed and no one ever messes up. The idea of anonymous sequencing is to decouple genomic data from personal information from the get-go. Before it even gets to Nebula.
That’s why the first step to anonymous sequencing is to clean up your ecommerce habits more generally. Nebula suggests getting encrypted email, available through tools and services like Enigmail, Mailvelope, and ProtonMail, and using a VPN to mask your browsing behavior. And you’ll definitely need an address not associated with your name. For that a PO Box will work. A secure crypto wallet or preloaded credit card is also a must. Once you’ve done all that, you’re ready to anonymously pay for and receive a Nebula spit kit. The company sequences your genome and throws it on their secure cloud without ever knowing who it belongs to.
“It doesn’t need to be de-identified on our end because it’s already intrinsically separate,” says Quinn. “And that’s never really been done before.” By creating a process based on the premise that Nebula can’t be trusted, the company says it is actually building trust. Counterintuitive, I know. But hey, this is blockchain after all.
There’s just one, tiny, double-helix-shaped problem. The genome is itself a unique identifier. Not in the eyes of the US’s complicated patchwork of genetic privacy laws. But in recent years, researchers have shown it’s increasingly possible to identify individuals from DNA alone, using public databases like those police used to catch the Golden State Killer. “What do you care what someone’s name is if you have all 6 billion of their base pairs? That’s a much more unique identifier,” says Mark Gerstein, a bioinformatician who codirects Yale’s Center for Biomedical Data Science.
To prevent DNA hackers from stealing a genome repository and combining it with other data to re-identify people, it should be encrypted. That’s just data security 101. But the issue with that, says Gerstein, is that reading a genome requires comparing it to other people’s DNA. That’s the only way to know what the letters mean. Encrypting your genome would keep it secret. But it would also keep it secret from the software that tells you where your ancestors come from or if your version of APOE4 will make you more susceptible to Alzheimer’s. “Computing needs to be done to make any sense of it, which means it has to move amongst servers and databases. And doing that without revealing the underlying sequence is tricky.”
It’s tricky because genomic data is so huge. Bank numbers, tax returns, medical records: these are small files. So companies that offer zero-knowledge storage can encrypt that data and give you the only key. Encrypting a whole genome is a much more computationally expensive process. Running computations on encrypted genomes even more so. But that’s what Nebula is working on next. For the last year the company has been collaborating with researchers to build and test a secure computing environment, a publication about which is currently under review.
The plan is to deploy it starting next year, first with the company’s own genome interpretation services, which tell customers about their health and ancestry, and eventually with its academic and pharmaceutical research partners. Currently, these calculations happen on the distributed network where Nebula stores genomic data. Partners can submit queries—for the presence of the Alzheimer’s-causing APOE variant, for example—and just see the results of their queries. Only Nebula and the genome owner have access to the plain text data. Eventually, even Nebula won’t have access, and only the genome’s owner will.
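For the technically curious, the query-only arrangement described above can be pictured with a small Python sketch. It is a toy under stated assumptions, not Nebula's system: the `GenomeVault` class, its method names, the `apoe_marker` label, and the genotype strings are all made up for illustration, and nothing here is encrypted. The only point is the shape of the interface, in which a partner asks a question and gets back a count, never the underlying letters.

```python
class GenomeVault:
    """Hypothetical store that answers queries without handing out raw genotypes."""

    def __init__(self):
        # owner_id -> {marker_id: genotype string}; private by convention only
        self._genotypes = {}

    def add_genome(self, owner_id, genotypes):
        # Store a copy so later changes by the caller do not leak in
        self._genotypes[owner_id] = dict(genotypes)

    def has_allele(self, owner_id, marker_id, allele):
        # Yes/no answer for a single genome
        return allele in self._genotypes[owner_id].get(marker_id, "")

    def count_carriers(self, marker_id, allele):
        # Aggregate answer: how many stored genomes carry the allele
        return sum(self.has_allele(o, marker_id, allele) for o in self._genotypes)


vault = GenomeVault()
vault.add_genome("anon-1", {"apoe_marker": "CT"})  # made-up genotypes
vault.add_genome("anon-2", {"apoe_marker": "TT"})
vault.add_genome("anon-3", {"apoe_marker": "CC"})

print(vault.count_carriers("apoe_marker", "C"))  # prints 2
```

In a real deployment the hard part is everything this sketch leaves out: keeping the stored data encrypted while the query runs, so that even the operator of the vault cannot read it.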
Despite his nitpicking, Gerstein is excited about the advance. “This type of thing is a really good step in terms of developing options for truly private genome sequencing and storage,” he says. That’s important, because he expects a time in the not-too-distant future when sequencing will become as commonplace in the doctor’s office as getting your blood pressure taken. Normalizing these protections now could help prevent a bigger backlash later. Sweet dreams, scientists.