Researchers have made significant advances toward the goal of a new microchip able to grow DNA strands that could provide high-density 3D archival data storage at ultra-low cost – and be able to hold that information for hundreds of years. To enable the technology, researchers have also developed a correction system able to compensate for errors in reading data stored in the DNA.
DNA data storage uses the four bases that make up biological DNA – adenine (A), thymine (T), guanine (G) and cytosine (C) – to store data in a way that is analogous to the zeroes and ones of traditional computing. Current DNA storage is mostly restricted to boutique applications such as time capsules, but there is broad interest in DNA as the next major storage medium for massive data archives.
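To make the analogy with zeroes and ones concrete, here is a minimal sketch of one way bytes could map onto bases, two bits per base. The particular assignment (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration; the article does not describe the mapping the SMASH team uses, and practical schemes add redundancy and avoid sequences that are hard to synthesize.

```python
# Hypothetical two-bits-per-base mapping, for illustration only.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode_bytes(data: bytes) -> str:
    """Turn raw bytes into a string of bases, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_bases(strand: str) -> bytes:
    """Invert encode_bytes: four bases back into one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode_bytes(b"Hi")
print(strand)                # CAGACGGC
print(decode_bases(strand))  # b'Hi'
```

With four symbols available, each base carries at most two bits, so one byte needs four bases under this mapping.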
The microchip work is part of the Scalable Molecular Archival Software and Hardware (SMASH) project, a collaboration led by the Georgia Tech Research Institute (GTRI) to develop scalable DNA-based read/write storage techniques. The project, supported by the Intelligence Advanced Research Projects Activity (IARPA) Molecular Information Storage (MIST) program, could help address the growing demand for archival storage, providing a cost-effective alternative to current tape and hard-drive systems.
The proof-of-concept nanofabricated microchips include tiny microwell structures a few hundred nanometers deep from which the DNA strands grow in a massively parallel process. The chips will ultimately include a second layer of electronic controls – fabricated in conventional CMOS – that will manage the chemical process as a unique molecule of DNA is grown in each of the wells, one base at a time. Once the sequence of bases that stores data has been completed, the DNA strands will be stripped off the surface and dried for long-term storage.
Because each base that stores information consists of a small number of atoms, the technique will allow hundreds of terabytes of information – that would now require many conventional disk drives – to be stored in a single dot of DNA. GTRI is working with California biotech companies Twist Bioscience and Roswell Biotechnologies toward a goal of demonstrating this new type of commercially viable data storage that could eventually scale into the exabyte regime.
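A back-of-envelope calculation gives a sense of where such densities come from. The sketch below assumes two bits per base and an average mass of about 330 grams per mole for one nucleotide of single-stranded DNA; both are illustrative assumptions, not figures from the SMASH project. It ignores the redundancy, addressing and error correction overhead that a working system would need, so it is a theoretical ceiling, not a projection.

```python
# Rough upper bound on DNA storage density under stated assumptions.
AVOGADRO = 6.02214076e23          # molecules per mole
NUCLEOTIDE_G_PER_MOL = 330.0      # assumed average mass, single-stranded DNA
BITS_PER_BASE = 2                 # assumed ideal encoding, no redundancy

bases_per_gram = AVOGADRO / NUCLEOTIDE_G_PER_MOL
bytes_per_gram = bases_per_gram * BITS_PER_BASE / 8

# Mass of DNA needed to hold 100 terabytes (1e14 bytes) at that ceiling.
grams_for_100_tb = 1e14 / bytes_per_gram

print(f"{bytes_per_gram:.2e} bytes per gram")
print(f"{grams_for_100_tb * 1e6:.2f} micrograms for 100 TB")
```

Under these assumptions the ceiling is on the order of hundreds of exabytes per gram, which is why a barely visible quantity of DNA can in principle hold hundreds of terabytes.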
"We've been able to show that it's possible to grow DNA to the sort of length that we want, and at about the feature size that we care about using these chips," said Nicholas Guise, a GTRI senior research scientist who is project director for SMASH. "The goal is to grow millions of unique, independent sequences across the chip from these microwells, with each serving as a tiny electrochemical bioreactor."
The current prototype chip is about an inch square and includes 10 banks of microwells where the DNA is grown. "Working with our colleagues at Twist and in Georgia Tech's Institute for Electronics and Nanotechnology, we have optimized the geometry of the microwells to fit more and more of them on a chip," he explained.
The DNA chips will be used for long-term, archival data storage in which information is infrequently accessed – but must be kept available for a long time. Such data is currently kept on magnetic tape, which must periodically be replaced as the media ages. Storing and retrieving data in DNA will be time-consuming, but the medium will last virtually forever, and the data can be read back using the standard DNA sequencing techniques used for medical diagnostics.
"As long as you keep the temperature low enough, the data will survive for thousands of years, so the cost of ownership drops to almost zero," Guise said. "It only costs much money to write the DNA once at the beginning and then to read the DNA at the end. If we can get the cost of this technology competitive with the cost of writing data magnetically, the cost of storing and maintaining information in DNA over many years should be lower."
One disadvantage of storing data in DNA is its error rate, which is considerably higher than what computer engineers would tolerate with conventional hard drive storage. In collaboration with the University of Washington, GTRI researchers have developed a scheme for encoding information into DNA (a "codec") that identifies and corrects errors to protect the stored data.
"We are working with a bunch of new technologies, and these new technologies have higher error rates than storage technologies have in the past," said Adam Meier, a GTRI senior research scientist working on the SMASH project. "We've targeted this codec to be super robust against errors, able to work with devices that read as much as 10% of the bases wrong."
Error correction eases the burden on the hardware side of the project, and the error correction scheme is tunable to allow the team to experiment with different chemistry approaches and DNA lengths. In testing their work, the team received support from the Molecular Evolution Core at Georgia Tech and the Advanced Concepts Laboratory at GTRI in sequencing the data stored in the DNA.
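The article does not describe how the SMASH codec works internally. As a general illustration of how redundancy can absorb a read error rate of around 10%, the sketch below uses a much simpler technique: reading the same strand several times and taking a per-position majority vote. It models substitution errors only (real DNA channels also insert and delete bases), and the function names and parameters are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def noisy_read(strand: str, error_rate: float, rng: random.Random) -> str:
    """Simulate one read in which each base is replaced with probability error_rate."""
    out = []
    for base in strand:
        if rng.random() < error_rate:
            # Substitute a different base; insertions and deletions are not modeled.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def consensus(reads: list[str]) -> str:
    """Majority vote at each position across equal-length reads."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


rng = random.Random(0)
original = "ACGTTGCAACGTAGCTAGGC"
reads = [noisy_read(original, 0.10, rng) for _ in range(15)]
recovered = consensus(reads)
mismatches = sum(a != b for a, b in zip(original, recovered))
print(f"mismatches after voting: {mismatches}")
```

Repeated reads are an expensive form of redundancy. Purpose-built error-correcting codes aim for the same robustness with far less overhead, which is in line with the team's stated goal of a tunable codec.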
"What this does operationally is allow us to potentially turn up the speed and throughput of the synthesizer and sequencer," said Guise. "If you can tolerate some of the error through a resilient codec, you can write much more data and read much more data faster."
The researchers have demonstrated writing image files into DNA, then reading them back out, with help from company partner Twist. Meier expects that the error rate will decline as the technology advances, though he says error correction will always be part of the data reading operations.
"What we expect is that eventually the error correction code will be more lightweight," he said. "It will eventually have less of an impact on the final design, and when the error rates are better, then the codec will become less important. That's part of our research into future phases of the program."