Recently, Dr. Roy Tan, Director of the American Marketing Center of MGI, was interviewed about the development of sequencing technology and the sequencing of large populations. Dr. Tan has worked in the sequencing field for many years and has extensive experience in developing and applying sequencing technology. He shared a number of ideas and insights on the field.
Forward-looking perspective
1. The development of sequencing applications depends to some extent on the development of chip technology.
2. Looking ahead, the two directions of optics and chemistry will be the focus of sequencing technology development.
3. Interpreting human genes and exploring complex diseases requires large-scale sequencing of large populations, and the largest group is all human beings.
Q: Sequencing technology is advancing by the day. Can you tell us about the progress made in recent years?
A: In the past few years, high-throughput massively parallel sequencing technology has developed rapidly, and the DNBSEQ™ sequencing technology represented by MGI is particularly prominent. Our technological progress over this period has mainly been in detection methodology, in library preparation methods, in the production technology for massively parallel chips, and in our understanding of data quality.
Q: High-quality library preparation is a prerequisite for accurate sequencing. How can we obtain higher quality sequencing libraries?
A: There are two relatively new developments in the preparation of DNA sequencing libraries.
One is that we want to preserve the state of the original DNA sample as much as possible throughout the entire test process. That is, the library is prepared by a high-fidelity process free of replication errors, so the real genomic information is retained.
The other is that our sequencing technology uses linear amplification, with very little amplification overall, together with a PCR-free library preparation method, so we can call the genome detected this way the real genome.
The PCR-free library preparation method combined with the PCR-free detection method is our new technological advance. The PCR-free library products we developed have many advantages, including low input amount, long read length, and support for multiple samples. One outstanding feature is that PCR-free libraries detect high-GC regions more accurately than PCR libraries, which results in very long contigs that no longer depend on the GC content of the sequenced samples.
Another fast-growing technology is our newly developed stLFR (single-tube long fragment read) technology. The main principle of stLFR is a double-barcode, double-labeling process. We know that each person carries two copies of the genome, one from the mother and one from the father. We want to identify both, preferably in the same tube, so that we can read longer fragments.
The short segments from each long fragment carry the common barcode of its bead, so we can tell whether short segments belong to the same long fragment by reading the barcode. Long fragments are recovered through short-segment sequencing, assembly, and recognition of the two barcodes.
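The barcode idea described above can be sketched in a few lines of Python. This is only an illustration, not MGI's software: the read tuples, the barcode names, and the mapped positions are hypothetical, and real stLFR data analysis involves alignment and assembly steps that are not shown.

```python
from collections import defaultdict

def group_reads_by_barcode(reads):
    """Group short reads that share a barcode.

    Each read is a hypothetical (barcode, mapped_position, sequence) tuple.
    Reads with the same barcode are assumed to come from one long fragment.
    """
    groups = defaultdict(list)
    for barcode, position, sequence in reads:
        groups[barcode].append((position, sequence))
    # Sort each group by position so the reads follow the fragment order.
    return {barcode: sorted(items) for barcode, items in groups.items()}

def fragment_span(items):
    """Return (start, end) of the region covered by one co-barcoded group."""
    start = min(position for position, _ in items)
    end = max(position + len(sequence) for position, sequence in items)
    return start, end

# Toy data: three reads share barcode BC01, one read carries BC02.
reads = [
    ("BC01", 100, "ACGT"),
    ("BC02", 5000, "GGCC"),
    ("BC01", 60000, "TTAA"),
    ("BC01", 30000, "CCGGA"),
]
groups = group_reads_by_barcode(reads)
print(fragment_span(groups["BC01"]))
```

In this toy data the three BC01 reads together cover a region of roughly 60 kb, which is how short reads can stand in for one long fragment.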
These fragments can reach about 300 kb, and most are around 60 kb. A good application of this technique is the detection of large, long deletions. We can also determine the phasing of the haploid DNA in our body from a single long fragment, that is, whether the fragment comes from the mother or from the father.
We call the genomes produced with PCR-free and stLFR the real genome and the perfect genome, respectively. These techniques allow our sequencers not only to detect long fragments but also to ensure that the sequenced genome is very accurate. We now integrate these technologies into one sequencing instrument and the same sequencing reaction, enabling multiple applications on one machine.
Q: With the popularity of various sequencing applications, the need to increase sequencing throughput is becoming more and more urgent. What breakthroughs have been made so far, and in what direction will sequencing technology develop in the future?
A: The development of various sequencing applications depends on the development of chip technology. To achieve large-scale sequencing, we must have relatively large chips. At present, our most representative one is the chip used by the DNBSEQ-T7 sequencer. The chip can deliver 5 G reads; that is, each chip has 5 billion sites that can be read.
Looking ahead, sequencing technology will evolve in two directions:
One is the optical method. By improving the performance of the optical detection system, the pitch on the slide (i.e., the spacing between the modification sites) that the optical system can resolve is getting smaller and smaller. If the 700-nanometer pitch on the current slide shrinks to 500 nanometers, the 5 G reads will become 8 G.
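The effect of a smaller pitch can be estimated with simple geometry. The sketch below assumes a square grid of sites on a fixed usable area, so the number of sites scales with the inverse square of the pitch. That scaling rule is an assumption for illustration and is not stated in the interview.

```python
def scaled_site_count(sites, old_pitch_nm, new_pitch_nm):
    """Idealized site count after changing the pitch on a fixed area.

    Assumes a square grid, so site density is proportional to 1 / pitch**2.
    """
    return sites * (old_pitch_nm / new_pitch_nm) ** 2

ideal = scaled_site_count(5e9, 700, 500)
print(f"{ideal / 1e9:.1f} G sites")
```

Under this idealized assumption the result is about 9.8 G sites, somewhat above the 8 G quoted in the interview. The sketch does not model whatever practical factors account for the difference.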
The second is the chemical method. At present, we can achieve a machine throughput of about 6 T. If we can do PE300 or PE600 in the future, we can achieve throughputs of 20 T or 40 T.
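The throughput figures can be related to reads and read length with a short calculation. The sketch below rests on assumptions that are not stated in the interview: that the current figure corresponds to PE150 reads, that a run uses four chips, and that the future figures use the 8 G reads mentioned for the smaller pitch. The function and parameter names are illustrative.

```python
def run_throughput_tb(reads_per_chip, read_length, chips_per_run=4, paired=True):
    """Bases per run in terabases (1 T = 1e12 bases).

    chips_per_run=4 is an assumption for illustration, not from the interview.
    """
    ends = 2 if paired else 1
    return reads_per_chip * read_length * ends * chips_per_run / 1e12

current = run_throughput_tb(5e9, 150)   # assumed PE150 on 5 G reads
future_300 = run_throughput_tb(8e9, 300)  # PE300 on 8 G reads
future_600 = run_throughput_tb(8e9, 600)  # PE600 on 8 G reads
print(current, future_300, future_600)
```

Under these assumptions the current configuration gives 6 T per run, and PE300 or PE600 on 8 G reads gives about 19 T and 38 T, close to the 20 T and 40 T mentioned above. Other combinations of assumptions could produce the same figures.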
Q: You mentioned earlier that we have a new understanding of data quality. Can you talk about the "676" standard newly released by MGI?
A: The "676" standard is a new standard for high-precision genomes that we propose under the new technical conditions. That is, the quality of a genome assembled from scratch should meet the following conditions:
When the sequencing depth is about 50X, with no or few duplicates, the contig N50 should be greater than 10^6 bases (>1 Mb), the scaffold N50 greater than 10^7 bases (>10 Mb), and, for a human genome, the total assembled size larger than 6 Gb. This is our high-precision genome standard.
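The three thresholds can be checked mechanically once an assembly's contig and scaffold lengths are known. The sketch below uses the common definition of N50 (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the total length). The function names and toy numbers are illustrative, and the depth and duplicate conditions are not checked.

```python
def n50(lengths):
    """Length L such that sequences of length >= L cover half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input

def meets_676(contig_lengths, scaffold_lengths, total_assembled_bp):
    """Check the three "676" thresholds for a human genome assembly."""
    return (
        n50(contig_lengths) > 10**6            # contig N50 > 1 Mb
        and n50(scaffold_lengths) > 10**7      # scaffold N50 > 10 Mb
        and total_assembled_bp > 6 * 10**9     # total size > 6 Gb
    )

# Toy numbers, not real assembly statistics.
contigs = [2_000_000] * 10
scaffolds = [20_000_000] * 5
print(n50([2, 3, 4, 5, 6]))
print(meets_676(contigs, scaffolds, 6_100_000_000))
```

A 6 Gb total corresponds to a diploid human genome, that is, both parental copies assembled separately rather than merged into one 3 Gb reference-style sequence.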
Here we need to distinguish between resequencing and de novo assembly. Resequencing means that after sequencing many short fragments, we compare them with a known genome. De novo assembly means that we do not compare with a known genome but directly assemble the data in hand. Assembly from scratch can tell us the true state of each gene.
We used this technique to assemble the genomes of six plants and two animals from scratch. The experiments showed that the de novo assemblies are very similar to the data produced by common assembly methods and perform better. In addition, we have assembled the genomes of more than 20 species of marine fish from scratch, which shows that the performance of this technology is very satisfactory.
At MGI, we have always believed that resequencing is not our goal. Perfect genome sequencing from scratch is our goal. Now, the DNBSEQ-T7 sequencer can sequence a genome to the "676" standard for $1,000, and in the future everyone can have a unique, perfect genome assembled from scratch.
Q: In recent years, the United Kingdom, China, the United States, and other countries have launched large-scale population sequencing programs. Has the sequencing of large populations become a trend? In what direction will the technology develop in the future?
A: Interpreting human genes and exploring complex diseases requires large-scale sequencing of large populations, and the largest group is all humans.
There are about 8 billion people in the world. At the "676" high-precision genome standard, if we want to sequence all humans within 50 years, the required sequencing throughput will reach 240 million people per year. Such a goal is not impossible: if there are 1,000 sequencing laboratories on earth and each can sequence 1,000 people per day, it can be achieved.
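The scale of this goal can be checked with simple arithmetic. The sketch below uses only the round numbers given above and assumes a constant population and laboratories that operate 365 days a year. Both are simplifications for illustration.

```python
def required_per_year(population, years):
    """People to sequence per year to cover a fixed population in `years`."""
    return population // years

def capacity_per_year(labs, people_per_lab_per_day, days_per_year=365):
    """People sequenced per year across all laboratories."""
    return labs * people_per_lab_per_day * days_per_year

population = 8_000_000_000
need = required_per_year(population, 50)
capacity = capacity_per_year(1000, 1000)
print(need, capacity, round(population / capacity, 1))
```

Dividing 8 billion evenly over 50 years gives 160 million people per year. The interview does not say how its 240 million figure was derived, but the capacity of 1,000 laboratories at 1,000 people per day, 365 million per year, exceeds both numbers and would cover 8 billion people in about 22 years.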
Humans built roads, railways, bridges, and airports for transportation, so that we can travel smoothly; we built countless farms and restaurants for good food; and for modern life and work we built large shopping malls and skyscrapers that can accommodate tens of thousands of people.
So why don't we build more labs to detect the genes of each of us, so that each of us can have personal genomic data assembled from scratch to identify and prevent possible diseases? Health is of great significance, and we need to establish such a system as soon as possible.
Health is everyone's fundamental aspiration. Health comes from two aspects. On the one hand, there is the environment we live in; on the other hand, there is heredity, that is, the genes, the information inherited from our parents.
Environment and genes are eternal themes for humanity. Over tens of thousands of years of development we learned to control the environment, and now we are gradually understanding and learning how to control genes. Through the advancement of technology, good genes will be passed on to the next generation, everyone will become healthier, and human development will keep getting better.