{"id":15698,"date":"2022-04-01T14:48:59","date_gmt":"2022-04-01T14:48:59","guid":{"rendered":"https:\/\/interhospi.com\/?p=15698"},"modified":"2022-04-01T15:07:27","modified_gmt":"2022-04-01T15:07:27","slug":"in-major-breakthrough-scientists-complete-first-gapless-sequence-of-a-human-genome-reveal-hidden-regions","status":"publish","type":"post","link":"https:\/\/interhospi.com\/in-major-breakthrough-scientists-complete-first-gapless-sequence-of-a-human-genome-reveal-hidden-regions\/","title":{"rendered":"In major breakthrough, scientists complete first gapless sequence of a human genome, reveal hidden regions"},"content":{"rendered":"

Karen Miga, assistant professor of biomolecular engineering at UC Santa Cruz, co-led the Telomere-to-Telomere (T2T) Consortium, which has released the first complete, gapless assembly of a human genome sequence. (Photo by Carolyn Lagattuta)<\/p><\/div>\n

 <\/p>\n

The first truly complete sequence of a human genome, covering each chromosome from end to end with no gaps and unprecedented accuracy, is now accessible through the UCSC Genome Browser<\/a> and is described in six papers published March 31 in Science<\/em>.<\/p>\n

Since the first working draft of a human genome sequence was assembled at UC Santa Cruz in 2000, genomics research has led to enormous advances in our understanding of human biology and disease. Nevertheless, crucial regions accounting for some 8% of the human genome have remained hidden from scientists for over 20 years due to the limitations of DNA sequencing technologies.<\/p>\n

\u201cEver since we had the first draft human genome sequence, determining the exact sequence of complex genomic regions has been challenging,\u201d said Evan Eichler, Ph.D., researcher at the University of Washington School of Medicine and T2T consortium co-chair. \u201cI am thrilled that we got the job done. The complete blueprint is going to revolutionize the way we think about human genomic variation, disease and evolution.\u201d<\/p>\n

Telomere-to-Telomere Consortium<\/strong><\/h4>\n

The sequencing and analysis were performed by a team of more than 100 people, the so-called Telomere-to-Telomere Consortium, or T2T, named for the telomeres that cap the ends of all chromosomes. T2T was initially set up in 2019 by Karen Miga, assistant professor of biomolecular engineering at UC Santa Cruz, and Adam Phillippy at the National Human Genome Research Institute (NHGRI).<\/p>\n

The consortium\u2019s gapless version of all 22 autosomes and the X sex chromosome comprises 3.055 billion base pairs, the units from which chromosomes and our genes are built, and contains 19,969 protein-coding genes.<\/p>\n

The new reference genome, called T2T-CHM13, adds nearly 200 million base pairs of novel DNA sequences, including 99 genes likely to code for proteins and nearly 2,000 candidate genes that need further study. It also corrects thousands of structural errors in the current reference sequence.<\/p>\n


Complete sequence of a Y chromosome<\/strong><\/h4>\n

The researchers also released this week the complete sequence of a Y chromosome from a different source, which took nearly as long to assemble as the rest of the genome combined, said Nicolas Altemose, a postdoctoral fellow at the University of California, Berkeley, and a co-author of four new papers about the completed genome. The analysis of this new Y chromosome sequence will appear in a future publication.<\/p>\n

\u201cIn the future, when someone has their genome sequenced, we will be able to identify all of the variants in their DNA and use that information to better guide their health care,\u201d said Phillippy, one of the leaders of T2T and a senior investigator at NHGRI. \u201cTruly finishing the human genome sequence was like putting on a new pair of glasses. Now that we can clearly see everything, we are one step closer to understanding what it all means.\u201d<\/p>\n

The gaps now filled by the new sequence include the entire short arms of five human chromosomes and cover some of the most complex regions of the genome. These include highly repetitive DNA sequences found in and around important chromosomal structures such as the telomeres at the ends of chromosomes and the centromeres that coordinate the separation of replicated chromosomes during cell division.<\/p>\n

New discoveries<\/strong><\/h4>\n

The new DNA sequences reveal never-before-seen detail about the region around the centromere. Variability within this region may also provide new evidence of how our human ancestors evolved in Africa.<\/p>\n

\u201cUncovering the complete sequence of these formerly missing regions of the genome told us so much about how they\u2019re organized, which was totally unknown for many chromosomes,\u201d said Altemose. \u201cBefore, we just had the blurriest picture of what was there, and now it\u2019s crystal clear down to single base pair resolution.\u201d<\/p>\n

The new sequence also reveals previously undetected segmental duplications, long stretches of DNA that are duplicated in the genome and are known to play important roles in evolution and disease.<\/p>\n

\u201cThese parts of the human genome that we haven\u2019t been able to study for 20-plus years are important to our understanding of how the genome works, genetic diseases, and human diversity and evolution,\u201d Miga said.<\/p>\n

Many of the newly revealed regions have important functions in the genome even if they do not include active genes.<\/p>\n


In and around the centromeres, the researchers found layers of new sequences overlaying layers of older sequences, as if through evolution new centromere regions have been laid down repeatedly to bind to the kinetochore. The older regions are characterized by more random mutations and deletions, indicating they\u2019re no longer used by the cell. The newer sequences where the kinetochore binds are much less variable, and also less methylated. The addition of a methyl group is an epigenetic tag that tends to silence genes.<\/p>\n

All of the layers in and around the centromere are composed of repetitive lengths of DNA, based on a unit about 171 base pairs long, which is roughly the length of DNA that wraps around a group of proteins to form a nucleosome, keeping the DNA packaged and compact. These 171 base pair units form even larger repeat structures that are duplicated many times in tandem, building up a large region of repetitive sequences around the centromere.<\/p>\n

DNA sequences around the centromere could also be used to trace human lineages back to our common ape ancestors, Altemose noted.<\/p>\n

\u201cAs you move away from the site of the active centromere, you get more and more degraded sequence, to the point where if you go out to the furthest shores of this sea of repetitive sequences, you start to see the ancient centromere that, perhaps, our distant primate ancestors used to bind to the kinetochore,\u201d Altemose said. \u201cIt\u2019s almost like layers of fossils.\u201d<\/p>\n

Seeing the whole genome as a complete system for the first time<\/strong><\/h4>\n

\u201cThere is a profound advantage to seeing the whole genome as a complete system. It puts us in a position to unravel how that system works,\u201d said David Haussler, director of the UC Santa Cruz Genomics Institute. \u201cWe\u2019ve gotten an enormous understanding of human biology and disease from having roughly 90 percent of the human genome, but there were many important aspects that lay hidden, out of view of science, because we did not have the technology to read those portions of the genome. Now we can stand at the top of the mountain and see all of the landscape below and get a complete picture of our human genetic heritage.\u201d<\/p>\n

The T2T genome sequence, representing the finished CHM13 genome plus the recently finished T2T Y chromosome (CHM13 includes an X but not a Y chromosome), is now a new reference genome in the UCSC Genome Browser. The T2T sequence is fully annotated in the browser, providing an efficient way for scientists to access and visualize a wealth of information associated with genes and other elements of the genome.<\/p>\n

\u201cWe wanted to put the information out in a way that is accessible and familiar to researchers so they can begin to build on it and use all the tools and resources the browser provides,\u201d Miga explained.<\/p>\n

Genome Reference Consortium<\/strong><\/h4>\n

The new T2T reference genome will complement the standard human reference genome, known as Genome Reference Consortium build 38 (GRCh38), which had its origins in the publicly funded Human Genome Project and has been continually updated since the first draft in 2000.<\/p>\n

\u201cWe\u2019re adding a second complete genome, and then there will be more,\u201d explained Haussler. \u201cThe next phase is to think about the reference for humanity\u2019s genome as not being a single genome sequence. This is a profound transition, the harbinger of a new era in which we will eventually capture human diversity in an unbiased way.\u201d<\/p>\n

Human Pangenome Reference Consortium<\/strong><\/h4>\n

The T2T Consortium has now joined with the Human Pangenome Reference Consortium<\/a>, which aims to create a new \u201chuman pangenome reference\u201d based on the complete genome sequences of 350 individuals.<\/p>\n

\u201cPangenomics is about capturing the diversity of the human population, and it\u2019s also about ensuring we\u2019ve captured the whole genome properly,\u201d said Benedict Paten, associate professor of biomolecular engineering at UCSC\u2019s Baskin School of Engineering, a coauthor of the T2T papers, and a leader of the pangenomics effort. \u201cWithout having a map of these difficult-to-sequence regions of the genome across multiple individuals, then we\u2019re missing a huge amount of the variation present in our population. T2T sets us up to look across hundreds of genomes from telomere to telomere. It\u2019s going to be great!\u201d<\/p>\n

The standard reference genome (GRCh38) does not represent any one individual but was assembled from multiple donors. Merging them into one linear sequence created artificial structures in the sequence. The Human Pangenome Project will make it possible to compare newly sequenced genomes to multiple complete genomes representing a range of human ancestries.<\/p>\n

More accurate assessments of genetic variants<\/strong><\/h4>\n

An important outcome of the new T2T sequence is enabling more accurate assessments of genetic variants. When human genomes are sequenced for clinical studies to understand the role of genetic variants in disease or to study genetic diversity within and between human populations, they are nearly always analyzed by aligning the sequencing results with the reference genome for comparison. The T2T variant team documented major improvements in identifying and interpreting genetic variants using the new T2T sequence compared to the standard human reference genome.<\/p>\n

\u201cThe new human genome is incredibly accurate at the base level, allowing us to flag hundreds of thousands of variants that had been misinterpreted by mapping them to the standard reference. Many of these new variants are in genes known to contribute to disease. We can now spot those because we have a more complete and accurate reference genome,\u201d Miga said.<\/p>\n

Miga\u2019s research has focused on satellite DNA, the long stretches of repetitive DNA sequences found mostly in and around telomeres and centromeres. The centromeres separate each chromosome into a short arm and a long arm and hold duplicated chromosomes together prior to cell division.<\/p>\n

\u201cThe centromeres play a critical role in how chromosomes segregate properly during cell division, and we\u2019ve known for some time now that they are misregulated in all kinds of human diseases. But we\u2019ve never been able to study them at the sequence level,\u201d Miga said. \u201cBy far the largest portion of new sequences added to the reference are centromere satellite DNAs. For the first time, we can study \u2018base-by-base\u2019 the sequences that define the centromere and can start to understand how it works.\u201d<\/p>\n

Long-read sequencing a game changer<\/strong><\/h4>\n

T2T\u2019s success is due to improved techniques for sequencing long stretches of DNA at once, which helps in determining the order of highly repetitive stretches of DNA. Among these is PacBio\u2019s HiFi sequencing, which can read lengths of more than 20,000 base pairs with high accuracy. Technology developed by Oxford Nanopore Technologies, on the other hand, can read up to several million base pairs in sequence, though with less fidelity. For comparison, so-called next-generation sequencing by Illumina is limited to hundreds of base pairs.<\/p>\n

\u201cThese new long-read DNA sequencing technologies are just incredible; they\u2019re such game changers, not only for this repetitive DNA world, but because they allow you to sequence single long molecules of DNA,\u201d Altemose said. \u201cYou can begin to ask questions at a level of resolution that just wasn\u2019t possible before, not even with short-read sequencing methods.\u201d<\/p>\n

———<\/h2>\n

Karen Miga<\/strong><\/h4>\n

Miga is a co-corresponding author of the main Science<\/em> paper along with Adam Phillippy at NHGRI and Evan Eichler at the University of Washington:<\/p>\n