The world’s heaviest parrot, representing one of the most ancestral branches of the parrot family tree, is nearly extinct, with barely 200 adults roaming the undergrowth of four small islands. It has long been unknown whether the remaining kakapos have the genetic resilience to survive, a question that only high-quality genomic analysis could answer.
But a high-quality genome assembly does not exist for the kakapo, nor for most of the 70,000 species of vertebrates living today. As a result, questions abound about how best to prevent the extinction of species like the flightless kakapo and the adorable vaquita porpoise.
Answers may come from the Vertebrate Genomes Project (VGP), which aims to generate high-quality reference genomes for all living vertebrate species. In a flagship study in the journal Nature, the team presents methods and principles for sequencing and assembling high-quality reference genomes.
The team applied these methods and principles to produce 16 high-quality reference genomes, including that of the endangered kakapo, to help reveal whether the species is robust enough to rebuild its population. The researchers found that the extremely small populations of the kakapo and the vaquita have survived at low numbers since the last ice age, more than 10,000 years ago, by purging the deleterious mutations that cause diseases of inbreeding.
As long as humans don’t kill more of the last remaining animals, the findings from the high-quality reference genomes give hope that these species could survive even with fewer than 100 individuals each.
“We call it the ‘kitchen sink approach’: combining tools from several biotech companies to create this high-quality genome assembly pipeline,” says Erich D. Jarvis of Rockefeller University, chair of the Vertebrate Genomes Project. “Endangered species were the first to benefit from the new technology because, although conservation is not my area of research, I felt it was a moral duty.”
Genomes full of errors
High-quality reference genomes exist only for the celebrities of laboratory science: mice, fruit flies, zebrafish and, of course, humans. For less popular species, there is often no reference genome at all or, perhaps worse, a messy genome assembled from sequences obtained by quick-and-dirty methods. Compared with the new VGP genomes, up to 60% of the genes in these older assemblies have missing sequences, are missing entirely or are poorly assembled, the researchers found. Untangling the thousands of assembly errors in a single species can take years.
The researchers also discovered many false gene duplications, most of them caused by algorithms that fail to properly separate maternal and paternal chromosome sequences and instead interpret them as two separate sister genes. “We have thousands of genes in the literature that are false duplications. The genes aren’t actually there!” Jarvis says. “It is unacceptable to work with some of these genomes.”
The Vertebrate Genomes Project was born out of the frustrations of hundreds of scientists working in its parent organization, the Genome 10K (G10K) consortium, whose mission was to generate genome assemblies for 10,000 vertebrate species. The initial assemblies generated by G10K and other groups were based on short reads of 35-200 base pairs, and they were very incomplete. VGP’s goal is to create an error-free reference genome library for all vertebrate species that researchers and conservationists can use easily, without spending months or years correcting individual genes.
“We said, let’s work hard on the front end, so that we can get high quality data on the back end,” Jarvis says.
Deployment of the Vertebrate Genomes Project
Many companies approached the Vertebrate Genomes Project, each promising a unique sequencing technology that would solve all the problems with messy reference genomes. The project’s assembly team tested each method on a single hummingbird, chosen both for its relatively small genome and because of Jarvis’ research interest in vocal learning in bird species (“two birds with one stone,” he quipped). But every technology fell short on its own. “None had all of the components needed to achieve a high-quality assembly,” explains Jarvis. “So we combined a lot of tools into one pipeline.”
Their approach works. Organizations such as the Earth BioGenome Project, the Darwin Tree of Life Project, and the New Zealand Genome Sequencing Project are already using the most advanced version of the new pipeline. Reference genomes that once took years to generate are now completed in weeks to months, without the false duplications and other errors endemic to previous assemblies.
Scientists are already using the new data to study the genes that may make bats immune to COVID-19 and to challenge long-held conventions in basic science, such as whether there are significant differences between oxytocin and its receptors in humans, birds, reptiles and fish.
A total of 20 studies and 25 high-quality vertebrate genomes accompany the rollout of the new pipeline. “The first high-quality genomes we sequenced taught us so much about the technology and biology that we decided to publish them in these early papers,” Jarvis says. But there is still a lot of work to be done. “The next step is to sequence the 1,000 families of vertebrates, then the 10,000 genera of vertebrates, and finally all species of vertebrates.”
Source: Rockefeller University
Original study DOI: 10.1038/s41586-021-03451-0