Several other aspects that emerged from our work should also be considered as the alignment research area evolves, such as the use of artificial intelligence, support for cloud computing, and mapping to multiple genomes.

Since Sanger sequencing was introduced roughly 40 years ago, faster and more accurate sequencing technologies have expanded the scale and resolution of many biological applications, including the detection of genome-wide single nucleotide polymorphisms (SNPs) and structural variants, quantitative analysis of the transcriptome (RNA-Seq), identification of protein binding sites (ChIP-Seq), characterization of DNA methylation patterns, the assembly of new genomes and transcriptomes, and the determination of species composition through metagenomic workflows. However, the huge volume of generated data reveals almost nothing about the underlying DNA without appropriate analysis tools and algorithms, so bioinformatics researchers began developing new ways to manage and analyze such enormous amounts of data efficiently.

The first crucial step in the analysis of next-generation sequencing (NGS) data, after quality control and filtering, is the alignment (mapping) of the generated sequencing reads to the respective reference. This step is error-prone for several reasons: (1) a reference genome is generally long (on the order of billions of base pairs) and contains complex regions such as repetitive elements (repetitive regions are usually masked, because there is still no consensus on how to handle them); (2) reads are short (typically 50–150 bp), which hurts both efficiency and accuracy, since a short read is more likely to align to multiple locations than to a unique position in the reference genome; (3) the subject genome may inherently differ from the reference genome due to alterations acquired over time.
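The multi-mapping problem described in point (2) can be illustrated with a minimal sketch (a hypothetical toy example, not drawn from any real aligner): an exhaustive exact-match scan over a small reference. Production aligners such as BWA or Bowtie instead use indexed structures (e.g. the FM-index) and allow mismatches, but the effect of read length on mapping uniqueness is the same.

```python
def map_read(reference: str, read: str) -> list[int]:
    """Return every 0-based position where `read` matches `reference` exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


# Toy reference containing a repetitive element: "ACGTACGT" occurs twice.
reference = "TTTACGTACGTGGGACGTACGTCCC"

# A short read falling inside the repeat maps ambiguously...
print(map_read(reference, "ACGTACGT"))        # [3, 14] -- two candidate positions

# ...while a longer read spanning unique flanking sequence maps uniquely.
print(map_read(reference, "GGGACGTACGTCCC"))  # [11] -- a single position
```

On a real genome, an aligner facing such ties must either report one location with a low mapping quality, report all of them, or discard the read, which is exactly why repetitive regions complicate downstream variant calling.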