The course introduces some of the main techniques and tools used in genomics and pangenomics, such as: exact and approximate string matching, pairwise and multiple sequence alignment, and pattern matching and alignment of degenerate strings.
Teachers
Person in charge
Gabriel Valiente Feruglio (
)
Xavier Messeguer Peypoch (
)
Weekly hours
Theory
2
Problems
1
Laboratory
1
Guided learning
0
Autonomous learning
8.5
Competences
Technical Competences of each Specialization
Direcció i gestió
CDG1 - Capability to integrate technologies, applications, services and systems of Informatics Engineering, in general and in broader and multicisciplinary contexts.
Especifics
CTE7 - Capability to understand and to apply advanced knowledge of high performance computing and numerical or computational methods to engineering problems.
CTE9 - Capability to apply mathematical, statistical and artificial intelligence methods to model, design and develop applications, services, intelligent systems and knowledge-based systems.
Generic Technical Competences
Generic
CG4 - Capacity for mathematical modeling, calculation and simulation in technology and engineering companies centers, particularly in research, development and innovation tasks in all areas related to Informatics Engineering.
Transversal Competences
Information literacy
CTR4 - Capability to manage the acquisition, structuring, analysis and visualization of data and information in the area of informatics engineering, and critically assess the results of this effort.
Basic
CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
Objectives
Understand pattern matching algorithms and data structures and their implementation in a modern programming language, and apply them to solve practical problems in bioinformatics.
Related competences:
CB6,
CTR4,
CDG1,
CTE7,
CTE9,
CG4,
Contents
Efficient search algorithms and data structures
The most efficient algorithms for searching for patterns in texts based on the alphabet and the length and number of patterns will be shown. Data structures that are very useful for comparing genomes such as suffix trees, suffix arrays and the Burrows-Wheeler transform, will also be explained.
Sequence alignment
Dynamic programming will be explained and we will discuss its application to compute the edit distance between two words, to the approximate search of a word in a text and to find the best alignment between two sequences. We will also study how to generalize the alignment to multiple sequences.
Data base searching: BLAST
The computational and statistical foundations of the BLAST algorithm and its use for approximate searches in databases will be introduced.
Alignment of degenerate strings
The generalization to pangenomes of pattern search problems will be explained and the extensions of the Boyes-Moore algorithm and the Burrows-Wheeler transform for the alignment of degenerate sequences will be introduced.
In the theoretical sessions, the lecturer will introduce algorithms and data structures, combined with examples and problem-solving. In the problem-solving sessions, students will work on their own solving problems, under supervision and assistance of the lecturer.
Evaluation methodology
There will be a (mid-term) final exam, in which the students will explain advanced algorithms and data structures from the research literature. The final grade will be just the final exam grade.