Credits
3
Types
- MEI: Elective
- MIRI: Elective
- MDS: Elective
Requirements
This subject has not requirements
, but it has got previous capacities
Department
CS
Teachers
Person in charge
- Santiago Marco Sola ( santiago.marco@upc.edu )
Weekly hours
Theory
2
Problems
1
Laboratory
1
Guided learning
0
Autonomous learning
8.5
Competences
Direcció i gestió
Especifics
Generic
Information literacy
Basic
Objectives
Contents
-
Sequencing Technologies and Genome Analysis
The course gives an overview of modern DNA sequencing technologies and the computational characteristics of genomic data. We introduce the main sources of error and variability in sequencing data and review fundamental genome analysis tasks, including read mapping, variant calling, and assembly. These problems motivate the abstraction of biological sequences as strings and highlight the algorithmic and computational challenges posed by genome-scale datasets. -
Exact String Matching Algorithms
We study the exact string matching problem as a foundational algorithmic task. Classical algorithms are introduced to illustrate different design paradigms, including prefix-based preprocessing, heuristic skipping strategies, hashing, and bit-parallelism. Emphasis is placed on correctness, time complexity, and practical performance considerations. -
Text-Indexing Data Structures
This block focuses on data structures that support efficient pattern matching over large-scale texts, with special attention to genomic applications. We cover hash-based and k-mer indexing strategies, lexicographic tree structures, suffix trees and suffix arrays, and compressed full-text indexes based on the Burrows-Wheeler Transform. -
Approximate String Matching
We extend pattern matching to the approximate setting, where mismatches and indels are allowed. The block introduces distance and error models, followed by filtering and seeding techniques to reduce the search space. Index-assisted methods, sketch-based similarity estimation, verification algorithms, and chaining strategies are presented as key components of modern approximate matching algorithms and tools. -
Pairwise Sequence Alignment
This block addresses sequence alignment as a fundamental task in bioinformatics and genome analysis. We introduce dynamic programming techniques for computing edit distance and other alignment models under different scoring schemes, including affine gap penalties. Both classical alignment modalities and advanced algorithmic optimizations are studied, with attention to space efficiency, sparsity, and output-sensitive methods such as wavefront alignment. -
Sequence Graphs and Pangenomics
The course concludes with an introduction to graph-based representations of genomic variation. We discuss sequence graphs, pangenome models, and their relationship to linear references. The sequence-to-graph alignment problem is introduced, highlighting the algorithmic challenges and opportunities that arise when extending alignment techniques to graph-based genome representations.
Activities
Activity Evaluation act
Theory
2h
Problems
1h
Laboratory
0h
Guided learning
0h
Autonomous learning
5h
Theory
2h
Problems
1h
Laboratory
0h
Guided learning
0h
Autonomous learning
5h
Theory
3h
Problems
1h
Laboratory
2h
Guided learning
0h
Autonomous learning
5h
Theory
2h
Problems
1h
Laboratory
1h
Guided learning
0h
Autonomous learning
5h
Theory
2h
Problems
1h
Laboratory
2h
Guided learning
0h
Autonomous learning
5h
Theory
1h
Problems
1h
Laboratory
1h
Guided learning
0h
Autonomous learning
5h
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
0h
Teaching methodology
The course combines lectures, problem-solving sessions, and independent work to develop both theoretical understanding and practical skills. During lecture sessions, the instructor introduces algorithmic concepts, data structures, and core techniques for genomics and pangenomics, supported by illustrative examples and discussion of complexity and performance trade-offs.Problem-solving sessions are devoted to the guided resolution of exercises, in which students work on algorithmic problems related to the course content. These sessions emphasize reasoning, correctness, and analysis, and students receive continuous instructor supervision and feedback.
Independent study and project work play a central role in the course. Students are expected to read and analyze research literature, prepare an oral presentation of a selected paper, and develop an applied programming project that consolidates the techniques studied throughout the course.
Evaluation methodology
Assessment in this course is based on continuous evaluation.The main component, accounting for 80% of the final grade, consists of an oral presentation and defence of a research paper on topics covered in the course. Students are expected to independently read, analyze, and critically present a research-level contribution that demonstrates understanding of the underlying algorithmic ideas, methodological choices, and their relevance to genome-scale sequence analysis.
The second component, accounting for the remaining 20% of the final grade, consists of a programming project, in which students autonomously design and implement an applied, experimental solution that puts into practice the algorithmic techniques studied throughout the course.
Bibliography
Basic
-
Genome-scale algorithm design : bioinformatics in the era of high-throughput sequencing
- Mäkinen, Veli; Belazzougui, Djamal; Cunial, Fabio; Tomescu, Alexandru I,
Cambridge University Press,
2023.
ISBN: 9781009341233
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991005219277706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
Algorithms on strings, trees, and sequences : computer science and computational biology
- Gusfield, Dan,
Cambridge University Press,
1997.
ISBN: 978- 0521585194
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991001989459706711&context=L&vid=34CSUC_UPC:VU1 -
Biological sequence analysis [Recurs electrònic] : probabilistic models of proteins and nucleic acids
- Durbin, Richard,
Cambridge University Press,
1998.
ISBN: 978-0521629713
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991000581539706711&context=L&vid=34CSUC_UPC:VU1
Complementary
-
Bioinformatics : sequence and genome analysis
- Mount, David W,
Cold Spring Harbor Laboratory Press,
cop. 2004.
ISBN: 0879696877
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002934579706711&context=L&vid=34CSUC_UPC:VU1&lang=ca -
An Introduction to bioinformatics algorithms
- Jones, Neil C; Pevzner, Pavel,
MIT Press,
cop. 2004.
ISBN: 978-0262101066
https://discovery.upc.edu/discovery/fulldisplay?docid=alma991002778129706711&context=L&vid=34CSUC_UPC:VU1
Previous capacities
No prior knowledge in bioinformatics is required. Basic concepts in molecular biology, genetics, or genomics may be helpful, but all necessary biological background will be introduced during the course.Students are expected to have basic knowledge of computer science concepts, including basic algorithms and data structures. Moreover, students should be comfortable with programming in at least one mainstream programming language (e.g., Python, C, C++, Java, or Rust). Prior exposure to high-performance computing, computer architecture, or performance engineering is not required, although a basic understanding of these topics may be advantageous for the course's implementation and optimization components.
Students must have a strong interest and motivation in algorithms and data structures.