This course presents the fundamentals of biological sequence analysis, from the basic algorithms to their main applications.
The subject consists of three main blocks:
1. Genomic data analysis: Sequencing Technologies. Computational genomics. Main file formats for sequence data. Approximate string matching aligners for sequencing reads. Genome assembly algorithms and strategies.
2. Dynamic programming and Sequence alignment: Dynamic programming. Pairwise alignment (Needleman-Wunsch and Smith-Waterman algorithms). BLAST. Multiple sequence alignment. Other applications.
3. Clustering Methods and Algorithms in Genomics: Hidden Markov Models (HMM). Principal Component Analysis (PCA). Parsimony. Maximum Likelihood Methods. Genetic Algorithms.
The programming language used in this course is Python, with special emphasis on solving applied genomics and clustering problems. Following a problem-based learning approach, students will write their own scripts and/or use existing bioinformatics approaches to tackle different challenges. We encourage the use of Python libraries (for statistics and plots) and classes.
Teachers
Person in charge
Arnau Cordomí Montoya
Fernando Cruz Rodríguez
Oscar Lao Grueso
Weekly hours
Theory: 2
Problems: 2
Laboratory: 0
Guided learning: 0
Autonomous learning: 6
Learning Outcomes
Knowledge
K1 - Recognize the basic principles of biology, from cellular to organism scale, and how these are related to current knowledge in the fields of bioinformatics, data analysis, and machine learning; thus achieving an interdisciplinary vision with special emphasis on biomedical applications.
K2 - Identify mathematical models and statistical and computational methods that allow for solving problems in the fields of molecular biology, genomics, medical research, and population genetics.
K4 - Integrate the concepts offered by the most widely used programming languages in the field of Life Sciences to model and optimize data structures and build efficient algorithms, relating them to each other and to their application cases.
K7 - Analyze the sources of scientific information, valid and reliable, to justify the state of the art of a bioinformatics problem and to be able to address its resolution.
Skills
S1 - Integrate omics and clinical data to gain a greater understanding and a better analysis of biological phenomena.
S2 - Computationally analyze DNA, RNA and protein sequences, including comparative genome analyses, using computation, mathematics and statistics as basic tools of bioinformatics.
S3 - Solve problems in the fields of molecular biology, genomics, medical research and population genetics by applying statistical and computational methods and mathematical models.
S4 - Develop specific tools that enable solving problems on the interpretation of biological and biomedical data, including complex visualizations.
S5 - Disseminate information, ideas, problems and solutions from bioinformatics and computational biology to a general audience.
S7 - Implement programming methods and data analysis based on the development of working hypotheses within the area of study.
S8 - Make decisions, and defend them with arguments, in the resolution of problems in the areas of biology, as well as, within the appropriate fields, health sciences, computer sciences and experimental sciences.
Competences
C2 - Identify the complexity of the economic and social phenomena typical of the welfare society and relate welfare to globalization, sustainability and climate change in order to use technique, technology, economy and sustainability in a balanced and compatible way.
C3 - Communicate orally and in writing with others in the English language about learning, thinking and decision making outcomes.
C4 - Work as a member of an interdisciplinary team, either as an additional member or performing managerial tasks, in order to contribute to the development of projects (including business or research) with pragmatism and a sense of responsibility and ethical principles, assuming commitments taking into account the available resources.
Objectives
Present their work in front of their colleagues
Related competences: C3
Collaborate with other students to conduct a project assignment
Related competences: C4
Development of mathematical models for working with biological sequences during the practical assignments using the Python programming language. Different tools will be provided for visualizing the results.
Related competences: K4, K2, K7, S1, S2, S3, S4, S5, S7, S8
Developing efficient programming skills to minimize computational time and the resulting climate footprint
Related competences: C2
Understanding how sequence alignment and phylogenetics can be applied to medicine.
Related competences: K1
Contents
Theoretical Contents
T1 = Sequencing Technologies. Computational genomics. File formats for sequence data.
T2 = Pairwise sequence alignment
T3 = BLAST and Multiple Sequence Alignment
T4 = Approximate string matching aligners for short reads. Fundamentals of the Burrows-Wheeler Transform. Introduction to long-read alignment.
T5 = De novo genome assembly. Short-read assembly: de Bruijn graph and overlap-layout-consensus. Long-read and hybrid assembly. Scaffolding.
T6 = Hidden Markov Models for sequences.
T7 = Basics of Phylogenetics. Basic Algorithms in Phylogenetics.
T8 = Phylogenetics: distance-based methods.
T9 = Character-based methods. Parsimony, maximum likelihood and Bayesian phylogenetics.
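As an illustration of the dynamic programming behind T2, the following is a minimal sketch of the Needleman-Wunsch global alignment score. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for the example, not course material; only the score is computed, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of strings a and b.

    Scoring values are illustrative defaults with a linear gap penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First column and first row: aligning a prefix against only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Each cell takes the best of: (mis)match, gap in b, gap in a.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(diag, up, left)

    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))
```

The table uses O(len(a) * len(b)) time and memory; recovering the alignment itself would require an additional traceback step that is omitted here.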
Activities
Introduction to Genomic Data Analysis
Sequencing Technologies. Computational genomics. File formats for sequence data
Theory: Sequencing Technologies. Computational genomics. File formats for sequence data
Problems: Parsing FASTA and FASTQ files. Sequence analyses and visualization.
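A parsing exercise of this kind might look like the sketch below, which reads FASTA records from a text handle and computes GC content. The names `read_fasta` and `gc_content` and the toy records are hypothetical, introduced only for illustration; a real course exercise may be structured differently.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Toy in-memory input so the example needs no file on disk.
example = StringIO(">seq1 toy record\nACGT\nGGCC\n>seq2\nATAT\n")
records = list(read_fasta(example))
for name, seq in records:
    print(name, len(seq), gc_content(seq))
```

The same function works on a file opened with `open(path)`, since it only iterates over lines.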
Approximate string matching aligners for short reads. Fundamentals of the Burrows-Wheeler Transform. Introduction to long-read alignment.
Objectives: 3
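To give a flavour of the Burrows-Wheeler Transform mentioned here, the following is a naive sketch that builds the transform by sorting all rotations and inverts it by repeated sorting. The function names and the `$` terminator are assumptions for the example; practical read aligners rely on far more efficient index structures than this.

```python
def bwt(text, terminator="$"):
    """Return the Burrows-Wheeler Transform of text (naive rotation sort)."""
    if terminator in text:
        raise ValueError("text must not contain the terminator")
    text += terminator
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, terminator="$"):
    """Recover the original text from its transform (naive reconstruction)."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        # Prepend the last column to every row, then re-sort the rows.
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


transformed = bwt("GATTACA")
print(transformed, inverse_bwt(transformed))
```

Both functions are quadratic or worse in memory and time, which is acceptable for short teaching examples but not for genome-scale input.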
De novo genome assembly. Short-read assembly: de Bruijn graph and overlap-layout-consensus. Long-read and hybrid assembly. Scaffolding.
Objectives: 3
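The de Bruijn graph used in short-read assembly can be sketched in a few lines: every k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. The function name, the toy reads, and the choice of k below are assumptions for illustration; error correction and the actual path-finding step are left out.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in a k-mer.

    Repeated k-mers produce repeated edges, so edge multiplicity is kept.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


toy_graph = de_bruijn_graph(["ACGT", "CGTA"], k=3)
for node, successors in toy_graph.items():
    print(node, "->", successors)
```

Reads shorter than k contribute no edges, because the inner range is empty in that case.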
Teaching methodology
* Theoretical lectures.
* Practical programming exercises directly related to theory.
* Group project in algorithms and bioinformatics applications.
Evaluation methodology
Continuous Assessment 20%: Quizzes and submission of exercises.
Group project 20%: Evaluation rubric will be published in the subject Moodle.
Exams 60%: Evaluation rubric will be published in the subject Moodle.
Retake: The retake exam grade replaces the grade of the final exam.
Bibliography
Bioinformatics Algorithms: An Active Learning Approach - Compeau, Phillip P; Pevzner, Pavel. Active Learning Publishers, 2014. ISBN: 978-0990374602
Problems and Solutions in Biological Sequence Analysis - Borodovsky, Mark; Ekisheva, Svetlana. Cambridge University Press, 2006. ISBN: 978-0521847544
The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing - Lemey, P; Salemi, M; Vandamme, A. Cambridge University Press, 2009. ISBN: 978-0521730716
Web links
Will be provided during the lecture presentations.