Algorithms in Biology


Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it assumes prior knowledge (see Previous capacities).
Department
UPF
This course presents the fundamentals of biological sequence analysis, from the basic algorithms to their main applications.

The subject consists of three main blocks:

1. Genomic data analysis: Sequencing Technologies. Computational genomics. Main file formats for sequence data. Approximate string matching aligners for sequencing reads. Genome assembly algorithms and strategies.
2. Dynamic programming and Sequence alignment: Dynamic programming. Pairwise alignment (Needleman-Wunsch and Smith-Waterman algorithms). BLAST. Multiple sequence alignment. Other applications.
3. Clustering Methods and Algorithms in Genomics: Hidden Markov Models (HMM). Principal Component Analysis (PCA). Parsimony. Maximum Likelihood Methods. Genetic Algorithms.

The programming language used in this course is Python, with special emphasis on solving applied genomics and clustering problems. Following a problem-based learning approach, students will write their own scripts and/or use existing bioinformatics tools to tackle different challenges. We will encourage the use of Python libraries (for statistics and plotting) and classes.

Teachers

Person in charge

  • Arnau Cordomí Montoya
  • Fernando Cruz Rodríguez
  • Oscar Lao Grueso

Weekly hours

Theory
2
Problems
2
Laboratory
0
Guided learning
0
Autonomous learning
6

Learning Outcomes

Knowledge

  • K1 - Recognize the basic principles of biology, from cellular to organism scale, and how these are related to current knowledge in the fields of bioinformatics, data analysis, and machine learning; thus achieving an interdisciplinary vision with special emphasis on biomedical applications.
  • K2 - Identify mathematical models and statistical and computational methods that allow for solving problems in the fields of molecular biology, genomics, medical research, and population genetics.
  • K4 - Integrate the concepts offered by the most widely used programming languages in the field of Life Sciences to model and optimize data structures and build efficient algorithms, relating them to each other and to their application cases.
  • K7 - Analyze the sources of scientific information, valid and reliable, to justify the state of the art of a bioinformatics problem and to be able to address its resolution.

Skills

  • S1 - Integrate omics and clinical data to gain a greater understanding and a better analysis of biological phenomena.
  • S2 - Computationally analyze DNA, RNA and protein sequences, including comparative genome analyses, using computation, mathematics and statistics as basic tools of bioinformatics.
  • S3 - Solve problems in the fields of molecular biology, genomics, medical research and population genetics by applying statistical and computational methods and mathematical models.
  • S4 - Develop specific tools that enable solving problems on the interpretation of biological and biomedical data, including complex visualizations.
  • S5 - Disseminate information, ideas, problems and solutions from bioinformatics and computational biology to a general audience.
  • S7 - Implement programming methods and data analysis based on the development of working hypotheses within the area of study.
  • S8 - Make decisions, and defend them with arguments, in the resolution of problems in the areas of biology, as well as, within the appropriate fields, health sciences, computer sciences and experimental sciences.

Competences

  • C2 - Identify the complexity of the economic and social phenomena typical of the welfare society and relate welfare to globalization, sustainability and climate change in order to use technique, technology, economy and sustainability in a balanced and compatible way.
  • C3 - Communicate orally and in writing with others in the English language about learning, thinking and decision making outcomes.
  • C4 - Work as a member of an interdisciplinary team, either as an additional member or performing managerial tasks, in order to contribute to the development of projects (including business or research) with pragmatism and a sense of responsibility and ethical principles, assuming commitments taking into account the available resources.

Objectives

  1. Present their work in front of their colleagues.
    Related competences: C3
  2. Collaborate with other students to conduct a project assignment.
    Related competences: C4
  3. Develop mathematical models for working with biological sequences during the practical assignments, using the Python programming language. Different tools will be provided for visualizing the results.
    Related competences: K4, K2, K7, S1, S2, S3, S4, S5, S7, S8
  4. Develop efficient programming skills to minimize computation time and the associated climate-change footprint.
    Related competences: C2
  5. Understand how sequence alignment and phylogenetics can be applied to medicine.
    Related competences: K1

Contents

  1. Theoretical Contents
    T1 = Sequencing Technologies. Computational genomics. File formats for sequence data.
    T2 = Pairwise sequence alignment
    T3 = BLAST and Multiple Sequence Alignment
    T4 = Approximate string matching aligners for short reads. Fundamentals of the Burrows-Wheeler Transform. Introduction to long-read alignment.
    T5 = De novo genome assembly. Short-read assembly: de Bruijn graph and overlap-layout-consensus. Long-read and hybrid assembly. Scaffolding.
    T6 = Hidden Markov Models for sequences.
    T7 = Basics of Phylogenetics. Basic Algorithms in Phylogenetics.
    T8 = Phylogenetics: distance-based methods.
    T9 = Character-based methods. Parsimony, maximum likelihood & Bayesian phylogenetics.

Activities



Introduction to Genomic Data Analysis

Sequencing Technologies. Computational genomics. File formats for sequence data
  • Theory: Sequencing Technologies. Computational genomics. File formats for sequence data
  • Problems: Parsing FASTA and FASTQ files. Sequence analysis and visualization.
Objectives: 3 5
Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h
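
As an illustration of the kind of exercise covered in the FASTA parsing problems session, here is a minimal sketch that is not taken from the course material. The function names `parse_fasta` and `gc_content` are hypothetical, and the parser assumes well-formed input in which every sequence follows a `>` header line.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {record_id: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the ID.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may wrap over several lines; concatenate them.
            records[current_id] += line
    return records


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
```

Because `parse_fasta` accepts any iterable of lines, it can be called on an open file handle as well as on a list of strings.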

Pairwise sequence alignment

Objectives: 3
Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h
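
To make the dynamic-programming idea behind global pairwise alignment concrete, the following sketch computes the Needleman-Wunsch alignment score with a linear gap penalty. The scoring values (`match=1`, `mismatch=-1`, `gap=-2`) and the function name are assumptions chosen for illustration, not parameters prescribed by the course.

```python
def needleman_wunsch_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -2) -> int:
    """Optimal global alignment score of a and b (linear gap penalty)."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]
```

Only two rows of the matrix are kept, so memory grows with the length of `b` rather than with the product of both lengths. Recovering the alignment itself would require storing the full matrix or traceback pointers.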

BLAST and Multiple Sequence Alignment

  • Problems: 2 Groups of Students

Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Approximate string matching aligners for short reads. Fundamentals of the Burrows-Wheeler Transform. Introduction to long-read alignment.
Objectives: 3
Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h
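
The Burrows-Wheeler Transform named in this activity can be sketched naively by sorting all rotations of the text. This is an illustrative version only: the function names are hypothetical, the `$` terminator is an assumed convention, and real read aligners build the transform from suffix arrays rather than from explicit rotations.

```python
def bwt(text: str, terminator: str = "$") -> str:
    """Burrows-Wheeler Transform via sorted rotations (naive sketch)."""
    if terminator in text:
        raise ValueError("text must not contain the terminator")
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str, terminator: str = "$") -> str:
    """Invert the transform by repeatedly prepending and sorting."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(last[i] + table[i] for i in range(len(last)))
    # The original text is the row that ends with the terminator.
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]
```

For example, `bwt("banana")` returns `"annb$aa"`, and `inverse_bwt` recovers `"banana"` from that string.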

De novo genome assembly. Short-read assembly: de Bruijn graph and overlap-layout-consensus. Long-read and hybrid assembly. Scaffolding.
Objectives: 3
Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h
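
As a rough illustration of the de Bruijn graph used in short-read assembly, the sketch below builds the graph as an adjacency list in which nodes are (k-1)-mers and every k-mer in a read contributes one edge. The function name `de_bruijn_graph` is hypothetical, and the sketch ignores sequencing errors and reverse complements, which a real assembler has to handle.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def de_bruijn_graph(reads: Iterable[str], k: int) -> Dict[str, List[str]]:
    """Map each (k-1)-mer prefix to the list of (k-1)-mer suffixes it reaches."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph: Dict[str, List[str]] = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Edge from the k-mer's prefix node to its suffix node.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

Repeated k-mers produce repeated edges, so the result is a multigraph. An assembly would then correspond to a path that traverses these edges, which is outside the scope of this sketch.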

Hidden Markov Models for sequences

Objectives: 3 4
Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Basics of Phylogenetics. Basic Algorithms in Phylogenetics.

Objectives: 5
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Phylogenetics: distance-based methods.
Objectives: 3 5
Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Character-based methods. Parsimony, maximum likelihood & Bayesian phylogenetics.
Objectives: 3 4 5
Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Group project in algorithms and bioinformatics applications.
Objectives: 1 2 4
Contents:
Theory
3h
Problems
3h
Laboratory
0h
Guided learning
0h
Autonomous learning
36h

Teaching methodology

Problem-based Learning approach:

* Theoretical lectures.
* Practical programming exercises directly related to theory.
* Group project in algorithms and bioinformatics applications.

Evaluation methodology

Continuous Assessment 20%: Quizzes and submission of exercises.
Group project 20%: Evaluation rubric will be published in the subject Moodle.
Exams 60%: Evaluation rubric will be published in the subject Moodle.

Retake: The retake exam grade replaces the grade of the final exam.

Bibliography

Basic:

  • Biological sequence analysis [Electronic resource]: probabilistic models of proteins and nucleic acids - Durbin, Richard, Cambridge University Press, 1998. ISBN: 9780521620413
    https://discovery.upc.edu/discovery/fulldisplay?docid=alma991000581539706711&context=L&vid=34CSUC_UPC:VU1
  • Bioinformatics algorithms: an active learning approach - Compeau, Phillip P.; Pevzner, Pavel, Active Learning Publishers, 2014. ISBN: 9780990374602
  • Problems and Solutions in Biological Sequence Analysis - Borodovsky, Mark; Ekisheva, Svetlana, Cambridge University Press, 2006. ISBN: 9780521847544
  • The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing - Lemey, P.; Salemi, M.; Vandamme, A., Cambridge University Press, 2009. ISBN: 9780521730716

Web links

  • Will be provided during the lecture presentations.

Previous capacities

Applied Programming I, II and III