Algorithms in Biology

Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it assumes the previous capacities listed below.
Department
UPF;UAB
This course presents the fundamentals of biological sequence analysis, from the basic algorithms to their main applications.

The subject consists of three main blocks:
- Dynamic programming and Sequence alignment: Dynamic programming. Pairwise alignment (Needleman-Wunsch and Smith-Waterman algorithms). BLAST. Multiple sequence alignment. Other applications.
- Genomic data analysis: Sequencing Technologies. Computational genomics. Main file formats for sequence data. Approximate string matching aligners for sequencing reads. Genome assembly algorithms and strategies.
- Clustering Methods and Algorithms in Genomics: Hidden Markov Models (HMM). Principal Component Analysis (PCA). Parsimony. Maximum Likelihood Methods. Genetic Algorithms.

The programming language used in this course is Python, with special emphasis on solving applied genomics and clustering problems. Following a problem-based learning approach, students will write their own scripts and/or use existing bioinformatics tools for different challenges. We will encourage the use of Python libraries (for statistics and plots) and classes.
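
As an illustration of the kind of script written in the problem sessions, the following is a minimal sketch of Needleman-Wunsch global alignment scoring with dynamic programming. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for this example, not values prescribed by the course.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the optimal global alignment score of sequences a and b.

    The scoring parameters are illustrative defaults (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty sequence costs one gap per residue.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


print(needleman_wunsch_score("ACGT", "AGT"))  # three matches and one gap
```

This version returns only the score; recovering the alignment itself would additionally require a traceback through the matrix.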

Teachers

Person in charge

  • Arnau Cordomí Montoya

Others

  • Donate Weghorn
  • Emanuele Raineri
  • Oscar Lao Grueso

Weekly hours

Theory
2
Problems
2
Laboratory
0
Guided learning
0
Autonomous learning
6

Learning Outcomes

Knowledge

  • K1 - Recognize the basic principles of biology, from cellular to organism scale, and how these are related to current knowledge in the fields of bioinformatics, data analysis, and machine learning; thus achieving an interdisciplinary vision with special emphasis on biomedical applications.
  • K2 - Identify mathematical models and statistical and computational methods that allow for solving problems in the fields of molecular biology, genomics, medical research, and population genetics.
  • K4 - Integrate the concepts offered by the most widely used programming languages in the field of Life Sciences to model and optimize data structures and build efficient algorithms, relating them to each other and to their application cases.
  • K7 - Analyze the sources of scientific information, valid and reliable, to justify the state of the art of a bioinformatics problem and to be able to address its resolution.

Skills

  • S1 - Integrate omics and clinical data to gain a greater understanding and a better analysis of biological phenomena.
  • S2 - Computationally analyze DNA, RNA and protein sequences, including comparative genome analyses, using computation, mathematics and statistics as basic tools of bioinformatics.
  • S3 - Solve problems in the fields of molecular biology, genomics, medical research and population genetics by applying statistical and computational methods and mathematical models.
  • S4 - Develop specific tools that enable solving problems on the interpretation of biological and biomedical data, including complex visualizations.
  • S5 - Disseminate information, ideas, problems and solutions from bioinformatics and computational biology to a general audience.
  • S7 - Implement programming methods and data analysis based on the development of working hypotheses within the area of study.
  • S8 - Make decisions, and defend them with arguments, in the resolution of problems in the areas of biology, as well as, within the appropriate fields, health sciences, computer sciences and experimental sciences.

Competences

  • C2 - Identify the complexity of the economic and social phenomena typical of the welfare society and relate welfare to globalization, sustainability and climate change in order to use technique, technology, economy and sustainability in a balanced and compatible way.
  • C3 - Communicate orally and in writing with others in the English language about learning, thinking and decision making outcomes.
  • C4 - Work as a member of an interdisciplinary team, either as an additional member or performing managerial tasks, in order to contribute to the development of projects (including business or research) with pragmatism and a sense of responsibility and ethical principles, assuming commitments taking into account the available resources.

Objectives

  1. Present their work in front of their colleagues
    Related competences: C3
  2. Collaborate with other students to conduct a project assignment
    Related competences: C4
  3. Develop mathematical models for working with biological sequences during the practical assignments, using the Python programming language. Different tools will be provided for visualizing the results.
    Related competences: K2, K4, K7, S1, S2, S3, S4, S5, S7, S8
  4. Develop efficient programming skills to minimize computational time and the associated carbon footprint
    Related competences: C2
  5. Understand how sequence alignment and phylogenetics can be applied to medicine
    Related competences: K1

Contents

  1. Theoretical Contents
    T1. Introduction to sequence alignment
    T2. Scoring functions
    T3. Global and Local Pairwise Sequence Alignment (Dynamic Programming)
    T4. Basic Local Alignment Search Tool (BLAST)
    T5. Advanced dynamic programming
    T6. Multiple Sequence Alignment
    T7. Sequencing Technologies and Computational Genomics Foundations
    T8. Short Read Alignment and Compressed Indexing
    T9. Genome Assembly Algorithms
    T10. Introduction to Phylogenetic Trees and Algorithms
    T11. Distance-Based Methods
    T12. Character-Based Methods

Activities

Activity Evaluation act


Introduction to sequence alignment


Objectives: 3
Contents:
Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

BLAST


  • Problems: 2 Groups of Students

Contents:
Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Approximate string matching aligners for short reads. Fundamentals of the Burrows-Wheeler Transform. Introduction to long-read alignment.

Objectives: 3
Contents:
Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

De novo genome assembly. Short-read assembly: de Bruijn graph and overlap-layout-consensus. Long-read and hybrid assembly. Scaffolding.

Objectives: 3
Contents:
Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Sequencing Technologies and Computational Genomics Foundations


Objectives: 3 4
Contents:
Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Basics of Phylogenetics. Basic Algorithms in Phylogenetics.

Objectives: 5
Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Phylogenetics: distance-based methods.

Objectives: 3 5
Contents:
Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Character-based methods. Parsimony, maximum likelihood & Bayesian phylogenetics.

Objectives: 3 4 5
Contents:
Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Group project in algorithms and Bioinformatic applications.

Objectives: 1 2 4
Contents:
Theory
2.4h
Problems
2.4h
Laboratory
0h
Guided learning
0h
Autonomous learning
18h

Scoring functions



Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Global and Local Pairwise Sequence Alignment



Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Advanced dynamic programming



Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Multiple Sequence Alignment



Theory
2.3h
Problems
2.3h
Laboratory
0h
Guided learning
0h
Autonomous learning
6h

Teaching methodology

Problem-based learning approach:

- Theoretical lectures.
- Practical programming exercises directly related to theory.
- Group project in algorithms and Bioinformatic applications.

Evaluation methodology

- Continuous Assessment (CA) 20%: Quizzes and submission of exercises.

- Group Project (GP) 20%: Assessed using a rubric that will be published on the course Moodle page.

- Exams 60%: Mid-term Exam (ME) 30%, Final Exam (FE) 30%. Evaluation rubrics for the exams will be published on the course Moodle page.

- Retake: Consists of two exams (E1 and E2), each corresponding to a subject block. The final grade after the retake will be calculated as: 20% CA + 20% GP + 30% max(ME, E1) + 30% max(FE, E2).
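
The weighting above can be checked with a short Python sketch. The function name, the argument names and the example marks are hypothetical; the sketch assumes all marks share one scale (for instance 0 to 10) and that a retake exam replaces the corresponding exam only when its mark is higher, as the max() terms in the formula imply.

```python
def final_grade(ca, gp, me, fe, e1=None, e2=None):
    """Weighted course grade on the same scale as the input marks.

    ca, gp, me, fe: Continuous Assessment, Group Project, Mid-term Exam
    and Final Exam marks. e1, e2: optional retake exam marks.
    """
    # Without a retake mark, the original exam mark is used unchanged.
    midterm = me if e1 is None else max(me, e1)
    final = fe if e2 is None else max(fe, e2)
    return 0.20 * ca + 0.20 * gp + 0.30 * midterm + 0.30 * final


# Hypothetical marks: the retake improves the mid-term block only.
print(round(final_grade(8, 7, 4, 5), 2))              # 5.7
print(round(final_grade(8, 7, 4, 5, e1=6, e2=3), 2))  # 6.3
```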

Previous capacities

Applied Programming I, II and III