
Algorithms in Biology

Credits: 6
Type: Compulsory
Requirements: This subject has no requirements, but it does have previous capacities.
Department: UPF; UAB
This course presents the fundamentals of biological sequence data analysis, from basic algorithms to their main applications.

The subject consists of three main blocks:
- Dynamic programming and Sequence alignment: Dynamic programming. Pairwise alignment (Needleman-Wunsch and Smith-Waterman algorithms). BLAST. Multiple sequence alignment. Other applications.
- Genomic data analysis: Sequencing Technologies. Computational genomics. Main file formats for sequence data. Approximate string matching aligners for sequencing reads. Genome assembly algorithms and strategies.
- Clustering Methods and Algorithms in Genomics: Hidden Markov Models (HMM). Principal Component Analysis (PCA). Parsimony. Maximum Likelihood Methods. Genetic Algorithms.

The programming language used in this course is Python, with special emphasis on solving applied genomics and clustering problems. Following a problem-based learning approach, students will write their own scripts and/or use existing bioinformatics approaches to tackle different challenges. We will encourage the use of Python libraries (for statistics and plotting) and of classes.

Teachers

Person in charge

Others

Weekly hours

Theory: 2
Problems: 2
Laboratory: 0
Guided learning: 0
Autonomous learning: 6

Competences

Knowledge

  • K1 - Recognize the basic principles of biology, from cellular to organism scale, and how these are related to current knowledge in the fields of bioinformatics, data analysis, and machine learning; thus achieving an interdisciplinary vision with special emphasis on biomedical applications.
  • K2 - Identify mathematical models and statistical and computational methods that allow for solving problems in the fields of molecular biology, genomics, medical research, and population genetics.
  • K4 - Integrate the concepts offered by the most widely used programming languages in the field of Life Sciences to model and optimize data structures and build efficient algorithms, relating them to each other and to their application cases.
  • K7 - Analyze valid and reliable sources of scientific information to justify the state of the art of a bioinformatics problem and to be able to address its resolution.
Skills

  • S1 - Integrate omics and clinical data to gain a greater understanding and a better analysis of biological phenomena.
  • S2 - Computationally analyze DNA, RNA and protein sequences, including comparative genome analyses, using computation, mathematics and statistics as basic tools of bioinformatics.
  • S3 - Solve problems in the fields of molecular biology, genomics, medical research and population genetics by applying statistical and computational methods and mathematical models.
  • S4 - Develop specific tools that enable solving problems on the interpretation of biological and biomedical data, including complex visualizations.
  • S5 - Disseminate information, ideas, problems and solutions from bioinformatics and computational biology to a general audience.
  • S7 - Implement programming methods and data analysis based on the development of working hypotheses within the area of study.
  • S8 - Make decisions, and defend them with arguments, when solving problems in biology and, within the appropriate fields, in the health sciences, computer science and experimental sciences.
Competences

  • C2 - Identify the complexity of the economic and social phenomena typical of the welfare society and relate welfare to globalization, sustainability and climate change in order to use technique, technology, economy and sustainability in a balanced and compatible way.
  • C3 - Communicate orally and in writing with others in the English language about learning, thinking and decision making outcomes.
  • C4 - Work as a member of an interdisciplinary team, either as an additional member or performing managerial tasks, in order to contribute to the development of projects (including business or research) with pragmatism and a sense of responsibility and ethical principles, assuming commitments taking into account the available resources.
Objectives

    1. Present their work in front of their colleagues.
      Related competences: C3
    2. Collaborate with other students to conduct a project assignment.
      Related competences: C4
    3. Develop mathematical models for working with biological sequences during the practical assignments, using the Python programming language. Different tools will be provided for visualizing the results.
      Related competences: K2, K4, K7, S1, S2, S3, S4, S5, S7, S8
    4. Develop efficient programming skills that minimize computational time and the carbon footprint that contributes to global climate change.
      Related competences: C2
    5. Understand how sequence alignment and phylogenetics can be applied to medicine.
      Related competences: K1

    Contents

    1. Theoretical Contents
      T1. Introduction to sequence alignment
      T2. Scoring functions
      T3. Global and Local Pairwise Sequence Alignment (Dynamic Programming)
      T4. Basic Local Alignment Search Tool (BLAST)
      T5. Advanced dynamic programming
      T6. Multiple Sequence Alignment
      T7. Sequencing Technologies and Computational Genomics Foundations
      T8. Short Read Alignment and Compressed Indexing
      T9. Genome Assembly Algorithms
      T10. Introduction to Phylogenetic Trees and Algorithms
      T11. Distance-Based Methods
      T12. Character-Based Methods

    Activities



    Introduction to sequence alignment


    Objectives: 3
    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h

    BLAST


    • Problems: 2 Groups of Students

    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h
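The seed-and-extend idea behind BLAST can be sketched in a few lines of Python. This is a simplified illustration, not BLAST itself: it uses exact word matches of a chosen length `w` (closer to nucleotide BLAST than to protein BLAST, which also uses neighborhood words scoring above a threshold), a naive ungapped X-drop extension, and no gapped extension or E-value statistics. The function names and scoring parameters are hypothetical.

```python
from collections import defaultdict


def find_seeds(query: str, db: str, w: int) -> list[tuple[int, int]]:
    """Return (query_pos, db_pos) pairs where a word of length w matches exactly."""
    # Index every word of the database sequence by its start positions
    index: dict[str, list[int]] = defaultdict(list)
    for j in range(len(db) - w + 1):
        index[db[j:j + w]].append(j)
    return [(i, j)
            for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def extend_seed(query: str, db: str, i: int, j: int, w: int,
                match: int = 1, mismatch: int = -1,
                x_drop: int = 2) -> tuple[int, int, int]:
    """Ungapped X-drop extension of a seed in both directions.

    Returns (score, query_start, query_end) of the best extended hit.
    """
    def extend(qi: int, dj: int, step: int) -> tuple[int, int]:
        score = best = best_len = k = 0
        while 0 <= qi + k * step < len(query) and 0 <= dj + k * step < len(db):
            same = query[qi + k * step] == db[dj + k * step]
            score += match if same else mismatch
            k += 1
            if score > best:
                best, best_len = score, k
            elif best - score > x_drop:  # stop once the score falls too far
                break
        return best, best_len

    right, r_len = extend(i + w, j + w, 1)
    left, l_len = extend(i - 1, j - 1, -1)
    return w * match + left + right, i - l_len, i + w + r_len
```

For example, `find_seeds("ACGTAC", "TTACGTACTT", 4)` returns `[(0, 2), (1, 3), (2, 4)]`, and extending the first seed with `extend_seed("ACGTAC", "TTACGTACTT", 0, 2, 4)` returns `(6, 0, 6)`: the whole query aligns without mismatches.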

    Approximate string matching aligners for short reads. Fundamentals of the Burrows-Wheeler Transform. Introduction to long-read alignment.
    Objectives: 3
    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h
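A naive sketch of the Burrows-Wheeler Transform and of FM-index backward search for exact pattern counting is shown below. It builds the BWT by sorting all rotations and recounts character occurrences on the fly, so it is only suitable for small examples; practical read aligners build the BWT from a suffix array and use sampled occurrence tables, and they also support mismatches, which this sketch does not. The `$` sentinel and the function names are assumptions for the example.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler Transform of text + '$' via sorted rotations (naive)."""
    if "$" in text:
        raise ValueError("text must not contain the sentinel '$'")
    s = text + "$"  # '$' sorts before letters in ASCII
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def backward_search_count(bwt_str: str, pattern: str) -> int:
    """Count exact occurrences of pattern with FM-index backward search."""
    # C[c]: number of characters in the text that sort strictly before c
    C: dict[str, int] = {}
    for i, c in enumerate(sorted(bwt_str)):
        C.setdefault(c, i)

    def occ(c: str, i: int) -> int:
        # Occurrences of c in bwt_str[:i]; real indexes use sampled tables
        return bwt_str[:i].count(c)

    top, bottom = 0, len(bwt_str)  # current suffix-array interval [top, bottom)
    for c in reversed(pattern):
        if c not in C:
            return 0
        top, bottom = C[c] + occ(c, top), C[c] + occ(c, bottom)
        if top >= bottom:
            return 0
    return bottom - top
```

For example, `bwt("banana")` returns `"annb$aa"`, and `backward_search_count(bwt("banana"), "ana")` returns `2`. The pattern is processed from its last character to its first, narrowing the interval of matching suffixes at each step.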

    De novo genome assembly. Short-read assembly: de Bruijn graphs and Overlap-Layout-Consensus. Long-read and hybrid assembly. Scaffolding.
    Objectives: 3
    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h
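The de Bruijn graph approach can be illustrated with a toy assembler: nodes are (k-1)-mers, each distinct k-mer adds an edge from its prefix to its suffix, and the sequence is spelled out along an Eulerian path (found here with Hierholzer's algorithm). This sketch assumes error-free reads, full coverage and no repeats of length k or longer, so that the Eulerian path is unique; real assemblers must also deal with sequencing errors, repeats, reverse complements and coverage gaps. The function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, list[str]]:
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph: dict[str, list[str]] = defaultdict(list)
    for kmer in sorted(kmers):  # sorted for a deterministic edge order
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph: dict[str, list[str]]) -> list[str]:
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg: dict[str, int] = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at a node with one more outgoing than incoming edge, if any
    start = next(iter(graph))
    for u, targets in graph.items():
        if len(targets) - in_deg[u] == 1:
            start = u
            break
    remaining = {u: list(targets) for u, targets in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: add node to the path
    return path[::-1]


def spell_path(path: list[str]) -> str:
    """Glue consecutive (k-1)-mers, which overlap by k-2 characters."""
    return path[0] + "".join(node[-1] for node in path[1:])
```

With the overlapping reads `["ATGGCG", "GGCGTG", "CGTGCA"]` and `k = 4`, `spell_path(eulerian_path(de_bruijn_graph(reads, 4)))` reconstructs `"ATGGCGTGCA"`.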

    Sequencing Technologies and Computational Genomics Foundations


    Objectives: 3, 4
    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h

    Basics of Phylogenetics. Basic Algorithms in Phylogenetics.
    Objectives: 5
    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h

    Phylogenetics: distance-based methods.
    Objectives: 3, 5
    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h
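As one example of a distance-based method, UPGMA repeatedly merges the two closest clusters and recomputes distances as size-weighted averages. The sketch below returns only the Newick topology (branch lengths omitted) and assumes unique leaf names and a symmetric distance matrix. UPGMA also assumes a constant rate of evolution (ultrametric distances), so it is shown here as an illustration rather than as the method of choice. Function and variable names are hypothetical.

```python
from itertools import combinations


def upgma(names: list[str], matrix: list[list[float]]) -> str:
    """Return a Newick topology (without branch lengths) built by UPGMA."""
    # Cluster label (its Newick string) -> number of leaves it contains
    sizes = {name: 1 for name in names}
    dist = {frozenset((names[i], names[j])): matrix[i][j]
            for i, j in combinations(range(len(names)), 2)}
    while len(sizes) > 1:
        # Merge the closest pair of current clusters
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Size-weighted average distance from the new cluster to the others
        for c in sizes:
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes)) + ";"
```

For four taxa where A and B are closest (distance 2), C is at distance 6 from both, and D is at distance 10 from all others, the sketch returns `"(D,(C,(A,B)));"`.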

    Character-based methods: parsimony, maximum likelihood and Bayesian phylogenetics.
    Objectives: 3, 4, 5
    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h
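For parsimony, Fitch's algorithm computes the minimum number of substitutions needed to explain one alignment column on a fixed rooted binary tree, and summing over columns scores the whole tree. In this sketch, trees are nested tuples of leaf names (a hypothetical representation), and only the small parsimony problem (scoring a given topology) is solved, not the search over topologies.

```python
from typing import Union

Tree = Union[str, tuple]  # a leaf (str) or a pair (left, right)


def fitch(tree: Tree) -> tuple[set[str], int]:
    """Fitch small parsimony for one site; leaves are character states.

    Returns (candidate root states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    common = left_set & right_set
    if common:  # children can agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(topology: Tree, seqs: dict[str, str]) -> int:
    """Sum Fitch costs over all columns; topology leaves are sequence names."""
    def column_tree(node: Tree, i: int) -> Tree:
        if isinstance(node, str):
            return seqs[node][i]
        return (column_tree(node[0], i), column_tree(node[1], i))

    length = len(next(iter(seqs.values())))
    return sum(fitch(column_tree(topology, i))[1] for i in range(length))
```

For instance, `fitch((("A", "C"), ("A", "G")))` returns `({"A"}, 2)`: two substitutions suffice if the root state is A.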

    Group project in algorithms and Bioinformatic applications.
    Objectives: 1, 2, 4
    Theory: 2.4h
    Problems: 2.4h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 18h

    Scoring functions

    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h

    Global and Local Pairwise Sequence Alignment



    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h
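The dynamic-programming recurrence shared by Needleman-Wunsch (global) and Smith-Waterman (local) alignment can be sketched as a single score-only function. It assumes a simple match/mismatch scheme and a linear gap penalty with illustrative default values; affine gap penalties, substitution matrices and the traceback step are omitted to keep it short.

```python
def align_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = -2, local: bool = False) -> int:
    """Needleman-Wunsch (global) or Smith-Waterman (local) score, linear gaps."""
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not local:
        # Global: leading gaps are penalized along the first row and column
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # (mis)match
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a
            if local:
                F[i][j] = max(F[i][j], 0)       # a local alignment may restart
                best = max(best, F[i][j])
    return best if local else F[-1][-1]
```

For example, `align_score("ACGT", "AGT")` returns `1` (three matches and one gap), while `align_score("TTACGTAA", "GGACGTCC", local=True)` returns `4`, the score of the shared `ACGT` core.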

    Advanced dynamic programming



    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h

    Multiple Sequence Alignment



    Theory: 2.3h
    Problems: 2.3h
    Laboratory: 0h
    Guided learning: 0h
    Autonomous learning: 6h
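A common way to score a multiple sequence alignment is the sum-of-pairs (SP) score: every pair of symbols in each column is scored and the results are added. The sketch assumes illustrative match/mismatch/linear-gap values, `-` as the gap symbol, and gap-gap pairs scored as 0, which is one common convention; the function name is hypothetical.

```python
from itertools import combinations


def sum_of_pairs(msa: list[str], match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> int:
    """Sum-of-pairs score of an alignment given as equal-length rows."""
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows must have the same length")
    score = 0
    for column in zip(*msa):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # gap-gap pairs are scored as 0
            if x == "-" or y == "-":
                score += gap
            else:
                score += match if x == y else mismatch
    return score
```

For the three rows `["AC-", "ACG", "A-G"]`, the columns score 3, -3 and -3, so `sum_of_pairs` returns `-3`.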

    Teaching methodology

    Problem-based learning approach:

    - Theoretical lectures.
    - Practical programming exercises directly related to theory.
    - Group project in algorithms and Bioinformatic applications.

    Evaluation methodology

    - Continuous Assessment (CA) 20%: Quizzes and submission of exercises.

    - Group Project (GP) 20%: Assessed using a rubric that will be published on the course Moodle page.

    - Exams 60%: Mid-term Exam (ME) 30%, Final Exam (FE) 30%. Evaluation rubrics for the exams will be published on the course Moodle page.

    - Retake: Consists of two exams (E1 and E2), each corresponding to one subject block. The final grade after the retake is calculated as: 20% CA + 20% GP + 30% max(ME, E1) + 30% max(FE, E2).
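The retake formula can be transcribed directly; the `max` terms mean that each retake exam can only raise, never lower, the corresponding exam component. The function name is hypothetical, and all component grades are assumed to use the same scale.

```python
def retake_grade(ca: float, gp: float, me: float, fe: float,
                 e1: float, e2: float) -> float:
    """Final grade after the retake: the better of each exam and its retake counts."""
    return 0.2 * ca + 0.2 * gp + 0.3 * max(me, e1) + 0.3 * max(fe, e2)
```

For example, with CA = 8, GP = 7, ME = 4, FE = 6, E1 = 6 and E2 = 5, the grade is 1.6 + 1.4 + 1.8 + 1.8, approximately 6.6.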

    Bibliography

    Basic

    Previous capacities

    Applied Programming I, II and III