CT4 - Capacity for managing the acquisition, structuring, analysis and visualization of data and information in the field of specialisation, and for critically assessing the results of this management.
Third language
CT5 - Achieving a level of spoken and written proficiency in a foreign language, preferably English, that meets the needs of the profession and the labour market.
Basic
CB6 - Ability to apply the acquired knowledge and capacity for solving problems in new or unknown environments within broader (or multidisciplinary) contexts related to their area of study.
CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.
CB10 - Possess and understand knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.
Generic Technical Competences
Generic
CG4 - Design and implement data science projects in specific domains and in an innovative way
Technical Competences
Specific
CE1 - Develop efficient algorithms based on the knowledge and understanding of the computational complexity theory and considering the main data structures within the scope of data science
CE2 - Apply the fundamentals of data management and processing to a data science problem
CE5 - Model, design, and implement complex data systems, including data visualization
CE6 - Design the Data Science process and apply scientific methodologies to obtain conclusions about populations and make decisions accordingly, from both structured and unstructured data and potentially stored in heterogeneous formats.
CE9 - Apply appropriate methods for the analysis of non-traditional data formats, such as processes and graphs, within the scope of data science
Objectives
Introduce the student to the algorithmic, computational, and statistical problems that arise in the analysis of biological data.
Related competences: CB10, CB6, CB7, CT4, CT5, CE5, CE6, CE9, CG4.
Reinforce the knowledge of discrete structures, algorithmic techniques, and statistical techniques that the student may have from previous courses.
Related competences: CT5, CE1, CE2, CE9.
Contents
Introduction to bioinformatics
Computational biology and bioinformatics. Algorithms in bioinformatics. Strings, sequences, trees, and graphs. Algorithms on strings and sequences. Representation of trees and graphs. Algorithms on trees and graphs.
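As a concrete instance of an algorithm on strings and sequences, here is a minimal Python sketch of the classic dynamic-programming edit (Levenshtein) distance with unit costs for substitutions, insertions and deletions. The function name `edit_distance` and the unit cost model are illustrative assumptions; the course may use weighted or alignment-based variants.

```python
def edit_distance(s: str, t: str) -> int:
    """Minimum number of unit-cost substitutions, insertions and deletions turning s into t."""
    # prev[j] is the distance between the prefix of s processed so far and t[:j]
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i] + [0] * len(t)
        for j, b in enumerate(t, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # match (cost 0) or substitute (cost 1)
            )
        prev = curr
    return prev[-1]


print(edit_distance("ACGT", "AGT"))  # 1: delete the C
```

Only two rows of the dynamic-programming table are kept, so memory is linear in the length of `t` while time remains quadratic.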
Agreement of phylogenetic trees
Partition distance. Nodal distance. Triplets distance. Transposition distance. Edit distance. Alignment of phylogenetic trees.
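To illustrate one of these tree comparison measures, the sketch below computes a rooted, cluster-based version of the partition (Robinson-Foulds) distance: the number of clusters (sets of descendant leaves) present in one tree but not in the other. The nested-tuple tree encoding and the function names are hypothetical choices for this example, and the course may instead define the distance on unrooted splits or normalise it.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a leaf label or a tuple of subtrees (hypothetical encoding).
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree, acc: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the cluster of every node of `tree` to `acc` and return the root's cluster."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        # the cluster of an internal node is the union of its children's clusters
        leaves = frozenset().union(*(clusters(child, acc) for child in tree))
    acc.add(leaves)
    return leaves


def partition_distance(t1: Tree, t2: Tree) -> int:
    """Size of the symmetric difference between the cluster sets of two rooted trees."""
    c1: Set[FrozenSet[str]] = set()
    c2: Set[FrozenSet[str]] = set()
    clusters(t1, c1)
    clusters(t2, c2)
    return len(c1 ^ c2)


t1 = ((("A", "B"), "C"), "D")
t2 = ((("A", "C"), "B"), "D")
print(partition_distance(t1, t2))  # 2: {A, B} only in t1, {A, C} only in t2
```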
Phylogenetic reconstruction II
Phylogenetic networks. Galled trees. Tree-child networks. Tree-sibling networks. Time consistency of phylogenetic networks. A hierarchy of phylogenetic networks.
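One of the network classes listed above can be checked directly from its definition. Under the usual definition, a phylogenetic network is tree-child when every non-leaf node has at least one child that is a tree node (in-degree at most 1). The sketch below assumes the network is given as a hypothetical adjacency dictionary of a rooted acyclic graph and does not verify acyclicity.

```python
from collections import Counter
from typing import Dict, List


def is_tree_child(children: Dict[str, List[str]]) -> bool:
    """True if every non-leaf node has at least one child of in-degree <= 1."""
    indegree = Counter(c for kids in children.values() for c in kids)
    return all(
        any(indegree[c] <= 1 for c in kids)
        for kids in children.values()
        if kids  # leaves have no children and impose no condition
    )


# h is a reticulation (two parents), but a and b each keep a tree child.
net = {"r": ["a", "b"], "a": ["h", "x"], "b": ["h", "y"], "h": ["z"]}
print(is_tree_child(net))  # True
```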
Phylogenetic reconstruction III
Phylogenies and taxonomies. Classification of metagenomic samples. The taxonomic assignment problem. Accuracy and coverage. The LCA skeleton tree.
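A common baseline for the taxonomic assignment problem is the lowest common ancestor (LCA) rule: a read that matches several taxa is assigned to their LCA in the taxonomy. The sketch below assumes the taxonomy is stored as a hypothetical child-to-parent dictionary, and the small taxonomy used is a simplified toy, not a real reference database.

```python
from typing import Dict, Iterable, List


def path_to_root(taxon: str, parent: Dict[str, str]) -> List[str]:
    """Return [taxon, parent(taxon), ..., root]."""
    path = [taxon]
    while taxon in parent:
        taxon = parent[taxon]
        path.append(taxon)
    return path


def lca(taxa: Iterable[str], parent: Dict[str, str]) -> str:
    """Lowest common ancestor of a non-empty collection of taxa."""
    paths = [path_to_root(t, parent) for t in taxa]
    common = set(paths[0]).intersection(*paths[1:])
    # the first shared node on the way up from the first taxon is the lowest one
    return next(node for node in paths[0] if node in common)


# Toy taxonomy (illustrative only).
parent = {
    "E. coli": "Escherichia", "E. albertii": "Escherichia",
    "Escherichia": "Enterobacteriaceae", "Salmonella": "Enterobacteriaceae",
    "Enterobacteriaceae": "Bacteria",
}
print(lca(["E. coli", "E. albertii"], parent))  # Escherichia
print(lca(["E. coli", "Salmonella"], parent))   # Enterobacteriaceae
```

Assigning to the LCA trades specificity for safety: a read hitting taxa in different genera is placed at family level rather than guessed at a lower rank.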
Agreement of phylogenetic networks
Path multiplicity distance. Tripartition distance. Nodal distance. Triplets distance. Edit distance. Alignment of phylogenetic networks.
Introduction to statistical genetics
Basic genetic terminology. Population-based and family-based studies. Traits, markers and polymorphisms. Single nucleotide polymorphisms and microsatellites. R-package genetics.
Hardy-Weinberg equilibrium
Hardy-Weinberg law. Hardy-Weinberg assumptions. Multiple alleles. Statistical tests for Hardy-Weinberg equilibrium: chi-square, exact and likelihood-ratio tests. Graphical representations. Disequilibrium coefficients: the inbreeding coefficient, Weir's D. R-package HardyWeinberg.
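For a biallelic marker with genotype counts n_AA, n_AB and n_BB, the chi-square test compares the observed counts with the Hardy-Weinberg expectations np², 2npq and nq², where p is the estimated frequency of allele A, on 1 degree of freedom. The sketch below is plain Python without continuity correction and is not the R HardyWeinberg package; the function name `hwe_chisq` is hypothetical. It also returns the inbreeding coefficient f = 1 − (observed heterozygosity)/(2pq), which for this test satisfies χ² = n f².

```python
import math
from typing import Tuple


def hwe_chisq(n_AA: int, n_AB: int, n_BB: int) -> Tuple[float, float, float]:
    """Chi-square HWE test at a biallelic marker, without continuity correction.

    Returns (chi-square statistic, p-value on 1 df, inbreeding coefficient f).
    Assumes the marker is polymorphic (0 < p < 1).
    """
    n = n_AA + n_AB + n_BB
    p = (2 * n_AA + n_AB) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_AB, n_BB)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    f = 1 - (n_AB / n) / (2 * p * q)  # 1 - observed / expected heterozygosity
    return chi2, p_value, f


chi2, p_value, f = hwe_chisq(298, 489, 213)
print(chi2, p_value, f)
```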
Linkage disequilibrium
Definition of linkage disequilibrium (LD). Measures for LD. Estimation of LD by maximum likelihood. Haplotypes. The HapMap project. Graphics for LD. The LD heatmap.
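The standard LD measures for two biallelic loci follow directly from the haplotype frequencies: D = p_AB − p_A p_B, the normalised D' = D / D_max (with D_max = min(p_A q_B, q_A p_B) when D ≥ 0 and min(p_A p_B, q_A q_B) when D < 0), and r² = D² / (p_A q_A p_B q_B). The sketch below assumes the four haplotype frequencies are already known (for example, estimated beforehand); the function name is hypothetical.

```python
from typing import Tuple


def ld_measures(p_AB: float, p_Ab: float, p_aB: float, p_ab: float) -> Tuple[float, float, float]:
    """Return (D, D', r^2) from the four haplotype frequencies (assumed to sum to 1).

    Assumes both loci are polymorphic, otherwise D_max and r^2 divide by zero.
    """
    p_A = p_AB + p_Ab  # frequency of allele A at locus 1
    p_B = p_AB + p_aB  # frequency of allele B at locus 2
    q_A, q_B = 1 - p_A, 1 - p_B
    D = p_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * q_B, q_A * p_B)
    else:
        d_max = min(p_A * p_B, q_A * q_B)
    d_prime = D / d_max
    r2 = D ** 2 / (p_A * q_A * p_B * q_B)
    return D, d_prime, r2


print(ld_measures(0.4, 0.1, 0.2, 0.3))  # D = 0.1, D' = 0.5, r^2 about 0.167
```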
Phase estimation
Phase ambiguity for double heterozygotes. Phase estimation with the EM algorithm. Estimation of haplotype frequencies. R-package haplo.stats.
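For two SNPs, only double heterozygotes are phase-ambiguous (AB/ab or Ab/aB), which makes the EM algorithm easy to write out. The E-step splits each double heterozygote between the two phases in proportion to p_AB·p_ab and p_Ab·p_aB, and the M-step recounts haplotypes. The following is a minimal two-locus Python sketch, not a reimplementation of the R package haplo.stats; the genotype coding (copies of allele A and of allele B) and all names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def em_haplotypes(genotypes: List[Tuple[int, int]], n_iter: int = 100,
                  tol: float = 1e-10) -> Dict[str, float]:
    """EM estimate of two-locus haplotype frequencies from unphased genotypes.

    Each genotype is (g1, g2): copies of allele A at locus 1 and of allele B at locus 2.
    """
    known: Counter = Counter()
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase ambiguous: AB/ab or Ab/aB
            continue
        # phase is determined: build both haplotypes allele by allele
        h1 = ("A" if g1 >= 1 else "a") + ("B" if g2 >= 1 else "b")
        h2 = ("A" if g1 == 2 else "a") + ("B" if g2 == 2 else "b")
        known[h1] += 1
        known[h2] += 1
    total = 2 * len(genotypes)
    freq = {h: 0.25 for h in ("AB", "Ab", "aB", "ab")}  # uniform starting point
    for _ in range(n_iter):
        # E-step: expected fraction of double heterozygotes with phase AB/ab
        cis = freq["AB"] * freq["ab"]
        trans = freq["Ab"] * freq["aB"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: haplotype frequencies from the expected haplotype counts
        new = {
            "AB": (known["AB"] + n_double_het * w) / total,
            "ab": (known["ab"] + n_double_het * w) / total,
            "Ab": (known["Ab"] + n_double_het * (1 - w)) / total,
            "aB": (known["aB"] + n_double_het * (1 - w)) / total,
        }
        converged = max(abs(new[h] - freq[h]) for h in freq) < tol
        freq = new
        if converged:
            break
    return freq


genos = [(2, 2)] * 10 + [(0, 0)] * 10 + [(1, 1)] * 5
print(em_haplotypes(genos))  # mass concentrates on AB and ab
```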
Population substructure
Definition of population substructure. Population substructure and Hardy-Weinberg equilibrium. Population substructure and LD. Statistical methods for detecting substructure. Multidimensional scaling. Metric and non-metric multidimensional scaling. Euclidean distance matrices. Stress. Graphical representations.
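Metric (classical, Torgerson) multidimensional scaling double-centres the matrix of squared distances, B = −½ J D⁽²⁾ J with J = I − 11ᵀ/n, and uses the top eigenvectors of B scaled by the square roots of their eigenvalues as coordinates. In a substructure analysis, D would typically be a matrix of genetic distances between individuals; that input, and the use of NumPy, are assumptions of this sketch.

```python
import numpy as np


def classical_mds(D: np.ndarray, k: int = 2) -> np.ndarray:
    """Classical metric MDS: k-dimensional coordinates from an n x n distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)     # B is symmetric; eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    lam = np.clip(eigvals[order], 0, None)   # negative eigenvalues mean D is not Euclidean
    return eigvecs[:, order] * np.sqrt(lam)


X = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D, 2)  # recovers X up to rotation, reflection and translation
```

When D is a Euclidean distance matrix, as in this toy example, the inter-point distances of Y reproduce D exactly; otherwise the fit is approximate, which is what the stress measure quantifies.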
Genetic association analysis
Disease-marker association studies. Genetic models: dominant, co-dominant and recessive models. Testing models with chi-square tests. The alleles test and the Cochran-Armitage trend test. Genome-wide association tests.
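The Cochran-Armitage trend test scores the genotype columns of a 2 × 3 case-control table (scores 0, 1, 2 for an additive model) and tests for a linear trend in case proportions. With case counts r_i, control counts s_i, column totals C_i, R cases, S controls and N = R + S, it uses T = Σ t_i (r_i S − s_i R) and Var(T) = (RS/N)(N Σ t_i² C_i − (Σ t_i C_i)²). The plain Python sketch below relies on the normal approximation; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage(cases: Sequence[int], controls: Sequence[int],
                     scores: Sequence[float] = (0, 1, 2)) -> Tuple[float, float]:
    """Cochran-Armitage trend test for a 2 x k table of genotype counts.

    Returns (Z statistic, two-sided p-value from the normal approximation).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    col = [r + s for r, s in zip(cases, controls)]  # genotype (column) totals
    T = sum(t * (r * S - s * R) for t, r, s in zip(scores, cases, controls))
    sum_t2c = sum(t * t * c for t, c in zip(scores, col))
    sum_tc = sum(t * c for t, c in zip(scores, col))
    var_T = R * S / N * (N * sum_t2c - sum_tc ** 2)
    z = T / math.sqrt(var_T)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for standard normal Z
    return z, p_value


z, p = cochran_armitage(cases=(10, 20, 30), controls=(30, 20, 10))
print(z, p)  # z = sqrt(20), about 4.47
```

Other genetic models correspond to other score vectors, for instance (0, 1, 1) for a dominant and (0, 0, 1) for a recessive model. The squared statistic equals N times the squared correlation between case status and genotype score.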
Family relationships and allele sharing
Identity by state (IBS) and identity by descent (IBD). Kinship coefficients. Allele sharing. Detection of family relationships. Graphical representations.
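At a biallelic marker with genotypes coded 0, 1 or 2 (copies of one chosen allele), two individuals share 2 − |g1 − g2| alleles identical by state. Counting how often they share 0, 1 or 2 alleles across many markers, and averaging, gives the raw material for detecting family relationships. The sketch below assumes complete data (no missing genotypes) and uses hypothetical function names.

```python
from typing import Dict, Sequence


def ibs_counts(g1: Sequence[int], g2: Sequence[int]) -> Dict[int, int]:
    """Number of markers at which two individuals share 0, 1 or 2 alleles IBS."""
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must have the same length")
    counts = {0: 0, 1: 0, 2: 0}
    for a, b in zip(g1, g2):
        counts[2 - abs(a - b)] += 1  # e.g. 0 vs 2 (aa vs BB) shares nothing
    return counts


def mean_ibs(g1: Sequence[int], g2: Sequence[int]) -> float:
    """Average number of IBS alleles per marker."""
    counts = ibs_counts(g1, g2)
    return (counts[1] + 2 * counts[2]) / sum(counts.values())


g1 = [0, 1, 2, 1, 0]
g2 = [0, 1, 0, 2, 1]
print(ibs_counts(g1, g2), mean_ibs(g1, g2))  # {0: 1, 1: 2, 2: 2} 1.2
```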
Objectives: 1, 2
Week: 9 (outside class hours)
Type: lab exam
Theory: 0h
Problems: 0h
Laboratory: 3h
Guided learning: 0h
Autonomous learning: 15h
Final exam Statistical Genetics
Objectives: 1, 2
Week: 18 (outside class hours)
Type: lab exam
Theory: 0h
Problems: 0h
Laboratory: 3h
Guided learning: 0h
Autonomous learning: 15h
Teaching methodology
All classes consist of a theoretical session (a lecture in which the professor introduces new concepts or techniques, with detailed examples illustrating them) followed by a practical session (in which the students work on the examples and exercises proposed in the lecture). On average, two hours a week are devoted to theory and one hour a week to practice; the professor adjusts this split according to the subject matter. Students are required to take an active part in class and to submit the exercises at the end of each class.
Evaluation methodology
Students are evaluated in class and in a final exam. Every student must submit one exercise each week, graded from 0 to 10. The final grade is 50% from the exercises and 50% from the final exam, which is also graded from 0 to 10.
Basic knowledge of algorithms and data structures.
Basic knowledge of statistics.
Basic knowledge of the Python programming language.
Basic knowledge of the R programming language.