The course presents basic algorithmic techniques applied to Bioinformatics problems, with attention to the strengths and limitations of each technique. It also describes the common data elements and formats used to represent biological data. During the course, students will acquire the knowledge needed to tackle programming problems of a biological nature of small and medium complexity, making sensible choices between standard packages and pragmatic algorithmic implementations for specific problems.
At the end of the course, students:
1. will know the basic concepts of programming, algorithmics and information management needed to solve problems of a biological nature with computer programs.
2. will be able to use the main algorithmic schemes, and some of their variants, that frequently appear in common Bioinformatics problems.
3. will recognize the cases of application of the main methods used in Bioinformatics to access data stored in computers, with special attention to efficient mechanisms for sequence treatment.
4. will know how to integrate access to large biological databases with access to other local information structures and combine them appropriately with the necessary algorithmic concepts.
5. will know how to interface with external tools and use common libraries that extend the functionality and improve the performance of Python programs.
The programming language used in this course is Python, which will be complemented with the occasional use of tools from the Operating System or external applications.
Teachers
Person in charge
Alexis Molina Martinez de los Reyes
Gabriel Valiente Feruglio
Miquel Angel Senar Rosell
Weekly hours
Theory: 2
Problems: 2
Laboratory: 0
Guided learning: 0
Autonomous learning: 6
Learning Outcomes
Knowledge
K3 - Identify the mathematical foundations, computational theories, algorithmic schemes and information organization principles applicable to the modeling of biological systems and to the efficient solution of bioinformatics problems through the design of computational tools.
K4 - Integrate the concepts offered by the most widely used programming languages in the field of Life Sciences to model and optimize data structures and build efficient algorithms, relating them to each other and to their application cases.
K5 - Identify the nature of the biological variables that need to be analyzed, as well as the mathematical models, algorithms, and statistical tests appropriate to develop and evaluate statistical analyses and computational tools.
Skills
S2 - Computationally analyze DNA, RNA and protein sequences, including comparative genome analyses, using computation, mathematics and statistics as basic tools of bioinformatics.
S7 - Implement programming methods and data analysis based on the development of working hypotheses within the area of study.
S8 - Make decisions, and defend them with arguments, in the resolution of problems in the areas of biology, as well as, within the appropriate fields, health sciences, computer sciences and experimental sciences.
Competences
C6 - Detect deficiencies in one's own knowledge and overcome them through critical reflection and by choosing the best course of action to expand that knowledge.
Objectives
Understand how to build a program and use additional tools to solve problems that use bioinformatics data.
Related competences: C6, K3, K4, S7
Understand the format and semantics of the basic data structures used to represent biological data: sequences, genomes, etc.
Related competences: K3, K5, S2, C6
Understand the most common operations that apply to bioinformatics data files and develop programs to perform them.
Related competences: C6, K3, K4, K5, S2, S7
Understand basic algorithm principles that are used to solve sequence alignment and pattern matching problems.
Related competences: K3, K4, K5, S2, S7, C6
Analyze solutions regarding time and memory cost and use programming components to improve performance.
Related competences: C6, K3, S7, S8
Contents
Introduction and Python recap
Python basics, flow control, functions, lists, dictionaries and structured data.
Making sense of sequences and parsing mechanisms
Sequences, strings, and genomic data. Sequence file formats. Biological databases. Sequence and genomic data manipulation: Biopython and other common tools.
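As an illustration of the kind of parsing this topic covers, the following is a minimal sketch of a FASTA reader written in plain Python. The function name parse_fasta and the toy records are assumptions made for this example; in practice, Biopython's Bio.SeqIO.parse(handle, "fasta") offers a more complete reader.

```python
from io import StringIO


def parse_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Toy in-memory file (hypothetical data) so the sketch is self-contained.
example = StringIO(">seq1 toy record\nACGT\nACGT\n>seq2\nGGCC\n")
records = list(parse_fasta(example))
for name, seq in records:
    print(name, len(seq))
```

The generator keeps only one record in memory at a time, which matters for large sequence files.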
Motifs and kmers
Motifs and kmers. Basic string matching. Consensus sequences. Motif finding. Motif discovery tools.
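A small sketch of two ideas named above, k-mer counting and consensus sequences, might look as follows. The helper names count_kmers and consensus and the toy sequences are assumptions for illustration, and the consensus here is simply the most frequent base per column of equal-length sequences.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def consensus(seqs):
    """Most frequent base in each column of equal-length sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*seqs)
    )


kmers = count_kmers("ACGTACGT", 3)
print(kmers.most_common(2))
print(consensus(["ACGT", "ACGA", "TCGT"]))
```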
Pattern searching and Regular Expressions
Finding patterns. Finding patterns with Regular Expressions. Creating and matching regex objects.
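As a rough example of creating and matching a regex object, the sketch below compiles a pattern once and scans a DNA string for every occurrence. The pattern GG[AT]CC and the sequence are toy values chosen for illustration only.

```python
import re

# Compile once, reuse many times. [AT] matches either A or T at that position.
pattern = re.compile(r"GG[AT]CC")

dna = "ATGGACCTTGGTCCAAGGCCC"  # hypothetical sequence

# finditer yields match objects; start() is the 0-based position of each hit.
hits = [(m.start(), m.group()) for m in pattern.finditer(dna)]
print(hits)
```

Note that finditer reports non-overlapping matches only; overlapping occurrences need a lookahead or a manual scan.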
String manipulation and sequence alignment
String manipulation: indexing, joining, slicing. Alignment algorithms and dynamic programming. Alignment software and alignment statistics. Genomic data and file formats.
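To make the link between alignment and dynamic programming concrete, here is a minimal sketch of global alignment scoring in the style of Needleman-Wunsch. The scoring values (match +1, mismatch -1, linear gap -2) and the function name are assumptions for this example; it returns only the optimal score, not the alignment itself.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score of a and b with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

Keeping only two rows of the table reduces memory from O(len(a) * len(b)) to O(len(b)); recovering the alignment itself would require the full table or a traceback structure.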
Multiple sequence alignment and phylogenetics
MSA file formats and MSA methods. Phylogenetic trees: representation and basic operations.
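One simple way to picture tree representation and basic operations is sketched below, assuming a tree stored as nested tuples with leaf names as strings. This representation and the helper names are illustrative assumptions; the output string follows the Newick format without branch lengths.

```python
def to_newick(node):
    """Serialize a nested-tuple tree as a Newick string (no branch lengths)."""
    if isinstance(node, str):
        return node  # a leaf is just its name
    return "(" + ",".join(to_newick(child) for child in node) + ")"


def leaf_names(node):
    """Collect leaf names in left-to-right order."""
    if isinstance(node, str):
        return [node]
    names = []
    for child in node:
        names.extend(leaf_names(child))
    return names


tree = (("human", "chimp"), "mouse")  # toy topology for illustration
newick = to_newick(tree) + ";"
print(newick)
print(leaf_names(tree))
```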
Miscellaneous topics
OS interfacing with Python and module usage. Improving speed of Python scripts.
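The sketch below illustrates both ideas in a small way: calling an external program with subprocess.run and comparing two implementations with timeit. The child process is just another Python interpreter, used as a portable stand-in for a real external tool, and the GC-counting functions are assumptions for this example. Actual timings depend on the machine.

```python
import subprocess
import sys
import timeit

# Run an external command and capture its text output.
result = subprocess.run(
    [sys.executable, "-c", "print('ACGT'[::-1])"],
    capture_output=True, text=True, check=True,
)
reversed_seq = result.stdout.strip()


def gc_loop(seq):
    """Count G and C bases with an explicit Python loop."""
    total = 0
    for base in seq:
        if base in "GC":
            total += 1
    return total


def gc_builtin(seq):
    """Count G and C bases with the built-in str.count method."""
    return seq.count("G") + seq.count("C")


seq = "ACGT" * 10_000
t_loop = timeit.timeit(lambda: gc_loop(seq), number=20)
t_builtin = timeit.timeit(lambda: gc_builtin(seq), number=20)
print(f"loop: {t_loop:.4f}s  built-in: {t_builtin:.4f}s")
```

Measuring before optimizing, as done here, helps decide whether heavier tools are worth introducing.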
Activities
Introduction and Python recap
Solving problems with Python. Objectives: 1, 2
Making sense of sequences, motifs and kmers. Parsing mechanisms
Representation of sequences and genomic data. File formats to store biological data. External databases. Programs using Biopython and other common tools to manipulate biological data. Objectives: 1, 2, 3
Finding patterns of text without regular expressions. Finding patterns with Regular Expressions. Using regex objects in Python. Objectives: 2, 3, 4
Common methods used to solve the Multiple Sequence Alignment problem. Using MSA files. Representation and basic operations of phylogenetic trees. Objectives: 2, 3, 4, 5
Interfacing Python with the operating system and other external modules. Improving the speed of Python programs: Cython, numpy, etc. Objectives: 1, 5
During theory sessions, the professor will present programming concepts, combined with examples and problem solving.
During problem-solving sessions, students will work on their own solving problems on a computer system, with the professor's supervision and assistance when needed.
Evaluation methodology
There will be two exams: a mid-term exam and a final exam.
In addition, there will be some graded problem tests taken during problem sessions, announced in advance.
FinalScore = 0.20*NP + 0.80*max(EF, 0.35*EP+0.65*EF)
where:
NP: Problem score. Short problem tests taken during problem sessions
EP: Mid-term (partial) exam score
EF: Final exam score
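The formula can be read as follows: the mid-term exam only counts when it raises the exam component, since max() otherwise falls back to the final exam alone. A minimal sketch of the calculation follows; the function name and the sample marks are assumptions for illustration, and no particular grading scale is implied by the source.

```python
def final_score(np_score, ep, ef):
    """FinalScore = 0.20*NP + 0.80*max(EF, 0.35*EP + 0.65*EF)."""
    exam_component = max(ef, 0.35 * ep + 0.65 * ef)
    return 0.20 * np_score + 0.80 * exam_component


# Mid-term below the final: the final exam alone is used.
low_midterm = final_score(np_score=8, ep=4, ef=7)
# Mid-term above the final: the 35/65 blend is used.
high_midterm = final_score(np_score=8, ep=9, ef=6)
print(round(low_midterm, 2), round(high_midterm, 2))
```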
Bibliography
Basic:
Lancaster, Alexander and Webster, Gordon. Python for the Life Sciences: A Gentle Introduction to Python for Life Scientists. Apress, Berkeley, 2019. ISBN: 9781484245224