Applied Programming III

Credits
6
Types
Compulsory
Requirements
This subject has no formal requirements, but it assumes prior knowledge (see Previous capacities)
Department
CS;UAB
The course presents basic algorithmic techniques that are applied to Bioinformatics problems, with attention to the strengths and limitations of these techniques. It also describes common data elements and formats used to represent biological data. During the course, students will acquire the knowledge to deal with programming problems of a biological nature of small and medium complexity, making sensible choices between standard packages and pragmatic algorithmic implementations for specific problems.
At the end of the course, students:
1. will know the basic concepts of programming, algorithmics and information management in solving problems of a biological nature through computer programs.
2. will be able to use the main algorithmic schemes and some of their variants that frequently appear in common Bioinformatics problems.
3. will recognize when to apply the main methods used in Bioinformatics to access data stored in computers, with special attention to efficient mechanisms for sequence processing.
4. will know how to integrate access to large biological databases with access to other local information structures and combine them appropriately with the necessary algorithmic concepts.
5. will know how to interface with external tools and use common libraries that extend the functionality and improve the performance of Python programs.
The programming language used in this course is Python, which will be complemented with the occasional use of tools from the Operating System or external applications.

Teachers

Person in charge

  • Alexis Molina Martinez de los Reyes
  • Gabriel Valiente Feruglio
  • Miquel Angel Senar Rosell

Weekly hours

Theory
2
Problems
2
Laboratory
0
Guided learning
0
Autonomous learning
6

Learning Outcomes

Knowledge

  • K3 - Identify the mathematical foundations, computational theories, algorithmic schemes and information organization principles applicable to the modeling of biological systems and to the efficient solution of bioinformatics problems through the design of computational tools.
  • K4 - Integrate the concepts offered by the most widely used programming languages in the field of Life Sciences to model and optimize data structures and build efficient algorithms, relating them to each other and to their application cases.
  • K5 - Identify the nature of the biological variables that need to be analyzed, as well as the mathematical models, algorithms, and statistical tests appropriate to develop and evaluate statistical analyses and computational tools.

Skills

  • S2 - Computationally analyze DNA, RNA and protein sequences, including comparative genome analyses, using computation, mathematics and statistics as basic tools of bioinformatics.
  • S7 - Implement programming methods and data analysis based on the development of working hypotheses within the area of study.
  • S8 - Make decisions, and defend them with arguments, in the resolution of problems in the areas of biology, as well as, within the appropriate fields, health sciences, computer sciences and experimental sciences.

Competences

  • C6 - Detect deficiencies in one's own knowledge and overcome them through critical reflection and the choice of the best action to expand this knowledge.

Objectives

  1. Understand how to build a program and use additional tools to solve problems that use bioinformatics data.
    Related competences: C6, K3, K4, S7
  2. Understand the format and semantics of basic data structures used to represent biological data: sequences, genomes, etc.
    Related competences: K3, K5, S2, C6
  3. Understand the most common operations that apply to bioinformatics data files and develop programs to perform them.
    Related competences: C6, K3, K4, K5, S2, S7
  4. Understand the basic algorithmic principles used to solve sequence alignment and pattern matching problems.
    Related competences: K3, K4, K5, S2, S7, C6
  5. Analyze solutions with regard to time and memory cost and use programming components to improve performance.
    Related competences: C6, K3, S7, S8

Contents

  1. Introduction and Python recap
    Python basics, flow control, functions, lists, dictionaries and structured data.
  2. Making sense of sequences and parsing mechanisms
    Sequences, strings, and genomic data. Sequence file formats. Biological databases. Sequence and genomic data manipulation: Biopython and other common tools.
  3. Motifs and k-mers
    Motifs and k-mers. Basic string matching. Consensus sequences. Motif finding. Motif discovery tools.
  4. Pattern searching and Regular Expressions
    Finding patterns. Finding patterns with Regular Expressions. Creating and matching regex objects.
  5. String manipulation and sequence alignment
    String manipulation: indexing, joining, slicing. Alignment algorithms and dynamic programming. Alignment software and alignment statistics. Genomic data and file formats.
  6. Multiple sequence alignment and phylogenetics
    MSA file formats and MSA methods. Phylogenetic trees: representation and basic operations.
  7. Miscellaneous topics
    OS interfacing with Python and module usage. Improving speed of Python scripts.

Activities

Introduction and Python recap

Solving problems with Python
Objectives: 1 2
Contents:
Theory
4h
Problems
4h
Laboratory
0h
Guided learning
0h
Autonomous learning
12h

Making sense of sequences, motifs and k-mers. Parsing mechanisms

Representation of sequences and genomic data. File formats to store biological data. External databases. Programs using Biopython and other common tools to manipulate biodata.
Objectives: 1 2 3
Contents:
Theory
4h
Problems
4h
Laboratory
0h
Guided learning
0h
Autonomous learning
12h
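
As an illustration of the k-mer topic in this activity, the following is a minimal sketch of overlapping k-mer counting using only the standard library. The function name `count_kmers` and the example sequence are hypothetical and chosen only for this example; they are not taken from the course materials.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Hypothetical example sequence.
counts = count_kmers("ATGATGA", 3)
print(counts["ATG"], counts["TGA"], counts["GAT"])  # prints: 2 2 1
```

A dictionary-like `Counter` keeps the code short; for genome-scale inputs a different representation may be preferable.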

Pattern searching and Regular Expressions

Finding patterns of text without regular expressions. Finding patterns with Regular Expressions. Using regex objects in Python.
Objectives: 2 3 4
Contents:
Theory
4h
Problems
4h
Laboratory
0h
Guided learning
0h
Autonomous learning
12h
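
As an illustration of using regex objects in Python, the sketch below compiles a pattern once and reports the position of each match in a DNA string. The motif `GA[AT]TC` and the sequence are hypothetical, chosen only for this example.

```python
import re

# Hypothetical motif: G, A, then A or T, then T, C.
motif = re.compile(r"GA[AT]TC")

# Hypothetical example sequence.
sequence = "ACGAATCTTGATTCAG"

# finditer yields match objects; start() gives the 0-based position.
hits = [(m.start(), m.group()) for m in motif.finditer(sequence)]
print(hits)  # prints: [(2, 'GAATC'), (9, 'GATTC')]
```

Note that `finditer` reports non-overlapping matches only; overlapping occurrences would need a lookahead pattern or a manual scan.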

String manipulation and sequence alignment

Common string manipulation operations: indexing, joining, slicing, searching, inserting. Basic alignment algorithms and dynamic programming implementations. Manipulation of genomic data generated with alignment tools.
Objectives: 2 3 4 5
Contents:
Theory
8h
Problems
8h
Laboratory
0h
Guided learning
0h
Autonomous learning
24h
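
As an illustration of a dynamic programming alignment, the sketch below computes a global alignment score in the style of Needleman-Wunsch with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -1) and the function name are assumptions made for this example, not the course's reference implementation. Only two rows of the table are kept, so memory grows with the length of one sequence while time grows with the product of both lengths.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] against the empty prefix of b.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("AC", "AGC"))  # prints: 1  (A-C against AGC)
```

Recovering the alignment itself, rather than just its score, would require keeping the full table or traceback pointers.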

Multiple sequence alignment and phylogenetics

Common methods used to solve the Multiple Sequence Alignment problem. Using MSA files. Representation and basic operations of phylogenetic trees.
Objectives: 2 3 4 5
Contents:
Theory
6h
Problems
6h
Laboratory
0h
Guided learning
0h
Autonomous learning
18h

Miscellaneous topics

Interfacing Python with the operating system and other external modules. Improving the speed of Python programs: Cython, NumPy, etc.
Objectives: 1 5
Contents:
Theory
4h
Problems
4h
Laboratory
0h
Guided learning
0h
Autonomous learning
12h

Teaching methodology

During theoretical sessions, the professor will present programming concepts, combined with examples and problem solving.
During problem-solving sessions, students will work on their own, solving problems on a computer system, with supervision and assistance from the professor when needed.

Evaluation methodology

There will be two exams: a mid-term exam and a final exam.
In addition, there will be some graded problem tests taken during problem sessions, announced in advance.
FinalScore = 0.20*NP + 0.80*max(EF, 0.35*EP+0.65*EF)
where:
NP: Problem score (short problem tests taken during problem sessions)
EP: Mid-term (partial) exam score
EF: Final exam score
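
The formula can be checked with a short sketch. The function name `final_score` and the sample marks are hypothetical, and a 0 to 10 marking scale is assumed here since the formula itself does not state one. Because of the `max`, the mid-term can only raise the exam component, never lower it.

```python
def final_score(np_score: float, ep: float, ef: float) -> float:
    """Apply FinalScore = 0.20*NP + 0.80*max(EF, 0.35*EP + 0.65*EF).

    All three marks are assumed to be on the same scale (0 to 10 here).
    """
    exam_component = max(ef, 0.35 * ep + 0.65 * ef)
    return 0.20 * np_score + 0.80 * exam_component


# Mid-term above the final: the weighted mix (6.7) replaces the final (6.0).
print(round(final_score(7.0, 8.0, 6.0), 2))  # prints: 6.76

# Mid-term below the final: the final exam mark alone is used.
print(round(final_score(7.0, 3.0, 6.0), 2))  # prints: 6.2
```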

Bibliography

Basic:

Complementary:

  • Python Programming for Biology - Stevens, Tim J., Cambridge University Press, 2015. ISBN: 9780511843556

Previous capacities

Applied Programming I
Applied Programming II
Introduction to Bioinformatics