Computational Genomics

Credits: 6
Type: Compulsory
Requirements: This subject has no formal requirements, but some previous capacities are recommended.
Department: UB
Mail:
In the Computational Genomics course, different bioinformatic approaches will be used to understand the biology of the genome sequences under study (for instance, sequencing data from DNA, RNA, ChIP, etc.). With a primarily genomic focus, the course applies computational methods to analyze the structure and function of sequences, with particular emphasis on annotating functional elements at the genomic level, such as genes and their regulatory regions.

Teachers

Person in charge

  • Josep Francesc Abril Ferrando

Weekly hours

Theory: 2
Problems: 2
Laboratory: 0
Guided learning: 0
Autonomous learning: 6

Objectives

  1. To acquire advanced knowledge of the computational genomics field.
    Related competences: K1, K2, K3, K7
  2. To understand the computational protocols and the parameters that affect the outcomes of the bioinformatics tools used in the field, and to properly interpret the results of the analyses.
    Related competences: C2, C3, C4, S1, S2, S3, S5, S7, S8

Contents

  1. Introduction to Computational Genomics
    Sequences and annotations; basic data formats; sequence ontology; basic sequence statistics and biases; sequencing methods (from Sanger to single-molecule technologies) and their range of applications (DNA-seq, RNA-seq, ChIP-seq, chromatin conformation, ...); sequencing quality and coverage. Sequence repositories (NCBI, EnsEMBL/BioMart, UCSC, others).
  2. Sequence Analysis
    Alphabets and strings; sequence complexity, entropy, and information content; k-mer analysis; repetitive elements (types, detection, and masking).
  3. Sequence Assembly
    Genome, transcriptome, and metagenome assemblies; reads, contigs, scaffolds, and chromosomes; assembly algorithms, from prefix-suffix alignment to assembly graphs; de Bruijn graphs; compression-based string matching: mapping reads onto assemblies (DNA aligners); assembly assessment metrics: N50, completeness (CEGMA, BUSCO); accuracy assessment of assembler tools (GAGE, Assemblathon) and DNA aligners (RGASP).
  4. Sequence Models
    Consensus sequence. Modeling signals and content: regular expressions, position weight matrices (PWMs), Markov chains, hidden Markov models (HMMs), other models.
  5. Computational Gene Finding
    The genome landscape: signals, exons, genes, regulatory elements, chromatin marks, etc.; gene finding in prokaryotes vs. eukaryotes; computational gene-finding approaches: ab initio, similarity-based, homology-based, comparative genomics, NGS; dynamic programming to assemble exons; generalized hidden Markov models (GHMMs); phylogenetic models (phyloHMMs); prediction of non-canonical features: selenoproteins, pseudogenes, non-coding RNAs, ...; gene-finding accuracy assessment: metrics (sensitivity/specificity, ...), benchmarks (*GASP).
  6. Regulatory Elements Prediction
    Regulatory elements: regulatory programs (network complexity, spatial and temporal compartmentalization), transcription factors and transcription factor binding sites (TFBSs), promoters, enhancers; pattern matching (TRANSFAC/JASPAR/ORegAnno); pattern discovery (PEAKS, MEME); phylogenetic footprinting; NGS approaches: deciphering the epigenetic code with ChIP-seq; annotating chromatin conformation over genomic sequences.
  7. Functional Annotation
    From sequence to function: genes, transcripts, and proteins; gene ontologies (GO, KO); annotating domains: patterns (PROSITE), profile HMMs (Pfam, Rfam); homology-based approaches: BLAST searches versus NOG models; metagenomic samples: functional components of ecological networks, species composition, and diversity measures.
  8. Managing Annotation Data
    Annotation pipelines: manual curation procedures (NCBI, VEGA), MAKER, Galaxy, EnsEMBL; visualization paradigms: from gff2ps to Circos, from command-line tools to graphical interfaces (Apollo, IGV), genome browsers (EnsEMBL, UCSC Genome Browser, GBrowse/JBrowse); distributed annotation systems; database tracks versus custom tracks.

Activities

End Term Exam

Objectives: 1, 2
Week: 18
Theory: 2h
Problems: 0h
Laboratory: 0h
Guided learning: 0h
Autonomous learning: 0h

Teaching methodology

Conceptual materials needed to understand the topics of the computational genomics field will be delivered as face-to-face lectures (theory sessions). Each student will then work through example cases and/or protocols that apply some of the ideas presented in the theory sessions (practical sessions); this may require further work outside the classroom to complete the exercises requested for the continuous evaluation (exercise submissions to the Virtual Campus).

Independent individual study and work will be necessary, in amounts that will vary widely depending on each student's progress and abilities, in order to absorb and, if needed, extend the essential concepts covered in class.

Teaching Resources:
+ Lecture notes: slides will be made available on the Virtual Campus before the classes.
+ Practicals: materials for the exercises will be available on the Virtual Campus.
+ Links to further resources will be accessible through the Virtual Campus.

Evaluation methodology

Academic performance in this subject will be evaluated on the basis of two blocks:

+ Practicals (Continuous Evaluation): Students must submit to the Virtual Campus several exercises proposed throughout the practical sessions. Details about the formatting and submission procedure will be provided in the first practical session. Students will have about one week to submit each exercise through the links provided on the Virtual Campus. This part has no re-evaluation exam, as its score is based on the assessment of the exercises submitted throughout the term.

+ Lectures (End-Term Synthesis Exam): The theory lectures will be assessed by a synthesis exam held at the end of the term on the date assigned in the calendar. Only students who fail this exam with a minimum score of 2.5 out of 10 may sit the re-evaluation exam, also held on the date assigned in the calendar for this purpose. The re-evaluation grade will replace the grade of the synthesis exam.

In accordance with the Honor Code that students agreed to follow, any attempt at copying detected during the exams (end-term or re-evaluation) will result in FAILURE of the course. Furthermore, tasks to be submitted individually cannot be solved in groups, and each student is responsible for her/his own deliverables.

The final mark is the weighted sum of the continuous evaluation score (60%) and the end-term score (40%), once the end-term or the re-evaluation exam has been passed.
Passing the course requires a minimum final mark of 5 out of 10, once all the grades have been aggregated.
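As a worked illustration of the aggregation rule above, the following Python sketch computes a final mark. It assumes that all scores are on a 0-10 scale, that "failing" the end-term exam means scoring below 5, and that a re-evaluation grade simply replaces the end-term grade. The names `final_mark` and `passes` are hypothetical, and the additional condition that the exam "has been passed" before aggregation is not modelled, because its threshold is not stated. For example, 8/10 in continuous evaluation and 6/10 in the end-term exam give 0.6 × 8 + 0.4 × 6 = 7.2.

```python
from typing import Optional

# Weights and thresholds taken from the evaluation rules (0-10 scale assumed).
CONTINUOUS_WEIGHT = 0.60   # practicals (continuous evaluation)
END_TERM_WEIGHT = 0.40     # end-term synthesis exam
PASS_MARK = 5.0            # minimum aggregated mark to pass the course
RE_EVAL_MIN = 2.5          # minimum end-term score to sit the re-evaluation


def final_mark(continuous: float, end_term: float,
               re_evaluation: Optional[float] = None) -> float:
    """Weighted final mark on a 0-10 scale (hypothetical helper).

    Assumption: failing the end-term exam means scoring below PASS_MARK,
    and a re-evaluation grade replaces the end-term grade.
    """
    scores = [continuous, end_term]
    if re_evaluation is not None:
        scores.append(re_evaluation)
    for score in scores:
        if not 0.0 <= score <= 10.0:
            raise ValueError(f"score out of range 0-10: {score}")

    exam = end_term
    if re_evaluation is not None:
        # Only students who failed the end-term exam with at least 2.5/10
        # are eligible for the re-evaluation exam.
        if not RE_EVAL_MIN <= end_term < PASS_MARK:
            raise ValueError("not eligible for the re-evaluation exam")
        exam = re_evaluation  # re-evaluation grade replaces the end-term one

    # Round to two decimals to hide floating-point noise (e.g. 0.4 * 6).
    return round(CONTINUOUS_WEIGHT * continuous + END_TERM_WEIGHT * exam, 2)


def passes(mark: float) -> bool:
    """True when the aggregated mark reaches the pass threshold."""
    return mark >= PASS_MARK


if __name__ == "__main__":
    print(final_mark(8.0, 6.0))                     # 7.2
    print(final_mark(7.0, 3.0, re_evaluation=6.0))  # 6.6
    print(passes(final_mark(4.0, 6.0)))             # False (mark is 4.8)
```

Raising an error for an ineligible re-evaluation, rather than silently ignoring it, keeps the sketch's assumptions visible when it is fed inconsistent data.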

Bibliography

Basic:

  • Introduction to genomics - Lesk, A.M., Oxford University Press, 2012.
  • Introduction to computational genomics: a case studies approach - Cristianini, N.; Hahn, M.W., Cambridge University Press, 2007.
  • Bioinformatics: sequence and genome analysis - Mount, D.W., Cold Spring Harbor Laboratory Press, 2004.
  • An Introduction to Bioinformatics Algorithms - Jones, N.C.; Pevzner, P.A., The MIT Press, 2004.
  • Genómica Computacional - Blanco García, E., UOC, 2013.

Complementary:

  • Encyclopedia of Bioinformatics and Computational Biology - Ranganathan, S.; Nakai, K.; Schönbach, C.; Gribskov, M. (editors), Elsevier Inc., 2019.
  • Concise Encyclopaedia of Bioinformatics and Computational Biology - Hancock, J.M.; Zvelebil, M.J. (editors), Wiley Blackwell, 2014.
  • Methods for Computational Gene Prediction - Majoros, W.H., Cambridge University Press, 2007.
  • Handbook of Hidden Markov Models in Bioinformatics - Gollery, M., Chapman & Hall/CRC Press, 2008.
  • Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids - Durbin, R.; Eddy, S.R.; Krogh, A.; Mitchison, G., Cambridge University Press, 1998.
  • Algorithms on Strings, Trees and Sequences - Gusfield, D., Cambridge University Press, 1997.
  • Bioinformatics: a practical guide to the analysis of genes and proteins - Baxevanis, A.D.; Ouellette, B.F.F. (editors), Wiley, 2005.
  • Discovering genomics, proteomics, and bioinformatics - Campbell, A.M.; Heyer, L.J., Benjamin Cummings, 2007.
  • Bioinformatics and functional genomics - Pevsner, J., Wiley-Blackwell, 2009.
  • Developing bioinformatics computer skills - Gibas, C.; Jambeck, P., O'Reilly, 2001.
  • Sequence analysis in a nutshell: a guide to tools and databases - Markel, S.; Leon, D., O'Reilly, 2003.
  • UNIX and Perl to the Rescue! - Bradnam, K.; Korf, I., Cambridge University Press, 2012.

Previous capacities

Prior knowledge of Unix, Perl/Python, R, and Markdown is recommended, as well as some basic concepts of molecular genetics and genomics.