The subject of Databases is divided into two parts:
The first part of the subject introduces the different types of data available in the health and life sciences, highlighting their specific characteristics, their heterogeneity and the need to integrate them in order to answer scientific and clinical questions in the area. The main knowledge portals and databases for accessing and exploiting this type of data will be presented in practical sessions, giving students hands-on experience with these resources.
The second part introduces the basic concepts of Databases (DB) needed by both the DB designer and the user. Database paradigms and entity-relationship design will be studied first, followed by data manipulation using SQL.
Teachers
Person in charge
Oscar Conchillo Sole
Others
Carles Sanchez Ramos
Irepan Salvador Martínez
Xim Cerda Company
Weekly hours
Theory: 2
Problems: 2
Laboratory: 0
Guided learning: 0
Autonomous learning: 6
Learning Outcomes
Knowledge
K1 - Recognize the basic principles of biology, from cellular to organism scale, and how these are related to current knowledge in the fields of bioinformatics, data analysis, and machine learning; thus achieving an interdisciplinary vision with special emphasis on biomedical applications.
K2 - Identify mathematical models and statistical and computational methods that allow for solving problems in the fields of molecular biology, genomics, medical research, and population genetics.
K7 - Analyze the sources of scientific information, valid and reliable, to justify the state of the art of a bioinformatics problem and to be able to address its resolution.
Skills
S4 - Develop specific tools that enable solving problems on the interpretation of biological and biomedical data, including complex visualizations.
S5 - Disseminate information, ideas, problems and solutions from bioinformatics and computational biology to a general audience.
S7 - Implement programming methods and data analysis based on the development of working hypotheses within the area of study.
S8 - Make decisions, and defend them with arguments, in the resolution of problems in the areas of biology, as well as, within the appropriate fields, health sciences, computer sciences and experimental sciences.
S9 - Exploit biological and biomedical information to transform it into knowledge; in particular, extract and analyze information from databases to solve new biological and biomedical problems.
Competences
C2 - Identify the complexity of the economic and social phenomena typical of the welfare society and relate welfare to globalization, sustainability and climate change in order to use technique, technology, economy and sustainability in a balanced and compatible way.
C3 - Communicate orally and in writing with others in the English language about learning, thinking and decision making outcomes.
C4 - Work as a member of an interdisciplinary team, either as an additional member or performing managerial tasks, in order to contribute to the development of projects (including business or research) with pragmatism and a sense of responsibility and ethical principles, assuming commitments taking into account the available resources.
Objectives
Acquisition of the basic knowledge of the most common kinds of biological and biomedical information and the methods to store and access them
Related competences: C2, C3, C4, K1, K7, S4, S5, S7, S8, S9
Combine multiple types of data, from different sources, to solve biological problems
Related competences: C3, C4, K1, K2, K7, S4, S5, S7, S8, S9
Learn the principles of Graph Theory and Network analysis and their application to Genomics and Proteomics
Related competences: C3, C4, K1, K2, S4, S5, S7, S9
Database manipulation using SQL programming
Related competences: C3, C4, K1, K2, K7, S4, S5, S7, S8, S9
Contents
Topic 1 - Introduction and basic concepts in Biological Databases
Presentation of the course and introduction to the subject. Organizing biological knowledge in databases. Technical concepts and definitions. Different classifications of databases according to type of data. Hierarchical organization of life and levels of annotation. Tools for online database search and data retrieval. Formats to store and present biological data.
Topic 2 - Public peptide and nucleotide sequence databases
UniProt and NCBI as examples of repositories for protein and nucleotide sequences. Advanced, filtered and refined searches.
Topic 3 - Protein structure and derived databases
Databases for protein structures (PDB). Secondary, derived and specialized protein databases, such as domain databases (Pfam and CATH) and predicted structure models (AlphaFold DB).
Topic 4 - Genes, Genomes and Functional Genomics
Retrieving gene information from the NCBI Gene. Browsing genome information at Ensembl & the UCSC genome browser. Functional genomics: ENCODE, GTEx. Functional genomics databases at EMBL-EBI: ArrayExpress and Expression Atlas.
Topic 5 - Single Cell Expression, Networks and Pathways
Functional genomics at single cell resolution: Single Cell Expression Atlas. Network representation and analysis. Molecular interaction networks: IntAct and other databases. Visualization of networks and pathways in Cytoscape.
Topic 6 - Drug discovery and Data integration
The drug discovery timeline. Databases: Open Targets and InterMine. Data Integration: Standards, ontologies, ID mapping and metadata.
Topic 7 - Database paradigm
Presentation of the second part of the course. Basic and introductory database concepts will be studied, as well as database architecture.
It will consist of self-learning of SQL queries using a self-assessment module available in Caronte. The student will upload the queries to Caronte in a specific format so that their results can be evaluated. Objectives: 5. Week: 18 (outside class hours).
In the first part of the course:
Lectures will be mainly expository. There will also be problem-based practical sessions.
In the second part of the course:
The final objective of the subject is for students to be able to design and manipulate relational databases in the context of current IT applications. For this reason, face-to-face classes will be highly practical and will focus on students consolidating the knowledge that is the learning objective of this subject.
The general methodology of the subject can be divided into three activities:
PRIOR PREPARATION. The aim is for students to learn the concepts that will be worked on in the following session through various activities proposed by the teaching staff, such as watching videos, reading texts, etc. All necessary material (statements, DB scripts, problem results) will be available in the Virtual Campus (CV).
IN-PERSON CLASS. The aim is to consolidate the concepts seen and place them in the context of the subject. The teaching staff will ensure that students deepen their understanding of these concepts through (more or less) guided exercises during the session. That is why face-to-face classes will be held in 2 weekly sessions of 2 hours each.
AUTONOMOUS WORK. Self-learning of typical SQL queries using a self-assessment module available in Caronte. The student will upload the queries to Caronte in a specific format so that their results can be evaluated.
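As an illustration of what a "typical SQL query" might look like, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory database. The tables gene and protein, their columns and the sample rows are hypothetical and are not part of the course material or of Caronte; the exercises in the self-assessment module may use a different schema and a different database engine.

```python
import sqlite3

# In-memory database; schema and rows are hypothetical illustration data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    symbol  TEXT NOT NULL
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
    length     INTEGER NOT NULL
);
INSERT INTO gene VALUES (1, 'GENE_A'), (2, 'GENE_B');
INSERT INTO protein VALUES (10, 1, 120), (11, 1, 340), (12, 2, 95);
""")

# Join the two tables and aggregate per gene:
# number of proteins and length of the longest one.
rows = conn.execute("""
SELECT g.symbol, COUNT(*) AS n_proteins, MAX(p.length) AS longest
FROM gene AS g
JOIN protein AS p ON p.gene_id = g.gene_id
GROUP BY g.symbol
ORDER BY g.symbol
""").fetchall()

print(rows)
conn.close()
```

The query combines a join, a grouping and two aggregate functions, which are the kind of building blocks usually practiced when learning data manipulation in SQL.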
The statements of the practical exercises are available in PDF format on the website and in Caronte (http://caronte.uab.cat).
Evaluation methodology
The final grade takes into account the first partial exam (E1, 40%), the second partial exam (E2, 40%), the average grade of the end-of-topic practical quizzes (Q, 8%) and active class participation (CP, 2%) for the first part of the course, and the SQL self-learning practical modules (SL, 10%) for the second part of the course.
It is computed with the following formula: E1*0.4 + Q*0.08 + CP*0.02 + E2*0.4 + SL*0.1
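The weighting can be checked with a short calculation. The sketch below simply encodes the formula as stated; the function name final_grade and the example marks are illustrative, and a 0 to 10 grading scale is assumed.

```python
def final_grade(e1, e2, q, cp, sl):
    """Weighted final grade: E1*0.4 + Q*0.08 + CP*0.02 + E2*0.4 + SL*0.1.

    All inputs are assumed to be on a 0-10 scale.
    """
    return e1 * 0.4 + q * 0.08 + cp * 0.02 + e2 * 0.4 + sl * 0.1


# Hypothetical marks, for illustration only.
example = final_grade(e1=6.0, e2=7.0, q=8.0, cp=10.0, sl=9.0)
print(round(example, 2))  # prints 6.94
```

The five weights add up to 1, so a student with a 10 in every component obtains a final grade of 10.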
To be able to sit the partial exams, the student must have participated in at least 50% of the end-of-topic practical quizzes (Q) and of the SQL practical modules (SL).
Students who do not pass the subject can take the recovery exam. It has two parts, ER1 and ER2, each of which is weighted at only 80% in the final grade, as follows:
- if E2 >= 5, the grade will be ER1*0.8*0.4 + E2*0.4 + Q*0.08 + CP*0.02 + SL*0.1
- if E1 >= 5, the grade will be E1*0.4 + ER2*0.8*0.4 + Q*0.08 + CP*0.02 + SL*0.1
- if E1 < 5 and E2 < 5, the grade will be ER1*0.8*0.4 + ER2*0.8*0.4 + Q*0.08 + CP*0.02 + SL*0.1
A student who cannot attend the E1 and/or E2 partial exams for duly justified reasons will take the ER1 and/or ER2 exam instead, and these will count 100% in the calculation of the final grade, as follows:
ER1*0.4 + ER2*0.4 + Q*0.08 + CP*0.02 + SL*0.1
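The recovery rules can be sketched as two small functions. The names recovery_grade and justified_absence_grade and the constant RECOVERY_FACTOR are illustrative. The rules as written do not say what happens when both E1 and E2 are 5 or higher but the subject is still failed, so this sketch raises an error in that case rather than guessing.

```python
RECOVERY_FACTOR = 0.8  # recovery exams weigh only 80% of a partial exam


def _continuous_part(q, cp, sl):
    """Contribution of quizzes, class participation and SQL modules."""
    return q * 0.08 + cp * 0.02 + sl * 0.1


def recovery_grade(e1, e2, q, cp, sl, er1=None, er2=None):
    """Final grade after the recovery exam, following the three listed cases."""
    rest = _continuous_part(q, cp, sl)
    if e1 < 5 and e2 < 5:
        # Both parts are retaken.
        return er1 * RECOVERY_FACTOR * 0.4 + er2 * RECOVERY_FACTOR * 0.4 + rest
    if e1 < 5:
        # E2 >= 5: only the first part is retaken.
        return er1 * RECOVERY_FACTOR * 0.4 + e2 * 0.4 + rest
    if e2 < 5:
        # E1 >= 5: only the second part is retaken.
        return e1 * 0.4 + er2 * RECOVERY_FACTOR * 0.4 + rest
    # Assumption: this case is not covered by the stated rules.
    raise ValueError("case E1 >= 5 and E2 >= 5 is not specified")


def justified_absence_grade(er1, er2, q, cp, sl):
    """Justified absence from the partial exams: ER1 and ER2 count 100%."""
    return er1 * 0.4 + er2 * 0.4 + _continuous_part(q, cp, sl)


# Hypothetical marks: E1 failed, E2 passed, first part retaken.
print(round(recovery_grade(e1=4, e2=6, q=8, cp=10, sl=9, er1=7), 2))
```

With these rules, a retaken part contributes at most 0.8 * 0.4 * 10 = 3.2 points to the final grade, compared with 4 points for a partial exam taken at the regular sitting.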
A student who is eligible for the calculation of the final grade and obtains a final grade greater than 5 will be considered to have passed.
Special Interest Group on Management of Data (SIGMOD). A group of the ACM (Association for Computing Machinery) that runs activities on databases, organizes conferences and publishes journals on the topic. http://www.acm.org/sigmod
Oracle Academic Initiative (OAI) website, with a great deal of useful information on the facilities that the Academic Initiative offers to UAB students. https://oai.oracle.com/
As a national resource for molecular biology information, NCBI's mission is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. More specifically, the NCBI has been charged with creating automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics; facilitating the use of such databases and software by the research and medical community; coordinating efforts to gather biotechnology information both nationally and internationally; and performing research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules. https://www.ncbi.nlm.nih.gov
RCSB PDB (RCSB.org) is the US data center for the global Protein Data Bank (PDB) archive of 3D structure data for large biological molecules (proteins, DNA, and RNA) essential for research and education in fundamental biology, health, energy, and biotechnology. https://www.rcsb.org
The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information. https://www.uniprot.org
Prior skills
It is recommended that the student has knowledge and skills in:
Biochemistry and molecular biology, having completed a subject on the topic in any degree related to the biological sciences
Programming in third-generation languages (C, Pascal, Basic, etc.)
Basic data structures.