Service Oriented Business Intelligence

Credits

6

Type

IT4BI: Compulsory
MIRI: Specialization complementary (Service Engineering)

Requirements

This course has no requirements, but it does have prior skills

Department

ESSI

Web

https://learnsql.fib.upc.es/moodle

The main goal of this course is to analyze the technological and engineering needs of Big Data Management. The enabling technology for this challenge is cloud services, which provide the elasticity needed to scale the infrastructure as the needs of the company grow. Thus, students will learn advanced data management techniques (i.e., NoSQL solutions) that also scale with the infrastructure. Since Big Data Management is the evolution of Data Warehousing, knowledge of the latter is assumed in this course (see the corresponding subject in the Data Science speciality for more details on its contents). The course focuses specifically on the management of data Volume and Velocity.

On the one hand, to deal with high volumes of data, we will see how a distributed file system can scale to as many machines as necessary. Then, we will study the different physical structures we can use to store our data in it. Such structures can take the form of a file format at the operating system level, or sit at a higher level of abstraction. In the latter case, they take the form of sets of key-value pairs, collections of semi-structured documents, or column-wise stored tables. We will see that, independently of the kind of storage we choose, current highly parallelizable processing systems use functional programming principles (typically based on Map and Reduce functions), and that their processing frameworks can rely on temporary files (like Hadoop MapReduce) or on in-memory structures (like Spark).
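As an illustration of the Map and Reduce principle mentioned above, the following is a minimal single-process sketch in plain Python. It is not the Hadoop or Spark API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for this example. It counts words by mapping each line to (word, 1) pairs, grouping the pairs by key, and reducing each group to a sum.

```python
from collections import defaultdict
from functools import reduce


def map_phase(line):
    """Map: emit one (word, 1) pair per word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key (done by the framework in a real system)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, reduce(lambda a, b: a + b, values)


def word_count(lines):
    """Run the three phases sequentially over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big volume"]))
```

Because map_phase only looks at one line and reduce_phase only looks at one key, both phases could in principle run in parallel on different machines; the sketch runs them sequentially for simplicity.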

On the other hand, to deal with high data velocity, we need a low-latency system that processes either streams or micro-batches. However, data production nowadays already exceeds the capacity of processing technologies: more data is being generated than we can store or even process on the fly. Thus, we will recognize the need to (a) select subsets of the data (i.e., filter or sample), (b) summarize them while maximizing the valuable information retained, and (c) simplify our algorithms to reduce their computational complexity (i.e., doing a single pass over the data) and provide an approximate answer.
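One well-known way to combine sampling with a single pass over the data is reservoir sampling. The sketch below is only an illustration of that idea, not necessarily the technique taught in the course; the function name reservoir_sample and its parameters are assumptions made for this example. It keeps a uniform random sample of k items from a stream of unknown length using memory proportional to k.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items in one pass over stream."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # both bounds inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 10 readings from a simulated stream of 1000 readings.
    print(reservoir_sample(range(1000), 10, random.Random(42)))
```

The stream is never stored or re-read, so the same function works on a generator that produces items as they arrive.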

Finally, the complexity of a Big Data project (combining all the necessary tools in a collaborative ecosystem), which typically involves several people with different backgrounds, requires the definition of a high level architecture that abstracts technological difficulties and focuses on functionalities provided and interactions between modules. Therefore, we will also analyse different software architectures for Big Data.

This course participates in a joint project conducted during the second semester together with VBP, SDM and CC. In VBP, the students will come up with a business idea related to Big Data Management and Analytics, which will be evaluated from a business perspective. In the three other courses, the students have to implement a prototype that realizes this business idea. In CC, they will be introduced to the main concepts behind large-scale distributed computing based on a service-based model and will have to choose the right infrastructure for their prototype. In BDM and SDM, they will be introduced to specific data management techniques to deal with Volume and Velocity (BDM) and Variety (SDM). As a final outcome, a working prototype must be delivered.

Teachers

Person in charge

Alberto Abello Gamazo

Others

Sergi Nadal Francesch

Weekly hours

Theory

2

Problems

0

Laboratory

1

Guided learning

0

Autonomous learning

5.33

Competences

Generic Technical Competences

Generic

CG2 - Capability to lead, plan and supervise multidisciplinary teams.
CG5 - Capability to apply innovative solutions and make progress in the knowledge to exploit the new paradigms of computing, particularly in distributed environments.

Transversal Competences

Teamwork

CTR3 - Capacity of being able to work as a team member, either as a regular member or performing directive activities, in order to help the development of projects in a pragmatic manner and with a sense of responsibility; capability to take into account the available resources.

Basic

CB7 - Ability to integrate knowledge and handle the complexity of making judgments based on information which, being incomplete or limited, includes considerations on social and ethical responsibilities linked to the application of their knowledge and judgments.

Technical Competences of each Specialization

Specific

CEC1 - Ability to apply scientific methodologies in the study and analysis of phenomena and systems in any field of Information Technology as well as in the conception, design and implementation of innovative and original computing solutions.
CEC2 - Capacity for mathematical modelling, calculation and experimental design in engineering technology centres and business, particularly in research and innovation in all areas of Computer Science.
CEC3 - Ability to apply innovative solutions and make progress in the knowledge that exploit the new paradigms of Informatics, particularly in distributed environments.

Objectives

Understand the differences and benefits of in-memory data management.
Related competences: CG5, CB7, CEC1, CEC2, CEC3
Understand the execution flow of a distributed query.
Related competences: CG5, CB7, CEC1, CEC2, CEC3
Identify the difficulties of scalability and parallelization.
Related competences: CG5, CB7, CEC1, CEC2, CEC3
Design a distributed database using NoSQL tools.
Related competences: CG2, CG5, CB7, CEC1, CEC2, CEC3, CTR3
Produce a functional program to process Big Data in a Cloud environment.
Related competences: CG2, CG5, CB7, CEC1, CEC2, CEC3, CTR3
Manage and process a Data Stream.
Related competences: CG2, CG5, CB7, CEC1, CEC2, CEC3, CTR3
Design the architecture of a Big Data management system.
Related competences: CG2, CG5, CB7, CEC1, CEC2, CEC3, CTR3

Contents

Introduction
Big Data, Cloud Computing, Scalability
Big Data Design
Polyglot systems; Schemaless databases; Key-value stores; Wide-column stores; Document-stores
Distributed Data Management
Transparency layers; Distributed file systems; File formats; Fragmentation; Replication and synchronization; Sharding; Consistent hash; LSM-Trees
In-memory Data Management
NUMA architectures; Columnar storage; Late reconstruction; Light-weight compression
Distributed Data Processing
Distributed Query Processing; Sequential access; Pipelining; Parallelism; Synchronization barriers; Multitenancy; MapReduce; Resilient Distributed Datasets; Spark
Stream management and processing
One-pass algorithms; Sliding window; Stream to relation operations; Micro-batching; Sampling; Filtering; Sketching
Big Data Architectures
Centralized and Distributed functional architectures of relational systems; Data Warehousing architectures; Service Oriented Architecture; Lambda architecture

Activities

Theoretical lectures

In these activities, the lecturer will introduce the main theoretical concepts of the subject. Besides lecturing, cooperative learning techniques will be used. These demand the active participation of the students, and consequently will be evaluated.

Theory

28

Problems

0

Laboratory

0

Guided learning

0

Autonomous learning

28

Objectives: 1 2 3 4 5 6 7
Contents:

1. Introduction
2. Big Data Design
3. Distributed Data Management
4. In-memory Data Management
5. Distributed Data Processing
6. Stream management and processing
7. Big Data Architectures

Projects

The students will be asked to define a BI project, either using services or making decisions on a service, and to present a demo of it in front of their classmates and a jury. There will also be some deliverables throughout the course.

Theory

0

Problems

0

Laboratory

9

Guided learning

0

Autonomous learning

57

Objectives: 2 3 4 5 6 7
Contents:

1. Introduction
2. Big Data Design
3. Distributed Data Management
4. In-memory Data Management
5. Distributed Data Processing
6. Stream management and processing
7. Big Data Architectures

Exam

Written exam on the theoretical concepts introduced throughout the course.

Theory

2

Problems

0

Laboratory

0

Guided learning

0

Autonomous learning

8

Objectives: 1 2 3 4 5 6 7
Contents:

1. Introduction
2. Big Data Design
3. Distributed Data Management
4. In-memory Data Management
5. Distributed Data Processing
6. Stream management and processing
7. Big Data Architectures

Lab

Students will use different NoSQL tools in a sandbox environment.

Theory

0

Problems

0

Laboratory

6

Guided learning

0

Autonomous learning

12

Objectives: 2 3 4 5 6
Contents:

3. Distributed Data Management
5. Distributed Data Processing
6. Stream management and processing

Teaching methodology

The course comprises theory, problems, lab sessions and a project.

Theory: The theory classes comprise the teacher's explanations and constitute the main part of the course. The students will also have some contents to be read and prepared outside the classroom, and will be asked to participate in cooperative learning activities to solve some problems.

Lab: There will be some lab sessions to introduce some of the technologies and a cluster execution environment.

Project: The students will have to implement a proof of concept of an analytical service, which will share the topic with the project in other sibling subjects: BIP, SEAIT, VBP, SDM, CC, and WS (for those students taking them).

Evaluation method

Final mark = 35% C + 30% Pr + 5% P + 30% E

C = Weighted average of collaborative activities and labs
Pr = Project
P = Peer evaluation
E = Exam

Calculation of C:
1) Multiply the mark of each activity/lab by a weight equal to 1, 2, 4 or 6 (depending on the content of the corresponding activity/lab)
2) Divide the sum of these values by the sum of weights assigned minus 4
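
The formulas above can be sketched as a short calculation. This is only an illustration under stated assumptions: step 2 is read here as dividing by (sum of weights minus 4), which is one possible reading of the text, and the function names collaborative_mark and final_mark are hypothetical. The source does not say whether C is capped, so the sketch applies no cap.

```python
def collaborative_mark(activities, slack=4):
    """Compute C from (mark, weight) pairs, with weights in {1, 2, 4, 6}.

    Assumption: the divisor is (sum of weights - slack), with slack = 4.
    """
    total_weight = sum(weight for _, weight in activities)
    if total_weight <= slack:
        raise ValueError("sum of weights must be greater than the slack")
    weighted_sum = sum(mark * weight for mark, weight in activities)
    return weighted_sum / (total_weight - slack)


def final_mark(c, pr, p, e):
    """Final mark = 35% C + 30% Pr + 5% P + 30% E."""
    return 0.35 * c + 0.30 * pr + 0.05 * p + 0.30 * e


if __name__ == "__main__":
    # Hypothetical marks, purely for illustration.
    activities = [(8.0, 1), (6.0, 2), (7.0, 4), (5.0, 6), (9.0, 2), (6.0, 4), (7.0, 1)]
    c = collaborative_mark(activities)
    print(round(c, 2), round(final_mark(c, pr=8.0, p=9.0, e=6.0), 2))
```

Under this reading, subtracting 4 from the divisor means a student can miss or underperform in a small amount of weighted work without C dropping proportionally.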

Calculation of P: Students will work with several teammates in the activities during the semester and will evaluate them at the end. Based on these evaluations, the teacher will assign a mark to each student.

Project: Students will implement a proof of concept of the project and will present it in front of their classmates and a jury. Based on that presentation and the materials delivered during the course, the teacher will assign a mark to each student.

Bibliography

Basic:

Principles of distributed database systems - Özsu, M. Tamer; Valduriez, Patrick, Springer, 2011. ISBN: 978-1-4419-8833-1
http://cataleg.upc.edu/record=b1388855~S1*cat
Encyclopedia of database systems [Electronic resource] - Özsu, M. Tamer; Liu, Ling, Springer, 2009. ISBN: 978-0-387-39940-9
http://cataleg.upc.edu/record=b1377257~S1*cat
NoSQL distilled : a brief guide to the emerging world of polyglot persistence - Sadalage, Pramod J; Fowler, Martin, Addison-Wesley, 2013. ISBN: 978-0-321-82662-6
http://cataleg.upc.edu/record=b1428723~S1*cat
In-Memory Data Management - Plattner, Hasso; Zeier, Alexander, Springer, 2011. ISBN: 978-3-642-19362-0
DOI:10.1007/978-3-642-19363-7
An Architecture for fast and general data processing on large clusters - Zaharia, Matei, ACM Books, 2016. ISBN: 9781970001563
http://cataleg.upc.edu/record=b1477760~S1*cat
Mining of massive datasets - Leskovec, Jure; Rajaraman, Anand; Ullman, Jeffrey D, Cambridge University Press, 2014. ISBN: 978-1107077232
http://cataleg.upc.edu/record=b1476405~S1*cat
Data streams : models and algorithms - Aggarwal, Charu C, Springer, 2007. ISBN: 978-0-387-28759-1
http://cataleg.upc.edu/record=b1303157~S1*cat

Complementary:

Database systems : the complete book - Garcia-Molina, Hector; Ullman, Jeffrey D; Widom, Jennifer, Pearson Education, 2009. ISBN: 0131873253
http://cataleg.upc.edu/record=b1346544~S1*cat
Master Data Management - Loshin, David, 2009. ISBN: 978-0-12-374225-4
Service-oriented architecture : concepts, technology and design - Erl, Thomas, Prentice Hall PTR, 2005. ISBN: 0-13-185858-0
http://cataleg.upc.edu/record=b1298955~S1*cat

Prior skills

Since Big Data Management is the evolution of Data Warehousing, such knowledge is assumed in this course. Thus, general knowledge is expected on: relational database design; database management system architecture; ETL and OLAP

Specifically, knowledge is expected on:
- Multidimensional modeling (e.g., star schemas)
- Querying relational databases
- Physical design of relational tables (e.g., partitioning)
- Hash and B-tree indexing
- External sorting algorithms (e.g., merge-sort)
- ACID transactions

© Facultat d'Informàtica de Barcelona - Universitat Politècnica de Catalunya