Very practical course whose main objective is learning programming techniques for "optimizing" (reducing execution time) applications, considering the architecture of the computer and what the compiler can do, using measurement and analysis tools, and using high level synthesis tools to develop custom hardware from C.
Person in charge
Daniel Jimenez Gonzalez (
Entrepreneurship and innovation
G1 [Avaluable] - To know and understand the organization of a company and the sciences which govern its activity; capacity to understand the labour rules and the relation between planning, industrial and business strategies, quality and benefit. To develop creativity, entrepreneur spirit and innovation tendency.
- To have strong decision-making skills. To use knowledge and strategic skills for the creation and management of projects, apply systematic solutions to complex problems, and design and manage the innovation in the organization. To demonstrate flexibility and professionalism when developing her work.
Technical Competences of each Specialization
Computer engineering specialization
CEC2 - To analyse and evaluate computer architectures including parallel and distributed platforms, and develop and optimize software for these platforms.
- To program taking into account the hardware architecture, using assembly language as well as high-level programming languages.
- To develop and analyse software for systems based on microprocessors and its interfaces with users and other devices.
CEC3 - To develop and analyse hardware and software for embedded and/or very low consumption systems.
- To analyse, evaluate and select the most adequate hardware and software platform to support embedded and real-time applications.
- To develop specific processors and embedded systems; to develop and optimize the software of these systems.
Follow a methodology of optimization to reduce the execution time of an application in a specific processor; first identifying the parts more expensive, then evaluating whether or not (Amdahl's law ) its optimization is worth, and finally perform the optimization if the performance gain is reasonable.
Identify and recognize the optimizations the compiler can do, either by the basic knowledge that the student has of the gcc compiler, the analysis of assembler code generated, or because the student does profiling or instrumentation.
Choose the tools and their options most suitable to generate the binary code, analyze the binary code generated and the execution of that code (by profiling, instrumentation or tracing); in addition to choosing metrics for this analysis and the environment more suitable for evaluation.
Define latency, repetition rate and throughput of an instruction; and apply this knowledge in optimizing of the codes to choose the most appropriate operation in each case.
Apply, in the proper way, optimization techniques to reduce and / or avoid long latency operations, reducing the execution time of the application.
Apply the technique of "memoization" to reduce / avoid long latency operations
Reduce the "overhead" of operating system and library calls.
Specialize the code applications to reduce / avoid long latency operations.
Apply the technique of "bithacks" to reduce / avoid long latency operations
Reduce the time allocated for breaks in code, using techniques that reduce the number of jumps.
Apply the techniques of loop unrolling, loop fusion and loop collapsing.
Change the order of the condition evaluations to reduce, in average, the execution time of an application.
Apply the technique of code hoisting.
Apply other techniques such as memoization, bithacks etc.. to reduce the number of jumps.
Apply the technique of inlining.
Characterize the memory hierarchy of the computer with either a manual or studying it empirically with codes and tools for profiling and tracing.
Apply, in the proper way, optimization techniques to make more efficient use of processor memory hierarchy, reducing the execution time of the application.
Apply optimization techniques to improve the exploitation of data locality.
Detect and prevent possible "data aliasing" to help the work of the compiler.
Apply optimizations to exploit the bandwidth of memory.
Identify if a code is vectoritzable and vectorized it if it means reducing the execution time of the application.
Apply the optimization techniques of scalar code also on vector codes.
Analyze and identify if a code can be accelerated using custom hardware and use the high level tools provided in the course to define the specific hardware from C language with very simple pragmas.
Analyze and optimize a medium sized application along the semester, applying the optimization techniques and methodology followed in the course, in coordination with the rest of their team members.
Lead a group of students to obtain the best solution to an optimization problem or exercise of analysis given, considering the solutions of the rest of his team members and after a discussion among all team members.
Introduction: Architecture-Aware Programming
What we mean by Architecture-Aware Programming and in what cases it is worth it. Optimization strategies. Performance evaluation.
Programming and Optimization Tools
Performance measurement programs. Compilation options. Study of dynamic behavior of programs. Use of high level tools to accelerate (optimize) some part of the code in hardware.
Long latency operations
Alternatives to run some high cost operations and / or long latency. Replacement of complex operations for simple operations. Memoization of results. Detection of trivial cases. Use of operating system resources.
Control flow optimization
Detection and elimination of critical breaks. Inlining. Loop Unrolling, Collapsing, Fusion, etc.
Memory Conscious Optimizations
Size and alignment of data. Division of data into blocks of size proportional to the cache (blocking). Memory disambiguation. Definition of data structures and pragmas to improve the performance of the application both in a specific hardware or a general purpose processor.
Vector Extensions and use of custom hardware to "vectorize"
SIMD extensions. Identification of codes that can be vectorized. Inserting source code vector in C.
Use of tools of High Level Synthesis to accelerate (optimize) in hardware the most time consuming parts of the program exploiting the same idea of vectorization
Introduction to Programming conscious architecture
Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an individual homework. Objectives:1 Contents:
Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an independent work. Objectives:2310 Contents:
Participar activament en una sessió de classe expositiva-participativa de 2 hores de teoria i problemes. (2 hores). Estudiar a casa el tema assignat i realitzar els exercicis que s'hagin proposat com a treball autònom. Objectives:124512 Contents:
Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an independent work. Objectives:12361210 Contents:
Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an independent work. Objectives:123871210 Contents:
Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an independent work. Objectives:12391210 Contents:
Theory classes designed to present basic concepts of architecture and processor optimization tools and methodology; inserting small problems to solve in class.
For each subject, a series of theoretical and practical exercises will be proposed to the students. Those exercises have to be solved and hanged in the forums created by the teacher at Athena. The problems to be solved in the forums will be a medium difficulty and designed to facilitate the laboratory activity, i.e., which also serve as preparatory work.
We create teams of about 5 students. The number of people should be equal to the number of exercises per subject and team and will be fixed for the course. In principle, allocated a minimum of 5 problems per team and topic.
The methodology to solve these problems consists of a personal work and teamwork. Basically consist of: (1) individually solve problems and upload the problem solved in the forum created by the teacher for each team, (2) discuss in the forum of the individual solutions of the team (the first feedback received by the students) and agree a solution to every problem and (3) post a final solution to every problem in a common forum for final solutions to all teams. For each team, an student will be responsible for a problem, and that student will post the final solution of that problem to the common forum before a certain date. The teacher will review those final solutions (second feedback) but can also check the forums for discussion of individual solutions.
The classes of problems and team activities will be discussed with all the teams (third feedback).
To allow students to compare their solutions with those of other students of the subject, the forums will be open to everyone. In addition, the course offers a website where students can send the solution of certain exercises of the course. The website will run programs, check their correctness and order the programs submitted by students considering the execution time.
The laboratory classes will support the theory and the exercises done in class of problems. Students will have the documentation before each practice session. The implementation of these practices will be entirely in the assigned lab hours. Practices will be divided into two groups: limited experiments to illustrate certain applications and optimizations, and others that will apply all knowledge to optimize a complete application.
At the end of the last laboratory session of each topic, students will have time to deliver some of the exercises done in the laboratory.
In the evaluation of the student involved five components:
(1) Laboratory exercises: L - 40% of the subject mark.
There is a lab practice per lesson and there may be more than one session per lab practice. Each practice will be 20% of the overall lab mark.
L = 0.2*L2+0.2*L3+0.2*L4+0.2*L5+0.2*L6
The solution of lab exercises will be a brief report (between 1 and 5 pages), source codes and scrips that must be delivered to the teacher via the Raco to be evaluated. Also, some of the codes may be asked to be delivered to a server where they will be compiled and checked if they correctly accelerate the application.
(2) Exercises class of problems: P - 10% of the subject mark.
These exercises are requested by the teacher in class of problems and must work in teams and presented to Athena Forum or via Raco.
(3) Partial control mark: C - 50% of the subject mark.
During the course students must answer two controls: C1, C2. Its objective is to assess whether the student has achieved assimilate code optimizations explained so far.
C = 0.35 * C1 +0.65 * C2 .
(4) Final Exam Mark: F
Final: Final examination will include theoretical and practical cases. This exam is only compulsory for those students that have not pass the controls.
(5) Extra Challenge Point: cha
Challenge: We propose an optimization exercise delivered via a web portal or via Raco that will be awarded function of the final ranking position. The student with the best solution's final grade will be increased by 1 point (faster solution), 0.9 points (second quickest solution ),... and 0.1 points (tenth quickest solution) respectively.
These extra points will be only applied to those student that has C>=5.0.
The course can be approved in two ways:
The grade exam (NEX) can be achieved in two ways, either by continuous assessment or final examination:
If (C> = 5)
NEX = MAX (C, F)
NEX = MAX(0.25*C+0.75*F, F)
The final grade is:
Final Mark = MIN (0.50 * NEX + 0.10*P + 0.40*L+ cha 10)
The competition evaluates each student considering cross their teamwork, leadership in finding effective solutions, alternative, creative and innovative to the problems to be solved which the student is responsible. That is, it will be evaluated their ability to find a solution to be a successful combination of the solutions found by a problem or the solution. Also part of this assessment to participate in finding creative solutions to problems that can not be responsible.
Degut a la situació de pandemia aquest curs no realitzarem la part de generació de hardware specífic (acceleradors) amb les plaques amb ARM+FPGA de les que disposem, i per tant, els alumnes es centraran en l'optimització d'aplicacions sobre els processadors del laboratori (intel) o bé els processador que puguin tenir a casa (intel, arm, amd). Les técniques que desenvoluparem són vàlides per a qualsevol processador de caràcter general; encara que a classe ens centrarem en Intel i ARM. Això es fa així per evitar compartir un placa entre alumnes i per les dificultats que tindria per alguns alumnes haver de gestionar la placa, el disc dur, etc. en cas de confinament total encara que tindrien el suport técnic del professorat.
En qualsevol cas, segurament dedicarem alguna sessió com activitat adicional no avaluable del que es pot fer per accelerar aplicacions amb plaques amb FPGA.
No hi ha canvis significatius més que aquells de passar de presencial a online les classes de problemes i teoria. S'ìntentarà que les classes siguin interactives en el sentit que es puguin fer problemes o plantejar problemes de forma online, amb discussions amb videoconferència, foros, etc.
Comptem que el primer control sigui presencial.
Si podem fer una avaluació presencial d'un segon control a una aula normal, en horari de classe laboratori que se suposa presencial i que per tant no hauria d'afectar als alumnes que s'hagin matriculat amb l'horari públic, aleshores el mètode d'avaluació no quedarà modifcat.
En cas de no ser possible fer el segon control, aleshores el mètode d'avaluació seria el següent:
(L) L'avaluació de laboratori no queda modificada (40%).
(P) L'avaluació dels problemes no queda modificada (10%).
(C) L'avaluació de la nota de controls C (50%) quedaria com:
C1 examen presencial en la setmana blanca: 35% de la nota de C i allibera si la nota C1 >=5, si no s'aprova, l'estudiant haurà de realitzar tot l'examen final.
Examen Final (EF) tindrà dues parts: FC1 (si escau fer-la) i C2 (65% de C).
si C1>=5 aleshores Nex = max(EF, C1*0.35+C2*0.65)
sino aleshores Nex = EF
(cha) Nota Extra de Challenge no queda modificada (punt extra).
Nota Final queda com MIN(0.50 * NEx + 0.10 *P + 0.40 * L + cha, 10)
El pla de contingència ja està contemplat a l'addenda. Aquest és el de realitzar les pràctiques sobre processadors de caràter general durant aquest quadrimestre, i per tant és més fàcil passar a una modalitat totalment online si és necessari.
Where we are
B6 Building Campus Nord
C/Jordi Girona Salgado,1-3
08034 BARCELONA Spain
Tel: (+34) 93 401 70 00