Architecture-Aware Programming

You are here

Credits
6
Types
Specialization complementary (Computer Engineering)
Requirements
  • Prerequisite: AC
Department
AC
Very practical course whose main objective is learning programming techniques for "optimizing" (reducing execution time) applications, considering the architecture of the computer and what the compiler can do, using measurement and analysis tools, and using high level synthesis tools to develop custom hardware from C.

Teachers

Person in charge

  • Daniel Jimenez Gonzalez ( )

Weekly hours

Theory
1
Problems
1
Laboratory
2
Guided learning
0.4
Autonomous learning
5.6

Competences

Transversal Competences

Entrepreneurship and innovation

  • G1 - To know and understand the organization of a company and the sciences which govern its activity; capacity to understand the labour rules and the relation between planning, industrial and business strategies, quality and benefit. To develop creativity, entrepreneur spirit and innovation tendency.
    • G1.3 - To have strong decision-making skills. To use knowledge and strategic skills for the creation and management of projects, apply systematic solutions to complex problems, and design and manage the innovation in the organization. To demonstrate flexibility and professionalism when developing her work.

Technical Competences of each Specialization

Computer engineering specialization

  • CEC2 - To analyse and evaluate computer architectures including parallel and distributed platforms, and develop and optimize software for these platforms.
    • CEC2.2 - To program taking into account the hardware architecture, using assembly language as well as high-level programming languages.
    • CEC2.3 - To develop and analyse software for systems based on microprocessors and its interfaces with users and other devices.
  • CEC3 - To develop and analyse hardware and software for embedded and/or very low consumption systems.
    • CEC3.1 - To analyse, evaluate and select the most adequate hardware and software platform to support embedded and real-time applications.
    • CEC3.2 - To develop specific processors and embedded systems; to develop and optimize the software of these systems. 

Objectives

  1. Follow a methodology of optimization to reduce the execution time of an application in a specific processor; first identifying the parts more expensive, then evaluating whether or not (Amdahl's law ) its optimization is worth, and finally perform the optimization if the performance gain is reasonable.
    Related competences: CEC2.3, CEC2.2,
  2. Identify and recognize the optimizations the compiler can do, either by the basic knowledge that the student has of the gcc compiler, the analysis of assembler code generated, or because the student does profiling or instrumentation.
    Related competences: CEC2.3, CEC3.2, CEC3.1,
  3. Choose the tools and their options most suitable to generate the binary code, analyze the binary code generated and the execution of that code (by profiling, instrumentation or tracing); in addition to choosing metrics for this analysis and the environment more suitable for evaluation.
    Related competences: G1.3, CEC2.3, CEC3.1,
  4. Define latency, repetition rate and throughput of an instruction; and apply this knowledge in optimizing of the codes to choose the most appropriate operation in each case.
    Related competences: CEC3.2, CEC2.2,
  5. Apply, in the proper way, optimization techniques to reduce and / or avoid long latency operations, reducing the execution time of the application.
    Related competences: G1.3, CEC3.2, CEC2.2,
    Subcompetences:
    • Apply the technique of "bithacks" to reduce / avoid long latency operations
    • Specialize the code applications to reduce / avoid long latency operations.
    • Apply the technique of "memoization" to reduce / avoid long latency operations
    • Reduce the "overhead" of operating system and library calls.
  6. Reduce the time allocated for breaks in code, using techniques that reduce the number of jumps.
    Related competences: G1.3, CEC3.2, CEC2.2, CEC3.1,
    Subcompetences:
    • Apply the technique of inlining.
    • Change the order of the condition evaluations to reduce, in average, the execution time of an application.
    • Apply the technique of code hoisting.
    • Apply the techniques of loop unrolling, loop fusion and loop collapsing.
    • Apply other techniques such as memoization, bithacks etc.. to reduce the number of jumps.
  7. Characterize the memory hierarchy of the computer with either a manual or studying it empirically with codes and tools for profiling and tracing.
    Related competences: CEC2.3,
  8. Apply, in the proper way, optimization techniques to make more efficient use of processor memory hierarchy, reducing the execution time of the application.
    Related competences: G1.3, CEC3.2, CEC2.2,
    Subcompetences:
    • Detect and prevent possible "data aliasing" to help the work of the compiler.
    • Apply optimizations to exploit the bandwidth of memory.
    • Apply optimization techniques to improve the exploitation of data locality.
  9. Identify if a code is vectoritzable and vectorized it if it means reducing the execution time of the application.
    Related competences: CEC3.2, CEC2.2, CEC3.1,
    Subcompetences:
    • Apply the optimization techniques of scalar code also on vector codes.
  10. Analyze and identify if a code can be accelerated using custom hardware and use the high level tools provided in the course to define the specific hardware from C language with very simple pragmas.
    Related competences: G1.3, CEC3.2,
  11. Analyze and optimize a medium sized application along the semester, applying the optimization techniques and methodology followed in the course, in coordination with the rest of their team members.
    Related competences: G1.3, CEC2.2,
  12. Lead a group of students to obtain the best solution to an optimization problem or exercise of analysis given, considering the solutions of the rest of his team members and after a discussion among all team members.
    Related competences: G1.3,

Contents

  1. Introduction: Architecture-Aware Programming
    What we mean by Architecture-Aware Programming and in what cases it is worth it. Optimization strategies. Performance evaluation.
  2. Programming and Optimization Tools
    Performance measurement programs. Compilation options. Study of dynamic behavior of programs. Use of high level tools to accelerate (optimize) some part of the code in hardware.
  3. Long latency operations
    Alternatives to run some high cost operations and / or long latency. Replacement of complex operations for simple operations. Memoization of results. Detection of trivial cases. Use of operating system resources.
  4. Control flow optimization
    Detection and elimination of critical breaks. Inlining. Loop Unrolling, Collapsing, Fusion, etc.
  5. Memory Conscious Optimizations
    Size and alignment of data. Division of data into blocks of size proportional to the cache (blocking). Memory disambiguation. Definition of data structures and pragmas to improve the performance of the application both in a specific hardware or a general purpose processor.
  6. Vector Extensions and use of custom hardware to "vectorize"
    SIMD extensions. Identification of codes that can be vectorized. Inserting source code vector in C.
    Use of tools of High Level Synthesis to accelerate (optimize) in hardware the most time consuming parts of the program exploiting the same idea of vectorization

Activities

Activity Evaluation act


Introduction to Programming conscious architecture

Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an individual homework.
Objectives: 1
Contents:
Theory
1h
Problems
0h
Laboratory
2h
Guided learning
0h
Autonomous learning
1h

Scheduling and Optimization Tools

Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an independent work.
Objectives: 2 3 10
Contents:
Theory
2h
Problems
2h
Laboratory
6h
Guided learning
0h
Autonomous learning
10h

Long latency operations

Participar activament en una sessió de classe expositiva-participativa de 2 hores de teoria i problemes. (2 hores). Estudiar a casa el tema assignat i realitzar els exercicis que s'hagin proposat com a treball autònom.
Objectives: 1 2 4 5 12
Contents:
Theory
2h
Problems
2h
Laboratory
4h
Guided learning
0h
Autonomous learning
9h

Optimization flow control

Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an independent work.
Objectives: 1 2 3 6 12 10
Contents:
Theory
3h
Problems
2h
Laboratory
6h
Guided learning
0h
Autonomous learning
18h

Control 1-T

Issues 1-3.
Objectives: 1 2 3 4 5 7 10
Week: 7
Type: theory exam
Theory
1h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Control-1-P

Lessons 1-3
Objectives: 1 2 3 4 5 7 10
Week: 7
Type: problems exam
Theory
0h
Problems
1h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Optimizations conscious memory

Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an independent work.
Objectives: 1 2 3 8 7 12 10
Contents:
Theory
2h
Problems
3h
Laboratory
6h
Guided learning
0h
Autonomous learning
14h

Vector Extensions and custom hardware usage

Actively participate in a session-class exhibition participatory 2 hours of theory and problems. (2 hours). Studying at home and do the exercises assigned topic to be proposed as an independent work.
Objectives: 1 2 3 9 12 10
Contents:
Theory
3h
Problems
4h
Laboratory
6h
Guided learning
0h
Autonomous learning
22h

Control 2-T

All subjects. More centered on 5-6, but any previous concept may be avaluated.
Objectives: 1 2 3 4 5 6 8 7 9 11 12 10
Week: 14
Type: theory exam
Theory
1h
Problems
0h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Control-2-P

All topics. More focused on topics 4-6 but any previous concept can come out.
Objectives: 1 2 3 4 5 6 8 7 9 11 12 10
Week: 14
Type: problems exam
Theory
0h
Problems
1h
Laboratory
0h
Guided learning
0h
Autonomous learning
2h

Theory
0h
Problems
0h
Laboratory
0h
Guided learning
3h
Autonomous learning
2h

Final Exam

Final Exam. Outside of class time.
Objectives: 1 2 3 4 5 6 8 7 9 11 12 10
Week: 15 (Outside class hours)
Type: final exam
Theory
0h
Problems
0h
Laboratory
0h
Guided learning
3h
Autonomous learning
0h

Teaching methodology

Theory classes designed to present basic concepts of architecture and processor optimization tools and methodology; inserting small problems to solve in class.

For each subject, a series of theoretical and practical exercises will be proposed to the students. Those exercises have to be solved and hanged in the forums created by the teacher at Athena. The problems to be solved in the forums will be a medium difficulty and designed to facilitate the laboratory activity, i.e., which also serve as preparatory work.
We create teams of about 5 students. The number of people should be equal to the number of exercises per subject and team and will be fixed for the course. In principle, allocated a minimum of 5 problems per team and topic.

The methodology to solve these problems consists of a personal work and teamwork. Basically consist of: (1) individually solve problems and upload the problem solved in the forum created by the teacher for each team, (2) discuss in the forum of the individual solutions of the team (the first feedback received by the students) and agree a solution to every problem and (3) post a final solution to every problem in a common forum for final solutions to all teams. For each team, an student will be responsible for a problem, and that student will post the final solution of that problem to the common forum before a certain date. The teacher will review those final solutions (second feedback) but can also check the forums for discussion of individual solutions.

The classes of problems and team activities will be discussed with all the teams (third feedback).
To allow students to compare their solutions with those of other students of the subject, the forums will be open to everyone. In addition, the course offers a website where students can send the solution of certain exercises of the course. The website will run programs, check their correctness and order the programs submitted by students considering the execution time.

The laboratory classes will support the theory and the exercises done in class of problems. Students will have the documentation before each practice session. The implementation of these practices will be entirely in the assigned lab hours. Practices will be divided into two groups: limited experiments to illustrate certain applications and optimizations, and others that will apply all knowledge to optimize a complete application.

At the end of the last laboratory session of each topic, students will have time to deliver some of the exercises done in the laboratory.

Evaluation methodology

In the evaluation of the student involved five components:

(1) Laboratory exercises: L - 40% of the subject mark.

There is a lab practice per lesson and there may be more than one session per lab practice. Each practice will be 20% of the overall lab mark.

L = 0.2*L2+0.2*L3+0.2*L4+0.2*L5+0.2*L6

The solution of lab exercises will be a brief report (between 1 and 5 pages), source codes and scrips that must be delivered to the teacher via the Raco to be evaluated. Also, some of the codes may be asked to be delivered to a server where they will be compiled and checked if they correctly accelerate the application.

(2) Exercises class of problems: P - 10% of the subject mark.

These exercises are requested by the teacher in class of problems and must work in teams and presented to Athena Forum or via Raco.


(3) Partial control mark: C - 50% of the subject mark.

During the course students must answer two controls: C1, C2. Its objective is to assess whether the student has achieved assimilate code optimizations explained so far.

C = 0.35 * C1 +0.65 * C2 .


(4) Final Exam Mark: F

Final: Final examination will include theoretical and practical cases. This exam is only compulsory for those students that have not pass the controls.

(5) Extra Challenge Point: cha

Challenge: We propose an optimization exercise delivered via a web portal or via Raco that will be awarded function of the final ranking position. The student with the best solution's final grade will be increased by 1 point (faster solution), 0.9 points (second quickest solution ),... and 0.1 points (tenth quickest solution) respectively.
These extra points will be only applied to those student that has C>=5.0.


The course can be approved in two ways:
===========================================

The grade exam (NEX) can be achieved in two ways, either by continuous assessment or final examination:

If (C> = 5)
NEX = MAX (C, F)
otherwise:
NEX = MAX(0.25*C+0.75*F, F)

The final grade is:

Final Mark = MIN (0.50 * NEX + 0.10*P + 0.40*L+ cha 10)

The competition evaluates each student considering cross their teamwork, leadership in finding effective solutions, alternative, creative and innovative to the problems to be solved which the student is responsible. That is, it will be evaluated their ability to find a solution to be a successful combination of the solutions found by a problem or the solution. Also part of this assessment to participate in finding creative solutions to problems that can not be responsible.

Bibliography

Basic:

Previous capacities

The requirements provided for.