Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP-Intel alliance [1] to describe a computing paradigm that researchers had been investigating since the early 1980s. The paradigm was intended to allow simple performance scaling without resorting to higher clock frequencies. By 1989, researchers at HP recognized that reduced instruction set computer (RISC) architectures were reaching a limit at one instruction per cycle. One goal of EPIC was to move the complexity of instruction scheduling from the CPU hardware to the software compiler, which can schedule instructions statically with the help of trace feedback information.

Author:Teshakar Nilrajas
Language:English (Spanish)
Published (Last):1 February 2004

Another loop with dependences. If an array element with index: The loop stores into an array element indexed by: If a loop-carried dependence exists, then: Answer: the following dependences exist among the four statements: These dependences will force S3 and S4 to wait for S1 to complete.
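The loop that statements S1 to S4 belong to does not appear in this text. As a sketch only, assume a loop of the following shape; the arrays X, Y and Z, the scalar c, and both function names are assumptions rather than something taken from the source. The first function marks where each dependence arises. The second shows one way renaming can remove the false dependences, using the hypothetical temporaries T and X1.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed form of the four-statement loop (not taken from the source). */
void loop_original(double *X, double *Y, double *Z, double c, size_t n)
{
    assert(c != 0.0);
    for (size_t i = 0; i < n; i++) {
        Y[i] = X[i] / c;   /* S1: reads X[i], writes Y[i] */
        X[i] = X[i] + c;   /* S2: writes X[i] after S1 read it (antidependence) */
        Z[i] = Y[i] + c;   /* S3: reads the Y[i] written by S1 (true dependence) */
        Y[i] = c - Y[i];   /* S4: reads S1's Y[i] (true), overwrites it (output),
                              and writes Y[i] after S3 read it (antidependence) */
    }
}

/* Renamed version: T holds S1's result and X1 receives S2's result,
   so only the true dependences S1 -> S3 and S1 -> S4 remain. */
void loop_renamed(const double *X, double *X1, double *Y, double *Z,
                  double *T, double c, size_t n)
{
    assert(c != 0.0);
    for (size_t i = 0; i < n; i++) {
        T[i]  = X[i] / c;  /* S1 */
        X1[i] = X[i] + c;  /* S2 */
        Z[i]  = T[i] + c;  /* S3 */
        Y[i]  = c - T[i];  /* S4 */
    }
}
```

After the renamed loop, code that previously read X would have to read X1 instead, which is the usual cost of renaming.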

There is an antidependence from S1 to S2, based on X[i]. There is an antidependence from S3 to S4 for Y[i]. There is an output dependence from S1 to S4, based on Y[i]. Eliminating false dependences: a renamed version of the loop eliminates these false (or pseudo) dependences. Points-to analysis relies on information from three major sources. The first is type information, which restricts what a pointer can point to. The second is information derived when an object is allocated or when the address of an object is taken.

For example, if p always points to an object allocated in a given source line and q never points to that object, then p and q can never point to the same object. The third source is information derived from pointer assignments. For example, if p may be assigned the. Eliminating dependent computations: copy propagation is used to simplify sequences of dependent operations. Recurrences are expressions whose value on one iteration is given by a function that depends on the previous iterations.
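A running sum is a simple recurrence: each iteration needs the value produced by the previous one. The sketch below is an illustration under assumptions, not the source's example. It contrasts the single dependence chain with a version that keeps two independent partial sums, giving a scheduler two shorter chains to overlap. The function names are hypothetical, and integers are used so that reassociating the additions cannot change the result.

```c
#include <assert.h>
#include <stddef.h>

/* One dependence chain: every add waits for the previous add. */
long sum_recurrence(const long *x, size_t n)
{
    assert(x != NULL || n == 0);
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum = sum + x[i];
    }
    return sum;
}

/* Two independent chains: s0 and s1 never depend on each other
   inside the loop, so their adds can be scheduled in parallel. */
long sum_two_chains(const long *x, size_t n)
{
    assert(x != NULL || n == 0);
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += x[i];
        s1 += x[i + 1];
    }
    if (i < n) {           /* leftover element when n is odd */
        s0 += x[i];
    }
    return s0 + s1;
}
```

With floating-point data the same transformation changes the order of rounding, so a compiler normally needs permission to apply it.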

Software-pipelined loop. Loop: S.D F4,16(R1); F0,0(R1). Software pipelining example. Iteration i: F4,0(R1). Software-pipelined loop with startup and cleanup code: F0,16(R1); F0,8(R1); F4,8(R1). Software pipelining versus unrolling.
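The idea of a software-pipelined loop with startup and cleanup code can be sketched in C. The example below assumes a loop that adds a scalar s to every element of x; that loop and the function name are assumptions, since the original assembly is only partly legible. Each pass through the steady-state loop stores the result of iteration i, performs the add for iteration i+1, and loads the operand for iteration i+2.

```c
#include <assert.h>
#include <stddef.h>

/* Software-pipelined form of: for (i = 0; i < n; i++) x[i] = x[i] + s; */
void add_scalar_pipelined(double *x, size_t n, double s)
{
    assert(x != NULL || n == 0);
    if (n == 0) {
        return;
    }
    if (n == 1) {               /* too short to pipeline */
        x[0] = x[0] + s;
        return;
    }

    /* Startup code: fill the pipeline. */
    double loaded = x[0];       /* load for iteration 0 */
    double summed = loaded + s; /* add for iteration 0 */
    loaded = x[1];              /* load for iteration 1 */

    /* Steady state: one store, one add, one load from three different iterations. */
    for (size_t i = 0; i + 2 < n; i++) {
        x[i] = summed;          /* store for iteration i */
        summed = loaded + s;    /* add for iteration i + 1 */
        loaded = x[i + 2];      /* load for iteration i + 2 */
    }

    /* Cleanup code: drain the pipeline. */
    x[n - 2] = summed;          /* store for iteration n - 2 */
    x[n - 1] = loaded + s;      /* add and store for iteration n - 1 */
}
```

Unlike unrolling, the loop body does not grow with the amount of overlap; the cost is the extra startup and cleanup code.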

Global code scheduling (continued): using superblocks. Trace scheduling: code generation sequence. BNEZ R4,elsepart ; then part. J join. Trace selection: a likely sequence of basic blocks that can be put together. Trace example: if the shaded portion in the previous code was the frequent path and it was unrolled 4 times: Superblocks for global scheduling. Analysis on superblocks. Example. Conditional move between registers. Another example.
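A conditional move between registers can be approximated in C with a branch-free select. The sketch below is an illustration only: it assumes the statement being converted is `if (a == 0) s = t;`, and the function names are hypothetical. Whether a compiler emits an actual conditional-move instruction for either form depends on the target and optimization level.

```c
/* Branching form: control dependence on the test of a. */
int select_with_branch(int a, int s, int t)
{
    if (a == 0) {
        s = t;
    }
    return s;
}

/* Branch-free form: the control dependence becomes a data dependence on mask.
   Assumes the usual two's-complement conversion between int and unsigned. */
int select_branch_free(int a, int s, int t)
{
    unsigned mask = 0u - (unsigned)(a == 0);   /* all ones if a == 0, else zero */
    return (int)(((unsigned)t & mask) | ((unsigned)s & ~mask));
}
```

Both functions return t when a is zero and s otherwise; the second contains no branch for the hardware to mispredict.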

Limitations of conditional moves. Predication can handle non-loop branches nicely. For example, the whole if portion can be predicated if the frequent path is not taken.

Some complications. Limitations of predicated instructions. Solution I: ignore exceptions. Example. Solution II: mark speculative instructions. R1,0(R3). R4,0(R2). R4,0(R3).

Solution III: poison bits. Register stack: programs rename registers to start from 32. On a function call, the machine shifts the register window so that the previous output registers become the new locals starting at r32. Five execution units.

Possible Template Values. Figure G. Stops are indicated by heavy lines. Straightforward MIPS code. R1,R1, The code scheduled to minimize the number of bundles.

The code scheduled to minimize the number of cycles, assuming one bundle executed per cycle. Unrolled loop after it has been scheduled for the pipeline. F6,-8 R1. F10, R1. F14, R1. F8,-8 R1. F12,16 R1. F16,8 R1. Distributing responsibility. Instruction groups. Control flow parallelism: parallel comparison and multiway branches. Influencing dynamic events: the architecture provides an extensive set of hints that the compiler uses to tell the hardware about likely branch behavior (taken or not taken, and the amount to fetch at the branch target) and about memory operations (in what level of the memory hierarchy to cache data).

C code. Predication executes multiple paths simultaneously and reduces mispredicted branches. Predicates are used to eliminate branches and to move instructions across branches. An instruction executes conditionally based on a predicate register; there are 64 1-bit predicate registers. Predicates are set by compare instructions. Most instructions can be predicated: each instruction code contains a predicate field. If the predicate is true, the instruction updates the computation state; otherwise, it behaves like a nop.
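The behavior described above can be modeled in C. The sketch below is a simplified model, not IA-64 code: a compare sets two complementary predicates, and each guarded operation either updates its destination or acts as a nop. The helper pred_mov and the function abs_diff_predicated are hypothetical names, and the if/else being converted (computing the absolute difference of a and b) is an assumed example.

```c
#include <stdbool.h>

/* Models one predicated move: updates *dst only when pred is true,
   otherwise leaves the state unchanged, like a nop. */
static void pred_mov(bool pred, int *dst, int src)
{
    *dst = pred ? src : *dst;
}

/* If-converted form of:
       if (a > b) r = a - b; else r = b - a;                     */
int abs_diff_predicated(int a, int b)
{
    bool p1 = a > b;           /* compare sets predicate p1 ... */
    bool p2 = !p1;             /* ... and its complement p2 */
    int r = 0;
    pred_mov(p1, &r, a - b);   /* (p1) r = a - b */
    pred_mov(p2, &r, b - a);   /* (p2) r = b - a */
    return r;
}
```

Both arms are issued on every call, but exactly one of them changes r, so no branch remains between the compare and the result.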

Scheduling and speculation. Control speculation: not all branches can be removed using predication. Loads have longer latency than most instructions and tend to start time-critical chains of instructions. Constraints on the code motion of loads limit parallelism. Non-EPIC architectures constrain the motion of load instructions. IA-64 provides speculative loads, so the compiler can safely schedule a load instruction before one or more prior branches.

Control speculation: exceptions are handled by setting the NaT (Not a Thing) bit in the target register. A check instruction branches to fix-up code if the NaT flag is set. The fix-up code is generated by the compiler and handles the exceptions. The NaT bit propagates through the execution of almost all IA-64 instructions, and this propagation reduces the number of required check points.
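The NaT mechanism can be modeled in a few lines of C. This is a toy model under stated assumptions, not the hardware's behavior in detail: a register is a value plus a NaT flag, a null address stands in for any access that would fault, and the type Reg and all function names are hypothetical. The load is hoisted above the branch that guards it, the add propagates the flag, and a single check sits after the branch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    long value;
    bool nat;    /* set when a speculative load deferred an exception */
} Reg;

/* Speculative load: instead of faulting on a bad address, set NaT. */
static Reg load_speculative(const long *addr)
{
    Reg r = {0, false};
    if (addr == NULL) {
        r.nat = true;
    } else {
        r.value = *addr;
    }
    return r;
}

/* Ordinary operation: NaT propagates from either source to the result. */
static Reg add_regs(Reg a, Reg b)
{
    Reg r = {a.value + b.value, a.nat || b.nat};
    return r;
}

/* Speculated form of:
       if (p != NULL) return *p + 1; else return -1;              */
long load_plus_one(const long *p)
{
    const Reg one = {1, false};
    Reg t = load_speculative(p);   /* load moved above the branch */
    Reg u = add_regs(t, one);      /* dependent work also moved up */
    if (p == NULL) {
        return -1;                 /* speculated values are simply discarded */
    }
    if (u.nat) {                   /* check: one test covers load and add */
        u.value = *p + 1;          /* fix-up code redoes the work non-speculatively */
    }
    return u.value;
}
```

Because the flag travels with the data, only the final result needs a check, which is the reduction in check points the text describes.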

Speculative load: the load instruction ld.s. Data speculation: the compiler may not be able to determine which locations pointers reference, yet wants to move calculations ahead of a possible memory dependency. Traditionally, given a store followed by a load, if the compiler cannot determine whether the addresses will be equal, the load cannot be moved ahead of the store. IA-64 allows the compiler to schedule a load before one or more stores.

Use the advance load (ld.a). Data speculation allows loads to be moved ahead of stores even if the compiler is unsure whether the addresses are the same. A speculative load generates an entry in the ALAT (advanced load address table). A store removes every entry in the ALAT that has the same address. The check instruction branches to fix-up code if the given address is not in the ALAT.
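The ALAT rules above can be sketched as a small C model. It is a simplification under assumptions: the table has a fixed size, entries are indexed by a register number, addresses match only when identical, and the type Alat and all function names are hypothetical. The function store_then_load hoists the load of *p above a store to *q and relies on the check to recover when the two pointers alias.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ALAT_SIZE 8

typedef struct {
    const long *addr[ALAT_SIZE];
    bool valid[ALAT_SIZE];
} Alat;

/* Advance load: perform the load and record its address under register reg. */
long alat_advance_load(Alat *t, int reg, const long *addr)
{
    assert(reg >= 0 && reg < ALAT_SIZE);
    t->addr[reg] = addr;
    t->valid[reg] = true;
    return *addr;
}

/* Store: write memory and remove every entry with the same address. */
void alat_store(Alat *t, long *addr, long value)
{
    *addr = value;
    for (int i = 0; i < ALAT_SIZE; i++) {
        if (t->valid[i] && t->addr[i] == addr) {
            t->valid[i] = false;
        }
    }
}

/* Check: true if the entry for reg survived, meaning no store hit its address. */
bool alat_check(const Alat *t, int reg)
{
    assert(reg >= 0 && reg < ALAT_SIZE);
    return t->valid[reg];
}

/* Data-speculated form of:  *q = v; return *p + 1;  */
long store_then_load(long *p, long *q, long v)
{
    Alat t = {{NULL}, {false}};
    long r = alat_advance_load(&t, 0, p);  /* load moved above the store */
    long result = r + 1;                   /* dependent work moved with it */
    alat_store(&t, q, v);
    if (!alat_check(&t, 0)) {              /* entry gone: p and q aliased */
        result = *p + 1;                   /* fix-up code reloads and recomputes */
    }
    return result;
}
```

When p and q differ, the early load is kept and the check costs almost nothing; when they are equal, the fix-up path produces the value the original program order would have produced.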

