Superscalar Processor and Its Performance Problems
C. Instruction Issuing and Parallel Execution

Now that the execution tuples are ready, the processor must decide which of them to issue. Ideally, an instruction is ready to execute as soon as its operands are available. In practice, issue also depends on hardware constraints such as the availability of execution units and register file ports. Instructions wait in the instruction queue until their operands are available.

Figure 3: Methods for organizing the instruction issue queue

There are three methods for organizing the instruction issue buffers, as shown in Figure 3. The single queue method is used when there is no out-of-order issue, so no register renaming is required. It uses a reservation bit to indicate when an instruction has completed. The multiple queue method uses several queues: instructions issue in order from each queue, but the queues can issue out of order relative to one another. The last method is reservation stations, which can issue instructions out of order, with no strict FIFO order. These reservation stations do not hold the actual data; they hold pointers to where the data is located.

D. Managing Memory Operations

Accessing memory is a performance-critical task and should be done as quickly as possible. Memory hierarchies are therefore used, in which a small, fast cache memory sits between the processor and main memory. For example, the ARM Cortex-A15 processor has a 32 KB data cache and a 32 KB instruction cache. The decode stage does not identify which memory location will be accessed, so the memory address must be calculated. This address must then be translated to produce a physical address. A translation lookaside buffer (TLB) is used to speed up this process. There are several methods for supporting multiple memory operations, but most of them are impractical. The conventional superscalar processor allows if...... middle of paper ...... me, the throughput cannot be sustained. To solve this problem, we can use a 2-port data cache.
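As an aside on the issue logic of Section C, the reservation-station idea can be sketched in a few lines of Python. This is only an illustrative model, not the mechanism of any particular processor: the names Entry and issue_cycle, the tag strings, and the unit kinds "alu" and "mem" are all assumptions made for the example. Each entry holds tags that point to its operands rather than the operand values, and an instruction issues once all of its tags are ready and a suitable execution unit is free.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One reservation-station entry (hypothetical layout for illustration)."""
    name: str        # label of the instruction
    unit: str        # kind of execution unit it needs, e.g. "alu" or "mem"
    sources: tuple   # tags pointing to its operands, not the values themselves


def issue_cycle(window, ready_tags, free_units):
    """Pick the instructions that can issue in one cycle, oldest first.

    window     -- list of Entry objects in program order (modified in place)
    ready_tags -- set of operand tags whose values are available
    free_units -- dict mapping unit kind to the number of free units
    """
    free = dict(free_units)  # copy so the caller's dict is not changed
    issued = []
    for entry in list(window):  # iterate over a copy while removing
        operands_ready = all(tag in ready_tags for tag in entry.sources)
        if operands_ready and free.get(entry.unit, 0) > 0:
            free[entry.unit] -= 1
            issued.append(entry)
            window.remove(entry)
    return issued


window = [
    Entry("i1", "mem", ("r1",)),
    Entry("i2", "alu", ("r9",)),       # r9 is not ready, so i2 stays blocked
    Entry("i3", "alu", ("r2", "r3")),  # ready, issues ahead of the older i2
    Entry("i4", "alu", ("r2",)),       # ready, but no ALU is left this cycle
]
issued = issue_cycle(window, {"r1", "r2", "r3"}, {"alu": 1, "mem": 1})
print([e.name for e in issued])  # ['i1', 'i3']
print([e.name for e in window])  # ['i2', 'i4']
```

In this run i3 issues before the older i2, which is the out-of-order behavior that a single in-order queue cannot provide. The limit on free units stands in for the execution-unit constraint mentioned in Section C; register file ports could be modeled the same way with another counter.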
Analysis of the results in [3] shows that there was no significant performance improvement when the 2-port data cache was used. It is therefore a trade-off between cost and performance.

REFERENCES

[1] James E. Smith, Gurindar S. Sohi, “The Microarchitecture of Superscalar Processors”, Proceedings of the IEEE, Volume: 83, Issue: 12, pp. 1609-1624, December 1995.
[2] N. D. Shah, Y. H. Shah, H. Modi, “Comprehensive study of features, execution stages and microarchitecture of superscalar processors”, IEEE International Conference on Computational Intelligence and Computer Research (ICCIC), pp. 1-4, December 2013.
[3] Steven Wallace, Nader Bagherzadeh, “Performance Issues of a Superscalar Microprocessor”, International Conference on Parallel Processing, Volume: 1, pp. 293-297, August 1994.