How can a program's size increase the rate of cache misses?

Why is this? Also, shouldn't a program whose code size is larger due to large amounts of dead code leave the cache miss rate unchanged, since dead code is never executed?

asked Sep 7, 2016 at 21:59 by Sasha Fonseca

3 Answers

Code is typically read into caches in whole cache lines, which might be 64, 128 or 256 bytes. If you have a 256 byte cache line and three quarters of that is dead code, then you are not using your cache memory very well. On the other hand, if you have a megabyte of completely unused code, that doesn't affect cache efficiency at all.

Some compilers will use heuristics or hints by the software developer to find out which code is probably very rarely used, and arrange code so that one cache line will be completely filled with used code or completely filled with unused code.

And unlike dead code, live code that is inflated by loop unrolling can increase cache misses, because the duplicated copies are all actually executed.

answered Sep 7, 2016 at 22:13 by gnasher729

Profile-guided optimization is a really good way for compilers to find hot vs. cold code. (A lot of programs have code that runs at startup but then never again. Grouping all that cold code together so it's not mixed with hot code is a good thing, even though it's not actually dead code.)

— Commented Sep 8, 2016 at 0:01

This isn't wrong per se, but the increased code size is the primary reason that loop unrolling can cause more i-cache misses, and I don't think this answer emphasizes that point enough.

— Commented Sep 8, 2016 at 4:43

The Wikipedia article on loop unrolling lists several possible downsides of this technique, two of which are related to code size:

Loop unrolling increases the static code size because it duplicates part of the code that is in a loop. The hope is that it will decrease the dynamic instruction count because fewer iterations of the loop will need to be executed.

For a small loop body the increased number of instructions in the loop body probably won't have a negative impact on the instruction cache hit rate. But as the loop body gets larger the increased code size can cause other useful lines in the instruction cache to be evicted, and the overall hit rate might be reduced. This is primarily because of the larger amount of code that is executed.

Loop unrolling becomes particularly complex when the loop body contains function calls and the compiler has to decide whether or not to inline them, or when the loop is part of a function that might itself be inlined elsewhere. Both inlining and loop unrolling have the potential to decrease the dynamic instruction count, but they also increase static code size. So the compiler typically uses heuristics to choose how to combine these optimizations, since the problem can't be solved optimally (at least not in a reasonable amount of time).

As mentioned in one of the other answers, alignment issues might also cause more instruction cache misses. However, the primary reason loop unrolling can cause more cache misses is that it increases the amount of code actually executed in the loop body (as well as the additional prologue and epilogue code).