CPU cache

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory.
Related Articles

Multi-core processor

A cache is a smaller, faster memory, closer to a processor core, which stores copies of the data from frequently used main memory locations. Cached data from the main memory may be changed by other entities (e.g. peripherals using direct memory access (DMA) or another core in a multi-core processor), in which case the copy in the cache may become out-of-date or stale.
For example, cores may or may not share caches, and they may implement message passing or shared-memory inter-core communication methods.

Translation lookaside buffer

Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) that is part of the memory management unit (MMU) that most CPUs have.
A translation lookaside buffer (TLB) is a memory cache that is used to reduce the time taken to access a user memory location.
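The lookup a TLB performs can be sketched as a small software model, assuming hypothetical parameters (4 KiB pages, a 4-entry fully associative TLB with LRU replacement); real TLBs are hardware structures with their own organizations:

```python
# Illustrative TLB sketch (not any real MMU's design): a small fully
# associative cache of virtual-page -> physical-frame translations,
# with LRU replacement and a page-table walk as the slow path.
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages

class TLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = OrderedDict()  # vpn -> pfn, kept in LRU order
        self.hits = self.misses = 0

    def translate(self, vaddr, page_table):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)         # refresh LRU position
        else:
            self.misses += 1                      # TLB miss: walk the page table
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[vpn] = page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset

page_table = {0: 7, 1: 3, 2: 9}                   # vpn -> pfn (made-up mapping)
tlb = TLB()
assert tlb.translate(4100, page_table) == 3 * 4096 + 4    # miss, then cached
assert tlb.translate(4200, page_table) == 3 * 4096 + 104  # hit: same page
```

Only the page number is cached; the page offset passes through unchanged, which is why a single TLB entry covers every byte of its page.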

Memory management unit

Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) that is part of the memory management unit (MMU) that most CPUs have.
An MMU effectively performs virtual memory management, handling at the same time memory protection, cache control, bus arbitration and, in simpler computer architectures (especially 8-bit systems), bank switching.

Cache performance measurement and metric

Cache performance measurement has become important as the speed gap between memory and processor performance continues to widen.
The CPU cache is a piece of hardware that reduces memory access time by keeping some of the frequently used main-memory data close to the processor.

Computer data storage

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory.
Processor cache is an intermediate stage between ultra-fast registers and much slower main memory. It was introduced solely to improve the performance of computers. Most actively used information in the main memory is just duplicated in the cache memory, which is faster, but of much lesser capacity. On the other hand, main memory is much slower, but has a much greater storage capacity than processor registers. Multi-level hierarchical cache setup is also commonly used—primary cache being smallest, fastest and located inside the processor; secondary cache being somewhat larger and slower.
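The benefit of keeping frequently used data in a small, fast cache can be illustrated with a toy direct-mapped cache model (hypothetical parameters: 64-byte lines, 256 lines). Sequential accesses reuse each fetched line, while a large stride defeats the cache entirely:

```python
# Toy direct-mapped cache model (hypothetical parameters: 64-byte
# lines, 256 lines) comparing hit rates of sequential vs. strided access.
LINE_SIZE = 64
NUM_LINES = 256

def hit_rate(addresses):
    lines = [None] * NUM_LINES      # one stored tag per cache line
    hits = 0
    for addr in addresses:
        block = addr // LINE_SIZE
        index = block % NUM_LINES
        tag = block // NUM_LINES
        if lines[index] == tag:
            hits += 1
        else:
            lines[index] = tag      # miss: fill the line
    return hits / len(addresses)

sequential = list(range(4096))                          # byte-by-byte walk
strided = [i * 16384 % (1 << 20) for i in range(4096)]  # 16 KiB stride

print(hit_rate(sequential))  # 0.984375: 63 of every 64 accesses hit
print(hit_rate(strided))     # 0.0: every access conflicts in one line
```

The strided pattern maps every access to the same cache index with a different tag, so each reference evicts the previous one; this conflict behavior is one motivation for set-associative designs.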

Dirty bit

Alternatively, in a write-back or copy-back cache, writes are not immediately mirrored to the main memory, and the cache instead tracks which locations have been written over, marking them as dirty.
Dirty bits are used by the CPU cache and in the page replacement algorithms of an operating system.
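A write-back cache with dirty bits can be sketched as a minimal direct-mapped model with one word per line (purely illustrative): writes update only the cache and set the dirty bit, and main memory is updated only when a dirty line is evicted.

```python
# Minimal write-back cache sketch: writes set a dirty bit and are
# flushed to memory only on eviction. Direct-mapped, one word per line.
class WriteBackCache:
    def __init__(self, memory, num_lines=4):
        self.memory = memory
        self.num_lines = num_lines
        self.lines = {}        # index -> (tag, value, dirty)
        self.writebacks = 0

    def _locate(self, addr):
        return addr % self.num_lines, addr // self.num_lines

    def _fill(self, index, tag):
        if index in self.lines:
            old_tag, value, dirty = self.lines[index]
            if dirty:          # flush the dirty victim line to memory
                self.memory[old_tag * self.num_lines + index] = value
                self.writebacks += 1
        addr = tag * self.num_lines + index
        self.lines[index] = (tag, self.memory[addr], False)

    def read(self, addr):
        index, tag = self._locate(addr)
        if index not in self.lines or self.lines[index][0] != tag:
            self._fill(index, tag)
        return self.lines[index][1]

    def write(self, addr, value):
        index, tag = self._locate(addr)
        if index not in self.lines or self.lines[index][0] != tag:
            self._fill(index, tag)
        self.lines[index] = (tag, value, True)  # mark dirty; skip memory

memory = list(range(16))
cache = WriteBackCache(memory)
cache.write(1, 99)
assert memory[1] == 1    # main memory is still stale after the write
cache.read(5)            # addr 5 maps to the same line, evicting addr 1
assert memory[1] == 99   # the dirty line was written back on eviction
```

The window between the write and the eviction is exactly the interval in which the cached copy and main memory disagree, which is why DMA devices and other cores need coherence machinery.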

Computer

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory.
In more sophisticated computers there may be one or more RAM cache memories, which are slower than registers but faster than main memory.

Hyper-threading

Another technology, used by many processors, is simultaneous multithreading (SMT), or, in Intel's terminology, hyper-threading (HT), which allows an alternate thread to use the CPU core while the first thread waits for required CPU resources to become available.
(The processor may stall due to a cache miss, branch misprediction, or data dependency.)

Athlon

For example, the level-1 data cache in an AMD Athlon is two-way set associative, which means that any particular location in main memory can be cached in either of two locations in the level-1 data cache.
The Athlon's CPU cache consisted of the typical two levels.
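The set-index calculation behind two-way set associativity can be sketched with Athlon-like parameters (64 KiB, 64-byte lines, two ways, hence 512 sets; the exact figures are illustrative assumptions):

```python
# Set-index sketch for a two-way set-associative cache with assumed
# Athlon-like L1 parameters: 64 KiB / 64-byte lines / 2 ways = 512 sets.
LINE_SIZE = 64
NUM_SETS = (64 * 1024) // LINE_SIZE // 2   # 512 sets

def cache_set(addr):
    # The line-sized block number, modulo the set count, picks the set.
    return (addr // LINE_SIZE) % NUM_SETS

a = 0x00000
b = 0x08000              # 32 KiB apart: same set, different tag
assert cache_set(a) == cache_set(b) == 0
# Both can be resident at once (one per way), but a third address
# mapping to set 0 would force an eviction from that set.
```

This is the sense in which "any particular location in main memory can be cached in either of two locations": the index bits fix the set, and the two ways within the set are the only candidates.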

Loop nest optimization

There is a wide literature on such optimizations (e.g. loop nest optimization), largely coming from the High Performance Computing (HPC) community. Finally, at the other end of the memory hierarchy, the CPU register file itself can be considered the smallest, fastest cache in the system, with the special characteristic that it is scheduled in software—typically by a compiler as it allocates registers to hold values retrieved from main memory, for example in loop nest optimization.
Loop tiling partitions a loop's iteration space into smaller chunks or blocks, so as to help ensure data used in a loop stays in the cache until it is reused.
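A minimal sketch of loop tiling, here applied to matrix multiplication (the tile size B is an illustrative choice; real tile sizes are tuned to the cache):

```python
# Loop tiling sketch: a blocked matrix multiply that works on B-sized
# tiles so each tile can stay cache-resident while it is reused.
N = 8
B = 4   # tile size (illustrative; tuned to cache size in practice)

def matmul_tiled(X, Y):
    Z = [[0] * N for _ in range(N)]
    for ii in range(0, N, B):            # iterate over tiles...
        for kk in range(0, N, B):
            for jj in range(0, N, B):
                for i in range(ii, ii + B):   # ...then within a tile
                    for k in range(kk, kk + B):
                        x = X[i][k]
                        for j in range(jj, jj + B):
                            Z[i][j] += x * Y[k][j]
    return Z

X = [[i + j for j in range(N)] for i in range(N)]
Y = [[i * j for j in range(N)] for i in range(N)]
naive = [[sum(X[i][k] * Y[k][j] for k in range(N)) for j in range(N)]
         for i in range(N)]
assert matmul_tiled(X, Y) == naive   # same result, different access order
```

The tiled version touches the same B×B blocks of X, Y, and Z repeatedly before moving on, so for tile sizes matched to the cache each block is fetched from main memory roughly once instead of once per reuse.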

Haswell (microarchitecture)

Intel's Crystalwell variant of its Haswell processors, equipped with Intel's Iris Pro GT3e embedded graphics and 128 MB of eDRAM, introduced an on-package Level 4 cache which serves as a victim cache to the processor's Level 3 cache. Later, Intel included μop caches in its Sandy Bridge processors and in successive microarchitectures like Ivy Bridge and Haswell.
Micro-operation cache (μop cache) capable of storing 1.5 K micro-operations (approximately 6 KB in size).

Data (computing)

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory.

Direct memory access

Cached data from the main memory may be changed by other entities (e.g. peripherals using direct memory access (DMA) or another core in a multi-core processor), in which case the copy in the cache may become out-of-date or stale.
Further performance-oriented enhancements to the DMA mechanism have been introduced in Intel Xeon E5 processors with their Data Direct I/O (DDIO) feature, allowing the DMA "windows" to reside within CPU caches instead of system RAM.

IBM z13 (microprocessor)

Caches are generally sized in powers of two (4, 8, 16, etc. KiB, or MiB for larger non-L1 caches), although the IBM z13 has a 96 KiB L1 instruction cache.
Each core has a private 96 KB L1 instruction cache, a private 128 KB L1 data cache, a private 2 MB L2 instruction cache, and a private 2 MB L2 data cache.

Register file

Finally, at the other end of the memory hierarchy, the CPU register file itself can be considered the smallest, fastest cache in the system, with the special characteristic that it is scheduled in software—typically by a compiler as it allocates registers to hold values retrieved from main memory, for example in loop nest optimization.
The register file is part of the architecture and visible to the programmer, as opposed to the concept of transparent caches.

Out-of-order execution

Various techniques have been employed to keep the CPU busy during this time, including out-of-order execution in which the CPU (Pentium Pro and later Intel designs, for example) attempts to execute independent instructions after the instruction that is waiting for the cache miss data.
The benefit of OoOE processing grows as the instruction pipeline deepens and the speed difference between main memory (or cache memory) and the processor widens.

Sandy Bridge

Later, Intel included μop caches in its Sandy Bridge processors and in successive microarchitectures like Ivy Bridge and Haswell.
32 KB data + 32 KB instruction L1 cache (4 clocks) and 256 KB L2 cache (11 clocks) per core

ECC memory

(The tag, flag and error correction code bits are not included in the size, although they do affect the physical area of a cache.)
Many processors use error correction codes in the on-chip cache, including the Intel Itanium and Xeon processors, the AMD Athlon, Opteron, all Zen- and Zen+-based processors (EPYC, EPYC Embedded, Ryzen and Ryzen Threadripper), and the DEC Alpha 21264.
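The principle can be illustrated with the classic Hamming(7,4) single-error-correcting code; cache ECC applies the same idea to much wider words, typically with an extra parity bit for double-error detection:

```python
# Hamming(7,4) sketch: encodes a 4-bit nibble into 7 bits and corrects
# any single flipped bit. Purely illustrative of the ECC principle;
# real cache ECC uses much wider codes (e.g. SECDED over 64-bit words).
def encode(nibble):
    d = [(nibble >> i) & 1 for i in range(4)]   # data bits d1..d4
    p1 = d[0] ^ d[1] ^ d[3]                     # parity over positions 1,3,5,7
    p2 = d[0] ^ d[2] ^ d[3]                     # parity over positions 2,3,6,7
    p3 = d[1] ^ d[2] ^ d[3]                     # parity over positions 4,5,6,7
    # codeword bit order: p1 p2 d1 p3 d2 d3 d4 (positions 1..7)
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(code):
    c = code[:]
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # position of the flipped bit (0 = none)
    if syndrome:
        c[syndrome - 1] ^= 1          # correct the single-bit error
    return c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)

word = encode(0b1011)
word[4] ^= 1                          # a single bit flips in the stored word
assert decode(word) == 0b1011         # the error is located and corrected
```

The extra parity bits are the "error correction code bits" mentioned above: they enlarge the physical array without counting toward the quoted cache size.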

Cache coherence

Communication protocols between the cache managers that keep the data consistent are known as cache coherence protocols.
When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, which is particularly the case with CPUs in a multiprocessing system.
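A toy model of an invalidation-based coherence protocol (a simplified MSI sketch, not any real processor's implementation) shows how writes by one cache keep other caches from serving stale copies:

```python
# Simplified MSI coherence sketch: each cache holds the line in state
# M(odified), S(hared), or I(nvalid). A write invalidates all other
# copies, so no reader can ever observe a stale value.
class Cache:
    def __init__(self, bus):
        self.state, self.value = "I", None
        bus.append(self)
        self.bus = bus

    def read(self, memory):
        if self.state == "I":             # miss: fetch a shared copy
            for other in self.bus:
                if other.state == "M":    # owner flushes its dirty copy first
                    memory[0] = other.value
                    other.state = "S"
            self.value, self.state = memory[0], "S"
        return self.value

    def write(self, value):
        for other in self.bus:
            if other is not self:
                other.state = "I"         # invalidate every other copy
        self.value, self.state = value, "M"

bus = []
memory = [0]                              # a single shared memory location
a, b = Cache(bus), Cache(bus)
assert a.read(memory) == 0 and b.read(memory) == 0   # both end up Shared
a.write(42)                               # a -> Modified, b -> Invalid
assert b.state == "I"
assert b.read(memory) == 42               # b re-fetches the current value
```

Real protocols (MESI, MOESI, and directory-based variants) refine this with more states and avoid broadcasting, but the invariant is the same: at most one writable copy at a time.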

Processor register

In the early days of microcomputer technology, memory access was only slightly slower than register access.
Modern processors use either static or dynamic RAM as main memory, with the latter usually accessed via one or more cache levels.

Central processing unit

A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory.
It largely ignores the important role of CPU cache, and therefore the access stage of the pipeline.

Athlon 64

To illustrate both specialization and multi-level caching, here is the cache hierarchy of the K8 core in the AMD Athlon 64 CPU.
"Newcastle" was released soon after ClawHammer, with half the Level 2 cache.

Zen (microarchitecture)

AMD implemented a μop cache in their Zen microarchitecture.
Newly introduced "large" micro-operation cache.

Von Neumann architecture

Microprocessors have advanced much faster than memory, especially in terms of their operating frequency, so memory became a performance bottleneck.
The von Neumann vs. Harvard distinction applies to the cache architecture, not the main memory (split cache architecture).

Sum addressed decoder

See Sum addressed decoder.
In CPU design, the use of a Sum Addressed Decoder (SAD) or Sum Addressed Memory (SAM) Decoder is a method of reducing the latency of the CPU cache access.