A cache is a small, fast memory that sits between the processor and main memory. It can be seen as a special type of associative memory, associating a physical address with its content. When a processor needs to access a particular memory location, it transparently (with no software control) inquires of the cache first. If the cache locates the content because it is cached (a cache hit), there is no need to access the actual memory location. Otherwise (a cache miss), the actual location is accessed.
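The hit/miss decision above can be sketched with a minimal direct-mapped cache model, in which each memory block maps to exactly one cache line. The line size, line count, and function names here are illustrative assumptions, not a description of any particular processor:

```python
# Hypothetical geometry for a minimal direct-mapped cache.
LINE_SIZE = 64      # bytes per cache line
NUM_LINES = 256     # number of lines in the cache

# Each entry holds (valid, tag); the cached data itself is omitted.
cache = [(False, None)] * NUM_LINES

def access(addr, memory_read):
    """Look up a physical address; fall back to memory on a miss."""
    block = addr // LINE_SIZE        # which memory block the address is in
    index = block % NUM_LINES        # the one line this block may occupy
    tag = block // NUM_LINES         # distinguishes blocks sharing that line
    valid, stored_tag = cache[index]
    if valid and stored_tag == tag:
        return "hit"                 # content already cached; no memory access
    cache[index] = (True, tag)       # miss: fill the line from memory
    memory_read(addr)                # the actual memory access happens here
    return "miss"
```

Two accesses to the same 64-byte line illustrate the payoff: the first fills the line from memory, and the second is served from the cache without touching memory at all.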
Obviously, a cache is significantly smaller than the total addressable memory space of a computer. A cache controller therefore uses special algorithms, known as replacement policies, to determine which memory locations should remain cached and which should be flushed to physical memory so that their lines can be reused to cache other locations.
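One common replacement policy is least-recently-used (LRU): when the cache is full, the line that has gone unused the longest is evicted. The sketch below models this policy in software; the class name, capacity, and `fetch` callback are assumptions for illustration only:

```python
from collections import OrderedDict

class LRUCache:
    """Sketch of LRU replacement: evict the least recently used line."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()         # address -> cached content

    def access(self, addr, fetch):
        if addr in self.lines:
            self.lines.move_to_end(addr)   # hit: mark as most recently used
            return self.lines[addr]
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False) # full: evict least recently used
        self.lines[addr] = fetch(addr)     # miss: fill from memory
        return self.lines[addr]
```

Real cache controllers implement such policies in hardware, often using cheap approximations of LRU rather than exact recency tracking, but the eviction logic follows the same idea.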
Modern computers have multiple levels of caches. The L1 (level 1) cache is closest to the processor and can keep up with the speed of the processor's ALUs. However, because the L1 cache ties directly to the registers and other internal constructs of the processor, it uses up a lot of on-die routing resources, and its capacity is usually quite limited. The L2 (level 2) cache is one step further from the processor on the die. As such, a design can have more L2 cache than L1 cache. However, due to the longer physical distance (on the same die), the L2 cache may or may not be able to keep up with the raw speed of the processor.
The L3 (level 3) cache is usually shared by the different cores of a multi-core design. The L3 cache does not keep up with the processor's speed, but it is still much faster than accessing main memory.