Day 19 — CS Fundamentals December — Exploring the hardware side of Cache

Yashvardhan Kukreja
Dec 20, 2019

Using caches and understanding them conceptually is great, but have you ever wondered how exactly and why exactly cache memory is faster than RAM?

Or what is meant by the statement, “Cache is closer to the CPU than RAM”?

To understand that, we have to delve into the hardware of cache memory.

And believe me, it is not as scary as it sounds, so chill XD

Let’s dive in!

Introduction

So, your computer keeps the data it is working on in two kinds of memory:

  • DRAM
  • SRAM

“But Yash! What about the hard disk? Data lives there too, right?”

The hard disk is like a backing store. At the end of the day, the relevant data is brought in from the hard disk to RAM, and that is what the CPU then uses for processing. So, the CPU reaches out to RAM for the data and not to the hard disk.

So, from the CPU’s standpoint, the memory stores for its usage are:

  • Dynamic RAM or DRAM: This is the regular RAM that all of us have in our computers. It stores each bit of data as charge in a tiny capacitor, and because that charge leaks away, the capacitors have to be constantly refreshed with electricity in order to keep the data.
  • Static RAM or SRAM: This is what the CPU cache is made of. It stores each bit in a small circuit of transistors (a flip-flop) instead of a capacitor, so it does not need to be refreshed again and again. It is much faster than DRAM and, well, it is more expensive ;_;
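To make the refresh idea concrete, here is a toy model of the two kinds of cell. It is only a sketch: the charge levels, leak rate and threshold are made-up numbers, and the class and function names (`DramCell`, `SramCell`, `survives`) are hypothetical, not taken from any real hardware description.

```python
# Toy model only: charge is measured in made-up "percent" units.

class DramCell:
    """One bit stored as charge on a leaky capacitor."""
    LEAK_PER_TICK = 20  # hypothetical charge lost on every time step
    THRESHOLD = 50      # below this, a stored 1 reads back as 0

    def __init__(self, bit):
        self.charge = 100 if bit else 0

    def tick(self):
        # The charge drains away a little on every time step.
        self.charge = max(0, self.charge - self.LEAK_PER_TICK)

    def read(self):
        return 1 if self.charge >= self.THRESHOLD else 0

    def refresh(self):
        # Read the bit and write it back at full strength.
        self.charge = 100 if self.read() else 0


class SramCell:
    """One bit held by a latch: it keeps its value while power is on."""

    def __init__(self, bit):
        self.bit = bit

    def tick(self):
        pass  # nothing leaks, so there is nothing to do

    def read(self):
        return self.bit


def survives(cell, ticks, refresh_every=None):
    """Return the bit read back after `ticks` time steps."""
    for t in range(1, ticks + 1):
        cell.tick()
        if refresh_every and t % refresh_every == 0 and hasattr(cell, "refresh"):
            cell.refresh()
    return cell.read()


print(survives(DramCell(1), 5))                   # no refresh: the 1 is lost
print(survives(DramCell(1), 5, refresh_every=2))  # refreshed in time: still 1
print(survives(SramCell(1), 5))                   # no refresh needed: still 1
```

In this model the DRAM cell only keeps its bit if someone refreshes it before the charge falls below the threshold, while the SRAM cell needs no such care.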

“Cache is closer to CPU than RAM” — But what is “closer”?

So, at the hardware level, the DRAM or the so-called RAM is a separate hardware module in itself. It is a separate entity connected to the CPU.

But cache, well, it is not a “separate” entity from the CPU, as it is built right onto the CPU chip itself. It is, like, the CPU’s own internal memory. The signals have a much shorter distance to travel, and that is what “closer” means.

So, the cache’s job is to store copies of the data and instructions from RAM that are used by the CPU over and over again.

This is because when the CPU wants some data, it checks the cache first, and only if the data is not found there does it go to RAM.

Hence, if there is some data or instruction which the CPU accesses very frequently, it makes a lot of sense to copy that data/instruction from RAM to the cache so that the CPU can access it much more quickly.
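The “check the cache first, then RAM” idea can be sketched in a few lines. This is a simplified illustration, not how real hardware is built: `ram`, `cache`, `CACHE_SIZE` and `read` are hypothetical names, the capacity is made up, and throwing out the oldest entry when the cache is full is just one simple policy picked for the example.

```python
# Hypothetical stand-ins: `ram` maps address -> value, `cache` is a small dict.
ram = {addr: addr * 10 for addr in range(100)}
cache = {}
CACHE_SIZE = 4  # made-up capacity, real caches hold far more
stats = {"hits": 0, "misses": 0}


def read(addr):
    """Return the value at `addr`, checking the cache before RAM."""
    if addr in cache:
        stats["hits"] += 1
        return cache[addr]
    # Not in the cache: go to RAM and keep a copy for next time.
    stats["misses"] += 1
    value = ram[addr]
    if len(cache) >= CACHE_SIZE:
        # Cache is full: drop the oldest entry (dicts keep insertion order).
        cache.pop(next(iter(cache)))
    cache[addr] = value
    return value


# The same three addresses are read over and over again.
for _ in range(5):
    for addr in (1, 2, 3):
        read(addr)

print(stats)  # {'hits': 12, 'misses': 3}
```

Only the first read of each address has to go to RAM. The other twelve reads are served from the cache, which is exactly why copying frequently used data there pays off.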

But, what’s the problem with RAM? It seems pretty fast too?

Yes, RAM is pretty fast but not fast enough.

That’s because CPUs have become so powerful that they can process data much, much faster than they can receive it from RAM.

Due to this, a lot of the time, the CPU sits idle waiting for data to arrive from RAM.

Now see, the essence of how a computer works is making the best use of its resources, which means making sure the CPU does not sit idle, so as to provide the quickest possible user experience.

But with RAM alone, it “is” sitting idle and waiting for data at times. Hence, the introduction of cache made sense, because the CPU can get data from cache far faster than from RAM.
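A rough back-of-the-envelope calculation shows how much a cache can cut the waiting. The latencies and the miss rate below are assumptions picked for illustration (real values depend on the processor and the program), and `average_access_ns` is a hypothetical helper that applies the usual “hit time + miss rate × miss penalty” estimate.

```python
def average_access_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average time per access: hit time plus the extra cost of the misses."""
    return hit_ns + miss_rate * miss_penalty_ns


# Made-up but plausible figures: 1 ns for a cache hit, 100 ns for a trip to RAM.
CACHE_HIT_NS = 1
RAM_NS = 100

without_cache = RAM_NS
# Assume 5% of accesses miss the cache and have to go to RAM.
with_cache = average_access_ns(CACHE_HIT_NS, miss_rate=0.05, miss_penalty_ns=RAM_NS)

print(f"RAM only:   {without_cache} ns per access")
print(f"With cache: {with_cache:.1f} ns per access")
```

With these assumed numbers, the average wait drops from 100 ns to about 6 ns, even though 1 in 20 accesses still goes all the way to RAM.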

Hardware side of cache

Different levels of cache

So, at the hardware level there are typically three kinds of cache, which are denoted by levels:

  • L1 or Level 1 cache: closest to the CPU
  • L2 or Level 2 cache: farther from the CPU than the L1 cache
  • L3 or Level 3 cache: farthest cache from the CPU

Level 1 cache — Primary Cache

It is located on the processor core itself and runs at the same clock speed as the processor, so data found in it arrives within a few clock cycles, making it the fastest kind of cache on the computer. It is further split into two parts:

  • L1 data cache: It holds frequently used data.
  • L1 instruction cache: It holds frequently used instructions.

Still, for this article, I’ll treat the two together as the L1 cache.

Level 2 cache — External Cache

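It is bigger and slower than the L1 cache, and it holds the data and instructions that did not fit in the L1 cache, so the CPU looks here when it does not find something in L1. The name “external” is historical: on older systems the L2 cache sat on a separate chip outside the processor, whereas on modern processors it is on the processor chip.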

Level 3 cache — Shared Cache

It is bigger and slower still, and it holds the data and instructions from RAM that did not fit in the L2 cache. It is called “shared” because all the cores of the processor use the same L3 cache.

Flow of data and instructions through caches

These are the steps the CPU performs to access data:

  • CPU looks for the data in L1 or Level 1 cache.
  • If it is not found in the L1 cache, it will look in the L2 or Level 2 cache.
  • If it is not found in the L2 cache, it will look in the L3 or Level 3 cache.
  • If it is not even found in the L3 cache, then finally it will look into the RAM for the data.

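Hence, now it should be clear why the caches are checked in this order: L1 is the fastest cache, L2 is slower and L3 is the slowest, so the CPU only moves on to a slower level when the faster one does not have the data.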
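The lookup steps above can be sketched as a small simulation. Everything here is an assumption made for illustration: the cycle counts are made up (each one is the extra cost of checking that level), the names `LEVELS`, `RAM_CYCLES` and `lookup` are hypothetical, and the sketch ignores cache capacity by simply copying the data into every level once it is found.

```python
# Illustrative costs in CPU cycles; real numbers vary from processor to processor.
LEVELS = [("L1", 4), ("L2", 12), ("L3", 40)]
RAM_CYCLES = 200


def lookup(addr, caches):
    """Find `addr`, returning (where it was found, total cycles spent).

    `caches` maps a level name to the set of addresses currently held there.
    """
    cycles = 0
    for name, cost in LEVELS:
        cycles += cost  # pay for checking this level
        if addr in caches[name]:
            break
    else:
        # Not in any cache: fetch it from RAM.
        name = "RAM"
        cycles += RAM_CYCLES
    # Simplification: copy the data into every level so the next access hits L1.
    for level_name, _ in LEVELS:
        caches[level_name].add(addr)
    return name, cycles


caches = {"L1": set(), "L2": {7}, "L3": {7, 8}}
print(lookup(7, caches))  # ('L2', 16)
print(lookup(8, caches))  # ('L3', 56)
print(lookup(9, caches))  # ('RAM', 256)
print(lookup(9, caches))  # ('L1', 4)
```

The farther down the chain the CPU has to go, the more cycles it spends waiting, and a second access to the same address is served straight from L1.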

Exact placement of these caches on the processor chip

Let’s assume a normal dual-core processor and go through the placement of the L1, L2 and L3 caches on it.

  • The L1 cache is the smallest in size, and it is directly connected to its respective core.
  • The L2 cache is larger in size and is also connected to its respective core, sitting farther from the core than the L1 cache.
  • The L3 cache is the largest in size, and there is a single one connected to all the cores.

So, just to clarify, in this typical layout every core has its own L1 and L2 caches.

But there is only one L3 cache for the entire processor, which is shared by all the cores.
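The layout described above can be written down as a small data structure. This is only a sketch of the arrangement in this article: the class names (`Cache`, `Core`, `Processor`), the helper `make_dual_core` and the cache sizes are all hypothetical and do not describe any specific chip.

```python
from dataclasses import dataclass


@dataclass
class Cache:
    level: str
    size_kib: int  # sizes used below are illustrative only


@dataclass
class Core:
    core_id: int
    l1: Cache  # private to this core
    l2: Cache  # private to this core


@dataclass
class Processor:
    cores: list
    l3: Cache  # a single cache shared by all the cores


def make_dual_core():
    """Build the dual-core layout: private L1 and L2 per core, one shared L3."""
    cores = [Core(i, Cache("L1", 64), Cache("L2", 512)) for i in range(2)]
    return Processor(cores, Cache("L3", 8192))


cpu = make_dual_core()
for core in cpu.cores:
    print(f"core {core.core_id}: own L1 ({core.l1.size_kib} KiB), "
          f"own L2 ({core.l2.size_kib} KiB), shared L3 ({cpu.l3.size_kib} KiB)")
```

Each `Core` object carries its own `l1` and `l2`, while the single `l3` hangs off the `Processor`, mirroring how the caches are divided between the cores.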

That’s it!

Thanks for reading this far :)

I hope you understood the article and got a good idea of the hardware side of the cache.

Stay tuned for tomorrow’s article, which will cover another interesting CS fundamental.

LinkedIn: https://www.linkedin.com/in/yashvardhan-kukreja-607b24142/

GitHub: https://www.github.com/yashvardhan-kukreja

Email: yash.kukreja.98@gmail.com

Adios!


Yashvardhan Kukreja

Software Engineer @ Red Hat | Masters @ University of Waterloo | Contributing to Openshift Backend and cloud-native OSS