Memory - Operations

Cache Memory

Buffers
Buffers are used extensively in computers, and appear wherever data has to be transferred between two data buses of different speeds. Buffers allow information to accumulate until there is enough to completely fill the bandwidth available. For example, there is a buffer between the PCI bus and the memory bus. Because the PCI bus is only 32 bits wide, there is not enough information per clock to fill the memory bus's 64 bits. And because the PCI bus operates at less than half the speed of the memory bus, data cannot be transferred efficiently without a buffer. The buffer allows data to accumulate at the end of the PCI bus until there is enough to make efficient use of the memory bus's greater speed and bit width. The same is true in the opposite direction: a buffer collects information from the memory bus, so that the memory bus can be used for other tasks while the PCI bus is still reading the data out of the buffer.
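As a rough illustration of the width matching described above, the following Python sketch collects 32-bit words until a full 64-bit transfer is available. The class name WidthBuffer and the choice to place the first word in the low half are assumptions made for this example; they do not describe any real chipset.

```python
class WidthBuffer:
    """Collects narrow words until one wide transfer can be made."""

    def __init__(self, narrow_bits=32, wide_bits=64):
        self.narrow_bits = narrow_bits
        self.words_per_transfer = wide_bits // narrow_bits
        self.pending = []

    def push(self, word):
        """Accept one narrow word; return a wide word once enough have arrived."""
        self.pending.append(word & ((1 << self.narrow_bits) - 1))
        if len(self.pending) < self.words_per_transfer:
            return None  # not enough data yet to fill the wide bus
        wide = 0
        # Assumption: the first word received occupies the lowest bits.
        for position, part in enumerate(self.pending):
            wide |= part << (position * self.narrow_bits)
        self.pending = []
        return wide


buf = WidthBuffer()
first = buf.push(0x11111111)   # None: still waiting for the second word
second = buf.push(0x22222222)  # one complete 64-bit transfer
```

The wide bus is only used once per two narrow transfers, which is the point of the buffer: the faster bus is free for other work while the slower side fills up.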
Cache
Caches are slightly different from buffers. They are more like RAM, only smaller and quicker. In fact, the cache on most hard drives is made out of standard PC100 SDRAM. A cache acts as both a buffer and memory, in that information accumulates there until it is efficient to send it. But unlike a buffer, a cache keeps frequently used data and does not discard it after transfer. Caches are used where certain data is needed repeatedly. Processors use their L1 and L2 caches for storing data for short periods of time. A loop structure in a program will make repetitive calls, and caches save the data from having to be reloaded from memory each time.
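The effect of a loop on a cache can be sketched in a few lines of Python. This is only a toy model: the class name TinyCache, the capacity of 4 entries, and the least-recently-used eviction rule are assumptions for illustration and are not taken from any particular processor.

```python
from collections import OrderedDict


class TinyCache:
    """A toy cache that keeps recently used values instead of discarding them."""

    def __init__(self, memory, capacity=4):
        self.memory = memory          # stands in for slower main memory
        self.capacity = capacity
        self.lines = OrderedDict()    # address -> cached value
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)   # mark as most recently used
            return self.lines[address]
        self.misses += 1
        value = self.memory[address]          # slow path: go to main memory
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # assumption: evict least recently used
        return value


memory = list(range(100, 116))
cache = TinyCache(memory)
for _ in range(10):              # a loop that keeps touching the same data
    for address in (0, 1, 2):
        cache.read(address)
```

Only the first pass through the loop has to go to main memory; every later pass is served from the cache.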
RAM
Main Memory
Main memory is the computer's high-speed, high-capacity storage. Unlike caches and buffers, main memory is made to hold a lot of data. Main memory is actually the slowest form of electronic storage in a computer, although it is still far faster than a disk. If data cannot be stored in RAM, it is stored on a physical medium like a hard drive, where accessing it carries a large performance penalty.

Virtual Memory
Virtual memory is a technique that uses secondary storage to augment main system RAM. When more RAM is needed than is physically installed, hard disk space is used to store the currently unused information. Because not all of the information in RAM is accessed frequently, it is possible to store the infrequently used data on the hard drive. With programs requiring more than 200MB each, and the average computer equipped with less than 64MB, this is an invaluable tool.

Virtual memory is often called a swap file, or swap disk. This is because it is a file stored on the hard drive, and data is swapped in and out of it often.

The virtual memory in a Windows environment is stored by default in the C:\ directory on the hard drive in a file named WIN386.SWP. In a Unix, or *nix environment, virtual memory is stored in a separate partition.

RAM Operation

All RAM, whether it is Static RAM or Dynamic RAM, consists of chips. These chips contain 2 dimensional arrays of cells of 1 bit each. This means that each single bit cell has an x address and a y address, known as Rows and Columns. For example, a chip with 256 rows and 256 columns contains 65536 cells/bits, for a total of 8192 bytes or 8 kilobytes. Note that there are 8 bits in one byte, and 1024 bytes in one kilobyte.
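The arithmetic in the example above can be checked with a short Python sketch. The function name chip_capacity is made up for this illustration.

```python
BITS_PER_BYTE = 8
BYTES_PER_KILOBYTE = 1024


def chip_capacity(rows, columns):
    """Return (cells, bytes, kilobytes) for a chip with 1 bit per cell."""
    cells = rows * columns
    size_bytes = cells // BITS_PER_BYTE
    size_kilobytes = size_bytes / BYTES_PER_KILOBYTE
    return cells, size_bytes, size_kilobytes


cells, size_bytes, size_kilobytes = chip_capacity(256, 256)
print(cells, size_bytes, size_kilobytes)
```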

The CPU doesn't see RAM as a 2 dimensional configuration, but as a 1 dimensional or linear arrangement. It is up to the RAM controller and chipset to interpret what location the CPU wants and act as a "translator" between the two devices.
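A minimal sketch of this translation, assuming the simplest possible mapping in which consecutive linear addresses fill one row before moving to the next. Real controllers may map addresses differently; the function names here are hypothetical.

```python
def to_row_column(linear_address, columns=256):
    """Translate a linear address into (row, column) for a 2 dimensional array."""
    return divmod(linear_address, columns)


def to_linear(row, column, columns=256):
    """Inverse translation: (row, column) back to the linear address."""
    return row * columns + column


row, column = to_row_column(1000)   # address 1000 in a 256-column chip
print(row, column)
```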

For SRAM, the number of address lines needed to specify a location is double that of DRAM, because SRAM does not multiplex its row and column addresses. This means more pins, but faster speed.

Chip Operation
SRAM
For an SRAM access to take place, a number of things have to occur. First, for a read, the WE (write enable) pin has to be turned off. Because this pin is overlined in pin-out notation, it is deactivated with a high signal (or a 1), and activated with a low signal (or a 0). Deactivating it tells the chip that information is not being written. If information is to be written or stored to the chip, then this pin needs to be activated instead. Next the chip select pin is activated. This tells the chip that it is to accept the information that is sent to it via the address bus. This pin is used to distinguish between multiple chips, because memory chips are made to share the same bus. Next, the address needs to be sent to the chip. The row and column addresses are sent to the chip through the address bus. The SRAM finds and opens the cell specified, and if this is a write cycle it stores the information that is sent to it on its DataIn pin. If a read was specified, then it outputs the contents of the cell to the DataOut pin.

DRAM
DRAM access is a little more complicated. The RAS signal is sent to the chip via the address bus. Remember that CAS and RAS signals are multiplexed, so they are not sent at the same time. To tell the chip that this address is a RAS signal, the RAS pin is activated. The chip sends the right RAS line to the precharge amps, which prepare the row for access. If this is to be a write access, then the Write Enable pin is activated. If this access is to be read only, then the Write Enable pin is deactivated. Once the chip is ready for the CAS location, it is sent via the same address bus that carried the RAS address. The only difference here is that the RAS pin is deactivated, and the CAS pin is activated.

If the RAS line is already charged because the last access used that row, then that is referred to as a Page Hit. If the line is not already charged, then the chip has to recharge the sense amps before the row can be accessed. This will usually take between 2 and 4 cycles. When the wrong page is open, it is referred to as a Page Miss. If the wrong page is open, it first has to be closed before the correct row can be charged. This will cost another 2 cycles. It is up to the chipset to try to keep the correct rows precharged, because correcting a Page Miss can take as much time as a complete Page Hit access would take.

For reading, the CAS location is sent to the sense amps, which send the contents to the data bus for output. If this was a write, then information is sent to the chip via the data bus to the sense amps, which in turn send the data to the correct cell for storage.
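The cost of page hits and page misses described above can be estimated with a small Python model. The default cycle counts are assumptions chosen for illustration: 3 cycles to charge a row (within the 2 to 4 quoted above), 2 cycles to close a wrong row, and a made-up 2 cycles for the column access itself.

```python
def access_cycles(rows, activate=3, close=2, column=2):
    """Estimate total cycles for a sequence of accesses, given the row of each."""
    open_row = None
    total = 0
    for row in rows:
        if open_row is None:
            total += activate            # nothing open yet: charge the row
        elif open_row != row:
            total += close + activate    # page miss: close wrong row, charge right one
        # page hit: the row is already charged, no extra cost
        total += column                  # column access happens in every case
        open_row = row
    return total


same_row = access_cycles([5, 5, 5, 5])      # one activation, then page hits
alternating = access_cycles([5, 6, 5, 6])   # every access after the first is a miss
print(same_row, alternating)
```

Under these assumed costs, staying in one row is more than twice as fast as bouncing between two rows, which is why the chipset tries to keep the right row open.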

As you can see, DRAM access has a few more steps, and access can take a little bit longer. But newer versions allow for pipelining and burst operation which can "cover up" this access time by being able to do multiple access tasks and bank interleaving.


Memory Modules
Memory chips are usually used in what is called a module. In early computers, instead of modules, RAM was part of the motherboard. This was a bad idea because it made upgrading RAM nearly impossible. So modules came along as a way of connecting chunks of RAM to the rest of the computer. A module contains any number of chips, with almost any number of pins, such as 30, 72, or 168, and is responsible for either 16 bits, 32 bits or 64 bits of bandwidth. So with all of these variables, how does it work?

A basic memory chip on a module is responsible for 1 bit of information. This isn't really that much, considering the CPU probably wants more than that. So multiple chips are used to store multiple bits.

For example, if one byte (or 8 bits) needs to be stored, 8 chips with the ability to store one bit each could be used. All of the chips would collectively store the information. This is the principle behind how a memory module works. Modules are used in configurations so that as many bits can be stored at one time as can be transferred through the memory bus. This is the most efficient arrangement. In the previous example, the memory bus would have been only 8 bits wide. Some chips are able to store up to 8 bits by themselves. They have 8 different internal sections which act like separate chips. The 8 bit chip would have the same number of address pins, but more input and output pins. This technique is used to make modules of higher density using fewer chips.
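The idea of spreading one byte across eight 1-bit chips can be sketched as follows. The function names and the choice that chip 0 holds the lowest bit are assumptions for the example.

```python
def store_byte(value):
    """Split one byte into 8 bits, one per 1-bit chip (chip 0 holds bit 0)."""
    return [(value >> chip) & 1 for chip in range(8)]


def load_byte(bits):
    """Reassemble the byte from the bit held by each chip."""
    return sum(bit << chip for chip, bit in enumerate(bits))


chips = store_byte(0xA5)      # every chip stores its bit at the same address
restored = load_byte(chips)
print(chips, hex(restored))
```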

Today's memory buses are mostly either 16 bits, 32 bits or 64 bits in width, so memory modules are made so that they can properly fill this bandwidth. For example, 486 computers use a 32-bit system bus, so they naturally have memory which is capable of 32-bit access. Newer computers, Pentium and up, all use a 64-bit system bus, so they use memory which is capable of 64-bit access.

The details of SIMM's, DIMM's, and RIMM's are similar, but slightly different in operation.

SIMM's
SIMM is an acronym for Single Inline Memory Module. SIMM's came in a number of configurations. One of the first designs was the 30-pin SIMM, which was capable of 8-bit access. This means that to be used in a 32-bit system, 4 modules would need to be used to fill up the bus. This module was small in size and used chips like the 16k X 1-bit chip that is used as the example for RAM pin-outs. This meant that to have 8-bit access, there were 8 chips on each module.

Next came the 72-pin SIMM. This was mostly used in 486 systems, because only one module was needed to fill the 32-bit bus. These modules were capable of more storage because they used newer chips that could store 4 bits apiece, so instead of needing 32 chips on each module to create 32-bit access, only 8 were needed. Some 72-pin SIMM's used 8-bit chips, which meant only 4 chips were needed on one SIMM. Some even used 16-bit chips, so only 2 chips were on each SIMM. This was usually not done because it would make chip addressing a difficult task for the SIMM engineers, since each chip would have 16 DataIn pins and 16 DataOut pins. That is a lot to keep track of.
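The module and chip counts quoted above all come from the same division: the width that has to be filled divided by the width each unit provides. A short sketch, with a hypothetical helper name:

```python
def units_needed(total_bits, bits_per_unit):
    """How many modules (or chips) are needed to fill a given width."""
    if total_bits % bits_per_unit != 0:
        raise ValueError("width is not a whole multiple of the unit width")
    return total_bits // bits_per_unit


# 30-pin SIMM's (8 bits each) on a 32-bit bus
print(units_needed(32, 8))
# Chips on a 32-bit, 72-pin SIMM for 1-, 4-, 8- and 16-bit chips
print([units_needed(32, width) for width in (1, 4, 8, 16)])
```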

DIMM
DIMM is an acronym for Dual Inline Memory Module. These are referred to as "dual" because they are double sided. Instead of chips on only one side, chips are used on both sides, in separate configurations. You can think of these as 2 SIMM's glued back to back. DIMM's were needed in the transition from a 32-bit system bus to the 64-bit system bus that the Pentium used. This of course meant that the pin count also rose, up to 168 pins. Either DIMM's or pairs of 72-pin SIMM's can be used in today's systems, although motherboard manufacturers have dropped support for SIMM's because they could never reach the speed that DIMM's could.
RIMM
RIMM is an acronym for Rambus Inline Memory Module. Instead of having only one channel for data to travel through, RIMM's have a series of smaller, high speed channels that offer dedicated bandwidth. Each of these channels is able to supply memory bandwidth to different devices simultaneously.


[Figure: normal RAM compared with Rambus]

This means if a hard drive needs to access memory, along with the CPU and video card, all 3 can access at the same time. This major change in functionality necessitates a different system set up to handle these dedicated channels. Each RIMM uses the same DRAM chips, but instead of being parallel like DRAM, RIMM's use them in series, on parallel channels.

Chip Pin-Out

These are the pins for 16k X 1-bit chips. What this notation means is that the chip contains 16k cells, and is able to store 1 bit of data per cell. This does not mean that there are 16000 cells. There are actually 16384. This is because there are 14 address pins (7 multiplexed pins in DRAM), and this creates an array of 2^14 cells. Most pins are activated with a 1 or high signal, and deactivated with a 0 or low signal. Exceptions to this are pins that have been overlined. Another notation for overlining is to put a / before the pin name. For example, WE written with a line over it is the same as /WE.
16k X 1-bit SRAM
Pin 1   A0      Pin 20  Vcc
Pin 2   A1      Pin 19  A13
Pin 3   A2      Pin 18  A12
Pin 4   A3      Pin 17  A11
Pin 5   A4      Pin 16  A10
Pin 6   A5      Pin 15  A9
Pin 7   A6      Pin 14  A8
Pin 8   DOUT    Pin 13  A7
Pin 9   WE      Pin 12  DIN
Pin 10  GND     Pin 11  CS

16k X 1-bit DRAM
Pin 1   (not labeled)   Pin 16  Vcc
Pin 2   DIN             Pin 15  CAS
Pin 3   WE              Pin 14  DOUT
Pin 4   RAS             Pin 13  A3
Pin 5   A0              Pin 12  A4
Pin 6   A1              Pin 11  A5
Pin 7   A2              Pin 10  A6
Pin 8   Vdd             Pin 9   (not labeled)
Address Pins
These pins are used to send to the chip the array coordinates of the information that is to be read/written.

In SRAM chips, half of the address pins are used for the row address and the other half are used for the column address. SRAM has no RAS or CAS pins, so both halves are sent at the same time.

In DRAM chips, the address pins are Multiplexed, meaning that the same pins are used for both the RAS and CAS signals. The way that this works is that the RAS signals are sent first and are indicated by activating the RAS pin. The CAS signals are sent in a later clock cycle, and are indicated to the chip by activating the CAS pin.
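Multiplexing can be pictured as sending one 14-bit address over 7 pins in two steps. In the sketch below, the function names and the choice to use the upper 7 bits as the row are assumptions for illustration.

```python
def multiplex(address, pin_count=7):
    """Split an address into the two values sent over the shared address pins."""
    mask = (1 << pin_count) - 1
    row = (address >> pin_count) & mask   # assumption: upper bits select the row
    column = address & mask               # lower bits select the column
    # Each step names the strobe pin that is active while the value is on the pins.
    return [("RAS", row), ("CAS", column)]


def demultiplex(steps, pin_count=7):
    """What the chip does: latch each value according to the active strobe."""
    latched = dict(steps)
    return (latched["RAS"] << pin_count) | latched["CAS"]


steps = multiplex(0x1234)
print(steps, hex(demultiplex(steps)))
```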

DOUT (Data Output)
This is the pin used to output the value of the selected cell during a read request. The cell location is indicated by the RAS and CAS values.
DIN (Data Input)
This is the pin used to input the value to be stored in the selected cell during a write request. The cell location is indicated by the RAS and CAS values.
WE (Write Enable)
This is the pin used to tell the chip that information will be written to a cell, rather than output.
CS (Chip Select)
Since multiple RAM chips are connected to the same address and data buses, this pin is used to distinguish which chip is supposed to be used.
GND (Ground)
This pin serves as a ground to the chip.
VCC (Voltage)
This pin supplies the chip with the power that it needs to operate.
RAS (Row Address Strobe)
This pin is only found on DRAM, and not SRAM. It is used to tell the chip that the information received via the address bus is the row address to be used.
CAS (Column Address Strobe)
This pin is only found on DRAM, and not SRAM. It is used to tell the chip that the information received via the address bus is the column address to be used.
Pin Differences
SRAM is large and bulky; each cell is made of 4 transistors and 2 resistors. DRAM is small and compact; each of its cells is made of only 1 transistor and 1 capacitor. SRAM has separate address pins for the row and column locations. This means a chip with 128 rows and 128 columns, and a total of 16384 cells, would have 7 pins for the row address and 7 pins for the column address. This is because there are 128 different combinations for the values of 7 pins. Each pin can be either on or off, so for 2 pins there would be 4 combinations: one pin could be on, the other could be on, both could be on, or neither could be on. This is ok for SRAM, but bad for DRAM. DRAM is a lot smaller and more cells can fit on each chip. A chip with 16M cells is not uncommon, and that would require 24 address pins. That is too many pins for a chip, so the address pins are multiplexed. This means both the RAS and CAS signals use the same pins, so such a DRAM chip would only need 12 address pins instead of 24. This not only saves on pins but allows CAS signals to be sent continuously, which is required for BEDO and SDRAM types.
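The pin counts above follow from the fact that n pins can take 2^n combinations. A small sketch, with a hypothetical function name:

```python
def address_pins(cells, multiplexed):
    """Address pins needed to select one of `cells` cells."""
    bits = (cells - 1).bit_length()          # smallest n with 2**n >= cells
    # With multiplexing, row and column halves share the same pins.
    return (bits + 1) // 2 if multiplexed else bits


print(2 ** 2)                                   # combinations of 2 pins
print(address_pins(16384, multiplexed=False))   # 128 x 128 SRAM
print(address_pins(16384, multiplexed=True))    # same array as DRAM
print(address_pins(2 ** 24, multiplexed=False))
print(address_pins(2 ** 24, multiplexed=True))
```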
What IS RAM?
RAM is an acronym for Random Access Memory. RAM is basically a form of electronically "remembering" something, and being able to access it randomly. This means that any piece of information can be read or written to until either it is removed or the power is taken away. There are 2 basic types of Random Access Memory. There is Static RAM and Dynamic RAM. These 2 basic forms of RAM operate in very different ways, and both have their advantages and disadvantages. Almost all personal computers have both types of memory.
Static RAM
SRAM is static in nature. Each cell is composed of 4 transistors and 2 resistors. Unlike dynamic RAM, SRAM does not need to be refreshed, because reading from SRAM is not destructive and there is no capacitor to leak.
Dynamic RAM
DRAM is very unstable, so it is called Dynamic. DRAM, depending on what type, has to be refreshed approximately every 64ms, or 15.6 times per second. The reason for this lies in DRAM's composition. All DRAM is made up of memory cells. These cells are composed of 1 capacitor and 1 transistor. Capacitors by nature hold electrons. A capacitor that is full of electrons is considered to be on, or having the value of 1, and an empty capacitor is considered to be off, or having the value of 0.

The transistor acts as a switch between the capacitor and the chip's data line. The gate of the transistor, which is driven by the row line, controls how much electrical current is allowed to travel between the source and the drain. When the row is selected, the gate allows electrons to move across the transistor, so the capacitor can be read or written. When the row is not selected, very few electrons are able to traverse the transistor. Since the transistor isn't a 100% efficient switch, electrons will slowly leak out of the capacitor until it is empty. This would be bad because the cell would lose its value, and therefore the capacitor needs to be refreshed continuously to prevent this from happening.

At least every 64ms, the capacitor has to be recharged, which is called a "refresh". The capacitor also has to be recharged whenever it is read from, because reading from it discharges the capacitor. If the capacitor is not refreshed, it will lose its electrical charge and the cell will have a value of 0.
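The refresh figures can be worked out directly. In the sketch below, the second function assumes that refreshes are spread evenly across the rows of a chip rather than done all at once; that scheduling detail and the 128-row example are assumptions, not something stated above.

```python
def refreshes_per_second(interval_ms=64):
    """How many times per second each cell must be refreshed."""
    return 1000 / interval_ms


def row_refresh_spacing_ms(rows, interval_ms=64):
    """Time between row refreshes if they are spread evenly over the interval."""
    return interval_ms / rows


print(refreshes_per_second())         # about 15.6 times per second
print(row_refresh_spacing_ms(128))    # one row every half millisecond
```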
SRAM Versus DRAM
We haven't gotten into why we need both SRAM and DRAM. SRAM is good because it is fast, has low latency, and doesn't need to be refreshed. It takes approximately 2-3 cycles until it can output the information. SRAM is bad because it is large, and therefore expensive, and it requires more power to operate, and therefore produces a lot of heat. That is why we need DRAM. DRAM is simple, small and space efficient. DRAM may be slower and have a longer latency than SRAM, but it is still very useful. Today's DRAM can take from 2 to 9 cycles of latency until information is output. SRAM is good for small amounts of memory; anything over even 4MB is very bulky. SRAM is good for internal memory in processors, and for cache, but DRAM is best for main system memory, where the average computer needs 32MB, and some high end computers need more than 8 gigabytes. DRAM is used where its small size and power efficiency outweigh its slowness compared to SRAM.

SRAM is almost as good as we can make it. We currently don't have the technology to mass produce SRAM small enough to replace DRAM. That is why DRAM is still used in computers. Unlike SRAM, DRAM has taken many decades to evolve into what we use in our computers, and there are even new DRAM types in the distant future.

DRAM Evolution
The first types of DRAM were asynchronous to the clock of the bus they were connected to. This meant that they operated at their own speed. The later DRAM types are synchronous, meaning that they operate, or at least try to operate, at the speed of the bus they are connected to. If synchronous DRAM is unable to keep up with the speed of the bus it is connected to, then it won't work at all. Each step in synchronous memory is allocated a specific number of clocks. For example, CAS2 RAM is allowed 2 clock cycles to perform column and then cell access. If the clock cycles are too short to perform this operation, errors occur. Asynchronous DRAM, on the other hand, always operates at the same speed no matter what the clock speed of the bus is, and wait states are inserted on the bus until it has finished and is ready for the next request.

Something else to note is the difference between the time ratings for synchronous and asynchronous memory. Asynchronous memory is rated by the length of time the DRAM needs between when the request is made and when the output is available. For example, EDO DRAM rated at 70ns will take 70ns to complete one entire access cycle. Synchronous memory is rated by the time it takes to burst the information from the memory cells to the bus, excluding the time it takes to access it. For example, for 10ns SDRAM, 10ns is equal to one 100MHz clock cycle.
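The relationship between a time rating in nanoseconds and a clock speed in MHz is a simple reciprocal. A short sketch, with made-up function names:

```python
def clock_mhz(cycle_ns):
    """Clock frequency in MHz for a given cycle time in nanoseconds."""
    return 1000 / cycle_ns


def cycle_time_ns(mhz):
    """Length of one clock cycle in nanoseconds for a given frequency in MHz."""
    return 1000 / mhz


print(clock_mhz(10))                  # 10ns SDRAM matches a 100MHz clock
print(round(cycle_time_ns(66), 2))    # one cycle on a 66MHz bus
```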

Page Mode DRAM
This is the first real type of DRAM. It was very slow because for every bit of information both the RAS and CAS line locations needed to be sent. This would take upwards of 120ns for each access.

Fast Page Mode DRAM
This improved on Page Mode DRAM in that the RAS location wasn't needed for consecutive accesses to the same RAS line. Therefore one RAS location could be sent, and the CAS lines could be pulsed in to continue the access without a delay. There was also a speed increase in access times from 120ns to 60ns. This would only allow operation on a 28MHz bus without the use of wait states. Some newer FPM DRAM took advantage of faster timing circuits and added an output buffer so that the output could pile up there while the next access was being started. This buffer would allow the chip to start on the next request without having to wait for the memory bus to be ready for its previous output. This was a very primitive form of pipelining, but it did give this newer version of FPM DRAM a significant speed increase. FPM DRAM would operate at only 5-3-3-3 timing on a 66MHz bus.

Extended Data Out DRAM
As computers started to use a 66MHz bus, FPM DRAM proved to be too slow. So a small improvement was added: the ability to send the next CAS location before the previous access was complete. This allows somewhat of an overlap in CAS cycles. The net effect of EDO DRAM was the ability to make CAS cycles shorter, while still allowing them the same amount of time to complete. This was available in some newer forms of FPM, but EDO made it "official". The major advancement in EDO was its speed. EDO DRAM was designed to operate at 40MHz with zero wait states, but was commonly used with 66MHz buses with wait states enabled. It would operate at 5-2-2-2 burst timings at 66MHz.

Burst Extended Data Out DRAM
BEDO DRAM never really caught on because the far superior SDRAM was soon introduced as its replacement. BEDO would "burst" information out quicker after the first access, allowing a timing of 5-1-1-1 on a 66MHz bus. What it did was fully pipeline the memory access procedure, by dividing the access into steps. Each step wouldn't have to worry about the previous step or the next step, because all of its output would be stored in a buffer, and all of its input would be taken from the previous step's buffer. Also, BEDO's input memory latch was replaced by a register, allowing faster access and removing the need to send continuous CAS signals for a 4 word burst. The burst would increment the register on its own, or the burst could be cut short if a new value was detected. BEDO speed topped out at 66MHz with zero wait states.
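The burst timings quoted for FPM, EDO and BEDO can be compared by adding up the cycles for a 4 word burst. The throughput figures below assume a 64-bit (8 byte) memory bus at 66MHz and count a megabyte as one million bytes; both are assumptions made for this comparison.

```python
BURST_TIMINGS = {
    "FPM": (5, 3, 3, 3),
    "EDO": (5, 2, 2, 2),
    "BEDO": (5, 1, 1, 1),
}


def burst_cycles(timing):
    """Total clock cycles needed to complete one burst."""
    return sum(timing)


def burst_throughput_mbps(timing, bus_mhz=66, bytes_per_word=8):
    """Average megabytes per second while bursting back to back."""
    bytes_per_burst = len(timing) * bytes_per_word
    return bytes_per_burst * bus_mhz / burst_cycles(timing)


for name, timing in BURST_TIMINGS.items():
    print(name, burst_cycles(timing), round(burst_throughput_mbps(timing), 1))
```

Under these assumptions, shortening only the three follow-up accesses takes a burst from 14 cycles down to 8, even though the first access still costs 5 cycles in every case.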

Synchronous DRAM
Synchronous DRAM is what is used in almost all of today's computers. It has no problem reaching speeds of 133MHz and above.

One advantage of synchronous RAM is that it allows two banks, each with two rows in memory, to be open simultaneously. This saves time in executing commands and transmitting data. Being able to switch between banks, also known as interleaving, can hide row precharge/refresh and first access delays. It allows one bank to be precharging (RAS and CAS activation) while the second bank is transferring data. This means that there can be a constant flow of input/output, which gets the maximum performance out of the RAM.

Another feature of SDRAM is the use of an SPD (serial presence detect) chip on the RAM DIMM, which provides basic information about the RAM to the motherboard, such as timing, speed, manufacturer, and so forth. SDRAM has the same ability to burst data as BEDO does: multiple RAS and CAS locations do not need to be sent, because the DRAM knows enough to burst out the adjacent cells.

SDRAM is available in either PC66 (66MHz), PC100 (100MHz), or PC133 (133MHz) with CAS timings of either 2 or 3. On a 64-bit bus, these yield peak bandwidths of 528MBps, 800MBps, and 1064MBps respectively.
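These bandwidth figures are the clock speed multiplied by the 8 bytes that a 64-bit bus moves per clock. A short check, with a hypothetical function name:

```python
def sdram_bandwidth_mbps(clock_mhz, bus_bits=64):
    """Peak bandwidth in MBps: one transfer of bus_bits per clock."""
    return clock_mhz * bus_bits // 8


for name, mhz in (("PC66", 66), ("PC100", 100), ("PC133", 133)):
    print(name, sdram_bandwidth_mbps(mhz))
```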

Double Data Rate Synchronous DRAM
This is very similar to SDRAM, except data is transferred twice per clock, on both the rising edge and falling edge. This effectively doubles the bandwidth that the chip has.

DDR SDRAM will soon be available as system RAM, and DDR is currently being used on high performance video cards where a lot of bandwidth is needed.

RAMBUS DRAM
This is a different approach to the way that memory is constructed. Instead of using a 64bit bus to transfer data, a narrower 16bit bus is used at a very high clock speed. Rambus memory is Double Data Rate, and was designed for optimum bandwidth at the cost of latency.

Rambus DRAM is available in PC800 (400MHz DDR), with a bandwidth of 1.6GBps (1600MBps).
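The Rambus figure can be reproduced by accounting for both the narrower bus and the two transfers per clock. The function name below is made up, and the 100MHz DDR line is only a comparison point calculated from the same formula, not a product named above.

```python
def peak_bandwidth_mbps(clock_mhz, bus_bits, transfers_per_clock=1):
    """Peak bandwidth in MBps for a bus of the given width and clock."""
    return clock_mhz * bus_bits // 8 * transfers_per_clock


# PC800 Rambus: 400MHz clock, 16-bit bus, data on both clock edges
print(peak_bandwidth_mbps(400, 16, transfers_per_clock=2))
# For comparison: a 64-bit bus at 100MHz, single and double data rate
print(peak_bandwidth_mbps(100, 64), peak_bandwidth_mbps(100, 64, transfers_per_clock=2))
```

A bus one quarter as wide reaches the same peak figure as 64-bit double data rate memory at 100MHz only because it is clocked four times as fast.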

Video DRAM
This type of memory is used only in high performance video cards. It is different than normal SDRAM in that it is dual ported. This means that it is able to be read from and written to at the same time. This was necessary to keep up with the output needed to redraw the information on screen, but still be able to update the information without interruption.