CPU Architecture

Instruction Set Architecture

Early processors included whatever instructions their engineers chose to implement. Because no common set of instructions existed across processors, software had to be written specifically for each one, which made code incompatible between most computers. Software was often incompatible even between different versions of the same processor.

Then came the x86 instruction set, a standard set of instructions that is included in all consumer PCs. This made it much easier to write software that runs on different processors. The x86 set has since been expanded, for example with SIMD instructions, but none of the original instructions have been altered.

CPU Micro-Ops
The x86 specification defines which instructions a processor has to include and what each instruction has to do. It does not specify how the instructions are to be carried out. This has allowed the x86 instruction set, now more than 20 years old, to evolve along with current processors. Most current processors break each x86 instruction down into smaller operations called micro-ops. These are much simpler and can be executed faster, resulting in an overall speed increase over older processors.
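
As a rough illustration of the idea, the sketch below expands a few instructions into micro-ops using a lookup table. The instruction strings and micro-op names are hypothetical and chosen only for readability; real processors use their own undocumented internal encodings.

```python
# Toy decoder. The instruction strings and micro-op names are hypothetical
# and are not taken from any real processor's documentation.
MICRO_OP_TABLE = {
    # A register-to-register add is already simple: one micro-op.
    "ADD reg, reg": ["alu_add"],
    # An add with a memory destination needs a load, the add, and a store.
    "ADD [mem], reg": ["load", "alu_add", "store"],
    # A push stores a register and then adjusts the stack pointer.
    "PUSH reg": ["store", "alu_sub_sp"],
}


def decode(program):
    """Expand each instruction into its micro-ops, keeping program order."""
    micro_ops = []
    for instruction in program:
        micro_ops.extend(MICRO_OP_TABLE[instruction])
    return micro_ops


program = ["ADD [mem], reg", "ADD reg, reg"]
print(decode(program))
```

The program still sees only the two x86 instructions; how many micro-ops each one becomes is an internal detail that can change from one processor generation to the next.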
RISC Vs CISC
Reduced Instruction Set Computing (RISC) and Complex Instruction Set Computing (CISC) are two different approaches to increasing speed. In the beginning, computer memory was slow and very expensive. To make computers more economical, complexity was moved into the hardware, which allowed for more compact code. There were so many instructions that most went unused. CISC instructions are complex enough that many take multiple clock cycles to complete. If a complex instruction that takes 3 cycles is being processed in one pipeline of a superscalar processor while a second instruction is being processed in another pipeline, a misalignment can occur: if the second instruction takes only 1 cycle, its result is ready before that of the 3-cycle instruction, and the results come out of order. Early CISC clock frequencies were limited by this timing issue because each instruction had to be monitored. Instructions had to be tracked constantly so that their results could be reassembled in the order in which they were issued. All consumer PCs use CISC processors.
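
The ordering problem can be sketched in a few lines. This is a simplified model, assuming every instruction starts at cycle 0 in its own pipeline; the instruction names and latencies are made up for illustration.

```python
def completion_order(instructions):
    """instructions: list of (name, latency_in_cycles), all started at
    cycle 0 in separate pipelines. Returns names in the order they finish."""
    finished = sorted(
        enumerate(instructions),
        key=lambda item: (item[1][1], item[0]),  # by latency, then program order
    )
    return [name for _index, (name, _latency) in finished]


def retire_in_order(instructions):
    """Hand results back in program order: an instruction may only retire
    once every earlier instruction has finished.
    Returns a list of (name, retire_cycle)."""
    retired = []
    latest = 0
    for name, latency in instructions:
        latest = max(latest, latency)  # wait for all earlier instructions
        retired.append((name, latest))
    return retired


program = [("complex_op", 3), ("simple_op", 1)]
print(completion_order(program))  # simple_op finishes first
print(retire_in_order(program))   # both are handed back at cycle 3
```

The second function is the "monitoring" in miniature: the 1-cycle instruction is done early but has to wait until the 3-cycle instruction ahead of it has finished.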

Instruction       RISC         CISC
Multiplication    a x b        a x b
Squaring          a x a        a^2
Cubing            a x a x a    a^3

Processing time = (time/instruction) x (instructions/operation)
CISC = (more than 1) x (1)
RISC = (1) x (more than 1)
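
To make the formula concrete, here is a small worked example for the cubing operation. The cycle counts are assumptions picked only for illustration (a single 3-cycle CISC cube instruction versus two 1-cycle RISC multiplies); real figures depend on the processor.

```python
def processing_time(cycles_per_instruction, instructions_per_operation):
    """Processing time = (time/instruction) x (instructions/operation)."""
    return cycles_per_instruction * instructions_per_operation


# CISC: one complex "cube" instruction, assumed to take 3 cycles.
cisc_cube = processing_time(cycles_per_instruction=3, instructions_per_operation=1)

# RISC: a x a x a needs two multiply instructions, assumed 1 cycle each.
risc_cube = processing_time(cycles_per_instruction=1, instructions_per_operation=2)

print("CISC cube:", cisc_cube, "cycles")
print("RISC cube:", risc_cube, "cycles")
```

With different assumed cycle counts the comparison can go either way, which is the point of the formula: each design lowers one factor at the expense of the other.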

Apple computers and most high-end servers and workstations use RISC processors. In general, RISC processors are superior to CISC processors. With RISC processors, instructions are designed to complete in 1 cycle, which makes this monitoring unnecessary. RISC was developed later than CISC, at a time when memory had become cheaper than processing time, so the bulky CISC ISA was no longer needed. CISC already had a big market, however, and because software was incompatible between the two, RISC never caught on with consumers. Most were unwilling to buy all new software just to own a RISC computer. This didn't stop RISC from being accepted in the server market, where performance matters more than cost.

RISC processors were designed with simplicity in mind. This means that a RISC processor is naturally smaller than a CISC processor because it needs less transistor logic. To fully exploit the manufacturing process, RISC processors use the space saved to add more registers and buffers on the die. As a result, they can keep more information in registers, which are faster than L1 cache, and can perform more operations before new information has to be loaded. This is what gives RISC processors their speed. The average workstation RISC processor can perform more than twice the floating-point operations of an equally clocked CISC processor.

Current processors are neither purely RISC nor purely CISC. Some of today's "RISC" processors actually have more instructions than "CISC" processors, while today's "CISC" processors have adopted the larger buffers that at one point only RISC processors had.

CPU Chips

Intel
Pentium III Katmai
These were the original P3 CPUs. Like all P3 processors, they use a 32Kbyte L1 cache. They have a 0.25 micron die and 512Kbytes of external L2 cache running at 1/2 the core speed. The external L2 cache meant that they were available only for the Slot1 interface. They operated on a 100MHz FSB at speeds of 450, 500, 550, and 600MHz.

Pentium III Katmai B
These were the second version of the P3 to be released, and offered no new features except that they operated on a 133MHz FSB. Again, they were only Slot1. They were available in speeds of 533MHz and 600MHz.

Pentium III Coppermine E
These processors added 256Kbytes of on-die L2 cache; the Intel SSE instructions were carried over from Katmai, which introduced them. The on-die L2 cache meant that the chips could be offered in both Slot1 and Socket370 formats. Despite the name, these chips do not use copper interconnects; they still use aluminum. They still used 0.25 micron fabrication and operated at speeds of 500 and 550MHz.

Pentium III Coppermine E
These chips use the same core as the original Coppermines, but they are made using a 0.18 micron process, which allows them to reach higher frequencies. They are available at 600, 650, 700, 750, 800, and 850MHz.

Pentium III Coppermine EB
These chips used the same core as the P3 E's but were able to operate on a 133MHz FSB. They are specifically for the Socket370 format, with the exception of the 1000EB and 1133EB. They are available at 533, 600, 667, 733, 800, 866, 933, 1000, and 1133MHz.

Celeron A
Celeron processors are Intel's budget line. The first Celerons operated on a 66MHz bus and had no L2 cache. These had miserable performance and were soon discontinued. They used a 0.25 micron process, Socket370 and Slot1 interface and were available in speeds of 266, 300 and 333MHz.

Celeron
These chips replaced the Celeron A line. They feature 128Kbytes of L2 cache and come in speeds of 366, 400, 433, 466, 500, and 533MHz.

Celeron II
These chips are similar to the Celerons, but they have an FCPGA design for Socket370 interfaces and use the Coppermine core on a 0.18 micron process. They have only 128Kbytes of L2 cache and can only operate on a 66MHz FSB. They are available in speeds of 533, 566, 600, 633, and 700MHz.

Advanced Micro Devices
Athlon
This was AMD's first SlotA interface CPU, originally code named the K7. This core features a 128Kbyte L1 cache, four times as large as that of Intel's fastest processors. The processor used an SECC format because its L2 cache wasn't on die. A 512Kbyte L2 cache was mounted in the cartridge and operated at 1/2, 2/5, or 1/3 of the core frequency, depending on the processor and the SRAM available during production. The fastest SRAM ever used ran at 350MHz. This wasn't a problem at first, but as core frequencies increased beyond 700MHz, the L2 cache became a bottleneck. These original Athlon processors are no longer being produced, but many are still on the market. All AMD production has shifted to the Thunderbird and Duron lines. Notably, this was the first retail processor to reach the 1GHz mark. The available speeds were 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, and 1000MHz.
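
The effect of the 350MHz SRAM ceiling can be worked out with a little arithmetic. The sketch below assumes a simple selection rule, namely that each processor gets the largest of the three dividers that keeps the L2 cache at or below the fastest SRAM speed. That rule is an assumption made for illustration, not a documented AMD policy.

```python
from fractions import Fraction

# The three L2 dividers and the SRAM limit come from the text above.
L2_DIVIDERS = [Fraction(1, 2), Fraction(2, 5), Fraction(1, 3)]
MAX_SRAM_MHZ = 350


def l2_speed(core_mhz, max_sram_mhz=MAX_SRAM_MHZ):
    """Assumed rule: pick the largest divider whose resulting L2 speed does
    not exceed the SRAM limit. Returns (divider, l2_mhz), or None if even
    the smallest divider is too fast."""
    for divider in sorted(L2_DIVIDERS, reverse=True):
        l2_mhz = core_mhz * divider
        if l2_mhz <= max_sram_mhz:
            return divider, float(l2_mhz)
    return None


for core in (500, 700, 750, 1000):
    divider, l2 = l2_speed(core)
    print(f"{core}MHz core: L2 at {divider} = {l2:.0f}MHz")
```

Under this rule a 700MHz part runs its cache at 350MHz, while a 1000MHz part drops to the 1/3 divider and a slower cache, which is the bottleneck described above.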

Athlon (Enhanced/Thunderbird)
This is currently AMD's highest-performance processor line. It is based on the Athlon core, with 256Kbytes of integrated L2 cache that operates at full core frequency. It was originally code named Thunderbird, but now sells as just Athlon because the original Athlon is no longer produced. It was one of the first PC processors to use copper interconnects, and is currently being sold in both copper and aluminum versions. It was originally designed for SocketA interfaces only, but some SlotA versions are being sold to OEMs until their supply of SlotA motherboards runs out. This chip is currently available in speeds of 700, 750, 800, 850, 900, 950, and 1000MHz, with 1100MHz soon to follow.

Duron
Code named Spitfire, the Duron is AMD's current budget line. It is exactly the same as the Thunderbird processor, except it features only 64Kbytes of on die L2 cache. This chip is only available in speeds of 600, 650 and 700MHz.

Transistor Materials
A transistor is a tiny electrical switch. An average CPU contains over 25 million transistors. Transistors are made out of materials known as semiconductors. Semiconductors are the transitional materials between conductors (metals) and nonconductors (nonmetals). These elements neither fully block electrical current, as insulators do, nor freely carry it, as conductors do. Instead, we can control their behavior and make them selectively conductive or nonconductive. This is how we can use them as a switch.

The element used to make transistors is silicon. The major source of silicon is sand. Pure silicon isn't ideal for electronics, however, so a form called doped silicon is used. Doped silicon is created by adding small amounts of either boron or phosphorus to silicon, depending on the properties needed.

Boron and silicon form what is known as P-Type doped silicon, which has an excess of positive charge carriers. Phosphorus and silicon form what is known as N-Type doped silicon, which has an excess of negative charge carriers (electrons). Using these two materials, a wide variety of transistors and logic gates can be created.

Transistors
A transistor is a semiconductor device with three leads or connections. At the Gate connection, a very small electrical signal can be used to control the amount of current that is allowed to pass from the Drain to the Source. A transistor has two purposes: it can act as an amplifier or as a switch. There are also two different types of transistors, bipolar and field effect. For a CPU to operate, its transistors must act as switches and be very small, so field effect transistors are used. Computers only work with on or off signals, 1's or 0's.

The transistors used in processor cores are called Metal Oxide Semiconductor Field Effect Transistors (MOSFETs), after the way they are made. MOSFETs can be either N-Type or P-Type.


[Figure: the basic MOSFET layout]

N-Type MOSFET Operation

The Source and the Drain are both made out of N-Type silicon. There is a section of P-Type silicon in between the two connections, with the Gate right above it. Without any voltage at the gate, electrons are not allowed to travel between the drain and the source. When a positive voltage is applied to the Gate, electrons from the lower sections of the P-Type silicon are attracted to the positive charge and move up to be near the gate. This neutralizes the P-Type's positive charge in the section right under the gate, creating a negative "N-Type" bridge between the drain connector and the source connector. The stronger the voltage applied at the gate, the more electrons are allowed to move between the source and the drain.
P-Type MOSFET (PMOS) Operation
The Source and the Drain are both made out of P-Type silicon. There is a section of N-Type silicon in between the two connections, with the Gate right above it. Without any voltage at the gate, current is not allowed to flow between the drain and the source. When a negative voltage is applied to the gate, the extra electrons under the gate are repelled by it and spread out into the lower sections of the N-Type silicon. This neutralizes the N-Type's negative charge in the section right under the gate, creating a positive "P-Type" bridge between the drain connector and the source connector. The stronger the voltage applied at the gate, the more current is allowed to flow between the source and the drain.
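
Viewed purely as switches, the two transistor types behave as opposites, which the sketch below models with booleans. The inverter at the end pairs one transistor of each type; that complementary arrangement is an assumption added here for illustration and is not described in the text, and the function names are made up.

```python
def n_type_conducts(gate_high):
    """An N-Type MOSFET conducts when its gate is driven positive (high)."""
    return gate_high


def p_type_conducts(gate_high):
    """A P-Type MOSFET conducts when its gate is driven negative (low)."""
    return not gate_high


def inverter(input_high):
    """Assumed arrangement: the P-Type transistor connects the output to the
    supply, the N-Type transistor connects it to ground, and both gates are
    tied to the same input."""
    pull_up = p_type_conducts(input_high)
    pull_down = n_type_conducts(input_high)
    assert pull_up != pull_down  # exactly one path conducts at a time
    return pull_up  # output is high only when connected to the supply


for value in (False, True):
    print("input", int(value), "-> output", int(inverter(value)))
```

The model ignores the analog behavior described above, where a stronger gate voltage lets more current through; it keeps only the on/off view that matters for digital logic.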
Logic Gates
Transistors by themselves are of limited use, but they can be arranged into patterns to create Logic Gates that have tremendous functionality. These logic gates produce known outputs for known inputs, which can be arranged in what is known as a Truth Table.

 AND      OR       NOR      NAND     NOT     XOR      XNOR
a b y    a b y    a b y    a b y    a y     a b y    a b y
0 0 0    0 0 0    0 0 1    0 0 1    0 1     0 0 0    0 0 1
0 1 0    0 1 1    0 1 0    0 1 1    1 0     0 1 1    0 1 0
1 0 0    1 0 1    1 0 0    1 0 1            1 0 1    1 0 0
1 1 1    1 1 1    1 1 0    1 1 0            1 1 0    1 1 1

Truth tables are fairly simple to understand. Logic gates are made out of numerous transistors, and they have been named according to their operation. These gates, except for the NOT gate, all have 2 inputs, labeled a and b, and have one output, labeled y. These circuits can be designed to have more, but this is their simplest form.

The AND Gate : The output of this logic gate is false (0) unless both inputs are true (1). For the output to be 1, both a AND b have to be true.

The OR Gate : For output to be true, either one or both of the inputs have to be true.

The NOR Gate : This logic gate always has the opposite value of the OR gate. It can be thought of as an OR gate with a NOT gate right after the output.

The NAND Gate : Output will always be the opposite of what an AND gate would output. Again, this can basically be thought of as an AND gate with a NOT gate right after the output.

The NOT Gate : This gate has one input because it doesn't do "comparisons"; the output is just the opposite of the input.

The XOR Gate : This is similar to an OR gate, but differs in that exactly one input must be 1 for the output to be 1. It is exclusive: if both inputs are 1 or both are 0, then the output will be 0. The inputs must be different, although it doesn't matter which one is high and which one is low.

The XNOR Gate : This is similar to the XOR gate, but the output is reversed. It can be thought of as an XOR gate with a NOT gate right at the output. Both inputs must be the same for the output to be 1.
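
The gate descriptions above translate directly into code. The following sketch models each gate as a small function on the values 0 and 1 and builds the inverted gates from a base gate followed by NOT, as described. The function names are chosen here for illustration.

```python
def NOT(a):
    return 1 - a


def AND(a, b):
    return a & b


def OR(a, b):
    return a | b


def XOR(a, b):
    return a ^ b


# The inverted gates are a base gate with a NOT gate right after the output.
def NAND(a, b):
    return NOT(AND(a, b))


def NOR(a, b):
    return NOT(OR(a, b))


def XNOR(a, b):
    return NOT(XOR(a, b))


def truth_table(gate):
    """Return the rows (a, b, y) for a two-input gate."""
    return [(a, b, gate(a, b)) for a in (0, 1) for b in (0, 1)]


for gate in (AND, OR, NOR, NAND, XOR, XNOR):
    print(gate.__name__)
    for a, b, y in truth_table(gate):
        print(" ", a, b, y)
```

Each printed table can be checked row by row against the truth tables given earlier.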

Transistor Advancements
Advances in transistor manufacturing have made transistor speeds increase exponentially over the past decades, from early processors like the 68000, which operated at 8MHz, to recent chips that are breaking the 1GHz barrier.

Transistor Manufacturing
Transistors are made through a process called photolithography, in which light is used to pattern the transistors onto a sheet of silicon called a wafer. First, the silicon is coated with a thin layer of light-reactive material. Light is then projected through a negative screen so that it reacts with only portions of the coating. Where the coating reacts, it hardens onto the silicon. The unreacted coating is then removed from the wafer, and the wafer is exposed to acid or hot ions. This dissolves the exposed silicon but not the silicon under the hardened coating. Finally, the remaining coating is removed by special reactive materials that do not affect the silicon.
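
The sequence of steps can be mimicked on a tiny grid. This is only a toy model: the wafer is a small rectangle of cells, the screen is a pattern of 1s (light passes) and 0s (light blocked), and the function names are invented for this example.

```python
def expose(screen):
    """Light passes through the open cells of the screen (1) and hardens
    the coating there; cells behind a 0 stay unreacted."""
    return [[cell == 1 for cell in row] for row in screen]


def etch(hardened):
    """After the unreacted coating is washed off, etching removes the
    silicon that is no longer covered.
    '#' = silicon protected by hardened coating, '.' = silicon etched away."""
    return ["".join("#" if h else "." for h in row) for row in hardened]


screen = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
]

for line in etch(expose(screen)):
    print(line)
```

The two blocked cells in the middle of the screen end up as the etched region, so the pattern on the screen is transferred to the silicon.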

Interconnect Materials
Most of today's processors use aluminum connections between the transistors. This is because aluminum is economical and can be placed on the silicon more accurately because it has less surface tension. Technology is just beginning to allow for copper-based processors, which are far superior to aluminum-based ones. Copper has less resistance, so it produces less heat and carries higher-frequency signals.

Transistor Size
Transistor size is the main reason that current processors are able to reach such high speeds. One property of electrical currents is that they travel more quickly and with less resistance at lower operating temperatures. The major factor limiting CPU speeds is heat dissipation. Electricity always experiences resistance when moving through materials. This is why some poorer quality electrical cords get warm when electricity flows through them, and it is how light bulbs and fuses work. Electrical currents passing through the processor create heat which, if it builds up enough, will cause the processor to stop working, malfunction, or burn out.

Smaller transistors in the processor use less voltage to operate, and therefore create less heat. As processor frequencies increase, more voltage is needed to keep up signal strength for each clock signal. The newer top of the line processors use heatsinks that are larger than a medium sized alarm clock just to keep cool.
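
A common rule of thumb for switching circuits, which the text does not state and which is brought in here as an assumption, is that the power turned into heat scales with capacitance x voltage squared x frequency. The sketch below uses it to show why a lower operating voltage helps so much. All numbers are hypothetical.

```python
def dynamic_power(capacitance_farads, voltage, frequency_hz):
    """Approximate switching power in watts: P = C * V^2 * f.
    This is a simplified model and ignores leakage and other losses."""
    return capacitance_farads * voltage ** 2 * frequency_hz


def relative_power(v_old, v_new):
    """Power at the new voltage as a fraction of power at the old voltage,
    with capacitance and frequency held equal."""
    return (v_new / v_old) ** 2


# Hypothetical chip: 1 nF of switched capacitance at 2.0 V and 500MHz.
print(dynamic_power(1e-9, 2.0, 500e6), "watts")

# Dropping from 2.0 V to 1.6 V at the same frequency.
print(round(relative_power(2.0, 1.6), 2), "of the original power")
```

Because voltage enters as a square, a 20 percent drop in voltage removes roughly a third of the heat in this model, while raising the frequency increases the heat proportionally.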

Transistor size is measured in microns (u), properly named micrometers (um), which are 10^-6 meters. There are exactly 1000um in 1mm. When referring to transistor size, the unit is usually written as "u". This is referred to as the "micron process" of a processor, because the processor uses transistors which are "X" um in width. Early processors used a 3.5u process, while today's processors use a 0.18u process.
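
The unit arithmetic is easy to check in code. The second function below adds an idealized assumption that is not from the text: if every dimension of a transistor shrinks with the process, the area it occupies scales with the square of the feature size.

```python
MICROMETERS_PER_METER = 1_000_000
MICROMETERS_PER_MILLIMETER = 1_000


def microns_to_meters(microns):
    """Convert a length in micrometers to meters."""
    return microns / MICROMETERS_PER_METER


def area_scale(old_process_um, new_process_um):
    """Idealized assumption: area per transistor scales with the square of
    the process size. Returns new area as a fraction of old area."""
    return (new_process_um / old_process_um) ** 2


print(microns_to_meters(0.18), "meters in 0.18 microns")
print(int(MICROMETERS_PER_MILLIMETER / 0.18), "widths of 0.18 microns fit across 1mm")
print(round(area_scale(0.25, 0.18), 2), "of the area after a 0.25u to 0.18u shrink")
```

Under that assumption, moving from a 0.25u to a 0.18u process roughly halves the area each transistor needs, which is why a process shrink allows either a smaller die or more transistors in the same space.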