Chipsets and Memory Architectures


Since the Front Side Bus (FSB) connects the processor and the northbridge chip, all the instructions and data the processor works on pass through it. Connected to the northbridge chip are the memory, hard drive, graphics card and just about everything else. That is, until things changed with the introduction of AMD's HyperTransport and Intel's QuickPath Interconnect bus architectures. Increased frequencies and innovations such as double and quad data rates/pumping helped, but with multi-core processors, extremely powerful graphics cards and everything else, the FSB simply became flooded. Its data rates topped out at 12,800 MB/s at best, which is just not fast enough.

Intel's QuickPath Interconnect and AMD's HyperTransport involve processors with onboard memory controllers. No longer does the processor have to funnel data from main memory down the same bus it is using to access the hard drive and to transmit data to the graphics card. Moving the memory controller onto the processor also removes the old limitations on memory speed: memory is no longer limited to the speed of the FSB, so dual- or triple-channel memory connections are the norm. Dual-channel DDR3-2000, for example, can deliver a maximum (on paper) of 32 GB/s, considerably more than the FSB would ever be able to handle (see the end of this handout).

The second aspect of these new architectures is the bus itself, which sits between the processor and the northbridge chip (or what was the northbridge chip). QPI and HT are not buses but point-to-point connections. A bus is a set of wires that can be used to connect a number of devices, while a point-to-point link, as you might have guessed, connects just two devices. They are still commonly called buses because they still involve a set of wires on the motherboard transferring data. Both systems use similar approaches and offer similar features, although the technical implementations differ.
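To make the bandwidth comparison above concrete, the short Python sketch below recomputes the two headline figures from this section: the 12,800 MB/s ceiling of a quad-pumped 400 MHz, 64-bit FSB and the 32 GB/s (on paper) of dual-channel DDR3-2000. The formula (clock × transfers per clock × bus width × channels) is the standard peak-bandwidth calculation; the helper name is purely illustrative.

# Peak bandwidth = transfers/s x bus width (bytes) x channels (1 GB = 1000 MB here).
def peak_bandwidth_gbs(base_clock_mhz, pumping, bus_width_bytes, channels=1):
    """Theoretical peak bandwidth in GB/s."""
    transfers_per_s = base_clock_mhz * 1e6 * pumping
    return transfers_per_s * bus_width_bytes * channels / 1e9

# Quad-pumped 400 MHz FSB, 64 bits (8 bytes) wide:
print(peak_bandwidth_gbs(400, 4, 8))               # 12.8 GB/s (12,800 MB/s)

# Dual-channel DDR3-2000: 1000 MHz I/O clock, double data rate, 64 bits per channel:
print(peak_bandwidth_gbs(1000, 2, 8, channels=2))  # 32.0 GB/s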

Intel and AMD Chipsets
Basic Differences in Processor/Chipset Designs: FSB vs. Direct Connect, HyperTransport Technology and QuickPath Interconnect

What is the difference between AMD and Intel processors? One answer lies in AMD's Direct Connect Architecture and its use of HyperTransport Technology (HTT). A main function of the processor is to read and write information to and from memory as fast and efficiently as possible. When developing the AMD Opteron and AMD Athlon 64 processors, AMD's engineers needed to find a method to deliver data efficiently to the processor from main memory even when handling the large data loads of 64-bit and multi-core computing. To keep all cores as productive as possible, they looked to processor designs outside of what had been used for 32-bit x86-based processors. The integrated memory controller, one of the key innovations of its Direct Connect Architecture, has its roots in the architecture of RISC processors. The benefit of an integrated memory controller for AMD64 processors, along with the power efficiency of those processors, resulted in AMD's performance-per-watt leadership; however, Intel's Core 2 Duo architecture has knocked AMD off that perch, for now.


In traditional x86 system design, a front side bus (FSB) connects the CPU to the main memory of a system. A separate memory controller chip is responsible for coordinating the data traffic as it passes from the memory to the CPU. However, in AMD64 and RISC processor design, the memory controller is integrated onto the processor, which can reduce the amount of time it takes for data to move from one component to another. For example, before its DDR2-based Direct Connect interface was introduced, an AMD Athlon 64 accessed RAM by transferring data on both the leading and trailing edges of the clock pulse, making its maximum transfer rate to RAM 3,200 MB/s, the same maximum rate as DDR400/PC3200 memory. It still does this, but now has two channels to DDR2. Thus the Athlon 64 (X2, Opteron, etc.) has two external buses: the first accesses RAM (either a single channel to DDR or dual channels to DDR2 RAM), and the second accesses the chipset (typically the Northbridge). This second bus is called the HyperTransport bus.

Theoretically this architecture is better than Intel's FSB/chipset architecture. Like Intel's FSB, the HT bus is used to communicate with the Northbridge chip (which is used as a gateway to the video card and the slower devices off the Southbridge chip); however, it is faster and can be easily extended in both width and speed. In theory, the Athlon 64 can communicate with the memory and with the other circuits of the computer at the same time. Another advantage of HyperTransport is that it has one path for the transmission of data and a second for its reception: it is thus two-way, or full-duplex. In the traditional architecture used by other processors, a single path is used both for the transmission and for the reception of data. In theory, the Athlon 64 can therefore transmit and receive data to and from the chipset at the same time. Thus the HyperTransport bus works at 3,200 MB/s in each direction; that is why it is often listed as a 6,400 MB/s bus, which is misleading and is one of the major misunderstandings among computer practitioners. In brief, it is as if people claimed that a motorway had a speed limit of 130 kph just because there is a speed limit of 65 kph in each direction. It is nevertheless a significant advantage over Intel's FSB, as it permits the processor(s) to send data (e.g. to the video card) and receive data (e.g. from the Internet/DVD/HDD) simultaneously.

A typical HTT (version 1.1) link works at 800 MHz, with 2 bytes of data transferred on both the leading and the trailing edge of the clock pulse: in terms of speed this is equivalent to a 800 × 2 = 1,600 MHz bus transferring 2 bytes of data per clock pulse. In effect, the bandwidth is calculated by multiplying 800 MHz × 2 transfers per clock × 2 bytes per transfer, giving a bandwidth of 3,200 MB/s per direction. The former calculation leads to another misunderstanding: that the external bus of the Athlon 64 runs at 1,600 MHz. Nevertheless, the older AGP 8x bus peaks at roughly 2.1 GB/s, so a HyperTransport link comfortably exceeds it, leaving plenty of bandwidth for the remaining I/O devices. However, the emergence of PCI Express, with higher bandwidth especially for video I/O, meant that HTT would have to provide higher speeds. HyperTransport Technology therefore had to evolve: from HTT 1.1 to 2.0 and now to 3.0.

It is interesting that within the HyperTransport system, performance is nominally measured in billions of transfers per second, or gigatransfers (GT/s), not gigabytes. That is because each interconnect or link can range in width from two to 32 bits. Furthermore, as indicated, it is also a "double data rate" (DDR) technology, meaning nodes send data on both the rising and falling edges of the clock signal. Thus the original HyperTransport 1.1 link runs at 1.6 gigatransfers per second (800 MHz × 2); if the link width is 16 bits (2 bytes), this translates to 3.2 GB/s of data per direction. HyperTransport 2.0 adds three new speed grades: 2.0, 2.4 and 2.8 gigatransfers per second, with clock rates of between 1.0 GHz and 1.4 GHz. So a 1 GHz HyperTransport bus with a 16-bit width delivers 2 GT/s, which equates to a bandwidth of 2 GT/s × 2 bytes = 4 GB/s per direction (1 GHz × 2 (DDR) × 2 bytes (16 bits)). Thus, with a 2-byte-wide link, HyperTransport 2.0 speeds of 2.0, 2.4 and 2.8 GT/s translate to 4.0, 4.8 and 5.6 GB/s per direction.
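As a check on the arithmetic above, here is a minimal Python sketch of the HyperTransport bandwidth calculation (per direction and aggregate) for a link of a given clock and width. It reproduces the 3.2 GB/s (HT 1.1, 16-bit) and 5.6 GB/s (HT 2.0 at 1.4 GHz, 16-bit) figures quoted in this handout; the function name is illustrative, not part of any specification.

def ht_bandwidth(clock_ghz, width_bits):
    """HyperTransport link bandwidth, assuming double-data-rate signaling.

    Returns (GT/s, GB/s per direction, GB/s aggregate over both directions).
    """
    gt_per_s = clock_ghz * 2                      # two transfers per clock (DDR)
    per_direction_gb = gt_per_s * width_bits / 8
    return gt_per_s, per_direction_gb, per_direction_gb * 2

print(ht_bandwidth(0.8, 16))   # (1.6, 3.2, 6.4)   -> HT 1.0/1.1, 16-bit link
print(ht_bandwidth(1.4, 16))   # (2.8, 5.6, 11.2)  -> HT 2.0 top speed grade
print(ht_bandwidth(2.6, 32))   # (5.2, 20.8, 41.6) -> HT 3.0, 32-bit link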

HyperTransport frequency specifications

HyperTransport Version | Year | Max. HT Frequency | Max. Link Width | Max. Aggregate Bandwidth (bi-directional) | Max. Bandwidth at 16-Bit (unidirectional) | Max. Bandwidth at 32-Bit (unidirectional)
1.0 | 2001 | 800 MHz | 32-bit | 12.8 GB/s | 3.2 GB/s  | 6.4 GB/s
1.1 | 2002 | 800 MHz | 32-bit | 12.8 GB/s | 3.2 GB/s  | 6.4 GB/s
2.0 | 2004 | 1.4 GHz | 32-bit | 22.4 GB/s | 5.6 GB/s  | 11.2 GB/s
3.0 | 2006 | 2.6 GHz | 32-bit | 41.6 GB/s | 10.4 GB/s | 20.8 GB/s
3.1 | 2008 | 3.2 GHz | 32-bit | 51.2 GB/s | 12.8 GB/s | 25.6 GB/s

HyperTransport is primarily a chip-to-chip interconnect, so an important element of its design is its bridging capability to board-level bus systems such as AGP, PCI, PCI-X and PCI Express. That is required to provide a HyperTransport-based system board with access to the huge array of I/O devices, ranging from AGP desktop graphics cards to Ethernet, Fibre Channel and SCSI adapters. Each HyperTransport link consists of a host device and an endpoint; either of those devices may be a bridge to one of those slower board-level buses. The original HyperTransport specification defined the PCI and AGP bridges. HyperTransport 1.05 created a bridge to PCI-X; HyperTransport 2.0, which appeared in 2004, added the PCI Express mappings and appropriate bridge technology.

The AMD Athlon 64, Athlon 64 FX, Athlon 64 X2, Athlon X2, Athlon II, Phenom, Phenom II, Sempron and Turion series and later use one 16-bit HyperTransport link. The AMD Athlon 64 FX (socket 1207) and Opteron use up to three 16-bit HyperTransport links. Common speeds for these processor links are 800 MHz to 1 GHz (older single- and multi-socket systems on Socket 754/939/940 links) and 2 GHz to 2.6 GHz (newer single-socket systems on AM2+/AM3 links). While HyperTransport itself is capable of 32-bit-wide links, that width is not currently utilized by any AMD processors. Some chipsets, though, do not even use the full 16-bit width used by the processors: these include the Nvidia nForce3 150, nForce3 Pro 150 and the ULi M1689, which use a 16-bit HyperTransport downstream link but limit the upstream link to 8 bits.

Intel QuickPath Interconnect

The Intel QuickPath Interconnect (QuickPath, QPI) is a point-to-point processor interconnect developed by Intel which replaces the Front Side Bus (FSB) in Xeon, Itanium, and certain desktop platforms. It was designed to compete with HyperTransport Technology. Intel first delivered it in November 2008 on the Intel Core i7-9xx desktop processors and X58 chipset. QPI was developed at Intel's Massachusetts Microprocessor Design Center (MMDC) by members of what had been DEC's Alpha Development Group, which Intel acquired from Compaq and HP. Prior to the name's announcement, Intel referred to it as Common System Interface (CSI). Earlier incarnations were known as YAP (Yet Another Protocol) and YAP+.

The QPI is an element of a system architecture that Intel calls the QuickPath architecture, which implements what Intel calls QuickPath technology. In its simplest form, on a single-processor motherboard, a single QPI is used to connect the processor to the I/O hub (e.g., to connect an Intel Core i7 to an X58). In more complex instances of the architecture, separate QPI link pairs connect one or more processors and one or more I/O hubs or routing hubs in a network on the motherboard, allowing all of the components to access other components via the network. As with HyperTransport, the QuickPath Architecture assumes that the processors will have integrated memory controllers, and it enables a non-uniform memory access (NUMA) architecture. It was first released in Xeon processors in March 2009 and in Itanium processors in February 2010.


Each QPI comprises two 20-lane point-to-point data links, one in each direction (full duplex), with a separate clock pair in each direction, for a total of 42 signals. Each signal is a differential pair, so the total number of pins is 84. The 20 data lanes are divided into four "quadrants" of 5 lanes each. The basic unit of transfer is the 80-bit "flit", which is transferred in two clock cycles (four 20-bit transfers, two per clock). The 80-bit flit has 8 bits for error detection, 8 bits for the link-layer header, and 64 bits for data. QPI bandwidths are advertised by computing the transfer of 64 bits (8 bytes) of data every two clock cycles in each direction. Although the initial implementations use single four-quadrant links, the QPI specification permits other implementations, and each quadrant can be used independently. On high-reliability servers, a QPI link can operate in a degraded mode: if one or more of the 20+1 signals fails, the interface will operate using the remaining 10+1 or even 5+1 signals, even reassigning the clock to a data lane if the clock fails.

The initial Nehalem implementation uses a full four-quadrant interface to achieve 25.6 GB/s, which is exactly double the theoretical bandwidth of the 1,600 MHz FSB used in Intel's X48 chipset.

Although some Core i7 processors use QPI, other Nehalem desktop and mobile processors (e.g. Core i3, Core i5, and other Core i7 processors) do not, at least not in any externally accessible fashion. These processors cannot participate in a multiprocessor system. Instead, they directly implement the DMI and PCI Express interfaces, obviating the need for a separate northbridge device or a processor bus of any type.

QuickPath Interconnect frequency specifications

QPI operates at a clock rate of 2.4 GHz, 2.93 GHz, or 3.2 GHz. The clock rate for a particular link depends on the capabilities of the components at each end of the link and on the signal characteristics of the signal path on the printed circuit board. The non-Extreme Core i7 9xx processors are restricted to a 2.4 GHz frequency at stock reference clocks. Bit transfers occur on both the rising and the falling edges of the clock, so the transfer rate is double the clock rate.

Intel describes the data throughput (in GB/s) by counting only the 64-bit data payload in each 80-bit flit. However, Intel then doubles the result because the unidirectional send and receive link pair can be simultaneously active. Thus, Intel describes a 20-lane QPI link pair (send and receive) with a 3.2 GHz clock as having a data rate of 25.6 GB/s. A clock rate of 2.4 GHz yields a data rate of 19.2 GB/s. More generally, by this definition a two-link, 20-lane QPI transfers eight bytes per clock cycle, four in each direction. The rate is computed as follows:

3.2 GHz × 2 bits/Hz (double data rate) × 20 (QPI link width) × (64/80) (data bits per flit bits) × 2 (unidirectional send and receive operating simultaneously) ÷ 8 (bits per byte) = 25.6 GB/s

QPI is specified as a five-layer architecture, with separate physical, link, routing, transport, and protocol layers. In devices intended only for point-to-point QPI use with no forwarding, such as the Core i7-9xx and Xeon DP processors, the transport layer is not present and the routing layer is minimal.
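The QPI throughput formula above is easy to encode. The Python sketch below (the function name is illustrative) reproduces Intel's 25.6 GB/s and 19.2 GB/s figures, including the 64/80 payload factor that comes from the flit structure described earlier.

def qpi_data_rate_gbs(clock_ghz, lanes=20):
    """Advertised QPI data rate in GB/s.

    clock x 2 (double data rate) x lanes x 64/80 (payload bits per flit bit)
    x 2 (send and receive links active simultaneously) / 8 (bits per byte).
    """
    return clock_ghz * 2 * lanes * (64 / 80) * 2 / 8

print(qpi_data_rate_gbs(3.2))   # 25.6 GB/s
print(qpi_data_rate_gbs(2.4))   # 19.2 GB/s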

PCI Express (2.5 GHz Clock Frequency)

PCI Express Lanes | Bandwidth per stream | Bandwidth, duplex
1  | 256 MB/s | 512 MB/s
2  | 512 MB/s | 1 GB/s
4  | 1 GB/s   | 2 GB/s
8  | 2 GB/s   | 4 GB/s
16 | 4 GB/s   | 8 GB/s
32 | 8 GB/s   | 16 GB/s

Basic PCI Express x1 buses, which are single-channel, permit a 256 MB/s transfer rate. Each channel or lane of a PCI Express connection contains two pairs of wires: one to send and one to receive. Packets of data move across the lane at a rate of one bit per cycle. An x1 connection, the smallest PCI Express bus, has one lane made up of four wires and carries one bit per cycle in each direction. An x2 link contains eight wires and transmits two bits at once, an x4 link transmits four bits, and so on. Other configurations are x12, x16 and x32. For example, the x16 bus (16 lanes) used for video boards has a bandwidth of 4 GB/s (16 × 256 MB/s) in a single direction, with 8 GB/s possible in both directions. PCIe transfers data at 250 MB/s per lane, up to a maximum of 32 lanes, for a total combined transfer rate of 8 GB/s. Note that PCIe is able to transfer data in both directions at once (full duplex). This effectively doubles the data transfer rate, allowing 500 MB/s per lane and a total combined transfer rate of 16 GB/s when 32 lanes are employed.
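The per-lane scaling described above is simple enough to express directly. The small Python sketch below (the helper name is illustrative) reproduces the first-generation PCI Express figures, using the 250 MB/s-per-lane value from the paragraph; the table above rounds the x1 figure to 256 MB/s.

# First-generation PCI Express: 2.5 GT/s per lane with 8b/10b encoding gives
# roughly 250 MB/s per lane per direction; bandwidth scales linearly with lanes.
def pcie_gen1_bandwidth_mbs(lanes, per_lane_mbs=250):
    """Return (single-direction, duplex) bandwidth in MB/s for an xN link."""
    one_way = lanes * per_lane_mbs
    return one_way, one_way * 2

for lanes in (1, 2, 4, 8, 16, 32):
    print(lanes, pcie_gen1_bandwidth_mbs(lanes))
# x16 -> 4000 MB/s one way, 8000 MB/s duplex; x32 -> 8000 / 16000 MB/s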

The 975X and 965 Intel Chipsets

There are only two high-end core-logic products for Socket 775 available: Nvidia's nForce 590 SLI chipset and Intel's 975X. Both support all current processors, and high-end motherboards come with comparable feature sets. You need the Nvidia chipset to install an Nvidia SLI dual-graphics solution, or the 975X to run an ATI CrossFire dual-graphics setup. (It is ironic to see an ATI setup running on an Intel system these days, because the Canadian graphics company was recently purchased by AMD.) The 975X is Intel's state-of-the-art core logic for high-end desktop and entry-level workstation systems. It supports all Socket 775 Pentium 4, Pentium D, Core 2 Solo and Core 2 Duo processors at FSB800 and FSB1066 speeds, and it comes with a dual-channel DDR2-800 memory controller. As already mentioned, the chipset will support two graphics cards, each running at PCI Express x8 link speed, or the graphics interface can be configured to support a single graphics card at x16 link speed. For device connectivity, Intel uses its well-known ICH7R southbridge component, which supports High Definition audio, eight USB 2.0 ports, six x1 PCI Express lanes, a conventional PCI bus and four Serial ATA/300 ports for hard drives and modern optical drives. There is also one UltraATA/100 channel for your legacy devices, and Intel has implemented its Matrix Storage technology, which allows you to install multiple RAID sets across a common set of hard drives.

Intel's 975X Chipset

Intel Core 2 Duo Chipset


Intel released the P965 Express MCH and ICH8/R chipsets in 2006. The major features of the P965 MCH include Intel's new Fast Memory Access technology, 1066 MHz front-side bus support, 800 MHz DDR2 memory support, and full support for the new Core 2 Duo processor lineup. The integrated graphics versions, G965 and Q965, ship with the new GMA X3000 graphics engine for the Viiv and corporate markets respectively. The new ICH8/R chipsets offer ten USB 2.0 ports, up to six 3 Gb/s SATA ports, Intel's new Quiet System Technology, and the removal of Parallel ATA support. While one could argue that the removal of PATA support is a needed step forward in technology, the industry believes that Intel should have waited until the next-generation ICH for this change. The optical drive market is still about 98% PATA-based and does not seem to be changing anytime soon. While this development might spur the optical drive suppliers into offering additional SATA drives in the near future, it does not address the requirements of the current PATA installed base. This most likely means there will be additional cost and complexity on motherboards using the ICH8, as manufacturers will have to add an additional chip for PATA support.

Intel X58 Express Chipset (2009)

The Intel X58 Express Chipset supports the latest 45 nm Intel Core i7 processor family at 6.4 GT/s and 4.8 GT/s speeds via the Intel QuickPath Interconnect (QPI). Additionally, this chipset delivers dual x16 or quad x8 PCI Express 2.0 graphics card support, and support for Intel High Performance Solid State Drives on ICH10 and ICH10R consumer SKUs.


Random Access Memory in Personal Computers

Dynamic RAM (DRAM) is a type of RAM that employs refresh circuits to maintain data in its logic circuits. Each memory cell in DRAM consists of a single transistor and a capacitor. The capacitor is responsible for holding the electrical charge that designates a 1 bit; the absence of a charge designates a logical 0. Capacitors lose their charge over time and therefore need to be recharged or refreshed. A more expensive and faster type of RAM, static RAM (SRAM), uses between 4 and 6 transistors in a special 'flip-flop' circuit that maintains a 1 or 0 while the computer system is operating. SRAM can be read or written to like DRAM. DRAM logic, on the other hand, is refreshed several hundred times a second. To do this, the DRAM controller logic merely reads the contents of each memory cell, and because of the way in which cells are constructed, the reading action simply refreshes the contents of the memory. This action puts the 'dynamic' into DRAM. However, refreshing takes time and increases the latency (the time from a memory access request until the data is output) of DRAM.

DRAM is used in all computers and associated devices for main system memory even though DRAM is slower than SRAM, due to the operation of the refresh circuitry. DRAM is also used because it is much cheaper and takes up less space, typically 25% of the silicon area of SRAM or less. To build 256 MB of system memory from SRAM would be prohibitively expensive. Technological advances have, however, led to faster and faster forms of DRAM, despite the disadvantages of the refresh circuit. As indicated, DRAM modules are smaller and less expensive than SRAM because the latter employs four to six transistors (or more) to store one single bit, as opposed to DRAM's single transistor (the switch) and capacitor (the 1/0 charge store). SRAM is mainly employed in the L1 and L2 cache memory on Intel Pentium CPUs and in the L1, L2 and L3 cache in the Itanium family. In 2004, Intel released Pentium IVs with an onboard L3 cache, and its L2 cache reached a whopping 1 MB. All things being equal, the more cache on board a CPU package, the larger the scale of integration, as the number of transistors runs into the millions. Itanium II processors have up to 500 million transistors, the majority of which are in the L1, L2 and L3 caches. DRAM technology involves very large scale integration (VLSI) using a silicon substrate which is etched with the patterns that make the transistors and capacitors. Each unit of DRAM comes packaged in an integrated circuit (IC). By 2003, DRAM technologies had evolved to the point where several competing technologies existed; however, the older of these are slower (i.e. have a higher latency) and contain fewer MB of storage per unit.

Ceteris paribus, adding more memory to a computer system increases its performance. Why is this? If the amount of RAM is insufficient to hold the processes and data required, the operating system has to create a swap file on the hard disk, which is used to create virtual memory. On average, it takes a CPU about 200 nanoseconds (ns) to access DRAM compared to 12,000,000 ns to access the hard drive. More RAM means less swapping, which speeds up a system.
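The point about RAM size and swapping can be made concrete with a rough effective-access-time model. The Python sketch below uses the access-time figures quoted above (about 200 ns for DRAM and roughly 12,000,000 ns for the hard drive) together with an assumed fraction of accesses that miss RAM and go to the swap file; the miss rates chosen are purely illustrative.

DRAM_NS = 200           # typical DRAM access time quoted above
DISK_NS = 12_000_000    # typical hard drive access time quoted above

def effective_access_ns(swap_fraction):
    """Average access time if 'swap_fraction' of accesses must go to disk."""
    return (1 - swap_fraction) * DRAM_NS + swap_fraction * DISK_NS

for frac in (0.0, 0.0001, 0.001, 0.01):
    print(f"{frac:.2%} swapped -> {effective_access_ns(frac):>12,.0f} ns average")
# Even 0.01% of accesses hitting the disk raises the average access time roughly
# sevenfold, which is why adding RAM (and so swapping less) speeds a system up.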

Synchronous DRAM (SDRAM)

In late 1996, synchronous DRAM began to appear in computer systems. Unlike previous RAM technology, which used asynchronous memory access techniques, SDRAM is synchronized with the system clock and the CPU. The SDRAM interface is synchronized with the clocked operation of the computer's system bus (i.e. clock pulses trigger the gates to open and close), and thus with the processor. In the SDRAM module itself, clock pulses are used to drive logic circuits that pipeline incoming read/write commands. This allows the chip to have a more complex pattern of operation than plain DRAM, which does not have synchronizing control circuits. Pipelining means that the chip can accept a new command before it has finished processing the previous one. In a pipelined write, the write command can be immediately followed by another command without waiting for the data to be written to the memory array. In a pipelined read, the requested data appears a fixed number of clock pulses after the read command. This delay is called the latency, and it is a key performance variable for all types of RAM.

SDRAM also employs interleaving and burst-mode functions, which make memory retrieval even faster. SDRAM dual inline memory modules (DIMMs, as opposed to the older single inline memory modules, SIMMs) were available from numerous vendors and at several different packing densities (i.e. the number of MB on each DIMM) and speeds. The speed of SDRAM chips was closely tied to the speed of the front-side bus, in order to synchronize with the operation of the CPU. For example, PC66 SDRAM runs at 66 MHz, PC100 SDRAM runs at 100 MHz, PC133 SDRAM runs at 133 MHz, and so on. Faster SDRAM speeds such as 200 MHz and 266 MHz appeared later.

Double Data Rate Synchronous SDRAM (DDR SDRAM)

DDR SDRAM was targeted at Intel's 7th-generation Pentium CPUs, as its key innovation is that the memory control logic gates switch on both the leading and trailing edges of a clock pulse, rather than on just the leading edge as with normal gate operation. With typical SDRAM technology, binary signals on the control, data and address portions of the system bus from the Northbridge chip to the memory unit are transferred on the leading edge of the clock pulse that opens the bus interface logic gates. Until the advent of the Pentium IV CPU, bus speed was dictated by the system clock speed, which ran at 100 MHz and 133 MHz on the Pentium III. The front-side bus (FSB) to the Northbridge chip and the portion of the system bus from the Northbridge chip to the memory chips ran at 100 MHz and 133 MHz. The Pentium IV Willamette, however, had an FSB speed of 400 MHz (the 100 MHz system clock was 'quad pumped' (× 4) to achieve this). With a data bus width of 64 bits (8 bytes), this gives a data bandwidth of 3.2 gigabytes per second (400 MHz × 8 bytes = 3,200 MB/s, or 3.2 GB/s). Note that data transfer rates within a computer are rated in kilo-, mega- or gigabytes per second due to the parallel method of transfer, while in a computer network (client/server etc.) they are measured in bits per second (kilo-, mega- or gigabits per second). Older SDRAM technologies (PC100 SDRAM and PC133 SDRAM) operate at system bus speeds and therefore constitute a bottleneck for Pentium IV systems. Hence the advent of DDR technology, which helped alleviate the bottleneck, and Intel's support for RAMBus DRAM, which it felt was a better solution.

With DDR technology, special logic circuitry enables data to be transferred on the leading and trailing edges of the '1' clock pulse (remember each clock cycle consists of a '1' followed by a '0'). Taking the data bus as an example, a clock pulse transition from the '0' of the preceding cycle to the '1' of the next opens logic gates to allow 64 bits of data onto the data bus, while the transition from '1' to '0' results in another 64 bits being switched onto the bus. Of course, the gates on the Northbridge chip that serve the memory segment of the system bus open and close in unison. This effectively doubles the speed of operation of SDRAM, hence the term Double Data Rate. With DDR SDRAM, a 100 or 133 MHz system clock rate yields an effective data rate of 200 MHz or 266 MHz when double switched. Newer designs (PC2700 etc.) are based on DDR SDRAM running at 166 MHz, which is double switched to give an effective rate of 333 MHz.
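The bottleneck argument above boils down to a comparison of peak bandwidths: a quad-pumped FSB against single- and double-data-rate memory buses. The short Python sketch below recomputes the figures used in this section; the pump factors (1 for single data rate, 2 for DDR, 4 for the quad-pumped FSB) are as described in the text, and the helper name is illustrative.

# Peak bandwidth = clock (MHz) x pump factor x bus width (8 bytes), in MB/s.
def bus_bandwidth_mbs(clock_mhz, pump, width_bytes=8):
    return clock_mhz * pump * width_bytes

print(bus_bandwidth_mbs(100, 4))   # 3200 MB/s: Pentium IV Willamette FSB (quad-pumped 100 MHz)
print(bus_bandwidth_mbs(100, 1))   # 800 MB/s:  PC100 SDRAM, the bottleneck
print(bus_bandwidth_mbs(133, 1))   # 1064 MB/s: PC133 SDRAM
print(bus_bandwidth_mbs(100, 2))   # 1600 MB/s: DDR-200 (PC1600)
print(bus_bandwidth_mbs(133, 2))   # 2128 MB/s: DDR-266 (PC2100; the exact figure uses 133 1/3 MHz)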


Speed Ratings for DDR SDRAM

As indicated, in the past the speeds at which SDRAM chips operated were dictated by bus speeds, so PC100 and PC133 SDRAM DIMMs operated on 100 and 133 MHz FSBs. DDR SDRAM ratings are not based on clock speed, but on the maximum data bandwidth or throughput.

- DDR-200: DDR-SDRAM memory chips specified to operate at 100 MHz
- DDR-266: DDR-SDRAM memory chips specified to operate at 133 MHz
- DDR-333: DDR-SDRAM memory chips specified to operate at 166 MHz
- DDR-400: DDR-SDRAM memory chips specified to operate at 200 MHz

Hence, with a 100 MHz system bus on a Pentium IV Willamette system, the maximum data bandwidth is 1,600 megabytes per second (100 MHz × 2 × 8 bytes), or 1.6 GB/s. Hence, the industry designation for DDR SDRAM DIMMs on 100 MHz systems is PC1600. Likewise, the designation for DDR SDRAMs that operate on a 133 MHz system clock is PC2100 (133⅓ MHz × 2 × 8 bytes ≈ 2,133 MB per second). The reason for this rating system lies with manufacturers' marketing strategies. For example, RAMBus DRAM RIMMs are designated PC800 because of the rate in MHz at which the memory chips operate. This is the internal and external rate of operation to the Northbridge chip; however, the data bus between memory and the Northbridge chip is a mere 16 bits wide. This gives a bandwidth of 1,600 MB/s (800 MHz × 2 bytes), the same as DDR (although the actual data throughput is higher and the latency lower in RAMBus DRAM RIMMs). The manufacturers of DDR SDRAM were reluctant to badge their chips with smaller designations (e.g. PC200 or PC266), as potential customers might not have bought their chips even though the difference in performance was negligible. Further advances in DDR SDRAM technology saw DDR SDRAM-based Intel and VIA chipsets which accommodated PC2400 and PC2700 DDR SDRAM running at 150 MHz and 166 MHz respectively, double clocked to 300 and 333 MHz (so-called DDR300 and DDR333). The evolution of DDR366 and new chipset designs then led to PC3000 DDR SDRAM being released with even higher bandwidth. Newer to the market is dual-channel DDR400, where each channel delivers 3.2 GB/s (200 MHz × 2 × 8 bytes) of peak bandwidth.

Dual Channel DDR: Making DDR Perform Faster

Pentium IV / Intel 915/925 chipsets had an 800 MHz FSB, twice the speed of 400 MHz DDR400; a single memory channel would therefore act as a bottleneck and effectively slow things down to the 400 MHz rate. Ideally, you would want to match the processor front-side bus and the memory bus, so DDR SDRAM running at 800 MHz would be an optimal solution. However, back then the technical challenges of getting an 8-byte-wide memory bus to operate at 800 MHz were not insignificant. The solution was to adopt a dual-channel approach, just like that in RAMBus memory technologies. Practically speaking, dual-channel DDR400 requires two DIMM slots and two modules. The architecture, while offering 6.4 GB/s of peak bandwidth, simultaneously splits the back-and-forth signaling with the CPU. The signal from each channel comes from one of two sockets in the chipset. In May 2002, Intel began to play catch-up with VIA and SiS by releasing Pentium IV chipsets with a 533 MHz FSB that supported DDR SDRAM. This seemed to increase Intel's commitment to DDR technology. Remember that Intel had based its first Pentium IV chipset on RDRAM, and the company took a long time before it considered adding SDRAM, much less DDR, support. However, RAMBus was never popular with the computer manufacturing industry, especially among JEDEC members and memory suppliers. (JEDEC, the Joint Electron Device Engineering Council, is the semiconductor engineering standardization body of the Electronic Industries Alliance (EIA), a trade association that represents all areas of the electronics industry.) Even so, it came as some surprise when, at a recent Intel Developer Forum (IDF), the company indicated its support for single- and dual-channel DDR400 in its new 800 MHz FSB chipsets. More significant, however, was that Intel's memory roadmap for the future did not include RAMBus memory.
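The PC-rating convention described above (naming modules by bandwidth in MB/s rather than by clock speed) is simple arithmetic, so a short Python sketch can regenerate the designations used in this section. The lookup of marketing names below is illustrative, since the official labels round the computed figures (e.g. 133⅓ MHz works out to 2,133 MB/s but is sold as PC2100).

# DDR module names encode peak bandwidth: clock (MHz) x 2 (DDR) x 8 bytes.
DDR_GRADES = {            # memory clock (MHz) -> (chip rating, module name)
    100:     ("DDR-200", "PC1600"),
    400 / 3: ("DDR-266", "PC2100"),
    500 / 3: ("DDR-333", "PC2700"),
    200:     ("DDR-400", "PC3200"),
}

def ddr_bandwidth_mbs(clock_mhz, channels=1):
    return clock_mhz * 2 * 8 * channels

for clock, (chip, module) in DDR_GRADES.items():
    print(f"{chip} ({module}): {ddr_bandwidth_mbs(clock):,.0f} MB/s per channel, "
          f"{ddr_bandwidth_mbs(clock, 2):,.0f} MB/s dual-channel")
# DDR-400 (PC3200): 3,200 MB/s per channel, 6,400 MB/s dual-channel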

DDR2 and DDR3

With a clock frequency of 100 MHz, single-data-rate SDRAM transfers data on every rising edge of the clock pulse, thus achieving an effective 100 MHz data transfer rate. However, both DDR and DDR2 are double switched; that is, their logic is switched to transfer data on both the rising and falling edges of the clock, at transitions between 0.0 V and 2.5 V (1.8 V for DDR2). As indicated above, this achieves an effective rate of 200 MHz (and a theoretical bandwidth of 1.6 GB/s) with the same clock frequency. DDR operates both the internal logic circuits of the memory chip and its I/O bus (to the Intel Northbridge Memory Controller Hub (MCH) or the AMD Athlon 64 memory controller) at the same speed. DDR2 memory logic operates at half the speed of the I/O clock, where the I/O clock is the rate at which the DDR2 logic gates open and connect to the Northbridge MCH. The internal logic and external I/O bus of DDR PC-3200 operate at 200 MHz; in contrast, the internal logic and memory circuits of DDR2 PC2-3200 operate at 100 MHz, while the I/O interface logic switches at 200 MHz. DDR2's bus frequency is boosted by electrical interface improvements, on-die termination, prefetch buffers and off-chip drivers. However, latency is greatly increased as a trade-off: to compensate, the DDR2 prefetch buffer is 4 bits wide, whereas it is just 2 bits wide for DDR, which has a lower latency. The prefetch buffer for DDR3 is 8 bits wide, thus indicating it has greater latency problems. Another important feature of DDR2 is that it consumes less power: power savings are achieved primarily through an improved manufacturing process, resulting in a drop in operating voltage (1.8 V compared to DDR's 2.5 V).

Chip specifications (a short calculation sketch follows the list):

- DDR2-400: core runs at 100 MHz, I/O clock at 200 MHz, PC2-3200, 3.200 GB/s bandwidth
- DDR2-533: core runs at 133 MHz, I/O clock at 266 MHz, PC2-4200, 4.267 GB/s bandwidth
- DDR2-667: core runs at 166 MHz, I/O clock at 333 MHz, PC2-5300, 5.333 GB/s bandwidth
- DDR2-800: core runs at 200 MHz, I/O clock at 400 MHz, PC2-6400, 6.400 GB/s bandwidth
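As promised above, here is a minimal Python sketch of the DDR2 relationships just listed: the core clock is half the I/O clock, the data rate is twice the I/O clock, and the peak bandwidth is the data rate times 8 bytes. The function name is illustrative; note that the official PC2 module names round the computed MB/s figures (PC2-3200, PC2-4200, PC2-5300, PC2-6400).

def ddr2_spec(core_mhz):
    """Derive the I/O clock, data rate and peak bandwidth from the DDR2 core clock."""
    io_clock = core_mhz * 2          # DDR2 I/O runs at twice the core frequency
    data_rate = io_clock * 2         # double data rate on the I/O clock
    bandwidth_mbs = data_rate * 8    # 64-bit (8-byte) module width
    return io_clock, data_rate, bandwidth_mbs

for core in (100, 400 / 3, 500 / 3, 200):
    io, rate, bw = ddr2_spec(core)
    print(f"core {core:6.1f} MHz -> I/O {io:6.1f} MHz, DDR2-{rate:.0f}, {bw / 1000:.3f} GB/s")
# core  200.0 MHz -> I/O  400.0 MHz, DDR2-800, 6.400 GB/s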

DDR3 SDRAM (Double Data Rate 3 Synchronous Dynamic Random Access Memory)

DDR3 SDRAM comes with a promise of a power consumption reduction of 40% compared to current commercial DDR2 modules, due to DDR3's 90 nm fabrication technology, allowing for lower operating currents and voltages (1.5 V, compared to DDR2's 1.8 V or DDR's 2.5 V). "Dual-gate" transistors are used to reduce leakage current. DDR3's prefetch buffer width is 8 bits, whereas DDR2's is 4 bits and DDR's is 2 bits.

In a prefetch buffer architecture, when a memory access occurs to a row, the buffer grabs a set of adjacent datawords on the row and reads them out ("bursts" them) in rapid-fire sequence on the I/O pins, without the need for individual column address requests. This assumes the CPU wants adjacent datawords in memory, which in practice is very often the case. For instance, when a 64-bit CPU accesses a 16-bit-wide DRAM chip, it will need 4 adjacent 16-bit datawords to make up the full 64 bits. A 4n prefetch buffer would accomplish this exactly ("n" refers to the I/O width of the memory chip; it is multiplied by the burst depth "4" to give the size in bits of the full burst sequence). An 8n prefetch buffer on an 8-bit-wide DRAM would also accomplish a 64-bit transfer. The prefetch buffer depth can also be thought of as the ratio between the core memory frequency and the I/O frequency. In an 8n prefetch architecture (such as DDR3), the I/Os operate 8 times faster than the memory core (each memory access results in a burst of 8 datawords on the I/Os). Thus a 200 MHz memory core is combined with I/Os that each operate eight times faster (1,600 megabits per second). If the memory has 16 I/Os, the total read bandwidth would be 200 MHz × 8 datawords/access × 16 I/Os = 25.6 gigabits/second (Gbit/s), or 3.2 gigabytes/second (GB/s). Modules with multiple DRAM chips can provide correspondingly higher bandwidth.

Initially, DDR3 modules transferred data on an I/O clock of 400-800 MHz (an effective data rate of 800-1600 MT/s), compared with DDR2's then range of 200-533 MHz (400-1066 MT/s) or DDR's range of 100-300 MHz (200-600 MT/s). Such bandwidth requirements have come mainly from the graphics market, where vast transfers of information between framebuffers are required. Note the following comparisons between the core memory clock and the I/O clock; newer versions of DDR3 run significantly faster:

- DDR3-800: runs at 100 MHz, I/O clock at 400 MHz
- DDR3-1066: runs at 133 MHz, I/O clock at 533 MHz
- DDR3-1333: runs at 166 MHz, I/O clock at 667 MHz
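To tie the prefetch discussion to these figures, the Python sketch below works through the 8n prefetch arithmetic from the paragraph above: the per-pin data rate is eight times the core clock, the I/O clock is half that data rate, and a chip with 16 I/O pins at a 200 MHz core delivers 25.6 Gbit/s, i.e. 3.2 GB/s, as quoted. The 16-pin (x16) chip width is the example used in the text; the function name is illustrative.

# 8n prefetch arithmetic for DDR3 (see the paragraph above the list).
def ddr3_chip(core_mhz, io_pins=16):
    data_rate_mts = core_mhz * 8                    # per-pin transfers: burst of 8 per core access
    io_clock_mhz = data_rate_mts / 2                # double data rate signaling
    total_gbit_s = data_rate_mts * io_pins / 1000   # whole-chip read bandwidth, Gbit/s
    return data_rate_mts, io_clock_mhz, total_gbit_s, total_gbit_s / 8

print(ddr3_chip(200))   # (1600 MT/s, 800 MHz I/O clock, 25.6 Gbit/s, 3.2 GB/s)
# The list above follows the same pattern: DDR3-800 pairs a 100 MHz core with a
# 400 MHz I/O clock; DDR3-1333 pairs a 166 2/3 MHz core with a 666 2/3 MHz I/O clock.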

Graphics Double Data Rate 3 (GDDR3) is a graphics-card-specific memory technology designed by ATI Technologies. Today, DDR3 modules can transfer data at a rate of 800–2133 MT/s using both rising and falling edges of a 400–1066 MHz I/O clock. Sometimes a vendor may misleadingly advertise the I/O clock rate by labelling the MT/s figure as MHz; the MT/s figure is normally twice the MHz figure because of double sampling, one transfer on the rising clock edge and one on the falling. In comparison, DDR2's current range of data transfer rates is 400–1066 MT/s using a 200–533 MHz I/O clock, and DDR's range is 200–400 MT/s based on a 100–200 MHz I/O clock. High-performance graphics was an initial driver of such bandwidth requirements, where high-bandwidth data transfer between framebuffers is required.

GDDR3 memory might sound similar to DDR3 but is more like DDR2: it has been in use for several years in high-end graphics cards such as those from NVIDIA or ATI Technologies, and as main system memory on the Xbox 360. It is sometimes incorrectly referred to as "DDR3". It has much the same technological base as DDR2, but the power and heat-dispersal requirements have been reduced somewhat, allowing for higher-speed memory modules and simplified cooling systems. Unlike the DDR2 used on graphics cards, GDDR3 is unrelated to the JEDEC DDR3 specification. This memory uses internal terminators, enabling it to better handle certain graphics demands. To improve bandwidth, GDDR3 memory transfers 4 bits of data per pin in 2 clock cycles.

Standard name | Memory clock (MHz) | Cycle time (ns) | I/O bus clock (MHz) | Data rate (MT/s) | Module name | Peak transfer rate (MB/s) | Timings (CL-tRCD-tRP) | CAS latency (ns)
DDR3-800D, DDR3-800E | 100 | 10 | 400 | 800 | PC3-6400 | 6400 | 5-5-5, 6-6-6 | 12 1/2, 15
DDR3-1066E, DDR3-1066F, DDR3-1066G | 133⅓ | 7 1/2 | 533⅓ | 1066⅔ | PC3-8500 | 8533⅓ | 6-6-6, 7-7-7, 8-8-8 | 11 1/4, 13 1/8, 15
DDR3-1333F*, DDR3-1333G, DDR3-1333H, DDR3-1333J* | 166⅔ | 6 | 666⅔ | 1333⅓ | PC3-10600 | 10666⅔ | 7-7-7, 8-8-8, 9-9-9, 10-10-10 | 10 1/2, 12, 13 1/2, 15
DDR3-1600G*, DDR3-1600H, DDR3-1600J, DDR3-1600K | 200 | 5 | 800 | 1600 | PC3-12800 | 12800 | 8-8-8, 9-9-9, 10-10-10, 11-11-11 | 10, 11 1/4, 12 1/2, 13 3/4
DDR3-1866J*, DDR3-1866K, DDR3-1866L, DDR3-1866M* | 233⅓ | 4 2/7 | 933⅓ | 1866⅔ | PC3-14900 | 14933⅓ | 10-10-10, 11-11-11, 12-12-12, 13-13-13 | 10 5/7, 11 11/14, 12 6/7, 13 13/14
DDR3-2133K*, DDR3-2133L, DDR3-2133M, DDR3-2133N* | 266⅔ | 3 3/4 | 1066⅔ | 2133⅓ | PC3-17000 | 17066⅔ | 11-11-11, 12-12-12, 13-13-13, 14-14-14 | 10 5/16, 11 1/4, 12 3/16, 13 1/8

* optional
CL: clock cycles between sending a column address to the memory and the beginning of the data in response
tRCD: clock cycles between row activate and reads/writes
tRP: clock cycles between row precharge and activate
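The columns of the table above are related by simple arithmetic, which the Python sketch below makes explicit for a standard 64-bit (8-byte) module: the memory clock is one eighth of the data rate, the I/O clock is half of it, the peak transfer rate is the data rate times 8 bytes (which, rounded, gives the PC3 module name), and the CAS latency in nanoseconds is the CL timing divided by the I/O clock. The function name is illustrative.

def ddr3_row(data_rate_mts, cl):
    """Recompute the derived columns of the DDR3 table from data rate and CL."""
    io_clock_mhz = data_rate_mts / 2        # double data rate
    memory_clock_mhz = data_rate_mts / 8    # 8n prefetch core clock
    peak_mbs = data_rate_mts * 8            # 64-bit module width
    cas_ns = cl / io_clock_mhz * 1000       # CL cycles at the I/O clock, in ns
    return memory_clock_mhz, io_clock_mhz, peak_mbs, cas_ns

print(ddr3_row(1333 + 1 / 3, cl=9))    # ~166.7 MHz, ~666.7 MHz, ~10666.7 MB/s, 13.5 ns
print(ddr3_row(1600, cl=11))           # 200 MHz, 800 MHz, 12800 MB/s, 13.75 ns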

