Wednesday, February 01, 2006

AMD64 is five generations ahead of INTEL

It's impossible for me to cover this topic in great detail, so I will hit the key points only.



AMD64 Instruction set

In February 2003, on the eve of AMD's launch of the AMD64 family of CPUs, INTEL expressed its disbelief. According to Richard Wirt, an INTEL senior fellow, four separate design teams at INTEL had examined how the company could take one of its 32-bit chips and transform it into a 64-bit machine. All four teams concluded that such a feat was not doable.

INTEL did try hard to do 64-bit on x86, but its engineers didn't know how.

But the grand masters at AMD did what INTEL thought was impossible. Opteron 64 hit the market in April 2003 and quickly won almost all performance benchmarks.

Seeing is believing. INTEL tried to reverse engineer AMD's instruction set onto the Pentium 4 and the Pentium 4 based Xeon. Emulating the AMD64 instruction set was easier on the Pentium 4, because it had a 36-bit physical address. However, benchmarks show that INTEL's EM64T runs slower in 64-bit mode than in 32-bit mode. Moreover, INTEL used some old AMD PDF files and did a bad job: some Microsoft and Linux code developed on AMD64 failed to run on INTEL's clone. As of today, INTEL's EM64T is still missing some crucial capabilities of AMD64.
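To put the address widths mentioned here in perspective, the following is a minimal sketch of how much memory each width can address. The function name and the choice of widths to print are illustrative assumptions, not something taken from AMD or INTEL documentation.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_bytes(address_bits: int) -> int:
    """Return how many distinct byte addresses fit in the given address width."""
    return 2 ** address_bits


# Widths chosen for illustration: plain 32-bit, the 36-bit physical
# address mentioned in the post, and a full 64-bit address.
for label, bits in [("32-bit", 32), ("36-bit physical", 36), ("64-bit", 64)]:
    print(f"{label}: {addressable_bytes(bits) // GIB} GiB")
```

A 36-bit physical address only widens how much RAM the chip can reach; it does not by itself give programs 64-bit registers or pointers.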

But running AMD64 instructions on the Pentium III lineage proves to be much harder: as of today, INTEL hasn't yet figured out how to do 64-bit on the Pentium M and Core Duo.

And AMD is not sitting idle: it's adding a new set of instructions to AMD64. INTEL engineers will have more sleepless nights digesting AMD PDFs.

True Multi-core

The AMD64 architecture was designed to be true multi-core from the ground up. A multi-core CPU is much like a multi-processor system: the cores must communicate with each other to maintain consistency. Inside the AMD64 CPU, there is a crossbar switch that connects the multiple cores together, so they communicate internally and at extremely high speed. We see from benchmarks that a dual core Opteron is almost twice as fast as a single core Opteron at the same clock speed.

In comparison, INTEL's dual core implementation is a kludge. In INTEL's design, the two cores share the same FSB. When they need to communicate, they first go out to the FSB and come back again, without knowing they are sitting next to each other. The result? Poor performance.

This AnandTech article provides a good explanation of the dual core designs.

The Embedded Memory Controller

Chip design gurus have long realized that a major bottleneck in system performance is memory latency. Just as memory is much faster than a hard disk, the CPU is much faster than memory. When a CPU needs to access memory for instructions or data, it has to wait for the memory content to be retrieved; the time spent waiting is the latency. During the waiting period, the CPU can't do anything.

In the old FSB based architecture (all of INTEL's), the memory controller is in an external chip called the north bridge. While the CPUs run at 2-3GHz, the conventional memory controller runs at about 200MHz. Furthermore, in the old FSB design, the data have to make two hops: from memory to the memory controller, then to the CPU. As we can see from this article, memory latency in a Pentium 4 design is between 300 and 400 clock cycles.

In the AMD64 design, the memory controller is embedded in the CPU and runs at CPU frequency, and the CPU connects directly to the memory without any intermediary. As we can see from this IBM test on single and dual core Opteron, memory latency on the Opteron is only about 50 nanoseconds for local memory access.
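The two latency figures above are quoted in different units (clock cycles for the Pentium 4, nanoseconds for the Opteron), so a quick conversion helps when comparing them. This is a rough sketch only: the 3.0GHz and 2.0GHz clock speeds below are assumptions picked for illustration, not values taken from the linked articles.

```python
def cycles_to_ns(cycles: float, clock_ghz: float) -> float:
    """Convert a latency in CPU clock cycles to nanoseconds."""
    # A clock of X GHz completes X cycles per nanosecond.
    return cycles / clock_ghz


def ns_to_cycles(ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to CPU clock cycles."""
    return ns * clock_ghz


P4_CLOCK_GHZ = 3.0       # assumed Pentium 4 clock, for illustration only
OPTERON_CLOCK_GHZ = 2.0  # assumed Opteron clock, for illustration only

for cycles in (300, 400):
    print(f"Pentium 4: {cycles} cycles = {cycles_to_ns(cycles, P4_CLOCK_GHZ):.0f} ns")

print(f"Opteron: 50 ns = {ns_to_cycles(50, OPTERON_CLOCK_GHZ):.0f} cycles")
```

Under these assumed clocks, 300 to 400 cycles works out to roughly 100 to 133 ns, against 50 ns (about 100 cycles) for the Opteron's local access.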

Like the Opteron, all modern CPUs, such as Alpha EV7, IBM Power5, SUN UltraSparc T1, AMD Geode LX, Athlon 64, Sempron 64, Turion 64, have embedded memory controller(s).

INTEL's roadmap, which extends as far as 2009, shows no embedded memory controller design.

Cache Coherent HyperTransport (ccHT)

In an N-processor AMD system, since each CPU has its own memory controller and associated banks of memory, there are N memory controllers, which provide N times the memory bandwidth. To have these N memory controllers act coherently, there are multiple ccHT links between AMD CPUs, which are used for fetching memory from another CPU. As we can see from the IBM document referenced above, in the case of remote memory access, the latency is also quite small.

INTEL is rumored to be working on something similar to ccHT called CSI. However, since the cancelation of the Whitefield project, CSI is missing from INTEL's foreseeable roadmap.
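The "N times the memory bandwidth" claim can be sketched with simple arithmetic. This is an idealized model, not a benchmark: the 6.4 GB/s figure is an assumed peak per controller (or per shared bus), and the function names are hypothetical.

```python
def numa_aggregate_bandwidth(n_cpus: int, per_controller_gbs: float) -> float:
    """Peak total bandwidth when every CPU brings its own memory controller."""
    return n_cpus * per_controller_gbs


def shared_bus_per_cpu_bandwidth(n_cpus: int, bus_gbs: float) -> float:
    """Peak share per CPU when all CPUs sit on one shared memory bus."""
    return bus_gbs / n_cpus


PEAK_GBS = 6.4  # assumed peak in GB/s, used for both models for illustration

for n in (1, 2, 4, 8):
    total = numa_aggregate_bandwidth(n, PEAK_GBS)
    share = shared_bus_per_cpu_bandwidth(n, PEAK_GBS)
    print(f"{n} CPUs: per-CPU controllers {total:.1f} GB/s total, "
          f"shared bus {share:.1f} GB/s per CPU")
```

The model ignores remote accesses over ccHT, which cost extra latency; it only shows why total bandwidth grows with socket count in one design and stays fixed in the other.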

Direct Connect Architecture

In an FSB based architecture such as INTEL's, the CPU, memory and I/O share the bandwidth of a single bus that carries traffic in only one direction at a time, just like many folks sharing one phone line in a conference call: only one guy can talk at any moment. In the AMD64 architecture (Opteron, Athlon 64, Turion 64, Sempron), there are separate dedicated connections between CPU and memory, between CPU and I/O, between CPU and CPU, and between CPU core and CPU core. In AMD64, there is no crosstalk, and everything is bi-directional: traffic goes both ways at the same time.

According to INTEL's roadmap, it's stuck with the FSB architecture until at least 2009.
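The conference-call analogy can be turned into a toy timing model. This is only a sketch under stated assumptions: the transfer sizes and the 4.0 GB/s link speed are made-up illustrative numbers, and real buses have arbitration and protocol overheads that are ignored here.

```python
def shared_bus_time(transfers_gb: dict, bus_gbs: float) -> float:
    """Seconds needed when every transfer takes turns on one shared bus."""
    return sum(transfers_gb.values()) / bus_gbs


def dedicated_links_time(transfers_gb: dict, link_gbs: float) -> float:
    """Seconds needed when each transfer has its own link and all run at once."""
    # The transfers overlap, so the slowest one determines the total time.
    return max(size / link_gbs for size in transfers_gb.values())


# Hypothetical workload in GB; the keys name the connections listed in the post.
transfers = {"cpu-memory": 4.0, "cpu-io": 1.0, "cpu-cpu": 2.0}
SPEED_GBS = 4.0  # assumed speed of the bus and of each dedicated link

print(f"shared bus:      {shared_bus_time(transfers, SPEED_GBS):.2f} s")
print(f"dedicated links: {dedicated_links_time(transfers, SPEED_GBS):.2f} s")
```

With these numbers the shared bus needs 1.75 s because the three transfers queue up, while dedicated links finish in 1.00 s, the time of the largest transfer alone.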

Conclusion

INTEL is 5 generations behind AMD, and there are other major areas in which INTEL is lacking, such as an IOMMU for fast DMA. To match AMD in 2-core performance, INTEL will have to use a very large cache, which will negate its shrink to 65nm. At the 4-core level and up, INTEL is simply hopeless.

14 Comments:

Anonymous said...

Excellent find, Sharikou! That old article certainly puts the lie to the claim that Intel had worked out a 64-bit extension to x86 a long time ago and just abandoned it in favour of Itanium, and then dusted it off again after AMD made it popular.

8:38 PM, February 02, 2006  
Anonymous said...

Intel had a built-in memory controller four years before AMD. Remember the Timna processor? It even had a built-in graphics engine. Intel is also the inventor of PCI-E, which is basically the same point-to-point connection. AMD only uses that in its CPUs and calls it HyperTransport. If these so-called advantages are really that nice, why are IBM, HP and SUN still using the "OLD" technology on their chips? The reason is easy: these technologies pushed CPU design to a dead end. Heard anything new from AMD in the last two years? No, because AMD will be the one to pursue the gigahertz game, since it cannot improve the architecture any further. The same top guys who made Alpha made AMD64, and it is foreseeable that the family will hit the same wall in the coming years.

As to the instruction set, it's just a lie. Intel can do a dynamic translation from Itanium to X86 at runtime, and you believe they cannot add 64-bit to X86? And how many client applications are 64-bit, and do you really think it's necessary?

Anyway, this is a nice page and that's it.

7:05 PM, February 10, 2006  
Anonymous said...

"And how many client applications are 64-bit, and do you really think it's necessary?"

Yep. Just ask your local MS SQL or Terminal Server admin.

8:15 PM, February 10, 2006  
Anonymous said...

Great read! This article provides good insight for the technology used, and proves what needs to be proved.

6:24 AM, February 11, 2006  
Anonymous said...

To the person two posts up:

If Intel can do it, why don't they? It's obvious that AMD can perform better and has proved it. Take an A64 3500+ against an Intel at 3.6GHz, both on 90nm technology: it's obvious who will win because of the architecture. Even though AMD chips have slower clock speeds, they outperform Intel any day. And no, I don't believe they will run into problems, because they came out with true multicore in one CPU, whereas Intel has to put two CPU dies into one chip casing, so they operate via the FSB, which is not multicore at all. AMD has made multicore a success and will continue to do so with its quad-core, and eventually octa-core, processors.

2:11 PM, February 12, 2006  
Anonymous said...

"And how many client applications are 64-bit, and do you really think it's necessary?"

Yep. Just ask your local MS SQL or Terminal Server admin.

Are those CLIENT applications? How many PC users run SQL Server at home or even at the office?

5:37 PM, February 17, 2006  
Anonymous said...

Are those CLIENT applications? How many PC users run SQL Server at home or even at the office?

Not many, but how many run Photoshop? Try this list. http://www.3dvelocity.com/articles/win64compatibility/win64nativesoftlist.htm

And there are more every week.

10:48 AM, February 28, 2006  
Anonymous said...

I agree with the article; Intel hasn't been able to touch AMD since the first Northwood cores. I believe tomshardware.com has a great article on Intel’s "dual core" architecture. I don't think Intel cares much as long as it ships processors. I mean, for the last 2 years we haven’t heard anything except for the baking of our motherboards under a 3.8 GHz P4, and now Intel is struggling to keep up using an architecture based on its laptop processors.

In Intel’s defense, their mobile architecture is much better than AMD's; the whole Centrino pack is stable and fast, far better than AMD's Turion in my opinion. And if everyone has watched the news, laptops are becoming a dominant market force, so Intel may not be missing out on much as far as the money goes...and that’s what it comes down to anyway. Although, specializing in one market segment could run Intel into the wall...as the American auto industry found out with SUVs.

After Intel’s first quarter results, I think it is clear that the processor wars will heat up again after a year of nothing (my FX55 is still top class) and we can all look forward to tons of cores…I can’t wait to get my first processor with as many cores as my new Gillette fusion, or better yet perhaps some future computer specs: 32GB RAM, 32 Cores, 32 TB…sounds like it will run battlefield or SQL server.

9:32 PM, March 07, 2006  
Anonymous said...

Wow, you must own AMD stock, eh?

8:58 PM, March 15, 2006  
Anonymous said...

Nice article. I definitely have to agree. I've always thought AMD had better performance, all the way back to the AMD 486 100MHz (which outperformed the Pentium 75MHz with a 486 core).

I see lots of comments here about how we haven't heard much new from AMD. Well, we haven't heard much groundbreaking news from Intel either. Intel is now the one playing catch-up, kind of like Microsoft. They let their technology stagnate while competitors have innovated. OEM lock-in was their friend, but the OEMs are getting smarter.

Perhaps the notebook is coming into a mature market, but AMD brought it a good portion of the way. They drove prices down with cheaper technology, forcing Intel to compete on price and features. Heh, my AMD Sempron notebook from HP runs WoW fairly well. Almost as well as my main system, but that's a memory issue, not CPU.

In the long run, the better technology will eventually win the market, that's a given. I'm just enjoying the $750 laptop, down from a few grand just a few years ago.

4:22 AM, March 16, 2006  
Anonymous said...

Kinda late to respond to this, but I have to.

Intel is also the inventor of PCI-E, which is basically the same point-to-point connection. AMD only uses that in its CPUs and calls it HyperTransport.

No. And absolutely no. PCI-Express and Hypertransport are very different.

PCI-Express was developed by PCI-SIG, and their notations page is located here: http://www.pcisig.com/specifications/pciexpress/


HyperTransport, while created and headed by AMD, is looked after through the HyperTransport Consortium: http://www.hypertransport.org/

The following quote, granted, is taken from the HyperTransport.org pages. This is because PCI-SIG does not bother to compare or contrast their technology with HyperTransport.

http://www.hypertransport.org/consortium/cons_faqs.cfm#b

As compared to newer serial I/O technologies such as RapidIO and PCI Express, HyperTransport shares some raw bandwidth characteristics, but is significantly different in some key characteristics. HyperTransport was designed to support both CPU-to-CPU communications as well as CPU-to-I/O transfers, thus, it features very low latency. Consequently, it has been incorporated into multiple x86 and MIPS architecture processors as an integrated front-side bus. Serial technologies such as PCI Express and RapidIO require serial-deserializer interfaces and have the burden of extensive overhead in encoding parallel data into serial data, embedding clock information, re-acquiring and decoding the data stream. The parallel technology of HyperTransport needs no serdes and clock encoding overhead making it far more efficient in data transfers.

Essentially, the overall point is that HyperTransport is a true peer-to-peer topology connection, while serialized buses such as PCI-Express are exactly that, serialized.


As to the instruction set, it's just a lie. Intel can do a dynamic translation from Itanium to X86 at runtime, and you believe they cannot add 64-bit to X86?

Okay, question for you. Exactly HOW slow is X86 code on Itanium in emulation mode? 50%? 40%? 30%?

According to Wikipedia's Itanium page, initial performance was about 1/8th the speed clock for clock. Granted, Wikipedia's accuracy is generally somewhere around the same expected from a liberal democrat during voting season. That is very little accuracy for those in MIT.

So the answer is yes, Intel could use the emulation. But we already know that this is slow to begin with, just using Northwood and Prescott processors as an example.

Now that Intel's roadmap is even clearer at this writing, returning to the true 32-bit design of the Pentium Pro, it is going to be even more difficult to try to convert the architecture to 64-bit. So at this point, I think it's more than a belief; it's established fact.

8:43 AM, April 13, 2006  
Anonymous said...

As to the instruction set, it's just a lie. Intel can do a dynamic translation from Itanium to X86 at runtime, and you believe they cannot add 64-bit to X86?

I heard that in a later version of Itanium Intel physically put an x86 core inside the chip, and took out about 1/3 of the total chip area...

2:24 PM, May 03, 2006  
Anonymous said...

Well, since this is outdated, I might as well let you know that AMD is still in the same place, doing nothing to further its processors, while Intel just released the Core 2 Duo Extreme, which outperforms the best AMD by lightspeeds. Intel also just showcased its quad-core processors. And where's AMD?

3:36 PM, October 10, 2006  
Anonymous said...

Hi Sharikou,

You have a very cool blog here…loved the content.
U know there is an awesome opportunity for people like you who have ur own blogs n sites…I came across this site called Myndnet.com…it’s a platform for people to buy and sell IT related information. and everytime you sell some information you get paid for it…Good money for people like us in the IT domain. Here the link http://www.myndnet.com/login.jsp?referral=alpa83&channel=OM
Sign up is free…check it out…
You can contact me at my id here for more questions : barot.alpa@gmail.com

Cheers
Alpa

11:20 PM, March 11, 2007  
