Friday, May 12, 2006

AMD's next-generation architecture will be super powerful

Nebojsa Novakovic is complaining that AMD's next-generation Deerhound quad-core Opteron will have only 2MB of shared L3 cache.

Think of it another way: AMD won't need huge caches (like Intel's 16MB) because its architecture will be so advanced, and its memory latency so low, that it will have no need to waste die area on caches!
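
To make the cache-versus-latency trade-off concrete, here is a minimal sketch using the textbook average memory access time formula (hit time plus miss rate times miss penalty). All the numbers are made-up illustrations, not measurements of any AMD or Intel part.

```python
def amat_ns(hit_ns, miss_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_ns + miss_rate * miss_penalty_ns


# Hypothetical design A: big cache (few misses) in front of slow memory.
big_cache_slow_memory = amat_ns(hit_ns=5.0, miss_rate=0.02, miss_penalty_ns=100.0)

# Hypothetical design B: small cache (twice the misses) in front of memory
# that is twice as fast to reach.
small_cache_fast_memory = amat_ns(hit_ns=5.0, miss_rate=0.04, miss_penalty_ns=50.0)

print(f"big cache, slow memory:   {big_cache_slow_memory:.1f} ns")
print(f"small cache, fast memory: {small_cache_fast_memory:.1f} ns")
```

Under these assumed numbers the two designs come out even: halve the miss penalty and you can tolerate twice the miss rate. Whether real parts land anywhere near these figures is a separate question.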

Look at another optimized multi-core architecture, the 8-core UltraSPARC T1: it has only 3MB of shared L2, even though its instruction set is RISC (larger code).

9 Comments:

Anonymous Anonymous said...

I think the same way, but there are two signals pointing the opposite way. Z-RAM is the first: AMD took a license for this special type of memory to keep the cache physically small. If they are planning to have only 3MB of L3 in the quad-core, why the hell do they need Z-RAM?
The second is the memory outlook for 2007/08. DDR2 and DDR3 have high latencies, and FB-DIMM is not very strong in this area either. Integrated memory controller vs. large caches means Intel wins in most cases.

So AMD is going to reveal something to change that. It could be a 4-channel, 256-bit memory controller to increase parallel transfers to RAM. A shared L2 cache could be helpful (certainly for benchmarks). Or something else: maybe this reverse Hyperthreading?

9:54 PM, May 12, 2006  
Anonymous Anonymous said...

Z-RAM can be used to decrease the die size and free up that space for something else.

10:04 PM, May 12, 2006  
Anonymous Anonymous said...

There is one thing that AMD can do to be "innovative" against Intel.
It's very simple to do: keep Sempron and Athlon as single-core parts.
This is a capacity-, price- and profit-effective strategy. If a single core costs 75% of a dual, I will choose the dual for my work, but for a secretary a standard Athlon 3000+ is already above requirements. For sure.

12:31 AM, May 13, 2006  
Blogger DBA said...

If high latency is the problem, I do not understand how a 4-channel 256-bit memory controller can help (it only increases the throughput).
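
A rough way to see the point: the time to fetch data is roughly a fixed latency plus the transfer time (size divided by bandwidth). The sketch below uses hypothetical numbers (60 ns latency, 10 GB/s versus 20 GB/s); none of them come from a real memory controller.

```python
def fetch_time_ns(latency_ns, size_bytes, bandwidth_gb_per_s):
    """Simple model: fixed latency plus transfer time.

    1 GB/s is 1 byte per nanosecond, so size / bandwidth is already in ns.
    """
    return latency_ns + size_bytes / bandwidth_gb_per_s


LATENCY_NS = 60.0  # assumed, for illustration only

for label, size in [("64-byte cache line", 64), ("1 MB block", 1_000_000)]:
    narrow = fetch_time_ns(LATENCY_NS, size, 10.0)  # assumed 10 GB/s
    wide = fetch_time_ns(LATENCY_NS, size, 20.0)    # assumed 20 GB/s
    print(f"{label}: {narrow:.1f} ns -> {wide:.1f} ns "
          f"({100 * (1 - wide / narrow):.1f}% faster)")
```

In this model, doubling the bandwidth barely moves a single cache-line fetch, because the fixed latency dominates, while a large streaming transfer gets nearly twice as fast.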

2:57 AM, May 13, 2006  
Anonymous Anonymous said...

AMD do their homework, running supercomputer simulations of their architecture and building CPUs for optimal performance in server/multithreaded applications. They size caches so that the cache thrash which happens when prediction logic gets things wrong doesn't choke overall performance, while also keeping heat down.
Look for low latency DDR2 on 23rd...
Z-RAM is good for size and low thermals. It's all about winning real-world performance, not just benchmarks!

4:43 AM, May 13, 2006  
Anonymous Anonymous said...

I wonder why it has to be so powerful. Woodcrest will be a total flop, as you've said earlier...

5:10 AM, May 13, 2006  
Anonymous Jeach! said...

The K8L is supposed to arrive in 2H 2007 and 'only' be on par with Intel's offerings (in desktop and mobile).

The K10 is due in early 2009 and we know nothing about it except that it will have HT3 and is rumored to be like IBM's Cell processor (a central Opteron surrounded by many RISC-based processors).

AMD will still be in 65nm production in late 2007, while Intel will be at 45nm.

Not that I want to doubt AMD, but their silence is just killing me!

Is Intel catching up (slowly)?

7:07 AM, May 13, 2006  
Anonymous Jeach! said...

"Integrated memory controller vs. large caches means Intel wins in most cases."

I think that's where AMD is heading!

Just on a cost basis...

How much will it cost Intel to produce a 4-core processor with 16MB of level 3 cache? Their yields will be just awful even at 45nm.

If AMD's 4-core solution is very small then it will have amazing yields on 45nm.
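
For a rough feel of why die size matters so much, here is a sketch using the simple Poisson yield model, where yield = exp(-die area x defect density). The die areas and the defect density are invented for illustration; nobody outside the fabs knows the real ones.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of good dies under the simple Poisson defect model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


DEFECTS_PER_CM2 = 0.5  # assumed defect density, illustration only

small_die = poisson_yield(1.5, DEFECTS_PER_CM2)  # hypothetical small quad-core
large_die = poisson_yield(3.0, DEFECTS_PER_CM2)  # hypothetical cache-heavy die

print(f"small die yield: {small_die:.0%}")
print(f"large die yield: {large_die:.0%}")
```

In this model, doubling the die area squares the yield fraction, and the larger die also gives fewer candidates per wafer, so the cost per good chip rises faster than the area.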

Also, Sharikou, the more I read about the K10 and its IBM 'Cell-like' design, the more it makes sense that AMD has all that capacity (with Chartered).

If AMD will be producing 4, 6 or 8 tiny 'RISC-based' cores attached to a single Opteron core in a single package, they will need huge production capacity.

That also supports the rumors about going 'reverse Hyperthreading', because you can dispatch to each sub-processor and achieve parallel processing.

Thus the need for huge HT bandwidth instead of a 'shared-cache' hack.

Now I really like the sound of that design!

7:18 AM, May 13, 2006  
Anonymous Anonymous said...

To the first poster:

Intel puts extremely large caches on its chips to compensate for the bottlenecked Front Side Bus, whereas AMD is using the L3 cache not just for DDR2 latencies but also for ccHTT latencies.

There is over 200ns of delay in a 4P (or larger) system if CPU0 wants RAM from CPU3's bank. With faster ccHTT, they could possibly reduce that latency to 2/3 or 1/2 of that, which is where this is really going to shine.

As for quad-channel RAM, FB-DIMMs support at most 6 channels, and since there is an ODMC (on-die memory controller), so that each CPU has its own memory, this poses a problem: real estate.

If you have a board with 4 channels of RAM per CPU, you're looking at 32 DIMMs on a 4-way Opteron 64 board, and 64 on an 8-way board, which brings in price and other concerns, though the performance will be outstanding.
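
The arithmetic behind those counts, as a quick sketch. The 32 and 64 figures work out if you assume 2 DIMM slots per channel, which is an inference from the numbers rather than something stated.

```python
def dimm_slots(sockets, channels_per_socket, dimms_per_channel):
    """Total DIMM slots when every CPU socket has its own memory channels."""
    return sockets * channels_per_socket * dimms_per_channel


CHANNELS_PER_SOCKET = 4  # the quad-channel case discussed above
DIMMS_PER_CHANNEL = 2    # assumption inferred from the 32/64 figures

for sockets in (4, 8):
    total = dimm_slots(sockets, CHANNELS_PER_SOCKET, DIMMS_PER_CHANNEL)
    print(f"{sockets}-way board: {total} DIMM slots")
```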

The big advantage will be putting quad-channel memory on Socket AM2 systems, which is very feasible and will leave Conroe in the dust for memory bandwidth.

7:39 AM, May 13, 2006  
