Wednesday, August 23, 2006

K8L does asymmetric operation

We all know that the 4 cores of the K8L can operate at different clock speeds. This makes asymmetric operation possible. For example, if the load is currently single threaded, K8L can shut down 3 cores and overclock the remaining core by 50% to 4.5GHz, all done within the same power envelope. There are some thermal issues to solve, but those should be trivial compared to designing a CPU.
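As a rough sanity check on the power-envelope claim, here is a minimal sketch using the textbook CMOS dynamic power relation (power scales with C * V^2 * f). Everything in it is an assumption for illustration: the 3.0GHz stock clock (implied by "50% to 4.5GHz"), four cores sharing the budget equally, the two voltage scenarios, and ignoring leakage. None of it is AMD data.

```python
def core_power(freq_ratio, voltage_ratio=1.0):
    """Dynamic power of one core relative to its stock power (P ~ C * V^2 * f)."""
    return voltage_ratio ** 2 * freq_ratio

N_CORES = 4
STOCK_GHZ = 3.0   # assumed stock clock, implied by "50% to 4.5GHz"
BOOST_GHZ = 4.5

# Budget: all four cores running at stock, one unit of power each.
budget = N_CORES * core_power(1.0)

boost = BOOST_GHZ / STOCK_GHZ  # 1.5x clock on the one remaining core

# Optimistic case: the higher clock needs no extra voltage.
same_voltage = core_power(boost)

# Pessimistic case: voltage has to rise in step with frequency.
scaled_voltage = core_power(boost, voltage_ratio=boost)

for label, power in [("same voltage", same_voltage),
                     ("voltage scaled with clock", scaled_voltage)]:
    print(f"{label}: {power:.3f} units, {power / budget:.0%} of the 4-core budget")
```

Under these assumptions the single boosted core draws between 1.5 and about 3.4 units out of a 4-unit budget, so the total fits. All of that power now sits in one core's area, though, which is where the thermal issues come from.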

Pentium D can independently clock the two cores, because it's a double die super glued solution. I am not sure if Conroe can do it, considering that L2 is shared.

INQ reported that SuperMicro finally found the courage and took the liberty to put its AMD solutions on its home page. Before, you could only find the link from amd.com. The story was that SuperMicro had to design AMD boards in a basement, due to fear of agents.

55 Comments:

Anonymous Anonymous said...

Your own link shows that this is just a slightly more advanced version of Cool and Quiet! There's absolutely nothing there which suggests an auto-overclocking mechanism.

10:27 AM, August 23, 2006  
Anonymous Anonymous said...

Umm, I have a 2.4GHz AMD X2 system at work and an E6400 overclocked to 3.4GHz at home. Compiling the same project with VS 2005, the Intel machine is TWO times faster. Call me an Intel fanboy (given that 90% of machines at my work are AMD), but K8L has a huge performance gap to cover.

10:46 AM, August 23, 2006  
Anonymous enumae said...

Maybe it's me, but the article from The Inq. states clock speeds, as taken from HARD... Well, the article from HARD does not show any clock speeds...

Again if I missed it let me know, but 4.5GHz doctor...

Let's be realistic: you stated a few days ago that going to 65nm would give a 25% clock increase, and now they suddenly get a 60% clock increase?

Where did they get an extra 35%?

I understand your numbers are pure speculation, but... WTF?

10:47 AM, August 23, 2006  
Blogger "Mad Mod" Mike said...

Though I don't know if 4.5GHz is truly attainable, having a 3.6GHz core and the rest @ 1GHz would be enough to give AMD the performance crown back if it were K8, but since it's K8L, 2.6GHz should be more than enough to beat anything Kentsfield can reach.

11:08 AM, August 23, 2006  
Anonymous Anonymous said...

For example, if the load is currently single threaded, K8L can shut down 3 cores and overclock the remaining core by 50% to 4.5GHz, all done within the same power envelope.

Is that actually a confirmed feature of K8L or just speculation on your part? It's interesting, but I haven't heard anything to that effect from AMD. They've just said the cores can downclock independently to save power. I don't think they would encourage overclocking, especially since different motherboards may not be able to sustain it.

11:40 AM, August 23, 2006  
Anonymous Edward said...

What's interesting to me is that the 4 cores in K8L share a large L3 cache. So the PLL must be closer to the core than the L3 cache. And if AMD can do this, Intel should also be able to do the same with Core 2.

11:54 AM, August 23, 2006  
Blogger "Mad Mod" Mike said...

"Is that actually a confirmed feature of K8L or just speculation on your part? It's interesting, but I haven't heard anything to that effect from AMD. They've just said the cores can downclock independantly to save power. I don't think they would encourage overclocking especially since different motherboards may not be able to sustain it."

I would imagine it has the ability to overclock a single core if the rest are underclocked, and if they do it right, the BIOS should be able to lock the other 3 cores at 1GHz and set overclocking just for a single core. If that is the case, a single core should easily reach 4.0GHz.

12:07 PM, August 23, 2006  
Blogger Sharikou, Ph. D said...

K8L has huge performance gap to cover.

K8L will be 40% faster than Conroe on integer performance/core. On floating point, K8L will be 3x of conroe.

12:24 PM, August 23, 2006  
Anonymous Anonymous said...

"Umm, i have 2.4Ghz AMD X2 system at work and E6400 overclocked to 3.4Ghz at home. Compiling same project with VS 2005, Intel machine is faster TWO times."

Yea, comparing stock to overclocked with such a massive gap in frequency is a REAL fair comparison. I bet you think Anandtech's benchs with overclocked Cornrows in the mix were perfectly fair too eh? Pathetic.

"Call me an Intel fanboy (given that 90% of machines at my work are AMD), but K8L has huge performance gap to cover."

Haha, obviously you haven't been paying any attention to the Developer Conference AMD had regarding the K8L.

http://en.wikipedia.org/wiki/AMD_K8L

Take the Conroe and double everything it has on the core regarding SIMD units and FP units, add L3 shared cache with AMD's bus technology (at HT 3.0 by K8L's release date) and various other improvements, and you will have the rough K8L performance envelope. AMD going with a true 4 core design vs. Intel's dualcore processor sandwich is only going to make things uglier for Intel, as their archaic, weak GTL bus cannot handle the traffic for Woodcrest to scale properly (they better start praying for CSI to bring a miracle, and soon). In other words, I expect K8L to spank Conroe/Woodcrest/Kentsfield/Merom much, much harder than Conroe spanked the K8 family, judging by the early specs provided, which isn't saying a lot for Intel's brand new processors vs. the 3-year old K8. It will be P4 vs. K8 all over again IMHO.

12:40 PM, August 23, 2006  
Anonymous Anonymous said...

Seems others know what the K8L is bringing to the table, know what is really up with Core 2 and how they will compare.

"http://www.infoworld.com/article/06/08/23/35OPcurve_1.html"

12:51 PM, August 23, 2006  
Anonymous Anonymous said...

Shut down three cores??? What happens to the three memory banks connected to them in the wonderful NUMA architecture?

1:36 PM, August 23, 2006  
Anonymous Anonymous said...


And if AMD can do this, Intel should also be able to do the same with Core 2.

There is a world of difference between "can do" and "does". Intel can do IMC, but it doesn't. Intel can do NUMA, but it doesn't. AMD can do a good core, but it doesn't!

The bottom line is: Core 2 kicks @$$ as it is, and K8L at this point is vaporware!!

1:40 PM, August 23, 2006  
Anonymous enumae said...

Anonymous said...

"Seems others know what the K8L is bringing to the table, know what is really up with Core 2 and how they will compare."

This guy is so biased it's not even funny. I have read 3 or 4 of his articles and they all sway heavily in AMD's favor.

Doing a little searching, it would seem that K8L is going to go up against Penryn, and not Conroe.

Here is a link from the Inq, stating that at 45nm the leakage problems have been solved...[LINK]

2:01 PM, August 23, 2006  
Anonymous Anonymous said...

You guys are all idiots speculating about something with so much confidence when absolutely NOTHING is for sure about K8L. When Conroe was benchmarked everyone was crying about how a 6600 cannot beat an FX62. Now since it's AMD, everyone seems to think the K8L will be twice the speed of Conroe based on some peer review wikipedia site...... you guys are pathetic.

2:09 PM, August 23, 2006  
Anonymous Anonymous said...

Sharikou, Ph. D said...
K8L has huge performance gap to cover.

K8L will be 40% faster than Conroe on integer performance/core. On floating point, K8L will be 3x of conroe.


The doctor has been taking waaayyy too much of his own meds. Where do you get this information from, pray tell? And "out of your ass" is not a valid reply.

I am guessing it's the same source that told you the exploding Dell laptops were Core Duos when the whole world was saying it was the battery. You still have to post a retraction about that and admit your stupidity, or will you continue with your ignorance?

I dare you to post and reply to this with some reliable info other than links from the Inquirer, which has started mocking you as well....

2:13 PM, August 23, 2006  
Anonymous Edward said...

"Shut down three cores??? What happens to the three memory banks connected to them in the wonderful NUMA architecture?"

You're confused. K8L could enable NUMA across sockets, but internally its memory access is uniform among all 4 cores. Look at its shared L3 cache and you'll see that.

2:19 PM, August 23, 2006  
Anonymous Edward said...

"Intel can do IMC, but it doesn't. Intel can do NUMA, but it doesn't. AMD can do a good core, but it doesn't!"

First, AMD doesn't DO NUMA per se, it just enables it in its memory controller. There've been 3rd party solutions to enable NUMA for Intel processors (mostly as add-on cards). But as the speed gap between CPU and IO/memory widens, those add-on solutions have less advantage.

Second, the more revealing questions are, why? Why doesn't Intel do IMC, and why doesn't AMD do a good core? Ultimately you'll find out that they don't because, well, given their respective stances and points of view, they can't.

2:25 PM, August 23, 2006  
Anonymous Anonymous said...

enumae said...

"This guy is so biased its not even funny, I have read 3 or 4 of his articles and they all sway heavily in AMD's favor."

Wow. And everything that anyone else reports that swings in Intel's favor is not biased? DUDE BUY A CLUE PLEASE, not everyone who likes AMD is like Sharikou or someone verging on fanatical about AMD. Even I admit Core 2/newest Xeon is a great chip, but it's not the second coming of Christ, man.

The man makes some convincing and logical conclusions with respect to Intel's and AMD's architectures IMHO. Face it man, the facts are Intel needs to shape up or ship out their bus technology, period, or they will never have a hope of catching up to AMD when K8L arrives on the server front! (HT 3.0's specs are already making CSI's planned specs obsolete.) It's finally taking a guy like him to have the balls to say it to the rest of the industry that Intel's new server processors ain't all that and a bag of chips, especially in light of its competition's dated hardware (socket 940/AM2 Opterons).

"Doing a little searching, it would seem that K8L is going to go up against Penryn, and not Conroe."

With Penryn slapping on yet more X amount of cache to fix and band-aid its bus problem (there is a word for this and it's called a "KLUDGE") and getting a process shrink that may or may not increase its performance? (See 90nm Prescott.) Sorry, but I don't see anything Conroe or even Penryn at this point is going to have against K8L with its estimated 40% performance increase, as K8L will double just about every SIMD unit, FP unit and the amount of total cache it has, as well as bring many other architectural and bus improvements vs. its predecessor, the K8. Its performance jump with all its improvements will be far bigger than Conroe's improvements and performance vs. the K8.

"Here is a link from the Inq, stating that at 45nm the leakage problems have been solved...[LINK]"

Yep, Intel will need 45nm to stay competitive with AMD's K8L at 65nm, no surprise here, as past history has shown (Intel 90nm vs. AMD 130nm). It's the only hope Intel has IMHO to stay competitive and why they are rushing their 45nm process to try to stay ahead of AMD.

2:49 PM, August 23, 2006  
Anonymous Anonymous said...

Some delusional fool wrote...

"You guys are all idiots speculating about something with soo much confidence when absolutely NOTHING is for sure about K8L."

Uhhhh, HELLO DUMBASS! K8L is TAPED OUT, that means it's READY FOR PRODUCTION. Go google it yourself, it's mentioned on this blog already for Pete's sake!

"When conroe was benchmarked everyone was crying about how a 6600 can not beat an FX62. Now since its AMD everyone seems to think the k8l will be twice the speed of conroe based on some peer review wikipedia site......"

OH LOOK! A COW! *rolleyes*

Yea, it's some dumb peer review wikipedia site with data about the K8L that IS BACKED UP BY THE HARD DATA FROM THE RECENT AMD DEVELOPER CONFERENCE ABOUT THE K8L. Look at the picture about the K8L die shot Capt. Oblivious, it's straight from AMD's slides and presentation at that conference. The numbers and data are all there on various sites across the web.

Yet another reference that is linked on this blog some posts down on Sharikou's main blog page.

"you guys are pathetic."

No, the pathetic individual is you, try researching a little harder about something you're clueless about before mommy catches you on the computer after your bedtime.

3:00 PM, August 23, 2006  
Anonymous Anonymous said...

For example, if the load is currently single threaded, K8L can shut down 3 cores and overclock the remaining core by 50% to 4.5GHz, all done within the same power envelope. There are some thermal issues to solve, but those should be trivial compared to designing a CPU.

Does the term "hot spot" mean something to you? I predict lots of AMD CPU explosions!

3:52 PM, August 23, 2006  
Blogger Sharikou, Ph. D said...

Does the term "hot spot" mean something to you?

This is already solved.

4:01 PM, August 23, 2006  
Anonymous Anonymous said...

--Taped out does NOT mean "ready for production", moron. In fact, it isn't unheard of for a team to tape out A0, and then have it come back DOA. There is a lot of validation and debug that needs to go on before the thing will be anywhere near ready for production.

4:27 PM, August 23, 2006  
Anonymous enumae said...

Anonymous said...

"It's the only hope Intel has IMHO to stay competitive and why they are rushing thier 45nm process to try to stay ahead of AMD."

How are they rushing? They are still on their 2-year cycle.

It's not like they are the same as AMD: they are using new fabs, they are not swapping out equipment like AMD, and they are giving their chipsets and other technologies the ability to be produced at 65nm.

Who cares how they maintain a performance advantage?

As a consumer, I couldn't care less. If it's better, it's better; same goes for K8L.

How much money will they save in production cost using 45nm vs 65nm?

What kind of power savings will there be at 45nm?



Sharikou, Ph. D said...

"This is already solved."

Please elaborate.

5:38 PM, August 23, 2006  
Anonymous Anonymous said...

"Uhhhh, HELLO DUMBASS! K8L is TAPED OUT, that means it's READY FOR PRODUCTION. Go google it yourself, it's mentioned on this blog already for Pete's sake!"

I would reply to that, but someone already owned you. Must feel nice to be the dumbass!!

OH LOOK! A COW! *rolleyes*

If it was that obvious, then why was everyone going nuts about it, including our beloved doctor? Wasn't he planning on making his own Woodcrest and running benchmarks? I wonder whatever happened to that plan.......

"Yea, it's some dumb peer review wikipedia site with data about the K8L that IS BACKED UP BY THE HARD DATA FROM THE RECENT AMD DEVELOPER CONFERENCE ABOUT THE K8L. Look at the picture about the K8L die shot Capt. Oblivious, it's straight from AMD's slides and presentation at that conference. The numbers and data are all there on various sites across the web."

Oh yes, please show me where it says that the K8L will be beating Conroe by 40% and 3x on floating point as the doctor suggests. I am guessing those slideshows and pictures look plenty fast for you AMD idiots. Look, there's a pretty slide AND IT'S GOT PICTURES, MUST BE 10 times faster than Conroe. I am guessing you still get amused with picture books.

Please refer me to the site which benchmarked the K8L and says it does DOUBLE what Conroe will do. You are an idiot for telling me to look at slides and pictures when the doctor is pulling numbers out of his ass, which I challenge.

"No, the pathetic individual is you, try researching a little harder about something you're clueless about before mommy catches you on the computer after your bedtime."

Oh yes, let me research more about the TAPED OUT K8L. What's next, are you going to tell me you have one now and are benchmarking it since it's taped out, you imbecile? THE FACT IS THERE ARE NO NUMBERS, OFFICIAL OR OTHERWISE, THAT THE K8L WILL OUTPERFORM CONROE. In fact, when is K8L coming for the desktop consumer anyway..... NOT ANYTIME SOON.

Get comfy with daddy Intel, AMD fanboy, it's going to be a long year.

6:05 PM, August 23, 2006  
Anonymous Anonymous said...

WOW, this technology is freaking awesome, kinda like the reverse hyperthreading I have on my new AM2. THANK YOU AMD for the BIOS upgrade, and thank you MadModMike and Sharikou for telling me it's coming, otherwise I would have bought a Conroe.

I love coming to this blog, it's like daily comedy hour with people believing everything that is said. Inquirer reports that if you fart at your AMD AM2 you will get a 60% speed boost, take that Conroe!!!

6:09 PM, August 23, 2006  
Anonymous Anonymous said...

"Stupid ass AMD FANBOI says:
Yea, it's some dumb peer review wikipedia site with data about the K8L that IS BACKED UP BY THE HARD DATA FROM THE RECENT AMD DEVELOPER CONFERENCE ABOUT THE K8L. Look at the picture about the K8L die shot Capt. Oblivious, it's straight from AMD's slides and presentation at that conference. The numbers and data are all there on various sites across the web."

The same fanboy who wouldn't believe the Conroe benchmarks when they were being performed in front of a live crowd with an actual Conroe chip at IDF is willing to believe slides and pictures from AMD....BIAS perhaps...stupidity definitely.....Pathetic FOR SURE!!!!

6:15 PM, August 23, 2006  
Anonymous Anonymous said...

The PhD Pretender tries again but he has no clock ticking in his head.

Let's see, AMD's secret to success is a 4x4 core with shared L3 on 65nm.
It has to minimally make up a > 40% performance gap, as it is competing against a 4x4 Conroe/Woodcrest. Even handicapping Woody's lame FSB compared to hyper, figure the 4x4 K8L had better offer better than a 50% performance improvement. Also remember, flies and shitter, that the K8L really needs to hold the line in 2nd half 2007 and 2008 against INTEL's 45nm tock. Expect that to get 30-40%. Figure 10-15% minimum for the 45nm transition. I expect big things from INTEL at this process node. They have been very quiet. Add to that design enhancements and you are looking at 30%, maybe even more!

Notice there are no performance numbers for the K8L. Why? Because the damn thing just taped out. Figure 6 weeks before silicon, another 2-3 weeks to get testing and some benchmarks. Then figure another design/fab turn, 8 weeks, before you have any semblance of performance. It'll be early 2007 before AMD has any data to show a benchmark, what a joke....

4x4, what a crock, what a broken strategy: have twice the silicon area of a dual-core but run most of it at slow clock speed. Tell me how this is faster vs running two cores at higher utilization and clock speed. What a huge waste of silicon. In the end I am skeptical that you really save power, as you have the other cores' standby leakage and get low clock speed output for all the leakage. While if you run 2 cores at higher load you don't pay any leakage penalty.

By the way, flies.. running one or two cores hot, your power density and your TJ will restrict your average headroom. Sure, average power looks good, but high power density on the fast core will limit you.....


Then later pretender says hotspot already solved... He knows nothing... How was it solved, pretender? Does the PhD pretender even know what a hot spot is? The only hot spot he knows is the burn in his ass as Hector does him.

6:19 PM, August 23, 2006  
Anonymous Anonymous said...


Second, the more revealing questions are, why? Why doesn't Intel do IMC, and why doesn't AMD do a good core? Ultimately you'll find out that they don't because, well, given their respective stances and points of view, they can't.

Not entirely true. Intel will have IMC and NUMA (external, not internal) in Nehalem (well, at least on MP servers). So it's not a problem of belief, it's a problem of decision-making. A few years ago, when Conroe was being laid out (that would be approx 2002), Intel probably thought that they did not need IMC and NUMA, since there were no competitors in the x86 server space. Guess what, AMD came into that space, somewhat unexpectedly, and caught Intel on the wrong foot. BTW, by the time Intel realized its blunder, it was 2003-2004, and the only processor they could intercept with these features was Nehalem. In this space timing is everything.

The reason K8L does not have 4-wide issue is because AMD never expected Intel to ditch Netburst and take IPC leadership. I know a bunch of folks here love K8L, but in my opinion, except for some memory-hungry benchmarks, it is again going to be inferior to Core 2, clock-for-clock, and overall. We can wait and see how this plays out. But Core 2 is a superb micro-architecture (AMDers like to call it P3, but heck, if P3 kicks K8's @$$, that tells something about K8--and BTW, it is not P3).

You can bet your @$$ that AMD's next uArch (the one after K8L) will be better than Core 2. But by then, Intel will have yet another rev of their uArch, and no one knows how those two will compare. The design turn-around times are so long, it is very easy to lose leadership for a couple of years if you guess your opponent's move incorrectly.

That being said, I think Intel overall has the advantage because of their process leadership. Missed the boat on IMC? Doesn't matter, add cache and get data closer. Also, the large cache allows them to be aggressive with prefetching without worrying about cache thrashing. And the strategy clearly wins considering UMA Tulsa is kicking NUMA Opty's butt! Leadership in process technology enables Intel to keep multiple weapons in its arsenal.

7:22 PM, August 23, 2006  
Anonymous Anonymous said...


Face it man, the facts are Intel needs to shape up or ship out their bus technology, period, or they will never have a hope of catching up to AMD when K8L arrives on the server front! (HT 3.0's specs are already making CSI's planned specs obsolete.) It's finally taking a guy like him to have the balls to say it to the rest of the industry that Intel's new server processors ain't all that and a bag of chips, especially in light of its competition's dated hardware (socket 940/AM2 Opterons).


OK, tell me this: how can you explain Tulsa's leadership over Opteron? CLUE: Larger cache. Most of the AMDers here like to dismiss the advantage of a large cache saying if the work load doesn't "fit" in cache, the cache is useless. Get a life! No workload today fits in any cache, heck it doesn't even fit in memory. The bottom line is if the large cache increases your hit rate from 90% to 97%, it reduces your bandwidth requirements by a third. Add to that reduction in snoop traffic because of cache sharing, and you get the picture. Even a measly 1066 shared FSB can kick Opty's butt with 16 MB of L3 cache.

Intel's leadership in process technology allows them to come up with such solutions. They can afford to put more transistors in cache to reduce the bus traffic (and anyone who says large cache is a kludge doesn't know the basics of computer architecture, so please, don't even give me that BS. Large cache helps greatly on any benchmark--period). Now that is not the most economical way of using the transistor budget: if Intel had CSI today, they could reduce the cache, reducing the die size, increasing margins. But the bottom line is, they can build these large caches, and still turn a profit. AMD would go bankrupt if they tried building 16 MB or 24 MB on-die cache. Thus, if CSI is even half as good as HT 3.0, and if Intel just keeps the cache size the same, the resulting processor will kick K8L's butt all the way to the moon.

If anyone does not understand why large caches help performance, I would recommend reading a sophomore-year book on Computer Architecture and solving some exercise problems at the end of the chapter called "Memory Hierarchy". Save your bad breath. It's not worth it!!

7:36 PM, August 23, 2006  
Anonymous Anonymous said...

You're confused. K8L could enable NUMA across sockets, but internally its memory access is uniform among all 4 cores. Look at its shared L3 cache and you'll see that.

Good point! (At least I acknowledge when I am wrong)

7:38 PM, August 23, 2006  
Anonymous Anonymous said...

I admit, I come here to read what foolish notion you'll put forward next. Half the time you make up numbers in order to sound authoritative, and the rest of the time you posit wildly invalid arguments and make them out as fact. Check out the Xbitlabs analysis for an interesting read on the anticipated K8L properties. You're so out of your skull pointing fingers at people like Anand (who does seem to have an Intel bias, but that's something else), and yet you don't expect us to wonder why you're so dead set on the idea that only AMD is good for the world? And this is coming from an AMD fan! I haven't owned an Intel PC in almost 8 years!
By the way, Enumae... what does the leakage have to do with which product K8L will line up against? Just curious, as it only seems to really indicate how well Penryn should do in the mobile arena.

7:55 PM, August 23, 2006  
Anonymous enumae said...

Anonymous said...

"Enumae... what does the leakage have to do with which product K8L will line up against? just curious, as it only seems to really indicate how good pennryn should do in the mobile arena."

I will admit that while looking all of the stuff up on future roadmaps, I got a little dizzy, but what I read, and please don't quote me on this, is at 45nm, the leakage problem, having been solved, would reduce power consumption, and allow for higher clocks due to the lower TDP.

I will look for the link tonight, check back tomorrow. Again please don't quote me, let me find the link.

9:37 PM, August 23, 2006  
Anonymous Edward said...

"OK, tell me this: how can you explain Tulsa's leadership over Opteron? CLUE: Larger cache."

Sorry, I really don't know what that leadership is based on.

"The bottom line is if the large cache increases your hit rate from 90% to 97%, it reduces your bandwidth requirements by a third."

Bandwidth is not what cache is trying to improve, latency is. If hit rate increases from 90% to 97%, effective memory access latency could go down as much as 60%, roughly estimated.
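To put rough numbers on both views, here is a minimal sketch using the standard average memory access time (AMAT) formula, hit time plus miss rate times miss penalty. The 5-cycle hit time and 200-cycle miss penalty are hypothetical values picked for illustration, not measurements of any chip.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles."""
    return hit_time + miss_rate * miss_penalty

HIT_TIME = 5        # cycles, hypothetical
MISS_PENALTY = 200  # cycles to main memory, hypothetical

MISS_BEFORE = 0.10  # 90% hit rate
MISS_AFTER = 0.03   # 97% hit rate

before = amat(HIT_TIME, MISS_BEFORE, MISS_PENALTY)
after = amat(HIT_TIME, MISS_AFTER, MISS_PENALTY)

# Fraction by which average latency falls.
latency_drop = 1 - after / before

# Misses are what reach the bus, so traffic scales with the miss rate.
traffic_drop = 1 - MISS_AFTER / MISS_BEFORE

print(f"AMAT: {before:.1f} -> {after:.1f} cycles ({latency_drop:.0%} lower)")
print(f"Memory traffic: {traffic_drop:.0%} lower")
```

With these made-up numbers the miss traffic drops by 70% (to about a third of what it was) and average latency drops by 56%. A shorter hit time or a longer miss penalty pushes the latency figure toward 70%.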

"Add to that reduction in snoop traffic because of cache sharing, and you get the picture. Even a measly 1066 shared FSB can kick Opty's butt with 16 MB of L3 cache."

For multiple-socket processors, larger cache only makes snooping harder, not easier. Tulsa's L3 cache is shared by two cores of the same processor, but snooping happens across multiple processors.

"Intel's leadership in process technology allows them to come up with such solutions. The can afford to put more transistors in cache to reduce the bus traffic"

Yes, Intel has leadership in process technology. If my information is correct, Intel has different processes for different purposes, e.g., low-power for notebook and high-performance for server, etc. Intel does not need to resort to SOI to combat leakage power, and their 65nm chips on non-SOI wafers have very respectable TDP (it's typical usage, I know).

AMD's APM is good at process management, tuning, and upgrading, but IMO it's different from the process technology itself. APM is great, but IMO in terms of process technology, Intel has the better ones.

"(and anyone who says large cache is a kludge doesn't know basics of computer architecture, so please, don't even give me that BS. Large cache helps greatly on any benchmark--period)."

Die area budget is not the only thing that limits the size of a cache. Because a larger cache will always be slower to access, having a larger cache could sometimes hurt real-world performance. If a 2x larger cache improves hit rate by 1% but slows down access time by 25%, will performance improve? (Remember "speed-up the common case"?)

Note that hit rate improvement with large cache is generally much higher for benchmarks than for real-world applications. Thus a super large cache usually looks better for benchmarks than for real-world applications.

Note also that a larger cache is not the same as an additional layer of cache. For example, changing a 1MB L2 cache to slower 4MB could hurt real-world performance, but adding a slower 4MB L3 cache in addition to the 1MB L2 cache will almost always improve performance for all types of applications.
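The two cases above (a bigger but slower L2 versus an extra L3 behind the existing L2) can be compared with the same AMAT arithmetic. This is only a sketch: every latency and miss rate below is a hypothetical number chosen to match the "1% better hit rate, 25% slower access" question, not data for any real processor.

```python
MEM_PENALTY = 200  # cycles to main memory, hypothetical

def two_level(l2_hit_time, l2_miss_rate, mem_penalty=MEM_PENALTY):
    """Average cycles per L2 access when L2 misses go straight to memory."""
    return l2_hit_time + l2_miss_rate * mem_penalty

def three_level(l2_hit_time, l2_miss_rate, l3_hit_time, l3_miss_rate,
                mem_penalty=MEM_PENALTY):
    """Average cycles per L2 access when L2 misses try an L3 first."""
    return l2_hit_time + l2_miss_rate * (l3_hit_time + l3_miss_rate * mem_penalty)

# Baseline: 12-cycle L2 with a 95% hit rate.
baseline = two_level(12, 0.05)

# Bigger L2: hit rate up 1 point (96%), access 25% slower (15 cycles).
bigger_l2 = two_level(15, 0.04)

# Same L2 plus a 40-cycle L3 that catches half of the L2 misses.
added_l3 = three_level(12, 0.05, 40, 0.5)

print(f"baseline:  {baseline:.1f} cycles")
print(f"bigger L2: {bigger_l2:.1f} cycles")
print(f"added L3:  {added_l3:.1f} cycles")
```

Here the bigger L2 ends up slightly slower than the baseline, while the added L3 is faster than both. With a 400-cycle memory penalty instead, the same bigger L2 comes out slightly ahead of the baseline (31 vs 32 cycles), so the answer depends on the machine and the workload.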

"if Intel had CSI today, they could reduce the cache, reducing the die size, increasing margins. But the bottom line is, they can build these large caches, and still turn out profit. ... Thus, if CSI is even half as good as HT 3.0, and if Intel just keeps the cache size the same, the resulting processor will kick K8L's butt all the way to the moon."

The core of Core 2 is a good one, but my opinion is that CSI (or HT) is not applicable to its microarchitecture. I have no detail of Core 2's design so this is just my speculation.

One thing is evident, though, that K8 was designed from the beginning to work with HT and NUMA, to have great scalability (i.e. to be a great server chip). Core 2 Duo, OTOH, was a well-improved version of p6. You can see, partially, where my speculation came from, then.

10:00 PM, August 23, 2006  
Anonymous Edward said...

Sorry Sharikou, the format of my comment at 9:47pm was messed up. Please feel free to delete/ignore it if you could. Thanks.

10:11 PM, August 23, 2006  
Anonymous enumae said...

Hey Edward, what happened to your formatting...just kidding :)




I said earlier...

"I will look for the link tonight, check back tomorrow. Again please don't quote me, let me find the link."

I found one statement which was similar but not the one I was looking for, and also found some more confusion.

The more I looked, the more I got your point about mobile. I had read from a few sources about Penryn being a shrink of Conroe, but in some articles it's really in reference to Merom, so no, it would seem that Penryn is not a contender to K8L.

My bad guys...

10:11 PM, August 23, 2006  
Anonymous Anonymous said...

Uhhhh, HELLO DUMBASS! K8L is TAPED OUT, that means it's READY FOR PRODUCTION. Go google it yourself, it's mentioned on this blog already for Pete's sake!


At the risk of being redundant, or piling on "Uhhhh, HELLO DUMBASS!", have you ever heard of ANYONE shipping A0 silicon? In 10 years in the business, at more than one company, and probably >100 A0 tapeouts, I have seen shippable A0 silicon exactly ONCE, and that was with an extra 6 months of pre-tapeout validation on a shrink of a product that had been around for years and was well modeled AND had lots of redundancy on-board for micro-code patching of errors (it was a server part). Better go out and buy that Lotto ticket, Mr. "I read it on the internet"...

10:44 PM, August 23, 2006  
Blogger pointer said...

AMD going with a true 4 core design vs. Intel's dualcore processor sandwich ...

AMD has no choice but to go native, because if it combines two dual cores, it will end up as NUMA internally, which is bad for use in mobile, desktop, or even as a server MP node.

My detailed explanation is here:
http://computing-intensive.blogspot.com/2006/08/untold-reason-why-amd-will-have-to-go.html

12:16 AM, August 24, 2006  
Anonymous Anonymous said...

Great news!! Bravo!! Your news interpretation skills have clearly reached a new low. Anyone that can come up with the total BS of one core independently overclocking to some laughable number like 4.5GHz, when AMD has yet to release a 3GHz chip, from an article that talks about an enhanced version of Cool and Quiet independent to each core, is truly the stuff of genius.

Only an AMD fanboi can come up with such delusional crap and only the truly brain dead cough...mad mod mike... can believe it. So you kiss Hector's ass and Mad Mod kisses yours.

I like that farting on your AM2 for a 60% increase idea, it's as good as any crap you report on this blog.

12:19 AM, August 24, 2006  
Anonymous Anonymous said...

To understand the lowdown on what K8L can do, you have to know how to read the core architecture.

Look, it's as simple as this. The ALU is 1.4th more than it is in the K8. The FPU is 2x larger than the K8's. It has more out-of-order buffers. And a 4 decoder. This will allow the ALU to be 25% faster on top of that.

The specs AMD provides show dual 128-bit SSE and 256-bit L1, L2 and L3 with 128-bit wide pipes for the CPU. The specs say this will be 50% faster than K8 and 40% faster than Conroe. If you can understand the 5 SOI 65nm processes it goes on, then you will understand why it will be this much faster and also be able to do more cycles per clock as well as clock higher than before. The gates are K-type, the stages are deeper. It has 4x smaller L1, L2 and L3, only possible with ZRAM btw.

The design is Conroe-like as well. Notice some of its features. But it's far more advanced.

Taped out means the silicon has been ready for ES since Nov 2005. They will be ready for production in 1H 2007. This is all from an architecture perspective; the numbers don't lie. The architecture simply was designed to be 50% faster than K8. Maybe that's one reason why it's called K8/50.

Expect exactly what you hear from others who know what they are talking about. It's simply not wishful thinking. Look at the damn die shots and tell me I'm wrong, if you're smart enough to understand what every part on the die does. If you can't, then so what, go ahead and say Conroe is better. If you don't understand what K8L can do or how it does the things it was made to do, then you can't say anything about it.

It's simply progress. Conroe will not last forever. K8L isn't even designed to go against Conroe but DUO 3. Conroe is already outperformed by K8L and POWER6. So don't get me started on specs. Conroe simply does not have the architectural capability. As simple as that. It's lucky it can actually beat K8. Conroe is only competitive against an architecture that's 3 years old btw. Understand that K8 has not changed since 2003.

Conroe is beating a last-gen CPU from 2003, hello. What do you think is going to happen when AMD comes out with K8L? Conroe cannot compare to K8L because it's really K9. A totally new core, just like Conroe. Sure, Conroe may have been nice 3 years ago, but it won't cut it until DUO3 comes out against K8L, and then K10 comes out shortly after that.

Look, it will be like this forever. A comes out with a better product, B comes out with a better product to beat A. And so forth, every 6 to 12 months. That's just how it's going to be. Intel now, AMD in 6 months, Intel 6 months later, AMD 6 months after that.

Face the music.

4:24 AM, August 24, 2006  
Anonymous Anonymous said...

And btw most of the specs are already out for the K8L. Do the math, Intel kids, if you can. Specs don't lie when they're on the damn silicon, already physically made, as real as day.

Unless you're blind to the facts or just don't understand what the specs mean. Btw the specs also allow Conroe-like cycles per clock, so they will nearly match MHz for MHz with each other. But K8L's FPU and ALU will really put it out of its ballpark. Blow the blast doors off that CONroe baby.

4:31 AM, August 24, 2006  
Anonymous Anonymous said...

That's how K8L suddenly got 40% faster than Conroe. *Rolls eyes* It's called progress, ppl. K8L is not even K8. It's K9. A totally different architecture.

4:33 AM, August 24, 2006  
Anonymous Anonymous said...

Interesting thought Sharikou, but I haven't seen anything official like this reported anywhere, and it doesn't quite pass the sanity check.

So you're saying that when 3 cores are powered down, 1 core can overclock by 50% without any power dissipation issues? Thermal power does not have a linear relationship with frequency, and it's VERY unlikely you could dissipate the extra heat generated over 1/4 of the die space if you overclocked by so much.
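To put rough numbers on that nonlinearity, here is a back-of-the-envelope sketch. It uses the textbook CMOS relation that dynamic power scales as C * V^2 * f and assumes, purely for illustration, that voltage has to rise in proportion to frequency and that leakage can be ignored. The function names are made up for this example; nothing here comes from AMD.

```python
# Back-of-the-envelope CMOS dynamic power scaling.
# Assumptions (not from AMD): P ~ C * V^2 * f, voltage scales linearly
# with frequency, leakage power is ignored, all cores are identical.

def dynamic_power(freq_scale: float, volt_scale: float) -> float:
    """Relative dynamic power of one core, with capacitance held constant."""
    return volt_scale ** 2 * freq_scale


def single_core_overclock(cores: int, freq_scale: float) -> dict:
    """Shut down all but one core and overclock the remaining one."""
    core_power = dynamic_power(freq_scale, volt_scale=freq_scale)
    return {
        # Power of the active core, in units of one core at stock speed.
        "core_power": core_power,
        # Fraction of the original all-cores-active power budget used.
        "budget_fraction": core_power / cores,
        # Heat per unit area on the active core, relative to stock.
        "density_scale": core_power,
    }


result = single_core_overclock(cores=4, freq_scale=1.5)
print(result)
```

Under these assumptions the single core draws about 3.4 times its stock power. That fits inside the budget of four stock cores, but all of it has to leave through roughly a quarter of the die, which is the dissipation problem.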

Also, for all you doing K8L vs. CWM comparisons, keep in mind K8L is shooting at a moving target. K8L is still 8+ months away. With Intel's 4 core Kentsfield solution expected to arrive in 3-4 months, even that comparison may be a little pre-mature. Wait for benchmarks and real tests before drawing your conclusions (and I don't mean blogger benchmarks... credible benchmarks!).

5:38 AM, August 24, 2006  
Anonymous Anonymous said...

"--Taped out does NOT mean "ready for production", moron. In fact, it isn't unheard of for a team to tape out A0, and then have it come back DOA. There is a lot of validation and debug that needs to go on before the thing will be anywhere near ready for production."


http://www.answers.com/topic/tape-out

taped out

Refers to the completion of the design of a chip. The next stage is to put it into production. The term comes from the early days when designs were transferred to the fabricator via magnetic tape.


GEEE! That sounds like K8L is ready for production to me!

Who's the dumbass now? Spare me your Intel fanboy drivel, and pull your head out of your ass and use that excuse for a brain next time you try to give a rebuttal with some actual thought put into it. Thanks for playing the fool, please come and humiliate yourself again!

7:40 AM, August 24, 2006  
Anonymous Anonymous said...

Some idiot koolaid-drinking Intel mouthbreather said...

"Same fan boy who wouldnt believe the conroe benchmarks when it was being performed infront of a live crowd with an actual conroe chip at IDF is willing to believe slides and pictures from AMD....BIAS perhaps...stupidity definately.....Pathetic FOR SURE!!!!"

Yes, keep drinking the Kool-Aid that Intel and co. hand you and ignore the man behind the curtain. Please. If you believe everything you see and hear, I have some beautiful beachfront property to sell you in Arizona. Pathetic is those who will believe anything and everything immediately without question. Yes, I'm still waiting to see if AMD will deliver on their claims, but so far they have a far better track record than Intel at making good on their promises.

I'll say this again, Conroe ain't all that and a bag of chips. Yea, it's got the crown at a mere 10% advantage against a 3-year-old chip. Whoopty doo. Enjoy your wunder chips, Intel fanboys, they are going to be put into near obsolescence in less than a year.

7:48 AM, August 24, 2006  
Anonymous Anonymous said...

Yet another imbecile, mouthbreathing Intel fanboy wrote...

"I would reply to that but sumone already owned you must feel nice to be the dumb ass!!"

OH NOES HE'S ALREADY WRONG! I GUESS THAT MAKES YOU BOTH A DUMBASS!

Here's a tip:

1. Go to a pet store.
2. Buy a puppy.
3. Name him Clue.
4. Now you'll have one!

"If it was that obvious then why was everyone going nuts about it including our beloved doctor. Wasnt he planing on making his own woodcrest and running benchmarks i wonder whatever happened to that plan......."

Because the good doctor already knows what most people who actually have researched into the matter know already. At 2P Woodcrest at best maintains parity with Opteron, at 4P and up, it's no contest, Woodcrest is late out of the gate. No need to rehash the obvious work of other resourceful individuals.

"Oh yes please show me where it says that the k8l will be beating conroe by 40% and 3x on floating point as the doctor suggests. I am guessing those slideshows and pictures look plenty fast for you amd idiots. Look there its a pretty slide AND ITS GOT PICTURES MUST BE 10 times faster than conroe. I am guessing you still get amused with picture books."

It's apparent to me that you wouldn't know the truth if it fell out of the sky, landed on top of you and started flopping madly.

Here's the facts genius:

As I stated before, take every new little improvement that Conroe has over the K8 and DOUBLE it, add even further improvements such as RAS, Pacifica virtualization, HT 3.0 and you effectively have the K8L. This information is there in print and explained on the diagram of the PRETTY PICTURE OF THE DIE SHOT OF THE K8L for even simple minded people like yourself to comprehend.

"Please refer me to the site which says and benchmarked the k8l to do DOUBLE of what conroe will do. You are an idiot for telling me to look at slides and pictures when the doctor is pulling numbers out of his ass which i challenge."

It must really suck to be you, man. It doesn't take a genius, much less a person with half a brain, to figure out that AMD's successor to the K8 is going to double its on-chip units, add other architectural improvements, and get such a hypothetical performance number. (Sharikou estimates 40%.) What I said about Conroe I will say about the K8L: if a company doesn't have a product that surpasses its previous generation or the competition's current offerings in performance, that company is headed for problems.

For the record, quit putting words in my mouth. I SAID IT WILL DOUBLE ALL OF THE UNITS (SIMD, FP AND PIPE WIDTHS ON DIE), NOT ITS PERFORMANCE. YOU GOT THAT YET?

"Oh yes let me research more about the TAPPED OUT K8l whats next you going to tell me you have one now and are benchmarking it since its tapped out you imbecile. THE FACT IS THERE ARE NO NUMBERS OFFICIAL OR OTHERWISE THAT THE K8L WILL OUTPERFORM CONROE. Infact when is K8L coming for the desktop consumer anyways..... NOT ANYTIME SOON."

GEEEE! That's not any different from what you Intel fanboys were doing half a year ago when Conroe was being pimped out by Intel like a cheap 10 dollar whore! Imagine that! It's okay to tout figures and facts of an upcoming processor when Intel does it but it's NOT okay when AMD does that. Wow, the double standards are blinding here.

Again, GET A CLUE man, it really sucks to be you.

"GEt comfy with daddy intel amd fan boy its going to be a long year."

Haha, in a pig's eye. I could care less honestly, as I already have my system set up and I'm not upgrading for at least another year or two. Yep, it will be a long year (and an even longer one next year), as it's going to be fun and interesting watching Intel squirm financially even with their wunderchip out.

8:14 AM, August 24, 2006  
Anonymous Anonymous said...

Some shortsighted moron said...

"OK, tell me this: how can you explain Tulsa's leadership over Opteron? CLUE: Larger cache..."

I will refer you to a quote by HardOCP forum member Visaris in a thread in the AMD forums that sums up some of my thoughts and viewpoint about this:

Quote:
"A large cache hides a multitude of sins"

Said by Intel's own Jeffrey Gilbert in a speech at the Hot Chips conference.

Everyone here knows that Intel's massive caches are hacks added to make up for other problems. Cache does have its place and is an important part of any architecture; however, Intel crosses the line, and Intel's own employees are more than happy to admit it. I don't know why so many of you Intel fans are unwilling to admit that.


Another angle as well...

Put simply, more cache can add performance, but at a cost in die real estate. With cache being the most likely point of failure in a manufacturing process, Intel relying on increasingly massive amounts of cache to fix its problems is a recipe for trouble. Even with all their capacity and 65nm/45nm processes to try to help that situation, it will not alleviate the low yields of having such massive amounts of cache on die. Why do you think AMD has eliminated all of their 1 meg cache midrange models from their lineup??? To get even higher yields and cheaper manufacturing costs in the price war Intel has initiated! Whoever has the cheapest processor to produce and the most usable silicon at the right price point out the door wins a price war, period.
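For anyone who wants to play with the die area versus yield argument, here is a minimal sketch using the simple Poisson yield model, Y = exp(-A * D0). The die areas and the defect density below are made-up illustrative numbers, not figures from AMD or Intel, and the model ignores cache redundancy and repair (spare rows and columns), which real parts commonly use to recover defective cache and which would soften the effect.

```python
import math

# Simple Poisson yield model: yield = exp(-die_area * defect_density).
# All areas and the defect density are hypothetical, for illustration only.

def poisson_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects."""
    return math.exp(-area_cm2 * defects_per_cm2)


def good_dies_per_wafer(area_cm2: float, defects_per_cm2: float,
                        wafer_area_cm2: float = 706.9) -> float:
    """Expected good dies on a 300 mm wafer, ignoring edge loss."""
    gross_dies = int(wafer_area_cm2 // area_cm2)
    return gross_dies * poisson_yield(area_cm2, defects_per_cm2)


DEFECT_DENSITY = 0.5   # defects per cm^2 (hypothetical)
small_cache_die = 1.0  # cm^2 (hypothetical)
large_cache_die = 1.5  # cm^2 (hypothetical, same cores plus more cache)

for name, area in [("small cache", small_cache_die),
                   ("large cache", large_cache_die)]:
    print(name,
          round(poisson_yield(area, DEFECT_DENSITY), 3),
          round(good_dies_per_wafer(area, DEFECT_DENSITY), 1))
```

The bigger die loses twice in this model: fewer candidates fit on a wafer, and a smaller fraction of them come out clean.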

9:40 AM, August 24, 2006  
Anonymous enumae said...

Anonymous said...

"Yea, it's got the crown at a mere 10% advantage against a 3-year old chip."

Well, like it or not, that is still a very large improvement over P4, and you're looking at this out of context: the 10% is only in gaming.

If I recall correctly it was 20-30% in the other benchmarks.

3 years old, yes.

Without improvements, no.

It is widely accepted that the GHz race was a waste of Intel's time, but things have changed, so it will be interesting to see what happens next.

Anybody read the article about K8L on theinq today?

[Link]

10:29 AM, August 24, 2006  
Anonymous Anonymous said...

--haha you fool. Taped out does NOT mean the design is finished. Since you don't know anything about chip design and manufacturing, I'll go through some of it for you. (note I'm simplifying A LOT):

1) chip is designed.
2) chip is validated in pre-silicon simulation.
3) backend design is completed (i.e.-layout, timing etc).
4) chip rev A0 is taped out. you are here.
5) rev A0 masks are created
6) A0 is manufactured.
7) A0 samples are returned to the debug team(s)
8) debug/val teams go through testing and validation.
9) issues found during testing are fed back into the design process.
10) chip rev A1 or B0 is taped out.
11) etc etc etc.
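
The loop in steps 4 through 10 can be sketched as a tiny stepping tracker. This is only an illustration and it assumes one common naming convention: a metal-only fix bumps the number (A0 to A1) and a full-layer respin bumps the letter (A1 to B0). The function names are hypothetical.

```python
# Toy model of the tape-out / debug / respin loop.
# Assumed convention: metal-only fix bumps the number,
# full-layer respin bumps the letter and resets the number.

def next_stepping(stepping: str, full_respin: bool) -> str:
    letter, number = stepping[0], int(stepping[1:])
    if full_respin:
        return f"{chr(ord(letter) + 1)}0"
    return f"{letter}{number + 1}"


def stepping_history(findings):
    """findings: one entry per validation pass.

    None means validation came back clean (ship it); "metal" means a
    metal-only fix is needed; "full" means a full-layer respin is needed.
    """
    stepping = "A0"
    history = [stepping]
    for finding in findings:
        if finding is None:
            break
        stepping = next_stepping(stepping, full_respin=(finding == "full"))
        history.append(stepping)
    return history


# Two rounds of bugs, then a clean validation pass.
print(stepping_history(["metal", "full", None]))
```

A0 only ships if the very first validation pass finds nothing.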

Don't pretend you know what you are talking about. Just because you read it on the internet does not make it true.

10:40 AM, August 24, 2006  
Anonymous Anonymous said...

Wow, I was shocked too at how limited CONroe was in 64-bit apps. It loses 1/4th of its benchmarking power in 64-bit mode and has more power in 32-bit mode.

Benches don't reflect real world power but wow, it's enough to really notice when 64-bit is the future. CONroe doesn't have real 64-bit power. Just try Sandra in 64-bit mode and go up against true benches against 64-bit mode Conroes, and they aren't kidding.

I tried many benches on both systems and AMD's perform well enough to match Conroes clock for clock, if not beat them in all benches when in 64-bit mode. Just wow! What a find, so it IS true.

11:22 AM, August 24, 2006  
Anonymous Anonymous said...

Some delusional fool spewed...

"--haha you fool. Taped out does NOT mean the design is finished. Since you don't know anything about chip design and manufacturing, I'll go through some of it for you. (note I'm simplifying A LOT):"

LETS SEE...

Either taped out means it's READY FOR PRODUCTION, or it doesn't and you're living in some delusional universe of yours. I don't think this can be stated by AMD any more simply, man: IF THEY SAID IT'S TAPED OUT, THAT MEANS THEY ARE FIXING TO START PRODUCTION OF K8L. Which means they will be stockpiling chips from now until its release; that's a pretty damn good reason why it's taped out NOW if they are releasing it in 1H '07.

This isn't Intel we're talking about, man. AMD does not play by their book when it comes to chip production or design, OR BY YOUR PRECONCEIVED NOTIONS OF HOW THEY PRODUCE A CHIP. AMD likes to do things right THE FIRST TIME when it comes to manufacturing, as they can't afford any screw-ups against Intel's vastly more numerous fabs. (Go research AMD's APM.)

You can believe whatever you want or hold your ears and scream lalalala all day for all I care, man. You live in your little dream believing whatever you want, and I'll be going by what AMD has to say about the matter SINCE IT'S THEIR CHIP, THEIR FABS, THEIR DESIGN TEAM, and I'm going to safely assume, judging by your response here, that...

YOU DON'T WORK FOR AMD.

And your guesses on how they operate in-house are about as good as anyone elses, in other words, not worth a damn.

In short:

AMD's word > Your word

I'm sure I'm not the only one here who feels the same way either.

"Don't pretend you know what you are talking about. Just because you read it on the internet does not make it true."

I don't pretend to know anything, I don't like to pretend I know everything, and I don't assume anything. I spend a lot of time out here on the net reading about the computer industry as a whole, and about AMD's reps and press releases in various posts and sites on the net. They all say the same thing: it's "taped out", which from what I've researched means K8L is ready for production, END OF STORY.

THE.

END.


So, thanks for being CLUELESS! Please feel free to come make yourself look like a pompous know-it-all ass again anytime!

2:20 PM, August 24, 2006  
Anonymous Edward said...

Some comments on tape out...

From my personal experience, tape-out is usually the last step in which the chip designers are the primary workers/contributors. A month or two later the first production samples are back, and a few guys are picked to participate in the validation.

How far is tape-out from final product? It depends. It is determined mainly by how good the design team is. Good designers make a fault-tolerant design with adequate simulation and formal verification.

So if a design is good, then tape-out is very close to revenue production. If a team screws up somewhere with the design, every iteration brings in another quarter of delay or so. If K8L was taped out in July and everything goes well, I won't be surprised if AMD starts to mass produce it by the end of this year. AMD's own mid-2007 timeline might be a conservative one.
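
As a rough illustration of that arithmetic, here is a small sketch. It uses only the figures above (first samples about two months after tape-out, roughly one quarter per extra iteration) plus the July 2006 tape-out date; the function names are made up, and time for validation, ramp and stockpiling is not modeled separately.

```python
# Rough schedule arithmetic for tape-out to final silicon.
# Inputs are the ballpark figures from this comment, not AMD data.

def add_months(year: int, month: int, delta: int):
    """Return (year, month) that is `delta` months after the given month."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def final_silicon_date(respins: int,
                       tapeout=(2006, 7),
                       first_silicon_months: int = 2,
                       months_per_respin: int = 3):
    """Month in which silicon from the last iteration comes back."""
    delay = first_silicon_months + respins * months_per_respin
    return add_months(tapeout[0], tapeout[1], delay)


for respins in range(4):
    print(respins, "respins ->", final_silicon_date(respins))
```

With these inputs, a clean A0 or a single iteration lands inside 2006, while three iterations land in June 2007.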

3:01 PM, August 24, 2006  
Anonymous Anonymous said...

--How nice. A clueless blog poster continues to claim that tapeout=production ready. Let me ask you this: Of all the AMD processor parts you have ever purchased, how many were rev A0? (hint: zero)

As far as you claiming "AMDs word > your word", AMD never claimed they are in production. They claimed they have taped out. I'm trying to explain to you that the extrapolation that you arrogantly made is false.

Edward is correct that iterations bring delay and poor pre-silicon validation can kill a product schedule. However, "fault tolerant design" doesn't protect against the types of problems that cause a re-spin. Additionally, formal verification is not going to prevent any bugs since it merely checks logical consistency between logic structure and a higher level description of the design (e.g.-RTL).

Please, (and this isn't directed at edward) don't pretend you know something about chip design, when you clearly don't.

4:56 PM, August 24, 2006  
Blogger pointer said...

How far is tape-out from final product? It depends. It is determined mainly by how good the design team is. Good designers make a fault-tolerant design with adequate simulation and formal verification.

So if a design is good, then tape-out is very close to revenue production. If a team screws up somewhere with the design, every iteration brings in another quarter of delay or so. If K8L was taped out in July and everything goes well, I won't be surprised if AMD starts to mass produce it by the end of this year. AMD's own mid-2007 timeline might be a conservative one.


The number of bugs is correlated with the number of transistors, excluding repeating patterns such as cache (a somewhat rephrased quote from a former Intel chief architect, in a talk he gave at a university). It is somewhat reduced for derivative work and increased for a major makeover. You can get some sense of it based on recent historical data. I think I saw someone (Sharikou?) quote that AMD took 10 months for its last product. Personally, I'm not sure what the last product was or how much of it was derivative work. I also do not know how much of the product AMD just taped out is derivative work.

I WILL be surprised if AMD starts to mass produce it by the end of the year. Have any AMD CPU products shipped as A0/A1?

8:16 PM, August 24, 2006  
Anonymous Edward said...

"Please, (and this isn't directed at edward) don't pretend you know something about chip design, when you clearly don't."

Please, (and this isn't directed at anyone) don't pretend you read and got the full meaning of my comments, when you clearly didn't. ;-)

Formal verification prevents bugs that are introduced when translating the RTL to gate-level logic. Do bugs from such sources not count as bugs? Simulation, in the end, is the final (and slowest) step to "debug" a design before tape-out.
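
A toy version of that check, for readers who have not seen one: compare a behavioral ("RTL") description of a 2-to-1 mux against a gate-level version over every input combination. This is only a sketch with made-up function names; real equivalence checkers work on actual netlists and use BDD or SAT techniques rather than brute-force enumeration.

```python
from itertools import product

# Toy combinational equivalence check between a behavioral description
# and a gate-level implementation. All signals are the integers 0 or 1.

def rtl_mux(sel, a, b):
    """Behavioral spec: output a when sel is 1, else b."""
    return a if sel else b


def gate_mux(sel, a, b):
    """Gate-level version: (sel AND a) OR (NOT sel AND b)."""
    return (sel & a) | ((1 - sel) & b)


def buggy_gate_mux(sel, a, b):
    """Translation bug: the inverter on sel was dropped."""
    return (sel & a) | (sel & b)


def equivalent(spec, impl, n_inputs):
    """Exhaustively compare spec and impl; return (ok, counterexample)."""
    for bits in product((0, 1), repeat=n_inputs):
        if spec(*bits) != impl(*bits):
            return False, bits
    return True, None


print(equivalent(rtl_mux, gate_mux, 3))
print(equivalent(rtl_mux, buggy_gate_mux, 3))
```

Note what the check does and does not cover: it catches the dropped inverter, but if `rtl_mux` itself were wrong, a faithful gate-level translation would still pass.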

A design that is not fault tolerant might have low yield and need re-spin (to add fault tolerance to the design). I do not mean fault tolerance as redundancy, but in a general way such as a well-balanced clock tree or a slightly wider gate width, e.g.

Anyway, these are quite beside the point; they have nothing to do with K8L or AMD.

11:35 PM, August 24, 2006  
Anonymous Anonymous said...

--I think the point is that everyone runs FV, so design team A isn't going to have better results than design team B. Even the "bad" teams run FV.

3:37 AM, August 25, 2006  
