Journal of Pervasive 64 bit Computing
Analysis of IT trends and competitive strategies, with emphasis on microprocessors, computer systems and networks. Based on the latest news and backed up with real data, this site intends to provide a true, real-time picture of the fast-changing IT landscape. This journal strives to be accurate on facts and sharp on criticisms. You may email your opinion to sharikou@yahoo.com or post comments here; be cool and intelligent.

Name: Sharikou, Ph. D.

Freelance journalist on IT matters. Some of my writings have been published in online IT journals. Any original content on this journal is copyrighted, but it's free for non-commercial use. Any trademarks used on this site belong to their respective owners. Some of the pictures are links. If there is any issue with the content of this site, please email sharikou@yahoo.com.

Sunday, December 03, 2006

Charlie at INQ not using his brain

He complains that AMD booted Windows 2003 Server Enterprise Edition on K8L, showing 16 cores cranking, but did not dare to run Windows Minesweeper. I guess Anand would ridicule AMD the same way: K8L is only good at running Task Manager on Windows 2003 Server Enterprise Edition--nothing else.

See the brain damage here? Even if we assume that AMD somehow got Windows 2003 Server to boot and run all those enterprise services without a crash by pure luck--somehow the 100 million lines of noodle code Microsoft programmers put into that OS did not cause K8L to run into a bug on any of the 16 cores, and AMD was lucky to also have Task Manager running without a crash--there is something more. In the Task Manager, CPU usage was near 100% on all 16 cores--which means something heavy duty is running.

Use your brain, folks, or you are like a talking monkey.

BTW, Intel is pretty much finished. A Sun Fire x4600 will have 32 K8L cores.... Intel will be at 25% of AMD performancewise.

On floating-point performance, two dual-core Opterons get a score of 119, while two quad-core Xeons get a score of 101. Conclusion: a dual-core Opteron is faster than a quad-core Xeon, as far as FP performance is concerned.
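
As a rough check on that arithmetic, here is a minimal Python sketch that normalizes the two quoted scores per chip and per core. The scores 119 and 101 are the ones quoted above; the helper name `normalize` is made up for illustration, and the sketch assumes the two results are directly comparable (same metric, same tuning level, same OS), which several commenters below dispute.

```python
def normalize(score, chips, cores_per_chip):
    """Return (score per chip, score per core) for a multi-socket result."""
    total_cores = chips * cores_per_chip
    return score / chips, score / total_cores

# Scores quoted in the post: two dual-core Opterons vs. two quad-core Xeons.
opteron_per_chip, opteron_per_core = normalize(119, chips=2, cores_per_chip=2)
xeon_per_chip, xeon_per_core = normalize(101, chips=2, cores_per_chip=4)

print(f"Opteron: {opteron_per_chip:.2f} per chip, {opteron_per_core:.3f} per core")
print(f"Xeon:    {xeon_per_chip:.2f} per chip, {xeon_per_core:.3f} per core")
```

The per-chip numbers are what the conclusion rests on; the per-core numbers only show how the same two scores look when divided by core count.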

120 Comments:

Christian H. said...: Aaaah, first post. Lucky me. Though we don't always agree, I can say that it IS PISSING ME OFF THAT NO ONE NOTICED THAT ALL 16 cores were running at 100% load. That is the same demo done for QFX and this is an entirely "new" architecture where QFX was just an Opteron with an X2 IMC.

I still wonder though if they meant per core or overall, because it seems like just adding the beefier XBar and two extra cores would get that perf boost alone.

Extra branch history, larger fetch and enhanced SSE4A, 128 bit L1, etc should do a little better in practice, but then I am not a CPU designer.; 11:17 AM, December 03, 2006
Anonymous said...: Use your brain, folks, or you are like a talking monkey.

most of your posts indicate, that you're the talking monkey not using its brain...; 11:29 AM, December 03, 2006
Anonymous said...: only those ppl who say Hail AMD! can be considered to have brains; 12:00 PM, December 03, 2006
Anonymous said...: Intel will be at 25% of AMD performancewise.; 12:13 PM, December 03, 2006
Anonymous said...: "Intel will be at 25% of AMD performancewise."

Perhaps in a massively parallel system. But want to bet that clock-for-clock, per-core K8L performance on average will be a bit slower than C2? Sure, K8L is almost as big a jump as Core -> Core2, but not nearly as big as Netburst -> Core2.; 12:53 PM, December 03, 2006
Anonymous said...: when do you think k8l will be out in the market? june 2007?; 1:03 PM, December 03, 2006
Anonymous said...: You got it wrong - task manager was at almost 100% only because the AMD sample was running at a clock frequency of 5 MHz :); 1:47 PM, December 03, 2006
Anonymous said...: Any comments on the DOJ inquiry into AMD's graphics business?; 2:59 PM, December 03, 2006
Anonymous said...: The floating point scores are on two different operating systems; one on Solaris and the other on Windows Server 2003 EE 32bit.

So that is not comparing apples to apples.; 3:03 PM, December 03, 2006
Anonymous said...: Funny thing is that K8L is compared against 2.8GHz dualcore Opteron, 3GHz dualcore Xeon and a 2.66GHz quadcore Xeon. Compared to the dualcores, K8L should be ~70% faster. Compared to the 2.66GHz quadcore the difference is around 15-20%.

First, assuming that FP performance per core doesn't decrease, K8L will be <2.8GHz. It seems that with Intel, performance increases nearly linearly with added cores and clock speed changes, even though it doesn't have true quad-core.

Secondly, do you honestly believe that Intel can't increase its clock speed in half a year? By summer next year it should have its first 45nm CPUs out, which should definitely have significantly increased clock speed compared to today's CPUs.; 3:08 PM, December 03, 2006
Anonymous said...: How can you claim to be a ph.d and a "journalist" when you can't string a coherent sentence together?; 3:32 PM, December 03, 2006
Anonymous said...: It's nice to see a lot of people focusing on quad-core.

Because the dual-core stuff is getting really really cheap.

And as many tests have shown, there is not a great performance boost going from 2 cores to 4 cores.

So for the time being, it is nice to upgrade single-core systems to dual-core for a reasonable price.

It won't be until late 2007 that we find mature quad-core systems and reasonable prices. Motherboards have to evolve, especially in the server world, and this sort of evolution does not happen quickly.

In a year, we will find many desktop systems with a single quad-core processor. These will be great systems. Apple no doubt will be quad-core across the line by then, if not earlier. It will be fun to buy quad-core.

So in a way, AMD is right on time. Early quad-core is a giant marketing victory for Intel, but prices are 4X what they will be in 1Y. Smart buyers will wait.; 4:58 PM, December 03, 2006
Anonymous said...: Except when K8L comes out next year it will take ages to ramp up production to a significant (say 25% or so) of AMD's production. Think back to 2003 and the Athlon 64 launch. AMD only supplied around 100,000 Athlon 64 CPUs a month for the first few months. It wasn't until well into 2004 that they got production ramping up nicely.

That 25% of AMD's performance is utter nonsense as well. Even if we accept that AMD is being 100% truthful about its "performance estimates", that's only a ~10% increase in speed over Intel's Xeon 5300 range.
Who said Intel is going to stand still and take this?

Tigerton is out next year for 4P and above, all the way to 32P. 45nm native quad core as well.; 5:36 PM, December 03, 2006
Anonymous said...: btw, I was reading AnandTech when they reviewed the performance of the Quad 4x4.
I'm totally LOLing at their bullshit.
I mean, when they were reviewing Pentium 4s, Pentium Ds and stuff, they never dared to say bad stuff about the power consumption of Pentium products, but now they attack AMD products on all fronts..
Anand has completely sold his soul to Intel.; 6:28 PM, December 03, 2006
Anonymous said...: Sharikou is claiming "A Sun Fire x4600 will have 32 K8L cores.... Intel will be at 25% of AMD performancewise."

The Dr loves to keep bringing up the Sun Fire X4600 in SPECfp_rate2000 running Solaris 10. There are no X4600 benchmarks for SPECint_rate2000, so I will take an X4200 (2.6GHz, 4 cores, 2 chips) and use linear scaling for an X4600...

AMD SPECint_rate2000

Sun Fire X4200

SPECint_rate2000 = 75.6 (4 cores)

75.6 * 4 = 302.4 for an X4600 (16 core) in SPECint_rate2000.

302.4 * 1.7 = 514.08 for an X4600 (32 core) in SPECint_rate2000.

Intel SPECint_rate2000

IBM System X 3650 (2.67 GHz Xeon X5355, 8MB L2 Cache)

SPECint_rate2000 = 198 (8 cores)

198 * 4 = 792

792 * 0.80 (scaling) = 633.6, or if you think the scaling will be worse 792 * 0.7 = 554.4

This gives you an idea of the performance of an 8P Tigerton going up against an X4600 with quad cores in SPECint_rate2000. This is hardly a win for AMD, and I am not seeing the 25% of AMD's performance here. Let's keep looking for it...

AMD SPECfp_rate2000

Sun Fire X4600

SPECfp_rate2000 = 231 (16 cores)

231 * 1.4 = 323.4 for an X4600 (32 core) in SPECfp_rate2000.

Intel SPECfp_rate2000

IBM System X 3650 (2.67 GHz Xeon X5355, 8MB L2 Cache)

SPECfp_rate2000 = 101 (8 cores)

101 * 4 = 404

404 * 0.80 (scaling) = 323.2 or if you think the scaling will be worse 404 * 0.7 = 282.8

This gives you an idea of the performance of an 8P Tigerton going up against an X4600 with quad cores in SPECfp_rate2000. Again this is hardly a win for AMD, and I am not seeing the 25% of AMD's performance here either. The Dr's claims are ridiculous and in general outright lies.

The only conclusion I can come up with is that the initial clock speeds will be quite low, somewhere around 2GHz for the upcoming Rev H quad cores, due to their TDP. This is hardly a win for AMD and would seem to mean AMD is having trouble with 65nm; further proof is the FX series for Quad FX still being on 90nm.

At the same time, this may or may not take into account HyperTransport 3. The presentation on AMD's site said nothing about it. That will most likely increase performance, but will it be enough?

I am not trying to say anything negative about AMD, but Intel is showing their manufacturing muscle and AMD is in trouble.; 6:28 PM, December 03, 2006
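
The extrapolation in the comment above can be reproduced with a few lines of Python. This is only a sketch of the commenter's own back-of-the-envelope method: the measured scores (75.6, 198, 231, 101), the 1.7x/1.4x uplift factors and the 0.8/0.7 scaling guesses all come from the comment, while the variable names are made up here. Nothing below is a measured result for a 32-core X4600 or an 8P Tigerton.

```python
# Measured scores quoted in the comment.
X4200_INT = 75.6    # SPECint_rate2000, Sun Fire X4200, 4 cores
X4600_FP = 231      # SPECfp_rate2000, Sun Fire X4600, 16 cores
X5355_INT = 198     # SPECint_rate2000, IBM System x3650, 8 cores
X5355_FP = 101      # SPECfp_rate2000, IBM System x3650, 8 cores

# AMD side: linear scaling to 16 cores, then the commenter's uplift factors.
amd_int_16 = X4200_INT * 4
amd_int_32 = amd_int_16 * 1.7
amd_fp_32 = X4600_FP * 1.4

# Intel side: 4x the 2P result, discounted by an assumed scaling efficiency.
def intel_estimate(score_2p, efficiency):
    """Scale a 2P (8-core) score to 8P (32 cores) with a guessed efficiency."""
    return score_2p * 4 * efficiency

for eff in (0.8, 0.7):
    print(f"efficiency {eff}: "
          f"INT Intel {intel_estimate(X5355_INT, eff):.1f} vs AMD {amd_int_32:.2f}, "
          f"FP Intel {intel_estimate(X5355_FP, eff):.1f} vs AMD {amd_fp_32:.1f}")
```

The whole comparison hinges on the efficiency guess and on the uplift factors, so the output is only as good as those assumptions.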
Anonymous said...: "I still wonder though if they meant per core or overall, because it seems like just adding the beefier XBar and two extra cores would get that perf boost alone."

Khalif - I completely agree and think you are dead on. If this is a chip-to-chip comparison, I don't think this (simulated) data is so great. If it is normalized for core count, then yes, it is very impressive....

Sharikou - perhaps the chip is so bad it was having a hard time running VISTA and needed nearly all 16 cores just to open up task manager? (just kidding) I've read somewhere that K8L has not been verified to be Vista ready (kidding again)

The truth is no one knows, because this was a marketing ploy, simply meant to say: look, we have new technology too - ignore all those crappy 4x4 benchmarks because in 6-12 months we'll be doing 8x4 and all those people buying 4x4 chips will be throwing them away (or maybe trying to jam them into AM2 sockets)?

There was no info - no speeds quoted for the CPU, no benchmarks (real or synthetic), no system specs. Could one imagine the outcry if Intel showed off similar, SIMULATED, data on a new architecture without providing any data or allowing anyone to see the HW?

Oh wait, I don't have to imagine this. I just need to go back in time to when the early Core 2 info started coming out and everyone was screaming rigged! and reviewers were paid off! And at least then there was actual benchmark data (not "simulated"). I even recall Sharikou questioning a demo of Kentsfield operating (which showed task manager with 4 cores operating), because no one had taken the heatsink off or ripped open the chip packaging to verify there really were 2 Conroes in the one socket!

Pot.....kettle.....black......; 6:45 PM, December 03, 2006
Anonymous said...: Sharikou: Here's a hypothetical for you....

What if this stepping had a critical speed path which will need to be fixed on a future stepping...and say the CPU was running at 500MHz or 750MHz or 1 Ghz. Would the CPU cores be taxed at all then? (I'm not familiar with Windows 2003 so I have no idea what background processes run...)

What speed was the demo chip running at again? Oh that's right it was so fast AMD wanted to keep it a secret right? (Because if it was working real well, there'd be no PR benefit for AMD to tell analysts we have a next gen chip running at our targeted speed, right?).

The only thing you can conclude from near 100% CPU usage is that the cores are getting taxed heavily - you don't have enough data to conclude "something heavy duty was running". It is just as possible that the cores are running below final specs and are being taxed by "lightweight" processes; ESPECIALLY since AMD declined to provide any detail.; 6:56 PM, December 03, 2006
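
To illustrate the point made above, here is a small Python sketch of why a utilization reading alone says little about how heavy the workload is. All numbers are hypothetical: the per-core demand of 0.45 billion cycles per second and the list of clock speeds are made up for illustration, since AMD did not disclose the demo chip's frequency or what it was running.

```python
def utilization(demand_gcycles_per_s, clock_ghz):
    """Fraction of a core's cycles a workload consumes, capped at 100%."""
    return min(1.0, demand_gcycles_per_s / clock_ghz)

# Hypothetical workload: needs 0.45 billion cycles per second on each core.
demand = 0.45

# The same workload looks "heavy" on a slow part and "light" on a fast one.
for clock in (0.5, 0.75, 1.0, 2.6):
    print(f"{clock:.2f} GHz: {utilization(demand, clock):.0%} busy")
```

The sketch treats a core as a simple budget of cycles per second and ignores memory stalls, scheduling and everything else a real system does, so it only shows the direction of the effect.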
Sharikou, Ph. D. said...: Intel SPECint_rate2000

IBM System X 3650 (2.67 GHz Xeon X5355, 8MB L2 Cache)

SPECint_rate2000 = 198 (8 cores)

198 * 4 = 792

Don't you know Intel's Core 2 architecture is for ultra low end 2P only?

For 4P and above, Intel's chip is called Tulsa.; 7:19 PM, December 03, 2006
Anonymous said...: LOL!!! I had something witty to say but I would just be wasting my time arguing with this so-called doc. I really hope AMD does well with their next chip release, but this guy must be smoking something if he thinks that their next chip will be better by more than a few points. In fact, they may not even be able to catch up, since Intel is a moving target and lately has been moving even faster.; 7:33 PM, December 03, 2006
Anonymous said...: That's all you have?

Tulsa is based on Netburst.

Tigerton could/should be out about the same time, which is based on the Core architecture.

You are trying very hard to grasp at straws, huh doc, or just overlooking the fact that we are comparing two future products, and you don't like the results?; 7:57 PM, December 03, 2006
Anonymous said...: The paper presentation says "Estimated". That's SPECULATION. No real benchmarks to back it up. Period.; 9:25 PM, December 03, 2006
Anonymous said...: You can't compare BASE and PEAK scores for the SPEC benchmarks.; 10:32 PM, December 03, 2006
Anonymous said...: What if this stepping had a critical speed path which will need to be fixed on a future stepping...and say the CPU was running at 500MHz or 750MHz or 1 Ghz. Would the CPU cores be taxed at all then? (I'm not familiar with Windows 2003 so I have no idea what background processes run...)

What speed was the demo chip running at again? Oh that's right it was so fast AMD wanted to keep it a secret right? (Because if it was working real well, there'd be no PR benefit for AMD to tell analysts we have a next gen chip running at our targeted speed, right?).

The only thing you can conclude from near 100% CPU usage is that the cores are getting taxed heavily - you don't have enough data to conclude "something heavy duty was running". It is just as possible that the cores are running below final specs and are being taxed by "lightweight" processes; ESPECIALLY since AMD declined to provide any detail.

please stop using logical arguments! this is Sharikou's blog!; 2:22 AM, December 04, 2006
Anonymous said...: 100% on all cores doesn't necessarily mean heavy duty...
Since even small tasks are too much for AMD's slow processors =)

However, Windows generally loads the CPUs at 100% from time to time, without any obvious reason; 4:20 AM, December 04, 2006
Anonymous said...: The URL you posted for the Xeon X5355 CFP2000 results only has base copies, base runtime and base ratio measurements. Does that mean the tests have not actually been run? That report looks a little incomplete compared to others.; 5:32 AM, December 04, 2006
Anonymous said...: 4x4 is a complete mess. Who's going to buy this? Enthusiasts are better off with different processors, it's no good as a server solution, and its benchmark scores are underwhelming for digital content creation.

Not to mention it's impossible to upgrade TO, not able to be overclocked, and consumes far more power than it SHOULD.

AMD had better stick to making its dual core and Opteron products better before trying to get into the big leagues.; 7:43 AM, December 04, 2006
Anonymous said...: OT but interesting

EEtimes says

SAN FRANCISCO — Advanced Micro Devices Inc. (AMD) and Hynix Semiconductor Inc. will finally break into the list of the world's top 10 semiconductor suppliers this year, according to a preliminary ranking from market research analyst iSuppli Corp.
AMD's semiconductor revenue is expected to increase by 90 percent in 2006, which will cause the company's ranking to jump eight places to seventh place; 8:39 AM, December 04, 2006
Anonymous said...: OT but cool

The Z-RAM Gen2 technology for AMD & IBM is ready. Zram is looking better all the time.

http://www.eetimes.com/news/semi/showArticle.jhtml?articleID=196601127; 8:49 AM, December 04, 2006
Roborat, Ph.D said...: AMD's quad core demo and its "simulated" benchmark are nothing but a knee-jerk panic response to Intel's slaughter of QuadFX, done for damage control.

Clearly AMD hasn't won a product review for the last 6 months and this forced demo is not a positive sign. And I haven't even talked about what they're demoing.; 10:02 AM, December 04, 2006
Fujiyama said...: http://www.us.design-reuse.com/news/news14802.html

Z-RAM Second Generation
65nm licensed by AMD; 10:47 AM, December 04, 2006
Anonymous said...: OK - I'll take a stab at some 65nm insights.

Intel's process philosophy is to put in the bulk of the transistor improvements before ramp, and then after the technology ramps there are minor process tweaks (implant, maybe a little bit of tightening down on litho CDs, and fine-tuning yield). This means the 65nm improvements are more or less seen right away. Hence the P4 power declines that were seen, speed scaling, etc...

AMD's approach is this CTI approach, which puts in transistor improvements in steps. The initial 65nm transistor is essentially the same as 90nm with the exception of CDs. Without Tox, implant and other scaling of the transistor, short channel effects will dominate and you should expect to see very little transistor speed improvement initially. You will probably see some power improvements as the Vt for the transistor will be targeted lower (per ITRS roadmap) and thus supply voltage (and active power consumption) will go down.

I would not expect off-state power (idle) to change much, as the main sources of leakage will not be changed much initially (gate, subthreshold, junction) - this may improve a bit over time when some of the transistor improvements are put in. (I would not expect much here overall on 65nm, with the obvious exception of the new architecture performance, as AMD has hit the same gate oxide scaling wall as the rest of the industry has)

So I'm not sure why people are surprised that the high end products continue to be 90nm and the 65nm products are lower speeds. The 90nm process has been running for a while and yield/bin splits should be MUCH more stable than 65nm right now (and please don't come with that APM3.0 crap!)

Over time you will see the higher end products start to migrate over to 65nm but don't expect it right away, and for folks assuming the current 65nm versions of the 90nm X2 parts will OC better, you are in for a surprise (down the road this may become true though...); 12:17 PM, December 04, 2006
Anonymous said...: And in other news... Intel is teaming up with GM and going to be making cars!! Cars?? Yes, cars...!! It will follow the EXACT same microprocessor formula to claim to be first.

They will be FIRST (yes, FIRST to market) by strapping two 4-bangers (4-cylinders) together and "calling" it a V8!! Genius!!

This is to be followed soon after by strapping two V6's together for a W12 configuration monster! Wow?!! Who cares if it works well... the marketing guy said to give them something marketable.

What's better than W12 you ask?

Well, TWO V8's of course!!
Yes, W16's will be out soon... and they will once again be FIRST to market for the marketing guys. Brilliant!!!

Early preliminary specs (released yesterday, or however many days before the other company does it):
- One V8 will drive the FRONT wheels
- Other V8 will drive the REAR wheels
- However, the FRONT and REAR cannot talk to each other, they are linked using the same old V8 transmission.

I personally can't wait until the V10 comes out... cause then, a W20 can be out in no time with MCM, umm... I mean, transmission.

:); 1:44 PM, December 04, 2006
Anonymous said...: "Funny thing is that K8L is compared against 2.8GHz dualcore Opteron, 3GHz dualcore Xeon and a 2.66GHz quadcore Xeon. Compared to the dualcores, K8L should be ~70% faster."

The performance figures of 70% and 40% are for improvements in PERFORMANCE PER WATT! Check the AMD website.; 2:03 PM, December 04, 2006
Anonymous said...: AMD went from 3.9 billion to 7.5 billion in one year. They must be doing something right. (ATI deal); 2:21 PM, December 04, 2006
Anonymous said...: Between now and K8L, AMD needs to have 65nm chips with at least twice the Level 2 cache. That should improve performance by at least another 3-5%, allowing 4x4 to come closer to the QX6700, and also improve things on the 1P and 2P server side.; 2:56 PM, December 04, 2006
Anonymous said...: Charlie has been very pro AMD in the past.. bottom line he sees the writing on the wall.

AMD is finished, Barcelona is coming late. AMD can't even get their A-step silicon to do anything but run task manager... how sad

AMD BK in 2008; 6:45 PM, December 04, 2006
Anonymous said...: "The performance figure of 70% and 40% are for improvements in PERFORMANCE PER WATT! Check the AMD website."

This makes the concerns valid - as the dual and quad cores are supposed to be ~same power envelope, this would mean that doubling the cores AND changing the architecture only led to a 40% FP/70% INT... one would think dual to quad core ALONE would be this good... they must be slowing the core down to meet the power/thermals....; 7:56 PM, December 04, 2006
Anonymous said...: "And in other news... Intel is teaming up with GM and going to be making cars!! Cars?? Yes, cars...!! It will follow the EXACT same microprocessor formula to claim to be first."

And in other news AMD is chaining two trucks together to make a quad truck! Of course it uses double the gasoline, but hey it is really innovative and the trucks can be later swapped out for bigger SUV's...

Of course its performance is not as good as Intel's car, but hey, it uses twice as much gasoline, and for those megataskers who need to tow 5 cars behind them (which every truck enthusiast needs to do, no?), it can do that...I think they call it a 4x4?!?!? Oh, and you probably will need a new radiator to keep the engine from overheating, but you can plug in 4 CD players and 12 cigarette lighters/power adapters...I hear the other car maker can only have 2 CD players and 4 cigarette lighters, so you are quite limited listening to music while chain smoking....

Real good analogy (and by real good I mean dumba$$)...; 8:03 PM, December 04, 2006
Anonymous said...: 65nm is out!!! AMD will surely take back the title now, since the Athlons easily beat the Core 2 Duos. Now that AMD went 65nm it's all over. Intel BK in about 2 quarters....; 10:58 PM, December 04, 2006
Roborat, Ph.D said...: The hilarious part is the turn around between AMD and Intel. At the beginning of this year Intel was being laughed at for benchmarking Core2Duo 6 months before release. And now it’s AMD doing the same thing. The only difference is that Intel showed a system running real benchmarks while AMD now is just showing simulation. AMD’s lack of consistent progress is the fundamental reason why it will always play second fiddle to whatever industry it wants to compete in.

- the new Sharikou; 2:23 AM, December 05, 2006
Pop Catalin Sever said...: Anonymous said...
"Charlie has been very pro AMD in the past.. bottom line he sees the writing on the wall.

AMD is finished, Barcelona is coming late. AMD can't even get their A-step silicon to do anything but run task manager... how sad"

No, actually Barcelona arrives at the best time possible: when Conroe becomes mainstream. Actually, C2D hasn't given Intel any financial advantage so far, because of limited production. Meanwhile AMD has squeezed the last drop of money from the 90nm process without rushing to 65nm, which translates directly to higher profits... and this can be seen in AMD's double-digit growth and Intel's double-digit decline!

Intel's latest game of delivering the highest performance at the cost of premature transitions, with billions of dollars in expenses, only holds up in the eyes of fanboys and not from an economic point of view.
It's ironic how once more the lesson is not learned: "technology is of no use if it can't be made available to the masses", or at least to large portions of them.

Gaining market share without transitioning to new manufacturing technologies can't be a bad thing no matter how you take it, and this is what AMD just did. Even if it lost the performance crown for a short period, I think it was brilliant execution of a strategy on AMD's part ...; 4:10 AM, December 05, 2006
S said...: "Conclusion, a dual core Opteron is faster than a Quad core Xeon, as far as FP performance is concerned."

Your comparison is like making an F1 car race on a cross-country circuit, getting a Hummer to race on an F1 circuit, and then saying the Hummer is the faster car. It is not an apples-to-apples comparison.; 5:55 AM, December 05, 2006
Anonymous said...: "This makes the concerns valid - as the dual and quad cores are supposed to be ~same power envelope, this would mean that doubling the cores AND changing the architecture only led to a 40% FP/70% INT... one would think dual to quad core ALONE would be this good... they must be slowing the core down to meet the power/thermals...."
NOT SO. You have to compare like with like, because the "performance per watt (PPW)" figure is the SAME no matter how many cores are involved. What the 70% figure means is that if a DUAL CORE rev. F2 CPU has a performance of M1 at Q1 watts, then a rev. H DUAL CORE CPU has a performance of M2 = 1.7M1 at Q1 watts. The PPW figures are: M1/Q1 for a DUAL CORE rev. F2 and 1.7M1/Q1 for a DUAL CORE rev. H.
Now let's look at QUAD CORES: for rev. F2 the performance is 2M1 and the power 2Q1, so PPW = 2M1/2Q1 = M1/Q1, the same as for DUAL CORE.
For rev. H the performance is 2M2 = 2(1.7M1) = 3.4M1 and the power 2Q1, so PPW = 2M2/2Q1 = 2(1.7M1)/2Q1 = 1.7M1/Q1, the same as DUAL CORE.
In other words, a quad core rev. H Barcelona is 3.4 times more powerful than a dual core rev. F2 Opteron at the SAME TDP. Which means you need to go BACK TO SCHOOL, MORON!; 8:56 AM, December 05, 2006
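
The performance-per-watt bookkeeping in the comment above can be written out as a short Python sketch. It uses the commenter's own model: a quad core is treated as two dual cores (twice the performance, twice the power), and rev. H gets a flat 1.7x performance at the same power. M1 and Q1 are normalized to 1 here, and the names are made up for illustration.

```python
M1 = 1.0   # performance of a dual-core rev. F2 (normalized)
Q1 = 1.0   # power of a dual-core rev. F2 (normalized)
UPLIFT = 1.7   # rev. H performance factor at equal power, per the comment

# (performance, power) for each configuration in the commenter's model.
configs = {
    "dual rev. F2": (M1, Q1),
    "dual rev. H": (UPLIFT * M1, Q1),
    "quad rev. F2": (2 * M1, 2 * Q1),
    "quad rev. H": (2 * UPLIFT * M1, 2 * Q1),
}

def ppw(perf, power):
    """Performance per watt."""
    return perf / power

for name, (perf, power) in configs.items():
    print(f"{name}: perf {perf:.1f}, power {power:.1f}, PPW {ppw(perf, power):.1f}")
```

Note that in this model the quad-core rev. H reaches 3.4x the dual-core rev. F2 performance while drawing 2 x Q1, so the 3.4x figure only holds at the same TDP if the quad-core part also fits in the dual-core power envelope.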
Anonymous said...: And in other news... 65nm X2 is about 67% the die area of a 90nm product, not the 50% people were (naively) expecting - "65nm will effectively double AMD's capacity, blah, blah, blah...."

Of course this should have been painfully obvious to anyone with a technical background when you looked at the reported SRAM cell sizes at IEDM (info available way back in late 2005); but then again this is Sharikou's blog so I guess I should not really be expecting any insightful technical comments when it comes to Si processing...

So the million dollar question for Sharikou: Why is AMD not getting 2X from 65nm in terms of die scaling? Are they trying to keep production costs up? Do they just not want to make the extra chips?

I'll wait for his analytical response before providing some actual information.

Now throw in yields, extra wafer processing costs (additional metal layer, extra process steps for all the new strain processes, movement to more high end litho equipment...) and it makes you wonder how much of a financial impact 65nm will have in the near future. Of course from technical perspective 65nm will be better, so I'm not saying 65nm is a bad idea but for all those folks expecting HUGE financial benefits from 65nm, you will need to wait for further shrinks and process maturity to see them (at least 1 more year). Of course AMD claims to have pulled in 45nm to 18 months, if true those benefits will be shortlived...; 12:45 PM, December 05, 2006
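
To put numbers on the die-area point above, here is a minimal Python sketch comparing an ideal linear shrink with the 67% figure quoted in the comment. It assumes, as a simplification, that die area scales with the square of the feature size for an ideal shrink and that dies per wafer are inversely proportional to die area; yield, wafer edge loss and the extra processing costs mentioned in the comment are ignored.

```python
OLD_NODE_NM = 90
NEW_NODE_NM = 65

# Ideal shrink: both die dimensions scale with the feature size.
ideal_area_ratio = (NEW_NODE_NM / OLD_NODE_NM) ** 2

# Figure quoted in the comment for the 65nm X2 relative to the 90nm part.
reported_area_ratio = 0.67

# Simplified capacity gain: dies per wafer taken as 1 / relative die area.
ideal_capacity_gain = 1 / ideal_area_ratio
reported_capacity_gain = 1 / reported_area_ratio

print(f"ideal area ratio:    {ideal_area_ratio:.3f}")
print(f"reported area ratio: {reported_area_ratio:.3f}")
print(f"ideal capacity gain:    {ideal_capacity_gain:.2f}x")
print(f"reported capacity gain: {reported_capacity_gain:.2f}x")
```

Under these assumptions the quoted 67% works out to roughly 1.5x the dies per wafer rather than the near 2x an ideal shrink would give.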
Anonymous said...: AMD FUD to the 65nm power released!; 2:12 PM, December 05, 2006
Anonymous said...: See the brain damage here?

Yes, unfortunately it is your brain damage I see. You probably didn't read Charlie's article carefully enough or just can't comprehend it. He did say it was a pretty amazing feat to just boot Server 2003. Also, in the past you lambasted Intel for doing a very controlled demo on Conroe where reporters weren't allowed to tweak demo systems but you think it is GREAT that AMD can just run task manager at an analysts conference. That shows that you are a hypocrite but I think everyone here already knows that. Finally, I think one of the analysts at that meeting summed it up best by saying "It wasn't much of a demo unless you count listening to server fans humming a demo"; 2:44 PM, December 05, 2006
Anonymous said...: How can you claim to be a ph.d and a "journalist" when you can't string a coherent sentence together?

Great question.... I wait for Dr. Sharifraud's response.; 2:46 PM, December 05, 2006
Anonymous said...: 65nm is out!!! AMD will surely take back the title now, since the Athlons easily beat the Core 2 Duos. Now that AMD went 65nm it's all over. Intel BK in about 2 quarters.

Intel's had its 65nm process online for over a year now. You tout AMD's 65nm process as if it's some huge new revolution. Intel has three 65nm fabs right now. 45nm coming next year.

What's worse for AMD is that there's no performance benefit from 65nm. All their faster parts (6000+ etc) are all made on the older 90nm process.; 5:41 PM, December 05, 2006
Anonymous said...: AMD got Opteron from Microsoft who in turn stole it from an independent designer.

Now when it comes to doing something new themselves, AMD is falling on their face.

Because stealing can only get you so far. When there is no talent, sooner or later, you hit the wall.

So it is really AMD that is not using their brain. Because they don't have one.; 6:35 PM, December 05, 2006
Anonymous said...: "AMD can't even get their A-step silicon to do anything but run task manager... how sad"

Sharikou described the "brain damage" symptom and you fell right into it after he described it.

A task manager with all cores running at near 100% load means the silicon is working hard on something.; 8:15 PM, December 05, 2006
Anonymous said...: Intel fans seem not able to comprehend a few basic points on computer performance:

1. Multi-core performance improvement depends heavily on the software you run and the number of cores you have. Even with Cinebench (which improves 90+% from single to dual cores), speedup is ~70% from dual to quad cores, and ~50% from quad to 8 cores. This is amongst the best scaling one could expect from common workstation/server software.

2. The 70% performance improvement AMD claimed is on database applications, which by nature don't scale nearly as well as rendering programs such as Cinebench.

3. In terms of new vs. old microarchitectures, one should look at / compare to the speedup seen from Core Duo to Core 2 Duo. Was it as great as 70% INT and 40% FP? Not even for SuperPi, dude. For almost all programs it's less than 30% (unless you believe Core Duo was 10+% slower than K8).

4. There could be many reasons for AMD to simulate quad-core performance. It's supposed to work with HT3 and DDR3, neither of which is available now. The first silicon is probably not speed-binned and does not run at optimal frequencies. The software/compiler may not be ready (to better optimize for newer instructions and the additional L3 cache, for example). Any one of these could make simulation necessary - and the simulated result could be very close to the real results.

A 70% performance per watt improvement on database applications is significant, no matter how you look at it (unless, of course, you believe AMD will be far off with its estimate when the chip is actually released).; 9:05 PM, December 05, 2006
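
The Cinebench scaling figures in point 1 above can be chained into cumulative speedups with a few lines of Python. This is a sketch using the approximate percentages from the comment (90%, 70% and 50% per doubling) taken at face value; the variable names are made up, and real results will vary by system and software version.

```python
# Gain per core-count doubling, as quoted in the comment (approximate).
doubling_gains = [(2, 0.90), (4, 0.70), (8, 0.50)]

speedup = {1: 1.0}   # cumulative speedup relative to a single core
current = 1.0
for cores, gain in doubling_gains:
    current *= 1 + gain
    speedup[cores] = current

# Parallel efficiency: how much of the ideal N-times speedup is realized.
efficiency = {cores: s / cores for cores, s in speedup.items()}

for cores in speedup:
    print(f"{cores} cores: {speedup[cores]:.2f}x speedup, "
          f"{efficiency[cores]:.0%} efficiency")
```

Read this way, each doubling of cores buys less than the previous one, which is the comment's point about why a 70% gain on a poorly scaling workload is not small.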
Anonymous said...: "if this is a chip to chip comparison, I don't think this (simulated) data is so great. If it is normalized for core count, than yes it is very impressive...."

It doesn't matter. The comparison is on performance per watt. That should say it all for all non-nuthead people.; 9:10 PM, December 05, 2006
Anonymous said...: "What if this stepping had a critical speed path which will need to be fixed on a future stepping...and say the CPU was running at 500MHz or 750MHz or 1 Ghz. Would the CPU cores be taxed at all then?"

The CPU will not reach its power usage envelope, but you will see near 100% CPU usage (precisely due to the critical path and slower clockrate).; 9:13 PM, December 05, 2006
Anonymous said...: "The only thing you can conclude from near 100% CPU usage is that the cores are getting taxed heavily - you don't have enough data to conclude "something heavy duty was running"."

If running Task Manager alone could load all 16 cores to nearly 100%, how long do you think it would take to boot Windows? Remember, Windows was reportedly booted in front of the audience. So nobody, not even Charlie at the Inquirer, noticed something wrong (i.e. extremely slow) when the machine was booted and Task Manager was started?

FYI, just try to run Windows XP with K8 X2 or C2D core speed fixed at 800MHz. You will notice the slow-down with each and every open/close operation.; 9:19 PM, December 05, 2006
Anonymous said...: "This makes the concerns valids - as the dual and quad cores are supposed to be ~same power envelop"

This (good property) is not true for Intel, though.; 9:24 PM, December 05, 2006
Anonymous said...: 3. In terms of new vs. old microarchitectures, one should look at / compare to the speedup seen from Core Duo to Core 2 Duo. Was it as great as 70% INT and 40% FP? Not even for SuperPi, dude. For almost all programs it's less than 30% (unless you believe Core Duo was 10+% slower than K8).

You're missing one crucial point though:- Comparing Core Duo to Core 2 Duo was still comparing two dual cores. Not a dual core to a quad core.

You'd expect more of an improvement considering that they are doubling the number of processing cores and using a "new" architecture.; 11:39 PM, December 05, 2006
Anonymous said...: A 70% performance per watt improvement on database applications is significant, no matter how you look at it (unless, of course, you believe AMD will be far off with its estimate when the chip is actually released).
It's not bad, it's just that Intel Clovertown already matches that performance and power consumption today.; 1:13 AM, December 06, 2006
Anonymous said...: Keep in mind that if you look for other 2P Opteron systems in the database, only one 2P Opteron system beats Intel; the rest lose by about 10 points, and they are running a similar operating system with the same configs.
http://www.spec.org/cpu2000/results/res2006q4/cpu2000-20061030-07857.html; 1:30 AM, December 06, 2006
Anonymous said...: "You're missing one crucial point though:- Comparing Core Duo to Core 2 Duo was still comparing two dual cores. Not a dual core to a quad core."

No, I was not, but you are. The point here is performance per watt. Merom does not have 70% increased performance per watt over P-M, not even close. Even talking about absolute performance, 70% speedup on database apps from 1 to 2 cores is already impressive, let alone from 2 to 4 cores (remember AnandTech's "negative scaling" article?); 2:01 AM, December 06, 2006
Anonymous said...: "This makes the concerns valids - as the dual and quad cores are supposed to be ~same power envelop"

This (good property) is not true for Intel, though.
What's so great about having extremely hot dual-cores? It's easy to fit with a dual-core power envelope when its as high as Intel's quad-core power envelope.; 3:22 AM, December 06, 2006
Anonymous said...: "Intel Clovertown already matches that performance and power consumption today."

What are you smoking? Clovertown got nowhere near 70% speedup over Woodcrest for real database applications. Don't tell me it serves 70% more webpages per second with MySQL - that type of static information retrieval is not what database apps are about.

The main purpose of Clovertown is to catch up with Opteron in terms of total number of cores you can have. With 8-way dual-core Opteron you can have 16 cores today and 32 cores next year. Woodcrest allows you to have up to 4 cores; Clovertown, 8 cores. For highly multithreaded app like Cinebench, 8-core Clovertown is 31% faster than 4-core Woodcrest (search for "31"). Even today's 4-way Opteron servers beat that.

Sharikou is actually right about one point: Core 2 is (probably purposely) made for low-end servers. Intel fans will say, "but most servers are 2P, and you can build great farms with 2P servers!" However, with 2P only, you lose power efficiency (just the additional power supplies will eat 30+% electricity) and manageability (2x the number of failures and 2x the management).; 12:09 PM, December 06, 2006
Anonymous said...: "It's easy to fit with a dual-core power envelope when its as high as Intel's quad-core power envelope."

What makes you think there won't be an "HE" version Barcelona?; 12:11 PM, December 06, 2006
Anonymous said...: Anonymous said...

"This makes the concerns valids - as the dual and quad cores are supposed to be ~same power envelop"

This (good property) is not true for Intel, though.
What's so great about having extremely hot dual-cores? It's easy to fit with a dual-core power envelope when its as high as Intel's quad-core power envelope.

3:22 AM, December 06, 2006
-----------
Don't forget that all quad-cores (from Intel) have a way slower frequency per core.

The top Kentsfield is like 2.6 GHz
vs. the 3 GHz and something of the Extreme dual-core.; 1:28 PM, December 06, 2006
Anonymous said...: No, I was not, but you are. The point here is performance per watt. Merom does not have 70% increased performance per watt over P-M, not even close.
Merom vs P-M is probably around 140% increased performance/watt, twice as many cores, each core significantly more powerful while using 10% more power on load or so.; 1:49 PM, December 06, 2006
Anonymous said...: What are you smoking? Clovertown got no where near 70% speedup over Woodcrest for real database applications. Don't tell me it serves 70% more webpages per second with MySql - that type of static information retrieval is not what database app is about.
The estimated OLTP score for Barcelona can be matched by Clovertown today. AMD merely picked the slower of the two HP submissions.

The main purpose of Clovertown is to catch up with Opteron in terms of total number of cores you can have. With 8-way dual-core Opteron you can have 16 cores today and 32 cores next year. Woodcrest allows you to have up to 4 cores; Clovertown, 8 cores. For highly multithreaded app like Cinebench, 8-core Clovertown is 31% faster than 4-core Woodcrest (search for "31"). Even today's 4-way Opteron servers beat that.
No they don't, in fact they scale very similarly to Clovertown:

http://forums.2cpu.com/showpost.php?p=624674&postcount=9; 7:22 PM, December 06, 2006
Anonymous said...: What makes you think there won't be an "HE" version Barcelona?
Which will be competing against a LV version Clovertown.; 7:24 PM, December 06, 2006
Anonymous said...: "It doesn't matter. The comparison is on performance per watt. That should say it all for all non-nuthead people."

So I take it you believe 4x4 is complete crap based on its performance per watt then?

You are also taking all of the SIMULATED data as gospel; one would think you would actually want to wait and see an actual demo of an application run first (and also look at the simulated 40% FP, instead of solely focusing on 70% INT...)

From what the INQ is saying (so take it with a grain of salt) - the planned clock speed was significantly reduced to fit the same power envelope. Also I believe the dual core Opterons were 1MB cache/core; these are 512KB/core (which also helps with power). They did add 2MB L3 cache which essentially brings the cache/core ratio even (albeit half of it will now be slower L3)

I also find it amusing that people compare Core 2 to Core as the previous architecture when that is only the case in the mobile area (probably so it makes the Core 2 performance leap look smaller). If one compares a dual core P4 server or desktop chip to a Kentsfield, is there a >40% FP / 70% INT jump?; 10:33 PM, December 06, 2006
Anonymous said...: "What makes you think there won't be an "HE" version Barcelona?
Which will be competing against a LV version Clovertown."

Woodcrest LV is 80W, but Opteron HE is 65W. What say you now?; 10:41 PM, December 06, 2006
Anonymous said...: "Dont forget that all quadcores have way slower freq per core ( from intel )..."

Way slower? Kentsfield - 2.66GHz, X6800 - 2.93GHz.

I find it amusing you rounded incorrectly in both directions (to try to maximize the speed delta?) - the difference is roughly 10%. Of course TDP is higher for Kentsfield due to the fact that core speed is "pretty close" to the dual core. What would be interesting would be for someone to downclock the processor to see what speed Kentsfield would have to run at to meet the TDP of the top end dual core (X6800).

In AMD's case, to maintain the same power envelope (let's use the 68 Watt one) they are planning 1.9-2.0GHz K8L vs 2.8/3.0GHz Opterons (not sure what the top speed grade is for 68 Watt). If you prefer the 95 Watt envelope, the K8L is targeted for 2.1-2.3GHz (which is still a substantial speed drop); 10:51 PM, December 06, 2006
Anonymous said...: Oops... I was wrong. Woodcrest LV is only 40W, not 80W.; 10:52 PM, December 06, 2006
Anonymous said...: "For highly multithreaded app like Cinebench, 8-core Clovertown is 31% faster than 4-core Woodcrest (search for "31"). Even today's 4-way Opteron servers beat that.

No they don't, in fact they scale very similarly to Clovertown:

http://forums.2cpu.com/showpost.php?p=624674&postcount=9"

Huh? Did you read the link you referred? From 3.2x to 4.8x speedup the improvement is something like 50%, which beats the 31% of Clovertown over Woodcrest soundly!; 10:57 PM, December 06, 2006
Anonymous said...: "In other words, a quad core rev. H Barcelona is 3.4 times more powerful than a dual core rev.F2 Opteron at the SAME TDP. Which means you need to go BACK TO SCHOOL, MORON!"

You need to work on the algebra, buddy... AMD themselves say 70%INT/40%FP normalized. You start your calculations with these numbers and end up 340% better? Ummmm....

So all of us understand, your calculations surmise that the Barcelona core is 3.4 times more powerful AT SAME TDP and therefore the improvement we should get is 70%INT?; 11:02 PM, December 06, 2006
Anonymous said...: "Merom vs P-M is probably around 140% increased performance/watt, twice as many cores, each core significantly more powerful while using 10% more power on load or so."

You must be smoking heavily to say this. First, Cinebench from 1 to 2 core can have speedup ~80%; that's pure SMP, and has nothing to do with how good a "new architecture" is. From 2 to 4 cores the speedup of Core 2 w/ Cinebench drops to something like 50-60%. From 4 to 8 cores, the speedup is even lower.

Second, Core2 to P-M is something like 10% improvement on Cinebench. And FYI, Core Duo is really two Pentium-M with a shared L2 cache.

Thus w/ Cinebench, a 1-socket Clovertown would be lucky if it's 66% faster than a 1-socket Core Duo based server. The 140% you mentioned only exists in Intel fan's imaginary fairyland created by Intel's marketing force.; 11:27 PM, December 06, 2006
Anonymous said...: You are comparing Core2 Duo to Core Duo, not Core2 Duo to Pentium M; 10:00 AM, December 07, 2006
Anonymous said...: "Second, Core2 to P-M is something like 10% improvement on Cinebench. And FYI, Core Duo is really two Pentium-M with a shared L2 cache."

10% or less, don’t forget that the link you provided is vs the 4MB version. I think the difference between Core Duo 2MB vs Core 2 Duo 2MB is much smaller.

http://www.xbitlabs.com/articles/mobile/display/core2duo_12.html; 10:48 AM, December 07, 2006
Anonymous said...: "First, Cinebench from 1 to 2 core can have speedup ~80%; that's pure SMP, and has nothing to do with how good a "new architecture" is. From 2 to 4 cores the speedup of Core 2 w/ Cinebench drops to something like 50-60%."

That is a great example of a badly written multithreaded application. It is relatively simple to write a perfectly scaling ray tracer. I've seen the exact same program scale linearly with 2x increments from one to two to four cores. If anyone is interested then that program is here

I have two questions:
1) where did that Java processor story go?
2) Why isn't there a new article about the rumours of K8L based Opteron speeds? Simulated test results earned a story and this one doesn't?

One place where they are reported is here. Maximum speed 2.5GHz. How high do you think C2Q will reach by that time? The current official maximum is already clocked higher than AMD's future product.; 11:51 AM, December 07, 2006
Anonymous said...: "And FYI, Core Duo is really two Pentium-M with a shared L2 cache."

Edward, you really don't think this, do you? I've always thought you were a little sharper and knew a little more about microarchitectures than Sharikou and would not resort to such fanboy statements...

And I think the comments about 70%/40% is both increasing cores AND changing to a new architecture? (and normalized per Watt - did you do this for P-M/Merom comparison?)

If AMD gets 40% improvement doubling cores and changing architecture, how much FP improvement would you expect from a dual core K8L over a dual core K8 (THAT is the architecture comparison, no?). This would be even less than 40%, no?; 12:22 PM, December 07, 2006
Anonymous said...: Huh? Did you read the link you referred? From 3.2x to 4.8x speedup the improvement is something like 50%, which beats the 31% of Clovertown over Woodcrest soundly!
Did you forget that Clovertown is clocked 12% slower than Woodcrest? What's the speedup between 1 thread and 8 threads for Clovertown? That's right, 4.875X.; 12:35 PM, December 07, 2006
Anonymous said...: Thus w/ Cinebench, a 1-socket Clovertown would be lucky if it's 66% faster than a 1-socket Core Duo based server. The 140% you mentioned only exists in Intel fan's imaginary fairyland created by Intel's marketing force.
Yawn. You would be wrong, Yonah scores 630 on Cinebench, Kentsfield scores 1337.

Besides, I was referring to your comment:
No, I was not, but you are. The point here is performance per watt. Merom does not have 70% increased performance per watt over P-M, not even close
Which it clearly does since Merom is more than double the computing power of Dothan at a minor increase in electrical power.; 12:45 PM, December 07, 2006
Anonymous said...: Here is a review comparing the Core 2 Duo to a Pentium M.

The performance is pretty substantial, more like 28% in Cinebench.; 12:56 PM, December 07, 2006
Anonymous said...: "You need to work on the algebra, buddy... AMD themselves say 70%INT/40%FP normalized. You start your calculations with these numbers and end up 340% better? Ummmm...."

If something is 3.4X better, the percentage change is ((3.4X - X) / X) * 100 = 240%
BACK TO SCHOOL, MORON.; 4:33 PM, December 07, 2006
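Much of the back-and-forth here comes from mixing "N times as fast" with "N% faster". A minimal Python sketch of the two conversions follows; the helper names are hypothetical and only illustrate the arithmetic in the comment above.

```python
def ratio_to_percent_increase(ratio):
    """Convert 'ratio times as fast' into a percentage increase."""
    return (ratio - 1.0) * 100.0


def percent_increase_to_ratio(percent):
    """Convert 'percent faster' into a plain multiplier."""
    return 1.0 + percent / 100.0


# 3.4x as powerful is a 240% increase, not 340%.
print(f"3.4x -> +{ratio_to_percent_increase(3.4):.0f}%")

# AMD's quoted +70% INT and +40% FP correspond to 1.7x and 1.4x.
print(f"+70% -> {percent_increase_to_ratio(70):.1f}x")
print(f"+40% -> {percent_increase_to_ratio(40):.1f}x")
```

Whether 3.4x is the right starting figure at all is a separate question that the thread disputes; the sketch only covers the unit conversion.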
Anonymous said...: Doc.,

what happened to the latest story? Why did you retract it?; 5:50 PM, December 07, 2006
Anonymous said...: Sharikou..

Where is your next BK prediction?

AMD has got the war won. INTEL has invested billions in inferior 65nm, and now AMD's quad-core is out and kicking ass. They are totally screwed. They are investing another 5-7 billion in a broken 45nm technology with a broken CPU.

I say BK by end of 2007.. what do you think?; 6:13 PM, December 07, 2006
Anonymous said...: AMD's quad core is out? They've shown "estimated" performance... and shown a few people a 4P server running task manager. That's still a LONG way from having a product to sell.

In the meantime Intel has AMD beat in every market segment except 4P/8P servers, where AMD is more competitive. Tulsa, even though it's Netburst based, still does well in 4P and up thanks to its 16MB L3 cache.

Intel has Tigerton coming for 4P and up next year ... any advantage AMD will have because of Barcelona will be gone. Native quad cores for the desktop and 2P server market too.

In the meantime, while AMD is blurting out incessant hyperbole about how great a native quad core is vs. an MCM quad core, Intel is selling quad core processors and regaining server marketshare. Yeah, way to go AMD!!

The only new product they have is the 4x4. They really should have called it HeatxLow Performance, as that's what it is. The tests show that the 4x4 FX-74 can barely match the Q6600 from Intel @ 2.4GHz. It gets utterly destroyed by the QX6700 running at 2.66GHz.

I have to say it looks like AMD is BK by Q4 '07. I doubt they'll even have time to get Barcelona out.

In the meantime they have plenty of time to tell people how great native quad core processors are, and that *gasp* *shock* Intel's integrated graphics are awful for gaming!; 8:38 PM, December 07, 2006
Anonymous said...: "Of course TDP is higher for Kentsfield due to the fact that core speed is "pretty close" to the dual core."

Nope, Kentsfield TDP is higher because its quad-core is non-native. It takes electricity to supply signals and power to the additional cores via extra pinouts, not to mention the total # of transistors in two dual-cores will be more than in a native quad core.

"In AMD's case to maintain the same power envelop (let's use the 68Watt one) they are planning 1.9-2.0GHz K8L vs 2.8/3.0GHz Opterons (not sure what top speed grade is for 68Watt)."

Opteron 2216 HE (68W) is 2.4GHz, whereas Barcelona 68W is 2.0GHz. OTOH, Xeon 51xx LV (40W) is 2.33GHz, with Clovertown (Xeon 5310LV) 50W at only 1.6GHz.

If your datacenter previously used Woodcrest, it may have thermal problems upgrading to Clovertown even after dropping almost 1/3 of the operating cycles per core. Even for perfectly scaled apps, you gain only 36% speedup. (Did I mention thermal problems?)

With Opteron, you sacrifice only 1/6 clockrate and have no thermal problem to upgrade from dual to quad cores. That is up to 66% speedup with the same core.

Don't forget quad-core Xeon servers require even beefier northbridge to work efficiently. I see no rosy way for Intel server solutions. No wonder AMD's lucrative server market is still growing faster than Intel's.

In terms of power consumption, we see the same pattern: AMD is able to fight Intel with inferior lithography but better architecture (and much less spending on both marketing and R&D, which makes you wonder how much Intel has squandered).; 8:41 PM, December 07, 2006
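The 36% and 66% figures in the comment above follow from a simple model: double the cores, scale by the clock ratio, and assume perfect scaling. Here is a hedged Python sketch of that model; the function name is invented for illustration, the clock speeds are the ones the commenter quotes, and nothing here accounts for memory bandwidth or the FSB.

```python
def ideal_upgrade_speedup(old_ghz, new_ghz, core_multiplier=2):
    """Best-case throughput ratio when core count grows and per-core clock drops.

    Assumes perfectly parallel software and identical work per clock,
    which is an upper bound rather than a prediction.
    """
    return core_multiplier * new_ghz / old_ghz


upgrades = {
    "Xeon LV 2.33GHz dual -> 1.6GHz quad": (2.33, 1.6),
    "Opteron HE 2.4GHz dual -> 2.0GHz quad": (2.4, 2.0),
}

for label, (old_ghz, new_ghz) in upgrades.items():
    ratio = ideal_upgrade_speedup(old_ghz, new_ghz)
    clock_drop = 1.0 - new_ghz / old_ghz
    print(f"{label}: clock -{clock_drop:.0%}, ideal speedup +{ratio - 1.0:.0%}")
```

With these inputs the results land within a point or so of the quoted numbers; the small differences are rounding.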
Anonymous said...: "Yawn. You would be wrong, Yonah scores 630 on Cinebench, Kentsfield scores 1337."

Do they use the same memory? Same FSB/chipset? Same frequency? Same amount of cache? Comparing Clovertown to Yonah directly is not valid because one is server setup and the other is mobile (FYI, I didn't say Yonah, I said "Core Duo based").

Did you miss this link for a relatively fairer comparison? Tell me again what's the speedup?; 8:48 PM, December 07, 2006
Anonymous said...: "Edward, you really don't think this, do you? I've always thought you were a little sharper and knew a little more about microarchitectures than Sharikou and would not resort to such fanboy statements..."

True, Yonah has improvements over Dothan. The shared L2 cache mandates some differences such as pipeline length and memory access. SSE is improved, cache is larger, and even a new socket is used. But all those differences translate to somewhere around 10% extra performance, which could be almost entirely attributed to the larger shared L2 cache size.; 9:03 PM, December 07, 2006
Anonymous said...: Anonymous said...

"...In other words, a quad core rev. H Barcelona is 3.4 times more powerful than a dual core rev.F2 Opteron at the SAME TDP..."

Where are you getting this 3.4 times?

The AMD presentation stated that the 70% and the 40% performance improvements are going from dual core to quad core in the same thermal envelope, you do not get to add it twice.

1.7 * 2220SE 2.8GHz at 120W = 2.5GHz at 120W quad core's projected/simulated performance in database applications.

1.4 * 2220SE 2.8GHz at 120W = 2.5GHz at 120W quad core's projected/simulated performance in SPECfp.

And in regard to dual core, it's probably half, and then adjust for increased clock speeds.; 9:15 PM, December 07, 2006
Anonymous said...: 1.9Ghz for the low power K8L? I didn't realize they were converting the Geode into a quad core (kidding).

Kidding aside, it looks like one will need to carefully match the right K8L processor to the applications planned, as I doubt multithreaded software will be "pervasive" (you know, like 64 bit has become pervasive, but I digress) by the time it is out.

While this is also true with the Intel quads the core frequency doesn't seem to be as different between dual/quad core. Maybe as the AMD 65nm transistor performance improves they will be able to bump the speed while keeping power under control.; 11:06 PM, December 07, 2006
S said...: There is only one way Intel will BK: if AMD buys TSMC. Long shot? I guess so!; 4:18 AM, December 08, 2006
Anonymous said...: enumae said...

Here is a review comparing the Core 2 Duo to a Pentium M.

The performance is pretty substantial, more like 28% in Cinebench.

12:56 PM, December 07, 2006

SSE3 and SSE2 instructions work wonders, don't they?

I think the real comparison should be Pentium M vs Yonah (Core Duo), then Yonah vs Merom (Core 2 Duo); 11:44 AM, December 08, 2006
Anonymous said...: Anonymous said...

Thus w/ Cinebench, a 1-socket Clovertown would be lucky if it's 66% faster than a 1-socket Core Duo based server. The 140% you mentioned only exists in Intel fan's imaginary fairyland created by Intel's marketing force.
Yawn. You would be wrong, Yonah scores 630 on Cinebench, Kentsfield scores 1337.

Besides, I was referring to your comment:
No, I was not, but you are. The point here is performance per watt. Merom does not have 70% increased performance per watt over P-M, not even close
Which it clearly does since Merom is more than double the computing power of Dothan at a minor increase in electrical power.

12:45 PM, December 07, 2006

Were they using 32-bit Cinebench on the Kentsfield/Clovertown,
or 64-bit?; 11:45 AM, December 08, 2006
Anonymous said...: Performance Per Watt ETC.
Here are the figures:
Opteron Model 270 84.0Gflops/KW
Barcelona 337.0Gflops/KW
337.0/84.0=4.0, a 300% increase near enough!
What is this 40% nonsense?
Both processors at 95W.
See here: http://www.rubyworks.net/forumz/viewtopic.php?t=340&start=0; 3:21 PM, December 08, 2006
Anonymous said...: "But all those differences translate to somewhere around 10% extra performance, which could be almost entirely attributed to the larger shared L2 cache size."

Edward this is crap and even YOU know it (ALMOST ENTIRELY ATTRIBUTED TO LARGER SHARED CACHE SIZE?) - just look at the benchmarks which looked at underclocked 6800 (4MB cache) compared to a 2MB model (I think this was maybe Anand?) The delta if I recall correctly was ~3%...

It's just funny how you started drinking the Kool-Aid too.

Yeah just a Pentium 3 with shared cache - this coming from someone who claims to know about computer architectures... Clearly you know all this is not the case and are just acting like a fanboy. To say almost all of the improvement is from larger cache, I guess those 2MB Conroe's(6300, 6400) match Yonah performance clock for clock then?

You, sir, are quickly becoming as big a joke as Sharikou!; 4:25 PM, December 08, 2006
Anonymous said...: Do they use the same memory? Same FSB/chipset? Same frequency? Same amount of cache? Comparing Clovertown to Yonah directly is not valid because one is server setup and the other is mobile (FYI, I didn't say Yonah, I said "Core Duo based").
You said Pentium-M, which of course refers to Dothan or Banias.

Did you miss this link for a relatively fairer comparison. Tell me again what's the speedup?
Why do you insist on using dual-core vs dual-core comparisons for Intel while using a dual-core vs quad-core comparison for AMD? The point is a 70% increase in performance/watt is nothing special at all, considering Merom achieved much more than that.; 4:53 PM, December 08, 2006
Anonymous said...: Nope, Kentsfield TDP is higher because its quad-core is non-native. It takes electricity to supply signals and power to the additional cores via extra pinouts, not to mention the total # of transistors in two dual-cores will be more than in a native quad core.
It has the same TDP as Barcelona.

if your datacenter previously used Woodcrest, it may have thermal problem upgrading to Clovertown even after dropping almost 1/3 operating cycles per core. Even for perfectly scaled apps, you gain only 36% speedup. (Did I mention thermal problem?)

With Opteron, you sacrifice only 1/6 clockrate and have no thermal problem to upgrade from dual to quad cores. That is up to 66% speedup with the same core.
If you have problems with Clovertown, you will have problems with Barcelona since they will have the same power consumption. And where is this 66% number coming from? If you're going to use the Cinebench numbers, then it shows Clovertowns scale exactly the same as 4S dual-core Opterons.; 5:00 PM, December 08, 2006
Anonymous said...: "Performance Per Watt ETC.
Here are the figures:
Opteron Model 270 84.0Gflops/KW
Barcelona 337.0Gflops/KW
337.0/84.0=4.0, a 300% increase near enough!
What is this 40% nonsense?
Both processors at 95W.
See here: http://www.rubyworks.net/forumz/viewtopic.php?t=340"

Link to MadModMike's; I don't know about you all but that's where I go for unbiased, objective data - when of course I'm not visiting this site! Might as well just start quoting AMDZone...

And the Barcelona #'s there came from... task manager? (that's all I've seen demo'd thus far); 5:43 PM, December 08, 2006
Anonymous said...: "337.0/84.0=4.0, a 300% increase near enough!"

Clovertown has exactly the same FLOPS rating as K8L, 4 single precision SSE SIMD instructions per cycle.

A 2GHz Clovertown consumes around 80W of power at normal settings, or when scaled to AMD's power rating (1.2*Intel=1*AMD) the two are about equal in GFLOPS per kW. But if you take the 2.33GHz 80W Clovertown then you have around 1000/(80*1.2)*2.33*4*4=~392GFLOPS per kW, or in other words ~17% more bang per kW. Those Clovertowns are available now, not in >4 months.

Also, the current Clovertown runs at a ~6.5% higher clock speed than the fastest K8L that will be on the market in around 4-7 months. What do you think the clock speed of Clovertown could be by the time it arrives? Also keep in mind that in Q1 07 there will be 50W Clovertowns on the market.; 3:25 AM, December 09, 2006
Anonymous said...: Another old story says that in Q3 07 Intel will have 3.46-3.74GHz 45nm quad-cores. I wonder what AMD will have against that. Yet another upgrade to that rather expensive and power-hungry 4x4?; 3:32 AM, December 09, 2006
Anonymous said...: "Why do you insist on using dual-core vs dual-core comparisons for Intel while using a dual-core vs quad-core comparison for AMD?"

It's because I had the engineering training that tells me how to approach the problem (and you probably don't).

When there is no valid Core 2 Quad to Core Duo comparison, you divide the problem into two orthogonal ones: first C2D to Core Duo, then C2Q to C2D. The first as we know is ~1.1x. The second as we also know is ~1.5x. The combined best is thus 1.65x, or 65%. (This is sheer performance; performance-per-watt will be worse, since C2Q has ~20% higher TDP than C2D.)

"You said Pentium-M, which of course refers to Dothan or Banias."

You are factually wrong. So Intel fans not only spread FUD against AMD, but also others now? Search for "1-socket Core Duo" on this page and be a proven FUD spreader.; 10:14 AM, December 09, 2006
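The two-step estimate in the comment above (Core Duo to Core 2 Duo, then Core 2 Duo to Core 2 Quad) is just a product of ratios, with a division by the power ratio for performance per watt. The Python sketch below uses the commenter's own figures (~1.1x, ~1.5x, ~20% higher TDP) as inputs; the function names are hypothetical, and treating the two steps as independent is the commenter's assumption, not an established fact.

```python
def chain_speedups(*ratios):
    """Multiply successive speedup ratios, assuming the steps are independent."""
    total = 1.0
    for ratio in ratios:
        total *= ratio
    return total


def perf_per_watt_ratio(perf_ratio, power_ratio):
    """Relative performance per watt given relative performance and relative power."""
    return perf_ratio / power_ratio


# Commenter's figures: C2D is ~1.1x Core Duo, C2Q is ~1.5x C2D on Cinebench.
combined = chain_speedups(1.1, 1.5)

# Commenter's figure: C2Q has ~20% higher TDP than C2D (TDP used as a power proxy).
per_watt = perf_per_watt_ratio(combined, 1.2)

print(f"combined speedup: {combined:.2f}x (+{combined - 1:.0%})")
print(f"per watt:         {per_watt:.3f}x (+{per_watt - 1:.1%})")
```

The replies dispute the ~1.5x input (one cites 67% scaling for the QX6700 over the E6700), so the output is only as good as the ratios fed in.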
Anonymous said...: "To say almost all of the improvement is from larger cache, I guess those 2MB Conroe's(6300, 6400) match Yonah performance clock for clock then?"

What's the matter with you? I wasn't comparing Conroe with Yonah. I was comparing a single Yonah core with a P-M core. Please do your reading before you comment.

Have you benched them on single-threaded applications at all? Just FYI, the difference is well under 10%, more like 5%. I should've been more specific about the core (Banias has 1MB L2, but Dothan has 2MB), but the point here is the core improvement from Dothan to Yonah is insignificant (unless you call 5% significant). The main difference between these two microarchitectures is that the latter has a shared L2 cache.

This is understandable since from Banias to Conroe you only have somewhere around 25% difference (per core) anyway. How much do you expect from Dothan to Yonah?; 12:13 PM, December 09, 2006
Anonymous said...: "Link to MadModMike's; I don't know about you all but that's where I go for unbiased, objective data - when of course I'm not visiting this site! Might as well just start quoting AMDZone..."

You sound like a jilted whore; get some therapy, no wait, get a LOT of therapy!; 1:23 PM, December 09, 2006
Anonymous said...: "But if you take the 2.33GHz 80W Clovertown then you have around 1000/(80*1.2)*2.33*4*4=~392GFLOPS per kW, or in other words ~17% more bang per kW."

Typical Intel fanboy comparison: apples to oranges.
Let's look at a 2.3GHz Barcelona at 95W. Yes, that's right, 95W.
(1000/95)*2.3*4*4 = 387 GFLOPS/kW, near enough. I make that a 1.29% advantage, NOT 17%! Remember that Intel has had time to develop its 65nm process, so these figures will only improve for AMD once their 65nm process matures.

How about a 2.0GHz Barcelona HE at 68W? (1000/68)*2*4*4 = 470 GFLOPS/kW, near enough.; 2:36 PM, December 09, 2006
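The GFLOPS-per-kW figures traded in the last two comments all use the same formula: clock (GHz) times cores times FLOPs per cycle per core, divided by power in kW. Below is a small Python sketch for re-running it. The function name and the `power_scale` parameter are made up for illustration; 4 cores, 4 FLOPs per cycle, and the 1.2 Intel-to-AMD power adjustment are the commenters' assumptions, and the result is a theoretical peak rather than a measurement.

```python
def peak_gflops_per_kw(ghz, watts, cores=4, flops_per_cycle=4, power_scale=1.0):
    """Theoretical peak GFLOPS per kW of rated power.

    power_scale multiplies the rated watts, e.g. 1.2 to apply the
    commenter's claimed Intel-to-AMD TDP adjustment.
    """
    peak_gflops = ghz * cores * flops_per_cycle
    kilowatts = watts * power_scale / 1000.0
    return peak_gflops / kilowatts


configs = [
    ("Clovertown 2.33GHz, 80W x 1.2", 2.33, 80, 1.2),
    ("Barcelona 2.3GHz, 95W", 2.3, 95, 1.0),
    ("Barcelona HE 2.0GHz, 68W", 2.0, 68, 1.0),
]

for label, ghz, watts, scale in configs:
    value = peak_gflops_per_kw(ghz, watts, power_scale=scale)
    print(f"{label}: {value:.1f} GFLOPS/kW")
```

Run as written, the Clovertown expression evaluates to roughly 388 rather than the ~392 quoted, so the gap to the 95W Barcelona figure is even smaller than the 1.29% worked out above.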
Anonymous said...: It's because I had the engineering training that tell me how to approach the problem (and you probably don't).

When there is no valid Core 2 Quad to Core Duo comparison, you divide the problem into two orthogonal ones: first C2D to Core Duo, then C2Q to C2D. The first as we know is ~1.1x. The second as we also know is ~1.5x. The combined best is thus 1.65x, or 65%. (This is sheer performance; performance-per-watt will be worse, since C2Q has ~20% higher TDP than C2D.)
Your approach is worthless, plus your Cinebench numbers are wrong. The QX6700 scales 67% from an E6700. Plus what's the point of using a 3D rendering test to compare with an AMD figure that was done using a database test? There is already an HP 2S Clovertown system that scores 70% more than the fastest 2S Opteron system in the database test, TPC-C.

"You said Pentium-M, which of course refers to Dothan or Banias."

You are factually wrong. So Intel fans not only spread FUD against AMD, but also others now? Search for "1-socket Core Duo" on this page and be a proven FUD spreader.

This is what you said:

No, I was not, but you are. The point here is performance per watt. Merom does not have 70% increased performance per watt over P-M, not even close.

Merom is over 100% vs Pentium-M.; 9:36 PM, December 09, 2006
Scientia from AMDZone said...: As far as Yonah versus Dothan goes, there wasn't much change in speed. Most of the improvement was for SSE. There wasn't much change in integer performance. The extra cache does help, but Yonah's cache is also 40% slower than Dothan's, which somewhat balances that.; 8:26 AM, December 10, 2006
Anonymous said...: Edward this is what he's talking about...

Second, Core2 to P-M is something like 10% improvement on Cinebench. And FYI, Core Duo is really two Pentium-M with a shared L2 cache.

He's not spreading FUD, he's talking about your 10% claim.; 11:12 AM, December 10, 2006
Anonymous said...: Comparisons, comparisons... A proper comparison must treat the two systems as blackbox A and blackbox B. If blackbox A does the same work as B but is faster, cheaper and draws less AC power than blackbox B, then undoubtedly blackbox A is better than B. The rest is blah blah nonsense. For almost a year now, blackbox A has been Intel's solutions, and blackbox B, ehm... sorry, AMDull's solutions. However I'm patient; I'm an AMD owner waiting for the new K8L before deciding about my new PCs. AMD should reconsider the price of its "top" solutions, which is currently too high, far higher than the competition's products.; 11:44 AM, December 12, 2006
Anonymous said...: "He's not spreading FUD, he's talking about your 10% claim."

If Intel fanboys like you had the ability to read, you would've noticed that there is a link to back up the 10%, which is actually not my claim.

The P-M was a typo. I meant to say C2D being 10% faster than C1D on Cinebench (again, not my claim, follow the link yourself), then C1D is not much more than 2 Dothans sharing an L2 cache (plus better SSE).

Anyone can cut my words from one place and paste it on another context. By doing so he is spreading FUD.; 4:07 AM, December 13, 2006
Anonymous said...: "Your approach is worthless, plus your Cinebench numbers are wrong. The QX6700 scales 67% from a E6700."

You really don't understand, do you? QX6700 is just a E6700 SMP with 25% faster FSB. There's no magic there. If a software scales well with SMP, it will do with QX6700, and vice versa.

Plus, dual Clovertown is 31% over dual Woodcrest on Cinebench. I quote, not claim.

"Plus what's the point of using a 3D rendering to compare with an AMD figure that was done using a database test."

The point is to use Cinebench as an example to show how much # of cores can affect scalability (i.e., greatly diminished return with higher # of cores).

Database apps is a very broad term. So is 3D rendering.

"There is already a HP 2S Clovertown system that scores 70% more than the fastest 2S Opteron system in the database test, TPC-C."

TPC-C has very good scalability not just for SMP, but even for clusters. It's also more I/O than CPU bound (average response time usually under 1 sec.) A 70% increase of TPC-C with double # of cores is NOTHING. IBM p5 595/570 has ~100% increase of TPC-C from 16 to 32 and 32 to 64 cores. Even 4-socket Opteron has ~100% TPC-C increase over 2-socket Opteron. The +70% of Clovertown over Woodcrest is just sad.... (read: memory bottlenecked); 5:07 AM, December 13, 2006
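The scaling argument above can be put in numbers. The following is a minimal sketch, assuming "scaling efficiency" means the measured speedup divided by the increase in core count; the function name and labels are illustrative, and the percentages are simply the figures quoted in the comments, not independently verified results.

```python
def scaling_efficiency(speedup: float, core_ratio: float) -> float:
    """Fraction of ideal linear scaling achieved.

    speedup    -- new score divided by old score (1.70 means +70%)
    core_ratio -- new core count divided by old core count
    """
    return speedup / core_ratio


# Figures as quoted in the thread; every case doubles the core count.
quoted_cases = {
    "TPC-C, +70% with 2x cores": 1.70,
    "TPC-C, ~+100% with 2x cores": 2.00,
    "Cinebench, +31% with 2x cores": 1.31,
    "Cinebench, +67% with 2x cores": 1.67,
}

for label, speedup in quoted_cases.items():
    efficiency = scaling_efficiency(speedup, core_ratio=2.0)
    print(f"{label}: {efficiency:.1%} of linear scaling")
```

On this definition a +70% gain from doubling the cores is 85% of linear scaling, while a ~100% gain is fully linear. Whether 85% counts as "sad" or "impressive" is the point the commenters disagree on.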
Anonymous said...: Edward said...

"If Intel fanboys like you..."

Edward, I am trying to be polite, but with you it seems useless.

Here is a question...

Do I say negative things about AMD?

---------------------------------

I was not attacking you, I had interpreted his perspective pertaining to your debate... and you know what happened after that, you start labeling people... maturity at its finest.

So, was this FUD?; 10:51 PM, December 13, 2006
Anonymous said...: "You really don't understand, do you? QX6700 is just a E6700 SMP with 25% faster FSB. There's no magic there. If a software scales well with SMP, it will do with QX6700, and vice versa."

The QX6700 has the same FSB. It scales 67% compared to a E6700, the same percentage shown by going from 2 K8 cores to 4 cores of the same clock frequency.

"Even 4-socket Opteron has ~100% TPC-C increase over 2-socket Opteron. The +70% of Clovertown over Woodcrest is just sad.... (read: memory bottlenecked)"

But AMD's estimate shows that Barcelona will only provide the same 70% gain, so Intel already matches AMD's performance today.; 7:03 AM, December 14, 2006
Anonymous said...: "The P-M was a typo. I meant to say C2D being 10% faster than C1D on Cinebench (again, not my claim, follow the link yourself), then C1D is not much more than 2 Dothans sharing an L2 cache (plus better SSE).

Anyone can cut my words from one place and paste it on another context. By doing so he is spreading FUD."

You're the one spreading FUD by trying to use the projected numbers for a quad-core/65nm product for AMD versus two dual-core/65nm products for Intel.

No, I was not, but you are. The point here is performance per watt. Merom does not have 70% increased performance per watt over P-M, not even close. Even talking about absolute performance, 70% speedup on database apps from 1 to 2 cores is already impressive, let alone from 2 to 4 cores (remember AnandTech's "negative scaling" article?); 7:13 AM, December 14, 2006
Anonymous said...: "But AMD's estimate shows that Barcelona will only provide the same 70% gain"

So AMD's estimate was not about TPC-C (there are more database applications than just TPC-C).; 10:41 AM, December 14, 2006
Anonymous said...: "Edward, I am trying to be polite, but with you it seems useless."

Or could you try to read first? Because I'm sorry, but you really didn't read. Right in the sentence you quoted from me, there is a link to the original claim. No, the comment that you couldn't read was probably not polite, but it was fair.

OTOH, had you been challenging me writing "P-M" while the link compared C1D, I would have corrected myself (as I already previously described their differences). But you didn't or couldn't do that.

By not reading/thinking thoroughly, you are just wasting all of our time.; 10:47 AM, December 14, 2006
Anonymous said...: Edward said...

"But you didn't or couldn't do that."

I try not to correct your mistakes, or even post anything related to your comments. Why?

Because this is what happens every time.

You are quick to be offended and then the personal attacks start.

Look up the page at my post showing a link to a C2D vs PM.

I had clearly seen your post was wrong, and showed a link and stated the difference was more like 28%.

I am here for debate not your petty personal attacks.

"By no reading/thinking thoroughly, you are just wasting all of our time."

Then that makes two of us, almost every other post I make I have to explain something to you, and almost every post you make is saying someone is illogical or can not read.

So...

Who is really wasting people's time?; 4:41 PM, December 14, 2006
Anonymous said...: "So AMD's estimate was not about TPC-C (there are more database applications than just TPC-C)."

It looks to me like AMD's estimated OLTP performance, judging by the scores and the systems used, comes only from TPC-C.; 5:49 PM, December 14, 2006
Anonymous said...: "I am here for debate not your petty personal attacks."

I will gladly attack your inability to read properly until you do.; 1:25 AM, December 15, 2006
Anonymous said...: "Look up the page at my post showing a link to a C2D vs PM."

I did. See below.

"I had clearly seen your post was wrong, and showed a link and stated the difference was more like 28%."

Again, you didn't read properly, not only my posts, but also the article you linked yourself.

First, I had many posts, each has its own context. You can't take my statements out of their context and mix and match them arbitrarily. On the link I made to the AnandTech test, I wrote P-M; but if you actually read my statement in its context and followed the article, you would've known that it's a typo (as I've previously explained).

Second, where did you get the 28% from your link? For Cinebench it showed ~45% faster on the Dell notebook than the IBM one - the former has a 25% faster FSB (rendering is highly memory intensive) and a better graphics card. How much of the 45% is the CPU's contribution, we will never know. For statements regarding CPU performance, search for "10%" in the article. Oops... the article actually has my statement in it!

Third, my statement regarding C2D to P-M is about performance per watt. P-M's battery life is superb; on 6 cells it can last almost 5 hours. OTOH, the Merom notebook lasts only 4 hours on 9 cells. Let's say the Dell C2D notebook performs 100% faster than the IBM P-M one (as claimed in the article); factoring in the battery life, the performance per watt improvement is <60%.

Of course you may say such direct comparison is unfair, because they are using different memory, chipset, and graphics. Then so are the single-core to multi-core performance comparisons.

Again let me re-iterate: when I said you didn't read properly, no I wasn't being (nor trying to be) polite. Nor was I making (nor trying to make) personal attacks. I was just stating the fact.; 2:09 AM, December 15, 2006
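The performance-per-watt estimate in the comment above can be worked through explicitly. This is a rough sketch under loud assumptions that the comment does not state: every battery cell stores the same energy, "almost 5 hours" is rounded to 5.0, and whole-notebook battery drain stands in for CPU power. The helper name `relative_power` is hypothetical.

```python
def relative_power(cells: float, hours: float) -> float:
    """Average draw in 'cells per hour'.

    Assumes every cell stores the same energy (an assumption, not a
    figure from the thread), so only ratios of this value are meaningful.
    """
    return cells / hours


# Battery figures as quoted in the comment.
pm_power = relative_power(cells=6, hours=5.0)     # P-M notebook, "almost 5 hours"
merom_power = relative_power(cells=9, hours=4.0)  # Merom notebook

# "100% faster", as the comment attributes to the article.
perf_ratio = 2.0

power_ratio = merom_power / pm_power
perf_per_watt_gain = perf_ratio / power_ratio - 1.0

print(f"Power ratio (Merom / P-M): {power_ratio:.3f}")
print(f"Performance-per-watt gain: {perf_per_watt_gain:.1%}")
```

Under these assumptions the Merom notebook draws about 1.9 times the power, so a 2x performance gain leaves a performance-per-watt improvement of roughly 7%, which is consistent with the "<60%" bound in the comment. Different cell capacities or screen sizes would change the result.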
Anonymous said...: Edward, I have just finished a 2 page response to your last post, but I truly feel it will be wasted if I were to post it.

I will say this: you claim the Anandtech article is about performance per watt, though it says nothing about performance per watt. It does, however, say...

"Performance under Cinebench mimics what we saw under 3dsmax, with performance going up by around 10% compared to Core Duo."

Now, I think you're just trying to make yourself sound right, so I will let you; this is in fact a waste of time.

PS: In regards to learning how to read, please check your 45% claim, as you are comparing Dell to Dell... Single core Dell is 325, single core IBM is 222. It seems you compared the single core Dell (325) to the dual core Dell (592).

325 * 100 / 592 = 54.89, and 100 - 54.89 = 45.11, or about 45%; 6:46 AM, December 15, 2006
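The percentages in dispute depend on which two scores are compared and in which direction. The sketch below recomputes each pairing from the three scores quoted in the comment above (325, 592 and 222); the variable and function names are illustrative, and which pairing the original article intended is not settled by the thread.

```python
def percent_faster(new: float, old: float) -> float:
    """How much faster `new` is than `old`, in percent."""
    return (new / old - 1.0) * 100.0


def percent_of(part: float, whole: float) -> float:
    """`part` expressed as a percentage of `whole`."""
    return part / whole * 100.0


# Scores as quoted in the comment.
dell_single = 325.0
dell_dual = 592.0
ibm_single = 222.0

print(f"Dell single vs IBM single: {percent_faster(dell_single, ibm_single):.1f}% faster")
print(f"Dell dual vs Dell single:  {percent_faster(dell_dual, dell_single):.1f}% faster")
print(f"Dell single as share of Dell dual: {percent_of(dell_single, dell_dual):.1f}%")
```

Note that "X is 45% lower than Y" and "Y is 45% faster than X" are not the same statement: 325 is about 45% below 592, but 592 is about 82% above 325. The Dell single-core score of 325 is also about 46% above the IBM score of 222.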

120 Comments:
