Journal of Pervasive 64 bit Computing

Analysis of IT trends and competitive strategies, with emphasis on microprocessors, computer systems and networks. Based on the latest news and backed up with real data, this site intends to provide a true, real-time picture of the fast-changing IT landscape. This journal strives to be accurate on facts and sharp on criticisms. You may email your opinion to sharikou@yahoo.com or post comments here; be cool and intelligent.

Name: Sharikou, Ph. D.

Freelance journalist on IT matters. Some of my writings have been published in online IT journals. Any original content on this journal is copyrighted, but it's free for non-commercial use. Any trademarks used on this site belong to their respective owners. Some of the pictures are links. If there is any issue with the content of this site, please email sharikou@yahoo.com.


Sunday, May 27, 2007

Real world experience with Intel CPUs

A reader posted the following based on his many years of experience:

I live in the “Real World” not the “Benchmark World.” The problem I have with the Intel CPU is that it is real fast until it reaches “Critical Mass” where it suddenly stops. (Five minutes to open Outlook, I couldn’t connect with Terminal services either.) AMD, however, slows proportionally to the load, and when dealing with an Outlook memory leak putting the CPU at 100% usage recently, I was still able to remote in and fix the problem.

I fully agree. I have used Intel servers before. In a lot of situations the server got overloaded, reaching 100% CPU utilisation, and when that happened it was impossible to log in remotely. I had to call the datacenter and have the locked-up Intel server rebooted. I never had similar problems with Opterons.

In one of Bob Colwell's Stanford lectures, he admitted that the Intel CPU design uses an instruction scheduling algorithm that often enters a kind of loop that takes a long, long time to get out of. When that happens, the Intel CPU appears locked up.

Primitive stuff.

124 Comments:

DaSickNinja said...: Words, words, words, and not a pound of truth to back it up. Jesus, just replace "Intel" with "AMD" and "Xeon" with "Opteron" and the story sounds like something I've heard on some marketing site. C'mon, Ph(ony)D, you can do better than that. You at least tried to pass off the superiority of the 6000+ to the E6600 with links (even though they were Newegg comments... LOL); 9:23 AM, May 27, 2007
Altamir Gomes said...: ninja

Intel Xeons are inferior to AMD Opterons.; 9:39 AM, May 27, 2007
Altamir Gomes said...: Today's Xeons cannot endure Opteron's performance. Thus we already know that Clovertown is doomed. The only pertinent question is by how much Barcelona will surpass its ol' bro DC Opteron.; 9:43 AM, May 27, 2007
Lone Eagle said...: Mmm...

I have similar experience when my server is overloaded with VMWare. Actually, I am switching back to AMD's server soon.; 9:51 AM, May 27, 2007
Ho Ho said...: Just as I thought there was no way this blog could get any lower than it already had, our beloved host proves me wrong once again.

If I posted some thought-out, blatantly false stories about X2900 or K8, would you make a whole new topic about them too and call them "stories from a guy with many years of experience"?; 9:58 AM, May 27, 2007
Evil_Merlin said...: "Today's Xeons cannot endure Opteron's performance"

You are smoking some hard drugs there bud.

I have hundreds of servers, a good mix of DL580, 585, 380 and 385's.

Since Woodcrest the poor Opteron servers have been relegated to file servers since they cannot compete against the Woodcrest servers in either Exchange, SQL or Oracle.

Under VMWare the 580's are far superior to the 585's running VMWare 3.x. I can run more virtual instances, and they run better overall than they do on the 585's.; 12:19 PM, May 27, 2007
Unknown said...: I too have experienced the C2D laptops that run like cat cats when they decide to, whereas the good old Turions keep jogging along. It's a most annoying feature that is hard to explain other than by cache thrash.
The user experience is far better with the Turion X2s. For this reason they sell better, not to mention that they are priced at the same points as Intel Celerons here in NZ. Maybe Intel's dumping ground?; 1:03 PM, May 27, 2007
Altamir Gomes said...: evil merlin

Do you run systems with count higher than 2S per box or clusters for scientific research? I guess no.; 1:22 PM, May 27, 2007
Unknown said...: This is a very valid point. Benchmarks are usually performed under best-case scenarios. Redo the same benchmarks under medium to worst-case scenarios and you get a completely different picture. I see this clearly from the network side of the world, where Cisco is notorious for boasting outrageous specifications to make themselves look better than the competition, but they don't mention the tests were performed under best-case conditions with no services running and the wind at its back. In real-world tests Juniper is superior. I see Intel and AMD in the same light. Intel has lost the technological leadership and has turned into a marketing gorilla like Cisco, pitching inferior products.; 1:24 PM, May 27, 2007
Evil_Merlin said...: Read my fucking post symbiansn you moron.

How fucking simple does it have to be?

Do you want me to type it for you again in ALL CAPS? Maybe that would help.

AMD Opterons are inferior to Intel's Xeons.; 1:27 PM, May 27, 2007
Altamir Gomes said...: Inteler.

I did read your post. I then asked if you have 4S or even 8S systems, and also if you dedicate some to scientific research.; 3:02 PM, May 27, 2007
Evil_Merlin said...: Actually Inteller and AMDer which is more than I can say for you.

Read my original post AGAIN, as it seems to be getting misplaced in that thing you call a brain.; 4:01 PM, May 27, 2007
Anonymous said...: This comment has been removed by the author.; 6:51 PM, May 27, 2007
Anonymous said...: Do you run systems with count higher than 2S per box or clusters for scientific research? I guess no.

Are all AMD fanbois as stupid as you? Read his post you moron.; 6:52 PM, May 27, 2007
Randy Allen said...: I did read your post. I then asked if you have 4S or even 8S systems, and also if you dedicate some to scientific research.

Do you have anything higher than that? Maybe 16P or 32P servers? Oops. Opteron can't even scale past 8P. Pathetic. It's no wonder that AMD's market share has been crashing lately. That is in all three segments:- servers, desktop CPUs and mobile CPUs. Intel has a commanding lead in all three. Everyone knows that Woodcrest and Clovertown offer far superior performance.

Clovertown frags Opteron by 79%.

http://tweakers.net/reviews/661/4

It's as Pat Gelsinger said:- "Barcelona is too little too late."

AMD BK Q2'08.; 7:15 PM, May 27, 2007
Altamir Gomes said...: 4Intelers.

evil_merlin keeps evading my question. Put simply, Opterons above 2S beat Xeons hands down overall.

Xeons can run a few select databases given that their caches don't get thrashed out.

randy

http://www.realworldtech.com/page.cfm?ArticleID=RWT120104202353&p=3; 8:08 PM, May 27, 2007
Lone Eagle said...: Do those Intel supporters always behave like this ? Maybe you guys deserve Intel servers, enjoy them.

My site holds Oracle DBs for data warehouse systems on VMWare. The Intel box runs fine when the load is light. However, when the load is high, the Intel box does not respond smoothly, while the AMD box does fine under load. I have had enough and shall switch back to the AMD box in a couple of months.

That is purely for experience sharing and that is the fact. You guys don't be silly enough to argue with the fact.; 8:12 PM, May 27, 2007
Evil_Merlin said...: MORON, I answered your question before you even bothered to ASK it.

Mayhap before shooting your fanboi mouth off you could do a little learning.

My god, you make even Penix look intelligent.

About all the AMD fanbois can do is lie.

For the "dba" talking about Oracle, mayhap you should hit Oracle's site and look at the whitepaper where Woodcrest totally frags (to steal a word from Ph(ake)d) Opteron.

AMD fanboi's... living in a world of denial.; 8:52 PM, May 27, 2007
Unknown said...: My, my, Mr Evil Merlin has a chip on his shoulder.
We run HP Opteron blade racks with VMware. After extensive testing against HP Intel blades, the choice was no contest!
Database, Exchange, GIS: the Intels bogged down, the Opterons keep on trucking.; 12:17 AM, May 28, 2007
Lone Eagle said...: to Evil Merlin

I have 2 servers with exactly same configurations and same image for operations. That is the moment of truth.

In data center, only performance counts. Let's face it.; 1:31 AM, May 28, 2007
Ycon said...: You have never used Xeon or Opteron servers.

Fact is that the old P4 was WAY superior in real world than your beloved Athlons.; 3:57 AM, May 28, 2007
Altamir Gomes said...: "MORON, I answered your question before you even bothered to ASK it."

No, you don't. You said to have such systems but I asked if you actually run them. Why bother running inferior Opterons if Intel is all that better so?

you said something about fileservers... is it all that bad?; 4:07 AM, May 28, 2007
Randy Allen said...: Geforce 8800 Ultra review:

http://www.reghardware.co.uk/2007/05/28/review_nvidia_geforce_8800_ultra/

It tends to keep up nicely with two of AMD's fastest GPUs. R600 is totally fragged. I guess they didn't include SLI results because they didn't want to embarrass AMD too much.

We also know that 8 Clovertown cores frag 16 Barcelona cores by 1000 points (20% gain) http://www.theinquirer.net/default.aspx?article=39896

If there were only 8 Barcelona cores the Intel system would have scored double the AMD system. Clearly, Clovertown is twice as fast as Barcelona.

"Barcelona is too little too late."

AMD BK Q2'08.; 9:36 AM, May 28, 2007
Hornet331 said...: No, you don't. You said to have such systems but I asked if you actually run them. Why bother running inferior Opterons if Intel is all that better so?

eh.... how "stupid" is this comment... do you expect people to buy hardware worth thousands of dollars and then not put it to use?...; 4:51 PM, May 28, 2007
lex said...: Primitive, I tell you something even more primitive about AMD "Ph"ony!

AMD CEOs have an algorithm that often enters a kind of loop which it takes a long long time to get out. It's called LOSING money, hundreds of millions a quarter.

Now that is a loop that really sucks... LOL; 7:10 PM, May 28, 2007
tech4life said...: I thought it interesting that when several posters shared their real world experience with Opterons none of the Intelers could refute them. All they could do was point to benchmarks and claim "fragging". And then Randy Allen tried to change the subject...lol. Why do Intelers refuse to hear the voice of reason?; 5:50 AM, May 29, 2007
Unknown said...: tech4life, personal anecdotes have to be proven before you can expect them to be refuted. The whole reason for criticism is the fact these claims are all unproven and could just as easily be made with Intel changed to AMD and vice-versa.; 6:22 AM, May 29, 2007
pointer said...: tech4life said...

I thought it interesting that when several posters shared their real world experience with Opterons none of the Intelers could refute them. All they could do was point to benchmarks and claim "fragging". And then Randy Allen tried to change the subject...lol. Why do Intelers refuse to hear the voice of reason?

the so called real world experience actually coming from the thread below (sharikou's previous post)

https://www.blogger.com/comment.g?blogID=18375538&postID=5065116618244687902

this guy would change your whole system if it has an audio problem. and surprisingly, people like you all echoing his 'comment' as if it is real.

need real world example? :)

http://www.techspot.com/vb/all/windows/t-39408-Quake4-locks-AMD-system.html
excerpt:
Amd 3000 64bit 939
gigabyte Ga-k8ns ultra
2 gigs of ddr400 running at ddr335
300gig maxtor diamond 10 sata
geforce 6800gt OC
Audigy2 zs
420watt Thermaltake purepower

To start I would get a game lock up with a sound loop, only way to get out is reset PC.

https://bugs.launchpad.net/debian/+source/arts/+bug/11921

per the theory presented by those ppl, the AMD system is so faulty and need to be replaced with an Intel system. Don't shoot me on this, i'm just using their 'smarter' logic to make this statement :); 6:53 AM, May 29, 2007
R said...: There is compelling evidence that Intel cpu’s are optimized for benchmarks and perform mightily on normal loads, which is the largest percentage of the server's time. I thought it was common knowledge that the Opteron was superior under load; this shouldn’t even be a debatable subject.

I’m not the expert that some of you are, but I work in a different server room every day programming lowly access control systems and have witnessed the same thing as the real world example, many times.

My example is not conclusive because most of the server rooms I see are mostly Intel based with only a few AMD servers sprinkled in. However, when the IT department is pulling their hair out at a peak hour I hear the conversations, and it's always the same and it sounds like &*^%( ..dame.. &^%$@ …mother… f*&^$%)…..Intel ..$%#@ how come that never happens to the AMD.; 7:46 AM, May 29, 2007
R said...: This is going to sound unbelievable, but ironically I just received a call from an IT manager stating a server had to be rebooted and the buffer info I needed was lost. The server was overwhelmed with after-holiday traffic and locked up. We’ll never really know if that would have happened if it was an AMD server instead of Intel. You can’t actually blame it on the Intel server either, but it is kinda ironic.; 9:14 AM, May 29, 2007
Ho Ho said...: r
"There is compelling evidence that Intel cpu’s are optimized for benchmarks"

There is? Where can I see it?

Linking to abinstein won't do you any good since he doesn't know what he talks about. When I fixed the numerous mistakes he made he simply didn't put them up because he didn't like the way I worded things.

Another reason he gave me for not publishing the post was that Blogger messed up one of my links. One messed-up link was enough reason not to put up a comment hundreds of words long, full of descriptions of the mistakes he had made.

So, scientia was first and now abinstein seems to be the next who has seemingly banned me. I doubt Sharikou would do it as he has almost stopped posting comments himself and generally seems not to care much about what people talk about here.

Is truth really so painful for those other two that they are reaching for straws not to post my comments? From the looks of it, yes, it is.

"You can’t actually blame it on the Intel server either, but it is kinda ironic"

I'd blame the people who are responsible for the hardware and software. Someone hasn't done the calculations correctly when your servers bottom out like that. My memory might be wrong about this but weren't you somehow connected to those tasks?; 9:27 AM, May 29, 2007
R said...: To: ho ho
“ My memory might be wrong about this but weren't you somehow connected to those tasks?”

You are correct; I have caused a number of problems all by myself. Too many to list.

“There is? Where can I see it?”

I was referring to Scientia’s post a few months back. The post was mostly referring to compilers tweaked to favor Intel if I remember correctly. I’ll have to go back and read the rebuttals. I used the word “compelling” evidence not “conclusive” evidence. It was a good read.

Be Well; 10:39 AM, May 29, 2007
Ho Ho said...: r
"The post was mostly referring to compilers tweaked to favor Intel if I remember correctly"

It is common knowledge that without hacks ICC compiles optimized code basically only for Intel CPUs, that's no news to me. I'm quite sure other compilers do not do it.

Scientia seems to be a big fan of PGC but I think it is kind of debatable if results from that CPU can be really compared as different CPUs can be running different codepaths and we can't make sure what runs what.; 10:57 AM, May 29, 2007
Jerry Gallagher said...: I have big clusters built of both Intel and AMD. I would say both procs are about the same. Under really heavy load, with a lot of number crunching and high memory requirements, I would choose AMD. If I didn't care about FP performance and had something with a small memory and I/O footprint, I would choose Intel. The Xeons hit a higher peak faster and quickly taper off; the AMD will sustain that peak. It's almost like a horsepower vs torque argument. With all the Intel fanboyism lately I have found it hard to keep Intels out of the datacenter. I do have a problem with power and heat, and even though Intel claims to be winning this race, they are so far behind it's not funny.; 11:36 AM, May 29, 2007
PENIX said...: When the Core 2 made its debut, I was foolish enough to buy into some of the hype. I built a secondary E6400 test system to verify the performance claims for myself. The benchmarks were less than published results, but still comparable. When used side by side with my, now ancient, AMD 2200+ system, the performance increase was barely noticeable. The actual performance did not match what the benchmarks were saying; not even close.; 1:27 PM, May 29, 2007
enumae said...: Penix

Due to your constant anti-Intel claims I would like you to provide some sort of proof confirming you have an Intel system.

CPU-Z, screen shot of system properties, etc...

I mean no offense, but I can not believe you have an Intel system based upon your biased claims.; 1:38 PM, May 29, 2007
PENIX said...: enumae said...
I would like you to provide some sort of proof confirming you have an Intel system.

screenshot; 1:51 PM, May 29, 2007
DaSickNinja said...: @Penix
... Sure. No increase. Got anything in the way of proof? Any benches you want to publish? Specs of the target systems? Drivers used? If you're going by perception, then that leaves much to chance, especially with the amount of bias you show.; 2:13 PM, May 29, 2007
abinstein said...: Bob Colwell refers to Pentium 4 (Netburst) replay mechanism, not anything in Core 2.; 2:20 PM, May 29, 2007
enumae said...: Thanks Penix; 2:28 PM, May 29, 2007
Ho Ho said...: penix
"When used side by side with my, now ancient, AMD 2200+ system, the performance increase was barely noticeable"

My ancient Northwood 2.8GHz with FX5600 Ultra was also considerably faster than my brother's XP* 2500+@2.3GHz with Radeon 9800se in synthetic benches, real-world programs and games.

*) I built it for him. It was the fastest-performing thing one could get at the time, and it cost around half what I paid for my box around 9 months earlier, when that was the fastest thing in its price range.

Though I can't prove it, so you simply have to take my word for it. I think it is not worth less than Penix's. The Intel CPU has been sold at least twice; it is in high demand. My old FX sits in my sister's computer and most other parts are also sold. My brother's computer is more or less in one piece; only the Radeon cooler has given up and been replaced by a custom-built solution.

Compared to my roommate's s754 3000+, my 3.2GHz Prescott was slower. It couldn't get anywhere near P4D 920 and e6300/4300 in multithreaded apps and against the last two in any apps. Compiling was more than twice as fast even on the P4 compared to that AMD. We measured it by compiling glibc and other big packages for Gentoo. The AMD couldn't get anywhere near those Core2's.

I could provide you with screenshots of most of the PC specs, but would you believe me? I can download endless amounts of images from the net. By replacing a couple of numbers I could even make whole new ones.

So, is the little story of small part of my life worth anything to you? Is it worth anything less/more than what penix said?

dasickninja
"Got anything in the way of proof?"

Our beloved host has already written a post about his very detailed and unbiased benchmark he performed with real-world applications in controlled environment: Core 2 Duo is slow as hell

abinstein
"Bob Colwell refers to Pentium 4 (Netburst) replay mechanism, not anything in Core 2."

Penix and probably everyone else can give just as nice benchmarks.

To be precise he is talking about the system in HT capable Northwoods and it only kicks in with HT enabled. Later CPUs have that problem fixed and see quite a nice performance increase with HT enabled in most apps.; 3:00 PM, May 29, 2007
PENIX said...: I want to restate the facts, since there is already some obvious confusion.

My Intel C2D E6400 has much higher benchmarks than my AMD XP 2200+.

Make no question about it. My Intel C2D E6400 system beats my AMD XP 2200+ system in all benchmarks, by a huge margin. The question at hand is, with such a huge margin in benchmarks, why doesn't the user experience reflect this? They are so similar in real world performance that a blind user perception test would be as good as flipping a coin.

The C2D dominates every single benchmark by huge margins, but the user experience just doesn't hold up to what the benchmarks say it should. What could be the reasons for this? Perhaps the Israeli engineers pulled a fast one over on their ignorant American bosses by simply optimizing the P3 specifically for benchmarks.; 4:38 PM, May 29, 2007
R said...: I’m only bringing this up because it happened today and it is on topic of real world benchmarks and servers; “STREAM” benchmark on 2nd gen Opteron sets world leading results. Who is Liquid Computing?

http://www.amdzone.com/modules.php?op=modload&name=News&file=article&sid=7816&mode=thread&order=0&thold=0; 4:46 PM, May 29, 2007
R said...: Google "Liquid computing" and you get; a virtualized data center in a box company!

Great marketing anyway.

http://www.liquidcomputing.com/home/home.php; 4:56 PM, May 29, 2007
Unknown said...: Penix, what do you do with your computer? Your "entertainment" will not load any faster with a Conroe.; 5:13 PM, May 29, 2007
Randy Allen said...: More "benchmarks don't matter" posts? Wait. I thought Intel had bribed all those sites to post pro-Intel reviews. The simple fact is that C2D frags all of AMD's pathetic CPUs. I expect AMD to be a full year behind Intel on quad core CPUs. AMD's market share and revenue will continue to plummet while its losses continue to increase before AMD goes BK in Q2'08.

AMD is also totally fragged in servers:-

See this: http://www.sgi.com/company_info/newsroom/press_releases/2007/march/international_truck.html

Xeon is 3 -> 5x faster than Opteron.

"In our testing of the Altix XE cluster — an identical configuration to International Truck's system — a 3-5X performance gain was achieved over the AMD Opteron system that were benchmarked against, running MSC.Nastran," said Don Coburn, director of HPC solutions for Hoff and Associates.

AMD BK Q2'08.; 11:22 PM, May 29, 2007
Ho Ho said...: In the other news Intel has updated its highest end quadcores. There used to be 150W 3Ghz quads but now there will be 130W ones with added FSB boost.

clicky; 6:40 AM, May 30, 2007
Ho Ho said...: Whoops, that should have been 120W instead of 130, sorry about the typo.; 6:47 AM, May 30, 2007
Unknown said...: Unbiased benchmark..

AMD 4400+ 2.67GHz beat C2D E6600+ 2.4GHz in 3Dmark06

http://www.tekbunker.com/index.php?option=com_content&task=view&id=252&Itemid=57&limit=1&limitstart=1; 7:19 AM, May 30, 2007
Ho Ho said...: I wonder how much of that performance difference comes from AMD having its RAM OC'd by around 35% and how much comes from using the better performing DDR1 K8.

Can you find some similar benchmarks with DDR2 K8?; 7:36 AM, May 30, 2007
Evil_Merlin said...: pezal, you musta missed this part of that unbiased review:

the E6600 setup closed the gap by producing a superior cpu score.

and

know all the AMD fans out there wish that they could push their X2’s to the speeds at which Conroe’s are getting on air, and close the gap in performance.; 7:45 AM, May 30, 2007
Hornet331 said...: http://www.hardtecs4u.com/reviews/2007/amd_cpu_roundup2007/

very nice roundup of AM2 90nm vs 65nm CPUs.

for amd fanbois, you better ignore the benchmarks (especially the gaming benchmarks); 9:10 AM, May 30, 2007
abinstein said...: "Wait. I thought Intel had bribed all those sites to post pro-Intel reviews."

Dare to swear on your kids that you do not get financial benefit (directly or indirectly) from Intel? Any of those sites dare to do so?

Only pathetic companies do marketing by purchasing cheap "website labors." Intel is just one of them.; 10:02 AM, May 30, 2007
abinstein said...: "To be precise he is talking about the system in HT capable Northwoods and it only kicks in with HT enabled."

I don't think you understand the replay mechanism in Pentium 4. It is due to aggressive speculative execution, not necessarily hyperthreading. It happens when you decouple the dependency check from an aggressive instruction issue, which happens tens of clocks earlier.

I'm not sure how later Pentium 4s "fix" that problem. There are quite a few adjustable parameters to remedy the situation on common workloads, but nothing "fixes" the problem.; 10:10 AM, May 30, 2007
abinstein said...: "In the other news Intel has updated its highest end quadcores."

A quad-core taking 150W is just useless to almost all servers, which mostly have max TDP in 90W/120W per socket.

"There used to be 150W 3Ghz quads but now there will be 130W ones with added FSB boost."

I don't understand... where is the FSB boost? High clocked Clovertown already has 1333MHz FSB.; 10:22 AM, May 30, 2007
Anonymous said...: SWEET!!!!!!!!!!!!!!!!

Toshiba will launch three laptops in its Satellite range in North America based on AMD's Turion 64 X2 dual-core chip and M690 chipset, AMD said on Wednesday. Toshiba, which confirmed plans on Tuesday for AMD machines but offered no details, said Wednesday that the computers will be available in the third quarter.; 12:15 PM, May 30, 2007
netrama said...: The C2D dominate every single benchmark by huge margins, but the user experience just doesn't hold up to what the benchmarks say it should. What could be the reasons for this? Perhaps the Israeli engineers pulled a fast one over on their ignorant American bosses by simply optimizing the P3 specifically for benchmarks

I see some of you already see this. But sadly the button pushers don’t even realize what is described in the above lines. From ground up the C2D is a one pony trick designed exclusively with massive optimizations for the benchmarks. This was the secret Israeli weapon Paul O used to keep talking about during his pathetic pre-C2D launch days. An architecture change will only come in 2008, or is it 2009?? Meanwhile all Intelers dig into your pocket again in a year.. Ohh wait, that is what the idiot economy wants you to do, right?? Change car every 5 years and change CPU every 5 months; 12:25 PM, May 30, 2007
Ho Ho said...: abinstein
"High clocked Clovertown already has 1333MHz FSB."

So it seems. I somehow managed to mess up QX6800 FSB with the Clovertowns.

What this shows me is that Intel rushed a bit with those 3GHz quads. A few months ago it was said they would be released at normal TDP in H2. For some reason they did it sooner and with higher TDP. Seems as if their designs have caught up to those earlier estimates.

netrama
"From ground up the C2D is a one pony trick designed exclusively with massive optimizations for the benchmarks"

Are games also a form of benchmark? Is this the reason why Core2 dominates everything else in that area? What about video encoding and compression? Are you saying that lots of the applications people use daily are simply benchmarks?

Btw, what are those optimizations that only help with benchmarks and don't help in real-world?; 12:41 PM, May 30, 2007
Roborat, Ph.D said...: netrama whinges: "From ground up the C2D is a one pony trick designed exclusively with massive optimizations for the benchmarks."

Let's pretend it's a blue moon and consider what you just said. C2D is indeed optimised for benchmarks. As a result Intel gets all the sales because of the unanimous reviews in its favour. Does it really matter then, or is your point just moot?; 12:44 PM, May 30, 2007
Evil_Merlin said...: Intel is in the lead, the benchmarks are false, wrong or set up.

AMD is in the lead, the benchmarks (the same ones usually) are right on the money.

AMD fanbois have performance envy.; 1:07 PM, May 30, 2007
netrama said...: ho ho said ...

Btw, what are those optimizations that only help with benchmarks and don't help in real-world?

I think you need to leave those c++ for dummies book and start reading some real comp arch books :-)); 1:18 PM, May 30, 2007
Ho Ho said...: netrama
"I think you need to leave those c++ for dummies book and start reading some real comp arch books :-))"

Is this really the best answer you can manage? Kind of pathetic I'd say.

Yes, you can make your program work well on some particular CPU if you work hard enough. Doing the opposite is pretty much impossible as different programs have hugely different workflows.

If you really do need to know then I have used only one book to learn the basics of C++ (thinking in c++), took me around a couple of weeks. The rest came during the next ~4 years of practice and google. In the Uni I was teaching the professors who were supposed to teach us C++ so I think I do know a bit about the language and programming in general, even though I've been working as a programmer for only about 1.5 years.

Btw, what is your software development experience?; 1:56 PM, May 30, 2007
R said...: Intel’ers Fry’s has your special
Compaq sr5030nx (3.2GHz P4 HT, 1GB, 160GB, Vista Basic)
$399.00

http://www.frys.com/product/5187766; 4:12 PM, May 30, 2007
netrama said...: Ho Ho said...
Btw, what is your software development experience?

You have shown us how ignorant you obviously are. It is thanks to folks like you that companies like Intel are able to peddle sub-par sh*t. Just go out there and try writing a few compilers and assemble code then you might probably even begin to understand benchmark optimizations.; 4:41 PM, May 30, 2007
Evil_Merlin said...: This coming from some AMD moronboi who thinks the C2D is a one trick pony.

One that just so happens to be smacking the shit outta anything AMD has to offer...; 4:45 PM, May 30, 2007
ElMoIsEviL said...: symbiansn, you obviously haven't a clue what you're talking about. Evil Merlin did answer your question.. he stated he ran a DL580. That's a 4 Way rig... jeez.; 5:52 PM, May 30, 2007
The Burninator said...: Trogdor sez: AMD sold a CPU to Toshiba. Therefore Intel will be BK last week, we just don't know it yet because of huge EVIL Intel conspiracy!; 6:32 PM, May 30, 2007
abinstein said...: Ho Ho -
"Btw, what are those optimizations that only help with benchmarks and don't help in real-world?"

The fact that profiled compilation doesn't help Core 2 very much clearly shows that the processor has parameters specifically optimized for the (SPEC) benchmarks.

Games are generally very susceptible to optimization. All new games optimize themselves for Core 2 Duo.; 10:49 PM, May 30, 2007
abinstein said...: evil_merlin -

You actually have a Woodcrest system that rivals HP DL585? Please do let me know what it is, and on what workload, because I'm pretty sure either you are bullsh*tting or you are a stupid fanboy.

Woodcrest is inferior for high-end servers; it doesn't even scale to 8 cores. Clovertown's performance loses to Opteron on 8 cores, too. Fact is, Core 2 are for entry-level servers. If they aren't for you, then you clearly have some fanboism problems...; 11:00 PM, May 30, 2007
Randy Allen said...: AMD's market share will continue to crash and they lose more and more cash. No one wants AMD's antiquated server processors.

Intel frags AMD in every market segment. After Intel's July price cuts AMD's fastest CPU will be competing with a $163 CPU. AMD will be unable to maintain its ASP and it will post even bigger losses with lower market share and lower ASP.

The same is true in graphics, AMD will continue to lose market share because of the R600 flop. Nvidia enjoys 100% of high end graphics card sales and is raking in the profit.

No one wants AMD's crappy 4P servers with CPUs that cost over $2100 each. Four CPUs cost $8400. A 2P Xeon server offers similar performance, but each quad core CPU is only $1200. $2400 vs. $8400 on CPUs alone. Not to mention the power requirements. 4x 125W or 2x 120W? That's not a hard decision. Intel is clearly the smarter choice.

By the end of the year AMD will have less than 5% in servers remaining.

AMD BK Q2'08.; 11:58 PM, May 30, 2007
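For readers who want to check the arithmetic in the comment above, here is a minimal Python sketch. The prices and TDP figures are the ones quoted in the comment and are not verified list prices; the function name is made up for illustration.

```python
def platform_totals(cpu_count, price_per_cpu, tdp_watts):
    """Return (total CPU cost in dollars, total CPU TDP in watts)."""
    return cpu_count * price_per_cpu, cpu_count * tdp_watts

# Figures as quoted in the comment (unverified assumptions).
opteron_4p = platform_totals(4, 2100, 125)
xeon_2p = platform_totals(2, 1200, 120)

print(opteron_4p)  # (8400, 500)
print(xeon_2p)     # (2400, 240)
```

This only compares CPU price and rated TDP; it says nothing about whether the two platforms really deliver similar performance, which is the point under dispute in the thread.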
Ho Ho said...: netrama
"Just go out there and try writing a few compilers and some assembly code; then you might even begin to understand benchmark optimizations."

I thought we were talking about CPUs, not compilers. Though I think you answered my question by evading it. Too bad, that's what I thought.

abinstein
"The fact that profiled compilation doesn't help Core 2 very much clearly shows that the processor has parameters specifically optimized for the (SPEC) benchmarks."

Sheesh, how many times do I have to explain this to you until you get it? Have you ever compared the Core2 results on a non-ICC compiler? If yes, then what were the results?

"Games are generally very susceptible to optimization."

Yes, they are, there is no doubt in that.

"All new games optimize themselves for Core 2 Duo."

OK, what makes you think that? What about all those older games that were developed before Core2 came out? Say FEAR, HL2 and the like. From what I've seen most games used for benchmarking came out way before Core2 saw the light of day. Doesn't that kind of make your point moot?

Are you claiming that using SSE and small datasets is optimizing for Core2 architecture? Wouldn't it optimize performance on every other CPU also? If yes then how can you say games are only optimized for Core2 and not for the other CPUs?; 1:13 AM, May 31, 2007
Randy Allen said...: Now "benchmarks don't matter" or "C2D is just optimised for benchmarks" so these benchmarks don't matter. But when AMD is faster we should all believe the benchmarks and AMD's performance boasts?

Pathetic. AMD BK Q2'08.; 2:13 AM, May 31, 2007
PENIX said...: abinstein said...
Core 2 are for entry-level servers.

Agreed. I run several servers with high traffic. I don't care about benchmarks. I need real world performance. Until Intel proves themselves as a worthy alternative, I will only use AMD.; 9:09 AM, May 31, 2007
abinstein said...: Ho Ho -
"Have you ever compared the Core2 results on a non-ICC compiler? If yes then what were the results?"

Yes, I have told you this before and you are not getting the point. I am comparing the difference between optimized compilation and profiled compilation.

The fact that you keep on babbling the same irrelevant "explanation" over and over really shows how much you (don't) know.

"abinstein
"All new games optimize themselves for Core 2 Duo."

OK, what makes you think that? What about all those older games that were developed before Core2 came out?"

Can you read? Was I talking about older games? Please read my sentence right above before you let out a single syllable more, won't you? Also, mind you, Core2's optimization manual has been out for more than 1 year.; 9:18 AM, May 31, 2007
PENIX said...: abinstein said in response to ho ho...
The fact that you keep on babbling the same irrelevant "explanation" over and over really shows how much you (don't) know.

Remember the lengthy argument on write cache? ho ho will never admit when he is wrong no matter how obvious it is.; 10:12 AM, May 31, 2007
Unknown said...: What, exactly, is wrong with optimizing games for CPUs? All these games are also optimized to run as fast as possible on an AMD CPU.; 10:13 AM, May 31, 2007
PENIX said...: Giant said...
What, exactly, is wrong with optimizing games for CPUs?

In that simple of a context, nothing. It would be foolish not to optimize any program.

The reverse is also true. There is nothing wrong with optimizing a CPU for an application. The problem occurs when a CPU is optimized specifically for a benchmark.; 10:35 AM, May 31, 2007
Ho Ho said...: abinstein
"Yes, I have told you this before and you are not getting the point."

You did? Where? Perhaps you also chose not to publish them on your blog along with my responses.

"I am comparing the difference between optimized compilation and profiled compilation."

I know, that is the whole point. ICC compiles SPEC benchmarks (almost?) the same way in both normally optimized and by using generated profile data. My theory is that ICC has some mechanisms that detect SPEC benchmarks and enter a special compiling state that generates near-maximum efficiency code for them.

Your "analysis" only used ICC and claimed that Core2 can't be optimized any further. Wouldn't that also mean that Pentium4 is built and optimized for running Spec benchmarks as their scores get about the same speed increase as Core2's when using profiling information? What a shocker, two absolutely different CPU architectures that are both optimized for the same benchmarks!

penix, would you like to talk more about RHT and finally answer to the huge amount of questions I've asked you?

"The problem occurs when a CPU is optimized specifically for a benchmark"

Can you elaborate on that? abinstein failed on that, perhaps you can do better.; 10:39 AM, May 31, 2007
netrama said...: ho ho said..
"The problem occurs when a CPU is optimized specifically for a benchmark"

Can you elaborate on that? Abinstein failed on that, perhaps you can do better.

It is well known that the CPU Hardware architecture and tweaks that Intel has been doing is only to win in the benchmarks. One example is a prediction algorithm that will work like ...."Hmm this seems like a Spec FP Benchmark...go execute in this little black block of hardware”
So please go read some books on comp arch rather than spamming here and wasting google b/w on searches of C++ code.; 11:37 AM, May 31, 2007
Ho Ho said...: netrama
"It is well known that the CPU Hardware architecture and tweaks that Intel has been doing is only to win in the benchmarks"

It is well known? By whom?

"One example is a prediction algorithm that will work like ...."Hmm this seems like a Spec FP Benchmark...go execute in this little black block of hardware”

You cannot be serious when claiming that the CPUs have special circuitry for benchmarks.

Does Intel also have special Quake2/HL2 HW acceleration built-in to their CPUs?

"So please go read some books on comp arch rather than spamming here and wasting google b/w on searches of C++ code."

I have read several of those books. How many have you?; 11:56 AM, May 31, 2007
PENIX said...: ho ho said...
penix, would you like to talk more about RHT and finally answer to the huge amount of questions I've asked you?

Nope.

ho ho said...
Can you elaborate on that [benchmark optimization]? abinstein failed on that, perhaps you can do better.

Both ATi and NVidia were caught "cheating" on 3DMark by changing their driver to run differently on that program. Is it really so hard to believe that Intel is doing something similar?; 12:50 PM, May 31, 2007
Jonathan said...: The argument that games are optimized for C2D is ridiculous. Every game, even the ones released 3 years ago, is benefitting from c2d. This is because of the additional cache of c2d over the athlons, and intel's great branch predicting along with that. Recent benchmarks have shown that a conroe with 512k of l2 cache will lose to an athlon 64 with the same amount where games are concerned. But this is mostly a moot point, as the high end c2d's have 2 or 4 mb available to a single threaded application if necessary. That is what makes games run so well, not any specific optimizations intel has made.

In relation to server load, it is obvious that most benchmarks are not representative of real world performance, as it can fluctuate, and more demanding scenarios can occur. Opterons, thanks to their integrated memory controller can simply throughput more data simultaneously, and in some intense workloads, it really starts to show, whereas in many benchmarks this strength under heavy load is unintentionally or intentionally not shown.; 12:58 PM, May 31, 2007
abinstein said...: This comment has been removed by the author.; 1:09 PM, May 31, 2007
abinstein said...: Ho Ho -

"Linking to abinstein won't do you any good since he doesn't know what he talks about. When I fixed the numerous mistakes he made he simply didn't put them up because he didn't like the way I worded things."

Your comments to my blog are worthless and contain nothing but off-topic claims and personal attacks.

* First you don't believe in SPEC CPU.
* Then you quote one SPEC CPU result (from Acer) that is obviously off the mark with all other results, and you claim my analysis based on all other results is false.
* You were totally wrong about AMD's HE line processors (68W TDP); you claim they consume 90W and when I correct you, you keep on babbling asking for an official link.
* You keep claiming format conversion is computation, while there are obviously numerous well optimized SSE codes that do little if any SSE-to-x87 format conversions.
* You come to argue with me SPECjbb, which is dependent on a whole array of components in the system, when my focus was SPEC CPU, which focuses on just processor and memory. Based on your SPECjbb argument Intel doesn't even need to go from Netburst to Core2. Imagine that.

At this point I've determined you're either lunatic or paid by Intel. You just destroyed the last bit of your arguments' worthiness.

""The problem occurs when a CPU is optimized specifically for a benchmark"

Can you elaborate on that? abinstein failed on that, perhaps you can do better."

Just how shameless and low can you get, uh?; 1:19 PM, May 31, 2007
abinstein said...: "Opterons, thanks to their integrated memory controller can simply throughput more data simultaneously"

IMC affects memory access latency, not bandwidth.; 1:22 PM, May 31, 2007
abinstein said...: Ho Ho -
"Does Intel also have special Quake2/HL2 HW acceleration built-in to their CPUs?"

Your knowledge in computer microarchitecture is not only poor but also immature.

Povray sse is very different from games. SPECfp and SPECint are very different from games. Nobody is arguing how well Core 2 performs on games, which are apparently compiled optimized for Core 2 and are limited by sequential executions.

Core 2 performs better on games partly due to the large shared cache, partly due to the low thread count and high inter-thread communications in games which make the shared L2 cache more attractive.

The way you purposely mix up games and SPEC clearly shows your poor understanding and immaturity.; 1:29 PM, May 31, 2007
enumae said...: Abinstein

IMC affects memory access latency, not bandwidth.

Just to be sure I understand, HyperTransport is what allows AMD to have more bandwidth, right?

Thanks; 1:31 PM, May 31, 2007
abinstein said...: "The argument that games are optimized for C2D is ridiculous. Every game, even the ones released 3 years ago, is benefitting from c2d."

I don't understand your logic. Does that in any way interfere with the claim that new games (released over the last year or so) are optimized for Core 2? How do you use an irrelevant argument to show something is ridiculous?

"This is because of the additional cache of c2d over the athlons, and intel's great branch predicting along with that."

Yes, large cache will help games, but not much from branch prediction. Games are generally stream processing; the loops in games are pretty large - 100fps on a map divided into 16 parts would make a single pass over the loop longer than 1 million clock cycles.

Anyway, you're not understanding things right and please don't argue for the purpose of argument. Newer games are optimized for Core 2, yes. Games have higher fps at low resolution on Core 2, yes. Core 2 is better for mission critical high-end servers, NO.; 1:39 PM, May 31, 2007
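The "longer than 1 million clock cycles" figure in the comment above can be sanity-checked with a rough sketch like the following. The 2.4 GHz clock is an assumption picked for illustration (the comment names no clock speed), and the function name is hypothetical.

```python
def cycles_per_part(clock_hz, fps, parts):
    """Clock cycles available for one part of one frame."""
    return clock_hz // (fps * parts)

clock_hz = 2_400_000_000  # assumed 2.4 GHz core clock, not from the comment
print(cycles_per_part(clock_hz, fps=100, parts=16))  # 1500000
```

Under this reading, any core clocked at 1.6 GHz or above has at least 1 million cycles per part per frame at 100 fps and 16 parts, which matches the order of magnitude claimed. Whether games actually structure their main loop this way is a separate question that the thread goes on to dispute.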
abinstein said...: "Just to be sure I understand, HyperTransport is what allows AMD to have more bandwidth, right?"

More precisely, it is the separation of memory and IO (which is now shifted from FSB to HT links).

Memory bandwidth of Opteron is dictated by the memory technology (multiplied by the number of channels). OTOH, memory bandwidth of Core 2 is affected by IO and FSB activities. This is why in realistic environments Opteron will work more reliably against Core 2 than under benchmarking (SPEC or not).

When you do CPU/memory benchmarks, you want to reduce IO to a minimum. This gets rid of the IO interference, but also gets rid of the realistic factors.; 1:43 PM, May 31, 2007
abinstein said...: Ho Ho -

I have caught you pants down so many times that I lost count. Your "comments" to my blog were first worthless to publish, then they got worthless for my personal reply.

"ICC compiles SPEC benchmarks (almost?) the same way in both normally optimized and by using generated profile data."

The point is not how the compiler compiles, but how much profiling can improve compilation. Even compared to Pentium 4 (which also gets compiled by icc), Core 2 does not benefit as much from profiling, period. Now, same using icc, why different improvements? Are you still saying it's not from Core 2 itself?

Your knowledge on such matters is just as Sharikou said in the last sentence of this blog: primitive.

"My theory is that ICC has some mechanisms that detect SPEC benchmarks and enter a special compiling state that generates near-maximum efficiency code for them."

Worthless theory. Any compiler can do this, probably except gcc. Yet all profiled compilations on K8 improve performance much more than on Core 2.

In my opinion, you are just the kind of person who does not improve his understanding because he simply doesn't learn from facts. I guess this is partly why you were banned from Scientia's blog, though officially it's because you (rightfully) owe him an apology. Whether you like his blog or not, suffice to say that lots of Intel supporters are still allowed to comment there. Ho Ho, stop spreading FUD on others now, because you have already been pwned, multiple times. I'm sorry.; 1:58 PM, May 31, 2007
abinstein said...: "Wouldn't that also mean that Pentium4 is built and optimized for running Spec benchmarks as their scores get about the same speed increase as Core2's when using profiling information?"

Netburst SPECint improvement with icc profiling:

6.0%, 5.7%, 5.2%, all above 5%.

Instead, there is not a single Core 2 that is improved above 4%; most are below 3%.

Note that the long pipeline of Netburst and internal replay mechanism (not affected by compilers) is known to be difficult to profile.

You are truly blind to facts. Again, where are your pants, Ho Ho?; 2:14 PM, May 31, 2007
Evil_Merlin said...: So the Quake 2 folks (iD) went forward in time and compiled the game for C2D, went back in time with the recompiled code, and that's the way it was, huh?

When Intel performs better, it's time travel, folks! You heard it here first!; 2:59 PM, May 31, 2007
Ho Ho said...: penix
"Nope."

So I conclude you know nothing on the subjects you don't want to answer. Feel free to prove me wrong.

"Both ATi and NVidia were caught "cheating" on 3DMark by changing their driver to run differently on that program. Is it really so hard to believe that Intel is doing something similar?"

Yes, it is. Video card manufacturers optimized the drivers to replace some parts of games and benchmarks with better optimized ones or even lowered rendering quality. You can't do anything even remotely similar with a CPU. If you think Intel can do anything similar, please explain what it could be.

abinstein
"Your comments to my blog are worthless and contain nothing but off-topic claims and personal attacks."

Since when is it offtopic to point out your mistakes?

"* Then you quote one SPEC CPU result (from Acer) that is obviously off the mark with all other results, and you claim my analysis based on all other results is false."

So you say they cheated somehow and got higher results than the ones you chose to analyze?

"* You were totally wrong about AMD's HE line processors (68W TDP); you claim they consume 90W and when I correct you, you keep on babbling asking for an official link."

For now there is no official information about Barcelona HE TDP. There is a good chance they will be at the same TDP as K8 HE's but there is no official proof of it; that is what I'm talking about.

"Just how shameless and low can you get, uh?"

Well, you did not prove anything but the fact that ICC doesn't generate much better code for Core2 with Spec benchmarks when using profile guided optimizations. Had you used any other compiler in your analysis for Intel things might have been different.

"IMC affects memory access latency, not bandwidth."

It affects the efficiency of memory bandwidth usage. Old DDR1 had around 90% efficiency, Intel has around 60-70%. New DDR2 K8's have around 70-80%.

"Your knowledge in computer microarchitecture is not only poor but also immature."

Just read what else I said and it becomes clear why I said that.

"Nobody is arguing how well Core 2 performs on games, which are apparently compiled optimized for Core 2 and are limited by sequential executions"

Even the ones made years ago, well before Core2 saw the light of day? Why do they show such major performance advantages over K8?

"Games are generally stream processing; the loops in games are pretty large - 100fps on a map divided into 16 parts would make a single pass over the loop longer than 1 million clock cycles."

Please learn a bit about games; they don't work like you described here, and this doesn't even make any real sense. What blocks are you talking about? Some kind of space partitioning?

"Newer games are optimized for Core 2, yes"

I'll ask again, how do you know this?

"Games have higher fps at low resolution on Core 2, yes"

What has lower resolution got to do with CPU performance?

"Are you still saying it's not from Core 2 itself?"

Yes, I am. Was spec 2006 available while Core2 was designed? Have you compared Core2 and Netburst on spec 2000 benchmarks? What exactly is optimized for Spec benchmarks in Core2?

"Instead, there is not a single Core 2 that is improved above 4%; most are below 3%."

Here you are, almost 5% performance increase. Also take a look at cpu2000 results; it seems as if Netburst was built for that version of the benchmarks, if one were to believe your theory.

The main thing that profile guided optimizations do is rearrange conditionals. Core2 with its very good predictors can't ever benefit as much as other CPUs with worse prediction mechanisms; haven't you ever thought that might be the reason why it benefits so little?; 3:08 PM, May 31, 2007
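To make the bandwidth-efficiency figures in the comment above concrete, here is a rough Python sketch that multiplies a theoretical peak by the quoted efficiency ranges. The efficiency percentages are the ones claimed in the comment; the specific memory and bus configurations (dual-channel DDR-400, a 1066 MT/s 64-bit FSB, dual-channel DDR2-800) are assumptions added for illustration and do not come from the thread.

```python
def peak_bandwidth_gb_s(mega_transfers_per_s, bytes_per_transfer, channels=1):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return mega_transfers_per_s * 1e6 * bytes_per_transfer * channels / 1e9

def effective_range(peak_gb_s, low_eff, high_eff):
    """Usable bandwidth range for a given efficiency range."""
    return peak_gb_s * low_eff, peak_gb_s * high_eff

# Assumed configurations; efficiency figures as claimed in the comment.
configs = {
    "K8, dual-channel DDR-400": (peak_bandwidth_gb_s(400, 8, 2), 0.90, 0.90),
    "Core 2, 1066 MT/s FSB": (peak_bandwidth_gb_s(1066, 8), 0.60, 0.70),
    "K8, dual-channel DDR2-800": (peak_bandwidth_gb_s(800, 8, 2), 0.70, 0.80),
}

for name, (peak, low_eff, high_eff) in configs.items():
    low, high = effective_range(peak, low_eff, high_eff)
    print(f"{name}: peak {peak:.1f} GB/s, usable {low:.1f}-{high:.1f} GB/s")
```

The sketch only shows how an efficiency percentage translates into usable bandwidth; it does not settle whether the difference comes from the integrated memory controller or from separating memory and IO traffic, which is what the two commenters are arguing about.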
abinstein said...: Ho Ho -

"Since when is it offtopic to point out your mistakes?"

It is offtopic to show off your poor knowledge and bad logic. You didn't point out any of my mistakes, at least not successfully.

"For now there is no official information about Barcelona HE TDP. There is a good chance they will be at the same TDP as K8 HE's but there is no official proof of it; that is what I'm talking about."

What you are talking about is to show you being a die-hard bonehead who can't eat his words because they are just too ridiculous.

"Please learn a bit about games; they don't work like you described here, and this doesn't even make any real sense."

I know it doesn't make sense to you, because you have limited knowledge. Games essentially run in loops and consist mostly of streaming processing. A good multithreaded game divides the objects on the screen into different "domains" to process them in parallel.

So what is your understanding of how games work, uh? From someone who lectures his professor on C++ and claims to have read many microarchitecture books? Just what are your experiences in game programming? Please show us because you've got no credibility otherwise.

"It affects the efficiency of memory bandwidth usage. Old DDR1 had around 90% efficiency, Intel has around 60-70%. New DDR2 K8's have around 70-80%."

This is precisely the type of 'off-topic' spin you like to give. IMC helps latency, not bandwidth, period. The efficiency difference comes from DCA vs FSB. You can have IMC with an FSB architecture, or DCA with an off-chip memory controller.

As I said, you are just the kind of bonehead who does not admit fault, does not learn from facts, and keeps babbling even when eating his own words.

"What has lower resolution got to do with CPU performance?"

Go bench a game with high resolution and image quality. CPU performance doesn't matter there. It turns out for most games both K8 and Core2 offer playable performance; the true differentiator for gaming is on the graphics card and system cost.

"The main thing that profile guided optimizations do is rearrange conditionals. Core2 with its very good predictors can't ever benefit as much as other CPUs with worse prediction mechanisms"

You are actually agreeing with me with such comments, but in your primitive knowledge you thought you were not, because you thought branch prediction is just one single entity of the processor.

There are tens of parameters in a modern branch prediction design, which utilizes both global and temporal information in a tournament way, giving different predicted results different weights to pick from. The fact that profiling helps Core 2 very little could well be because its BP parameters are tuned for the benchmarking workloads.

This also shows up in the fact that Core2 performs much better, compared to K8, in SPEC (single task) than in SPEC_rate, because when you mix a few tasks together the overall execution is harder to optimize for than when there is just one copy of the task.; 3:53 PM, May 31, 2007
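For readers unfamiliar with the "tournament" idea mentioned in the comment above, here is a toy Python sketch. It is a deliberately simplified textbook-style model (a per-address 2-bit counter table, a global-history 2-bit counter table, and a chooser), not a description of Core 2's or any real CPU's predictor; the class name, table size and initial counter values are all assumptions chosen for illustration.

```python
def bump(counter, up):
    """Saturating 2-bit counter update (range 0..3)."""
    return min(3, counter + 1) if up else max(0, counter - 1)

class TournamentPredictor:
    """Toy tournament predictor: per-address vs. global-history component."""

    def __init__(self, bits=10):
        self.mask = (1 << bits) - 1
        size = 1 << bits
        self.local = [1] * size     # indexed by branch address
        self.global_ = [1] * size   # indexed by recent branch history
        self.chooser = [1] * size   # >= 2 means "trust the global component"
        self.history = 0

    def predict(self, pc):
        local_pred = self.local[pc & self.mask] >= 2
        global_pred = self.global_[self.history] >= 2
        use_global = self.chooser[pc & self.mask] >= 2
        return global_pred if use_global else local_pred

    def update(self, pc, taken):
        li, gi = pc & self.mask, self.history
        local_ok = (self.local[li] >= 2) == taken
        global_ok = (self.global_[gi] >= 2) == taken
        # Train the chooser only when the two components disagree.
        if local_ok != global_ok:
            self.chooser[li] = bump(self.chooser[li], global_ok)
        self.local[li] = bump(self.local[li], taken)
        self.global_[gi] = bump(self.global_[gi], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask

def run(predictor, pc, outcomes):
    """Feed outcomes for one branch; return a list of hit/miss booleans."""
    hits = []
    for taken in outcomes:
        hits.append(predictor.predict(pc) == taken)
        predictor.update(pc, taken)
    return hits

# A strictly alternating branch defeats the per-address counter, but the
# global-history component learns it and the chooser switches over.
outcomes = [i % 2 == 0 for i in range(200)]
hits = run(TournamentPredictor(), pc=0x40, outcomes=outcomes)
print(sum(hits[100:]) / 100)
```

The table size, history length and counter widths here are the kind of "parameters" the comment refers to. The sketch shows that such parameters exist and affect which patterns get predicted well; it offers no evidence either way on whether any vendor tunes them for specific benchmarks.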
abinstein said...: "So the Quake 2 folks (iD) went forward in time and compiled the game for C2D"

Some people do not have the intelligence to read properly a simple sentence, and they go on to speculate on time travelling:

All new games are optimized for Core 2.

In fact, it all depends on what compiler/flags they use to compile their game binaries. If they use icc, then they're optimizing for Core 2. If they compiled to i686, then they're optimizing for Pentium III, and also Core 2. The difference is probably not much here, but there is some optimization.; 4:01 PM, May 31, 2007
PENIX said...: ho ho said...
So I conclude you know nothing on the subjects you don't want to answer. Feel free to prove me wrong.

Your comments are already 3½ pages long. If we start bringing in previous topics as well, your comments will be so long people will start drowning.

ho ho said...
Video card manufacturers optimized the drivers to replace some parts of games and benchmarks with better optimized ones or even lowered rendering quality. You can't do anything even remotely similar with a CPU. If you think Intel can do anything similar, please explain what it could be.

There are many ways this could be accomplished. As netrama stated, there could be dark blocks of circuits which only become active when specific benchmark binary patterns are recognized. What do you think would be faster, calculating PI or simply reading back a stored value for PI? OR maybe the optimizations are in the chipset driver rather than the CPU. OR maybe Intel has paid NVidia to place the optimizations back in their drivers, but triggered only when an Intel processor is in use. All 3 theories are possible and fit and explain why Intel real world performance does not live up to the benchmarks.; 4:05 PM, May 31, 2007
Ho Ho said...: abinstein
"Games essentially run in loops and consist mostly of streaming processing"

You just described almost every single program out there, at least the CPU heavy parts of the applications.

"Please show us because you've got no credibility otherwise."

Here is one of my first tries at game engine creation. Please remember that this was mainly written during the first year of learning C++ so the code is quite ugly in places. It is also a funny coincidence that I started working as a professional game developer just two weeks ago at Videobet.

Where is your experience shown?
I'd also like to know more about those "parts" you were describing before. Were those the domains you described? Please do so so we could see how much you really know about the subject.

"IMC helps latency, not bandwidth, period"

Then why does K8 have more memory bandwidth than Core2 in 1P? The latter doesn't waste 25-40% of FSB bandwidth talking with other peripherals.

"Go bench a game with high resolution and image quality. CPU performance doesn't matter there"

Thank god you didn't go the route someone else went some time ago, claiming that the bigger cache of Core2 speeds up games at lower resolutions thanks to fitting frames into caches.

"The fact that profiling helps Core 2 very little could well be because its BP parameters are tuned for the benchmarking workloads."

Here you say "could" but before you said this was a proven fact. What caused the change?

"The difference is probably not much here, but there is some optimization."

Yet again you pretty much nullify your previous claims of "proven facts".
Before Core2 the biggest part of the market was held by Netbursts. Optimizing for them was quite a bit different than optimizing for K8 and Core2. Still even games that were developed during that time run faster on Core 2.

Btw, how big a difference is there between K8 and P3 that the latter benefits more from i686 than the former? You said you had access to both Core2 and K8, please do some benchmarks. I'd probably do it myself if I had some K8 lying around.

Also, did you know that if you want to get the best performance out of Core2 with GCC then using -march=k8 gives better result than -march=prescott?

penix
"What do you think would be faster, calculating PI or simply reading back a stored value for PI?"

The value of Pi is stored in standard libraries and CPUs have had lookup tables for trigonometry for quite some time, though these are only used when you allow the compiler to generate code with those functions as they might not be as accurate.

"All 3 theories are possible and fit and explain why Intel real world performance does not live up to the benchmarks."

I'll give you a fourth theory: perhaps Core2 really is better at running most programs than previous CPUs.; 5:01 PM, May 31, 2007
PENIX said...: ho ho said...
The value of Pi is stored in standard libraries and CPUs have had lookup tables for trigonometry for quite some time, though these are only used when you allow the compiler to generate code with those functions as they might not be as accurate.

My main concern with the Pi example would be the Super Pi benchmark.

ho ho said...
I'll give you a fourth theory: perhaps Core2 really is better at running most programs than previous CPUs.

But this theory does not explain why Intel real world performance does not live up to the benchmarks.; 5:21 PM, May 31, 2007
Ho Ho said...: penix
"My main concern with the Pi example would be the Super Pi benchmark."

Ok, would you like to elaborate on that? If (part of) the value of Pi would be stored in CPU then we should see the program run nearly instantly, not only a bit faster than its predecessors.

Superpi is most definitely not being run in some special HW units. It is not a miracle that Core2 benefits that much in that benchmark. I suggest you see how well Core and Pentium M fared in that benchmark.

"But this theory does not explain why Intel real world performance does not live up to the benchmarks."

Depends on what "world" you are living in. The applications I use daily see quite nice speedups from Core 2. Server applications under heavy load are slowed down by a not so good system architecture; the CPU itself has little to do with it.; 6:06 PM, May 31, 2007
pointer said...: PENIX said...

ho ho said...
I'll give you a fourth theory: perhaps Core2 really is better at running most programs than previous CPUs.

But this theory does not explain why Intel real world performance does not live up to the benchmarks.

wow, nice try. Let's not take any new game as an example here (as some fanbois might claim that the new game is optimized for C2D). All the pre-C2D games run much faster with C2D compared to whatever AMD has to offer (as long as they are not Gfx bottlenecked). Isn't this a real world example?

and remember some benchmarks are real apps, such as audio/video encoding/decoding ... and the one that you won't like - POVray :); 7:35 PM, May 31, 2007
PENIX said...: ho ho said...
If (part of) the value of Pi would be stored in CPU then we should see the program run nearly instantly, not only a bit faster than its predecessors.

Exactly, but that would be a little too obvious. They want to take the lead, not get caught. But the specifics are irrelevant since the point of that example was not to state it as an absolute, but just demonstrate how hardware could be optimized to cheat a benchmark.; 7:43 PM, May 31, 2007
pointer said...: PENIX said...

ho ho said...
If (part of) the value of Pi would be stored in CPU then we should see the program run nearly instantly, not only a bit faster than its predecessors.

Exactly, but that would be a little too obvious. They want to take the lead, not get caught. But the specifics are irrelevant since the point of that example was not to state it as an absolute, but just demonstrate how hardware could be optimized to cheat a benchmark.

so, you think that the AMD's IMC is to cheat the memory latency benchmark? :)

if not, then what do you think counts as cheating a benchmark? bigger cache? hahaha.; 7:58 PM, May 31, 2007
abinstein said...: This comment has been removed by the author.; 9:15 PM, May 31, 2007
abinstein said...: Ho Ho -

"Where is your experience shown?"

I can tell you that, unlike you, I am not (and probably won't be) a professional programmer. I stopped that career not long after I started my post-graduate study.

"I'd also like to know more about those "parts" you were describing before. Were those the domains you described?"

All games (as well as all big programs) can be divided into different domains and made multithreaded. I am sorry if you think otherwise.

The point, however, is not your off-topic spins, but the fact that games, especially real-time graphical games, are basically loops of mostly streaming commands. Thus games get help from big cache more than from branch prediction. Games with lots of inter-thread communications get help from better memory address resolution, as in Core2, too. jonathan's claim that "intel's great branch predicting" helps gaming is just not true, and your follow-up clearly shows how immature you are, opposing me for the purpose of opposition.; 9:32 PM, May 31, 2007
abinstein said...: Ho Ho -

"It is also a funny coincidence that I started working as a professional game developer just two weeks ago at Videobet."

So much from someone who claims to "teach his professor C++", when he only has two weeks of professional programming experience.

Learn to wipe your nose before you talk about things that you don't fully understand.

"Here you say "could" but before you said this was a proven fact. What caused the change?"

The different topic caused it, of course, and you just can't read. You imagine up things that I did or did not claim. I'm starting to feel sorry for your employer.

The proven facts are that Core2 does not scale to a higher number of cores. The proven facts are that Povray SSE 3.7beta is badly "optimized" for SSE.

I say "could" in my blog whenever I have no direct knowledge of something. Unlike some people (like you) who talk about things they don't know like they did, and refuse to correct themselves when they're caught.

You just have zero credit now. Why don't you just work on your C++ coding and stop commenting on things that you don't know?

"Yet again you pretty much nullify your previous claims of "proven facts"."

How do you get that? Did I say the difference was huge? Where? Again, can you even read?

"Before Core2 the biggest part of the market was held by Netbursts. Optimizing for them was quite a bit different than optimizing for K8 and Core2."

Optimization and microarchitecture are about much more than the number of pipelines. In terms of branch prediction, Netburst was superior to Pentium M, and Core2 inherited much from Netburst.

"Still even games that were developed during that time run faster on Core 2."

It doesn't matter when the game's C++ code was written. Nobody writes in assembly and optimizes at that level these days. It all depends on the compiler, as I have clearly said, and you just can't read.; 9:35 PM, May 31, 2007
Randy Allen said...: When the Athlon 64 X2 made its debut, I was foolish enough to buy into some of the hype. I built an Athlon 64 X2 4200+ for myself. The benchmarks were less than published results, but still comparable. When used side by side with my, now ancient, Pentium 4 1.8GHz system, the performance increase was barely noticeable. The actual performance did not match what the benchmarks were saying; not even close.

AMD's market share is crashing:-

http://img183.imageshack.us/img183/4608/amdmarketsharecrashingpz0.jpg

AMD is also making false performance claims as well:-

http://img101.imageshack.us/img101/546/amdfakeperformancemm7.jpg

AMD BK Q2'08.; 11:17 PM, May 31, 2007
Ho Ho said...: penix
"But the specifics are irrelevant since the point of that example was not to state it as an absolute, but just demonstrate how hardware could be optimized to cheat a benchmark."

How can you say that when there are absolutely no hints that you are right? I could also say that AMD has built-in specXX_rate accelerators that lessen the memory bandwidth usage and thus show better scaling; can you prove me wrong?

abinstein
"I can tell you that, unlike you, I am not (and probably won't be) a professional programmer."

I didn't ask you if you were a professional programmer, I only asked what your knowledge and experience was on the subject. So far it seems you know very little.

"All games (as well as all big programs) can be divided into different domains and made multithreaded. I am sorry if you think otherwise."

What makes you think I think otherwise? Anyways I was talking about this thing you said:
100fps on a map divided into 16 parts would make a single pass over the loop longer than 1 million clock cycles

"So much from someone who claims to "teach his professor C++", when he only has two weeks of professional programming experience."

Nice leap of faith. I thought the fact that before that game developer job I have had 2 years of professional web application development (Java and Oracle APEX) and around 8 years of hobby programming experience with a variety of languages was not important in that context.

"Learn to wipe your nose before you talk about things that you don't fully understand."

Please do so before trying to comment on game programming or analyzing. As I said before, your profile guided optimization analysis only shows that ICC doesn't generate much better code on Core2, nothing else. Still you make this into a CPU "problem".

"How do you get that?"

First you claimed that all (newer) games are optimized for Core2, and then you backed that up by saying that it simply benefits because of its architecture.

"In terms of branch prediction, Netburst was superior to Pentium M, and Core2 inherited much from Netburst."

Yes, it was, but you claimed that branch prediction doesn't help much in games, so why is Core2 so good at the older games too? Also, have you checked the benchmark results of older mobile non-Netburst Pentiums? The later ones before Core2 were already competing with K8; Core2 just extended the difference a bit more.

"It doesn't matter when the game's C++ code was written. Nobody writes in assembly and optimizes at that level these days."

It does matter; many games haven't even seen updates since the Core2 release. Also, if you want to claim that some game got a boost on Core2 after some update, then it would be relatively simple to find out just by comparing different versions of the game.; 12:52 AM, June 01, 2007
Azmount Aryl said...: PENIX said...

abinstein said...
Core 2 are for entry-level servers.

Agreed. I run several servers with high traffic. I don't care about benchmarks. I need real world performance. Until Intel proves themselves as a worthy alternative, I will only use AMD.

Second that.; 1:30 AM, June 01, 2007
pointer said...: "abinstein said
"All new games optimize themselves for Core 2 Duo."

OK, what makes you think that? What about all those older games that were developed before Core2 came out?"

Can you read? Was I talking about older games? Please read my sentence right above before letting out a single more syllable, won't you? Also mind you, Core2's optimization manual has been out for more than 1 year.

well, i guess you are the one that has a reading problem, or is trying to spread FUD, the 'doubt' part of it.

you were replying to hoho's statement: Are games also a form of benchmark? Is this the reason why Core2 dominates everything else in that area? What about video encoding and compression? Are you saying that lots of the applications people use daily are simply benchmarks?

Btw, what are those optimizations that only help with benchmarks and don't help in real-world?

and you, abinstein, were trying to put out a statement like games are optimized for C2D, trying to confuse people as if those games were used in the benchmarks, especially those benchmarks done when the C2D was just launched.

abinstein said
"hoho:Still even games that were developed during that time run faster on Core 2."

It doesn't matter when the game's C++ code was written. Nobody writes in assembly and optimizes at that level these days. It all depends on the compiler, as I have clearly said, and you just can't read.

and another attempt here ... as if the pre-C2D game binaries used in the benchmarks in the early days of C2D could possibly have such 'optimization'

while some AMD fanbois might be spreading FUD that is just unbelievable, you are trying to spread looks-like-correct FUDs. Too bad, none of those stand. Try again:); 5:00 AM, June 01, 2007
Unknown said...: Funny experience with Intel CPUs

"I have a real headache with a problem: my ASUS 680i (Striker Extreme) doesn't support the Intel QX6700.

ASUS has released the latest BIOS version, 0701, dated 20/12/2006, and I have already upgraded to it.

The motherboard box says: FULLY SUPPORT INTEL QX6700, and I can see the correct processor information in the BIOS interface, that is: INTEL CORE 2 QUAD XXXX, which means the motherboard detected my CPU. I can see the same information in WINDOWS XP PRO SP2 and CPU-Z 1.38.

However, only two cores are working at the moment in WinXP, because only two cores show in Windows Task Manager, when it should actually show four.

I have reinstalled Windows XP a few times, and even Windows Vista, but it is still the same.

I believe the QX6700 doesn't have any problem, because I've checked it with the Intel CPU ID software, and the CPU is the retail version and was sealed.

I have checked so many times in the BIOS and the guide book for any switch to enable or disable the four cores, and the answer is NO!

I have run 3DMark06 a few times, and the CPU score is strange: only around 2500, the same as an E6700. A real QX6700 should be between 4500 and 5500, so I'm sure two of the cores are not working at the moment.

Please help me!"

"I’m having the same problem with an aw9d-max and a qx6700. After a long search I disabled 2 of the 4 cores and was able to boot and run Vista. Once Vista was running I enabled the third core, booted, and had no problems. Then I tried the 4th core and Vista hangs :(

Now I was wondering if it could be possible that my psu is not powerful enough or if the problem is caused by something else.

I currently have a 400W psu, next week I’m going to test a 500W"; 6:11 AM, June 01, 2007
pointer said...: AMD fanbois spreading 'real world experience' FUD.....
"I'm so headache with a problem which is my ASUS 680I (striker extreme) doesnt support INTEL QX6700. ..

I’m having the same problem with aw9d-max and qx6700, after a long search i disabled 2 of the 4 cores and was ...

well, i do not disagree some might have grumbled about Intel CPU for whatever reason .. however, grumbling on the other end is equally true.

here is my less than 1 minute google search :)

I have had a bad experience with AMD's XP Athlons. I am a musician who gigs live with a computer. When it comes to performing live, there's nothing worse than a computer crashing during an event. The AMD processors have caused me great turmoil due to instability. They also lagged in simple processing. It takes 2-3 seconds for the start menu button to pop up... Unacceptable!

Upgraded to a same-class ASUS motherboard and this Intel chip. Stability at last! Not a crash yet, and I stressed it to 100% CPU power for over 12 hours! No crash or restart!

You get what you pay for. Intel. when stability matters...

Better than the AMD CRASHAthon....AMD is far behind INTEL, their processors always do weird stuff, never stable and problems galore....do not buy an AMD....they are cheaper and maybe faster but are very buggy processors. I think AMD forgets to do all the steps in a normal cpu cycle, that is why they are faster but crash lots.

hint: this is an excerpt from the buyers of P4 HT in pricegrabber :); 7:36 AM, June 01, 2007
Anonymous said...: People complaining about AMD? Naw, can't be. Those are all paid pumpers from Intel. Only complaints and posts against Intel are credible, and if that same source says anything positive they become paid pumpers as well.....; 8:19 AM, June 01, 2007
Ho Ho said...: ... and people don't retire from Intel, they'll just become their (un)official viral marketers who write good stuff about Intel on the Internet.; 8:25 AM, June 01, 2007
abinstein said...: Ho Ho -

You are not reading, and thus your "reply" becomes mostly meaningless. I'm not going to spend more of my time on you, except for the following -

"First you claimed that all (newer) games are optimized for Core2 and then backed up by saying that it simply benefits because of its architecture."

These are two statements. New games are optimized for Core2 if they use icc or are compiled for i686. Core2's large shared cache benefits games with high inter-thread communication.

Below is another statement: you simply can't read, and still don't read even when others point it to your nose.

""In terms of branch prediction, Netburst was superior to Pentium M, and Core2 inherited much from Netburst."

Yes, it was, but you claimed that branch prediction doesn't help much in games, so why is Core2 so good at the older games too?"

I explained why but you couldn't read, sorry I can't help.

"The latter ones before Core2 were already competing with K8, Core2 just extended the difference a bit more."

Apps optimized for Core also are optimized for Core2.; 9:06 AM, June 01, 2007
The Dude said...: Randy Allen said...

When the Athlon 64 X2 made its debut, I was foolish enough to buy into some of the hype. I built an Athlon 64 X2 4200+ for myself. The benchmarks were less than published results, but still comparable. When used side by side with my, now ancient, Pentium 4 1.8GHz system, the performance increase was barely noticeable. The actual performance did not match what the benchmarks were saying; not even close.

This is bullshit.

When I was in school, they replaced all of the computers in our computer lab with brand new 2.0 GHz Dells. You'd think they would scream. But, in reality, they were actually slower than my Athlon XP 1700+ at home. I couldn't believe it! And I mean SLOWER. Of course, they probably had the crappy integrated graphics that most OEMs use, but still. It was slower bringing up the start menu, slower loading a program, slower getting a web page, everything!

It was so bad that I simply did all my programming and such at home, and brought everything to school on a floppy.

In comparison, when I first put a new Athlon64 3200+ (2.0 GHz single-core) together, I couldn't believe HOW MUCH FASTER IT WAS. When it took a fraction of time to do things like rar/unrar files, I thought perhaps I had overclocked it. But there it was, the same 2 GHz as the Dell P4 machine at school. But the difference in performance was night and day.; 9:19 AM, June 01, 2007
Randy Allen said...: After that computer I built a system using the E6600 (later a Q6600). It's faster than anything AMD has out and I finally see real world performance gains.

One thing is for certain, I will never believe the hype of AMD again.

Intel literally does 'Leap Ahead'!; 9:43 AM, June 01, 2007
The Dude said...: Yeah. It's good to see that after 3+ years of dominance by AMD in terms of performance and heat output, Intel has come back for (almost) a year.

Good luck with your leaping, dude!; 10:47 AM, June 01, 2007
tech4life said...: Question: In regards to the recent POVRay demonstration by AMD, they showed an 8 core Opteron system scoring 2200 vs a 16 core Barcelona system that scored just over 4000. They say the purpose was to show how well Barcelona scales. But doesn't it only show that Barcelona offers no performance gains over the old Opteron (which is absurd)? I mean, if they tested an 8 core Barcelona system wouldn't they get around 2100-2200 just like the old 8 core Opteron? Somebody clear this up for me please.; 11:38 AM, June 01, 2007
tech4life said...: And if they were trying to demonstrate scalability shouldn't they compare an 8 core Barcelona system to a 16 core Barcelona so it would be apples to apples?; 11:51 AM, June 01, 2007
abinstein said...: "They say the purpose was to show how well Barcelona scales."

No, the purpose is to show how well Barcelona works as drop-in replacement to dual-core Opterons.

You can't say the same thing on Intel products.; 9:55 PM, June 01, 2007
Unknown said...: You can't say the same thing on Intel products.

Of course you can. Clovertown and Woodcrest are socket compatible. If someone bought a server with two Woodcrest CPUs last year they could upgrade those to Clovertown CPUs today.

With AMD you could read about "simulated performance" if you wanted to, maybe look at some powerpoint presentations. You couldn't actually buy a quad core CPU though.; 12:27 AM, June 02, 2007
Unknown said...: C'mon Giant, you know deep in your intel'ish heart that AMD has a better product compared to Intel. All web sites say so. We know that is so.

So stop spreading FUD. Duhh, your post doesn't even make perfect sense.; 11:31 PM, June 05, 2007

Journal of Pervasive 64 bit Computing
Main Blog Page

About Me

Previous Posts

Sunday, May 27, 2007

Real world experience with Intel CPUs

124 Comments:
