Sunday, April 08, 2007

Intel's Quad core only slightly faster than the $64 X2 3600

You pay $65 for the X2 3600+ and get 65.5 fps in Rainbow Six: Vegas.
Or, you pay $1000 for Intel's fake quad core QX6800 and get 88 fps.
If you look at other benchmarks, the picture is worse for Intel.
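
As a rough sketch of the arithmetic behind that comparison, the snippet below works out dollars per frame and the speedup versus the price ratio. It uses only the two prices and frame rates quoted above; the dictionary layout and function name are illustrative assumptions.

```python
# Figures quoted in the post (Rainbow Six: Vegas); names are illustrative.
chips = {
    "X2 3600+": {"price_usd": 65, "fps": 65.5},
    "QX6800": {"price_usd": 1000, "fps": 88.0},
}


def dollars_per_fps(chip):
    """Price divided by frame rate: lower means better value."""
    return chip["price_usd"] / chip["fps"]


cost = {name: dollars_per_fps(chip) for name, chip in chips.items()}
speedup = chips["QX6800"]["fps"] / chips["X2 3600+"]["fps"]
price_ratio = chips["QX6800"]["price_usd"] / chips["X2 3600+"]["price_usd"]

for name, value in cost.items():
    print(f"{name}: ${value:.2f} per fps")
print(f"QX6800: {speedup:.2f}x the frame rate at {price_ratio:.1f}x the price")
```

This covers one game at one setting, so it says nothing about workloads that actually use four cores.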

The pathetic "quad core" performance of Intel CPUs shows how outdated Intel's architecture is.

48 Comments:

Blogger Unknown said...

What?! You mean a game only optimised for dual core CPUs (as opposed to quad cores) is only as fast as a dual core CPU?! Who would have ever thought that?

But Sharikou's link does illustrate one point:-

The Athlon 6000+ runs at 3GHz. It's the Socket AM2 equivalent of the Athlon 64 FX 74.

Yet AMD's 4x4 QuadFX is slower than using one processor. That's right, using one AMD CPU is faster than using two. Negative scaling. I guess that shows how pathetic AMD's architecture is.

10:46 PM, April 08, 2007  
Blogger Randy Allen said...

This comment has been removed by the author.

10:49 PM, April 08, 2007  
Blogger Randy Allen said...

Wow. AMD's negative scaling in Oblivion is really great.

In Rainbow Six: Vegas, adding another processor and gaining a pathetic 2.6 FPS for AMD? AMD truly has a pathetic architecture that does not scale. 4x4 is a joke.

AMD BK Q2'08.

10:51 PM, April 08, 2007  
Blogger novice said...

Sharikou, you are wise!
It does not look good for Intel. Being 10 times the size of AMD means also 10 times more costs. So with AMD and anything more than 10 percent of marketshare, it means trouble for Intel, and Intel will have to shrink. Every shrink for Intel would mean expansion for AMD and this will go on, until...well, I don't know until what.

11:07 PM, April 08, 2007  
Blogger abinstein said...

What?! You mean a game only optimised for dual core CPUs (as opposed to quad cores) is only as fast as a dual core CPU?! Who would have ever thought that?


Sharikou's point is clear and you just don't get it. It's most cost effective now as before to buy AMD processors. You get the most out of what you pay for.

11:58 PM, April 08, 2007  
Blogger Azmount Aryl said...

Randy Allen said...
In Rainbow Six: Vegas, adding another processor and gaining a pathetic 2.6 FPS for AMD? AMD truly has a pathetic architecture that does not scale. 4x4 is a joke.
AMD BK Q2'08.


In comparison dual core C2D X6800 at 2.93GHz is 2.7FPS slower than quad core QX6800. I'm sorry sir but you have no point here.

Do you get paid to post pointless posts or do you just want the world to know how childish your logic is?
It would be nice if you would respond to my question.

12:22 AM, April 09, 2007  
Blogger Dahaka said...

Intel has a bottleneck: the "FSB".
If Intel wants more performance, it needs a memory controller!

Intel has a good chip but no good platform. AMD has an average chip but a pretty good platform. AMD needs a stronger core, and then we'll see incredible performance. Intellers will be upset, I'm sorry. Haha
Wait Barcelonaaa!!!

1:38 AM, April 09, 2007  
Blogger uf said...

I recommend looking at SPECint_rate2006 and SPECfp_rate2006 for two-socket machines. Don't be fooled by average scores; look at the details of the benchmark.
One can see that:
a) Intel's architecture is very much dependent on the task, while Opteron's architecture is more robust;
b) The higher average score of Xeon is usually reached due to an abnormal score on one particular task;
c) There are some tasks where a two-core Opteron beats a four-core Xeon;

uf.

2:04 AM, April 09, 2007  
Blogger Ho Ho said...

It all depends on what you need.
E.g, Valve's particle simulator runs at around 4.5x faster on QX6800, 4.2x faster on QX6700 and 3.76x faster on Q6600 compared to x2 3600+.

After April 22 you can get Q6600 for around $530, still ~8x more expensive than that discounted x2. Without discount the Q6600 costs around 4.8x more. Too big price difference? Perhaps, but consider that there exists no AMD counterpart with similar performance, not to mention price.

In Q3 that Q6600 will cost around $266; I have no idea what the 3600+ will cost by then.
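
A quick sketch of how those numbers combine into performance per dollar. It assumes the X2 3600+ stays at the $65 quoted in the post (the Q3 price of that chip is unknown), and takes the 3.76x particle simulator speedup and the two Q6600 prices from the comment above. All variable names are illustrative.

```python
# Assumption: the X2 3600+ price stays at the $65 quoted in the blog post.
x2_price_usd = 65
q6600_speedup = 3.76  # Valve particle simulator, relative to the X2 3600+

# Q6600 prices mentioned in the comment.
q6600_prices_usd = {"after April 22": 530, "Q3": 266}

relative_value = {}
for when, price in q6600_prices_usd.items():
    price_ratio = price / x2_price_usd
    # Values above 1.0 mean the Q6600 gives more performance per dollar.
    relative_value[when] = q6600_speedup / price_ratio
    print(f"{when}: {price_ratio:.1f}x the price, "
          f"{relative_value[when]:.2f}x the performance per dollar")
```

On these assumptions the quad core only approaches the X2's performance per dollar at the Q3 price, and only in a workload that scales across four cores.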

3:35 AM, April 09, 2007  
Blogger Azmount Aryl said...

Ho Ho said...
It all depends on what you need.
E.g, Valve's particle simulator runs at around 4.5x faster on QX6800, 4.2x faster on QX6700 and 3.76x faster on Q6600 compared to x2 3600+.


I agree. It all depends on what you need. Let me ask you a question, do you run Valve's particle simulator for, at least, one hour every day on your QX6800? Cause i sure do run Firefox 2 (e-mail, forums, blogs) for one hour, every day, on my X2 3600+.

4:02 AM, April 09, 2007  
Blogger Randy Allen said...

Cause i sure do run Firefox 2 (e-mail, forums, blogs) for one hour, every day, on my X2 3600+.

Firefox only needs a 500MHz processor.

There is just no denying that AMD is fragged all over by Intel. In servers Intel frags AMD. In desktop the quad core QX6800 frags 4x4 while using less power than the AMD 6000+. In mobile Intel frags AMD while still using less power.

4:25 AM, April 09, 2007  
Blogger Ho Ho said...

azmount aryl
"Let me ask you a question, do you run Valve's particle simulator for, at least, one hour every day on your QX6800?"

No, it doesn't work under Linux and I don't have a quadcore, at least not yet.

I do lots of compiling (average >1h per day), lots of software rendering (>2h a day), run lots of data processing (custom scripts that download webpages, gather information and store in DB around 1-2h a day) and lots of other stuff that would take too long to list here. Also most CPU cycles not used by those tasks are picked up and used by Seti@Home.

Also web browsing and music listening can take quite a bit of CPU cycles. At work I just restarted so I only have data for last 45 minutes. Rendering GUI (X) has taken around 2:14, Konqueror(with ~40 tabs)+flash ~2:48, Amarok a total of ~30s for 14 parallel threads. I haven't (yet) used my database frontend program, that one takes considerable processing power to run. During normal use it takes almost 5-15% of all CPU time.

All those numbers are actual CPU time, i.e. actual CPU usage. I don't count the time those things are sleeping in the background. If I did, I'd have to count most of them as running 24/7.
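
To put those CPU times in proportion, here is a small sketch that converts them to a share of the 45-minute window. The process labels and the helper name are illustrative, and the result is expressed as a share of a single core, since the comment does not say how the load was spread.

```python
def to_seconds(cpu_time):
    """Convert an 'm:ss' CPU-time string to seconds."""
    minutes, seconds = cpu_time.split(":")
    return int(minutes) * 60 + int(seconds)


# CPU time per process as quoted in the comment; 30 s is written as "0:30".
cpu_times = {"X": "2:14", "Konqueror + Flash": "2:48", "Amarok": "0:30"}
window_s = 45 * 60  # time since the restart, in seconds

busy_s = {name: to_seconds(t) for name, t in cpu_times.items()}
total_busy_s = sum(busy_s.values())

for name, secs in busy_s.items():
    print(f"{name}: {100 * secs / window_s:.1f}% of one core")
print(f"total: {100 * total_busy_s / window_s:.1f}% of one core")
```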

I mostly do web development at work, so my CPU there is not under as much load as it is at home. If anyone is interested I can post detailed processor usage graphs of my home PC later when I get home; for some reason I can't access it remotely ATM.



My PC runs 24/7 but I'm using it at most 8h a day, thus for around 4-5h of that it is doing something other than idling.


My point (and Sharikou's) was that for running R6:V you might not need to get an expensive CPU. What Sharikou missed is that there are people who are not gamers and do other stuff that does need considerably more CPU power that 3600+ simply cannot provide.


randy allen
"Firefox only needs a 500MHz processor."

FF is actually rather CPU intensive. I run FF and Konqueror on P3 500 and Konqueror finishes rendering pages at least 2-3x faster than FF does. Though FF is still quite usable on that P3. On my older P1 233@266 though FF isn't all that snappy any more so I use only Konqueror there.

4:36 AM, April 09, 2007  
Blogger Azmount Aryl said...

Randy Allen said...
There is just no denying that AMD is fragged all over by Intel.


Actually there's no denying that AMD has an equivalent to anything Intel has to offer. AMD has the best server processors on the market as shown by countless benchmarks, they have QuadFX platform as opposed to intel's quad core CPU's, they offer the cheapest solutions for average Joe such as 300$ Athlon 6000+ which stand fairly against intel's 500$ C2D E6700 CPU.

Since your post is inconsistent with logic, I must reiterate my question: Are you getting paid to write pointless posts, or are you just a kid with a kiddish opinion?

5:08 AM, April 09, 2007  
Blogger Azmount Aryl said...

To Ho Ho:
I see what you do for a living now. My personal suggestion to you sir is this, get a good college degree and move to U.S., thanks to Bill's plea to congress we might start giving away those work visas to IT experienced people around the world. I'm sure you feel secure making $8/hr in your country, but if you'd be here, i assure you, you would feel much more secure making $50/hr.

By the way, I'm Russian and when i first entered U.S. i did it with the work visa.

5:17 AM, April 09, 2007  
Blogger Ho Ho said...

azmount aryl
"AMD has the best server processors on the market as shown by countless benchmarks"

Depends on benchmark


"they have QuadFX platform as opposed to intel's quad core CPU's"

I personally wouldn't get 4x4 with two CPU fans and coolers, double the DIMM count and considerably higher power usage when I can get same or better performance using only single socket.


"they offer the cheapest solutions for average Joe such as 300$ Athlon 6000+ which stand fairly against intel's 500$ C2D E6700 CPU."

For now, you are correct, as long as you don't look at the power consumption. From April 22 that E6700 will cost around $316 and the Q6600 will drop to the E6700's previous level of $530.



Sorry for the OT, but ...

"if you'd be here, i assure you, you would feel much more secure making $50/hr."

I have no doubts I could earn considerably more money in US than I do here in Estonia. Just a couple of days ago some statistics showed that on average, Estonians spend 25% of their income on food, 15% on rent and 10% on transport. A total of around 55% for the most basic needs. Same number for average EU country is <20% and in US ~12-14%.

Though as I said before, money is not the most important thing for me and US is certainly that kind of country I wouldn't want to live in, especially at the time when they can easily get the biggest market crash ever in history, not to mention all of that "big brother" stuff going on there.

I could work in Finland without much hassle, though. Helsinki is actually closer to me than my parents are, one is 80km and other is 100km away. It is also cheaper to get to Helsinki.

5:49 AM, April 09, 2007  
Blogger Christian Jean said...

Yet AMD's 4x4 QuadFX is slower than using one processor.

Show me a 'true' 4x4 product! Not the prototype/beta that is currently circulating. AMD is just establishing a market right now until their K10's are ready... but happy to have informed you!

Also, on a second note: it wasn't quite clear what the 4x4 was aimed at and what it was supposed to be. Those who think it was a cheap way of getting quad don't know a single thing about computers. But recently AMD's Richards has described the true goal of 4x4, which is primarily to remove all platform limitations, such as by having the ability to use 4 graphics cards (just to name one). Go read the interview for more on this product... quite interesting!

Negative scaling. I guess that shows how pathetic AMD's architecture is.

AMD truly has a pathetic architecture that does not scale. 4x4 is a joke.

Once again, Intelers spoke and are either...

a) very stupid
b) don't know a thing about architectures
c) all of the above

5:53 AM, April 09, 2007  
Blogger Christian Jean said...

My personal suggestion to you sir is this, get a good college degree

Why is that? In most hiring I do now, I try to prefer people without schooling. I search for those who have shown the ability to learn by themselves and learn from self motivation. For the last 5 years now, they have shown themselves to be better than employees straight out of college. And because they didn't come out influenced by everything Microsoft, you can shape them as you wish.

and move to U.S.

Cool, and become part of the McDonald's culture. Get treated like a third-rate immigrant until you receive a draft card in an attempt to ship you to Iraq?

Don't forget to take a billion dollars worth of insurance because on average each American has 3 pending lawsuits against everyone else :)

thanks to Bill's plea to congress we might start giving away those work visas to IT experienced people around the world.

Microsoft wouldn't dare outsource, but what's the next best thing? Bring the outsource to your country!

By the way, I'm Russian and when i first entered U.S. i did it with the work visa.

If you allow me to ask Azmount Aryl... being Russian, are you allowed to go to Cuba on vacation? Or having a U.S. work Visa prevents you legally from going?

I love Cuba... wouldn't want to stop going there!

6:08 AM, April 09, 2007  
Blogger Azmount Aryl said...

Ho Ho said...
I personally wouldn't get 4x4 with two CPU fans and coolers, double the DIMM count and considerably higher power usage when I can get same or better performance using only single socket.


Then again that's you. Some people are more familiar with AMD brand than you. And some people just need a good workstation and $346 MoBo plus a few FX-70 processors.... whoops! ain't got em FX-70 processors - they are Sold Out (atm). I guess someone does need them.

I also noticed that you have referred to intel's Q2 price fall-down, well, the only thing i can say is... AMD will have one too.

6:16 AM, April 09, 2007  
Blogger Azmount Aryl said...

Jeach! said...
If you allow me to ask Azmount Aryl... being Russian, are you allowed to go to Cuba on vacation? Or having a U.S. work Visa prevents you legally from going?
I love Cuba... wouldn't want to stop going there!


I'm a US citizen. Also i prefer asian girls as a sex toys (thats just me so don't reply to this - we all different)

6:25 AM, April 09, 2007  
Blogger Christian Jean said...

This just in on CNBC...

AMD adjusts (down) its Q1 guidance to $1.2 Billion (no surprise).

And surprisingly, its stock shot up from a gain of 1.2% to 4% almost instantly! Why?

Because they announced a corporate restructure and a cut to their CAPEX of $500 Million.

This is good news to shareholders, but I'm wondering where 'exactly' the $500 million cuts will be made? FABS, R&D, equipment, supplies, all of the above?

6:26 AM, April 09, 2007  
Blogger Ho Ho said...

jeach!
"Show me a 'true' 4x4 product! Not the prototype/beta that is currently circulating."

Problem is, when I want to have comparable CPU power to Intel quads now, I don't have any other choice but to get that prototype thingy.


"ability to use 4 graphics cards"

You could do it before with certain motherboards. It's just that the 4x4-supporting motherboard* can do it too.

*) How many 4x4 motherboards are there? Only one?


"Most hiring I do now, I try to prefer people without schooling"

Heh, I might have a chance with you then. I've been to college for 2 years but it got really boring really fast and I don't think I'll finish it any time soon. I knew more about programming and computers before I started college than they could ever teach me there.

Though I'll probably start taking various certification exams once I have a bit more free time. I can already pass the first Java one without any preparation, at least according to a test program I tried a few weeks ago.


azmount aryl
"Some people are more familiar with AMD brand than you."

What has brand got to do with power usage and noise levels? For me my PC is a workstation and to work efficiently you shouldn't be able to confuse it with dustbuster. Though I agree, I'm certain there are different people than me.


"I guess someone does need them"

... or nobody wants them so they are not ordered. Only FX72 is there for some reason.


"AMD will have one too"

Didn't it already have one ? Or will they simply drop their already cheap prices even more? If yes then I can understand how they can burn $1B in just a few months.

6:29 AM, April 09, 2007  
Blogger Randy Allen said...

Why are you linking to Sharikou's links to AMD benchmarks? By following your logic, we may as well link to this: http://www.intel.com/products/processor/xeon/competitive_guide.pdf

We'll follow your logic. Clearly, Clovertown is 400% faster than Opteron.

6:29 AM, April 09, 2007  
Anonymous Anonymous said...

I do believe that intel is really concerned about AMD capturing more market share at any cost. The more market share AMD captures, Intel will have a hard time regaining that back.

6:44 AM, April 09, 2007  
Blogger Randy Allen said...

This just in. I thought the AMD fanboys would appreciate this.

http://biz.yahoo.com/bw/070409/20070409005383.html?.v=1

SUNNYVALE, Calif.--(BUSINESS WIRE)--AMD (NYSE:AMD - News) today announced it expects to report revenue of approximately $1.225 billion in the quarter ending March 31, 2007. Revenues declined sharply quarter-over-quarter for the Computing Solutions segment, primarily due to lower overall average selling prices and significantly lower unit sales, especially in the resale channel.

$1.225 billion in revenue is very unhealthy for AMD. This time last year AMD reported sales of $1.33bn and profit of $185 million. This was not including Ati. Q1'06 Ati reported revenue of $591 million, and a small profit of $7 million. So together, last year, AMD and Ati had a Q1 revenue of $1.921 billion and profits of $192 million. If we assume that AMD meets its estimated revenue, that's a decline year over year of about 36%. AMD will also post a significant loss.
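
A short sketch of that year-over-year calculation, using only the figures quoted in this comment (in millions of dollars). Variable names are illustrative; on these inputs the combined decline works out to roughly 36%.

```python
# All figures in millions of USD, as quoted in the comment.
amd_q1_06_revenue = 1330
ati_q1_06_revenue = 591
amd_q1_06_profit = 185
ati_q1_06_profit = 7
q1_07_guidance = 1225  # AMD's preannounced Q1'07 revenue, ATI included

combined_revenue = amd_q1_06_revenue + ati_q1_06_revenue
combined_profit = amd_q1_06_profit + ati_q1_06_profit
decline_pct = 100 * (combined_revenue - q1_07_guidance) / combined_revenue

print(f"Combined Q1'06 revenue: ${combined_revenue}M, profit: ${combined_profit}M")
print(f"Year-over-year revenue decline: {decline_pct:.1f}%")
```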

This is primarily "due to lower overall average selling prices and significantly lower unit sales".

AMD is clearly losing market share, and revenue share of processors.

AMD BK Q2'08.

7:03 AM, April 09, 2007  
Blogger Evil_Merlin said...

Wow, the fanboi's are still listening to this fool?


Anyone who claims AMD's "server" processors are superior to Intel's has never really done much testing.

I have. I run a 1000+ server data center. About 70% Intel and 30% AMD. Since the Woodcrest, we have been moving to exclusively Intel

Why? Exchange 2003 and 2007 performance on the Woodcrest is about 23-27% faster across the board than the like priced AMD CPU.

Why? SQL 2005 performance under x64 is up almost 40% with the Woodcrest CPU.

Not to mention I can actually buy the Xeon 5300 family NOW and for a decent price.

Get out of your dreamworld FanBoi's Sharikou is nothing but a clown.

8:24 AM, April 09, 2007  
Blogger PENIX said...

Evil said...
Anyone who claims AMD's "server" processors are superior to Intel's has never really done much testing. Why? Exchange 2003 and 2007 performance on the Woodcrest is about 23-27% faster across the board than the like priced AMD CPU. Why? SQL 2005 performance under x64 is up almost 40% with the Woodcrest CPU.


Where can we review this "test" of yours? Where can we view the proof that these numbers are not just the gross fantasy of an obvious fanboy with a hard on for Intel?

9:01 AM, April 09, 2007  
Blogger netrama said...

Why? Exchange 2003 and 2007 performance on the Woodcrest is about 23-27% faster across the board than the like priced AMD CPU.

Why? SQL 2005 performance under x64 is up almost 40% with the Woodcrest CPU.


What.. is this a joke , you sound like those fake benchmarks posted in those Intel slides, To even read their footnotes, I had to put a 10 man team.

Since the Woodcrest, we have been moving to exclusively Intel
Yeah right... and before that you were 100% AMD. Don't give us bull. Idi*ts like you take whatever Dell shoves down your throat and couldn't care less what is inside.

9:28 AM, April 09, 2007  
Blogger Christian Jean said...

Although unfortunate, there are many ignorant people like 'evil'.

For example, I had an argument with a friend of mine the other day. He was always divided, but usually favored AMD. Recently he told me that his company were ONLY buying Intel processors.

When I asked him why, I only got comments in the likes of 'AMD sucks'. But with a little more persuasion I got enough information out of him to start investigating.

It turns out that their computer purchases were based on the performance during the compilation of their Linux distros. But for some reason (don't ask me why), they were compiling their Intel machines in 32-bit, but compiling their AMD machines using 64-bit.

It took almost 30% more time for a Linux distro to compile on AMD than it did on Intel, and they took that at face value to mean that Intel was MUCH faster!!

One might think a performance increase on 64-bit right? But it didn't and I couldn't figure out why! A little digging around and I quickly found out that it takes more time to compile 64-bit code than 32-bit code on the same machine (ok, so I didn't know, I usually go get a fresh cup of coffee during these compilations).

Reading the GCC archives I came to realize that there is no optimization of the compilation algorithms themselves, ONLY of the compiled binaries. Further research indicated that this is because there are more registers, features (SSE2, SSE3, etc) and larger algorithms.

Technically speaking, if you compiled a 16-bit Linux and a 32-bit Linux on a 2GHz machine, the 16-bit Linux should be compiled much faster.

Also, if you compile with GCC 2.2, it should be faster than 2.3 and faster than 2.4, etc, etc.

I could only send him my research and hope they would realize the ridiculous assumptions they had made.

But examples like this exist all over the place all the time.

11:21 AM, April 09, 2007  
Blogger Ho Ho said...

jeach!
"It took almost 30% more time for a Linux distro to compile on AMD than it did on Intel, and they took that at face value to mean that Intel was MUCH faster!!"

Interesting, I don't see much difference compiling my Gentoo on either 32bit or 64bit with my C2D. Most certainly not 30%, I'd say 0-10% at most and that is mostly thanks to bigger cache usage (32 vs 64bit pointers) and no macro-op fusion. On AMD64 the difference shouldn't be nearly as big.
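
A minimal sketch of the pointer-size effect mentioned above. The node layout (one 32-bit key plus two child pointers) and the 64-byte cache line are assumptions chosen for illustration. The `<` prefix tells Python's `struct` module to use standard sizes with no alignment padding, so the result is the same on any machine; a real compiler would typically pad the 64-bit node further.

```python
import struct

# Hypothetical tree node: one 32-bit key plus two child pointers.
# "<" selects standard sizes with no alignment padding.
node_32bit = struct.calcsize("<iII")  # pointers stored as 4-byte values
node_64bit = struct.calcsize("<iQQ")  # pointers stored as 8-byte values

cache_line = 64  # bytes; a common line size, assumed here

print(f"32-bit node: {node_32bit} bytes, "
      f"{cache_line // node_32bit} whole nodes per cache line")
print(f"64-bit node: {node_64bit} bytes, "
      f"{cache_line // node_64bit} whole nodes per cache line")
```

Pointer-heavy data structures, which a compiler's own trees and graphs tend to be, take more cache in a 64-bit build, which is one plausible source of a small slowdown.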


"Reading the GCC archives I came to realize that there is no optimization of the compilation algorithms themselves, ONLY of the compiled binaries"

I know there was some compiling performance drop going from 2.95 to 3.2 and to 4.0 but with latest 4.1 and 4.2 snapshots things are better. As I said, there shouldn't be anywhere near that big performance difference between 32 and 64bit for as long as compiler versions are the same.


"Further research indicated that this is because there are more registers, features (SSE2, SSE3, etc) and larger algorithms"

Say what? Having more registers can only increase performance. Though having twice as big pointers pretty much nullifies the performance gain given by additional registers. Features and algorithms are exactly the same on 32 and 64bit architectures so they can in no way affect compiling speed.


"Technically speaking, if you compiled a 16-bit Linux and a 32-bit Linux on a 2GHz machine, the 16-bit Linux should be compiled much faster."

Well, that is actually impossible since there is no 16bit Linux. If there were then I'm quite sure compiling 16bit one can actually take a lot more time thanks to all the memory segment handling. It's not that nice to have at most 64k blocks of RAM and it literally kills performance since you can't use object pools that effectively.

As I said several times, compiling speed is not that much affected by 32 vs 64bit. Some people made a benchmark here and here on amd64. In the first case, compiling got around 1-2% slower, but when compiling the Linux kernel it actually got considerably faster.

Only interesting thing is that for them compiling the kernel takes over a half an hour whereas I compile mine in less than four minutes. I wonder if I could get their .config somewhere ...


"I could only send him my research and hope they would realize the ridiculous assumptions they had made."

Please don't, you'd only make a fool out of yourself when you try to claim similar things you did here. Though if you do have that research somewhere then I'd be interested in reading it further. I'd like to see what mistake you made there.

12:30 PM, April 09, 2007  
Blogger realgenius said...

Right now Newegg is selling the am2-6000 for $239.
At $239 this processor cant be beat for value and performance.
Intel has nothing with this kind of performance at this kind of price.
BEST DEAL AMD AM2-6000 for $239 what a great deal on a high performance cpu.
You can put this AMD cpu on a $50 motherboard with ddr2-800 and blow away 95% of the intel line up.

12:50 PM, April 09, 2007  
Blogger abinstein said...

ho ho:"Say what? Having more registers can only increase performance."

Ho Ho, it seems to me that you really have poor grasp of any idea that's slightly more complex than simplest arithmetics.

The more registers of the compilation target architecture means a much higher complexity in terms of optimization the compiler has to go through. That will contribute to longer compilation time greatly.

1:38 PM, April 09, 2007  
Blogger Ho Ho said...

abinstein
"The more registers of the compilation target architecture means a much higher complexity in terms of optimization the compiler has to go through."

I'm sorry, sir, but you are plain wrong in every sense. Have you actually got the slightest idea how compilers work?

Under x86 compilers have very few registers to work with: 8 (7) on 32bit and 16 on 64bit. The fewer registers there are, the more complicated it gets to decide what to keep in registers, when to swap out to RAM (cache) and when to prefer math over caching data in registers.
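
To illustrate the spilling argument, here is a toy allocator that counts how many values have to be spilled for a given register count. It is a simplified linear-scan allocator over live intervals, which is a swapped-in technique chosen for brevity and not a claim about what GCC does. The workload (20 temporaries, each live for 12 steps) is hypothetical.

```python
def linear_scan_spills(intervals, num_regs):
    """Count spilled intervals using a simplified linear-scan allocator."""
    active = []  # end points of intervals currently holding a register
    spills = 0
    for start, end in sorted(intervals):
        # Free registers whose interval ended at or before this start.
        active = [e for e in active if e > start]
        if len(active) < num_regs:
            active.append(end)
            continue
        # No free register: spill whichever interval ends last.
        furthest = max(active)
        if furthest > end:
            # An active interval outlives the new one, so evict it instead.
            active.remove(furthest)
            active.append(end)
        spills += 1
    return spills


# Hypothetical workload: 20 temporaries, each live for 12 steps.
intervals = [(i, i + 12) for i in range(20)]
spills_by_regs = {regs: linear_scan_spills(intervals, regs) for regs in (7, 16)}
for regs, spills in spills_by_regs.items():
    print(f"{regs} registers -> {spills} spills")
```

With at most 12 values live at once, 16 registers never force a spill decision, while 7 registers force one repeatedly. Whether those decisions cost the compiler measurable time is a separate question that only benchmarks can answer.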

I backed up my claims with different benchmarks that show an almost nonexistent speed difference or even faster speeds in 64bit. Where is your proof that compiling for a 64bit target takes more resources/is more complicated?

1:54 PM, April 09, 2007  
Blogger InorganicMatter said...

There's no way you are this dumb.

OK then, let's do an X264 conversion test. Your 3600+ against my X6800. Still wanna play?

5:41 PM, April 09, 2007  
Blogger Evil_Merlin said...

Actually we use HP in house, not Dell. Before we moved to buying all Intel, we were about 70/30. 70 going to AMD, but with the price/BTU/power savings we get by moving to Woodcrest (not to mention that all the newest HP stuff is typically Intel first), it was a no brainer.

You don't have to like the math, but it's the pure unadulterated truth, fanbois. Keep following your fake PhD god into the ground. I swear you seem almost as bad as the Mac folks...

6:49 PM, April 09, 2007  
Blogger Christian Jean said...

Please don't, you'd only make a fool out of yourself when you try to claim similar things you did here.

You remind me of that 'know-it-all' kid at a meeting table which keeps blabbing and everyone tries to ignore.

Anyway, I was going to put you into your place but 'abinstein' did it for me (thanks)!

I lost the links I had and only found a few things... here they are!

Read 'X.org compile time'

GCC Thread

Compiling for many registers

Anyway, you get the general idea... do your own research if you want more info.

9:43 PM, April 09, 2007  
Blogger Ho Ho said...

Are you arguing against yourself now?

From the last link:

"It is defineately easier to optimize for amd64 because if its increased # of registers. But I'm not sure even "slower" is a valid claim -- on the i386 the compiler has to do a lot of time figuring out the best way to do the spill code (when, where)"

Exactly the thing I was talking about.


In the X compiling as he said, most of the time was spent by system, that means in kernel. My best guess is that it took more time there because of bigger chunks of memory that were allocated.

In the GCC thread it was also said that it is easier to compile for AMD64. Also, that thread is more than three years old. Do you really think the things discussed there are still valid, especially considering that the GCC 4.x series uses whole new tree-SSA algorithms?

11:23 PM, April 09, 2007  
Blogger Christian Jean said...

You really are useless aren't you?

Bugs are fixed, enhancements are made, but YES things are relatively the same... 2004 or not.

5:16 AM, April 10, 2007  
Blogger Christian Jean said...

I just couldn't let it go!

Exactly the thing I was talking about.

And what was that? That you're an expert and you have personal experience handling 'spill code'?

If you have 8 registers or 16 registers, you STILL will have 'spill code'. In that respect it doesn't change. So trying to figure it out on one or the other, you're still trying to figure it out.

BUT, when you have double the number of registers, additional instructions and features (SSE4), you spend a hell of a lot more time trying to optimize things.

Now, if you don't understand that, I'm not going to sit and argue with you. But if you insist on your view, I would suggest that you send the GCC team your resume because they are looking for a guy who 'knows' how to make 64-bit compile as fast or faster than 32-bit.

5:26 AM, April 10, 2007  
Blogger Ho Ho said...

jeach!
"Bugs are fixed, enhancements are made, but YES things are relatively the same... 2004 or not."

No, they are definitely not the same. I saw big changes going from 3.3 to 3.4 and to 4.1. Things are definitely not the same and the benchmarks performed a few months ago I referred before back me up. I've seen nothing that would back up your 30% speed loss claim, not even from >3 years ago.


"And what was that?"

Register spilling. As has been said earlier, there is quite little instruction level parallelism in most programs. That means you don't have to deal with that many parallel operations taking place in different registers and compiler doesn't have to constantly "think" what to keep in registers and what to spill. The less registers you have the more careful you have to be not to spill out the ones that have most impact on performance.


"That you're an expert and you have personal experience handling 'spill code'?"

As a matter of fact, I do have some experience with it. I gained ~10% performance when I managed to reuse one SIMD register. Before that my inner loop couldn't fit all the variables into the 8 available registers. Under 64bit I wouldn't have had that problem.


"If you have 8 registers or 16 registers, you STILL will have 'spill code'"

Of course you do sometimes, but with more registers it isn't nearly as often as with fewer.


"BUT, when you have double the number of registers, additional instructions and features (SSE4), you spend a hell of a lot more time trying to optimize things."

1) As I've said, more available registers make the compiler's life easier, not the opposite, no matter what you claim. People on mailing lists have said that too, even in the messages you linked to.
2) Additional instructions are there in 32bit too, there is nothing exclusive in amd64.
3) There are exactly zero CPUs available with SSE4. The first ones will be K10 and Intel 45nm C2s.

Also GCC's autovectorizing (automatic SIMD code generation) is not exactly working right now. A somewhat working implementation will be merged to mainline in the 4.3 series that gets released in a bit more than a year. It has some very basic things in the current 4.x series and next to nothing in 3.x.


"But if you insist on your view, I would suggest that you send the GCC team your resume because they are looking for a guy who 'knows' how to make 64-bit compile as fast or faster than 32-bit"

Perhaps it is you who should send them some datasets they could use to see their compiler running 30% slower in 64bit than in 32bit. They do have regression tests, you know. Also, I myself and many other people on average see increased compiling speed under 64bit. It has to be a very special case to see a 30% speed drop.


Two questions:
1) What GCC version was used on the machines where you saw that 30% speed drop?
2) What CPUs were in those machines?

My guess is they had slow AMDs and fast Intels and that is where the difference came, not from 32/64bit.

5:49 AM, April 10, 2007  
Blogger abinstein said...

Ho Ho:"As has been said earlier, there is quite little instruction level parallelism in most programs."

Ho Ho, as Jeach said, you are too sure of the things you said, which often puts you on the wrong side.

There is in fact a great deal of ILP in most programs. IIRC, in most benchmarks ILP is between 4 and 16, and sometimes higher. Why else would there be 24/32 reservation stations in Yonah/Conroe?


Ho Ho:"That means you don't have to deal with that many parallel operations taking place in different registers and compiler doesn't have to constantly "think" what to keep in registers and what to spill."

As I observed & pointed out somewhere else some time ago, you are (still) confusing architectural registers with physical registers.

Architectural (ISA) registers have little to do with keeping up with ILP in modern x86 processors. That job is performed internally by register renaming in the reorder buffer. In x86, ISA registers are most useful for compiler-generated temporaries. Certainly, more ISA registers mean the compiler has more freedom (and higher complexity) to manage its temporary values.

Ho Ho:"The less registers you have the more careful you have to be not to spill out the ones that have most impact on performance."

What you said is not totally wrong, but mostly not correct, either. First, there are different heuristics for register allocation. One can start with an infinite number of registers and spill data down to the number of physical registers (Chaitin's); or one can start with data in memory and allocate them to one register after another (Chow & Hennessy's). (There are still others but we'll skip them here.) In the former, more registers means fewer iterations of spilling; in the latter, more registers means more iterations of allocation.
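To make the first heuristic concrete, here is a minimal Python sketch of a Chaitin-style simplify/spill/select pass over an interference graph. It is a simplification under stated assumptions: the function name `chaitin_color`, the example graph, and the "spill the highest-degree node" choice are all illustrative, and a real allocator would insert spill code and rebuild the graph rather than just dropping the spilled value. It says nothing about what gcc or icc actually do.

```python
def chaitin_color(interference, k):
    """Simplified Chaitin-style allocation with k registers.

    interference: dict mapping each value to the set of values live at the same time.
    Returns (assignment, spilled): value -> register index, and the list of spilled values.
    """
    graph = {v: set(ns) for v, ns in interference.items()}   # kept intact for colouring
    work = {v: set(ns) for v, ns in graph.items()}           # shrinks as nodes are removed
    stack, spilled = [], []

    while work:
        # Simplify: a node with fewer than k neighbours can always be coloured later.
        node = next((v for v in sorted(work) if len(work[v]) < k), None)
        if node is None:
            # Spill (illustrative choice): give up on the most constrained value.
            node = max(sorted(work), key=lambda v: len(work[v]))
            spilled.append(node)
        else:
            stack.append(node)
        for neighbour in work.pop(node):
            work[neighbour].discard(node)

    # Select: pop nodes in reverse order and give each the lowest free register.
    assignment = {}
    for node in reversed(stack):
        used = {assignment[n] for n in graph[node] if n in assignment}
        assignment[node] = next(r for r in range(k) if r not in used)
    return assignment, spilled


# Hypothetical example: four values that are all live together, plus one short-lived value.
example = {
    "a": {"b", "c", "d", "e"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
    "e": {"a"},
}
for k in (4, 3, 2):
    regs, spills = chaitin_color(example, k)
    print(k, "registers:", len(spills), "spilled")
```

On this example graph, 4 registers spill nothing, 3 registers spill one value, and 2 registers spill two, which is the "more registers, fewer spill iterations" behaviour described for this family of heuristics.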

However, all of this is irrelevant, since it's not even certain that gcc allocates registers by graph coloring. No matter what it uses, I highly doubt that gcc would adopt a heuristic that is inefficient and slow for architectures with few registers, which have been the main target of gcc for the past 20 years.

Second, register allocation is more than just graph coloring. One of my previous projects was scalar replacement of register usage in arrays/loops. When solved as a 0-1 bin-packing problem, the complexity is O(n*m), where n is the number of registers and m is the loop count.

3:32 AM, April 12, 2007  
Blogger abinstein said...

Ho Ho:"1) As I've said, more available registers make a compiler's life easier, not the opposite, no matter what you claim."

Unfortunately, what you said is based on Chaitin's register allocation by graph coloring heuristics and probably not what any real compiler uses at all.


Ho Ho:"2) Additional instructions are there in 32bit too, there is nothing exclusive in amd64."

There are many things exclusive in AMD64, which, among others, defines extensions to the ISA in order to access the additional architectural registers.


Ho Ho:"Also, I myself and many other people on average see increased compiling speed under 64bit."

Why do you keep confusing the execution architecture with the target architecture? Compilation running on a 64-bit arch is faster, perhaps, but compilation for a 64-bit arch is slower.

3:52 AM, April 12, 2007  
Blogger Ho Ho said...

abinstein
"Ho Ho, as Jeach said, you are too sure of the things you said, which often puts you on the wrong side."

Could you explain why? I'm wrong because I know what I'm talking about and don't put "think", "guess", "IIRC" everywhere as most other people here do?


"There is in fact a great deal of ILP in most programs."

What makes you think that?


"IIRC, in most benchmarks ILP is between 4 and 16, and sometimes higher"

I'm sorry but do you even know what ILP means? Of course it might be simply bad wording that made me wonder that.

ILP shows how many instructions you can execute in parallel without having dependencies on previous instructions.

If ILP would be as high as you claim then we would see at least 4 instructions executed per clock cycle but in real world it is much closer to 0.4. Don't believe me? Just install and to see for yourself. I did and I know what I am talking about.


"Why else would there be 24/32 reservation stations in Yonah/Conroe?"

There are several instructions in flight at any point in time, and most of those are in different states of execution. It also has quite a lot to do with the fact that x86 CPUs are internally RISC, where a single x86 operation can produce several uops. Sometimes those uops need some space to store intermediate results.


"As I observed & pointed out somewhere else some time ago, you are (still) confusing architectural registers with physical registers."

What has that got to do with anything when we are talking about how compilers have to optimize their code?


"No matter what it uses, I highly doubt that gcc would adopt a heuristic that is inefficient and slow for architectures with few registers, which have been the main target of gcc for the past 20 years"

It might be news to you but GCC has a rather bad register allocator that produces inferior code when it has only a few registers available. In 32bit ICC generates vastly better code than GCC, mostly thanks to a better register allocator. In 64bit things get turned around and GCC wins most of the time since it is not spilling that much any more and it generally has better optimizations. I hope you know that x86 is not the only architecture GCC is used for, but 32bit x86 is the architecture with the fewest registers available.

I've personally seen the exact same code compiled with ICC and GCC be 30% slower in 32bit with GCC and 10% faster in 64bit. ICC was at roughly the same speed for 32 and 64bit.


"Unfortunately, what you said is based on Chaitin's register allocation by graph coloring heuristics and probably not what any real compiler uses at all."

I haven't studied what kind of register allocator GCC uses but I know from personal experience that under 64bit it is considerably faster. Do you have any ideas why it is so?


"There are many things exclusive in AMD64"

... but nothing even remotely similar to what you listed before.


"which, among others, defines extensions to the ISA in order to access the additional architectural registers"

I know it does but how does it affect anything we are talking about?


"Compilation running on 64-bit arch is faster, perhaps, but compilation for 64-bit arch is slower"

I highly doubt that. I currently can't prove it though. So far I have personally compiled 32bit programs under 32bit and 64bit programs under 64bit. I'll look into it later when I get enough time to run some tests under 64bit, compiling 32 and 64bit versions of some programs using a 64bit compiler. Any examples you would like to see compiled? Remember that they must be available under Linux.

Of course feel free to do your own benchmarks.

4:33 AM, April 12, 2007  
Blogger Ho Ho said...

Whoa, I just noticed how the URL got screwed up. I meant to say "Just install PAPI to see for yourself." For some reason it removed the PAPI part and made everything else into a URL. Sorry about that.

6:13 AM, April 12, 2007  
Blogger abinstein said...

This comment has been removed by the author.

10:35 AM, April 12, 2007  
Blogger abinstein said...

Ho Ho:"If ILP would be as high as you claim then we would see at least 4 instructions executed per clock cycle but in real world it is much closer to 0.4."

Ho Ho, you are confusing the actual ILP in the program with the ILP exploitable by a processor architecture.

Take a look at section 3.8 of Hennessy & Patterson CA AQA 3rd Ed. and you'll see the ILP available in a practical processor is between 4 and 150.

The 0.4 you talk about is the ILP that some particular processor can exploit, due to limited issue width per cycle AND window size AND memory conflicts AND branch prediction. This value is totally meaningless to a compiler, whose interference graph is constructed from the live ranges of data values.
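The distinction between available and exploited ILP can be illustrated with a small Python sketch. It assumes unit-latency instructions and models only the issue-width limit (not window size, memory conflicts, or branch prediction); the names `schedule_cycles` and `ilp` and the eight-instruction example are hypothetical, chosen just for illustration.

```python
def schedule_cycles(deps, issue_width=None):
    """Cycles needed to run all instructions, assuming unit latency.

    deps: dict mapping each instruction to the list of instructions it depends on.
    issue_width: max instructions started per cycle; None means unlimited (ideal machine).
    """
    done = set()
    remaining = set(deps)
    cycles = 0
    while remaining:
        cycles += 1
        # An instruction is ready once everything it depends on finished in an earlier cycle.
        ready = sorted(i for i in remaining if all(d in done for d in deps[i]))
        if not ready:
            raise ValueError("dependency cycle detected")
        if issue_width is not None:
            ready = ready[:issue_width]
        done.update(ready)
        remaining.difference_update(ready)
    return cycles


def ilp(deps, issue_width=None):
    """Average instructions per cycle for the given machine model."""
    return len(deps) / schedule_cycles(deps, issue_width)


# Hypothetical program: four independent two-instruction chains.
program = {0: [], 1: [], 2: [], 3: [], 4: [0], 5: [1], 6: [2], 7: [3]}
print("available ILP:", ilp(program))
print("2-wide machine:", ilp(program, issue_width=2))
print("1-wide machine:", ilp(program, issue_width=1))
```

The same dependency graph gives a different instructions-per-cycle figure depending only on the machine model, which is why a measured per-cycle number describes a processor running the program rather than the program alone.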


Ho Ho:"What has that got to do with anything when we are talking about how compilers have to optimize their code?"

You (wrongly) claimed that the low ILP in programs makes the compiler's life easier. This is wrong. Compilers, in doing register allocation, do not deal with ILP, but with values' live ranges. A compiler can assign only architectural registers, whereas ILP is improved by the number of physical registers. You confused the two.
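As a rough illustration of what "dealing with live ranges" means, the Python sketch below builds an interference graph from live ranges and reports the peak number of simultaneously live values. The half-open interval representation, the function names `build_interference` and `max_pressure`, and the example ranges are assumptions made for this sketch, not a description of any particular compiler.

```python
def build_interference(live_ranges):
    """live_ranges: dict value -> (start, end), half-open instruction-index interval.

    Two values interfere (cannot share a register) when their live ranges overlap.
    """
    graph = {v: set() for v in live_ranges}
    names = sorted(live_ranges)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = live_ranges[a], live_ranges[b]
            if s1 < e2 and s2 < e1:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def max_pressure(live_ranges):
    """Largest number of values live at the same point (register pressure)."""
    starts = {s for s, _ in live_ranges.values()}
    return max(
        sum(1 for s, e in live_ranges.values() if s <= p < e)
        for p in starts
    )


# Hypothetical live ranges for four temporaries.
ranges = {"t0": (0, 4), "t1": (1, 3), "t2": (2, 6), "t3": (4, 7)}
print(build_interference(ranges))
print("peak live values:", max_pressure(ranges))
```

If the peak exceeds the number of architectural registers, something has to be spilled, regardless of how many instructions the hardware can issue per cycle.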


Ho Ho:"In 32bit ICC generates vastly better code than GCC, mostly thanks to a better register allocator. In 64bit things get turned around and GCC wins most of the time since it is not spilling that much any more and it generally has better optimizations."

You are still confusing the running architecture with the target architecture. We are not talking about the performance of the target code, but of the compilation itself.

As I pointed out and you conveniently ignored, there are different heuristics for register allocation, whose optimal solution is NP-hard. Different heuristics have different complexity dependencies on the number of available registers, and since you mention icc, I'm sure its running time is optimized for target architectures with few registers.


Ho Ho:"'There are many things exclusive in AMD64'

... but nothing even remotely similar to what you listed before."


You are confusing what I said with what Jeach said. I only pointed out your obvious error in claiming there's nothing exclusive (in terms of instruction complexity) in AMD64.

10:38 AM, April 12, 2007  
Blogger abinstein said...

Ho Ho:"I'm wrong because I know what I'm talking about and don't put "think", "guess", "IIRC" everywhere as most other people here do?"

Out of pity, I'm going to let you know why and where you are wrong.

You are wrong in assuming compilers follow only Chaitin's heuristics for register allocation. If that were true, then more available target registers would probably make compilation easier. However, it is usually not the case, and mostly not for x86 compilers.

You are wrong in assuming program ILP is very low. Please, as I suggested, read CA AQA, which would give you a better perspective on what is available ILP and what is exploited/exploitable ILP (the values you see from PAPI or whatever tool).

You are wrong in saying that, as long as additional instructions are in 32-bit, there's nothing exclusive to AMD64. Quite the contrary. In order to address the higher memory address range, memory-mapped IO, and the additional registers, the AMD64 ISA is definitely more complex (but more elegant in 64-bit mode).

You are also wrong in saying:"ILP shows how many instructions you can execute in parallel without having dependencies on previous instructions." Every instruction depends on some previous one. ILP is the number of instructions, at run time, that do not depend on each other.

It's really a pity that the things you said so surely are so much more wrong than the things others said "if they remember correctly".

10:59 AM, April 12, 2007  
Blogger Tanrack said...

In Elder Scrolls the quad FX-74 setup is slower than a dual core X2 6000+. It is then also obviously slower than the QX6800. They are both running at the same clockspeed.
My question then to you Sharikou is this:
If you want the fastest CPU money can buy and CPU X is faster than CPU Y, which one will you buy?

The topic should have been “AMD’s 4x4 is only slightly faster than the $64 X2 3600”
AMD’s 4x4 is even slower than the lower clocked 5600+ in the cherry picked benchmark.
The Rainbow Six benchmark is less meaningful on the high end CPUs, but the 4x4 is again slower than the E6700 on the average frame count.
If you look at the Valve Source engine particle simulation benchmark it looks even worse for AMD. Even Intel’s slowest “outdated architecture” quad core is faster than the “superior architecture” of AMD’s quad setup, and it is much the same for Valve VRAD map compilation, The Panorama Factory, picCOLOR and Windows Media Encoder x64 Edition.

LAME MP3 encoding: AMD X2 is faster than 4x4, otherwise much of the same; AMD is slower than Intel.

Cinebench is the first real world benchmark where the 4x4 is worth a look, still slower than Intel

POV-Ray rendering beta, one to AMD, POV-Ray's official benchmark scene is back to Intel, but AMD doing well.

AMD gets hammered in STARS Euler3d computational fluid dynamics, where their 4x4 falls behind Intel dual cores again.
Folding@Home: in the first two benchmarks it looks like AMD has a winner, but wait, the next 4 belong to Intel. This is the benchmark cherry picked to show that Intel is in trouble.
In SiSoft Sandra Mandelbrot Integer x16 AMD can't even beat the E6300. The Floating point one looks better, but AMD still gets it.
Power consumption and efficiency show that AMD is slower and uses more power. Even the 3600+ used more power than the Q6600, almost as much as the QX6800.

From the site you used to show how bad Intel is:
"Conclusions
You may have gathered that the Core 2 Extreme QX6800 is the fastest desktop processor that money can buy. No nuance is required to discuss this one. The QX6800 nearly swept our entire benchmark suite, and in many cases, it crushed its main rival, the Athlon 64 FX-74."

Right now AMD has only one thing going for it, and that is price. They sell the cheapest CPUs, not the fastest, not the best.

9:03 AM, April 16, 2007  
Blogger Unknown said...

you people all need to realize that quad cores etc. are for multitasking not for running games software faster... they are much better because they can run 4 applications (of same magnitude) at very close to the same speed that a regular single core of the same speed can run one.
The thing is if you are going to be ripping a dvd, while running windows and other small applications, plus playing a game it will be way way way faster on a quad-core than using a dual-core to do the same thing...

That's where the core count adds up ;)

11:17 PM, May 10, 2007  
