Woodcrest seems to have issues with large amounts of RAM
With 2-4GB of RAM, Woodcrest was faster than Rev F Opteron by 2%. With 16GB of RAM, Rev F Opteron was ahead by almost 11% in Doom 3, Quake 4, Prey and Painkiller BOH.
Core2 is inherently a 32 bit chip.
Intel has already found the same excuse for its market share loss. Again, it's having chipset shortages. It seems that Intel needs to copy-exact three more FABs.
Meanwhile, morale is extremely low, as thousands of workers wait for their pink slips.
84 Comments:
Are you going to set the record straight??
http://www.theinquirer.net/default.aspx?article=33822
Only a big man does!
" Are you going to set the record straight??
http://www.theinquirer.net/default.aspx?article=33822
Only a big man does!"
What the HELL does this have to do with the topic at hand? So what? Find another topic to bitch about it where it's relevant! Dell has flammable laptops, don't buy one! Thanks, drive thru please!
Alas, Sharikou is no big man; in fact he is an idiot. If he had an ounce of morals or ethics he would post a retraction of every idiotic comment he made about the laptop being the processor's fault and tell everyone that anything he says should be taken as ignorant garbage and that the only people who will fully believe him are ignorant like him.
You are probably not going to post this since it is the truth.
Sharikou, Ph. D said...
"With 2-4GB ram, Woodcrest was faster than Rev F Opteron. With 16GB ram, Rev F Opteron was ahead by almost 11% in doom3, quake4, prey and Painkiller BOH."
Lets make this very clear, had these benchmarks gone in favor of Intel, AMD supporters would be screaming bloody murder for system specs.
Memory?
Motherboard?
Processor clock speed?
So does that mean that the benchmarks are null and void? Of course not; they back up Sharikou's opinions.
This only shows his bias.
Hey Sharikou, I like your blog, but this is pretty weak.
Intel Bull Shit
The chip shortage is because the June launch was on paper; it's a complex chip to make, and besides, both of the people in the U.S. that wanted the 965 already have them.
No offense, but at least you acknowledged that Woodcrest beats Socket F in up to 4GB of RAM. That was pretty big of you considering your outright denials that Woodcrest could even come close to anything AMD has.
You are misquoting the reviewer.
http://www.amdzone.com/index.php?name=PNphpBB2&file=viewtopic&p=113256&sid=e16b1ca9d35a51bc70567090f65463fe#113256
In honesty, we ran the games at a resolution too high (2560x1600) to really point to cpu bottlenecks, the only real conclusion was that a single 7950 ran games 11% faster than it did on Woodcrest.
He didn't really conclude anything to the effect that Woodcrest was an outright loser. Besides, you aren't going to play games on a server.
Woodcrest was indeed attractive on many levels, but on my own personal score sheet, the Rev F Opterons seemed to win.
He just says that Rev F is better for him, not that Rev F dominates Woodcrest or anything to that effect.
Did you notice he didn't say which Woodcrest? He himself is using near top-of-the-line 2.6GHz 2218s, but all he says is that his friend has a Woodcrest. There is a 1.6GHz 1067MHz FSB Woodcrest too, and if something like that was used, obviously it would be unfair.
" Anonymous said...
You are misquoting the reviewer.
http://www.amdzone.com/index.php?name=PNphpBB2&file=viewtopic&p=113256&sid=e16b1ca9d35a51bc70567090f65463fe#113256
In honesty, we ran the games at a resolution too high (2560x1600) to really point to cpu bottlenecks, the only real conclusion was that a single 7950 ran games 11% faster than it did on Woodcrest.
He didn't really conclude anything to the effect that Woodcrest was an outright loser. Besides, you aren't going to play games on a server."
I will never understand these kids..
if their cpu loses in a game, they say SuperPi and Apache calculations are what should be taken into account, not games..
but when Opteron wins in Apache & other stuff (like it did before) they push SuperPi or games again...
nice way to change the "real important task" to suit your needs.
I will never understand these kids..
if their cpu loses in a game, they say SuperPi and Apache calculations are what should be taken into account, not games..
but when Opteron wins in Apache & other stuff (like it did before) they push SuperPi or games again...
nice way to change the "real important task" to suit your needs.
Really? Who are the people claiming that Woodcrest is the ultimate gaming machine? I don't think anyone's said that, especially considering the cost of FB-DIMMs and the fact that the 5000X chipset doesn't support either SLI or Crossfire.
Find something else to complain about.
That really doesn't surprise me. As everyone should've known, Core 2 Duo is a good design AS IS, but has poorer scalability compared to K8 (which is BTW 3-year-old). Now this guy confirmed that Woodcrest doesn't scale well to large memory on 2P 4 cores (16GB is a bit excessive though).
True, servers are not meant to be evaluated only by gaming, but aren't these game benches what Intel fans have been using to tout Core 2 Duo? Oh well, they also used SuperPi, which I guess is very useful to all those mathematically inclined individuals eager to find out what Pi is...
BTW Opteron also performs better on Photoshop operations. True as the reviewer said it's memory intensive and probably gives Opteron more advantage. That again proves at least partially that Core 2 Duo is good for benchmarking where data access and working set is confined.
It should be clear that 64-bit and NUMA are the definite future - the question is when. Both will bring benefit to the users, and it's clear here which side (AMD v.s. Intel) is pushing for the advance, and which is trying to delay it.
Core 2 is 32-bit! Oh that means Apple's OS-X is 32-bit and not 64-bit as they claim it to be.
The WinXP-64 that I bought, and is running perfectly, then is only 32-bit because I'm running it on a laptop with a Core Duo.
Please tell me who should I take to court, Apple for proclaiming that OS X is 64-bit or Microsoft for packaging WinXP64?
I am a C++ programmer and a device driver developer. I have a 32-bit and a 64-bit compiler. Both of them work on my WinXP64 installed on my Core Duo laptop. Both compilers also work on my Athlon desktop.
As a device driver developer, I had to use a digital oscilloscope and take a snapshot of the data bus while in operation. With the Core Duo, I counted all 64 bits during 64-bit operation and only 32 during 32-bit operation. It's the same principle with the Athlon. Now tell me where you got the info that the Core or Core 2 is 32-bit and I'll be very happy to show you my snapshots of the Core and Athlon.
- Mr. Device Developer
Guys, that isn't impossible,
Game performance can be translated into low latency; everything that reduces latency (big caches, faster memory, MCT, etc.) helps.
Woodcrest uses FB-DIMM. This memory allows an almost unlimited amount of RAM, but has huge latency, and adding more FB-DIMM sticks increases the latency a lot.
Games don't take advantage of 16GB of RAM, so comparing an Opteron with 70ns of memory latency against a Woodcrest with 200-300ns of memory latency will result in an easy victory for Opteron.
PhD pretender continues to search for excuses for AMD.
Bottom line who is faster? INTEL
When Opteron was faster you didn't hear anything about how it was a bit less fast at one benchmark vs another. When you are behind you make excuses as to why you are behind.
Did you know that INTEL will probably make 5 or 6 billion? Did you know INTEL's profits in their bad year will still exceed AMD's revenue... what a sad situation.
Evoke Copy Exactly as an excuse you don't know shit sharikou and it shows again and again. TOo bad you are nothing but an AMD cumm sucker
Some crackhead said:
"The WinXP-64 that I bought, and is running perfectly, then is only 32-bit because I'm running it on a laptop with a Core Duo.
Please tell me who should I take to court, Apple for proclaiming that OS X is 64-bit or Microsoft for packaging WinXP64?"
You should take Intel to court for making you believe Yonah has 64-bit support. I don't know what you're running, but apparently neither do you, because Yonah does not have EM64T.
Haha. This is hilarious. You're using vague references to vague benchmarks done by some guy and his "buddy" on the AMD fanboi forum as proof of something?
This is utterly ridiculous, and pretty sad even by your non-existent standards. Well respected websites did reproducible, well documented benchmarks showing what Core 2 Duo can do and you dismissed them as "paid pumpers".
You, sir, are a joke and a liar.
The only thing your little blog "frags" is your own reputation.
guys, guys... Take it easy..
A clown's job is to entertain his audience. Nobody ever takes a clown seriously.. Just pretend you are smiling at him..
Well Sharikou, in clowntown, you are the PhD.. Congratulations!
Dr. Fragatron,
did you try being less of a fanboy and more of a scientist?
Try it one day, it is good, I promise..
Have you read tomorrows headlines?
The pretender frags himself!
the pretender said:"Meanwhile, morale is extremely low, as thousands of workers wait for their pink slips."
Maybe you can show them how to become PhDs and bitter bloggers!
Sharikou,
you really remind me of George W Bush.. He still thinks Iraq has WMD.. Doesn't that rhyme with PhD..
oh!
"Game performance can be translated into low latency; everything that reduces latency (big caches, faster memory, MCT, etc.) helps."
Do you have a reference to back up this claim? You probably mixed up network latency with memory latency. The former is usually on the order of tens of ms; the latter, hundreds of ns. That's roughly a 100,000x difference between the two. That is, memory latency that is 2 or 3 times higher won't make the game feel sluggish at all (as if it had high network latency).
Besides, Woodcrest does have 4MB large smartcache, whose fill-up and write-back are in terms of blocks (i.e. dominated by bandwidth). Since games data usually have excellent locality, I frankly don't see latency a big issue for Woodcrest. We already know that latency affects K8 much more than Core 2 Duo.
Plus, using DDR2 will only make Woodcrest perform worse, because its bandwidth is not sufficient to feed the 4 cores. This IS a problem of Core 2 Duo, that its FSB memory architecture does not scale well for multiple cores (4 and up). FB-DIMM is what Intel could come up with best to help Core 2 Duo in the workstation/server space.
Some clown wrote:
Core 2 is 32-bit! Oh that means Apple's OS-X is 32-bit and not 64-bit as they claim it to be.
The WinXP-64 that I bought, and is running perfectly, then is only 32-bit because I'm running it on a laptop with a Core Duo.
Please tell me who should I take to court, Apple for proclaiming that OS X is 64-bit or Microsoft for packaging WinXP64?
I am a C++ programmer and a device driver developer. I have a 32-bit and a 64-bit compiler. Both of them work on my WinXP64 installed on my Core Duo laptop. Both compilers also work on my Athlon desktop.
As a device driver developer, I had to use a digital oscilloscope and take a snapshot of the data bus while in operation. With the Core Duo, I counted all 64 bits during 64-bit operation and only 32 during 32-bit operation. It's the same principle with the Athlon. Now tell me where you got the info that the Core or Core 2 is 32-bit and I'll be very happy to show you my snapshots of the Core and Athlon.
- Mr. Device Developer
Seriously what are you injecting yourself with?
There is no way you can install WinXP-64 in a Core Solo/Duo (not Core 2), you get a message saying your processor is not supported. You can install a C++ compiler, but it will run in 32-bits and even if it allows you to produce a 64-bit binary you won't be able to run it. (By the way Visual Studio 2005 DOES NOT allow you to produce a 64-bit binary on a 32-bit system.)
My favorite part is about the digital oscilloscope. What did you do man? Connect a device on the motherboard traces? Or on the CPU pins? And I suppose it was able to measure signals that last only a few ns. Not to mention the fact that you have a serious confusion about the data bus width and the width of a processor's registers. Dude the data bus has been 64-bit wide since the first Pentium, modern CPUs have a 128-bit wide data bus (Athlon 64 S939/AM2 in dual channel configuration).
P.S. I would love to see those snapshots Mr. Device Developer.
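The distinction drawn in this reply, between the width that registers and pointers expose to software and the width of the data bus, can be seen from the software side without probing any pins. As a minimal sketch (assuming a standard Python install; `pointer_bits` is a name made up here for illustration), a process can report the pointer width it was built for:

```python
import platform
import struct


def pointer_bits():
    """Width, in bits, of a native pointer in the running interpreter."""
    # "P" is the struct format code for a native void* pointer.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # platform.machine() is the machine type reported by the OS.
    print(platform.machine(), pointer_bits())
```

Note that this reports how the interpreter was built, not what the CPU is capable of: a 32-bit build running on a 64-bit processor still reports 32. Neither number says anything about how wide the memory bus is.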
Mr. Device Developer is probably the same guy who in the 80's touted that the 386SX is a 32-bit processor since he scoped the registers.
Woodcrest is today's 386SX.
Edward said...
That really doesn't surprise me. As everyone should've known, Core 2 Duo is a good design AS IS, but has poorer scalability compared to K8 (which is BTW 3-year-old).
Who cares whether it is 3 years old or not? Intel gives the better, cheaper, lower-power CPU now. When the K8L is finally out next year, just go ahead and compare the K8L with the supposedly 1-YEAR-OLD C2D, assuming Intel has yet to launch its new product by then. Whoever gives the better value ($, perf, power, etc.) to the customer will win.
That again proves at least partially that Core 2 Duo is good for benchmarking where data access and working set is confined.
You have been asking people to prove whatever they claim. Now prove this. There are some synthetic benchmarks where you can say that the data is confined. But there are also a whole lot of benchmarks running on real systems and real apps, and yet proving C2D superiority.
It should be clear that 64-bit and NUMA are the definite future - the question is when. Both will bring benefit to the users, and it's clear here which side (AMD v.s. Intel) is pushing for the advance, and which is trying to delay it.
You have told people in the other topic that they make vague statements. Are you referring to all markets? NUMA for MP servers, YES; NUMA for desktop, NO; NUMA for mobile, NO. If you know those architectures well, NUMA and UMA each have their own advantages and disadvantages.
Take mobile for example: there are a few keys to its success, namely power, wireless, and form factor. NUMA there would just mean more power and bigger size.
Btw, expect to see a boom in software that makes full use of multithreading. Intel has always been the industry enabler, and with its multicore CPUs in the market, it is hard to believe Intel will not push multithreading onto its software partners/ecosystem.
This blog has disintegrated and degenerated into the gutter.
Sharkie using some anecdotal test from a random message post to try and validate his spunk daddy's superiority is pathetic.
No specs, no real tests, no controlled environment = no comparison.
Even Shrek gets that.
The idiotic comments on this thread are equally pathetic.
With everything here turned to utter shite, appropriately, the fame timer just ticked 14:59.
http://www23.tomshardware.com/cpu.html?modelx=33&model1=430&model2=464&chart=171
Leap ahead.
Core Duo (Pentium M "Yonah") = 32-bit only
Core 2 Duo (Merom architecture or Pentium M II) = AMD64 compatible ;)
(By the way Visual Studio 2005 DOES NOT allow you to produce a 64-bit binary on a 32-bit system.)
You can cross-compile it...You can't tell me with a straight face that Visual Studio's compiler doesn't let you do that.
"Btw, expect to see a boom of softwares that make full use of multithreading."
I'd really like to know how do you conceive a 32-core, superscalar & OoO multithreading with uniform memory access.
If each core has its own large cache, NUMA will help. Or won't it? If a notebook processor has 8 cores and four memory links (one link for two cores) each operating at 800GHz, won't it consume less power than a processor with one super-fast link operating at 3.2GHz for all 8 cores? Yes, it's future talk, but that's my point of scalability.
"But there are also whole lot of benchmark running on real system, real apps and yet proving C2D superiority."
I think the Photoshop test the reviewer ran IS a real application on real systems. That's the partial proof I was talking about. You can dispute the validity of that test however you wish, though.
"If a notebook processor has 8 cores and four memory links (one link for two cores) each operating at 800GHz,"
Oops... I meant to say each memory link operating at 800Mhz.
"Core 2 is 32-bit! Oh that means Apple's OS-X is 32-bit and not 64-bit as they claim it to be.
The WinXP-64 that I bought [...] then is only 32-bit because [...]
I am a C++ programmer and a device driver developer. I have 32-bit and a 64-bit compiler. [...]"
Wow! With comments like that I think you should change careers, because you're not very good.
Let me explain it to you in "programmer talk" then if you don't understand Sharikou's blog.
Think of the decorator pattern when you program, but instead of using software, think of it in terms of hardware.
So what Intel has basically done is create a 'decorator' called 'Core 2 Duo' that wraps a legacy object called the 'Pentium'.
So on the exterior it looks and feels like 64-bit (you can even count the bits with whatever tools you like to count your bits with), but internally Intel does whatever it has to do to make it work like a 64-bit processor.
Seriously... you must work for Microsoft?? C'mon, admit it, we won't laugh!! :)
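For readers who don't know the pattern being invoked, here is a minimal sketch of the analogy only. It says nothing about how any real processor is built; `Legacy32` and `Wide64` are hypothetical names, and the wrapper is only loosely a decorator, since it changes the width of the interface it wraps. It shows a 64-bit add presented on the outside and carried out as two 32-bit adds on the inside:

```python
MASK32 = 0xFFFFFFFF


class Legacy32:
    """Hypothetical 32-bit adder: returns (sum mod 2**32, carry_out)."""

    def add(self, a, b, carry_in=0):
        total = (a & MASK32) + (b & MASK32) + carry_in
        return total & MASK32, total >> 32


class Wide64:
    """Wrapper that presents a 64-bit add built from two 32-bit adds."""

    def __init__(self, inner):
        self.inner = inner

    def add(self, a, b):
        # Low halves first, then high halves plus the carry from the low add.
        lo, carry = self.inner.add(a, b)
        hi, _ = self.inner.add(a >> 32, b >> 32, carry)
        return (hi << 32) | lo  # result wraps modulo 2**64


alu = Wide64(Legacy32())
print(hex(alu.add(0xFFFFFFFF, 1)))
```

From the caller's side only the 64-bit result is visible, which is the point the comment is making about counting bits from the outside.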
Intel should have stuck with regular registered ECC RAM. Those FB-DIMMs suck.
Yes, they're fully buffered, but they have extremely high latencies with multiple DIMMs per channel because of the AMBs. They're also a lot hotter, allowing AMD to stay competitive with them for total system power usage. Last, but certainly not least, they're more expensive.
If AMD didn't announce future support for FB-DIMM I would have expected it to go the way of RDRAM.
I also find it funny that when the Athlon64X2 is compared to the Core2Duo it is slower, but when the Opteron is compared to the Woodcrest it is a lot more competitive clock-for-clock.
Guys,
Any thoughts -- Intel is back??
http://www.techworld.com/opsys/reviews/index.cfm?reviewID=438&pagtype=all
30% of global X86 market share by 2008 announced today and 40% of server share. IF Conroe & Woodcrest are so great, how come Intel is losing market share with these two fantastic CPUs?
Notice to all Intel fanboys: either get off at the next stop or go to the back of the bus.
:-) I hope you mean the memory bus, because the Intel fanboys love to hog the bus as much as the Intel proc loves to hog the shared FSB.
cheers
I have become convinced that this blog is either:
a) A psych experiment/paper - or
b) An attempt to graduate to paid blogs.
The proof among countless others is the laughable attempt to blame the "dual" Dell explosions on dual core Intel processors.
That was a riot. Honestly, you try to provoke science and insight (and I have a history in that), but then revert to a photo which seems to have explosions emanating from either side of the laptop and conclude that it was a dual core explosion?
That is honestly sophomoric, and leads to my initial conclusion about your blog. You may perhaps fool the masses for a paid site, at least for a while, but you will never get this through any reputable educational institution.
It is quite entertaining however.
Smile, smile and smile all around...
Bait worked! now I know who the techie wanna be's are.
I like this site! The blog owner is good but most often a bit over the top. nyx and pointer give really good comments. But jeach! and the rest of the anonymous posters should just stay anonymous. Try using your common sense. C'mon, a digital oscilloscope on data buses!
I'm a simple truck driver. Used to be with the big blue until I opted for early retirement. Just got hired by a factory in Taiwan to help them make the one true 64-bit proc.
- Mr. Device Developer
I have become convinced that this blog is either:
The blogger is a failed Phd pretender
The blogger is a fired former designer
The blogger has no life
The blogger is a lunatic AMD cumm sucker.
frag!
this story shows how desperate Sharikou is getting to enforce his "scientific point"!!!!!
Can you get more desperate than one day saying Intel fixed the benchmarks for C2D and Woodcrest, and the next referring to a lame comment on AMDzone.com?
AMD fannies: any comment?
this article is so lame, Mad Mike did not even comment on it:
Do you have a reference to back up this claim? You probably mixed up network latency with memory latency. The former is usually on the order of tens of ms; the latter, hundreds of ns. That's roughly a 100,000x difference between the two. That is, memory latency that is 2 or 3 times higher won't make the game feel sluggish at all (as if it had high network latency).
No...
I was talking about memory latency,
The data in games isn't so local, and memory has a lot of indirection (i.e. a variable which stores a pointer to a structure or function). This is the worst case for any processor: it can't continue before copying the structure or function data to L1, but it can't start copying before reading the variable, so the CPU may wait for the latency of L2 or, worse, of memory. With a bigger cache, Woodcrest has a bigger chance of finding the data in L2. Let's say that with 4MB Woodcrest needs to access memory just 15 times per 1000 instructions, while with only 1MB Opteron needs to access memory 40 times per 1000 instructions. Due to memory latency, the 15 accesses of Woodcrest take 3000-4500ns and the 40 accesses of Opteron take only 2800ns.
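The arithmetic in this example can be checked with a short sketch. The miss counts (15 and 40 per 1000 instructions) are the commenter's hypothetical figures, and the latencies (70ns for Opteron, 200-300ns for Woodcrest with FB-DIMM) are the ones claimed earlier in the thread, not measurements; `stall_ns` is a name made up for illustration.

```python
def stall_ns(accesses, latency_ns):
    """Total time spent waiting on memory for a given number of accesses."""
    return accesses * latency_ns


# Hypothetical figures from the comment, per 1000 instructions.
woodcrest_low = stall_ns(15, 200)
woodcrest_high = stall_ns(15, 300)
opteron = stall_ns(40, 70)

print(woodcrest_low, woodcrest_high, opteron)
```

Under these assumptions the chip with fewer misses still waits longer in total, because each miss costs roughly three to four times as much. The conclusion depends entirely on the assumed miss counts and latencies.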
I'd really like to know how do you conceive a 32-core, superscalar & OoO multithreading with uniform memory access.
If each core has its own large cache, NUMA will help. Or won't it? If a notebook processor has 8 cores and four memory links (one link for two cores) each operating at 800GHz, won't it consume less power than a processor with one super-fast link operating at 3.2GHz for all 8 cores? Yes, it's future talk, but that's my point of scalability.
1) It is possible to have 4 links and still be UMA. It is significantly faster because it has no node delay and does not need special software optimization.
2) 4 links means a minimum of 4 DIMMs and a bigger pin count. Not a good configuration for small form factor and pricing. A desktop replacement laptop is possible.
3) The scalability doesn't really apply here for the 1P system. For 1P multicore, read my statement number 1.
4) I do agree the scalability applies to the design team (not the user): if they have a 1P4Core, they can combine 2 dies into one package and give the customer a '1P8Core' with minimal modification.
5) Try to imagine this complexity at the server MP level. It will be damn complex to have an internal NUMA and then an external NUMA. It is better to have internal UMA and external NUMA.
I think the Photoshop test the reviewer ran IS a real application on real systems. That's the partial proof I was talking about. You can dispute the validity of that test however you wish, though.
Yes, it is a real app. But is it under a real workload, or a specific workload that does not generalize? There will be hundreds if not thousands of use cases for a single app. What makes you think this particular load doesn't match your own statement about a confined working set? I would not claim C2D wins in ALL working sets. But I would say C2D excels in most of the working sets in most of the available apps as compared to AMD's.
Another day, another comparison, and Athlon gets another ass-kicking by Sheriff Conroe.
Maybe AMD can create another PC category. Like that "less than $100" category for poor countries...
Instead, AMD will have the "less than 100 IQ points" category for people who buy the Big Green Hype Machine's dud chips.
To pointer:
How does multiple memory links (to multiple cores) remain UMA? By all means data read from one link by a core will have different latency from data read from another.
Also, unless my idea was terribly wrong, you don't need internal/external NUMA, but just different distances (locality) on "one NUMA level". So for 2 cores on the same die, the distance can be one; for 2 cores on different dies, the distance can be 2; and so on.
Lastly, I don't understand why NUMA has scalability for the designer but not the user. I thought it's just the other way around. If NUMA optimization is in the software, NUMA machines will offer better scalability for end-user performance; on the other hand, designers can always slap two dies into one chip package regardless of the memory architecture (Kentsfield?).
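The "one NUMA level with different distances" idea in this comment can be sketched as a distance table. This is an illustration under stated assumptions: the 4-core, 2-die layout and the names `die_of` and `numa_distance` are hypothetical, and a distance of 0 from a core to itself is an added convention, not something the comment specifies.

```python
def numa_distance(core_a, core_b, die_of):
    """Distance 1 for cores on the same die, 2 for cores on different dies."""
    if core_a == core_b:
        return 0  # assumed convention: a core is at distance 0 from itself
    return 1 if die_of[core_a] == die_of[core_b] else 2


# Hypothetical layout: cores 0 and 1 on die 0, cores 2 and 3 on die 1.
die_of = {0: 0, 1: 0, 2: 1, 3: 1}
cores = sorted(die_of)
matrix = [[numa_distance(a, b, die_of) for b in cores] for a in cores]

for row in matrix:
    print(row)
```

A scheduler or allocator would consult a table like this to place data near the core that uses it, which is where the software optimization effort discussed in the thread comes in.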
"The data in games aren't so locale, and memory have a lot of indirections (ie: a variable wich stores a pointer to a structure or function), these is the worst case for any processor,"
I disagree. Data in games are local because nothing jumps from one spot to 10 blocks away all the time. The data access is mostly continual. This is unlike say big file or database server where data can be retrieved from anywhere with a random mix of filters.
As for pointers, memory latency will be a problem if memory disambiguation fails and the data it's referring is not in cache. With Core 2 Duo's good memory disambiguation and large cache, this is really not a big problem. Again, we already know that memory latency affects K8 much more than Core 2.
For FB-DIMM, access latency won't be larger if the data to access is located in the closest module; for DDR2, with multiple modules (8x2GB in this case), latency will be much higher due to more contention on the link. Due to Opteron's NUMA nature it's actually 4 modules per link, but that's precisely the advantage of K8 over Core 2.
There is some memory and core scalability problem in Woodcrest, past 4 cores and 4GB, and it seems to me FB-DIMM latency isn't the main cause here.
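The disagreement about locality above comes down to how often accesses miss in cache. As a rough sketch, the following toy direct-mapped cache model counts misses for an in-order walk versus scattered accesses. The cache geometry (64 lines of 64 bytes), the 1MB address range, and the name `miss_count` are assumptions chosen for illustration and do not model either processor.

```python
import random


def miss_count(addresses, num_lines=64, line_bytes=64):
    """Count misses in a toy direct-mapped cache for a list of byte addresses."""
    tags = [None] * num_lines
    misses = 0
    for addr in addresses:
        block = addr // line_bytes
        index = block % num_lines
        if tags[index] != block:
            tags[index] = block  # fill the line on a miss
            misses += 1
    return misses


n = 4096
# In-order walk over 4-byte items: 16 items share each 64-byte line.
sequential = [i * 4 for i in range(n)]
# Scattered accesses over 1MB, standing in for pointer chasing.
rng = random.Random(0)
scattered = [rng.randrange(1 << 20) for _ in range(n)]

print(miss_count(sequential), miss_count(scattered))
```

The in-order walk misses once per cache line, while the scattered pattern misses on most accesses. Which pattern real game code resembles is exactly what the two commenters disagree about, and this sketch does not settle it.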
'Instead, AMD will have the "less than 100 IQ points" category for people who buy the Big Green Hype Machine's dud chips.'
That's about the most laughable thing I've ever read in this blog--the less than 100 IQ point people who've been buying computers since socket 939 was introduced have been buying Intel chips.
I'm assuming your previous chip was a Prescott P4 based on your obvious hatred of AMD (what a winning cpu the prescott was!!!) since AMD makes "dud" procs.
Yes, current Conroe top-end is faster than top-end AMD--yay. It's been a month, they've obviously won the war. AMD has only ruled the performance roost for 3 years. Now Intel gets a month (yes, it's only been a month since the official release of Conroe, some of us have a sense of time) back on top and you say that AMD makes garbage?
Obviously you're not a computer lover. Maybe an Intel stock-holder--or perhaps a mac-user... the level of ignorance is about equal from what I've heard as far as computers go.
It's great that Intel has released such a wonderful cpu--that doesn't mean they'll be on top forever. Just the here and now.
Oh, and btw, your ridiculous reference had *nothing* to do with Woodcrest or Opteron. These are servers, not desktops. Maybe read the original post before commenting so ignorantly.
If ignorance is bliss, I think the guy I quoted must be a very, very happy person...
Rich E
"Maybe AMD can create another PC category. Like that "less than $100" category for poor countries..."
Another level of ignorance from the same poster--Intel has been trying to get into "poor countries" for 10 years. AMD has for a little while as well. Most enthusiasts would rather buy a lower-clocked processor at a cheaper price, and overclock the crap out of it, not spend $1000 on a proc when a $200 proc can do the same thing with a little creative cooling.
My suggestion? People should learn a little about computer enthusiasm before they state a preference to a brand.
Rich E
How does multiple memory links (to multiple cores) remain UMA? By all means data read from one link by a core will have different latency from data read from another.
When all the cores are on the same die, you can do a lot of things. I'll just give you one bad (but workable) implementation here. Just imagine that the current chipset that has 2 FSBs gets integrated into the multicore CPU. It has 2 links and yet is UMA. Got it? The real implementation of this of course should be smarter.
Also, unless my idea was terribly wrong, you don't need internal/external NUMA, but just different distances (locality) on "one NUMA level". So for 2 cores on the same die, the distance can be one; for 2 cores on different dies, the distance can be 2; and so on.
Exactly. When you have different node distances, the optimization becomes hard. That's why AMD's solution tries to keep the remote nodes at the same distance.
Lastly, I don't understand why NUMA has scalability for the designer but not the user. I thought it's just the other way around. If NUMA optimization is in the software, NUMA machines will offer better scalability for end-user performance; on the other hand, designers can always slap two dies into one chip package regardless of the memory architecture (Kentsfield?).
I am referring to 1P multicore (read as one physical package) here. Read my argument carefully. I said for mobile, 1P is the solution for its form factor and power. Thus, it has nothing to do with the user. I admit I was not clear in painting the system for you. NUMA is good for the design team because it lets them add 2 dies into 1 package without losing much of the memory bandwidth, and with minimal modification. In Kentsfield's case, it needs to increase either the cache or the FSB or both. So, in some sense, NUMA is good for the design team in doing this quickly.
To correct my example, in case the current 2 FSB solution out there has only one link (I dunno, too lazy to do research on this), just use a similar approach, but with some logic in between to route the memory accesses to the intended link. You might want to note that there is potential for access from multiple CPUs to the same region and hence a unique UMA problem. But there are ways to reduce this possibility or its impact. Again, just to point out, this logic can be done because it is physically on the same die.
AMD aims for 40% server market share by 2009
http://techreport.com/onearticle.x/10610
Only 40% by 2009? Shouldn't it be closer to umm, 100%, since Intel would've long BANKRUPT by then? LOL
Eh Sharikou?
Anonymouse said:
My favorite part is about the digital oscilloscope. What did you do man? Connect a device on the motherboard traces? Or on the CPU pins? And I suppose it was able to measure signals that last only a few ns. Not to mention the fact that you have a serious confusion about the data bus width and the width of a processor's registers. Dude the data bus has been 64-bit wide since the first Pentium, modern CPUs have a 128-bit wide data bus (Athlon 64 S939/AM2 in dual channel configuration).
It's actually quite easy to measure signals on the FSB using a digital oscilloscope. Not quite sure why you'd do it that way, since using a logic analyzer is a MUCH easier way to see what is going on.
It's even possible to measure the signals on a serial interconnect using a digital oscilloscope (and they have eye widths usually only a few hundred ps). Hell, I use a 20Gs/s scope to measure PCIE signals all the time, and there are much more powerful scopes available.
In short: this is very possible, and it's done ALL the time. So if you don't know something, it's really best just to keep it to yourself.
No comment on the Turion 64 X2 benchmarks from Tom's Hardware yet, Sharikou?
"compared to an Intel platform based on the Core Duo and the company’s own GM 945 chipset, the combination of AMD CPU and ATI chipset is inferior in terms of battery time and multitasking performance. Therefore, under equal conditions, it can only be regarded as the second choice - if it is worth getting at all. The Core Duo 2, Intel’s next generation of laptop processors is already at hand, and first measurements show that the Core Duo 2 is even more powerful while not consuming more power."
The truth comes out from Intel's lair in Oregon.
http://www.oregonlive.com/business/oregonian/index.ssf?/base/business/1156301711290660.xml&coll=7&thispage=1
I feel really sorry for some of my friends working at Intel now. (I am from Portland, Oregon.)
-Longan-
P.S. I studied a lot about microprocessor architecture. As an engineer, I felt something "weird" about Woodcrest. I have been suspecting it was designed to pass benchmarks or had some design hole somewhere.
Why would Dell drop Intel just before Woodcrest was announced? Did Dell get early engineering samples and know something that we don't know???
"How does multiple memory links (to multiple cores) remain UMA? By all means data read from one link by a core will have different latency from data read from another."
The same way a dual core processor today with dual channel memory controller (say any modern Athlon/Opteron) does this just fine.
You could implement as many channels in your memory controller as you want, even "thread" them if you were using one of the various Rambus architectures.
The limitations of course are the cost of the RAM and the complexity of the memory controller design. And depending if you use parallel or serial RAM interfaces, the cost of the connection to the RAM. With parallel RAM, the pin counts ramp up quickly. The dual channel memory controller (for parallel RAM) is the big reason that Athlon/Opteron chips require so many pins.
One of the Alpha designs that did not get built had a multi-channel serial RAM (RDRAM) on-chip memory controller that was not NUMA, but would have offered very good performance for a multi-core processor.
"Lastly, I don't understand why NUMA has scalability for the designer but not the user. I thought it's just the other way around. If NUMA optimization is in the software, NUMA machines will offer better scalability for end-user performance; on the other hand, designers can always slap two dies into one chip package regardless of the memory architecture (Kentsfield?)."
NUMA requires the OS to be specially designed for memory and threads that are better accessed on one processor vs. another.
In some ways, NUMA can be looked at as a very fast cluster.
NUMA also requires applications to be aware of NUMA and deal with things like processor affinity. Do you allocate 4GB of RAM spread across two processors or do you allocate 2GB on one and 2GB on another processor and utilize different threads to process each 2GB? NUMA gives the application designer a lot of choices.
The user can only hope that his particular NUMA configuration was taken into account by the software developer. Some people, for instance, put all their memory on one of two processors. This actually works okay for Windows XP 32, but not for XP x64 / Server 2003.
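The allocation choice described above can be sketched with a toy latency model. This is only an illustration: the two latency figures and the function name are hypothetical, not measurements of any real Opteron system.

```python
# Toy model of average memory latency on a two-node NUMA system.
# LOCAL_NS and REMOTE_NS are hypothetical values chosen for illustration only.
LOCAL_NS = 70.0
REMOTE_NS = 110.0


def average_latency_ns(local_fraction: float) -> float:
    """Average latency when `local_fraction` of accesses hit the local node."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * LOCAL_NS + (1.0 - local_fraction) * REMOTE_NS


# Option A: one 4GB buffer spread across both nodes; each thread touches all
# of it, so about half of its accesses are remote.
spread = average_latency_ns(0.5)

# Option B: 2GB on each node, one thread pinned per node working on its own half.
partitioned = average_latency_ns(1.0)

print(f"spread across nodes: {spread:.1f} ns")
print(f"partitioned per node: {partitioned:.1f} ns")
```

The point of the sketch is only that the partitioned layout pays off when the software actually keeps each thread on its own half of the data, which is exactly the part the user cannot control.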
I love this blog, it's my favourite place on the net at the moment.
I myself am a bit of an AMD fan. The reason is simple. AMD has allowed me to build computer systems for less money, thus enabling me to sell them at prices that compete with the likes of the Phantom Menace (Dell).
Without AMD, the little PC sales man like me would disappear and you'd all be stuck with the terrible support and non dynamic lines that large PC vendors provide.
I say bollox to Intel, the only way they could beat AMD's 3 year old architecture was to increase the onboard cache so that applications would rarely need to access the main system memory.
Imagine C2D on 90nm! that chip would be huge.
For the other Intel spacwads that continually use their tiny vocabulary to hurl abuse at Shak I say this:
With all the money in the world Intel could not beat AMD @ 90nm. Doesn't that tell you something?
What I see from AMD is passion, innovation and a will to win. This is a company that has had to fight tooth and nail to remain in an industry completely owned by Intel.
How many profitable OS providers exist today outside of Microsoft?
Shak, keep up the goodish work. I don't care if your Phd is real. I like the contentious posting.
"the less than 100 IQ point people who've been buying computers since socket 939 was introduced have been buying Intel chips."
Yes, socket 939 is great, simple as that. Only people with sub-100 IQ cannot see that. It was great 2 years ago because it allowed a drop-in upgrade 2 years later to a dual-core performer with zero effort.
Of course, everyone is saying that s939 is dying. But the fact is, since on average it's only 20% slower clock-for-clock than Core 2 Duo (and the gap is smaller for 64-bit), the time a s939 box becomes obsolete, performance-wise, will be very close to when a Conroe does; I'd say within 2 quarters. Yeah... s939 is dying.
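The "within 2 quarters" estimate can be checked with a back-of-the-envelope calculation. The sketch below assumes that mainstream CPU performance doubles every 18 or 24 months (an assumption not stated in the comment) and ignores any difference in clock speed between the two chips; the function name is made up for illustration.

```python
import math


def lag_in_quarters(speed_ratio: float, doubling_months: float) -> float:
    """Quarters needed for performance to grow by `speed_ratio`,
    assuming it doubles every `doubling_months` months (an assumption)."""
    months = math.log2(speed_ratio) * doubling_months
    return months / 3.0


# "20% slower" means the faster chip does 1 / 0.8 = 1.25x the work per clock.
ratio = 1.0 / 0.8

for doubling in (18, 24):
    quarters = lag_in_quarters(ratio, doubling)
    print(f"doubling every {doubling} months: {quarters:.1f} quarters")
```

With these assumptions the gap works out to roughly 1.9 quarters for an 18-month doubling period and about 2.6 quarters for a 24-month one, so the estimate depends on which growth rate you believe.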
By the same token, due to K8's great scalability, I expect today's AM2 to have the same advantage 2 years later - a drop-in upgrade with Rev.H quad-core K8L. I won't need to buy it until AM3, though. S939 has been serving me well. ;-)
Anyway, at some point you really feel happy to be using AMD-based systems. You spend less overall but still get decent performance. A real comparison: my S939 was bought earlier than my cousin's P4 box, and she (who did not listen to my advice) now only wishes she had $$$ for a whole new system. (Alas, she'll probably still go for Intel - some people just never learn or change.)
Anyone following up the discussion on the forum? It's pretty lame IMO to discount the whole forum with its name while one doesn't read through the thread. Because if he does he'd find that discussions there (AMDZone) aren't biased like other places. The original poster (VENTURI) holds a very neutral stance toward his experiment - he is not trying to promote any CPU over the other, but merely stating the facts.
He claimed later in the thread that Woodcrest has advantage in 32-bit, sub-6GB RAM, non-RAID environment, whereas Opteron is better for 64-bit with large RAM and storage. He said: "Using artificial benchmarks, such as sandra or pcmark, the woodcrest wins on processor arithmatic and multimedia, but loses on memory benchmarks." (My opinion on that is that K8's core has better memory, IO and size scalability.)
There are other follow-up discussions on the board, too. One pointed out that the memory benchmark is synthetic itself. However, another person (hyc) said that from his experience with servers (OpenLDAP), locality of data is minimal and memory performance IS important. (I'd like to note myself that this may not hold for games, where locality is much better, as I explained previously.) I interpret that as saying that the memory benchmark is important to the overall benchmark of a processor, and that's why there is a memory benchmark in any decent processor benchmarking suite.
BTW the original poster also said that the Woodcrest he used were a pair of 5150 (2.66GHz). In terms of clockrate it's the closest he can find to compare with the 2.6GHz Opteron.
I suggest anyone who bashed Sharikou for inclusion of this forum review to follow the thread more carefully. IMO it is a good balancing comparison contrasting all those benchmarketing websites on the Internet (frankly, many of them just repeat the same biased results over and over).
"
"How does multiple memory links (to multiple cores) remain UMA? By all means data read from one link by a core will have different latency from data read from another."
The same way a dual core processor today with dual channel memory controller (say any modern Athlon/Opteron) does this just fine.
"
I think you're mistaken. Multiple memory links to a multi-core CPU, which would imply multiple memory controllers for the core, has nothing to do with multiple channels (from memory modules to the memory controllers). What I was suggesting, as I said, is future talk. It is not implemented in any x86 processor today.
The other things you said seem fair otherwise.
Sharikou,
I am a great fan of your posts.
Waiting to hear your take on this
http://news.com.com/Sun+recoups+server+market+share/2100-1010_3-6108453.html?tag=nefd.top
--Mahi
Edward said...
"...But the fact is, since in average it's only 20% slower clock-for-clock than Core 2 Duo (and the gap is smaller for 64-bit), the time a s939 box becomes obsolete, performance-wise, will be very close to when a Conroe does; I'd say within 2 quarters..."
How is Conroe, performance wise, going to be obsolete in 2 quarters?
"The NUMA is good for the design team to add 2 dies into 1 package without losing much of the memory bandwidth, and with minimal modification."
Agreed. And you have to admit that AMD has a better (scalable, powerful, and lower cost) system architecture with on-die memory controller and hypertransport / direct interconnect.
Intel is losing the most lucrative server market share, despite its latest fab technology and core design, precisely due to the lack of these scalable features. I bet someone inside Intel must be working day and night to produce a competing alternative, just to preserve the company's pride in using its own technology (or buy it out for total control).
"It's actually quite easy to measure signals on the FSB using a digital oscilloscope."
I thought he meant to say that a digital oscilloscope cannot measure the bit-ness of a processor. The memory bus width has nothing to do with the bit-ness of the CPU internals.
Edward said...
"I suggest anyone who bashed Sharikou for inclusion of this forum review to follow the thread more carefully."
At the time of Sharikou's posting the specs were not there. As you can see, the specs were put there today; his post was on the 21st.
"Imagine C2D on 90nm! that chip would be huge."
Yes, brilliant observation. Is there a point here? Maybe Intel should design on older, more expensive processes to make slower chips so that you can pay more for it? Or maybe to help AMD keep up? How about make less money for their shareholders?
"With all the money in the world Intel could not beat AMD @ 90nm. Doesn't that tell you something?"
Why yes. It tells me that they have a 65nm process ready before their competition, and therefore don't waste time designing for a 90nm process and then spending more money migrating the design to 65nm. Are you implying that maybe AMD should design K8L for a 90nm process first, and then take it to 65nm? Do you really think that AMD wouldn't be shipping 65nm parts in volume today if they could? If so, I have a fab I want to sell you...
Sharikou,
After reading through today's comments it is apparent that you have about as much credibility left as Dick "They'll greet us with flowers, candy and kisses" Cheney, and George "Iraq oil will pay for the war" Bush.
"Agreed. And you have to admit that AMD has a better (scalable, powerful, and lower cost) system architecture with on-die memory controller and hypertransport / direct interconnect."
Having said all that, you know why NUMA is not the future for ALL markets? It is for server MP only. And for the single-chip multicore, it is better to design it as UMA instead of NUMA internally. The same design would bring benefit whether the chip is used in a UMA environment or a NUMA environment. No successful NUMA in laptop and desktop in a foreseeable future. If the 4x4 is using NUMA, it is doomed to fail.
"How about make less money for their shareholders?"
Rest assured that they are doing that right now. ;-)
"Agreed. And you have to admit that AMD has a better (scalable, powerful, and lower cost) system architecture with on-die memory controller and hypertransport / direct interconnect."
But guess what why AMD does come out with the 2 die in one package given that it has such SCALABLE design? Because the multicore would end up as NUMA internally, which is bad for the laptop, desktop or even to be used as a NODE in the NUMA for the server MP. :)
just notice my id changed ... i wonder why ...
Some clueless fanboy wrote:
"Since the blog owner rarely stays on topic himself (Journal for Pervasive 64-bit computing my ass!)"
The solution is simple, GET YOUR OWN BLOG and post your own 64-bit computing centered reporting there if you don't like Sharikou's. (Obviously you do like it, otherwise you wouldn't be here.) I would enjoy watching you try to do a better job of it, and I'm willing to bet the farm you'd be just as Intel biased about said news as Sharikou is biased towards AMD. Put simply, why do you post on his blog if you hate the man so much? It's called one man's opinion, not one man's factual comments. State your countering opinion and get some factual links to back it up. If you both can't agree on it, then agree to disagree and move on. Like you and everyone else, usually our opinions have already been made before the discussion starts anyway.
"then I don't think it a big surprise that commenters are not on topic either... It is a rare day that Sharikou posts on 64-bit specific topics or posts information about 64-bit software enhancements."
What does it matter? Hardware or software, in the company or in the industry, it's all loosely centered around 64-bit specific topics in this day and age (thanks to it becoming the standard with Vista), even if it is a bit stretched. It's just like the news, man: there's a lot of topics, angles and viewpoints to cover. Get over it.
"Also, there are other 64-bit chips out there but Sharikou seems to think the only one in the world is made by AMD."
Yea, because AMD's 64-bit processors are the only one that matters in the market duh, get with the program! ;^p
Seriously though, AMD started the x86-64 revolution with the Athlon 64 and even forced Intel to adopt it via EM64T (since Itanium/EPIC is sinking like the Titanic as an ISA solution, aka Itanic). Therefore I feel they are the leaders (in other areas as well) and, as I'm sure Sharikou feels, they deserve the attention.
However, this is not to say Intel is totally out of the limelight, as AMD and Intel are literally tied at the hip when it comes to anything in the microprocessor industry, being the two major competitors in said market.
"Stay on topic? It's anything goes around here."
Thus sayeth you. Just because it happens doesn't mean it has to happen. Personally, I think once a point has been driven into the ground and beaten like a dead horse, and both sides are not going to change their minds, it's time to move on. Hopefully some other egotistical individuals reading this blog with personal vendettas against its owner will get this memo and go buy a clue.
"No successful NUMA in laptop and desktop in a foreseeable future. If the 4x4 is using NUMA, it is doomed to fail."
Why? Please explain in more detail.
I'd say that NUMA is not only foreseeable, but inevitable. That will be true as long as the memory and inter-core communication become relatively slower with every new processor generation.
IMO, if NUMA has no future on desktops, then the desktop PC has no future. Same goes for notebooks (where it'll be replaced by handhelds). With a greater number of cores and larger total memory, there is no way that uniform memory access can scale performance, much less performance/watt. Unless photon-based devices or quantum computing become reality sooner than we all expect, of course.
"But guess what why AMD does come out with the 2 ..."
I have a tendency to leave out the word NOT. What I mean here is why AMD does NOT come out ...
"Why? Please explain in more detail.
I'd say that NUMA is not only foreseeable, but inevitable. That will be true as long as the memory and inter-core communication become relatively slower with every new processor generation.
IMO, if NUMA has no future on desktops, then the desktop PC has no future. Same goes for notebooks (where it'll be replaced by handhelds). With a greater number of cores and larger total memory, there is no way that uniform memory access can scale performance, much less performance/watt. Unless photon-based devices or quantum computing become reality sooner than we all expect, of course."
As I explained in all the previous posts, NUMA is really for server MP, which has physically separate processors communicating through dedicated links.
Inherently, UMA is better than NUMA if you are able to get rid of the memory bandwidth concern (read as: either you do not need the same level of bandwidth as in a server, or you have found an alternative way to increase the memory bandwidth). I'm not bashing NUMA, but one thing doesn't fit all. Different markets just have different needs. Also, I have put in a brief explanation of how to achieve the same level of IMC links in a multicore while maintaining UMA. The current AMD offering has 2 CPUs to 1 link, UMA internally. What makes you think that it is impossible to have, say, 8 CPUs to 2 links (UMA) internally? Why do you think AMD wants to come out with a NATIVE quad-core while knowing beforehand that it might be late compared to Intel's? Using the AMD solution of combining 2 dies in a chip would inherently make the chip NUMA internally, which is not good for today's (and the foreseeable future's) computing environment, and makes it a bad choice for the server MP node as well (hard to optimize).
As for the 4x4, I wanted to modify that line but I had just clicked publish. I write too much and confused myself between the in-die NUMA and the external NUMA. Anyway, the 4x4 will have some LIMITED niche market. No matter how much you wanna deny this, here are my points:
1) It stands for expensive.
- Don't tell gamers that they can buy 2 cheaper CPUs for that. If they do, it defeats the gamer's purpose of buying that system.
- Software cost. There is so much unknown in the MP licensing terms.
- Must go for VISTA xxx (assuming it supports NUMA). I'm pretty sure XP doesn't. Need to pay the additional OS cost.
2) No HEAVILY multithreaded apps currently (not referring to short-lived threads), and none in the foreseeable future. The VOODOOPC people had to cite running 2 instances of the same game as its use case ... so, you know what I mean.
3) There is no MAGIC in NUMA support. Do not ever think that an OS that supports NUMA gives you NUMA optimization instantly. The programmer has to take care of that part.
Call me a naysayer if you wish. AMD historically (read as 'as far as i know') has never been an industry/market enabler. The HT stuff will be its first ever industry enabling move. I'll wait and see if it does succeed.
Btw, for those who do not know what I mean by 'enabler', take Intel for example.
PCs needed a better bus; Intel provided and promoted PCI.
To sell its Centrino, Intel established and funded hotspots around the world.
To sell multicore, Intel HAS STARTED to promote multithreaded apps (has AMD done anything here, as it is also selling multicore processors?).
For AMD, NUMA has been in the server space for quite some time, and its customers know how to do it for server systems and their apps. I know AMD is working with CRYSIS (maybe I spelled it wrongly) ... but to enable a new market ... you need more than that.
"AMD historically (read as 'as far as i know') has never been an industry/market enabler. The HT stuff will be its first ever industry enabling move."
You are too biased. Megahurtz myth was first broken (after numerous attempts from others) by AMD's Athlon 64. x86-64 was originally AMD64. Multi-core on x86 was first seen in Athlon X2, not the multiple-die-in-one-package Pentium-D.
Intel also "enabled" many things: PCI (and its variants), USB, the first 32-bit x86, and MMX/SSE. But please, not Centrino, because that's a lousy wireless implementation; even Intel engineers say that. Up until now (that is, not commenting on WiMax), Intel has been a marketer, not an enabler, for wireless technology.
Edward, you might want to refer to this link:
http://www.xbitlabs.com/articles/cpu/display/amd-k8l.html
Also, are you qualified in this field?
"Hopefully some other egotistical individuals reading this blog with personal vendettas against its owner will get this memo and go buy a clue."
No personal vendetta here, but it would be nice if the blog owner applied some form of logical thought process to his arguments, and stuck to posting that which he knows. He is clearly WAY out of his depth on all things fab process related, and his postings on Intel's Copy Exactly and AMD's APM are downright absurd. Does anyone really believe that Intel only runs 1 product per fab line, or that Itanium is still stuck at 130nm? Please explain to me how Montecito (>1.7 billion transistors) could fit in a stepper field at 130nm...
It's these sorts of wild claims that devolve the discussion on this blog from useful to entertainment. And that's the only reason I'm still around: it's like going to the demolition derby.
You just don't get it... whatever you quoted below is just not industry/market enabling
"You are too biased. Megahurtz myth was first broken (after numerous attempts from others) by AMD's Athlon 64."
You are confusing this with marketing strategy. There was no myth about MHz: until recent years, the higher it was, the better the apps ran. While Intel was still using MHz as part of its marketing strategy (it worked), it also focused on power (mobile), SIMD, HyperThreading, etc. Nothing was enabled even by what you claim, the myth being broken.
"x86-64 was originally AMD64. Multi-core on x86 was first seen in Athlon X2, not the multiple-die-in-one-package Pentium-D."
What is the difference between the multicore implementations as far as the industry goes? Nothing! Both enable the industry with multicore technology (x86). And Intel has done a much better job of enabling the market (with its initiative with multiple universities on multithreaded-programming courses, and its software ecosystem influence). Anyway, it was neither AMD nor Intel that came out with multicore first (if not mistaken, IBM).
Yes, AMD is the one that came out with x86-64. Again, it is not that big a deal 'alone' because there were already 64-bit Power, Itanium, etc. in the market. Anyway, they deserve that credit too. It did create an x86-64 market, although I'm not sure whether AMD or Intel is playing the major role in expanding it now.
"Intel also "enabled" many things: PCI (and its variants), USB, the first 32-bit x86, and MMX/SSE. But please, not Centrino, because that's a lousy wireless implementation; even Intel engineers say that. Up until now (that is, not commenting on WiMax), Intel has been a marketer, not an enabler, for wireless technology."
Read my post: I said it enabled the laptop wireless market. There was a chicken-and-egg problem. Intel provided the egg, and the chicken followed. This is called market enabling. Side story: there was the same situation when Intel came out with USB. Vendors didn't want to produce USB devices because there weren't many USB-enabled PCs, and PC vendors saw no point in having USB ports because there weren't many USB devices. Intel seeded it. (Intel actually enabled USB both technologically and market-wise.) The rest of the industry, including AMD, then rode on it.
Btw, I'm not sure how lousy the wireless chip implementation is. I'm using a Centrino, and I love its wireless capability. By lousy, which chip and which characteristic do you refer to? Distance? Signal recovery? Standard? Power? Any Broadcom chip comparison?
"You just don't get it... whatever you quoted below is just not industry/market enabling"
Well, I never intended to discuss industry/market enabling, but technology enabling. Whoever has the bigger pocket can "enable" the market quicker and better, of course, even with inferior technology. Be sure that I have no respect for that, but I respect the fact that you respect it.
x86-64 was not possible until AMD came up with it. HyperTransport was not possible on x86 until AMD came up with it. Multi-core approach (instead of MegaHurtz hype) was not possible until AMD started it, even though Intel beats it with an earlier product introduction.
Intel didn't enable wireless technology from a technical point of view. Long before Centrino supported 802.11b there had been numerous superior implementations; then every respectable player moved to 802.11g, with Centrino being almost the last to support it. Intel just marketed "Centrino" like no tomorrow - fine, it's market enabling. I guess Dell is the enabler of the corporate PC, then.
"Edward, you might want to refer to this link:
http://www.xbitlabs.com/articles/cpu/display/amd-k8l.html"
It seems like some basic computer architecture knowledge with heavy speculation on both K8L and Conroe. It's probably not a bad read for people not in this field, but too simplified and sketchy to truly evaluate the strengths of both architectures (K8L v.s. Conroe).
The fact is, X-bit Labs doesn't know much about Conroe, and knows even less about K8L. It had this goal of making Conroe look good in front of K8L, and it tried very hard to reach that. I think it's just a waste of everyone's time and energy.
"Multi-core approach (instead of MegaHurtz hype) was not possible until AMD started it,"
Oops... I meant to say it was not possible on the x86 market. I remember IBM and others had multi-core design long before both AMD and Intel (they gave up MegaHertz much earlier ;-)).
Edward, do you have an email so I can get in touch with you?
I believe I can discuss further things with you.
"Edward, do you have an email so I can get in touch with you?
I believe I can discuss further things with you."
edwardofblogspot at yahoo dot com...
please don't spam my email account; come and spam here instead (sorry Sharikou), thanks! ;-)
http://www.intel.com/pressroom/archive/releases/20060918corp.htm
The end for AMD, until they copy it of course.