Friday, March 30, 2007

This Pentium XE 965 killer is priced at $69

$69 to frag the Pentium XE 965 in most benchmarks, with free shipping.

22 Comments:

Blogger R said...

“Ooooweeee!”

I think you missed the main point.

Less than a year ago the 965 Extreme at 3.73GHz (I think it came with 4MB of cache) was Intel's flagship processor.

Technology has made it possible for a $69.00 processor to frag last year’s hot rod.

I’m hoping Intel will frag the K10 next year and so it goes. Your glass is half empty.

6:03 PM, March 30, 2007  
Blogger Ho Ho said...

r
"Less than a year ago the 965 EXTREME 3.73 ghz, I think it came with 4 meg cache was Intel’s flagship processor

So it is. But don't forget that all Core 2s are also faster than that CPU, though at a considerably higher price than that AMD chip.


In other news, the XE 965 is nowhere to be found. Whether that's because it is too popular or because Intel has simply stopped producing it, I don't know. For some reason I think the latter might be the case.


Also, there are some problems with Socket 939 X2 availability; namely, only the 4200+ is available. So much for having an option to keep upgrading CPUs on your older socket.

2:13 AM, March 31, 2007  
Blogger Unknown said...

This comment has been removed by the author.

3:06 AM, March 31, 2007  
Blogger Unknown said...

"Less than a year ago the 965 Extreme at 3.73GHz (I think it came with 4MB of cache) was Intel's flagship processor."

Less than a year ago the FX-60 was AMD's flagship and cost a thousand bucks. But now is not a year ago, so what's your point?

3:08 AM, March 31, 2007  
Anonymous Anonymous said...

dr,yield,phd,mba, said,

Yawn. Not newsworthy.

It must have been newsworthy, since you spent time on this blog reading and commenting on the subject.

Tick Tock Tick Tock

6:41 AM, March 31, 2007  
Blogger R said...

“Less than a year ago the FX-60 was AMD's flagship and cost a thousand bucks. But now is not a year ago, so what's your point?”

The FX-60 was also the end of Socket 939; the point being that technology moves on. Intel is moving ahead to 45nm because of pressure from AMD. Technology is a good thing; just accept it. Come on in, the water's fine.

8:09 AM, March 31, 2007  
Anonymous Anonymous said...

Is Intel copying AMD?

http://www.tgdaily.com/index.php?option=com_content&task=view&id=31428

1:38 PM, March 31, 2007  
Blogger Unknown said...

Hardly. Intel was playing around with IMCs with the Timna processor back in 2000. The Alpha was the first processor to ship with an IMC.

With regard to graphics, Timna had integrated graphics as well. AMD has announced its plans to do integrated graphics, but that's it. They've shipped nothing. Nada. Nehalem is due next year and will be here before Fusion. I don't believe AMD has even set a firm date for the release of Fusion, have they?

7:21 PM, March 31, 2007  
Blogger Unknown said...

Whatever happens, and whoever has the phattest Turbo on their proc... I will still get one hell of a deal come Crysis....
Moahahaha...
In the end, I win... not Intel, not AMD.... ME!!! MOI!!!

JOY!

10:16 AM, April 01, 2007  
Blogger lex said...

WOW, and you know what, Pretender?

INTEL makes a lot of money moving these space heaters. Their 90nm factories are already depreciated and running at high yields. Selling at 69 bucks makes AMD sell its expensive 65nm products at a loss.

Tick Tock Tick Tock

Is that Penryn and Nehalem on 45nm I hear coming? Priced at a couple hundred bucks and sporting the best benchmarks. Poor Sharikou and AMD. LOL

5:13 PM, April 01, 2007  
Blogger Christian Jean said...

I thought this was a blog to discuss processors, technology, Intel and AMD... not Walmart specials?

This is like the third time in recent memory that you've linked to a 'deal'. Are you getting royalties?

This is like discussing OS/2 and Windows 3.11

"Intel was playing around with IMCs with the Timna processor [...] Timna had integrated graphics as well."

Dreaming won't put the bread and butter on the table... execution does. AMD executes!

In regards to the article 'Is Intel copying AMD?'

Great point about why Intel didn't pursue GPU+CPU before. So now that Intel will integrate their Fisher-Price of a GPU on chip, revenues should drop for Intel. Nice!

By how much, I don't know... I'll check the numbers.

In a few years I don't believe there will be room for a discrete graphics market (or NVIDIA).

8:09 PM, April 01, 2007  
Blogger sharikouisallwaysright said...

Otherwise, there is only room for an integrated CPU...

9:02 PM, April 01, 2007  
Blogger Ho Ho said...

jeach!
"Dreaming won't put the bread and butter on the table... execution does. AMD executes!"

So it is, but as we all know, Intel did (and does) just fine without an IMC. It did the research long ago and found it wasn't (yet) cost-effective to use, so it didn't.


"So now that Intel will integrate their fisher price of a GPU on chip, revenues should drop for Intel"

Wouldn't it be exactly the same for AMD/ATI?


"In a few years I don't believe there will be room for a discreet graphics market (or NVIDIA)."

When was the last time you heard of a CPU socket with memory bandwidth of >100GiB/s? The fastest I know of goes up to around 25% of that, far too little for a high-end GPU. Heck, even $100 GPUs have much more bandwidth than any x86 CPU!

12:05 AM, April 02, 2007  
Blogger Unknown said...

Take a mainstream Intel or AMD platform using dual channel DDR2-800 memory. That's about 12.8GB/s of memory bandwidth. Compare that with a high end video card, the 8800 GTX will do nicely. It has 86.4GB/s of memory bandwidth. 12.8GB/s of memory bandwidth is nowhere near enough for a high end GPU. The G80 GPU is also 680 million transistors. It's not feasible to integrate something that advanced onto a CPU die.
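A quick back-of-the-envelope check of those two numbers, sketched in Python (assuming the usual 64-bit channels for DDR2 and the commonly quoted 384-bit, 1.8GT/s memory bus for the 8800 GTX):

# Rough check of the bandwidth figures above. Assumptions: DDR2-800 moves
# 800 MT/s over a 64-bit channel; the 8800 GTX is usually quoted as a
# 384-bit bus at 900MHz GDDR3 (1800 MT/s effective).

def bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float, channels: int = 1) -> float:
    """Peak bandwidth in GB/s: bytes per transfer * transfer rate * channels."""
    return (bus_width_bits / 8) * transfers_per_sec * channels / 1e9

cpu = bandwidth_gb_s(64, 800e6, channels=2)   # dual-channel DDR2-800
gpu = bandwidth_gb_s(384, 1.8e9)              # 8800 GTX memory bus

print(f"dual-channel DDR2-800: {cpu:.1f} GB/s")   # ~12.8 GB/s
print(f"8800 GTX:              {gpu:.1f} GB/s")   # ~86.4 GB/s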

12:23 AM, April 02, 2007  
Blogger core2dude said...


"Take a mainstream Intel or AMD platform using dual channel DDR2-800 memory. That's about 12.8GB/s of memory bandwidth. Compare that with a high end video card, the 8800 GTX will do nicely. It has 86.4GB/s of memory bandwidth. 12.8GB/s of memory bandwidth is nowhere near enough for a high end GPU. The G80 GPU is also 680 million transistors. It's not feasible to integrate something that advanced onto a CPU die."

I am not a GPU expert, so this is more of an opinion than a fact. But isn't it true that GPUs today don't have that much cache? With added cache, you might be able to reduce the memory bandwidth requirement.

For example, what if you stacked half a gig of DRAM onto the CPU? You could have a tremendous amount of bandwidth to the stacked DRAM, though your bandwidth to the system memory might suck. But does that matter, if the stacked memory can hold your working set?

1:55 AM, April 02, 2007  
Blogger Ho Ho said...

Intel's 45nm quads will have 840M transistors across two dies, with a total area of a bit over 200mm^2 combined. It wouldn't be impossible to put a midrange GPU and a dual-core in one package, but there surely won't be nearly enough bandwidth.

Sure, HT3 is nice and all, but it still has way too little bandwidth to be usable for GPUs. Face the fact: no x86 CPU socket has enough bandwidth. I don't remember the exact numbers, but POWER5 might have come close to high-end GPUs.


core2dude
"But isn't it true that GPUs today do not have that much of cache? With added cache, you might be able to reduce the memory bandwidth requirement."

You know why GPUs are good at stream processing? It is because they stream loads of data through their pipes. There is rather little data locality, and that is why there are no caches.

In short, there is no way in hell you could add enough cache to be of any use. You might shave around 30% off by using local memory like Xenos does, but you'd need at least 100 megs of it to be usable at high resolutions with (multiple) FP16/32 buffers. That would require several billion transistors.

Xenos also has only 32GiB/s of throughput to the cache in the daughter die, which is only a bit more than its bandwidth to VRAM, around 22GiB/s. That means a GPU with >100GiB/s of bandwidth to VRAM would need around as much to the cache, or the cache wouldn't be of much use.



"For example, what if you stacked half a gig of DRAM onto the CPU?"

You can't; DRAM is bigger than your CPU die. Intel's 80-core thingie will have some DRAM on top of it, but it is considerably smaller; I remember it being around 32 megs.

You could say "But hey, we'll have >100GiB/s off-chip bandwidht once Intel has its silicon optics working!". Well, anyone wants to guess how much bandwidth will GPUs use that day? My guess is that in four years we'll see GPUs with >500GiB/s bandwidth. It isn't all that much actually. 1024 bit bus with 4GHz RAM can give you just that.

3:10 AM, April 02, 2007  
Blogger Christian Jean said...

"It did the research long ago and found it wasn't (yet) cost-effective to use, so it didn't."

You actually believe your nonsense? So you're telling me Intel chose to:

a) save a few hundred million dollars in R&D costs and production costs

Instead of...

b) lose billions in revenue and market share because the Athlon did have one

"Wouldn't it be exactly the same for AMD/ATI?"

No, because AMD never had a graphics business to lose. They bought one to produce Fusion, which will add to their bottom line. Intel, on the other hand, does.

"When was the last time you heard of a CPU socket with memory bandwidth of >100GiB/s? The fastest I know of goes up to around 25% of that, far too little for a high-end GPU. Heck, even $100 GPUs have much more bandwidth than any x86 CPU!"

Well, guess what! Intel really wanted to keep the x87 as a separate chip, but reality (and demand) merged it into the CPU. In time, the GPU will go the same way.

Use common sense. There are several cost-efficient alternatives available. Look at when Intel packaged its Level 2 cache on its slot architecture.

You could have the same thing: a slot with GPU+CPU, VRAM, and a Level 4 cache (shared by the GPU and CPU). The possibilities are limitless... if you want it.

6:16 AM, April 02, 2007  
Blogger Unknown said...

This is EXACTLY what I was looking for!
AMD pulls an Intel-trick and squeezes 2 X Barcelona on a chip and makes it an Octa-core.
Nice... Sweep the rug out from under Intel's wobbly high horse...

http://www.theinquirer.net/default.aspx?article=38634

Crysis.. here I come!

8:24 AM, April 02, 2007  
Blogger Ho Ho said...

First, I apologise. A 1024-bit, 4GHz DDR bus would give a bandwidth of a terabyte per second, not half a terabyte. Intel's silicon optical connection was a terabit per second. Big difference there.

A 16-bit HT3 link should give around 166Gbit/s, or 20.8GiB/s. Basically, you would need around a 200-bit HT3 link just for the GPU to get comparable bandwidth, and even then it wouldn't be that good because of the increased latency. A 200-bit HT3 link would mean six 32-bit links per socket, all running at 2.6GHz.

Just for comparison, the 8800 GTX has memory bandwidth of around 700Gbit/s and the R600 will have around 1.1Tbit/s.
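The conversions behind those figures, sketched in Python (the link widths and clocks are the commonly quoted ones, taken as assumptions rather than official specs):

# Rough peak-throughput arithmetic behind the numbers above.

def link_gbit_s(width_bits: int, clock_hz: float, ddr: bool = True, directions: int = 1) -> float:
    """Peak throughput in Gbit/s for a parallel link."""
    transfers = clock_hz * (2 if ddr else 1)
    return width_bits * transfers * directions / 1e9

ht3_16bit = link_gbit_s(16, 2.6e9, directions=2)   # ~166 Gbit/s (~20.8 GB/s both ways)
gtx_8800  = link_gbit_s(384, 0.9e9)                # ~691 Gbit/s (~86.4 GB/s)
bus_1024  = link_gbit_s(1024, 4e9)                 # ~8.2 Tbit/s, i.e. ~1 TB/s

print(f"16-bit HT3 (both directions): {ht3_16bit:.0f} Gbit/s")
print(f"8800 GTX memory bus:          {gtx_8800:.0f} Gbit/s")
print(f"1024-bit @ 4GHz DDR:          {bus_1024/8:.0f} GB/s")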



jeach!
"loose billions in revenue and market share because the Athlon did"

AMD doesn't win market share only thanks to the IMC; there are many other factors too. Also, I think Core 2 has, if not reversed, then at least put the loss of server market share on hold (for a while?).


"No because AMD never had a graphics business to loose"

You should include ATI in that too, you know. Anything that affects ATI products affects AMD.


"Intel really wanted to keep the x87 chips, but reality (and demand) made it one"

x86 and x87 were merged to lower latency and improve speed. There is no use in combining GPU and CPU; they do two different tasks. Using the GPU as a big SIMD add-on doesn't make much sense since GPUs have insane latencies already; combining it with the CPU would give little benefit, if any.


"There are several cost efficient alternatives available"

name one ...

"You could have the same thing. A slot which has GPU+CPU, VRAM, Level 4 cache (shared by GPU + CPU). Possibilities are limitless... if you want it. "

... that would work in the real world. A short description and/or some calculations wouldn't hurt, either.


Repeat after me:
1) there is no socket with enough bandwidth for high-end GPUs.
2) creating a big enough cache is way too expensive
3) Combining a CPU (~200mm^2), a GPU (~200mm^2) and an insanely huge cache (very roughly 100mm^2 per 6 megs) would create an awfully expensive heater with far too large a die area to cool efficiently; rough numbers are sketched below.
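Rough numbers behind point 3, assuming the ballpark areas above and the 100MiB cache figure floated earlier in the thread:

# Ballpark die-area sum for point 3. All numbers are rough assumptions from
# the list above, not measured die areas; the 100MiB cache target is the
# figure mentioned earlier in the thread.

CPU_MM2 = 200             # quad-core CPU, ~200 mm^2
GPU_MM2 = 200             # midrange-to-high-end GPU, ~200 mm^2
CACHE_MM2_PER_6MIB = 100  # very roughly 100 mm^2 per 6 MiB of SRAM cache

def combined_die_area(cache_mib: float) -> float:
    """Total area in mm^2 for CPU + GPU + an SRAM cache of the given size."""
    return CPU_MM2 + GPU_MM2 + (cache_mib / 6) * CACHE_MM2_PER_6MIB

print(f"with 100 MiB cache: {combined_die_area(100):.0f} mm^2")  # ~2067 mm^2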

Also, do you know what the throughput of the fastest socket and bus actually is?


kingrichard
"AMD pulls an Intel-trick and squeezes 2 X Barcelona on a chip and makes it an Octa-core."

Interesting. I wonder how big a latency increase the second core will have compared to the one that is directly connected to the memory pool.

8:42 AM, April 02, 2007  
Blogger Ho Ho said...

A little clarification

ho ho
"There is no use of combining GPU and CPU, they do two different tasks"

What I meant was that in the high end, where maximum performance is needed, an integrated GPU doesn't stand a chance. In low-power and low-end systems they are actually pretty good: about the same performance as current IGPs, only drawing a bit less power.

8:44 AM, April 02, 2007  
Blogger Christian Jean said...

"There is no use in combining GPU and CPU; they do two different tasks."

No really? I didn't know that!

"Using the GPU as a big SIMD add-on doesn't make much sense"

Oh, so the CELL processor doesn't make much sense? The GPU will become graphics and/or SIMD/MIMD, whichever your little heart desires for your target market.

It's just a matter of 'fusing' everything together and maturing the technology for this purpose.

"GPUs have insane latencies already; combining it with the CPU would give little benefit, if any."

???????????

"1) there is no socket with enough bandwidth for high-end GPUs."

I never said anything about socket bandwidth!

"2) creating a big enough cache is way too expensive"

And five years ago, if you had told Intel they would have a product coming out with 8MB of cache on die, they would have burst out laughing at you!

11:36 AM, April 02, 2007  
Blogger Ho Ho said...

jeach!
"No really? I didn't know that!"

You needed the x87 to compute basic things. You don't need a wide, deep MIMD coprocessor with insanely long latencies in your everyday programs.

Also I guess you didn't see my second post ...


"Oh, so the CELL processor doesn't make much sense?"

Cell is a far more efficient general-purpose DSP, if you can call it that, than any GPU in existence, and it doesn't have a pipeline several hundred cycles long. You are comparing rather different things here. Also, if you look at the F@H statistics, you can see that even though Cells are given considerably more difficult work units, they still perform better than GPUs.

Btw, in SIMD-intensive applications a Core 2 Quad is as fast as Cell, though at 65nm Cell might have an edge thanks to a massively increased clock speed. Of course, Cell also has 3-4x more memory bandwidth.


"GPU will become graphics and/or SIMD/MIMD, whichever your little heart contents for your desired market"

The Radeon X1950 XTX has around 400 GFLOPS of computing power. When you throw semi-complex SIMD computation at it, it won't be much, if any, faster than a 2.66GHz Core 2 Quad that only has around 20 GFLOPS. I'd like my hardware to be more efficient. I would take a C2Q with 16 SPUs from Cell added on, though. At 45nm that is achievable in under 400mm^2 of die area.
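A rough peak-GFLOPS sketch behind that comparison; the flops-per-cycle counts are assumptions on my part: roughly 2 per cycle per core reproduces the ~20 GFLOPS figure above, while counting a full 128-bit SSE add plus multiply per cycle gives closer to 85 GFLOPS single precision.

# Peak-GFLOPS arithmetic used in comparisons like the one above.
# The flops-per-cycle values are assumptions, not measured numbers.

def peak_gflops(cores: int, flops_per_cycle: float, clock_ghz: float) -> float:
    """Theoretical peak: cores * flops per cycle per core * clock (GHz)."""
    return cores * flops_per_cycle * clock_ghz

print(peak_gflops(4, 2, 2.66))   # ~21 GFLOPS with the conservative count
print(peak_gflops(4, 8, 2.66))   # ~85 GFLOPS single precision with full SSE issue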


"I never said anything about socket bandwidth!"

You should, since cache bandwidth and capacity won't save you when you want to rasterize. How do you access the remaining few GiB of textures and data to stream them through the GPU several times per frame? Integrate it on the other side of the die, as with the 80-core? Sorry, but there is not enough space to do it.


"And five years ago if you would have told Intel they would have a product coming out with 8MB of cache on die, they would have burst out laughing at you!"

No, they wouldn't have. Doubling the cache size with each shrink in transistor size is normal. I expect we will see around 10MiB of cache per dual-core at 45nm, ~20MiB at 32nm, 40MiB at 22nm, etc. It ends at around 10-16nm with a bit less than 100MiB on die. After that, either something new comes along or we stagnate.

The Xenos daughter die has around 100M transistors for 10MiB of eDRAM; that means 10M transistors per meg. eDRAM is more transistor-efficient than the SRAM in CPU caches, about the same as Z-RAM. That means with 1 billion transistors you could go as high as 100MiB of eDRAM. There would still be the problem of connecting the cache to the GPU/CPU and of streaming all the other data that sits in external RAM. The 80-core chip resolves this by gluing the DRAM to the back of the CPU, but then not every core has equal-speed access to every part of the cache, and, as said, you can't fit much RAM there.
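The transistor budget behind that estimate, as a small sketch (the 10M-transistors-per-MiB density comes from the Xenos numbers above; assuming it holds unchanged for larger caches is a simplification):

# Transistor budget behind the eDRAM estimate above. Density is taken from
# the Xenos daughter-die figures and assumed to scale linearly.

XENOS_TRANSISTORS = 100e6   # ~100M transistors in the daughter die...
XENOS_EDRAM_MIB   = 10      # ...for 10 MiB of eDRAM

per_mib = XENOS_TRANSISTORS / XENOS_EDRAM_MIB   # ~10M transistors per MiB

def edram_mib_for(transistor_budget: float) -> float:
    """How much eDRAM a given transistor budget buys at Xenos-like density."""
    return transistor_budget / per_mib

print(f"{edram_mib_for(1e9):.0f} MiB from a 1-billion-transistor budget")  # ~100 MiB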


Of course, there is another possibility: we ditch rasterizing and move over to ray tracing. It is considerably cheaper on memory bandwidth than rasterizing. Intel has already done quite a bit of research on the subject, and to me it seems they are taking it quite seriously.

Sure, RT can take a bit more computational power (debatable), but computing power is cheap. Bandwidth is awfully expensive, and it only gets worse as time passes.


There used to be a time when you could fill a register every clock cycle with almost negligible latency. Those days are long gone, and things seem to get worse with every generation.


I'd still like to hear about the fastest buses and sockets. Even better if you can do some research to see how fast they have developed over the last 5-10 years or so.

12:55 PM, April 02, 2007  
