INTEL Bensley is only good for 2-way computing
After I wrote this article about Bensley copying Athlon MP, I realized that there is a major problem for INTEL's next-generation Bensley platform (Dempsey+Blackford): how can INTEL do four-way SMP on this chipset?
You can see from the diagram that there are two buses off the Blackford chipset. These two buses must be coordinated to provide cache coherence for the two CPUs. On a shared FSB, each CPU snoops on the same bus to maintain cache coherence; basically, each CPU listens to the bus to see what the other CPUs are doing to memory. In the diagram made by acehardware.com for the Athlon MP, there was a "snoop bus" drawn as a separate channel, indicating that the two buses "snoop" on each other to maintain cache coherence.
For the same reason, the two buses on the Blackford chipset can't be truly independent: they must carry cache coherence information, or the whole thing will be broken. For instance, suppose Dempsey CPU 1 on bus 1 modifies its cached copy of memory location X. This information must be propagated via bus 1 to the Blackford chipset, then through bus 2 to Dempsey CPU 2.
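The propagation described above can be sketched as a toy model. This is only an illustration under loud assumptions: the `Cpu` and `Chipset` classes are hypothetical, there is one CPU per bus, and the protocol is a simplified write-through, invalidate-on-write scheme, not the actual coherence protocol used by Blackford or Dempsey.

```python
class Cpu:
    """Hypothetical CPU with a tiny private cache (address -> value)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}


class Chipset:
    """Hypothetical hub: one CPU per bus, snoop traffic relayed between buses."""

    def __init__(self, cpus, memory):
        self.cpus = cpus
        self.memory = memory
        self.log = []  # records the coherence traffic crossing the chipset

    def read(self, cpu, address):
        # On a cache miss, fetch the line from memory through the chipset.
        if address not in cpu.cache:
            cpu.cache[address] = self.memory[address]
        return cpu.cache[address]

    def write(self, writer, address, value):
        # Step 1: the write travels over the writer's own bus to the chipset.
        self.log.append(f"{writer.name} -> chipset: write {address}")
        # Step 2: the chipset relays an invalidate over every other bus
        # whose CPU still holds a copy of that address.
        for cpu in self.cpus:
            if cpu is not writer and address in cpu.cache:
                del cpu.cache[address]
                self.log.append(f"chipset -> {cpu.name}: invalidate {address}")
        # Assumption: write-through, so memory is updated immediately.
        writer.cache[address] = value
        self.memory[address] = value


memory = {"X": 1}
cpu1, cpu2 = Cpu("CPU1"), Cpu("CPU2")
hub = Chipset([cpu1, cpu2], memory)

hub.read(cpu1, "X")       # both CPUs cache location X
hub.read(cpu2, "X")
hub.write(cpu1, "X", 42)  # CPU1 modifies X; CPU2's copy must be invalidated
value_seen_by_cpu2 = hub.read(cpu2, "X")
```

Without the relay in step 2, CPU2 would keep returning its stale copy of X, which is why the two buses cannot be fully independent.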
If one compares the INTEL Blackford chipset to the IBM Hurricane chipset, they look very similar: two FSBs off the chipset. The difference is that on the Hurricane each FSB can have two CPUs attached, and there are three scalability ports that can be connected to other Hurricane chipsets to form larger SMPs. We don't see such scalability ports on the Blackford chipset. (I expect INTEL to copy the design of the Hurricane chipset.)
In the Hurricane diagram, two CPUs share a 667MHz FSB, which gives each CPU an average share of roughly 330MHz of bus bandwidth, small but acceptable. However, since Dempsey is dual core, if I hang two Dempseys on one of the buses off the Blackford, I get 4 CPU cores competing for a single bus -- deja vu all over again -- we know Xeon with a shared FSB scales badly to 4P.
It is as if AMD introduced multi-core for the sole purpose of destroying Xeon scalability. INTEL doubles the number of buses? No problem: it has to double the cores too, and INTEL ends up with the same core/bus ratio. It's very doubtful that INTEL can put four buses on the chipset, though.
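The core/bus ratio argument comes down to simple arithmetic, sketched here. The helper names are hypothetical, the "old shared-FSB Xeon" row assumes two single-core CPUs on one bus, and the 667MHz figure is borrowed from the Hurricane diagram purely for illustration; it is not a stated Blackford bus speed.

```python
def cores_per_bus(sockets, cores_per_socket, buses):
    """Average number of cores contending for each front-side bus."""
    return sockets * cores_per_socket / buses


def share_per_core(bus_mhz, cores_on_bus):
    """Naive even split of one bus among the cores attached to it."""
    return bus_mhz / cores_on_bus


# Assumed configurations, for illustration only.
old_shared_fsb_2p = cores_per_bus(sockets=2, cores_per_socket=1, buses=1)
bensley_2p = cores_per_bus(sockets=2, cores_per_socket=2, buses=2)
bensley_4p = cores_per_bus(sockets=4, cores_per_socket=2, buses=2)

print("old shared-FSB 2P:", old_shared_fsb_2p, "cores per bus")
print("Bensley 2P:       ", bensley_2p, "cores per bus")
print("Bensley 4P:       ", bensley_4p, "cores per bus")

# Illustrative 667MHz bus: share with 2 cores versus 4 cores on the same bus.
print("share with 2 cores:", share_per_core(667, 2))
print("share with 4 cores:", share_per_core(667, 4))
```

Under these assumptions, doubling both buses and cores leaves the 2P ratio unchanged at two cores per bus, while a 4P configuration on two buses puts four cores on each bus and halves the per-core share again.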
With Direct Connect Architecture, AMD's Opteron 8xx can do glueless 8-way SMP without any chipset. AMD is readying glueless 32P computing with Direct Connect 2.0.