Tuesday, July 11, 2006

SUN X4600, X4500 servers and 8000 Blades frag everyone else's x86 systems

Folks, check out the new SUN x64 products: the X4600 8P 16-way server, the X4500 4U server with 48 SATA disks, and the Sun Blade 8000 system with 4P 8-way blades. Take the virtual tour and examine the insides. The SUN folks are so clever. If I was impressed by the X4100, I am definitely awed by the creativity Andy Bechtolsheim showed in these new products.

I was wondering how one can fit 48 hot-swappable disks into 4U, and I guessed right (well, that's the only way possible). 48 SATA disks require a lot of bandwidth, over 20GB/s, for system-to-disk I/O alone. No Intel server with obsolete FSB technology can handle a fraction of that. Only AMD64 technology can enable such a massive amount of I/O and still be able to pump the data over the network. Each Opteron 2xx CPU has two 8GB/s non-coherent HyperTransport links, so a 2P X4500 has a total of 32GB/s of I/O bandwidth. My understanding is that SUN is taking full advantage of the HT links found in Opteron 2xx CPUs. One application for the X4500 that immediately comes to mind is mail and online forums. Yahoo could serve 100K users with one of these boxes.
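For readers who want to sanity-check the bandwidth arithmetic themselves, here is a rough sketch. The per-drive figures are assumptions for illustration only (the comments below debate what the right numbers are): 300 MB/s is the SATA II interface peak per port, and ~50 MB/s is a typical sustained platter rate for a 2006-era drive.

```python
# Back-of-the-envelope aggregate bandwidth for a 48-disk box.
# Both per-drive figures below are assumptions, not vendor specs.
DISKS = 48
SATA2_PEAK_MB_S = 300    # SATA II per-port interface limit (assumption)
SUSTAINED_MB_S = 50      # typical 2006-era sustained platter rate (assumption)

peak_gb_s = DISKS * SATA2_PEAK_MB_S / 1000
sustained_gb_s = DISKS * SUSTAINED_MB_S / 1000

print(f"interface peak: {peak_gb_s:.1f} GB/s")     # 14.4 GB/s
print(f"sustained:      {sustained_gb_s:.1f} GB/s") # 2.4 GB/s
```

The gap between interface peak and sustained throughput is exactly what the back-and-forth in the comments is about.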

The X4600 8P 16-way server and the 8000's 4P 8-way blades are equally impressive. Take your time to examine them. The x4600 supports 8 Socket 940 Opterons, but it can be upgraded to quad-core Socket F Opterons by swapping the CPU modules. No HP machine can touch these. According to SUN's published benchmarks, the 8P Sun x4600 beats the best 16P HP Integrity Superdome Itanium 2 server in both integer and floating point performance by up to 20%*. DELL? Not even worth mentioning; Intel doesn't even have scalable 4P technology to start with.

These SUN products kick ass!

55 Comments:

Blogger Mad Mod Mike said...

It seems Quad-Cores are still on schedule (link) despite morons and Intel fanboys saying otherwise. I can't wait to be able to buy dual Quad-Core Opteron64s and laugh at Intel fanboys and their W00lcrez & Conrads, lol!

10:55 AM, July 11, 2006  
Anonymous Anonymous said...

And you have the audacity to call other sites "paid pumpers"? This is just a commercial.

11:13 AM, July 11, 2006  
Blogger TheKhalif said...

Yes, Sun hit one out of the park with these systems. Things like this will make AMD servers more mainstream, and even smaller companies will buy these for Solaris and Linux.

AMD is not facing doom and gloom as everyone who is blinded by Intel "bling-bling" thinks. They still sold 73% of desktops IN THE RETAIL market.

I bet their lawyers are drawing up the bankruptcy papers now.

What a bunch of morons.

11:32 AM, July 11, 2006  
Anonymous Anonymous said...

Mad Mod Mike... I like your blog, but please don't start doing what the Dr. does and link to your blog as if it's an actual source. I am hoping it was an honest mistake.

12:00 PM, July 11, 2006  
Anonymous Anonymous said...

They are nice boxes but seem a little late. AMD was supposed to announce Socket F today. Is this why they're waiting until Aug 1? When will Sun have Socket F modules for these things?

12:10 PM, July 11, 2006  
Blogger Sharikou, Ph. D said...

When will Sun have Socket F modules for these things?

Pay attention to the x4600 design. Each CPU+memory is on a pluggable module; as John Fowler said, you can just swap the CPU+memory modules for newer ones when Socket F comes out. This is possible only with AMD64, because of HyperTransport.

The SUN blade server goes even further, you can even hotswap I/O modules.

12:15 PM, July 11, 2006  
Anonymous Anonymous said...

Yes, I understand that. I was just hoping they would be Socket F from the beginning. I imagine it will be at least 4-6 months before they provide Socket F modules. Meanwhile, a lot of Socket F stuff will be announced 3 weeks from now.

12:22 PM, July 11, 2006  
Anonymous Anonymous said...

Quote:
"And you have the audacity to call other sites "paid pumpers"? This is just a commercial."

If I say that the new BMW blows away the competition because it is a piece of art and technology and has superior performance and quality, am I automatically labeled a BMW pumper?
Those machines are great. SUN is now getting the max out of the AMD64 architecture with off-the-shelf products, not strange Itanic stuff.

1:01 PM, July 11, 2006  
Blogger Steel Smack said...

"Mad Mod Mike... I like your blog, but please don't start doing what the Dr. does and link to your blog as if it's an actual source. I am hoping it was an honest mistake."

Here's the source that he wrote his article about...

Link

From what I'm hearing this won't be a desktop version of K8L, the desktop version won't be available till '08, which is supposedly what Henri Rodriguez was talking about a while back.

1:27 PM, July 11, 2006  
Anonymous Anonymous said...

Each CPU+memory is on a pluggable module; as John Fowler said, you can just swap the CPU+memory modules for newer ones when Socket F comes out. This is possible only with AMD64, because of HyperTransport.

Uh, no it isn't. I've seen Pentium Pros with CPU/Memory boards like that. AMD64 does make it easier since you don't have to include the northbridge, but it isn't really anything new.

1:48 PM, July 11, 2006  
Anonymous Graham said...

"And you have the audacity to call other sites "paid pumpers"? This is just a commercial."

It is even worse than that... Sharikou is a pumper all right, but he just hasn't figured out how to get paid for all his pumping. That just makes him a fool with an axe to grind.

2:10 PM, July 11, 2006  
Blogger Fritz said...

F has a different memory bus, but we've already designed CPU modules for that and the engineering samples run just fine. They'll be ready to go pretty much as soon as AMD makes the F generally available.

2:25 PM, July 11, 2006  
Anonymous Anonymous said...

Sun's new designs are very innovative and interesting.

But the price points are not going to build market share for Sun.

As much as Sun needs these super wing-ding-bling models, they need to redo their x2100, x4100, and x4200 so they are priced competitively with Dell.

Maybe once Sun fires those 4,000 people then the server prices will become more reasonable.

2:29 PM, July 11, 2006  
Anonymous Anonymous said...

Thanks Steel Smack.

2:30 PM, July 11, 2006  
Blogger Mad Mod Mike said...

"From what I'm hearing this won't be a desktop version of K8L, the desktop version won't be available till '08, which is supposedly what Henri Rodriguez was talking about a while back."

You're hearing wrong; that is talking about the processor for desktops, not servers. Server Quad-Core is due in 2H07, possibly 1H07; desktop K8L is due 1H07.

2:40 PM, July 11, 2006  
Blogger Sharikou, Ph. D said...

But the price points are not going to build market share for Sun.

The 8P x4600 starts around $25K. You pay about the same for a 2P Woodcrest with 8GB of RAM. Go to hp.com and check.

3:24 PM, July 11, 2006  
Anonymous Anonymous said...

Haha, software RAID. Yeah, the big businesses will be jumping on that!

This is low-end crap, might be fit for a small or medium business for non-critical use, but software RAID makes it a non-starter for real usage.

3:32 PM, July 11, 2006  
Anonymous Mark said...

As a small business owner whose firm relies heavily on high-traffic server performance, I am thrilled about these new Sun products. We converted to Opteron in early 2005 and it's nice to see that AMD is still making strides forward.

When people predict the impending downfall of AMD, they are obviously ignoring the server market. I haven't seen a company say they're ditching Opteron for those new Intel server chips.

At this point, Conroe looks like it'll be a commercial success, at least upon launch. Nonetheless, Intel's talk of dominance carries little credibility until they kill the Opteron. The overclockers drool over benchmarks and caches, but you won't move many server chips on hype.

3:38 PM, July 11, 2006  
Blogger Sharikou, Ph. D said...

Haha, software RAID. Yeah, the big businesses will be jumping on that!

You don't have a clue. So-called hardware RAID is just a board with a 100MHz CPU and some memory to do the RAID work. For outdated FSB-based junk architecture, those 100MHz CPU boards can save the FSB and PCI bus from being overloaded by RAID traffic. But for AMD64 with Direct Connect Architecture, there is no such problem: the Opteron has plenty of bandwidth, and all those SATA drives can be directly connected to HyperTransport via a bridge. Software RAID can be much faster, because 10% of an Opteron is 5x faster than a slow 100MHz CPU.

3:40 PM, July 11, 2006  
Anonymous Anonymous said...

I doubt that we'll be seeing K8L for servers until at least the 3rdQ 2007. It'd be stupid of AMD to release QC K8 in Q1 and then release K8L only 2 quarters or worse 1 quarter later

As for desktops I seriously doubt that we'll see K8L desktop before 3rdQ 2007 and we'll most likely see it in Q1 2008 or less likely in Q2 2008. It'll probably debut with the launch of AM3.

The time period where AMD's position and status in the market begins to come into question is one year from now, when according to Intel's roadmap it introduces Penryn, and two years from now, when Intel intros Nehalem. Penryn will be to Conroe what rev. G is to rev. F, while Nehalem will be to Penryn what K8L is to rev. G.

It seems that we are going to have a true duopoly with each company leapfrogging the other, and to that I have to say "HIP HIP HURRAH!!!". *Throws hands in the air and dances like Bill Cosby*

Oh BTW, his name is Henri RICHARDS, NOT RODRIGUEZ. :)

3:44 PM, July 11, 2006  
Blogger Sharikou, Ph. D said...

I've seen Pentium Pros with CPU/Memory boards like that.

No. There is a fundamental difference here. The x4600 can change CPUs without changing the motherboard and I/O, or change I/O without changing the CPUs. The x4600 can be fitted with Socket 1207 Opterons. Direct Connect Architecture separates the various links, so it's very flexible. FSB-based crap can't do such an upgrade at all.

3:45 PM, July 11, 2006  
Blogger Sharikou, Ph. D said...

When people predict the impending downfall of AMD, they are obviously ignoring the server market.

I projected that Intel will BK in 7 quarters. AMD may post lower revenue due to Intel's price war, but Intel is essentially cutting its own throat. We will see how Intel does in 2Q06 and see who is winning market share.

3:49 PM, July 11, 2006  
Blogger Mad Mod Mike said...

"As for desktops I seriously doubt that we'll see K8L desktop before 3rdQ 2007 and we'll most likely see it in Q1 2008 or less likely in Q2 2008. It'll probably debut with the launch of AM3."

I seriously doubt you have a brain. Quad-Core AMD Processors based on K8L for DESKTOP will be here in 1H 2007. Quad-Core AMD processors based on K8 for SERVERS will be here in 2H07 and K8L SERVERS should debut in 4Q07.

Intel fanboys stop trying to put off your Conrad Killer until 20xx when you know your beloved crap "CPU" can't even hold its own against a Sempron.

3:52 PM, July 11, 2006  
Anonymous Anonymous said...

If Sun is the best, you can't really complain that Intel is using it as a benchmark for their future processors.

6:10 PM, July 11, 2006  
Blogger TheKhalif said...

I doubt that we'll be seeing K8L for servers until at least the 3rdQ 2007. It'd be stupid of AMD to release QC K8 in Q1 and then release K8L only 2 quarters or worse 1 quarter later.

That's not how it works. By POSITIONING them for different markets, quad-core K8 CAN coexist with quad-core K8L.


I'm thinking that FX will be the first quad-core desktop part (they have cancelled the 1MB chips, and FX62-64 will be dual-core), while through Bulldozer AMD will fit K8L improvements into dual-core.

Then K8L will be 8xx, quad-core K8 will be 2xx, and dual-core K8L will be 1xx.

By using the rating system, a 2.2GHz quad could be labeled an FX66. They may wait until FX68, but I doubt it.

6:19 PM, July 11, 2006  
Anonymous Anonymous said...

As a sysadmin I just want to say I don't trust software RAID at all.

7:17 PM, July 11, 2006  
Anonymous Anonymous said...

48 SATA disks require a lot of bandwidth, over 20GB/s, for system-to-disk I/O
Ummm... maybe you mean 20Gb/s (gigabits) or a little over 2 gigabytes/second? A SATA drive cannot do sustained transfers at 426 MB/s - more like 50 MB/s.

7:24 PM, July 11, 2006  
Anonymous Anonymous said...

The 8P x4600 starts around $25K. You pay about the same for a 2P Woodcrest with 8GB of RAM. Go to hp.com and check.

Well then hp.com is overpriced. You can get a Dell 2950 with dual Woodcrest 3.0, 8GB RAM and 3 73GB SAS drives for $8500.

7:28 PM, July 11, 2006  
Blogger Sharikou, Ph. D said...

Ummm... maybe you mean 20Gb/s (gigabits) or a little over 2 gigabytes/second? A SATA drive cannot do sustained transfers at 426 MB/s - more like 50 MB/s.

I am talking about buffer-to-buffer transfer; a SATA drive can do 600MB/s.

8:10 PM, July 11, 2006  
Blogger Sharikou, Ph. D said...

As a sysadmin I just want to say I don't trust software RAID at all.

As I said, hardware RAID is just moving the software to another CPU.

ZFS is very robust. You can unplug the machine and it will maintain system integrity. Hardware RAID would require a battery on the board.

8:13 PM, July 11, 2006  
Anonymous henry ford said...

As I said, hardware RAID is just moving the software to another CPU.

ZFS is very robust. You can unplug the machine and it will maintain system integrity. Hardware RAID would require a battery on the board.


Most good storage servers run TWO redundant RAID controllers with failover.

As far as I can tell, Sun's "thumper" is running *one* OS doing the RAID. So if there is a glitch, poof, down goes the storage server, maybe with corrupted files.

Plus a storage server is supposed to be simpler than Solaris, ZFS, and the endless administration you need for this stuff.

The reason the code is on a RAID controller is so it runs on dedicated hardware with dedicated RAM, battery, etc. It is a specialized system that runs independently.

With software RAID, you get some dumb OS bug or OS hot fix and your whole RAID can go up in smoke. No one in their right mind runs software RAID.

Never mind that "thumper" is a dumb design... pull the machine to do hot-swap, 48 heat generators in a small space, 48 vibration generators in a small space, etc. This is very poor industrial design that increases failure rates by orders of magnitude. BTW, having disks vertical instead of horizontal increases failure rates just by itself by 50% (much less air cushion for the heads in vertical mode and vibrational sensitivity is much higher without an air cushion). Disk drive companies love server makers who put in lots of vertical drives.

Sun blew it by trying to fit "thumper" into too small a space. 4U doesn't matter to the customer if the drives keep failing and the customer has to keep pulling the frakkin server out of the rack to change them at thousands of dollars each (vs. $90 street price for a 250GB drive).

While Sun does do interesting designs that look "gee-whiz", they need to hire some people that know how to design "real world" servers, not "demo" servers.

You know a company is in trouble when they have to rely on sending out free demo machines to get people interested.

Part of the problem is that Sun is pricing commodity x86 like it is SPARC "golf course" stuff. Never mind the technology... the pricing model alone is enough to drive slow-growth/no-growth Sun into oblivion over time.

Sun has to wake up and change their ways. Otherwise every single market advantage they gained by going with AMD will have been pissed down the drain.

8:52 PM, July 11, 2006  
Anonymous henry ford said...

I will add one more thing:

Sun WASTED two years of "go to market" time by going with ZFS *software* RAID.

For a $33K server, they could have put in 1-2 24-port hardware RAID controllers and shipped *TWO YEARS AGO*.

Because the reason "thumper" was out in the weeds so long is that Sun was adding software RAID code to ZFS!!!

Sun is basically just a bunch of morons who spend a lot of money on industrial design to make cool-looking servers.

They can't run a business and Mr. Yes-Man-Microsoft-Suckup-I-Love-San-Francisco CEO J. Schwartz doesn't seem to have any grasp on running a real business.

But damn he is good with buzzwords and blogs.

10:04 PM, July 11, 2006  
Anonymous Edward said...

Henry Ford said: "Most good storage servers run TWO redundant RAID controllers with failover."

That's because the RAID controller is a single point of failure. This is not the case with software RAID, though.

"As far as I can tell, Sun's "thumper" is running *one* OS doing the RAID. So if there is a glitch, poof, down goes the storage server, maybe with corrupted files."

You said a glitch. A glitch of what? The OS? The FS? If there's a glitch in the OS or the FS routines, will your hardware RAID survive the error? Will you even be able to tell whether the problem comes from the RAID controller or the OS?

"Plus a storage server is supposed to be simpler than Solaris, ZFS, and the endless administration you need for this stuff."

Oh, yes, by your logic, you could just buy a dozen 1TB external storage boxes off Fry's. ;-)

"With software RAID, you get some dumb OS bug or OS hot fix and your whole RAID can go up in smoke. No one in their right mind runs software RAID."

If you get some dumb OS bug or OS hot fix, even hardware RAID won't save your data. As long as the software routines are modular and well debugged, the only thing software may lack compared to hardware is probably performance. But I guess even RAID-5 operations are relatively cheap today.

10:19 PM, July 11, 2006  
Anonymous Anonymous said...

"I am talking about buffer-to-buffer transfer; a SATA drive can do 600MB/s."

Who cares? No disk can sustain that bandwidth. Explain how the FSB is a bottleneck for the sustainable bandwidth of 48 disks.

12:00 AM, July 12, 2006  
Anonymous Anonymous said...

The reason the code is on a RAID controller is so it runs on dedicated hardware with dedicated RAM, battery, etc. It is a specialized system that runs independently.

With software RAID, you get some dumb OS bug or OS hot fix and your whole RAID can go up in smoke. No one in their right mind runs software RAID.


No one who is in their right mind and knowledgeable believes your spew. RAID cards can fail or partially fail, and in some cases that means you can kiss your data bye-bye too, e.g. Mylex RAID cards that lose their configuration data.

Dedicated hardware, yeah, yeah. I am glad most new hardware RAID cards do not use an i960 anymore and finally use RAM instead of teeny ASICs in the case of RAID5-capable boards. Otherwise, they would never perform. You want to tell me that 1GB of RAM on an Areca 24-port card is a big enough buffer? On light loads, perhaps.

As for OS bugs/hotfixes, I am sorry, your experience must be limited to Windows. I have never encountered software RAID problems with Linux software RAID, and I have been making heavy use of it for over 4 years now. I believe one can expect the same from Solaris. The risk you bring up about software RAID is possible, but in practice it is very remote, whereas hardware RAID being flaky is something that can be seen in practice, so you cannot just go out and get any hardware RAID card. Flaky software RAID drivers (Linux's software RAID has proven solid, as opposed to the Promise and Highpoint proprietary implementations) are the same as flaky firmware, the only difference being where they run: the former on the system CPU and system RAM, and the latter on the board's CPU and RAM.

Never mind that "thumper" is a dumb design... pull the machine to do hot-swap, 48 heat generators in a small space, 48 vibration generators in a small space, etc. This is very poor industrial design that increases failure rates by orders of magnitude. BTW, having disks vertical instead of horizontal increases failure rates just by itself by 50% (much less air cushion for the heads in vertical mode and vibrational sensitivity is much higher without an air cushion). Disk drive companies love server makers who put in lots of vertical drives.

Sun blew it by trying to fit "thumper" into too small a space. 4U doesn't matter to the customer if the drives keep failing and the customer has to keep pulling the frakkin server out of the rack to change them at thousands of dollars each (vs. $90 street price for a 250GB drive).


Now that is really funny. HP has this huge SCSI RAID box that has rows upon rows of 15K RPM SCSI drives all standing vertically. Shall I trust you or the engineers that work at HP and Sun designing big iron?

As for heat, I see there are gaps a good few millimetres wide between drives and ten huge heavy duty fans in the front to blow air through those gaps to take away heat generated by the drives.

All you are good at is making imaginary criticisms of Sun hardware and one-sided commentary on the supposed benefits of hardware RAID. You certainly know no better than Sun's or HP's engineers.

12:14 AM, July 12, 2006  
Blogger Sharikou, Ph. D said...

No one who is in their right mind and knowledgeable believes your spew. RAID cards can fail or partially fail, and in some cases that means you can kiss your data bye-bye too, e.g. Mylex RAID cards that lose their configuration data.

In general, I think software RAID is more reliable, because:

1) I trust a server CPU such as the Opteron and server RAM more than the crappy stuff on a RAID card.

2) Software RAID only depends on basic disk drivers. Hardware RAID relies on RAID card drivers which may not be well tested for the target OS.

3) Software RAID should be far more sophisticated than the software on RAID cards. Software RAID understands the filesystem; hardware RAID only understands blocks.

4) Software RAID may have better recovery capability.

5) Software RAID can perform better on Direct Connect Architecture, where bandwidth is plentiful.

1:06 AM, July 12, 2006  
Anonymous Anonymous said...

Now that is really funny. HP has this huge SCSI RAID box that has rows upon rows of 15K RPM SCSI drives all standing vertically. Shall I trust you or the engineers that work at HP and Sun designing big iron?

As for heat, I see there are gaps a good few millimetres wide between drives and ten huge heavy duty fans in the front to blow air through those gaps to take away heat generated by the drives.


As usual, another day, another moron.

A VP at Seagate told me about the vertical vs. horizontal drives. The failure rates are much higher for vertical mount drives. It is called PLANNED OBSOLESCENCE. It is very big in America and around the world. You should read up on it.

And when you look at the drives, you need to get the orientation MTBF specs, which are not available except to large OEM customers.

But you wouldn't know this because you are a moron.

HP, Sun, Apple, and all the other companies that make storage systems... make those systems... FOR MONEY.

So if they can sell you more expensive "Sun-certified" RAID drives for 10X-20X markup, they will.

For American companies like Sun, you have to think "MONEY" and "ONLY ENOUGH QUALITY SO THE MONEY DOESN'T STOP".

This is why "Thumper" has a fancy case and a retarded design inside that will eat hard drives alive.

No one designs reliable systems with drives stacked that deep. No one besides the dummies at Sun who made "Thumper", who are going to pump this thing and then dump it on their customers... and sell their customers many, many thousand-dollar drives that cost Sun about $50.

Let us just wait and see what the market has to say about "thumper". I will wager it is discontinued in a couple of years or less.

1:38 AM, July 12, 2006  
Anonymous Edward said...

"A VP at Seagate told me about the vertical vs. horizontal drives. The failure rates are much higher for vertical mount drives."

What the heck are you talking about? Are we supposed to believe a secondhand word, relayed by you, from 'some' Seagate VP (are you sure he's a technical one)?

Here's a googled result on this question in 20 sec... Just search for "Seagate" in the page. I hope this settles the issue.

Have your opinions, fine, but you should know better than to call others morons before you google properly first. Otherwise you're just making fun of yourself.

5:08 PM, July 12, 2006  
Anonymous Anonymous said...

Sharikou,

advocating software RAID is just crazy. Hardware RAID is even more about reliability than it is about the CPU utilization benefits of doing parity computations in an ASIC. Under software RAID, if the OS locks up during a disk write, the entire array stripe that the OS was writing to will be corrupted. The only way around this is with a journaling file system, with all of the performance costs that implies. Hardware RAID doesn't have this problem unless the IO controller ASIC/CPU crashes.

Even in terms of speed, the CPU overhead of computing RAID 5 or 6 parity blocks is considerable for any non-trivial writing load. Dedicated hardware will be faster at parity calculations than general purpose CPUs ever will be.

Software RAID is a non-starter for serious use. It's not reliable enough and too slow.
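For what it's worth, the parity both sides keep arguing about is conceptually simple. Here is a minimal Python sketch of what a RAID5-style parity computation and rebuild amount to (illustrative only, not any vendor's or OS's actual implementation):

```python
# Minimal sketch of RAID5-style parity: the parity block is the XOR of
# the data blocks in a stripe, so any single lost block can be rebuilt
# by XORing the surviving blocks with the parity.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # three data blocks in one stripe
parity = xor_blocks(stripe)

# Lose block 1, then rebuild it from the survivors plus the parity.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == stripe[1]
```

RAID6 adds a second, differently computed parity block per stripe so that two simultaneous disk failures can be survived, which is where the extra math the commenters argue about comes in.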

6:50 PM, July 12, 2006  
Blogger Sharikou, Ph. D said...

advocating software RAID is just crazy

I suggest you watch the videos on this page to get educated on current filesystem and data integrity technology. Everything you knew is pretty much obsolete. No hardware RAID can do what you will learn. Watch them, and come back to say thanks.

RAID5 is history. RAID-Z is the future.

6:56 PM, July 12, 2006  
Anonymous Anonymous said...

I suggest you watch the videos on this page to get educated on current filesystem and data integrity technology. Everything you knew is pretty much obsolete. No hardware RAID can do what you will learn. Watch them, and come back to say thanks.

RAID5 is history. RAID-Z is the future.


And I suggest that you take a napkin and wipe the Sun marketing spittle that is dripping from your face.

There may be elements of RAID-Z that are useful in "the future", but "the present" is here today and RAID-Z looks like a loser. It looks two years out of date. RAID-Z is RAID5 today, vs. the rest of the world that has gone on to better things.

If you knew something about RAID, you would know how big RAID6 is today. RAID6 is big because every time a RAID5 disk fails and a rebuild is needed, the cost is immense. And it is not that uncommon for two RAID5 disks to fail giving you a big problem. RAID6 helps prevent rebuilds by providing an extra safety margin before a rebuild is needed.

To do its magic, RAID6 requires massive CPU overhead to do parity calculations (much more than RAID5). That is why hardware RAID cards have a dedicated ASIC chip for this purpose.

However, for any RAID system to really evolve, it needs market share and users. A few users buying 170 lb super-vibrato heated disk-drawers are not going to evolve RAID-Z into anything but a Sun science-fair business.

Compare that to today's massive numbers of 24-port PCI-Express RAID cards with co-processors, parity ASICs, and local cache, and you will see the evolution is occurring on those $1100 cards (and cheaper every day). Tons of these cards are sold every day for installation into all sorts of servers. Not just Sun DiskDrawers.

When you have ever-advancing interconnects and ever-increasing processor density, the cost of "hardware" RAID is becoming cheaper and cheaper. And the capability is getting better every minute.

It is easy to imagine a 4-core RAID processor sitting on an x16 PCI-Express card, giving you 8GB/sec of hardware RAID bandwidth. That level of bandwidth covers a massive number of disks. Two of these cards would still be much cheaper than Sun's current RAID-Z system. And have the benefit of affordable "hot spare", "hot failover", scalability through adding more cards, etc.

And in the future, such a card may use HT, enabling it to handle any number of channels or disks.

When RAM gets cheaper, you will see hardware RAID with 4 DIMM slots supporting up to 16GB cache -- per card.

So with two cards you will have the capacity for 32GB cache connected to the host system through HT or 8GB/sec PCI-Express x16. That is with the current version of PCI-Express. For the next version of PCI-Express that 8GB/sec will be 16GB/sec.

Host RAID does not stand a chance against good hardware RAID. There is no logic to software RAID. None. It takes a simple system, storage, and makes it much more complex. And when you have a rebuild, there goes your server performance. Versus having super-powerful co-processors doing the work.

In a way the same people who espouse the benefits of co-processors (via HT) are contradicting themselves and saying "you don't need co-processors, not really" for something as high overhead as RAID6 parity calculations. It is dumb and hypocritical.

This blog has become the home of too many blinded-by-the-Sun retards. In case you didn't notice Sun is laying off 4000 people because their business is failing. And giving out free demo servers because no one wants to shell out huge dollars for Sun's esoteric science-fair servers.

Wake up, Sharikou-Icarus!!! Your wings are melting!!!

7:50 PM, July 12, 2006  
Blogger Sharikou, Ph. D said...

There may be elements of RAID-Z that are useful in "the future", but "the present" is here today and RAID-Z looks like a loser.

Dude, open your mind. All the crap you wrote about, we have already discussed. I am trying to educate you. Go back and read what we told you: hardware RAID is just software RAID on a separate board. Then restart the discussion from that baseline of basic understanding.

8:00 PM, July 12, 2006  
Anonymous Anonymous said...

If you knew something about RAID, you would know how big RAID6 is today. RAID6 is big because every time a RAID5 disk fails and a rebuild is needed, the cost is immense. And it is not that uncommon for two RAID5 disks to fail giving you a big problem. RAID6 helps prevent rebuilds by providing an extra safety margin before a rebuild is needed.

To do its magic, RAID6 requires massive CPU overhead to do parity calculations (much more than RAID5). That is why hardware RAID cards have a dedicated ASIC chip for this purpose.


I have already pointed out that all RAID6 does is allow you to survive the failure of two disks. You are so full of nonsense that all you do is try to cover up your stupid comments with general phrases such as "RAID6 helps prevent rebuilds by providing an extra safety margin before a rebuild is needed". In the end, you still need to rebuild, and running in degraded mode means extra calculations, which means a performance hit, and I guarantee you that any hardware RAID card will slow down whereas a box with Opterons will have no performance penalty whatsoever. Massive CPU overhead? Yeah, if you were running a 486 or a Pentium, perhaps. RAID5/6 will at most account for 5% CPU overhead on a Duron (yes, that's old, I know).

There was a discussion about what RAID array to use on our new 3ware 8508 boards in one of the companies I used to work for, about two years ago. I was the only one on the team level who was against RAID5, plus one more in management. I contended that we were better off running RAID1+0, which was seconded by one of the managers, but the rest believed in RAID5 on hardware. They now run multiple RAID1 arrays instead.

I would not trust your 24-port board to do RAID6. I would create 12 mirrors and then RAID5/6 them together. I wonder how well the onboard processor would handle that. Nah, better to trust Linux software RAID, or in this case ZFS.

Nor am I the only one who is in the know due to real-world experience. http://archives.neohapsis.com/archives/dailydave/2004-q4/0170.html

Dream on. Hardware RAID was faster 8 years ago, and necessary too.

Compare that to today's massive numbers of 24-port PCI-Express RAID cards with co-processors, parity ASICs, and local cache, and you will see the evolution is occurring on those $1100 cards (and cheaper every day). Tons of these cards are sold every day for installation into all sorts of servers. Not just Sun DiskDrawers.

Yeah right. Only Areca sells 24-port RAID cards, and they were hard-pressed to finally get their driver into the Linux kernel. I would not pick up an Areca card when I have had solid experience with 3ware cards, and 3ware has very good support and quality.

You are still in the nineties. Go back to your little hole in the past and do not trouble us.

8:54 PM, July 12, 2006  
Anonymous Anonymous said...

Even in terms of speed, the CPU overhead of computing RAID 5 or 6 parity blocks is considerable for any non-trivial writing load. Dedicated hardware will be faster at parity calculations than general purpose CPUs ever will be.

Software RAID is a non-starter for serious use. It's not reliable enough and too slow.


If you were talking about RAID1 or 1+0, then I would agree with you for the moment: at least compared with the Linux software RAID implementation, 3ware's hardware mirroring is done better.

As for needing dedicated hardware, that was true eight years ago. It hasn't been true since about six years ago, when hardware RAID cards using Intel i960 CPUs got clobbered at RAID5 by Windows' software RAID, at acceptable CPU overhead too.

Plus RAID5/6 alone is overrated.
http://archives.neohapsis.com/archives/dailydave/2004-q4/0170.html

Linux software RAID is mature and fast. Writes at over 100MB/sec in RAID5 are common even with iSCSI devices. Linux admins will probably have no qualms about trying out ZFS, but they might have qualms about tackling the subtle differences of Solaris 10 administration.

9:05 PM, July 12, 2006  
Anonymous Anonymous said...

"Linux software raid is mature and fast. Writes at over 100MB/sec in RAID5 are common even with iSCSI devices. Linux admins will probably have no qualms about trying out ZFS but they might about tackling the subtle differences in Solaris 10 administration."

As I said, this blog is full of retards.

A modern 3ware card:

"3ware 9590SE Serial ATA II RAID controllers deliver over 800 MB/s RAID 5 reads and exceed 380 MB/sec RAID 5 writes."

That is pushing FOUR TIMES the write performance you mentioned.

So someone could drop in a 3ware card and get nearly a 4X boost in RAID5 write performance. Only a retard would say "no, I want to stick with slow Linux code..."

Linux code is crap compared to RAID code from dedicated storage technology companies. 3ware, Areca, etc., sell a lot of adapters for Linux... it's not because Linux code is better than their hardware RAID... get a frakkin clue.

Just look at how weak Linux is compared to Windows... **running frakkin Linux apps** !!!! Just how weak is that? Linux cannot even run Linux apps as fast as Windows!!!

Is that the same slow code you want running your RAID??? HELL NO.

11:27 PM, July 12, 2006  
Blogger Sharikou, Ph. D said...

"3ware 9590SE Serial ATA II RAID controllers deliver over 800 MB/s RAID 5 reads and exceed 380 MB/sec RAID 5 writes."

Can you check what's the clockspeed of the CPU on that 9590SE card?

According to SUN's tests, a single Opteron can deliver 8GB/s for RAID5, unfortunately, the HDDs are not that fast. In the x4500, there are 48 drives, the sustained speed with ZFS is 2GB/s, that's close to 48x platter speed for 7200RPM drives. And ZFS with RAIDZ is 1000 times more reliable than plain RAID5.
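As a sanity check on that 2GB/s figure (my own arithmetic, assuming all 48 drives stream concurrently):

```python
# Implied per-drive rate if 48 drives sustain 2 GB/s in aggregate.
total_mb_s = 2000
drives = 48
per_drive = total_mb_s / drives   # ~41.7 MB/s per drive
```

About 41.7 MB/s per drive, which is roughly in line with the sustained platter rate of a 2006-era 7200RPM SATA disk.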

11:54 PM, July 12, 2006  
Anonymous Anonymous said...

"Can you check what's the clockspeed of the CPU on that 9590SE card?

According to SUN's tests, a single Opteron can deliver 8GB/s for RAID5, unfortunately, the HDDs are not that fast. In the x4500, there are 48 drives, the sustained speed with ZFS is 2GB/s, that's close to 48x platter speed for 7200RPM drives. And ZFS with RAIDZ is 1000 times more reliable than plain RAID5."


I agree plain RAID5 is not a good solution for the modern enterprise.

But my original position stands in comparing a 3ware hardware RAID5 card vs. Linux soft RAID5. Linux soft RAID, in general, is nothing exceptional.

And when you look at price/performance, using 3Ware RAID5 instead of Linux soft RAID5 is a clear win for 3Ware.

---

Now, let's look at the high-end world of the "Thumper".

Instead of 3Ware, we will use Areca, whose new cards carry up to 4GB RAM cache per card and sustain buffered RAID5 writes at 700MB/sec.

So all it would take is three cards and you are in the same performance ballpark as the numbers you quote for ZFS/RAID-Z.

And remember, RAID-Z is merely Sun's proprietary implementation of RAID5.

"RAID Z

Sun's ZFS implements an integrated redundancy scheme similar to RAID 5 which it calls RAID Z. RAID Z avoids the RAID 5 "write hole" [2] and the need for read-modify-write operations for small writes by only ever performing full-stripe writes; small blocks are mirrored instead of parity protected, which is possible because the filesystem is aware of the underlying storage structure and can allocate extra space if necessary."
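The read-modify-write penalty that full-stripe writes sidestep can be sketched by counting I/Os per write. A toy model of my own (assuming an N-disk stripe with a single parity block):

```python
# I/O count for a RAID5 update: a sub-stripe write needs read-modify-write
# (read old data + old parity, then write new data + new parity), while a
# full-stripe write simply writes all data blocks plus freshly computed parity.
def raid5_write_ios(stripe_width, blocks_written):
    if blocks_written < stripe_width:
        return 2 * blocks_written + 2   # RMW path: reads and writes, plus parity
    return stripe_width + 1             # full-stripe path: data + parity writes

assert raid5_write_ios(4, 1) == 4   # 4 I/Os to update one block's worth of data
assert raid5_write_ios(4, 4) == 5   # 5 I/Os to write a whole 4-block stripe
```

This is why only-ever-writing full stripes, as the RAID-Z description above claims, avoids both the extra I/Os and the write hole between the data and parity updates.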


Even with all those optimizations, three Areca cards are the same speed in aggregate.

But let us look at something very important: data recovery. Because Sun's so-called "RAID" is really not RAID but OS-required RAID, no data recovery is possible with existing RAID tools that work on accepted standard RAID levels.

So we have RAID-Z as Sun's first modern attempt at software RAID. So far it shows the traditional strengths and weaknesses of software RAID. Impressive fake benchmarks (similar to how the Intel Core 2 Duo with 4MB cache wins a lot of benchmarks), but a massive headache in the real world. Having to depend on Sun for RAID code is probably not the best thing in the world. Having a disk system that is not really a separate system but tied deeply into the OS makes data recovery very difficult, if not impossible. And of course, since RAID-Z depends on much more of the stack, a bug anywhere in that stack can now trash your data. That is not a step forward.

It is hard to see what the customer win is, especially as Sun is charging the customer more for software RAID than hardware RAID would cost. Maybe you can get some gee-whiz "in cache" benchmarks that Sun RAID-Z wins, but in the real world of tested components, tested RAID code, and well-defined, isolated storage architecture layers, there is no reason to go with Sun and take on that much extra risk with your data.

1:52 AM, July 13, 2006  
Anonymous Anonymous said...

Why I said Sun systems were WAY OVERPRICED.

So I took a little time and put together an off-the-shelf storage server that is remarkably like Sun's x4500, even down to the retarded top-loading drives:

AIC RMC5D2-RI-XPSM 5U RACK WITH 48X1" TOP LOADING HOT-SWAPS BAY MULTI-LANE, 1350W REDUNDANT PSU $4800

48 500GB SEAGATE 7200RPM SATA II DRIVES, 16MB CACHE PER DRIVE $11520

TYAN 2P MOTHERBOARD WITH 16 DIMM SLOTS $500

OPTERON 285 RETAIL BOX (x2) $2200

2GB DDR400 ECC DIMM (x16) $4400

ARECA 1170 24 PORT RAID6 with 1GB CACHE PER CONTROLLER (x2) $2700

ADDITIONAL 2-PORT GIGE PCI-E FOR TOTAL 4 PORT GIG-E $300

OPEN SOLARIS WITH ZFS & RAID-Z $0

TOTAL: $26,420
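Summing the line items (assuming, as the formatting suggests, that each listed price covers the full quantity shown):

```python
# Parts list from the post above: chassis, 48 drives, motherboard,
# 2x Opteron 285, 16x 2GB DIMMs, 2x Areca 1170, GigE NIC, OpenSolaris.
parts = [4800, 11520, 500, 2200, 4400, 2700, 300, 0]
total = sum(parts)   # $26,420
```

The parts come to $26,420 at these prices; either way, the gap to Sun's $69,995 list price is over $43K.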

SUN TOTAL FOR SAME SPEC MACHINE: $69,995.00.

---

It is so obvious why a smart company like Google does its own system integration.

So they do not get screwed by companies like Sun that are GREEDY AS ALL SIN.

No wonder Sun's market share is not growing. No one is that dumb.

2:36 AM, July 13, 2006  
Anonymous Anonymous said...

OH, FOR ABOUT $3K MORE THAN THE 2P SERVER (~$30K TOTAL), YOU CAN GET A 4P SERVER with FOUR OPTERON 875 CHIPS (8 TOTAL CORES), GIVING YOU A SCREAMER THAT WILL ABSOLUTELY KILL THE SUN SYSTEM, HAVING TWICE THE MEMORY BANDWIDTH AND 17.6 GHZ OF AGGREGATE PROCESSING POWER VS. SUN AT 10.4 GHZ AGGREGATE.

FOR LESS THAN HALF PRICE, YOU ARE GETTING DOUBLE THE CORES, 69% MORE CPU CYCLES, AND 100% MORE MEMORY BANDWIDTH. HMMMM.

SO FOR THE SAME MONEY, YOU CAN GET 48 TB vs. 24 TB, 16 CORES vs. 8 CORES, 35.2 GHZ vs. 20.8 GHZ, ETC.
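The aggregate-clock figures check out, assuming the parts implied above (my reading: the Opteron 875 is a 2.2 GHz dual-core, and Sun's Opteron 285 a 2.6 GHz dual-core):

```python
# Aggregate clock arithmetic for the 4P DIY box vs. the 2P Sun x4500.
diy_ghz = 4 * 2 * 2.2    # 4 sockets x 2 cores x 2.2 GHz = 17.6 GHz
sun_ghz = 2 * 2 * 2.6    # 2 sockets x 2 cores x 2.6 GHz = 10.4 GHz
extra = diy_ghz / sun_ghz - 1   # ~0.69, i.e. "69% more CPU cycles"
```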

DAMN DO YOU HAVE A SYSTEM THAT WILL FRAG SUN.

2:50 AM, July 13, 2006  
Anonymous Anonymous said...

OR FOR $37K (UP FROM $30K), YOU CAN BUMP THAT 4P (8 CORE) SYSTEM UP TO 36TB USING 750GB DRIVES. TODAY.

AS ARECA IS SOLARIS CERTIFIED, YOU CAN TREAT THOSE DRIVES AS JBOD AND USE YOUR SOFTWARE RAID-Z OR YOU CAN USE NATIVE HARDWARE RAID6 (WITH DEDICATED ASIC FOR SPEED) OR YOU CAN MIX AND MATCH DEPENDENT ON YOUR NEEDS.

MAYBE YOU WANT 24TB OF RAID-Z AND 12TB OF RAID6. WHY NOT?

TAKE YOUR PICK.

SO YOU GET TO CHOOSE WHAT IS BEST FOR YOU VS. NO CHOICE GOING WITH SUN.

REMEMBER, SUN'S ENTRY LEVEL 12TB SYSTEM IS $33K.

AND REMEMBER, ALL THESE SUN SERVERS HAVE *HALF* THE MEMORY, 16GB vs. 32GB.

WHICH ALL LEADS TO A MAJOR FRAG OF SUN'S PRICE/PERFORMANCE.

WHY PAY FOR SUN EXECS TO PARTY ALL DAY AT THE GOLF COURSE?

3:01 AM, July 13, 2006  
Blogger Sharikou, Ph. D said...

Please do not post in all caps.
It's an interesting exercise to see the cost of building it yourself. I would have to say the SUN system is fairly priced.

3:24 AM, July 13, 2006  
Anonymous Anonymous said...

"It's an interesting exercise to see the cost of building it yourself. I would have to say the SUN system is fairly priced."

What makes you say $43K over street price (Sun would get a much better price on parts) is "fair"??

The DIY system even has 32GB RAM; the Sun only comes with 16GB.

And if you need spare parts for the DIY system, they are all off the shelf and available easily.

Somehow I cannot see why someone would pay $43K+ for a hexagonal grid cut case and some silver spray paint.

3:39 AM, July 13, 2006  
Anonymous Anonymous said...

The other thing about NOT buying overpriced stuff from Sun is that you get some amazing functionality that is not available from Sun.

For instance, if you choose a motherboard for your DIY-Thumper that has a HT slot (such as ones from Supermicro), you can add Pathscale's communication processor [PDF]:

"InfiniPath™ InfiniBand™ Interconnect

* Built on HyperMessaging Architecture
* Highest message transmission rate (10X-MR)
* Lowest MPI & TCP latency
* InfiniBand 4X compatible
* Open IB & IBTA compliant
* PCI-Express or HyperTransport (HTX) interfaces
"


For the DIY option, you could actually afford two systems and have super-fast communication between them based on a dedicated communication processor sitting on the HT bus.

Which gives you the ability to do a very nice cluster that absolutely flies. All for far less money than Sun.

Maybe Google should take their integration expertise and start selling servers...

1:53 PM, July 13, 2006  
Blogger Mori said...

Anonymous, your arguments seem devoid of practical, real-world experience.

As I posted elsewhere, I'm in the industry. We have a machine room full of racks upon racks of Dell PowerEdge servers. Those servers run a mix of Linux (Red Hat Enterprise Linux) and Windows.

The Linux servers all use Linux software RAID. Because we haven't found any really good software RAID solutions for Windows, the Windows servers all use the add-in hardware RAID daughtercards.

Our experience has been that Linux software RAID is vastly more reliable, mature, and easier to configure/manage than the hardware RAID solutions:

- The code (firmware) for the hardware RAID controllers is buggy, requiring firmware update after firmware update. Applying those updates means performing a series of specialized steps on each server, a process that cannot easily be automated. In contrast, I perform "firmware updates" on our Linux software RAID systems whenever Red Hat releases a new kernel package, and we have existing infrastructure to easily disseminate and install new kernel packages.

- Creating new hardware RAID containers is a PITA, because you either need to use the configuration system embedded in the controller BIOS (requiring you to be on the machine's console and press a certain key sequence during the POST) or else use specific programs provided by the manufacturer that can speak to the RAID controller. Under Windows, using the programs isn't too bad, but the Linux solutions all suck. It's painfully obvious that Linux is still an alien concept to most hardware RAID companies, despite the fact that it's in damn near every server room now. In contrast, using a single command-line Linux program ("mdadm"), you can perform any action on Linux software RAID that you wish.

- The hardware RAID controllers can't handle even simply-nested RAID setups. For example, the storage array for our main mail server is a stripe of mirrors, which Linux software RAID can do. The hardware RAID controllers can't do it.

- Time and time again, we've experienced inexplicable failures with the hardware RAID systems. Arrays disappear, drive letters get remapped, and other bizarre things happen. To be fair, some of these problems might be a result of specific interactions between Windows and the hardware RAID systems, but it is nonetheless the case that the hardware RAID systems simply are not reliable.

- Modern hard drives have extremely sophisticated internal diagnostic capabilities (i.e., the S.M.A.R.T. subsystem). Using Linux software RAID, you are speaking directly to the hard drives, and have access to the SMART subsystem. Using smartmontools, we proactively monitor and test the health of our drives. In contrast, the "diagnostic" capability of hardware RAID systems is a joke: you'll be notified if a drive has "failed" and that's it. (I would hope that the hardware RAID firmware takes advantage of SMART capabilities, but who knows what the hell the firmware is doing? They certainly won't let us see the source code for it.) There's no way to query the drive directly to ask it what it thinks has failed. In most cases, if we take the "failed" drive out of a hardware RAID array and place it in a Linux system (so that we can talk to the drive), we discover that the drive thinks it's perfectly healthy.

- It's been a while since we've performed benchmarks, but the last time we did, Linux software RAID beat the performance of hardware RAID systems handily. Furthermore, the CPU and I/O effort required to drive the software RAID system is negligible. Remember, most servers today are hideously underutilized.

The conclusion we drew long ago is simple: software RAID systems, when implemented correctly (e.g. Linux software RAID), are more reliable, efficient, mature, and easier to configure/manage than hardware RAID systems.

If you disagree with me, I'd welcome a rebuttal based on real-world experience.

If you don't have real-world experience, then please shut your noise-hole.

3:00 AM, July 15, 2006  
Anonymous Anonymous said...

I wonder how many here lamenting the cost of Sun gear have actually purchased it, because the prices listed on Sun's site are essentially the MSRP. For better or worse, Sun doesn't have a direct model a la Dell; they rely primarily on their channel partners (though they'll happily handle large accounts directly). Any Sun VAR, even on a small order, will discount from the prices listed on Sun.com. While I don't think anyone will ever consider Sun cheap (though we picked up several of their quad Opteron servers for a song when Sun was directly auctioning them on eBay a while ago), they probably aren't as expensive as you think. Sun isn't competing with white box servers put together in some guy's basement; they're competing against IBM, HP, etc. And in that regard they're generally price competitive.

9:32 PM, July 15, 2006  
