Clusters: Affordable and Available

By Larry Stevens

High availability, scalability and reasonable pricing are drawing users to cluster systems, but clusters may not be the best choice for every application.

Two heads may be better than one, but two complete bodies--or four, six or eight--are better yet. That's the theory behind clusters. Along with massively parallel processing (MPP) and symmetric multiprocessing (SMP), clusters form a trio of technologies that aim to boost server functionality, performance and availability.

Each of these three technologies has its pros and cons. SMP machines, which let you add processors that share memory and other resources, are good for boosting processing power incrementally. For example, if you have 1,000 users accessing a database application and you want to be able to double or triple that number without investing in an entire new system, SMP offers that scalability with the least cost and development time.

However, SMP processors have local caches but share a single main memory. As a result, they don't always scale well. Each time you add a processor, the performance boost is smaller, because you're increasing the burden on central memory and the memory-processor bus. At a certain point, adding another processor may actually decrease performance. Accordingly, SMP machines rarely go beyond eight processors. At that limit, the normal choice is to replace the SMP box with MPP--a process that will require much reprogramming--or buy a second SMP system and connect it to the first in a cluster.
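
A rough, hypothetical model makes the diminishing return concrete. In the sketch below, each added processor contributes one unit of work minus the bus and memory contention it imposes; the contention factor is an assumption for illustration, not a measurement of any real SMP machine.

    # Toy model: effective speedup of an SMP box as processors are added.
    # The contention factor is an illustrative assumption.
    def effective_speedup(processors, contention=0.08):
        # Processor n contributes 1.0 minus the contention created by
        # the n processors already sharing the memory bus.
        return sum(1.0 - contention * n for n in range(processors))

    for cpus in (1, 2, 4, 8, 14):
        print(cpus, round(effective_speedup(cpus), 2))
    # Prints 1.0, 1.92, 3.52, 5.76, 6.72: each added CPU buys less, and
    # past roughly 1/contention processors an added CPU actually subtracts.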

In MPP, sometimes called "shared nothing," each processor has its own local memory and operates independently. Accordingly, MPP systems, which may have hundreds of processors, don't suffer from the bottlenecks inherent in SMP. But MPP machines are expensive and present some programming challenges. The use of MPP systems in business, as opposed to scientific or engineering applications, is still new. Most of the early commercial adopters plan to use them to replace high-end mainframes.

In contrast, clustering offers a way to scale up incrementally to about 64 processors, if not hundreds, less expensively than MPP and in some cases with minimal reprogramming. A cluster is a set of two or more loosely coupled systems that act as one complete server. Clusters generally are made up of SMP machines, which usually have from two to eight processors each, a shared-disk subsystem and software that takes over in the case of a failure.

The key advantage of clusters over the other technologies, for many users, is that they provide a degree of high availability at a much lower cost than traditional fault-tolerant or hot-standby systems. Many companies that want high availability, but aren't content to have extra processors sit idle until they're needed, opt for clusters.

"It's easy for an SMP machine to go down if one processor fails," says Derek Kaufman, middleware manager for clothing manufacturer Levi Strauss and Co. in San Francisco. "MPP is still too new and expensive, although that might be changing. So far clusters are the most flexible and reasonably priced solution for scaling up and for high availability."

New to Unix

Unlike SMP and MPP, clustering is not a new technology. The VAXcluster from Digital Equipment Corp., which uses the OpenVMS operating system, has been around since the early 1980s. But clusters are new to Unix. It wasn't until companies began to use Unix servers for mission-critical applications that high availability, and therefore clusters, became important in the Unix arena.

In 1993 DEC leveraged its clustering know-how and introduced its Unix-based DECAdvantage. The same year, IBM, with the help of Clam Associates of Cambridge, MA, introduced High-Availability Cluster Multi-Processing (HACMP) for AIX, its Unix variant. Now over half a dozen companies provide Unix clusters, including Data General, Hewlett-Packard, Pyramid Technology, Sequent Computer Systems and Sun Microsystems.

Since most clusters are made up of SMP machines, virtually all cluster makers also sell SMP systems. MPP, on the other hand, is new and specialized. IBM and Pyramid are two cluster companies that also sell MPP systems, but only Pyramid allows you to include its MPP system in a cluster.

Hot for Uptime

While clusters give companies both increased scalability and high availability, currently most users are more interested in availability. They also value the fact that members of a cluster can be busy with processing tasks while standing ready to take over another member's work in the event of a failure.

"Our hot standby system gave us high availability, but clustering allows us to make the best use of our investment," says Peter Smith, systems manager for TMI Communications in Ottawa, Canada. The company provides MSAT satellite communications to telecommunications carriers. It currently has two DEC AlphaServer 2100s (each with two processors), but it only uses one. The second is a hot standby connected to the first via DECsafe Available Server.

Clusters achieve their high availability through redundancy. A fail-over system, such as DECsafe, includes "heartbeat" software, with which each member of the cluster continually checks the others to ensure that they are running properly. As soon as a member senses a failure, it takes over the jobs of the failed node. Normally, users who were logged on to the failed machine are automatically logged back in to another machine, a process that may take anywhere from several seconds to several minutes. Automatic fail-over is important when purchasing a cluster for high availability. "Without it, getting users back online is manual and can take too long," says Wayne Kernochan, director of commercial systems research at the Aberdeen Group in Boston.
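
In outline, the heartbeat mechanism is straightforward: each node periodically tells its peer it is alive, and a peer that falls silent too long is presumed dead. The Python sketch below shows only the shape of the idea; the address, port and takeover routine are hypothetical, and products such as DECsafe implement this at a much lower level, with redundant communication paths.

    import socket
    import time

    PEER = ("192.0.2.10", 5001)  # hypothetical address of the other node
    PORT = 5001                  # hypothetical heartbeat port
    INTERVAL = 1.0               # seconds between heartbeats we send
    TIMEOUT = 5.0                # silence that triggers fail-over

    def send_heartbeats():
        # Periodically tell the peer this node is still alive.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        while True:
            sock.sendto(b"alive", PEER)
            time.sleep(INTERVAL)

    def monitor_peer():
        # Listen for the peer's heartbeat; on silence, start fail-over.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", PORT))
        sock.settimeout(TIMEOUT)
        while True:
            try:
                sock.recvfrom(64)  # any packet counts as a heartbeat
            except socket.timeout:
                take_over()        # peer silent too long: assume it failed
                return

    def take_over():
        # Placeholder for the real work: acquire the shared disks, restart
        # the failed node's services and log its users back in here.
        print("peer silent -- assuming its workload")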

MSAT, an alternative to cellular technology, provides communications via pager, fax or voice phone to users through resellers such as Bell Mobility. Currently, one AlphaServer 2100 is enough to run TMI's Customer Management Information System (CMIS) applications, which hold data for each MSAT customer it serves. The server is used to activate the service for customers and to collect usage data for billing purposes. While TMI provides 24-hour-by-7-day service, employees work 9 to 5 and not on weekends. "We have pagers, but we don't want to be wakened in the middle of the night," Smith says.

Soon, TMI will convert the pair of servers to a cluster solution DEC calls TruCluster. It includes a wide-channel bus technology called Memory Channel. Based on Peripheral Component Interconnect (PCI, a bus standard developed by Intel) and licensed from Encore Computer Corp. of Ft. Lauderdale, FL, Memory Channel lets clustered servers communicate, and thus process queries, much faster than they can through SCSI I/O channels. Also part of TruCluster is the Oracle Parallel Server, which can distribute a database query across multiple nodes in the cluster. This activity is transparent to the user.

The hot standby solved the problem of availability, but Smith says with Memory Channel the company can make better use of both machines. He plans to put all system software on one machine and user information and applications on the other. Memory Channel will allow the two systems to work together as a single machine, increasing overall system performance. In the event of a failure of one server, all the processing tasks (both systems and applications) will be moved to the second server. In that case, performance will degrade a bit, but Smith isn't worried. "In the worst-case scenario, in the event of a failure, the performance of the system will revert to what it is now, and that's acceptable," he says.

TMI's requirement is 99 percent uptime, and Smith expects to beat that when the cluster system is installed. But the main advantage, in his view, is economic. "We're getting the most bang for the buck, because we can utilize both machines most of the time," he says.

Not all of the system's availability comes from the cluster itself. The fact is that, in a stand-alone mode, many of today's servers already have availability over 99 percent. To increase that percentage, you can add things like mirrored or duplexed drives, data striping, RAID disk arrays and hot-swap features. With a carefully designed configuration, a company can achieve about 99.9 percent on a stand-alone SMP machine. What clusters bring to the party is availability above that number.

Demanding 99.9999 percent availability instead of 99.9 percent may seem obsessive. But translated into actual downtime, it means moving from roughly 500 minutes of outage a year to about 30 seconds. Virtually all applications can stand 30 seconds of downtime, but many companies cannot accept having their sales or service applications down for the better part of an hour each month.
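
The arithmetic behind those figures is easy to reproduce; this quick sketch simply applies each availability percentage to the minutes in a year.

    # Downtime per year implied by an availability percentage.
    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

    for pct in (99.0, 99.9, 99.9999):
        down = MINUTES_PER_YEAR * (1 - pct / 100)
        print(f"{pct}%: {down:,.1f} minutes/year ({down * 60:,.0f} seconds)")

    # 99.0%    -> 5,256.0 minutes a year
    # 99.9%    ->   525.6 minutes a year (the roughly 500 minutes cited)
    # 99.9999% ->     0.5 minutes a year, about 32 seconds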

Down on Purpose

While clusters provide a slight decrease of unexpected downtime compared with stand-alone servers, their effect is more dramatic when it comes to scheduled downtime. Shutdowns for maintenance, which can total as much as 15 hours a month for a large operation, are becoming increasingly unacceptable. Clusters let you shift processing from a node on which you want to perform maintenance before powering off the machine.

"The only time we need to shut down is when we're upgrading our cluster software. Otherwise, moving processing to another node takes seconds," says Phil Zmigrodski, director of software development at Hygrade Furniture Distribution and Delivery Systems in South Kearney, NJ. Hygrade uses a two-node IBM RS/6000 HACMP cluster. Overall, Zmigrodski says the company achieves about 99.99 percent availability.

Hygrade purchased a HACMP cluster a year and a half ago to support its new video catalog system. Furniture retailers purchase or lease the video catalog, which is a PC combined with a satellite hookup to the RS/6000 model 570 at Hygrade's headquarters. Each day the 570 downloads an upgraded video catalog to each store. Customers shop through the catalog and, if they choose to make a purchase, enter name, address and product choice. The data is sent back to the RS/6000 570, which in turn sends it to the company's RS/6000 model 590. The 590 system handles fulfillment functions such as routing the order information to the appropriate shipping dock facility.

Because many of Hygrade's retailers rely on the video catalog and don't stock floor models of the furniture, high availability of the system is essential. Even a few minutes of downtime could result in lost sales. But purchasing a hot standby for each of the firm's RS/6000 servers could increase the price of the system by over $200,000, according to Zmigrodski.

The solution was to cluster the two servers. "The 570 and the 590 have very different jobs to do, but when we bought them, we kept in mind that either one had to be able to take over the job of the other," he says.

Hygrade hasn't had an emergency shutdown since purchasing the system, but it has moved processes from one machine to the other when upgrading software. Of course, these activities usually are performed at off-peak periods. But Zmigrodski says that when he does so, he sees no degradation of performance.

Flexible Options

Although clusters don't provide the 100 percent availability offered by fault-tolerant machines, they are more economical and flexible. That's because, when configuring your cluster, you can select where on the scalability/availability continuum your requirements lie. For example, to optimize for availability, you can have one machine in the cluster standing idle for each one that is running. You can move a bit closer to the scalability side of the continuum by having one machine idle for every two, three or four machines operating at full capacity. Or you can optimize for scalability by using all the machines at full tilt. If a member of the cluster goes down, performance on the other machines, which now have an added burden, will degrade. But for some companies, that's an acceptable trade-off.
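
One way to see that trade-off concretely is to work out how much hotter each surviving node runs after a failure. The sketch below is hypothetical; it assumes the failed node was a busy one and that its work is redistributed evenly across the survivors.

    # Relative load on each surviving node after one busy node fails.
    def load_after_failure(total_nodes, idle_standbys=0):
        active = total_nodes - idle_standbys  # nodes doing work before the failure
        survivors = total_nodes - 1           # one busy node has failed
        return active / survivors

    print(load_after_failure(2, idle_standbys=1))  # 1.00: pure hot standby, no slowdown
    print(load_after_failure(4, idle_standbys=1))  # 1.00: the idle node absorbs the work
    print(load_after_failure(4))                   # 1.33: each survivor runs a third hotter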

And there are ways to configure a system to minimize degradation in performance. For example, you can configure the cluster to shut down all batch operations in the event of a failure. The multitude of options is what makes clusters so flexible.

The emphasis on high availability doesn't minimize the need for scalability, a key item on nearly everyone's IT features list these days. Data, applications and the number of users who access them are growing constantly, and computer systems have to stay ahead of this curve.

The ultimate in scalability comes not from clusters alone but from combining them with SMP or MPP technology. If you want to replicate the processing power and availability of a mainframe operation with a Unix system, MPP, with its hundreds of microprocessors, may be the only option. Until recently, clustering SMP machines was the norm, and doing so with MPP was virtually unheard of. Last March, Pyramid Technology announced the Reliant RM1000 Cluster Server, which lets users integrate the MPP-based Reliant1000 Parallel Server, which can run up to 300 nodes, with up to 16 Nile or RM600 SMP servers (a possible total of 256 processors) in a single system running Pyramid's Reliant Unix variant.

This is an example of what Kernochan of Aberdeen Group calls "fusion technology," which he defines as the tight integration of SMP and MPP in a cluster. "It lets you solve the problem with the right technology, whatever that might be," he says. For example, if you need more processing power, the cheapest way to get it is to add another processor to the SMP machine. But that could result in bottlenecks, so the next option might be to add a second SMP machine and create a cluster. Or if you want to move a decision support system or online transaction processing (OLTP) onto the cluster and need a tremendous amount of power, add an MPP machine. "This makes it virtually impossible to hit a computational limit for commercial applications," says Kernochan.

Not for All Uses

Despite the advantages enumerated above, cluster technology has limitations. In particular, when compared to SMP or MPP in terms of speed, "Clusters always lose," says Jonathan Eunice, an analyst at Illuminata, Inc., a consulting firm in Nashua, NH.

According to Eunice, the greatest benefit of clustering will come when it can be put to use in parallel processing, where multiple queries or tasks are handled simultaneously because they're assigned to different processors. Parallel processing is especially important for OLTP and other realtime applications in industries such as banking and insurance. To be successful, he estimates, throughput needs to be at least in the range of 200 to 300 megabytes per second (MB/s). In the past, when 20MB/s to 40MB/s over the SCSI storage interconnect was the norm, those applications ran poorly on clusters, he says. DEC's TruCluster with Memory Channel, announced in April, has a bandwidth of 100MB/s. Eunice calls this a significant improvement, though he hasn't been able to evaluate its effect on these types of applications.
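
To put those figures in perspective, consider how long it takes just to move a hypothetical 1GB working set between nodes at each interconnect speed; the back-of-the-envelope sketch below uses the midpoint of Eunice's 200MB/s-to-300MB/s range.

    # Seconds to move a 1GB working set at various interconnect speeds.
    WORKING_SET_MB = 1024  # arbitrary 1GB example

    for name, mb_per_sec in (("SCSI, low end", 20),
                             ("SCSI, high end", 40),
                             ("Memory Channel", 100),
                             ("Eunice's threshold", 250)):
        print(f"{name}: {WORKING_SET_MB / mb_per_sec:.1f} seconds")

    # At 20MB/s the transfer takes about 51 seconds; at 100MB/s, about 10;
    # at 250MB/s, about 4 -- the difference between an OLTP system
    # keeping up and falling behind.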

Another roadblock arises when you're ready to move from a single machine to a cluster: will you have to reprogram your applications to be cluster-ready? Adapting an application that runs on a single SMP or MPP machine may take little or no work, or it may require extensive rewriting. In general, moving from a single SMP machine to a cluster that includes MPP does require reprogramming. However, if you're running a parallel server on an SMP machine, moving to a clustered solution may be easy, because the database application runs on the cluster just as it would on a single SMP machine.

According to Patrick Smyth, director of marketing for DEC's Unix business segment in Maynard, MA, clustering is a good fit for an application such as SAP America's R/3 suite, where the configuration is a huge database server surrounded by many application servers. "You can manage the cluster as one database environment," Smyth says.

While Digital has raised the prospects of Unix servers in the near term by creating fast clusters, it's also romancing users of Microsoft Windows NT. Digital will use the Memory Channel interconnect with clustering software on its Intel processor-based Prioris servers running NT. However, skepticism is rampant regarding NT's scalability above four processors.

Still, running either operating system, clustering is a technology that most analysts say will continue to make gains. "In five years every serious server will be clustered," predicts Eunice. He asserts that even if the circa 1999 stand-alone servers are powerful enough to handle all business applications, customers will want clusters for high availability. "It's a simple matter of not wanting to put all your processing eggs in one basket," he says.

If your Unix machines and client/server systems are taking on larger and more important applications that used to run on mainframes, you may need a range of options. You'll have to find some way--preferably more than one--to replicate the mainframe's throughput of millions of instructions per second as well as its high availability or fault tolerance. Clustering SMP machines lets you boost the performance of an already powerful server while providing high availability. Adding an MPP machine to the cluster, while expensive and sometimes difficult to program, will let your company offload virtually any mission-critical system to a Unix base.

Larry Stevens writes about business and technology from Monson, MA. He can be reached at 71412.631@compuserve.com.