Cluster Computers
The invention of the mass-reproducible microprocessor by Ted Hoff of Intel in the early 1970s paved the way towards 'network-centric' computing. Relegating mainframe or 'host-centric' computing to the background, it ushered in the era of the distributed 'client-server' approach.
The present generation of RISC microprocessors scores over mainframes in terms of cost and performance. A collection of RISC microprocessors assembled together as a parallel computer can even outperform vector supercomputers.
Clusters of high-performance workstations can realistically be used for a variety of applications, either to replace mainframes, vector supercomputers and parallel computers or to better manage an already installed collection of workstations. True, cluster computers have limitations, yet the substantial benefits that can be derived are attracting many institutions and companies towards exploring this option.
Software to maintain such clusters is still at an embryonic stage of development. However, cluster computing is a rapidly maturing technology that is certain to play a dominant role in the network-centric computing future. The use of clusters of workstations to increase the throughput of user applications is becoming popular in the US and Europe.
Six current CMS (Cluster Management Software) packages, two public domain and four commercial, have been identified as being worth serious investigation.
If finances permit, it is wise to choose one of the commercial packages. This will minimize the load on the on-site staff and leave the responsibility with the vendor to ensure that their software is installed, used and supported properly.
Nearly all the CMS packages are designed to run on UNIX workstations and MPP (Massively Parallel Processor) systems. Some of the public domain packages support Linux, which runs on PCs. Codine, a commercial package, also supports Linux. JP1[1] from Hitachi Ltd. is designed to run on Windows NT platforms. No CMS package supports Windows 95.
WWW software and HTTP protocols can be used as part of an integrated CMS package. Most CMS packages work completely outside the kernel, on top of a machine's existing operating system. This means that installation does not require modification of the kernel, so the CMS package is installed basically like any other software on the machine.
Cluster computing software, and the means of managing and scheduling applications to run on these systems, are becoming increasingly commonplace. CMS packages have come about for a number of reasons, including load balancing, utilizing spare CPU cycles, providing fault-tolerant systems and managing access to powerful systems. But overall, the main reason for their existence is their ability to provide increased and reliable throughput of user applications on the systems they manage.
The set of criteria used to assess the features and functionality of a CMS package is heavily based upon that devised by Kaplan and Nelson at NASA.
The client software resident on the user's workstation sends a job description file, which includes the job's name, maximum run time and desired platform, to a master scheduler. The scheduler's main task is to balance the load evenly across the resources it manages. So, when a new job is submitted, the scheduler must not only match the requested resources with those that are available, but also ensure that the resources in use remain load balanced.
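The matching and balancing step can be sketched roughly as follows. This is a minimal illustration in Python, not the algorithm of any particular CMS package: the names `Job`, `Node`, `pick_node` and `submit`, and the idea of summarizing each node by a single load number, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Job:
    # Fields mirror the job description file: name, run-time limit, platform.
    name: str
    max_run_time_s: int
    platform: str


@dataclass
class Node:
    hostname: str
    platform: str
    load: float = 0.0  # hypothetical load metric, e.g. number of running jobs


def pick_node(job: Job, nodes: List[Node]) -> Optional[Node]:
    """Return the least-loaded node whose platform matches the job, or None."""
    candidates = [n for n in nodes if n.platform == job.platform]
    if not candidates:
        return None  # no matching resource; a real scheduler would queue the job
    return min(candidates, key=lambda n: n.load)


def submit(job: Job, nodes: List[Node]) -> Optional[str]:
    """Assign the job to a node and record the extra load on that node."""
    node = pick_node(job, nodes)
    if node is None:
        return None
    node.load += 1.0  # assumption: every job adds one unit of load
    return node.hostname
```

A production scheduler would weigh many more attributes (memory, disk, queue priorities, the run-time limit), but the two steps are the same: first filter the resources that match the request, then choose among them so that the load stays even.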
Another responsibility of the master scheduler is ensuring that jobs complete successfully. It does this by monitoring jobs until they finish. If a job fails because of a problem other than an application runtime error, the scheduler reschedules the job to run again.
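The rescheduling rule can be illustrated with a short Python sketch. The `Outcome` categories, the `run_until_done` function and the `max_attempts` limit are hypothetical; the text does not say how many times a real CMS package retries a job.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    SUCCESS = "success"
    APPLICATION_ERROR = "application_error"  # fault in the user's own program
    SYSTEM_FAILURE = "system_failure"        # e.g. node crash or network fault


def run_until_done(run_job: Callable[[], Outcome], max_attempts: int = 3) -> Outcome:
    """Run a job, rescheduling it only after failures not caused by the application."""
    outcome = Outcome.SYSTEM_FAILURE
    for _ in range(max_attempts):
        outcome = run_job()
        if outcome is not Outcome.SYSTEM_FAILURE:
            # Either success, or an application error that a rerun would not fix.
            return outcome
    return outcome  # still failing after max_attempts (assumed retry limit)
```

The point of the distinction is that rerunning a job only helps when the fault lay with the cluster; an application runtime error would simply recur, so it is reported back to the user instead.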
Two useful features of a CMS package are process migration and checkpointing, which help to balance the load and to ensure that a job completes properly.
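To give a feel for checkpointing, here is a small application-level sketch in Python. It is only an illustration under stated assumptions: CMS packages may checkpoint a whole process transparently rather than asking the program to save its own state, and the function names, the JSON file format and the `every` interval are all invented for the example.

```python
import json
import os


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_i": 0, "total": 0}


def save_checkpoint(path: str, state: dict) -> None:
    """Write the state to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # avoids leaving a half-written checkpoint behind


def sum_range(n: int, path: str, every: int = 1000) -> int:
    """Sum 0..n-1, saving progress every `every` steps so a restart can resume."""
    state = load_checkpoint(path)
    for i in range(state["next_i"], n):
        state["total"] += i
        state["next_i"] = i + 1
        if state["next_i"] % every == 0:
            save_checkpoint(path, state)
    return state["total"]
```

If the job is interrupted, calling `sum_range` again with the same checkpoint path continues from the last saved step instead of starting over. Assuming the checkpoint file sits on a file system shared by the cluster, the restart could equally take place on a different, less loaded machine, which is the essence of migrating a job.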
With the advent of relatively cheap inter-processor communication (Giganet, Gigabit Ethernet, Myrinet, etc.), complete parallel HPC systems (16 to 256 processors) are relatively inexpensive to purchase and maintain. Even so, personnel, environment (power, cooling), software and system support must still be factored into the cost of a production system. These latter expenses do not follow the commodity component price curves.
04-Jan-2000
Subhajit Ghosh