Computer cluster

A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system.
Related Articles

Grid computing

Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.
Grid computing is distinguished from conventional high-performance computing systems such as cluster computing in that grid computers have each node set to perform a different task/application.

Open Source Cluster Application Resources

In most circumstances, all of the nodes use the same hardware and the same operating system, although in some setups (e.g. using Open Source Cluster Application Resources (OSCAR)), different operating systems or different hardware can be used on each computer.
Open Source Cluster Application Resources (OSCAR) is a Linux-based software installation for high-performance cluster computing.

Distributed computing

Computer clusters emerged as a result of convergence of a number of computing trends including the availability of low-cost microprocessors, high-speed networks, and software for high-performance distributed computing.
Distributed programming typically falls into one of several basic architectures (client–server, three-tier, n-tier, or peer-to-peer) or into the broad categories of loose coupling and tight coupling.

Linux

The developers used Linux, the Parallel Virtual Machine toolkit and the Message Passing Interface library to achieve high performance at a relatively low cost. The Linux-HA project is one commonly used free software HA package for the Linux operating system.
Adoption of Linux in production environments, rather than being used only by hobbyists, started to take off first in the mid-1990s in the supercomputing community, where organizations such as NASA started to replace their increasingly expensive machines with clusters of inexpensive commodity computers running Linux.
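To make the message-passing style these libraries enable concrete, the following is a minimal sketch using mpi4py, the Python bindings to MPI; mpi4py, the process counts, and the workload are illustrative assumptions, not something the article names.

```python
# Minimal MPI sketch using the mpi4py bindings (an illustrative assumption;
# the article names the MPI and PVM libraries, not mpi4py specifically).
from mpi4py import MPI

comm = MPI.COMM_WORLD          # communicator spanning all launched processes
rank = comm.Get_rank()         # this process's index within the parallel job
size = comm.Get_size()         # total number of processes in the job

# Each process computes a partial sum; rank 0 gathers and reduces the results.
local_value = sum(range(rank * 1000, (rank + 1) * 1000))
total = comm.reduce(local_value, op=MPI.SUM, root=0)

if rank == 0:
    print(f"{size} processes computed a total of {total}")
```

A script like this would typically be launched across the cluster's nodes with an MPI launcher such as mpirun or mpiexec, together with a host file listing the participating machines.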

Stone Soupercomputer

An early project that showed the viability of the concept was the 133-node Stone Soupercomputer.
A group of lab employees including William W. Hargrove and Forrest M. Hoffman applied for a grant to build a cluster in 1996, but it was rejected.

VMScluster

Clustering per se did not really take off until Digital Equipment Corporation released its VAXcluster product in 1984 for the VAX/VMS operating system (now named OpenVMS).
A VMScluster is a computer cluster involving a group of computers running the OpenVMS operating system.

OpenVMS

Clustering per se did not really take off until Digital Equipment Corporation released its VAXcluster product in 1984 for the VAX/VMS operating system (now named OpenVMS).
The system offers high availability through clustering and the ability to distribute the system over multiple physical machines.

Load balancing (computing)

"Load-balancing" clusters are configurations in which cluster-nodes share computational workload to provide better overall performance.
In computing, load balancing improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives.
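As an illustration only, the sketch below shows one of the simplest load-balancing policies, round-robin dispatch; the backend names are hypothetical, and real balancers also track node health and current load.

```python
import itertools

# Minimal round-robin load balancer sketch; backend addresses are
# hypothetical, and production balancers also do health checks.
backends = ["node1:8080", "node2:8080", "node3:8080"]
rotation = itertools.cycle(backends)

def dispatch(request_id: int) -> str:
    """Assign each incoming request to the next backend in rotation."""
    backend = next(rotation)
    return f"request {request_id} -> {backend}"

for i in range(5):
    print(dispatch(i))
```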

High-availability cluster

In either case, the cluster may use a high-availability approach.
They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail.
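A minimal sketch of the heartbeat-and-failover idea behind such software follows; the node names and timeout are assumptions, and real HA packages also handle quorum, fencing, and resource migration.

```python
import time

# Heartbeat/failover sketch with hypothetical node names and timeout;
# real HA software adds quorum, fencing, and resource agents.
HEARTBEAT_TIMEOUT = 5.0          # seconds without a heartbeat before failover
last_heartbeat = {"primary": time.time(), "standby": time.time()}
active_node = "primary"

def record_heartbeat(node: str) -> None:
    """Called whenever a node reports that it is still alive."""
    last_heartbeat[node] = time.time()

def check_failover() -> str:
    """Promote the standby if the active node has stopped heartbeating."""
    global active_node
    if time.time() - last_heartbeat[active_node] > HEARTBEAT_TIMEOUT:
        active_node = "standby" if active_node == "primary" else "primary"
    return active_node
```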

Beowulf cluster

A basic approach to building a cluster is that of a Beowulf cluster, which may be built with a few personal computers to produce a cost-effective alternative to traditional high-performance computing.
A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them.

Clustered file system

Clustering a large number of computers together lends itself to the use of distributed file systems and RAID, both of which can increase the reliability and speed of a cluster.
There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached storage for each node).

Linux-HA

The Linux-HA project is one commonly used free software HA package for the Linux operating system.
The Linux-HA (High-Availability Linux) project provides a high-availability (clustering) solution for Linux, FreeBSD, OpenBSD, Solaris and Mac OS X which promotes reliability, availability, and serviceability (RAS).

Tandem Computers

Two other noteworthy early commercial clusters were the Tandem Himalayan (a circa 1994 high-availability product) and the IBM S/390 Parallel Sysplex (also circa 1994, primarily for business use).
The first system was the Tandem/16 or T/16, later re-branded NonStop I. The machine consisted of between two and 16 CPUs, organized as a fault-tolerant computer cluster packaged in a single rack.

DEGIMA

A special purpose 144-node DEGIMA cluster is tuned to running astrophysical N-body simulations using the Multiple-Walk parallel treecode, rather than general purpose scientific computations.
The DEGIMA (DEstination for Gpu Intensive MAchine) is a high performance computer cluster used for hierarchical N-body simulations at the Nagasaki Advanced Computing Center, Nagasaki University.

Computer

A computer cluster is a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system.
A "complete" computer including the hardware, the operating system (main software), and peripheral equipment required and used for "full" operation can be referred to as a computer system. This term may as well be used for a group of computers that are connected and work together, in particular a computer network or computer cluster.

Single point of failure

HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure.
In a high-availability server cluster, each individual server may attain internal component redundancy by having multiple power supplies, hard drives, and other components.

PlayStation 3 cluster

Some examples of game console clusters are Sony PlayStation clusters and Microsoft Xbox clusters.
The National Center for Supercomputing Applications had already built a cluster based on the PlayStation 2.

MapReduce

This is an area of ongoing research; algorithms that combine and extend MapReduce and Hadoop have been proposed and studied.
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster.
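A minimal single-machine sketch of the model follows, using word count, the canonical example; a real MapReduce implementation distributes many map and reduce tasks across the cluster's nodes, with shuffling and fault handling between the phases.

```python
from collections import defaultdict

# Single-process sketch of the MapReduce word-count example; a real system
# runs the map and reduce phases in parallel across cluster nodes.
def map_phase(document: str):
    """Emit (word, 1) pairs for every word in the input split."""
    for word in document.split():
        yield word.lower(), 1

def reduce_phase(pairs):
    """Sum the counts emitted for each distinct key."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

documents = ["the quick brown fox", "the lazy dog"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(pairs))
```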

Apache Hadoop

This is an area of ongoing research; algorithms that combine and extend MapReduce and Hadoop have been proposed and studied.
Originally designed for computer clusters built from commodity hardware (still the common use), it has also found use on clusters of higher-end hardware.

Supercomputer

A basic approach to building a cluster is that of a Beowulf cluster, which may be built with a few personal computers to produce a cost-effective alternative to traditional high-performance computing. Clusters have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest supercomputers in the world such as IBM's Sequoia.
In another approach, a large number of processors are used in proximity to each other, e.g. in a computer cluster.

OCFS2

Examples include the IBM General Parallel File System, Microsoft's Cluster Shared Volumes or the Oracle Cluster File System.
The first version of OCFS was developed with the main focus to accommodate Oracle's database management system that used cluster computing.

Fencing (computing)

When a node in a cluster fails, strategies such as "fencing" may be employed to keep the rest of the system operational.
Fencing is the process of isolating a node of a computer cluster or protecting shared resources when a node appears to be malfunctioning.
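The sketch below shows the decision a fencing agent makes, with hypothetical node names and a stubbed power-off call; in practice the isolation is delegated to hardware such as a managed power switch or IPMI controller, or to storage-level mechanisms such as SCSI persistent reservations.

```python
# Fencing-decision sketch (hypothetical names; the power-off call is a stub).
# Real agents act through power controllers or storage reservations.
MISSED_HEARTBEATS_LIMIT = 3

def power_off(node: str) -> None:
    """Stub standing in for a call to a power-distribution unit or IPMI."""
    print(f"fencing {node}: cutting power")

def maybe_fence(node: str, missed_heartbeats: int) -> bool:
    """Isolate a node that appears unresponsive before reassigning its work."""
    if missed_heartbeats >= MISSED_HEARTBEATS_LIMIT:
        power_off(node)
        return True
    return False

maybe_fence("node7", missed_heartbeats=4)
```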

Scalability

Fault tolerance (the ability of a system to continue working with a malfunctioning node) allows for scalability and, in high-performance situations, a low frequency of maintenance routines, resource consolidation (e.g. RAID), and centralized management.
To scale horizontally (or scale out/in) means to add more nodes to (or remove nodes from) a system, such as adding a new computer to a distributed software application. An example might involve scaling out from one Web server system to three.

As computer prices have dropped and performance continues to increase, high-performance computing applications such as seismic analysis and biotechnology workloads have adopted low-cost "commodity" systems for tasks that once would have required supercomputers. System architects may configure hundreds of small computers in a cluster to obtain aggregate computing power that often exceeds that of computers based on a single traditional processor. The development of high-performance interconnects such as Gigabit Ethernet, InfiniBand and Myrinet further fueled this model. Such growth has led to demand for software that allows efficient management and maintenance of multiple nodes, as well as hardware such as shared data storage with much higher I/O performance.

Size scalability is the maximum number of processors that a system can accommodate.

Scheduling (computing)

When a large multi-user cluster needs to access very large amounts of data, task scheduling becomes a challenge.
Long-term scheduling is also important in large-scale systems such as batch processing systems, computer clusters, supercomputers, and render farms.
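A minimal first-come-first-served sketch of cluster job scheduling follows; the job names, node counts, and cluster size are assumptions, and production schedulers add priorities, backfilling, and fair-share policies.

```python
from collections import deque

# FCFS scheduling sketch (hypothetical jobs and node counts); real cluster
# schedulers add priorities, backfilling, and fair-share accounting.
TOTAL_NODES = 8
free_nodes = TOTAL_NODES
queue = deque([("jobA", 4), ("jobB", 6), ("jobC", 2)])  # (name, nodes needed)
running = []

def schedule() -> None:
    """Start queued jobs in order while enough nodes remain free."""
    global free_nodes
    while queue and queue[0][1] <= free_nodes:
        name, nodes = queue.popleft()
        free_nodes -= nodes
        running.append(name)

schedule()
print("running:", running, "| still queued:", list(queue))
```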

Digital Equipment Corporation

Clustering per se did not really take off until Digital Equipment Corporation released its VAXcluster product in 1984 for the VAX/VMS operating system (now named OpenVMS).
DEC also invented clustering, an operating system technology that treated multiple machines as one logical entity. Clustering permitted sharing of pooled disk and tape storage via the HSC50/70/90 and later series of Hierarchical Storage Controllers (HSC). The HSCs delivered the first hardware RAID 0 and RAID 1 capabilities and the first serial interconnects of multiple storage technologies. This technology was the forerunner to architectures such as Network of Workstations which are used for massively cooperative tasks such as web-searches and drug research.