Understanding Clusters

A cluster, plainly: multiple physical machines wired together with fast networking so they act as one larger pool of compute and memory for a single job, instead of several separate computers doing separate jobs.

Real datacenter clustering

NVIDIA's DGX and rack-scale systems (DGX H100, GB200 NVL72) connect GPUs with NVLink/NVSwitch inside a node and InfiniBand between nodes. These are purpose-built, extremely high-bandwidth links. Within one NVLink domain (the 8 GPUs of a DGX H100, or all 72 GPUs of a GB200 NVL72 rack) each GPU can read and write its peers' memory directly, and InfiniBand with GPUDirect RDMA moves data straight between GPU memory on different nodes, keeping dozens or hundreds of GPUs synchronized with microsecond-scale latency. Software like NVIDIA's NCCL library coordinates the GPUs so a huge model can be split across all of them with only a small efficiency loss. These systems are engineered as a single product: matched cooling, matched power delivery, vendor-supported firmware.
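The core job NCCL does during training is collective communication, above all all-reduce, which sums every GPU's gradients and hands the result back to all of them each step. NCCL implements this with ring and tree algorithms. The sketch below is a pure-Python simulation of a ring all-reduce, not NCCL's code: each inner list stands in for one GPU's buffer, and `ring_allreduce` is a hypothetical name.

```python
from typing import List


def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce (sum) across len(buffers) "GPUs".

    Each buffer is cut into n chunks. Reduce-scatter: for n - 1 steps every
    rank passes one chunk to its right-hand neighbour, which adds it in; at the
    end each rank owns one fully summed chunk. All-gather: for another n - 1
    steps the finished chunks circulate so every rank holds the whole sum.
    Each rank therefore sends about 2 * (n - 1) / n of its buffer in total.
    """
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "buffers must be the same size"
    assert length % n == 0, "for simplicity, length must split evenly into n chunks"
    size = length // n
    data = [list(b) for b in buffers]  # copy so the caller's buffers are untouched

    def chunk(c: int) -> slice:
        return slice(c * size, (c + 1) * size)

    # Reduce-scatter: at step s, rank r sends chunk (r - s) mod n to rank r + 1.
    for s in range(n - 1):
        sends = [(r, (r - s) % n, data[r][chunk((r - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            dst = (r + 1) % n
            data[dst][chunk(c)] = [a + b for a, b in zip(data[dst][chunk(c)], payload)]

    # All-gather: rank r now owns finished chunk (r + 1) mod n; forward finished
    # chunks around the ring, overwriting the partial sums still held elsewhere.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, data[r][chunk((r + 1 - s) % n)]) for r in range(n)]
        for r, c, payload in sends:
            data[(r + 1) % n][chunk(c)] = list(payload)

    return data


gpus = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
print(ring_allreduce(gpus))  # every rank ends with [11.0, 22.0, 33.0, 44.0]
```

The ring layout is why link bandwidth matters so much: every step waits on the slowest hop, so the whole collective runs at the speed of the weakest link in the ring.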

2x DGX H100 nodes: NVLink inside each node, InfiniBand between them (a real cluster)

GPU / system: NVIDIA DGX H100 (8x H100 node) × 2
Combined memory: 1,280 GB (2 nodes × 8 GPUs × 80 GB)
Total price: $700,000
Sustained draw: 17,000 W
🏠 Equivalent to 14.17x an average home (~1,200 W continuous)
🔋 Drains a 90 kWh EV battery in 5.29 hrs
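The home and EV figures are simple ratios of the sustained draw. A minimal sketch that reproduces them, assuming the section's own constants (1,200 W for an average home, a 90 kWh battery); `power_equivalents` is a hypothetical helper name:

```python
from typing import Tuple

AVG_HOME_WATTS = 1_200  # the section's assumed average continuous household draw
EV_BATTERY_KWH = 90     # the section's assumed EV battery capacity


def power_equivalents(sustained_watts: float) -> Tuple[float, float]:
    """Return (average-home equivalents, hours to drain one EV battery)."""
    homes = sustained_watts / AVG_HOME_WATTS
    hours = EV_BATTERY_KWH / (sustained_watts / 1_000)  # kWh / kW = hours
    return homes, hours


homes, hours = power_equivalents(17_000)  # 2x DGX H100 sustained draw
print(f"{homes:.2f}x an average home, {hours:.2f} hrs per 90 kWh battery")
```

The same call with 2,550 W gives the desktop build's figures (a ratio of exactly 2.125 and about 35.29 hours).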

A pile of desktop cards

Consumer GPUs (RTX 40- and 50-series) have no NVLink; NVIDIA removed it after the RTX 3090 generation, so multiple cards in one desktop talk over ordinary PCIe, routed through the CPU. For GPU-to-GPU traffic that is dramatically slower than NVLink: roughly 63 GB/s each way on a PCIe 5.0 x16 link, versus 450 GB/s each way (900 GB/s total) of NVLink per H100. There's also no unified memory: each card's VRAM stays separate, so software has to manually shard the model and pay a communication penalty every step. A normal home electrical circuit (15-20 A at 120 V, roughly 1,800-2,400 W) becomes the hard ceiling long before you run out of PCIe slots; the 2,550 W four-card build below already exceeds a single 20 A circuit.
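The circuit ceiling is plain amps-times-volts arithmetic. A minimal sketch, assuming a 120 V circuit; the 80% continuous-load derate is an added assumption (a common US electrical-code rule of thumb, not something stated in the text), and the function names are hypothetical:

```python
VOLTS = 120  # assumed: standard North American household branch circuit


def circuit_budget_watts(breaker_amps: float, continuous_derate: float = 0.8) -> float:
    """Watts available on one 120 V circuit.

    continuous_derate=0.8 applies the rule of thumb that a continuous load should
    stay at or below 80% of the breaker rating; pass 1.0 for the raw amps x volts
    figure (1,800 W at 15 A, 2,400 W at 20 A).
    """
    return breaker_amps * VOLTS * continuous_derate


def fits_on_circuit(load_watts: float, breaker_amps: float, continuous_derate: float = 0.8) -> bool:
    """True if the sustained load stays within the circuit's budget."""
    return load_watts <= circuit_budget_watts(breaker_amps, continuous_derate)


RIG_WATTS = 2_550  # sustained draw of the 4x RTX 5090 build in this section

for amps in (15, 20):
    raw = circuit_budget_watts(amps, 1.0)
    derated = circuit_budget_watts(amps)
    print(f"{amps} A: {raw:,.0f} W raw, {derated:,.0f} W continuous, "
          f"4x RTX 5090 fits: {fits_on_circuit(RIG_WATTS, amps)}")
```

Even without the derate, 2,550 W exceeds a 20 A circuit's 2,400 W.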

4x RTX 5090 in one desktop case (a pile of desktop cards)

GPU / system: NVIDIA GeForce RTX 5090 × 4
Combined memory: 128 GB (4 × 32 GB, not pooled)
GPU cost: $7,996
Host system (est.): $4,000
Total price: $11,996
Sustained draw: 2,550 W
🏠 Equivalent to 2.12x an average home (~1,200 W continuous)
🔋 Drains a 90 kWh EV battery in 35.29 hrs
These 4 cards do NOT share memory. A model has to be manually split across them over ordinary PCIe, which is far slower than NVLink, and there's no vendor support for this as a unified system.
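To put "far slower" in rough numbers, a common bandwidth-only model of a ring all-reduce says each GPU sends about 2(n-1)/n of the buffer. The sketch below assumes ~63 GB/s per direction for a best-case PCIe 5.0 x16 link, 450 GB/s per direction for H100 NVLink, and a hypothetical 16 GB gradient buffer (8B parameters in bf16). It ignores latency and contention, and on mainstream desktop platforms four cards get fewer than 16 lanes each, so the PCIe figure is optimistic. `ring_allreduce_seconds` is a hypothetical name.

```python
def ring_allreduce_seconds(buffer_bytes: float, n_gpus: int, link_gb_per_s: float) -> float:
    """Bandwidth-only estimate of one ring all-reduce.

    Each GPU sends (and receives) 2 * (n - 1) / n of the buffer, so the time is
    that volume divided by the per-direction link bandwidth. Latency, protocol
    overhead and contention are ignored, so real numbers will be worse.
    """
    volume = 2 * (n_gpus - 1) / n_gpus * buffer_bytes
    return volume / (link_gb_per_s * 1e9)


GRAD_BYTES = 16e9  # hypothetical: 8B parameters x 2 bytes (bf16 gradients)
PCIE5_X16 = 63     # approx. GB/s per direction, PCIe 5.0 x16, best case
NVLINK_H100 = 450  # GB/s per direction (900 GB/s total) per H100

for name, bw in (("PCIe 5.0 x16", PCIE5_X16), ("H100 NVLink", NVLINK_H100)):
    print(f"{name}: {ring_allreduce_seconds(GRAD_BYTES, 4, bw):.3f} s per all-reduce")
```

Under these assumptions the PCIe estimate is roughly 7x slower (about 0.38 s versus 0.05 s), and that cost recurs every training step.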
The honest difference: the DGX cluster above costs about 58x more than the pile of desktop cards, but its 1,280 GB sits behind NVLink inside each node and InfiniBand between nodes, and NVIDIA's software stack drives it as one fast, vendor-supported pool. The desktop pile's 128 GB is really four separate 32 GB pools that a model must be hand-split across. It is not a substitute for a cluster, just several separate GPUs sharing a case.
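Hand-splitting is where "four separate pools" bites. A minimal sketch of the simplest manual strategy, assigning consecutive layers to each card until it is full (pipeline-style placement). `split_layers` and the layer sizes are hypothetical, and real placement must also budget for activations and KV cache, which this ignores:

```python
from typing import List


def split_layers(layer_gb: List[float], n_gpus: int, gpu_gb: float) -> List[List[int]]:
    """Greedily place consecutive layers on GPUs, filling each card before the next.

    Returns one list of layer indices per GPU used. Raises ValueError if any
    single layer exceeds a card, or if the model needs more than n_gpus cards.
    """
    plan: List[List[int]] = [[]]
    used = 0.0  # GB already placed on the current card
    for i, size in enumerate(layer_gb):
        if size > gpu_gb:
            raise ValueError(f"layer {i} ({size} GB) is larger than one {gpu_gb} GB card")
        if used + size > gpu_gb:  # current card is full: move to the next one
            plan.append([])
            used = 0.0
        plan[-1].append(i)
        used += size
    if len(plan) > n_gpus:
        raise ValueError(f"model needs {len(plan)} cards but only {n_gpus} are available")
    return plan


# Hypothetical 120 GB model: 80 layers of 1.5 GB each on four 32 GB cards.
plan = split_layers([1.5] * 80, n_gpus=4, gpu_gb=32)
print([len(layers) for layers in plan])  # layers per card: [21, 21, 21, 17]
```

A model of ten 12 GB layers is also only 120 GB, under the 128 GB total, yet no card can hold a third layer, so `split_layers([12] * 10, 4, 32)` raises `ValueError`: that is fragmentation a single pool of memory would not have.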