Understanding Clusters

A cluster, plainly: multiple physical machines wired together with fast networking so they act as one larger pool of compute and memory for a single job, instead of several separate computers doing separate jobs.

Real datacenter clustering

NVIDIA's DGX and rack-scale systems (DGX H100, GB200 NVL72) connect GPUs with NVLink/NVSwitch inside a node (or, in the NVL72, across a whole rack) and InfiniBand between nodes. These are purpose-built, extremely high-bandwidth links that let dozens or hundreds of GPUs read and write each other's memory directly and stay synchronized with microsecond-scale latency, so the combined memory can be treated as one large pool. Software like NVIDIA's NCCL library coordinates the GPUs so a huge model can be split across all of them with only a small efficiency loss. These systems are engineered as a single product: matched cooling, matched power delivery, vendor-supported firmware.

2x DGX H100 nodes over NVLink + InfiniBand (a real cluster)

GPU / system: NVIDIA DGX H100 (8x H100 node) × 2

Combined memory: 1,280 GB

Total price: $700,000

⚡ 17,000 W sustained draw

🏠 = 14.17x an average home (~1,200W continuous)

🔋 = drains a 90 kWh EV battery in 5.29 hrs
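The two comparisons on this card are plain divisions, and a minimal sketch makes the arithmetic explicit. The function name and constants below are illustrative choices; the 1,200 W home figure and the 90 kWh battery are the card's own assumptions.

```python
AVERAGE_HOME_W = 1_200   # continuous draw the card assumes for an average home
EV_BATTERY_KWH = 90      # battery size the card uses for comparison


def power_comparisons(sustained_w: float) -> tuple[float, float]:
    """Return (multiples of an average home, hours to drain the EV battery)."""
    home_multiple = sustained_w / AVERAGE_HOME_W
    drain_hours = EV_BATTERY_KWH * 1_000 / sustained_w
    return home_multiple, drain_hours


homes, hours = power_comparisons(17_000)
print(f"{homes:.2f}x an average home, battery drained in {hours:.2f} hrs")
# prints "14.17x an average home, battery drained in 5.29 hrs"
```

The same function works for any sustained wattage, so it can be used to check other builds against the same two yardsticks.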

A pile of desktop cards

Consumer GPUs (RTX 40- and 50-series) have no NVLink (NVIDIA dropped it after the RTX 3090 generation), so multiple cards in one desktop talk over ordinary PCIe, routed through the CPU. That's dramatically slower for GPU-to-GPU traffic than NVLink, and there's no unified memory: each card's VRAM stays separate, so software has to manually shard the model and pay a communication penalty every step. A normal home electrical circuit (15-20A at 120V, roughly 1,800-2,400W) becomes the hard ceiling long before you run out of PCIe slots: the four-card build below draws 2,550W, more than either circuit size can supply.
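The circuit ceiling can be checked with watts = amps × volts. This is a rough sketch, not an electrical sizing tool: the helper names are hypothetical, it uses the peak rating only, and the 2,550 W load is the sustained draw quoted for the four-card build in this section.

```python
def circuit_capacity_w(amps: float, volts: float = 120.0) -> float:
    """Peak wattage a circuit can deliver: watts = amps x volts."""
    return amps * volts


def fits_on_circuit(load_w: float, amps: float, volts: float = 120.0) -> bool:
    """True if the load stays at or under the circuit's peak rating."""
    return load_w <= circuit_capacity_w(amps, volts)


BUILD_LOAD_W = 2_550  # sustained draw quoted for the 4x RTX 5090 build

for amps in (15, 20):
    capacity = circuit_capacity_w(amps)
    verdict = "fits" if fits_on_circuit(BUILD_LOAD_W, amps) else "exceeds the circuit"
    print(f"{amps}A/120V: {capacity:.0f} W capacity, {BUILD_LOAD_W} W load {verdict}")
```

Neither a 15A nor a 20A circuit covers the build on its own, which is why power, not slot count, is the first limit a home builder hits.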

4x RTX 5090 in one desktop case (a pile of desktop cards)

GPU / system: NVIDIA GeForce RTX 5090 × 4

Combined memory: 128 GB

GPU cost: $7,996

Host system (est.): $4,000

Total price: $11,996

⚡ 2,550 W sustained draw

🏠 = 2.12x an average home (~1,200W continuous)

🔋 = drains a 90 kWh EV battery in 35.29 hrs

These 4 cards do NOT share memory. A model has to be manually split across them over ordinary PCIe, which is far slower than NVLink, and there's no vendor support for this as a unified system.
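To see why four separate 32 GB pools are not the same as 128 GB, here is a toy sketch of manual sharding. It is not how any particular framework places layers: `shard_layers` is a hypothetical helper, the layer sizes are invented for illustration, and the per-step PCIe communication cost is not modeled at all.

```python
def shard_layers(layer_sizes_gb, num_devices=4, device_capacity_gb=32.0):
    """Assign consecutive layers to cards, filling each card before moving on.

    Returns one list of layer indices per card. Raises ValueError if a layer
    is larger than a single card or the cards run out of room.
    """
    placement = [[] for _ in range(num_devices)]
    free = [device_capacity_gb] * num_devices
    device = 0
    for index, size in enumerate(layer_sizes_gb):
        if size > device_capacity_gb:
            raise ValueError(f"layer {index} ({size} GB) is larger than one card")
        # Advance to the next card until this layer fits in its free VRAM.
        while device < num_devices and size > free[device]:
            device += 1
        if device == num_devices:
            raise ValueError("model does not fit across the available cards")
        placement[device].append(index)
        free[device] -= size
    return placement


# Hypothetical model: twelve 10 GB layers (120 GB total) fit, three per card.
print(shard_layers([10.0] * 12))

# Hypothetical model: nine 11 GB layers (99 GB total) do not fit, because only
# two 11 GB layers fit in each 32 GB card and the leftover 10 GB is stranded.
try:
    shard_layers([11.0] * 9)
except ValueError as error:
    print("99 GB model:", error)
```

The second case is the point: the total is well under 128 GB, yet the split fails because no card can lend its spare VRAM to another.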

The honest difference: the DGX cluster above costs about 58x as much as the pile of desktop cards, but its 1,280 GB behaves as one real, fast, vendor-supported memory pool. The desktop pile's 128 GB is really four separate 32 GB pools that a model must be hand-split across. It is not a substitute for a cluster, just several cards sharing a case.
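For readers who want to check the price comparison, the sketch below recomputes it from the two cards' figures. The dictionary layout and variable names are illustrative; the cost-per-GB figure is a derived number, not one the cards state, and it ignores the fact that the two kinds of memory are not equivalent.

```python
dgx = {"price_usd": 700_000, "memory_gb": 1_280}
desktop = {"price_usd": 7_996 + 4_000, "memory_gb": 4 * 32}  # GPUs + host estimate

price_ratio = dgx["price_usd"] / desktop["price_usd"]
memory_ratio = dgx["memory_gb"] / desktop["memory_gb"]
dgx_per_gb = dgx["price_usd"] / dgx["memory_gb"]
desktop_per_gb = desktop["price_usd"] / desktop["memory_gb"]

print(f"Price ratio:  {price_ratio:.1f}x")
print(f"Memory ratio: {memory_ratio:.0f}x")
print(f"Cost per GB:  DGX ${dgx_per_gb:,.2f}, desktop ${desktop_per_gb:,.2f}")
```

On these numbers the cluster has 10x the memory for roughly 58x the price; the premium pays for the interconnect and vendor integration rather than for raw capacity.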