Understanding Clusters
A cluster, plainly: multiple physical machines wired together with fast networking so they act as one larger pool of compute and memory for a single job, instead of several separate computers doing separate jobs.
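The "one larger pool of memory" idea comes down to simple capacity arithmetic. The sketch below uses a hypothetical helper, `gpus_needed`, and assumes weights dominate memory use. It applies an assumed overhead multiplier for activations and buffers and estimates how many cards' combined VRAM a model needs. It says nothing about whether the links between those cards are fast enough.

```python
import math

def gpus_needed(params_billions: float, bytes_per_param: float,
                vram_gb_per_gpu: float, overhead: float = 1.2) -> int:
    """Hypothetical helper: fewest GPUs whose combined VRAM holds the model.

    `overhead` is an assumed multiplier covering activations, KV cache and
    runtime buffers; real requirements vary by framework and workload.
    """
    # billions of parameters * bytes each = gigabytes of weights (1 GB = 1e9 bytes)
    weights_gb = params_billions * bytes_per_param
    return math.ceil(weights_gb * overhead / vram_gb_per_gpu)

# A hypothetical 70B-parameter model stored in 16-bit weights (2 bytes each).
print(gpus_needed(70, 2, 80, overhead=1.0))  # weights only, 80 GB cards -> 2
print(gpus_needed(70, 2, 24, overhead=1.0))  # weights only, 24 GB cards -> 6
print(gpus_needed(70, 2, 80))                # with the assumed 1.2x headroom -> 3
```

The 24 GB case is the desktop scenario: the combined capacity exists on paper, but it only behaves like one pool if the interconnect can keep up.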
Real datacenter clustering
NVIDIA's DGX and rack-scale systems (DGX H100, GB200 NVL72) connect GPUs with NVLink/NVSwitch inside a node and InfiniBand between nodes: purpose-built, extremely high-bandwidth links. Within one NVLink domain (8 GPUs in a DGX H100, 72 in a GB200 NVL72), each GPU can read and write its peers' memory directly, and InfiniBand extends the system across nodes to dozens or hundreds of GPUs that stay synchronized with microsecond-scale latency. Software like NVIDIA's NCCL library handles the collective communication between GPUs (all-reduce, all-gather and similar operations), so a huge model can be split across all of them while losing comparatively little time to communication. These systems are engineered as a single product: matched cooling, matched power delivery, vendor-supported firmware.
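One collective that NCCL implements is all-reduce, and a ring is one of the algorithms it can use for it. The sketch below is a pure-Python simulation of a ring all-reduce (sum), not NCCL's code or API; the function name `ring_allreduce` and the list-of-lists "ranks" are illustrative assumptions. Each rank splits its buffer into n chunks. A reduce-scatter phase passes partial sums around the ring, and an all-gather phase then circulates the finished chunks, so every rank ends with the full sum.

```python
from typing import List

def ring_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Simulate a ring all-reduce (sum) across len(buffers) ranks."""
    n = len(buffers)
    length = len(buffers[0])
    assert all(len(b) == length for b in buffers), "ranks need equal-sized buffers"
    data = [list(b) for b in buffers]  # copy so the inputs are left untouched
    # Chunk c covers indices [bounds[c], bounds[c + 1]).
    bounds = [length * c // n for c in range(n + 1)]

    def chunk(rank: int, c: int) -> List[float]:
        return data[rank][bounds[c]:bounds[c + 1]]  # slicing returns a copy

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) % n to its
    # right neighbour, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        # Snapshot every outgoing chunk first so all sends in a step are "simultaneous".
        sends = [(r, (r - s) % n, chunk(r, (r - s) % n)) for r in range(n)]
        for r, c, payload in sends:
            dst, lo = (r + 1) % n, bounds[c]
            for i, v in enumerate(payload):
                data[dst][lo + i] += v

    # Rank r now holds the complete sum for chunk (r + 1) % n.
    # Phase 2, all-gather: at step s, rank r forwards chunk (r + 1 - s) % n,
    # and the neighbour overwrites its copy with the finished values.
    for s in range(n - 1):
        sends = [(r, (r + 1 - s) % n, chunk(r, (r + 1 - s) % n)) for r in range(n)]
        for r, c, payload in sends:
            dst, lo = (r + 1) % n, bounds[c]
            data[dst][lo:lo + len(payload)] = payload
    return data

ranks = [[1, 2, 3, 4], [10, 20, 30, 40], [100, 200, 300, 400]]
print(ring_allreduce(ranks))  # every rank ends with [111, 222, 333, 444]
```

Over both phases, each rank sends 2(n - 1)/n of its buffer to a single neighbour. Once the ring is large enough, per-GPU traffic therefore barely grows with GPU count, which is why the bandwidth of each individual link matters so much.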
Figure: 2x DGX H100 nodes over NVLink + InfiniBand (a real cluster). Image: NVIDIA press image.
A pile of desktop cards
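Consumer GPUs (RTX 40- and 50-series) have no NVLink; NVIDIA dropped it after the RTX 3090. Multiple cards in one desktop therefore talk over ordinary PCIe, routed through the CPU. A PCIe 4.0 x16 slot moves about 32 GB/s in each direction (PCIe 5.0, which the RTX 50-series supports, roughly doubles that), compared with 900 GB/s of total NVLink bandwidth per H100, so GPU-to-GPU traffic is dramatically slower. There is also no unified memory: each card's VRAM stays separate, so software has to shard the model manually and pay a communication penalty every step. Power runs out first: a typical North American household circuit (15-20 A at 120 V, roughly 1,800-2,400 W, and only about 80% of that for a sustained load) becomes the hard ceiling long before you run out of PCIe slots.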
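Both limits in the paragraph above can be estimated on the back of an envelope. The sketch below uses two hypothetical helpers. `ring_allreduce_seconds` applies the standard bandwidth-only cost of a ring all-reduce, in which each GPU sends 2(n - 1)/n of the payload over its link. `gpus_on_circuit` divides a derated circuit budget by per-card power. The inputs are assumptions: nominal PCIe 4.0 x16 at 32 GB/s per direction, H100 NVLink at 450 GB/s per direction (half of the 900 GB/s total), a hypothetical 14 GB gradient payload, 450 W per RTX 4090, 300 W for the rest of the system, and the common 80% continuous-load guideline.

```python
def ring_allreduce_seconds(payload_bytes: float, n_gpus: int,
                           link_gbytes_per_s: float) -> float:
    """Bandwidth-only ring all-reduce estimate (ignores latency and overlap).

    Each GPU sends 2 * (n - 1) / n of the payload over its link, assumed to be
    the bottleneck at `link_gbytes_per_s` in each direction.
    """
    return 2 * (n_gpus - 1) / n_gpus * payload_bytes / (link_gbytes_per_s * 1e9)

def gpus_on_circuit(volts: float, amps: float, gpu_watts: float,
                    rest_of_system_watts: float,
                    continuous_derate: float = 0.8) -> int:
    """How many GPUs fit under an assumed continuous-load budget for one circuit."""
    budget = volts * amps * continuous_derate - rest_of_system_watts
    return max(0, int(budget // gpu_watts))

grads = 14e9  # hypothetical: 7B parameters of 16-bit gradients, synced every step
print(f"4 GPUs, PCIe 4.0 x16: {ring_allreduce_seconds(grads, 4, 32):.2f} s per sync")     # 0.66 s
print(f"4 GPUs, H100 NVLink:  {ring_allreduce_seconds(grads, 4, 450):.2f} s per sync")    # 0.05 s
print(gpus_on_circuit(120, 15, gpu_watts=450, rest_of_system_watts=300))  # 15 A circuit -> 2
print(gpus_on_circuit(120, 20, gpu_watts=450, rest_of_system_watts=300))  # 20 A circuit -> 3
```

The PCIe figure is optimistic: it assumes every card gets a full x16 link, and it ignores latency and the extra hops when peer-to-peer transfers are unavailable and traffic is staged through system memory. The power estimate likewise ignores transient spikes above rated board power.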