Three models ranked on arena.ai's agent leaderboard,
filtered to fully open licenses (MIT / Apache 2.0), all 100B+ total parameters.
The memory math: minimum GPU memory (GB) = total parameters (in billions) × 1 GB, plus 20% working room
for context, attention cache, and overhead. The 1 GB per billion parameters figure corresponds to about one byte per parameter (8-bit weights). A mixture-of-experts model only computes with a fraction of its
parameters per token, but every parameter still has to sit in memory, so the total parameter count is what determines
minimum hardware, not the "active" count.
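The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the page's formula: the function name `min_gpu_memory_gb` and its parameter names are made up for this example, and the parameter counts are the totals listed for the three models on this page.

```python
def min_gpu_memory_gb(total_params_billions: float,
                      gb_per_billion: float = 1.0,
                      working_room: float = 0.20) -> float:
    """Rule-of-thumb minimum GPU memory in GB (hypothetical helper).

    Uses the TOTAL parameter count, not the active count, because every
    expert of a mixture-of-experts model must be resident in memory.
    """
    weights_gb = total_params_billions * gb_per_billion
    return weights_gb * (1.0 + working_room)


# Total parameters (in billions) as listed on this page.
models = {"Mixtral 8x22B": 141, "Qwen3-235B-A22B": 235, "DeepSeek-V3": 671}

for name, total_b in models.items():
    print(f"{name}: {min_gpu_memory_gb(total_b):.1f} GB")
```

For DeepSeek-V3 this gives the same 805.2 GB figure the page reports. Changing `gb_per_billion` is one way to explore other weight precisions, but that goes beyond what the page itself assumes.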
Apache 2.0
Mixtral 8x22B — Mistral AI
Mixture-of-experts: only 39B of the 141B total parameters activate per token, but all 141B must still be resident in memory.
Total parameters: 141B
Every parameter is a number the model stores and must load into GPU memory. More parameters generally means a more capable model, but also more hardware to run it.
Active parameters / token: 39B
How many parameters actually do math for each token (roughly, each word piece) generated. This is lower than the total because the model is a "mixture of experts", but it does NOT reduce the memory needed.
License: Apache 2.0
Permissive: commercial use, modification, and redistribution are allowed, provided the license text and attribution notices are kept.
Datacenter-grade alternative: 3x H100 SXM, 240 GB, $94,000.
More expensive and still needs a server chassis with NVLink support — shown for comparison only.
Apache 2.0
Qwen3-235B-A22B — Alibaba
Ranked on arena.ai's open-weight leaderboard; mixture-of-experts with 22B active parameters per token.
Total parameters: 235B
Active parameters / token: 22B
License: Apache 2.0
Permissive: commercial use, modification, and redistribution are allowed, provided the license text and attribution notices are kept.
Datacenter-grade alternative: 1x DGX H100 node, 640 GB, $350,000.
Far more expensive, but adds huge concurrency headroom for serving many users at once.
MIT License
DeepSeek-V3 — DeepSeek AI
The DeepSeek V3/V4 family leads arena.ai's open-weight leaderboard as of mid-2026, trading blows with closed frontier models.
Total parameters: 671B
Active parameters / token: 37B
License: MIT License
Fully permissive — one of the most capable open-weight models under the least restrictive common license.
Math shown: 671 (billions of parameters) × 1 GB = 671 GB; plus 20% working room = 805.2 GB.
2x DGX H100 nodes, networked as a cluster
GPU / system: NVIDIA DGX H100 (8x H100 node) × 2
Combined memory: 1,280 GB
Total price: $700,000
⚡ 17,000 W sustained draw
🏠 = 14.17x an average home (~1,200W continuous)
🔋 = drains a 90 kWh EV battery in 5.29 hrs
No single GPU or node available on the market today holds 805GB alone — this genuinely requires multiple machines working together. See the Clusters page.
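The power comparisons for the two-node setup are plain division, and a small sketch makes the arithmetic easy to re-run with other numbers. The helper names `home_equivalents` and `hours_to_drain` are hypothetical; the 17,000 W draw, the ~1,200 W average home, and the 90 kWh battery are the figures quoted on this page.

```python
def home_equivalents(draw_watts: float, home_watts: float = 1200.0) -> float:
    """How many average homes draw the same continuous power."""
    return draw_watts / home_watts


def hours_to_drain(battery_kwh: float, draw_watts: float) -> float:
    """Hours until a battery of the given capacity is empty at this draw."""
    draw_kw = draw_watts / 1000.0  # watts -> kilowatts
    return battery_kwh / draw_kw


cluster_draw_watts = 17_000  # sustained draw quoted for 2x DGX H100

print(f"{home_equivalents(cluster_draw_watts):.2f}x an average home")
print(f"{hours_to_drain(90, cluster_draw_watts):.2f} hours to drain a 90 kWh battery")
```

Both results match the page's 14.17x and 5.29 hrs. The calculation assumes the draw is constant and ignores battery conversion losses.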
Datacenter-grade alternative: 11x H100 SXM (raw GPU count, for comparison), 880 GB, $334,000.
11 GPUs don't fit in one 8-GPU DGX node, so this still means 2+ physical machines in practice.
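The GPU and node counts follow from rounding up, which can be sketched as below. The function names are illustrative only. The 80 GB per H100 figure is an assumption inferred from the page's own numbers (240 GB across 3 GPUs, 880 GB across 11), and 8 GPUs per node is the DGX H100 layout the page describes.

```python
import math


def gpus_needed(required_gb: float, gb_per_gpu: float = 80.0) -> int:
    """Smallest whole number of GPUs whose combined memory covers the need."""
    return math.ceil(required_gb / gb_per_gpu)


def nodes_needed(gpu_count: int, gpus_per_node: int = 8) -> int:
    """Smallest whole number of nodes that can host that many GPUs."""
    return math.ceil(gpu_count / gpus_per_node)


required_gb = 805.2  # DeepSeek-V3 requirement computed on this page
gpus = gpus_needed(required_gb)
nodes = nodes_needed(gpus)
print(f"{gpus} GPUs ({gpus * 80} GB) across {nodes} nodes")
```

With these assumptions the sketch reproduces the 11 GPUs, 880 GB, and 2+ machines stated above. It counts memory only and says nothing about interconnect or how the model is split across machines.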