Kimi K2.5 on RTX 3060 with Optane Memory: AI Breakthrough
Summary
A trillion-parameter AI model has run on a single mid-range graphics card. A Chinese AI enthusiast demonstrated Moonshot AI's Kimi K2.5, which has 1 trillion total parameters, running on an Nvidia RTX 3060 GPU paired with 768 gigabytes of Intel Optane Persistent Memory (PMem). The setup generated roughly four tokens per second: slow by production standards, but remarkable for the hardware involved.

Kimi K2.5 does not use all of its parameters at once. It is a mixture-of-experts model, and only about 32 billion parameters are active for each generated token, so the data the system must read per token is a small fraction of the model's total size. The full model is still enormous, at approximately 630 gigabytes, and even quantized versions are around 381 gigabytes, far more than fits in the graphics card's own video memory. That is why 768 gigabytes of Optane PMem were needed to hold the weights. Optane PMem DIMMs are slower than conventional DRAM but much cheaper per gigabyte.

The RTX 3060 launched in early 2021 and was designed for gaming, not for running advanced AI models. High-performance deployments of Kimi K2.5 typically use as many as eight high-end GPUs and generate tokens much faster. Because Kimi K2.5 is an open-weight model, anyone can download and run it, and this demonstration shows what is possible with unconventional hardware.
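The memory figures in this summary can be sanity-checked with some back-of-envelope arithmetic. The sketch below is an illustration, not a description of how the demonstration was actually configured. It assumes 1 GB = 10^9 bytes, that weights are stored at one uniform precision, and that every active weight is read once per token. Real inference engines mix precisions, keep some layers on the GPU, and also read a KV cache, so treat the results as rough estimates. The helper names are hypothetical.

```python
# Back-of-envelope arithmetic using the figures reported in the summary.
# Assumptions (not from the source): 1 GB = 1e9 bytes, weights stored at a
# uniform precision, and each active weight read once per generated token.

TOTAL_PARAMS = 1.0e12   # 1 trillion total parameters
ACTIVE_PARAMS = 32e9    # ~32 billion active per token (mixture-of-experts)
PMEM_GB = 768           # Intel Optane PMem installed in the demo machine


def bits_per_weight(size_gb: float, params: float = TOTAL_PARAMS) -> float:
    """Average bits per parameter implied by a checkpoint size."""
    return size_gb * 1e9 * 8 / params


def gb_read_per_token(size_gb: float) -> float:
    """Weight data touched per token if only the active share is read."""
    return size_gb * ACTIVE_PARAMS / TOTAL_PARAMS


def implied_bandwidth_gbps(size_gb: float, tokens_per_s: float) -> float:
    """Effective weight-read bandwidth needed to sustain a given speed."""
    return gb_read_per_token(size_gb) * tokens_per_s


for label, size_gb in [("full", 630.0), ("quantized", 381.0)]:
    print(f"{label:9s} fits in PMem: {size_gb <= PMEM_GB}, "
          f"~{bits_per_weight(size_gb):.2f} bits/weight, "
          f"~{gb_read_per_token(size_gb):.1f} GB read/token, "
          f"~{implied_bandwidth_gbps(size_gb, 4.0):.0f} GB/s at 4 tok/s")
```

Under these assumptions the 381 GB quantized checkpoint averages about 3 bits per weight. Each token touches roughly 12 GB of weights, so four tokens per second corresponds to roughly 49 GB/s of effective read bandwidth. That explains why the mixture-of-experts design, rather than the full trillion parameters, sets the speed limit on this kind of setup.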
This is an AI-generated audio summary. Always check the original source for complete reporting.