# How Distillation Makes AI Models Smaller and Cheaper

![rw-book-cover](https://www.quantamagazine.org/wp-content/themes/quanta2024/frontend/images/favicon.png)

## Metadata
- Author: [[Amos Zeeberg]]
- Full Title: How Distillation Makes AI Models Smaller and Cheaper
- Category: #articles
- Summary: Distillation is a method that uses a large AI model to train a smaller, cheaper one without losing much accuracy. It helps companies run powerful AI tools more efficiently by passing "soft" knowledge from a big "teacher" model to a smaller "student" model. This technique is widely used and continues to improve AI by making it faster and more affordable.
- URL: https://www.quantamagazine.org/how-distillation-makes-ai-models-smaller-and-cheaper-20250718/

## Highlights
- In January, the NovaSky lab at the University of California, Berkeley, [showed that distillation works well for training chain-of-thought reasoning models](https://novasky-ai.github.io/posts/sky-t1/), which use multistep “thinking” to better answer complicated questions. The lab says its fully open-source Sky-T1 model cost less than $450 to train, and it achieved similar results to a much larger open-source model. ([View Highlight](https://read.readwise.io/read/01k0f22rbd631yzkbwrwbjd3qy))
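
The "teacher-student" framing in the summary follows the standard knowledge-distillation recipe: the student is trained not only on hard labels but also on the teacher's softened output distribution, which carries the "soft" knowledge the article refers to. The sketch below shows that loss in PyTorch; the function name, temperature, and loss weighting are illustrative assumptions, not details from the article or the Sky-T1 work.

```python
# Minimal sketch of a teacher-student distillation loss (assumes PyTorch).
# Hyperparameters (temperature, alpha) are illustrative, not from the article.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the usual hard-label loss with a 'soft' loss that pulls the
    student's output distribution toward the teacher's."""
    # Soft targets: soften both distributions with a temperature so the
    # teacher's relative confidence across all classes is preserved
    # instead of being flattened to a single hard label.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean")
    soft_loss = soft_loss * (temperature ** 2)  # conventional gradient scaling

    # Hard targets: ordinary cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # Weighted combination of the two objectives.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

For chain-of-thought reasoning models, the same teacher-student idea is often applied at the data level instead: the student is fine-tuned on outputs generated by the larger teacher rather than matching its logits directly.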