Knowledge Distillation

Scaling smarter: How Knowledge Distillation powers Large Language Models