How does model distillation work?

Model distillation trains a smaller student model to mimic a larger teacher model's outputs, transferring knowledge without copying weights. Rather than learning only from the teacher's final predictions, the student learns from the teacher's full probability distributions ("soft targets"), which capture nuance such as how the teacher ranks the wrong answers. This is how models small enough to run locally on a laptop can approach the performance of far larger models like GPT-4.
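To make the mechanics concrete, here is a minimal sketch of a distillation loss in PyTorch (an assumed framework; the function name and the `temperature` and `alpha` hyperparameters are illustrative, not taken from this page). It blends a KL-divergence term that matches the student's temperature-softened distribution to the teacher's with a standard cross-entropy term on the ground-truth labels:

```python
# Illustrative sketch, not a definitive implementation.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend soft-target imitation of the teacher with hard-label training."""
    # Soften both distributions with the temperature, then match them with
    # KL divergence. Scaling by T^2 keeps gradient magnitudes comparable
    # across temperature settings.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * temperature ** 2

    # Standard cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(student_logits, labels)

    # alpha balances imitating the teacher vs. fitting the labels.
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In a training loop, the teacher runs in inference mode and only the student is updated, e.g. `teacher_logits = teacher(batch)` under `torch.no_grad()`, then `loss = distillation_loss(student(batch), teacher_logits, labels)`. Raising the temperature spreads the teacher's probability mass over more classes, exposing more of its decision-making pattern to the student.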