What is quantisation doing everywhere?

Quantisation compresses neural network weights from 32-bit (or 16-bit) floating-point numbers down to low-bit integers such as 8-bit or 4-bit, or, at the extreme, 1-bit values that keep little more than each weight's sign. This dramatically reduces model size and memory usage, and it's everywhere because it makes powerful models practical on consumer hardware. Storing 32-bit weights as 8-bit integers shrinks a model 4x, and going to 4-bit shrinks it 8x. Modern quantisation techniques typically retain most of the original model's quality at these levels.

Today's issue shows quantisation enabling 1-bit Bonsai models that run in a browser at just 290MB, and turning up in projects like oMLX for Apple Silicon inference. For developers, quantisation can be the difference between needing expensive GPUs and running models on laptops and mobile devices.
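To make the idea concrete, here is a minimal sketch in plain Python, assuming the simplest possible schemes: per-tensor symmetric quantisation (one shared scale for the whole tensor) for 8-bit and 4-bit, and a sign-plus-scale binarisation for 1-bit. The function names are hypothetical, and this is not how Bonsai or oMLX actually quantise; real formats usually keep a separate scale per small group of weights and use more careful rounding.

```python
def quantize_symmetric(weights, bits):
    """Map floats to signed ints in [-qmax, qmax] using one shared scale (hypothetical helper)."""
    if bits < 2:
        raise ValueError("use quantize_1bit for 1-bit weights")
    qmax = 2 ** (bits - 1) - 1               # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Divide by the scale, round to the nearest integer, clamp to the valid range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats by multiplying the integers back by the scale."""
    return [v * scale for v in q]


def quantize_1bit(weights):
    """Binarise: keep only each weight's sign plus one scale (the mean absolute value)."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def model_size_bytes(num_params, bits):
    """Raw weight storage only; ignores scales, metadata and unquantised layers."""
    return num_params * bits / 8


weights = [0.42, -1.27, 0.05, 0.9, -0.33]
q8, s8 = quantize_symmetric(weights, 8)
print(q8, dequantize(q8, s8))
print(quantize_1bit(weights))

# A hypothetical 1-billion-parameter model: 32-bit vs 4-bit weights.
print(model_size_bytes(1_000_000_000, 32) / 1e9, "GB at 32-bit")
print(model_size_bytes(1_000_000_000, 4) / 1e9, "GB at 4-bit")
```

The rounding error per weight is at most half a scale step, which is why fewer bits (a coarser grid) cost more accuracy. Keeping one scale per small block of weights, as many real formats do, shrinks that step for each block and is a large part of why 4-bit models hold up as well as they do.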